
ML-STAC Collection Specification 📜

An ML-STAC Collection groups three distinct Catalogs (train, validation, and test) together with dataset-level metadata. The metadata is optional, but dataset providers are strongly encouraged to define it so that users can understand and use the dataset effectively.
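To make the structure concrete, here is a minimal sketch using only the Python standard library. It is not the definition in dataclass.py. The `Catalog` fields (`id`, `n_items`) and the free-form `metadata` dictionary are hypothetical simplifications for illustration. What the sketch does reflect is the shape described above: three required Catalogs and one optional metadata field.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


# Hypothetical, simplified stand-in for an ML-STAC Catalog. The real fields
# are defined by the Catalog specification; only an id and an item count
# are modelled here.
@dataclass
class Catalog:
    id: str
    n_items: int = 0


# Hypothetical sketch of a Collection: three required splits plus optional
# metadata (modelled here as a free-form dictionary).
@dataclass
class Collection:
    train: Catalog
    validation: Catalog
    test: Catalog
    metadata: Optional[Dict[str, Any]] = None

    def splits(self) -> Dict[str, Catalog]:
        """Return the three Catalogs keyed by split name."""
        return {"train": self.train, "validation": self.validation, "test": self.test}


collection = Collection(
    train=Catalog("train", 800),
    validation=Catalog("validation", 100),
    test=Catalog("test", 100),
    metadata={"description": "Example dataset"},  # optional, but encouraged
)

# Serialise the nested dataclasses to a JSON document.
print(json.dumps(asdict(collection), indent=2))
```

Because `metadata` defaults to `None`, a Collection with only the three Catalogs is still valid in this sketch. That mirrors the rule that metadata is optional.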

In this directory 📁

Specification 📄: The ML-STAC Collection specification is in specification.md. It includes an overview and an explanation of the required and optional fields.

Data model 🧊: The dataclass.py file contains the ML-STAC Collection data model definition. It can be used to generate JSON Schemas and to validate ML-STAC Collections.

Schema 🧩: The schema.json file contains the ML-STAC Collection JSON Schema. It can be used to validate ML-STAC Collections.

Example 🐍: The example.py file contains a minimal, valid ML-STAC Collection.
