ML STAC Sample Specification 📑

Overview 📋

This document explains the structure and content of a Machine Learning SpatioTemporal Asset Catalog (ML-STAC) Sample. A Sample is a safetensors file that stores tensors together with metadata, following a set of predefined rules defined by the Task. Unlike in conventional STAC, the `id` field is generated automatically from the Sample's position among the assets of the specific ML-STAC catalog. A Sample contains both the tensor data (input, target, and extra, where present) and the associated spatiotemporal metadata.
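To make the file layout concrete, here is a minimal, standard-library-only sketch of how a Sample's tensors and metadata share one file. It follows the publicly documented safetensors layout (an 8-byte little-endian header length, a JSON header that also carries a `__metadata__` string map, then the raw tensor bytes) and handles only float32. The function names `write_sample_bytes` and `read_sample_metadata` are hypothetical; with PyTorch installed, one would normally call `safetensors.torch.save_file(tensors, filename, metadata=...)`, where `metadata` must be a `Dict[str, str]`.

```python
import json
import math
import struct


def write_sample_bytes(tensors, metadata):
    """Serialize float32 tensors and string metadata in the safetensors layout.

    `tensors` maps a key ("input", "target", "extra") to (shape, flat values).
    Only the F32 dtype is handled in this sketch.
    """
    if not all(isinstance(v, str) for v in metadata.values()):
        raise TypeError("safetensors metadata values must be strings")
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        if math.prod(shape) != len(values):
            raise ValueError(f"{name}: {len(values)} values do not fill shape {shape}")
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {"dtype": "F32", "shape": list(shape),
                        "data_offsets": [offset, offset + len(raw)]}
        chunks.append(raw)
        offset += len(raw)
    header["__metadata__"] = dict(metadata)
    header_bytes = json.dumps(header).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)  # pad header with spaces to 8 bytes
    # 8-byte little-endian header length, JSON header, then the raw tensor data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def read_sample_metadata(blob):
    """Return the `__metadata__` dict without decoding any tensor data."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    return header.get("__metadata__", {})


blob = write_sample_bytes(
    {"input": ((1, 2, 2, 2), [0.5] * 8)},  # (batch_size, channels, height, width)
    {"id": "0005DXZ", "crs": "EPSG:4326",
     "geotransform": "0.0, 1.0, 0.0, 0.0, 0.0, 1.0",
     "start_datetime": "2020-01-01T00:00:00Z"},
)
print(read_sample_metadata(blob)["crs"])  # EPSG:4326
```

Because the metadata lives in the JSON header, a catalog can read `crs`, `geotransform`, or `id` without loading any tensor data.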

About the tensors 🧮

Tensors need to be contiguous and dense. The tensors are stored in a dictionary (Dict[str, torch.Tensor]) with the following keys:

Key	Type	Description
input	torch.Tensor	Optional. The input tensor. Its complexity depends on the Task. For example, for semantic segmentation the input can be a 4D tensor of shape (batch_size, channels, height, width), while spatiotemporal tasks require a 5D tensor of shape (batch_size, time, channels, height, width). If the input tensor is not provided, the `input` metadata field must be defined instead. This is particularly useful for text-to-tensor tasks, such as generating images from text descriptions.
target	torch.Tensor	Optional. The target tensor is the ground truth against which the model's predictions are compared during training. It is central to supervised learning, where the objective is to minimize the difference (loss) between the predictions and these target values. As with the `input` field, the shape of the `target` tensor depends largely on the task.
extra	torch.Tensor	Optional. The extra tensor is a general-purpose container for auxiliary data that is not directly involved in the primary computation but may be needed for side tasks or analyses. Whether it holds side data or intermediate results, it lets you bundle additional data together with the main tensors.
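A minimal sketch of checking a tensor dictionary against the table above, written against plain shape tuples so it runs without PyTorch. The task names and the `EXPECTED_INPUT_RANK` mapping are hypothetical, since the actual rules are defined per Task. With real tensors, `tuple(t.shape)` would supply the shapes, and `t.is_contiguous()` together with `t.layout == torch.strided` would cover the contiguous-and-dense requirement.

```python
ALLOWED_KEYS = {"input", "target", "extra"}

# Hypothetical task names; the ranks follow the examples in the table.
EXPECTED_INPUT_RANK = {
    "semantic_segmentation": 4,  # (batch_size, channels, height, width)
    "spatiotemporal": 5,         # (batch_size, time, channels, height, width)
}


def check_tensor_shapes(shapes, task):
    """Validate a {key: shape} mapping describing a Sample's tensors."""
    unknown = set(shapes) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected tensor keys: {sorted(unknown)}")
    if "input" in shapes:
        rank, expected = len(shapes["input"]), EXPECTED_INPUT_RANK[task]
        if rank != expected:
            raise ValueError(f"input has rank {rank}; {task} expects {expected}")
    return True


print(check_tensor_shapes({"input": (8, 13, 64, 64), "target": (8, 1, 64, 64)},
                          "semantic_segmentation"))  # True
```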

About the metadata 📜

ML-STAC enforces strict guidelines for the metadata of a Sample. A Sample can include the following fields; those marked REQUIRED must always be present:

Field Name	Type	Description
input	string	Optional. The `input` metadata field is central to text-driven ML tasks. Beyond typical text analysis, it is used for generating images from text descriptions, guiding image modifications with textual cues, and producing visual content from narrative prompts. Depending on the task, the input string may be preprocessed, from lowercasing and tokenization to more advanced embedding techniques. The `input` metadata and the `input` tensor cannot both be None.
target	string	Optional. The `target` metadata field is central to image-to-text ML tasks. It holds the expected textual representation of an input tensor: an image caption, a detailed description, annotations, or any other text that describes the visual content. Depending on the task, the target string can range from a simple label to a full narrative description, and it may be preprocessed with tokenization, stemming, or embedding. The `target` metadata and the `target` tensor cannot both be None.
geotransform	string	REQUIRED. The affine transformation that maps pixel coordinates in the input tensor to real-world geographic coordinates, following the GDAL GeoTransform convention. The string concatenates the six GeoTransform parameters, separated by commas, in GDAL order: x of the upper-left corner, pixel width, row rotation, y of the upper-left corner, column rotation, and pixel height (negative for north-up images). For example, '0.0, 1.0, 0.0, 0.0, 0.0, 1.0'. The geotransform should be derived directly from the georeferencing of the input tensor. If it is not available (i.e., None), use the default value '0.0, 1.0, 0.0, 0.0, 0.0, 1.0'.
crs	string	REQUIRED. The spatial reference system used to interpret the geographic coordinates of the Sample, written as 'Authority:Code' (for example, 'EPSG:4326'). ML-STAC supports three authorities. 'EPSG' (European Petroleum Survey Group) uses numeric codes and is widely recognized in the geospatial community. 'ESRI' is closely associated with Esri's ArcGIS software and uses a combination of numeric and alphanumeric codes. 'SR-ORG' (Spatial Reference Organization) uses numeric codes only. Choosing the correct CRS is crucial for accurately geolocating and aligning `SampleTensor` objects; consult each authority's documentation for specific codes and their definitions.
id	string	REQUIRED. A unique identifier for the Sample within a Catalog. It is generated automatically from the Sample's index by base-61 conversion (characters 1-9, a-z, A-Z) and always has 7 characters, left-padded with zeros. Seven base-61 digits give 61^7 ≈ 3.1 × 10^12 distinct values, enough for datasets with more than 10^12 images. For example, in a hypothetical dataset with one million images, image number 954562 would have the id '0005DXZ'.
start_datetime	string	REQUIRED. The searchable start date and time of the assets, in UTC. It is formatted according to RFC 3339, section 5.6.
end_datetime	string	Optional. The searchable end date and time of the assets. It is useful when the tensors are multi-temporal or represent a composite; otherwise, it equals start_datetime. It is formatted according to RFC 3339, section 5.6.
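The rule in the `input` and `target` rows, that a field's metadata string and its tensor cannot both be None, can be enforced with a small check. This is a sketch: the function name `check_text_or_tensor` is hypothetical, and a placeholder object stands in for a `torch.Tensor`.

```python
def check_text_or_tensor(tensors, metadata):
    """Require a tensor, a metadata string, or both for `input` and `target`."""
    for key in ("input", "target"):
        if tensors.get(key) is None and metadata.get(key) is None:
            raise ValueError(f"{key!r}: tensor and metadata cannot both be None")
    return True


# Text-to-image style Sample: a text prompt as input, an image tensor as target.
sample_tensors = {"target": object()}  # placeholder for a torch.Tensor
sample_metadata = {"input": "a river crossing farmland", "crs": "EPSG:4326"}
print(check_text_or_tensor(sample_tensors, sample_metadata))  # True
```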
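The `geotransform` field follows GDAL's affine convention, in which a pixel at column `col` and row `row` maps to x = GT0 + col·GT1 + row·GT2 and y = GT3 + col·GT4 + row·GT5. A minimal sketch of parsing the comma-separated string (with the None default), serializing it back, and applying the mapping; the helper names are hypothetical. In practice, the six numbers could come from GDAL's `Dataset.GetGeoTransform()` or from rasterio's `Affine.to_gdal()`.

```python
DEFAULT_GEOTRANSFORM = "0.0, 1.0, 0.0, 0.0, 0.0, 1.0"


def parse_geotransform(value):
    """Parse the comma-separated GeoTransform string; None falls back to the default."""
    text = DEFAULT_GEOTRANSFORM if value is None else value
    gt = tuple(float(part) for part in text.split(","))
    if len(gt) != 6:
        raise ValueError(f"expected 6 GeoTransform parameters, got {len(gt)}")
    return gt


def format_geotransform(gt):
    """Serialize six numbers back into the comma-separated string form."""
    return ", ".join(repr(float(v)) for v in gt)


def pixel_to_geo(gt, col, row):
    """GDAL affine mapping from pixel (col, row) to geographic (x, y)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


# Hypothetical 10 m north-up grid (note the negative pixel height).
gt = parse_geotransform("500000.0, 10.0, 0.0, 4100000.0, 0.0, -10.0")
print(pixel_to_geo(gt, 2, 3))  # (500020.0, 4099970.0)
```

With the default geotransform, pixel and geographic coordinates coincide, which is why it is a safe fallback for Samples without georeferencing.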
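The 'Authority:Code' format of the `crs` field can be validated with a few lines. This sketch restricts authorities to the three ML-STAC supports and applies the code rules stated in the table (numeric for EPSG and SR-ORG, alphanumeric for ESRI); treating authority names as case-sensitive and the exact character classes are assumptions, and `parse_crs` is a hypothetical helper.

```python
import re

# Code rules follow the table; the exact character classes are assumptions.
CODE_PATTERNS = {
    "EPSG": re.compile(r"[0-9]+"),
    "SR-ORG": re.compile(r"[0-9]+"),
    "ESRI": re.compile(r"[0-9A-Za-z]+"),
}


def parse_crs(value):
    """Split an 'Authority:Code' string and check it against the supported authorities."""
    authority, sep, code = value.partition(":")
    if not sep:
        raise ValueError(f"expected 'Authority:Code', got {value!r}")
    pattern = CODE_PATTERNS.get(authority)
    if pattern is None:
        raise ValueError(f"unsupported authority {authority!r}")
    if not pattern.fullmatch(code):
        raise ValueError(f"invalid {authority} code {code!r}")
    return authority, code


print(parse_crs("EPSG:4326"))  # ('EPSG', '4326')
```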
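A sketch of the base-61 `id` conversion. The spec names only the character set, so the digit order here ('1'-'9', then 'A'-'Z', then 'a'-'z', with '1' standing for zero) and the left-padding with '0' are assumptions, chosen because together they reproduce the '0005DXZ' example: 954562 = 4·61³ + 12·61² + 32·61 + 34, and digits 4, 12, 32, 34 map to '5', 'D', 'X', 'Z'. The function names are hypothetical.

```python
# Assumed digit order '1'-'9', 'A'-'Z', 'a'-'z' ('1' = 0); it reproduces the
# '0005DXZ' example, but the spec text only names the character set.
ALPHABET = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 61
ID_LENGTH = 7


def encode_sample_id(index):
    """Convert a Sample index to a 7-character base-61 id."""
    if not 0 <= index < BASE ** ID_LENGTH:
        raise ValueError(f"index {index} does not fit in {ID_LENGTH} base-61 digits")
    digits = []
    while True:
        index, remainder = divmod(index, BASE)
        digits.append(ALPHABET[remainder])
        if index == 0:
            break
    # Left-pad with '0' (not part of the alphabet), matching the example id.
    return "".join(reversed(digits)).zfill(ID_LENGTH)


def decode_sample_id(sample_id):
    """Inverse of encode_sample_id: strip the '0' padding, then read base 61."""
    value = 0
    for char in sample_id.lstrip("0"):
        value = value * BASE + ALPHABET.index(char)
    return value


print(encode_sample_id(954562))  # 0005DXZ
```

Because '0' never occurs as a digit, the padding can be stripped unambiguously when decoding.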
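The `start_datetime` and `end_datetime` fields can be checked with a strict parser for RFC 3339 section 5.6 timestamps restricted to UTC ('Z' or '+00:00'). This is a minimal sketch: it requires an uppercase 'T' and 'Z', truncates fractional seconds to microseconds, and rejects leap seconds because Python's `datetime` does. Defaulting `end_datetime` to `start_datetime` follows the table, while rejecting an end before the start is an added assumption.

```python
import re
from datetime import datetime, timezone

# RFC 3339 section 5.6 date-time, restricted to UTC offsets ("Z" or "+00:00").
UTC_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?(?:Z|\+00:00)"
)


def parse_utc_datetime(value):
    """Parse an RFC 3339 UTC timestamp into a timezone-aware datetime."""
    match = UTC_DATETIME.fullmatch(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 UTC date-time: {value!r}")
    *fields, fraction = match.groups()
    micros = int((fraction or "0").ljust(6, "0")[:6])  # truncate to microseconds
    # datetime() raises ValueError for invalid values such as month 13.
    return datetime(*map(int, fields), micros, tzinfo=timezone.utc)


def check_time_range(metadata):
    """Return (start, end); end_datetime defaults to start_datetime."""
    start = parse_utc_datetime(metadata["start_datetime"])
    end_text = metadata.get("end_datetime")
    end = start if end_text is None else parse_utc_datetime(end_text)
    if end < start:
        raise ValueError("end_datetime precedes start_datetime")
    return start, end


start, end = check_time_range({"start_datetime": "2021-06-01T10:30:00Z",
                               "end_datetime": "2021-06-30T23:59:59.5Z"})
print(end - start)  # 29 days, 13:29:59.500000
```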