Skip to article frontmatterSkip to article content

Author(s): Gisela Romero Candanedo, Julia Wagemann
Affiliation: thriveGEO
Contact: gisela@thrivegeo.com


Introduction

In this section, we will dive into the programmatic access of EOPF Zarr Collections available in the EOPF Sentinel Zarr Sample Service STAC Catalog. We will introduce Python libraries that enable us to effectively access and search through STAC.

Objectives:
  • 🔍 How to **programmatically browse** through available collections available via the EOPF Zarr STAC Catalog

  • 📊 Understanding **collection metadata** in user-friendly terms

  • 🎯 **Searching for specific data** with help of the `pystac` and `pystac-client` libraries

Disclaimer
This notebook demonstrates the use of open source software and is intended for educational and illustrative purposes only. All software used is subject to its respective licenses. The authors and contributors of this notebook make no guarantees about the accuracy, reliability, or suitability of the content or included code. Use at your own discretion and risk. No warranties are provided, either express or implied.


Setup

For this tutorial, we will make use of the pystac and pystac_client Python libraries that facilitate the programmatic access and efficient search of a STAC Catalog.


Import libraries

from pystac_client import Client, CollectionClient
from pystac import Collection, MediaType
import datetime
from dateutil.tz import tzutc
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np
import os

Helper functions

list_found_elements

As we are expecting to visualise several elements that will be stored in lists, we define a function that will allow us retrieve item id’s and collections id’s for further retrieval.

def list_found_elements(search_result):
    id = []
    coll = []
    for item in search_result.items(): #retrieves the result inside the catalogue.
        id.append(item.id)
        coll.append(item.collection_id)
    return id , coll
normalisation_str_gm()

Applies percentile-based contrast stretching and gamma correction to a band.

Recieves:

  • band_array : the extracted xarray for the selected band

  • p_min : percentile min value

  • p_max: percentile max value

  • gamma_val: gamma correction

def normalisation_str_gm(band_array, p_min, p_max, gamma_val):

    # Calculate min and max values based on percentiles for stretching
    min_val = np.percentile(band_array[band_array > 0], p_min) if np.any(band_array > 0) else 0
    max_val = np.percentile(band_array[band_array > 0], p_max) if np.any(band_array > 0) else 1

    # Avoid division by zero if min_val equals max_val
    if max_val == min_val:
        stretched_band = np.zeros_like(band_array, dtype=np.float32)
    else:
        # Linear stretch to 0-1 range
        stretched_band = (band_array - min_val) / (max_val - min_val)

    # Clip values to ensure they are within [0, 1] after stretching
    stretched_band[stretched_band < 0] = 0
    stretched_band[stretched_band > 1] = 1

    # Apply gamma correction
    gamma_corrected_band = np.power(stretched_band, 1.0 / gamma_val)

    # Returns the corrected array:
    return gamma_corrected_band

Establish a connection to the EOPF Zarr STAC Catalog

Our first step is to establish a connection to the EOPF Sentinel Zarr Sample Service STAC Catalog. For this, you need the Catalog’s base URL, which you can find on the web interface under the API & URL tab. By clicking on 🔗Source, you will get the address of the STAC metadata file - which is available here.

EOPF API url for connection

Copy paste the URL: https://stac.core.eopf.eodc.eu/.

With the Client.open() function, we can create the access to the starting point of the Catalog by providing the specific url. If the connection was successful, you will see the description of the STAC catalog and additional information.

eopf_stac_api_root_endpoint = 'https://stac.core.eopf.eodc.eu/' #root starting point
eopf_stac_catalog = Client.open(url=eopf_stac_api_root_endpoint) # calls the selected url
eopf_stac_catalog
Loading...

Congratulations! We successfully connected to the EOPF Zarr STAC Catalog, and we can now start exploring its content.

eopf_stac_catalog.description
'STAC catalog of the EOPF Sentinel Zarr Samples Service'

Explore available collections

Once a connection established, the next logical step is to get an overview of all the collections the STAC catalog offers.

Please note: Since the EOPF Zarr STAC Catalog is still in active development, we need to explore each collection

You see, that so far, we can browse through 11 available collections

sentinel_collections = sorted([x.id for x in list(eopf_stac_catalog.get_all_collections())])
print(sentinel_collections)
['sentinel-1-l1-grd', 'sentinel-1-l1-slc', 'sentinel-1-l2-ocn', 'sentinel-2-l1c', 'sentinel-2-l2a', 'sentinel-3-olci-l1-efr', 'sentinel-3-olci-l1-err', 'sentinel-3-olci-l2-lfr', 'sentinel-3-olci-l2-lrr', 'sentinel-3-slstr-l1-rbt', 'sentinel-3-slstr-l2-lst']

We can select any collection and retrieve certain metadata that allow us to get more information about it, such as keywords, the ID and useful links for resources.
For this, we use the .get_collection() argument, and add the corresponding collection id we got earlier.

Sentinel-1 Level-1 GRD

s1_l1_grd = eopf_stac_catalog.get_collection(sentinel_collections[0])
print('Keywords:        ', s1_l1_grd.keywords)
print('Catalog ID:      ', s1_l1_grd.id)
print('Available Links: ', s1_l1_grd.links)
print('Temporal extent: ', [[dt.strftime('%Y-%m-%d %H:%M:%S') for dt in interval] for interval in s1_l1_grd.extent.temporal.intervals])
Keywords:         ['Copernicus', 'Sentinel', 'EU', 'ESA', 'Satellite', 'SAR', 'C-Band', 'GRD']
Catalog ID:       sentinel-1-l1-grd
Available Links:  [<Link rel=items target=https://stac.core.eopf.eodc.eu/collections/sentinel-1-l1-grd/items>, <Link rel=parent target=https://stac.core.eopf.eodc.eu/>, <Link rel=root target=<Client id=eopf-sample-service-stac-api>>, <Link rel=self target=https://stac.core.eopf.eodc.eu/collections/sentinel-1-l1-grd>, <Link rel=license target=https://sentinel.esa.int/documents/247904/690755/Sentinel_Data_Legal_Notice>, <Link rel=http://www.opengis.net/def/rel/ogc/1.0/queryables target=https://stac.core.eopf.eodc.eu/collections/sentinel-1-l1-grd/queryables>]
Temporal extent:  [['2022-09-06 13:54:39', '2025-09-25 11:11:06']]

Sentinel-2 Level-2A

s2_l2a_c = eopf_stac_catalog.get_collection(sentinel_collections[4])
print('Keywords:        ', s2_l2a_c.keywords)
print('Catalog ID:      ', s2_l2a_c.id)
print('Available Links: ', s2_l2a_c.links)
print('Temporal extent: ', [[dt.strftime('%Y-%m-%d %H:%M:%S') for dt in interval] for interval in s2_l2a_c.extent.temporal.intervals])
Keywords:         ['Copernicus', 'Sentinel', 'EU', 'ESA', 'Satellite', 'Global', 'Imagery', 'Reflectance']
Catalog ID:       sentinel-2-l2a
Available Links:  [<Link rel=items target=https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items>, <Link rel=parent target=https://stac.core.eopf.eodc.eu/>, <Link rel=root target=<Client id=eopf-sample-service-stac-api>>, <Link rel=self target=https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a>, <Link rel=license target=https://sentinel.esa.int/documents/247904/690755/Sentinel_Data_Legal_Notice>, <Link rel=cite-as target=https://doi.org/10.5270/S2_-znk9xsj>, <Link rel=http://www.opengis.net/def/rel/ogc/1.0/queryables target=https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/queryables>]
Temporal extent:  [['2015-10-22 10:10:52', '2025-09-24 13:42:31']]

Sentinel-3 SLSTR Level-2 LST

s3_slstr_lst = eopf_stac_catalog.get_collection(sentinel_collections[10])
print('Keywords:        ', s3_slstr_lst.keywords)
print('Catalog ID:      ', s3_slstr_lst.id)
print('Available Links: ', s3_slstr_lst.links)
print('Temporal extent: ', [[dt.strftime('%Y-%m-%d %H:%M:%S') for dt in interval] for interval in s3_slstr_lst.extent.temporal.intervals])
Keywords:         ['Copernicus', 'Sentinel', 'EU', 'ESA', 'Satellite', 'Global', 'Earth']
Catalog ID:       sentinel-3-slstr-l2-lst
Available Links:  [<Link rel=items target=https://stac.core.eopf.eodc.eu/collections/sentinel-3-slstr-l2-lst/items>, <Link rel=parent target=https://stac.core.eopf.eodc.eu/>, <Link rel=root target=<Client id=eopf-sample-service-stac-api>>, <Link rel=self target=https://stac.core.eopf.eodc.eu/collections/sentinel-3-slstr-l2-lst>, <Link rel=license target=https://sentinel.esa.int/documents/247904/690755/Sentinel_Data_Legal_Notice>, <Link rel=http://www.opengis.net/def/rel/ogc/1.0/queryables target=https://stac.core.eopf.eodc.eu/collections/sentinel-3-slstr-l2-lst/queryables>]
Temporal extent:  [['2025-04-08 09:40:04', '2025-09-25 08:17:22']]

Searching inside the EOPF STAC API

With the .search() function of the pystac-client library, we can search inside a STAC catalog we established a connection with. We can filter based on a series of parameters to tailor the search for available data for a specific time period and geographic bounding box.

For this, we specify the datetime argument for a time period we are interested in, e.g. from 1st March 2025 to 31st May 2025, over Riga.

riga_bbox = [24.0018,56.9117,24.2131,56.9999]
start_d = '2025-03-01'
end_d   = '2025-05-31'

Filter for temporal extent

Let us search on the datetime parameter. In addition, we also specify the collection parameter indicating that we only want to search for the Sentinel-1 Level-1 GRD collection.

Sentinel-1 Level-1 GRD

We apply the helper function list_found_elements which constructs a list from the search result. If we check the length of the final list, we can see that for the specified time period:

inter = start_d + 'T00:00:00Z/' + end_d + 'T23:59:59.999999Z' # interest period
print(inter)
2025-03-01T00:00:00Z/2025-05-31T23:59:59.999999Z
s1_weekend_search = eopf_stac_catalog.search(
    collections= sentinel_collections[0], # interest Collection
    datetime = inter # interest period
)

found_w_s = list_found_elements(s1_weekend_search)
print("Search Results:")
print('Total Items Found for the Sentinel-1 L1 GRD Collection: ',len(found_w_s[0]))
/opt/anaconda3/envs/eopf_env_wf/lib/python3.11/site-packages/pystac/item.py:481: DeprecatedWarning: The item 'S1A_IW_GRDH_1SDV_20250531T235957_20250601T000022_059445_076109_F4C7' is deprecated.
  warnings.warn(
Search Results:
Total Items Found for the Sentinel-1 L1 GRD Collection:  21435

Filter for spatial extent

Now, let us filter based on a specific area of interest. We can use the bbox argument, which is composed by providing the top-left and bottom-right corner coordinates. It is similar to drawing the extent in the interactive map of the EOPF browser interface.

Sentinel-1 Level-1 GRD

For example, we defined a bounding box over Riga. We then again apply the helper function list_found_elements and see:

s1_riga_weekend_search = eopf_stac_catalog.search(
    bbox= riga_bbox,
    collections= sentinel_collections[0], # interest Collection
    datetime = inter # interest period
)

found_r_w_s = list_found_elements(s1_riga_weekend_search)
print("Search Results:")
print('Total Items Found for the Sentinel-1 L1 GRD Collection over Riga: ',len(found_r_w_s[0]))
Search Results:
Total Items Found for the Sentinel-1 L1 GRD Collection over Riga:  24
Sentinel-2 Level-2A
s2_riga_cloud_search = eopf_stac_catalog.search(
    bbox= riga_bbox,
    collections= sentinel_collections[4], # interest Collection
    datetime = inter, # interest period
    query = {'eo:cloud_cover': {'lte': 30}}
)

found_r_w_s_s2 = list_found_elements(s2_riga_cloud_search)
print("Search Results:")
print('Total Items Found for the Sentinel-2 L2A Collection over Riga: ',len(found_r_w_s_s2[0]))
Search Results:
Total Items Found for the Sentinel-2 L2A Collection over Riga:  12
Sentinel-3 SLSTR L2 LST
s3_riga= eopf_stac_catalog.search(
    bbox= riga_bbox,
    collections= sentinel_collections[10], # interest Collection
    datetime = inter, # interest period
    query = {"product:timeliness_category": {'eq':'NR'}}
)

found_r_s3 = list_found_elements(s3_riga)
print("Search Results:")
print('Total Items Found for the Sentinel-3 SLSTR L2 LST Collection over Riga: ',len(found_r_s3 [0]))
Search Results:
Total Items Found for the Sentinel-3 SLSTR L2 LST Collection over Riga:  7

Retrieve Asset URLs for accessing the data

So far, we have made a search among the STAC catalog and browsed over the general metadata of the collections. To access the actual EOPF Zarr Items, we need to get their storage location in the cloud.

The relevant information we can find inside the .items argument by the .get_assets() function. Inside, it allows us to specify the .MediaType we are interested in. In our example, we want to obtain the location of the .zarr file.

Let us retrieve the url form the available found items form the Sentinel-2 L2A Collection for s2_riga_cloud_search. The resulting URL we can then use to directly access an asset in our workflow.

#Lets access the first S2 L2A image we found over Riga with less than 30% clouds:
f_i_rigA_s2 = found_r_w_s_s2[0][3]
print(f_i_rigA_s2)
S2A_MSIL2A_20250525T094051_N0511_R036_T34VFJ_20250525T151112
c_sentinel2 = eopf_stac_catalog.get_collection(sentinel_collections[4])
#Choosing the first item available to be opened:
item= c_sentinel2.get_item(id=f_i_rigA_s2)
item_assets = item.get_assets(media_type=MediaType.ZARR)

cloud_storage = item_assets['product'].href

print('Item cloud storage URL for retrieval:', cloud_storage)
Item cloud storage URL for retrieval: https://objectstore.eodc.eu:2222/e05ab01a9d56408d82ac32d69a5aae2a:202505-s02msil2a/25/products/cpm_v256/S2A_MSIL2A_20250525T094051_N0511_R036_T34VFJ_20250525T151112.zarr

Examining Dataset Structure

In the following step, we open the cloud-optimised Zarr dataset using xarray.open_datatree supported by the zarr engine.

The subsequent loop then prints out all the available groups within the opened DataTree, providing a comprehensive overview of the hierarchical structure of the EOPF Zarr products.

dt = xr.open_datatree(
    cloud_storage,        # the cloud storage url from the Item we are interested in
    engine='zarr',        # xarray-eopf defined engine 
    chunks= {})           # retrives the default chunking structure

for dt_group in sorted(dt.groups):
    print("DataTree group {group_name}".format(group_name=dt_group)) # getting the available groups
/var/folders/hr/vlzgj7wj51l7ps0dxp4wp08r0000gn/T/ipykernel_9001/689046739.py:1: FutureWarning: In a future version, xarray will not decode timedelta values based on the presence of a timedelta-like units attribute by default. Instead it will rely on the presence of a timedelta64 dtype attribute, which is now xarray's default way of encoding timedelta64 values. To continue decoding timedeltas based on the presence of a timedelta-like units attribute, users will need to explicitly opt-in by passing True or CFTimedeltaCoder(decode_via_units=True) to decode_timedelta. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  dt = xr.open_datatree(
/var/folders/hr/vlzgj7wj51l7ps0dxp4wp08r0000gn/T/ipykernel_9001/689046739.py:1: FutureWarning: In a future version, xarray will not decode timedelta values based on the presence of a timedelta-like units attribute by default. Instead it will rely on the presence of a timedelta64 dtype attribute, which is now xarray's default way of encoding timedelta64 values. To continue decoding timedeltas based on the presence of a timedelta-like units attribute, users will need to explicitly opt-in by passing True or CFTimedeltaCoder(decode_via_units=True) to decode_timedelta. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  dt = xr.open_datatree(
DataTree group /
DataTree group /conditions
DataTree group /conditions/geometry
DataTree group /conditions/mask
DataTree group /conditions/mask/detector_footprint
DataTree group /conditions/mask/detector_footprint/r10m
DataTree group /conditions/mask/detector_footprint/r20m
DataTree group /conditions/mask/detector_footprint/r60m
DataTree group /conditions/mask/l1c_classification
DataTree group /conditions/mask/l1c_classification/r60m
DataTree group /conditions/mask/l2a_classification
DataTree group /conditions/mask/l2a_classification/r20m
DataTree group /conditions/mask/l2a_classification/r60m
DataTree group /conditions/meteorology
DataTree group /conditions/meteorology/cams
DataTree group /conditions/meteorology/ecmwf
DataTree group /measurements
DataTree group /measurements/reflectance
DataTree group /measurements/reflectance/r10m
DataTree group /measurements/reflectance/r20m
DataTree group /measurements/reflectance/r60m
DataTree group /quality
DataTree group /quality/atmosphere
DataTree group /quality/atmosphere/r10m
DataTree group /quality/atmosphere/r20m
DataTree group /quality/atmosphere/r60m
DataTree group /quality/l2a_quicklook
DataTree group /quality/l2a_quicklook/r10m
DataTree group /quality/l2a_quicklook/r20m
DataTree group /quality/l2a_quicklook/r60m
DataTree group /quality/mask
DataTree group /quality/mask/r10m
DataTree group /quality/mask/r20m
DataTree group /quality/mask/r60m
DataTree group /quality/probability
DataTree group /quality/probability/r20m

Visualising the RGB quicklook composite

EOPF Zarr Assets include a quick-look RGB composite, which we now want to open and visuliase. We open the Zarr dataset again, but this time, we specifically target the quality/l2a_quicklook/r20m/tci.

This group contains a true colour (RGB) quick-look composite, which is a readily viewable representation of the satellite image. By adding .plot.imshow() we can load the quick-look.

dt['quality/l2a_quicklook/r20m/tci'].plot.imshow()
plt.title('RGB Quicklook')
plt.xlabel('X-coordinate')
plt.ylabel('Y-coordinate')
plt.grid(False) # Turn off grid for image plots
plt.axis('tight') # Ensure axes fit the data tightly
(np.float64(600000.0), np.float64(709800.0), np.float64(6290220.0), np.float64(6400020.0))
<Figure size 640x480 with 1 Axes>

Creating our own RGB

For the composite creation, zarr_meas contains the assets we are interested in. The TCI composite makes use of the red (B04), green (B03), and blue (B02) bands to create a view that looks natural to the human eye.

# True colour channels we are interested to retrieve coposite:
tc_red  = 'b04'
tc_green= 'b03'
tc_blue = 'b02'
zarr_meas = dt.measurements.reflectance.r20m

# The tc_red, tc_green, and tc_blue variables are inputs specifying the band names
red = zarr_meas[tc_red]
gre = zarr_meas[tc_green]
blu = zarr_meas[tc_blue]

# Visualising the red band:
plt.imshow(gre)
plt.title('Red Reflectance (b04)')
<Figure size 640x480 with 1 Axes>

By this point, we have not accessed the data on disk. For doing so, we need to call the .values function from xarray

# The tc_red, tc_green, and tc_blue variables are inputs specifying the band names
red = zarr_meas[tc_red].values
gre = zarr_meas[tc_green].values
blu = zarr_meas[tc_blue].values

To create the composite image, we need to normalise each of the input assets. Normalisation ensures that the bands have a consistent and predictable range of values. This supports optimal data processing and removes the influence of external factors (like changing light conditions) allowing for a meaningful comparison among generated composites.

The normalisation_str_gm() function achieves this by scaling the reflectance values to a standard range (0-255) using the percentile-based method.

Once the values for our three bands have been normalised, they can be stacked in an RGB format to generate the initial True Colour Image (TCI).

# Input: percentile range for contrast stretching
contrast_stretch_percentile=(2, 98)
# Input: gamma correction value
gamma=1.8

# Apply normalisation to the red, green and blue bands using the specified percentile and gamma values
red_processed = normalisation_str_gm(red, *contrast_stretch_percentile, gamma)
green_processed = normalisation_str_gm(gre, *contrast_stretch_percentile, gamma)
blue_processed = normalisation_str_gm(blu, *contrast_stretch_percentile, gamma)

# We stack the processed red, green, and blue arrays
rgb_composite_sm = np.dstack((red_processed, green_processed, blue_processed)).astype(np.float32)

plt.imshow(rgb_composite_sm)
plt.title('RGB Composite')
<Figure size 640x480 with 1 Axes>

Chunking structure

We can have a look at the reflectance values at 20m resolution chunking strategies by adding .chunksizes to the r20m group.

zarr_chunks_r20 = dt.measurements.reflectance.r20m.chunksizes
print(zarr_chunks_r20)
Frozen({'/measurements/reflectance/r20m': Frozen({'y': (915, 915, 915, 915, 915, 915), 'x': (915, 915, 915, 915, 915, 915)})})

For exploring each of the x and y dimensions, we retrieve the list for each:

y_values_20 = zarr_chunks_r20['/measurements/reflectance/r20m']['y']
x_values_20 = zarr_chunks_r20['/measurements/reflectance/r20m']['x']

print("Chunking Strategies for r20m:")
print('x # of chunks: ',len(x_values_20))
print('x chunk size : ',x_values_20[0])
print('y # of chunks: ',len(y_values_20))
print('y chunk size : ',y_values_20[0])
Chunking Strategies for r20m:
x # of chunks:  6
x chunk size :  915
y # of chunks:  6
y chunk size :  915

We can also explore 10m resolution:

zarr_chunks_r10 = dt.measurements.reflectance.r10m.chunksizes
y_values_10 = zarr_chunks_r10['/measurements/reflectance/r10m']['y']
x_values_10 = zarr_chunks_r10['/measurements/reflectance/r10m']['x']

print("Chunking Strategies for r10m:")
print('x # of chunks: ',len(x_values_10))
print('x chunk size : ',x_values_10[0])
print('y # of chunks: ',len(y_values_10))
print('y chunk size : ',y_values_10[0])
Chunking Strategies for r10m:
x # of chunks:  6
x chunk size :  1830
y # of chunks:  6
y chunk size :  1830

And the same for 60m resolution:

zarr_chunks_r60 = dt.measurements.reflectance.r60m.chunksizes
y_values_60 = zarr_chunks_r60['/measurements/reflectance/r60m']['y']
x_values_60 = zarr_chunks_r60['/measurements/reflectance/r60m']['x']

print("Chunking Strategies for r60m:")
print('x # of chunks: ',len(x_values_60))
print('x chunk size : ',x_values_60[0])
print('y # of chunks: ',len(y_values_60))
print('y chunk size : ',y_values_60[0])
Chunking Strategies for r60m:
x # of chunks:  6
x chunk size :  305
y # of chunks:  6
y chunk size :  305

💪 Now it is your turn

  • Go to the Jupyter Hub

  • Clone the GitHub Repo

  • Select the 01_intro_stac_task.ipynb version of this notebook.

  • Explore this two collections:

Task 1

Sentinel-1 Level-1 GRD over Sicily

  • Starting and ending date:
    2025-05-01 to 2025-05-31

  • Bounding box search:
    12.217097,36.537123,15.710750,38.497568

Task 2

Sentinel-2 L-2A over Vienna

  • Starting and ending date:
    2025-04-01 to 2025-04-30

  • Bounding box search:
    16.126524,48.096899,16.660573,48.316752

Explore...

  • How many items are available?

  • Filter the cloud percentage... How many items are still available?

  • Recreate an RGB composite for Sentinel-2 L-2A over Vienna.


Conclusion

In this section we established a connection to the EOPF Sentinel Zarr Sample Service STAC Catalog and directly accessed an EOPF Zarr item with xarray. In the tutorial you are guided through the process of opening hierarchical EOPF Zarr products using xarray’s DataTree, a library designed for accessing complex hierarchical data structures.
We also brought data to disk and processed it to obtain an RGB composite, always highlighting the flexibility of data retrieval through STAC and the zarr format.


What’s Next?

In the following section we will explore the application of an RBG composite creation a True Colour Image from Sentinel-2 L2A data for the day of the fire.

To obtain a more detailed overview of a natural event, we will integrate an additional dataset into our workflow: Sentinel-3 data. This will enable us to analyse thermal information and pinpoint the active fire’s location.