Accessing Zarr Data

Table of Contents

Introduction
Explore Data in STAC Browser
Access the Data
- Using s5cmd
- Using Python
  - pystac-client
  - boto3

Introduction

This notebook demonstrates how to access Zarr-formatted data using:

- s5cmd: a fast command-line tool for working with S3 object storage.
- Python libraries such as pystac-client, xarray, and boto3.

Explore Data in STAC Browser

You can browse the available Sentinel datasets interactively using our STAC Browser. Use it to filter by collection, date, or location before programmatically accessing data.

Access the Data

Explore, interact with, and retrieve Sentinel-1, Sentinel-2 and Sentinel-3 data in EOPF Zarr format.

Direct download is not the recommended approach. We encourage users to interact with the data directly through cloud-based resources for better efficiency and scalability.

Using `s5cmd`

s5cmd is a very fast S3 and local filesystem execution tool. It supports many operations, along with tab completion and wildcards for file paths, which is handy when an object storage workflow involves a large number of files.

Install `s5cmd`

You can install s5cmd from the official GitHub repository.

List Files in a Bucket

s5cmd --endpoint-url https://objects.eodc.eu --no-sign-request ls \
"s3://e05ab01a9d56408d82ac32d69a5aae2a:202507-s03olclfr/*"

Download Files Locally

s5cmd --endpoint-url https://objects.eodc.eu --no-sign-request cp \
"s3://e05ab01a9d56408d82ac32d69a5aae2a:202507-s03olclfr/06/../products/cpm_v256/S3A_OL_2_LFR____20250706T083947_20250706T084247_20250707T125430_0179_128_007_2160_PS1_O_NT_003.zarr/*" \
./dummy.zarr/

Using Python

You can programmatically discover and load data with Python using the following libraries:

pystac-client

Search for available data and load it directly from object storage using pystac-client and xarray.

from pystac_client import Client

catalog = Client.open("https://stac.core.eopf.eodc.eu")
results = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[7.2, 44.5, 7.4, 44.7],
    datetime=["2025-04-30", "2025-05-01"],
)

# Keep only the first matching item (despite the plural variable name)
items = results.item_collection()[0]

The item selected above can be loaded directly with the xarray library.

import xarray as xr

ds = xr.open_datatree(
    items.assets["product"].href + "/measurements",
    engine="zarr",
    chunks={},
    decode_timedelta=True,
    consolidated=False,
)
ds
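
A search can match more than one item, while the snippet above loads only the first. The sketch below collects the ID and `product` asset URL of every match so you can choose which one to open. `product_hrefs` is a hypothetical helper, not part of pystac-client, and the demo objects are made-up stand-ins that only mimic the `id` and `assets` attributes of STAC items so the example runs without network access.

```python
from types import SimpleNamespace


def product_hrefs(items, asset_key="product"):
    """Return (item id, asset href) pairs for items that expose `asset_key`."""
    return [
        (item.id, item.assets[asset_key].href)
        for item in items
        if asset_key in item.assets
    ]


# Stand-in objects (made-up IDs and URLs) mimicking STAC items.
demo_items = [
    SimpleNamespace(
        id="item-a",
        assets={"product": SimpleNamespace(href="https://example.com/a.zarr")},
    ),
    SimpleNamespace(id="item-b", assets={}),  # no "product" asset: skipped
]

print(product_hrefs(demo_items))
# [('item-a', 'https://example.com/a.zarr')]
```

With the search above, `product_hrefs(results.item_collection())` should list all matches in the same way.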

boto3

If you prefer working directly with the S3 API, use boto3 to list and access objects.

EOPF products are organised on the object storage by month and product, which is reflected in the following bucket naming convention.

e05ab01a9d56408d82ac32d69a5aae2a:<year><month>-<product-id>

In the code snippet provided, this results in the bucket name:

e05ab01a9d56408d82ac32d69a5aae2a:202506-s02msil1c

Furthermore, data is organised by day and by the software version used for conversion, as shown by the `prefix` variable in the code.
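
The naming convention above can be sketched as two small helpers. This is a minimal illustration, not an official API: `bucket_name` and `day_prefix` are hypothetical names, and the `<day>/products/<version>/` layout with zero-padded month and day is inferred from the example values in this notebook.

```python
# Tenant ID as used in the examples of this notebook.
TENANT = "e05ab01a9d56408d82ac32d69a5aae2a"


def bucket_name(year, month, product_id, tenant=TENANT):
    """Build '<tenant>:<year><month>-<product-id>' with a zero-padded month."""
    return f"{tenant}:{year:04d}{month:02d}-{product_id}"


def day_prefix(day, version="cpm_v256"):
    """Build '<day>/products/<version>/' with a zero-padded day (assumed layout)."""
    return f"{day:02d}/products/{version}/"


print(bucket_name(2025, 6, "s02msil1c"))
# e05ab01a9d56408d82ac32d69a5aae2a:202506-s02msil1c
print(day_prefix(20))
# 20/products/cpm_v256/
```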

import boto3
from botocore.config import Config
from botocore import UNSIGNED
import botocore.handlers
import re

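# Allow ":" in bucket names (the "<tenant>:<bucket>" form used here),
# which botocore's default bucket-name validation rejects.
botocore.handlers.VALID_BUCKET = re.compile(r"^[:a-zA-Z0-9.\-_]{1,255}$")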

# Setup
endpoint_url = "https://objects.eodc.eu"
tenant = "e05ab01a9d56408d82ac32d69a5aae2a"

s3_client = boto3.client(
    "s3", endpoint_url=endpoint_url, config=Config(signature_version=UNSIGNED)
)

# Define bucket and prefix
bucket = f"{tenant}:202506-s02msil1c"
prefix = "20/products/cpm_v256/"

# List objects under the prefix
response = s3_client.list_objects_v2(
    Bucket=bucket,
    Prefix=prefix,
)

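# Print the root URL of each Zarr store once (many object keys share a store)
seen = set()
for obj in response.get("Contents", []):
    key = obj["Key"]
    match = re.search(r"\.zarr/", key)
    if match:
        store_root = key[: match.end() - 1]
        if store_root not in seen:
            seen.add(store_root)
            url = f"{endpoint_url}/{bucket}/{store_root}"
            print(url)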

You can then load the `.zarr` dataset using the same approach as shown earlier with `xarray.open_datatree()`.
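
A Zarr store consists of many small objects, and a single `list_objects_v2` call returns at most 1,000 keys, so listing every key under a prefix may not reach all stores. A possible alternative, sketched below, asks S3 to group keys by "folder" using the standard `Delimiter` parameter and follows pagination with a paginator. It assumes the stores sit directly below the prefix, as in the examples above, and that the endpoint supports `Delimiter`. `zarr_store_urls` and `list_zarr_stores` are hypothetical helper names.

```python
def zarr_store_urls(pages, endpoint_url, bucket):
    """Collect one URL per Zarr store from list_objects_v2 response pages."""
    urls = []
    for page in pages:
        # With Delimiter="/", "sub-folders" come back as CommonPrefixes.
        for entry in page.get("CommonPrefixes", []):
            store_prefix = entry["Prefix"]
            if store_prefix.endswith(".zarr/"):
                urls.append(f"{endpoint_url}/{bucket}/{store_prefix.rstrip('/')}")
    return urls


def list_zarr_stores(s3_client, endpoint_url, bucket, prefix):
    """List Zarr stores directly below `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/")
    return zarr_store_urls(pages, endpoint_url, bucket)
```

With the client and variables from the boto3 snippet above, `list_zarr_stores(s3_client, endpoint_url, bucket, prefix)` should return the store URLs without enumerating every chunk.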