
Accessing Zarr Data

Explore how to access data in Zarr format.

eodc

Introduction

This notebook demonstrates how to access Zarr-formatted data using:

  • s5cmd: A fast command-line tool for working with S3 object storage.
  • Python libraries like pystac-client, xarray, and boto3.

Explore Data in STAC Browser

You can browse the available Sentinel datasets interactively using our STAC Browser. Use it to filter by collection, date, or location before programmatically accessing data.

Access the Data

Explore, interact with and retrieve Sentinel 1, 2 and 3 data in EOPF Zarr format.

Direct download is not the recommended approach. We encourage users to interact with the data directly through cloud-based resources for better efficiency and scalability.

Using s5cmd

s5cmd is a very fast S3 and local filesystem execution tool. It supports a wide range of operations, along with tab completion and wildcards for file paths, which is handy for object storage workflows that involve a large number of files.

Install s5cmd

You can install s5cmd from the official GitHub repository.

List Files in a bucket

s5cmd --endpoint-url https://objects.eodc.eu --no-sign-request ls \
"s3://e05ab01a9d56408d82ac32d69a5aae2a:202507-s03olclfr/*"

Download Files Locally

s5cmd --endpoint-url https://objects.eodc.eu --no-sign-request cp \
"s3://e05ab01a9d56408d82ac32d69a5aae2a:202507-s03olclfr/06/../products/cpm_v256/S3A_OL_2_LFR____20250706T083947_20250706T084247_20250707T125430_0179_128_007_2160_PS1_O_NT_003.zarr/*" \
./dummy.zarr/
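
If you prefer to drive s5cmd from a Python script, the commands above can be assembled programmatically. The following is a minimal sketch that only builds and prints the argument list. The helper name `s5cmd_args` is hypothetical (it is not part of s5cmd), and actually running the command assumes s5cmd is installed and on your PATH.

```python
import shlex

ENDPOINT_URL = "https://objects.eodc.eu"


def s5cmd_args(operation, source, destination=None):
    """Build an s5cmd argument list for anonymous access (hypothetical helper)."""
    args = [
        "s5cmd",
        "--endpoint-url", ENDPOINT_URL,
        "--no-sign-request",
        operation,
        source,
    ]
    if destination is not None:
        args.append(destination)
    return args


# Same listing as the "ls" example above
listing = s5cmd_args(
    "ls", "s3://e05ab01a9d56408d82ac32d69a5aae2a:202507-s03olclfr/*"
)
print(shlex.join(listing))

# To execute it (assumes s5cmd is on PATH):
# import subprocess
# subprocess.run(listing, check=True)
```

Passing the arguments as a list to subprocess avoids shell expansion of the `*` wildcard, so the pattern reaches s5cmd unchanged.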

Using Python

You can programmatically discover and load data with Python using the following libraries:

pystac-client

Search for available data and load it directly from object storage using pystac-client and xarray.

from pystac_client import Client

catalog = Client.open("https://stac.core.eopf.eodc.eu")
results = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[7.2, 44.5, 7.4, 44.7],
    datetime=["2025-04-30", "2025-05-01"],
)

# Take the first matching item (despite its name, `items` holds a single STAC item)
items = results.item_collection()[0]

The item selected from the search results can be loaded directly with the xarray library.

import xarray as xr

ds = xr.open_datatree(
    items.assets["product"].href + "/measurements",
    engine="zarr",
    chunks={},
    decode_timedelta=True,
    consolidated=False,
)
ds

boto3

If you prefer working directly with the S3 API, use boto3 to list and access objects.

EOPF products are organised on the object storage by month and product, as reflected in the following bucket naming convention:

e05ab01a9d56408d82ac32d69a5aae2a:<year><month>-<product-id>

In the code snippet below, this results in the bucket name:

e05ab01a9d56408d82ac32d69a5aae2a:202506-s02msil1c
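
The naming convention can be captured in a small helper. This is only a sketch: the function name `bucket_name` is hypothetical, and the four-digit year and zero-padded two-digit month are assumptions inferred from the example bucket names in this notebook.

```python
TENANT = "e05ab01a9d56408d82ac32d69a5aae2a"


def bucket_name(year, month, product_id, tenant=TENANT):
    """Return '<tenant>:<year><month>-<product-id>' (hypothetical helper).

    Assumes a four-digit year and a zero-padded two-digit month.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"{tenant}:{year:04d}{month:02d}-{product_id}"


print(bucket_name(2025, 6, "s02msil1c"))
# e05ab01a9d56408d82ac32d69a5aae2a:202506-s02msil1c
```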

Furthermore, data is organised by day and by the software version used for conversion, as shown by the “prefix” variable in the code.

import boto3
from botocore.config import Config
from botocore import UNSIGNED
import botocore.handlers
import re

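# Allow ":" in bucket names (the "<tenant>:<bucket>" form used here),
# which botocore's default bucket-name validation rejects
botocore.handlers.VALID_BUCKET = re.compile(r"^[:a-zA-Z0-9.\-_]{1,255}$")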

# Setup
endpoint_url = "https://objects.eodc.eu"
tenant = "e05ab01a9d56408d82ac32d69a5aae2a"

s3_client = boto3.client(
    "s3", endpoint_url=endpoint_url, config=Config(signature_version=UNSIGNED)
)

# Define bucket and prefix
bucket = f"{tenant}:202506-s02msil1c"
prefix = "20/products/cpm_v256/"

# List objects under the prefix
response = s3_client.list_objects_v2(
    Bucket=bucket,
    Prefix=prefix,
)

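# Print the URL of each Zarr store once (many object keys share one store root)
seen = set()
for obj in response.get("Contents", []):
    key = obj["Key"]
    match = re.search(r"\.zarr/", key)
    if match:
        # Keep everything up to and including ".zarr", without the trailing slash
        store_root = key[: match.end() - 1]
        if store_root not in seen:
            seen.add(store_root)
            url = f"{endpoint_url}/{bucket}/{store_root}"
            print(url)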

You can then load the .zarr dataset using the same approach as shown earlier with xarray.open_datatree().
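
A single list_objects_v2 call returns only one page of results (typically up to 1,000 keys), and a Zarr store consists of many objects that share the same store root. The sketch below shows one way to follow pagination with a boto3 paginator and reduce the keys to unique store roots. The helper names `iter_keys` and `zarr_store_roots` are hypothetical, and the sample keys are made up for illustration.

```python
import re

# Matches everything up to and including the first ".zarr" path segment
ZARR_ROOT = re.compile(r"^(.*?\.zarr)/")


def iter_keys(s3_client, bucket, prefix):
    """Yield every object key under a prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def zarr_store_roots(keys):
    """Return the unique '.zarr' store roots, in first-seen order."""
    roots = []
    seen = set()
    for key in keys:
        match = ZARR_ROOT.match(key)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            roots.append(match.group(1))
    return roots


# Made-up keys for illustration only
sample_keys = [
    "20/products/cpm_v256/example_a.zarr/x",
    "20/products/cpm_v256/example_a.zarr/group/y",
    "20/products/cpm_v256/example_b.zarr/x",
    "20/products/cpm_v256/readme.txt",
]
print(zarr_store_roots(sample_keys))
```

With the client, bucket, and prefix from the boto3 example, `zarr_store_roots(iter_keys(s3_client, bucket, prefix))` would return the store roots, which can be joined with the endpoint URL and bucket name as before.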