
EOPF Zarr Data Access with openEO and STAC

Learn how to access and process EOPF Zarr data through STAC and openEO

Affiliations
German Aerospace Center
Eurac Research


Introduction

This is a practical introduction showcasing the interoperability of EOPF Zarr with openEO: we access and process EOPF Zarr products via STAC and the openEO API. openEO is a unified interface that makes Earth observation workflows more standardised, portable, and reproducible.

Setup

Start by importing the necessary libraries. If they are not yet available in your environment, install them first:

pip install openeo-processes-dask[implementations]
pip install openeo-pg-parser-networkx

from openeo.local import LocalConnection
# import openeo  # Un-comment once the pull requests are merged.
import rioxarray as rxr
import dask
from xcube_eopf.utils import reproject_bbox

Importing the local connection may print a warning that the machine-learning processes were not loaded; they are not needed for this tutorial:

Did not load machine learning processes due to missing dependencies: Install them like this: `pip install openeo-processes-dask[implementations, ml]`

Connect to openEO

Connect to an openEO backend.

connection = LocalConnection("./")
# Un-comment once the pull requests are merged and deployed on the EODC backend!
# connection_url = "https://openeo.eodc.eu/"  # currently returns a 504 / "failed to parse" error on the first .download()
# connection = openeo.connect(connection_url).authenticate_oidc()

More information on openEO and tutorials on how to use it can be found in the openEO documentation.

Load a virtual data cube

First, we’ll define the spatial and temporal extents we are interested in:

bbox = [
    9.669372670305636,
    53.64026948239441,
    9.701345402674315,
    53.66341039786631,
]  # ~250x250 px Northern Germany, Moorrege.
time_range = ["2025-05-01", "2025-06-01"]

Define the URL of the STAC collection you want to query, as well as the band names. Band names are often specific to the STAC catalog used.

stac_url = "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a"
bands = ["b02", "b08"]  # band names are specific to the STAC catalog used
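
If you want to preview which items this STAC search would match before building the cube, you can query the STAC API directly. A minimal sketch using pystac-client (assumed to be installed; note that the search goes against the API root, not the collection URL):

from pystac_client import Client

# Quick check against the EOPF STAC API root (assumption: pystac-client is available).
catalog = Client.open("https://stac.core.eopf.eodc.eu")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=bbox,
    datetime="2025-05-01/2025-06-01",
)
print(f"Matching items: {len(search.item_collection())}")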

Reproject the bbox to the native CRS of the STAC items, so that xcube-eopf (which is used under the hood by load_stac()) does not trigger a reprojection internally. See the xcube-eopf documentation for the detailed behavior.

crs_reprj = "EPSG:32632"
bbox_reprj = reproject_bbox(bbox, "EPSG:4326", crs_reprj)

spatial_extent_reprj = {
    "west": bbox_reprj[0],
    "south": bbox_reprj[1],
    "east": bbox_reprj[2],
    "north": bbox_reprj[3],
    "crs": crs_reprj,
}
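
For reference, a roughly equivalent reprojection can be done with pyproj alone. This is only a sketch: it transforms the two corner points, while reproject_bbox may handle the bbox more carefully.

from pyproj import Transformer

def reproject_bbox_simple(bbox, source_crs, target_crs):
    """Reproject a [west, south, east, north] bbox by transforming its two corners."""
    transformer = Transformer.from_crs(source_crs, target_crs, always_xy=True)
    west, south = transformer.transform(bbox[0], bbox[1])
    east, north = transformer.transform(bbox[2], bbox[3])
    return [west, south, east, north]

# Should be close to the result of xcube_eopf's reproject_bbox for this small bbox.
print(reproject_bbox_simple(bbox, "EPSG:4326", crs_reprj))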

Create a virtual data cube via the function load_stac() according to the chosen extents. This also performs the search on the STAC API. No data is loaded into the working environment at this step.

cube = connection.load_stac(
    url=stac_url,
    spatial_extent=spatial_extent_reprj,
    temporal_extent=time_range,
    bands=bands,
)

Inspect the load_stac() process visually.

cube

Create a processing workflow

Apply the processing: we calculate the NDVI and then take the mean over the month. (Note that bands[0] here is b02, the blue band; a textbook NDVI would use the red band, b04.)

red = cube.band(bands[0])
nir = cube.band(bands[1])
cube_ndvi = (nir - red) / (nir + red)
cube_mnth = cube_ndvi.reduce_dimension(dimension="time", reducer="mean")

Inspect the workflow visually.

cube_mnth
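
Besides the visual inspection, the underlying openEO process graph can be serialised to JSON; this serialisation is what makes the workflow portable between backends. A sketch using the openeo Python client's to_json() method, which is assumed to behave the same for locally built cubes:

# Print the openEO process graph of the workflow as JSON.
print(cube_mnth.to_json())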

Finalize the workflow via .execute() for the locally executed openEO version, or via .download() when connected to an openEO backend. The data is still not loaded into the working environment at this point!

Note: Currently we are using the local processing version of openEO. It does not send any requests to an actual backend; it converts the openEO process graph into executable Python code and interacts only with the STAC API and the data buckets.

Note: .execute() currently also redoes the STAC API search.

res = cube_mnth.execute()
dask.is_dask_collection(res)  # check whether the data is still lazy.
True

But we can inspect the dimensionality of the virtual cube.

res

Process and load the data into the working environment.

res = res.compute()
dask.is_dask_collection(
    res
)  # now the data is loaded, it's not a dask collection anymore.
False
res  # the actual values are accessible in the working environment.

Plot the map.

res.plot()
<Figure size 640x480 with 2 Axes>
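
If you want to keep the locally computed result, it can be written to disk, for example as a GeoTIFF via rioxarray. A sketch, assuming the result has x/y spatial dimensions and using the CRS we reprojected to earlier (the filename is illustrative):

res_out = res.rio.write_crs(crs_reprj)  # attach the CRS if it is not already set
res_out.rio.to_raster("res_eopf_ndvi_mean.tif")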

Compare to other sources

We’ll reuse the process graph we have created and only change the URL to the STAC API to use other sources of input. This is a good example of portability and reproducibility.

def load_ndvi_mean(stac_url, bbox, time_range, bands, time_dim):
    cube = connection.load_stac(
        url=stac_url,
        spatial_extent={
            "west": bbox[0],
            "south": bbox[1],
            "east": bbox[2],
            "north": bbox[3],
        },
        temporal_extent=time_range,
        bands=bands,
    )

    red = cube.band(bands[0])
    nir = cube.band(bands[1])
    cube_ndvi = (nir - red) / (nir + red)
    cube_mnth = cube_ndvi.reduce_dimension(dimension=time_dim, reducer="mean")
    return cube_mnth
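
As a quick usage example, the helper can also be pointed back at the EOPF STAC API. Note that, unlike above, the bbox is passed in EPSG:4326 here, so the reprojection is handled internally (a sketch):

# Re-run against the EOPF collection using the generic helper.
cube_mnth_eopf = load_ndvi_mean(
    "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a",
    bbox,
    time_range,
    ["b02", "b08"],
    time_dim="time",
)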

MS Planetary Computer

MS Planetary Computer is a stable source of satellite data made available via a STAC API. In this collection, the Sentinel-2 L2A data has been converted to Cloud Optimized GeoTIFFs (COGs).

stac_url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a"  # cog
bands = ["B02", "B08"]
cube_mnth = load_ndvi_mean(stac_url, bbox, time_range, bands, time_dim="time")

Finalize the reused workflow via .execute(). The data is not loaded into the working environment yet.

res_mspc = cube_mnth.execute()
dask.is_dask_collection(res_mspc)
True

But we can already inspect the dimensionality of the virtual cube. The spatial extents are slightly different from what we received from the EOPF STAC API, probably because here the bbox is passed in EPSG:4326 and reprojected internally. The data is not loaded into the working environment yet.

res_mspc

Now we process and load the data into the working environment.

res_mspc = res_mspc.compute()
res_mspc
res_mspc.plot()
<Figure size 640x480 with 2 Axes>

Let’s check the differences between the two results. Note: the differences occur mainly due to the older Sentinel-2 processing version (2.11) used on MS Planetary Computer.

diff = res - res_mspc
diff.to_series().describe()
count    55426.000000
mean         0.055157
std          0.025907
min         -0.036408
25%          0.037149
50%          0.054587
75%          0.071017
max          0.172499
dtype: float64
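
To see where the differences are largest spatially, the difference cube can be plotted in the same way as the results (a sketch):

# Map of the per-pixel difference between the EOPF and Planetary Computer results.
diff.plot()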

CDSE - via openEO

Connect to openEO on CDSE. This also authorizes us to request data from the CDSE STAC API.

import openeo
connection_url = "openeo.dataspace.copernicus.eu"
connection = openeo.connect(connection_url).authenticate_oidc()
stac_url = (
    "https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a/"  # SAFE format
)
bands = ["B02", "B08"]
cube_mnth = load_ndvi_mean(stac_url, bbox, time_range, bands, time_dim="t")

Since we’re connected to the CDSE openEO backend, we need to download the result (only JSON can be received via .execute()). This triggers the processing on the openEO backend as a synchronous call. To register a batch job on the backend and execute it asynchronously, use .execute_batch().

res_cdse_backend = cube_mnth.download("res_cdse_backend.tif")
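
For larger extents, the same workflow can instead be run as a batch job on the backend. A minimal sketch using the openeo client's execute_batch() (the output filename is illustrative):

# Starts a batch job, polls until it finishes, then downloads the result.
job = cube_mnth.execute_batch("res_cdse_batch.tif")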

And load it into the working environment.

res_cdse_backend = rxr.open_rasterio("res_cdse_backend.tif")

Here we get the same extents as from EOPF. When loading via rioxarray, the NDVI values are encoded into an additional band dimension.

res_cdse_backend
res_cdse_backend.plot()
<Figure size 640x480 with 2 Axes>
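
If the extra band dimension gets in the way of the pixel-wise comparison below, it can be dropped, since the file contains only a single band (a sketch):

# Drop the singleton band dimension so the array aligns with the (y, x) results above.
res_cdse_backend = res_cdse_backend.squeeze("band", drop=True)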

Let’s check the differences. They are very marginal, likely due to the different software used in the two openEO implementations.

diff2 = res - res_cdse_backend
diff2.to_series().describe()
count    55426.000000
mean         0.078158
std          0.023538
min         -0.006268
25%          0.062287
50%          0.077005
75%          0.092643
max          0.174986
dtype: float64

Takeaway

This tutorial has shown how to:

  • Access EOPF Zarr Data via STAC and openEO
  • Apply reusable processing workflows via openEO
  • Compare the EOPF Zarr Data to other available Sentinel-2 sources