Build Sentinel-2 Data Cubes¶
xcube-eopf is a Python package that extends xcube with a new data store called "eopf-zarr". This plugin enables the creation of analysis-ready data cubes (ARDC) from multiple Sentinel products published by the EOPF Sentinel Zarr Sample Service.
In this notebook, we demonstrate how to use the xcube EOPF data store to access multiple Sentinel-2 EOPF Zarr products and generate 3D analysis-ready data cubes (ARDC).
For a general introduction to the xcube EOPF Data Store, see the introduction notebook.
- 🐙 GitHub: EOPF Sample Service – xcube-eopf
- ❗ Issue Tracker: Submit or view issues
- 📘 Documentation: xarray-eopf Docs
Main Features of the xcube-eopf Data Store for Sentinel-2¶
Sentinel-2 provides multi-spectral imagery at different native spatial resolutions:
- 10 m: B02, B03, B04, B08
- 20 m: B05, B06, B07, B8A, B11, B12
- 60 m: B01, B09, B10
Sentinel-2 products are organized as STAC Items, each representing a single tile stored in its native UTM coordinate reference system (CRS).

Data Cube Generation Workflow¶
- STAC Query: Relevant STAC Items (tiles) are retrieved via the STAC API based on the specified spatial and temporal extent (
bboxandtime_range). - Sorting: Items are ordered by acquisition time and tile ID.
- Native Alignment: Within each UTM zone, tiles from the same acquisition day are aligned in their native CRS without reprojection. Overlapping pixels are resolved by selecting the first non-NaN value according to item order.
- Cube Assembly: The cube generation strategy depends on the request, as summarized below:
| Scenario | Native Resolution Preservation | Reprojected or Resampled Cube |
|---|---|---|
| Condition | Bounding box lies within a single UTM zone, the native CRS is requested, and the spatial resolution matches the native resolution. | Data spans multiple UTM zones, a different CRS is requested (e.g., EPSG:4326), or a custom spatial resolution is specified. |
| Processing | Only upsampling or downsampling is applied to align spectral bands with different resolutions. The cube is cropped to the requested bounding box, preserving original pixel values. The spatial extent may slightly adjust to the native grid. | A target grid is defined based on the bounding box, spatial resolution, and CRS. Data from each UTM zone is reprojected/resampled to this grid. Overlaps are resolved using the first non-NaN values. |
Users can specify any spatial resolution and coordinate reference system (CRS) when calling open_data. Depending on the request, spectral bands may be resampled (upsampled or downsampled) and/or reprojected to match the target grid using xcube-resampling.
📚 More info: xcube-eopf Sentinel-2 Documentation
Import Modules¶
The xcube-eopf data store is provided as a plugin for xcube. Once installed, it registers automatically, allowing you to import xcube just like any other xcube data store:
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
from xcube.core.store import new_data_store
from xcube_resampling.utils import reproject_bbox
Access Sentinel-2 Level-2A ARDC¶
In this section, we demonstrate the available features and options for opening and generating spatio-temporal data cubes from multiple Sentinel-2 tiles.
To initialize an eopf-zarr data store, run the cell below:
store = new_data_store("eopf-zarr")
The data IDs point to STAC collections. In the following cell we can list the available data IDs.
store.list_data_ids()
['sentinel-2-l1c', 'sentinel-2-l2a', 'sentinel-3-olci-l1-efr', 'sentinel-3-olci-l2-lfr', 'sentinel-3-slstr-l1-rbt', 'sentinel-3-slstr-l2-lst']
Below, you can explore the parameters of the open_data() method for each supported data product. The following cell generates a JSON schema listing all available opening parameters for Sentinel-2 Level-2A products.
store.get_open_data_params_schema(data_id="sentinel-2-l2a")
<xcube.util.jsonschema.JsonObjectSchema at 0x7fb5f69b0f30>
Spatio-Temporal Selection and Reprojection¶
We now generate a data cube from the Sentinel-2 L2A product by setting data_id to "sentinel-2-l2a".
The bounding box is defined to cover Mount Etna in Sicily, and the time range is set to the first half of May 2025.
We begin by creating the data cube in its native UTM projection to avoid any reprojection or sub-pixel resampling.
bbox = [14.8, 37.45, 15.3, 37.85]
crs_utm = "EPSG:32633"
bbox_utm = reproject_bbox(bbox, "EPSG:4326", crs_utm)
%%time
ds = store.open_data(
data_id="sentinel-2-l2a",
bbox=bbox_utm,
time_range=["2025-05-01", "2025-05-15"],
spatial_res=20,
crs=crs_utm,
variables=["b02", "b03", "b04", "scl"],
)
ds
CPU times: user 2.42 s, sys: 126 ms, total: 2.55 s Wall time: 37 s
<xarray.Dataset> Size: 860MB
Dimensions: (time: 7, y: 2222, x: 2212)
Coordinates:
* time (time) datetime64[s] 56B 2025-05-01T09:50:29 ... 2025-05-13T...
* y (y) float64 18kB 4.189e+06 4.189e+06 ... 4.145e+06 4.145e+06
* x (x) float64 18kB 4.823e+05 4.823e+05 ... 5.265e+05 5.265e+05
spatial_ref int64 8B ...
Data variables:
b02 (time, y, x) float64 275MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b03 (time, y, x) float64 275MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b04 (time, y, x) float64 275MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
scl (time, y, x) uint8 34MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
Attributes: (5)Note that the 3D datacube generation is fully lazy. Actual data download and processing (e.g., mosaicking, stacking) are performed on demand and are only triggered when the data is written or visualized.
As an example, the next cell plots a single timestamp of the red band and the Scene Classification Layer (SCL). The plot uses the color map information provided in the attributes of the SCL data array.
%%time
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ds.b04.isel(time=1).plot(ax=ax[0], vmin=0.0, vmax=0.3)
ax[0].set_title("B04 - red")
cmap = mcolors.ListedColormap(ds.scl.attrs["flag_colors"].split(" "))
nb_colors = len(ds.scl.attrs["flag_values"])
norm = mcolors.BoundaryNorm(
boundaries=np.arange(nb_colors + 1) - 0.5, ncolors=nb_colors
)
im = ds.scl.isel(time=1).plot.imshow(ax=ax[1], cmap=cmap, norm=norm, add_colorbar=False)
cbar = fig.colorbar(im, ax=ax[1], ticks=ds.scl.attrs["flag_values"])
cbar.ax.set_yticklabels(ds.scl.attrs["flag_meanings"].split(" "))
cbar.set_label("Class")
ax[1].set_title("Scene classification layer")
plt.tight_layout()
/home/konstantin/micromamba/envs/xcube-eopf/lib/python3.13/site-packages/numpy/_core/numeric.py:475: RuntimeWarning: invalid value encountered in cast multiarray.copyto(res, fill_value, casting='unsafe')
CPU times: user 835 ms, sys: 171 ms, total: 1.01 s Wall time: 2.51 s
The user can also set crs="native" which allows specifying the bounding box in latitude/longitude, but returns the data in UTM. Note if the data request covers multiple UTM zones, an error will be raised.
%%time
ds = store.open_data(
data_id="sentinel-2-l2a",
bbox=bbox,
time_range=["2025-05-01", "2025-05-07"],
spatial_res=10,
crs="native",
variables=["b02", "b03", "b04", "scl"],
)
ds
CPU times: user 1.12 s, sys: 36.9 ms, total: 1.16 s Wall time: 15 s
<xarray.Dataset> Size: 2GB
Dimensions: (time: 4, y: 4443, x: 4423)
Coordinates:
* time (time) datetime64[s] 32B 2025-05-01T09:50:29 ... 2025-05-06T...
* y (y) float64 36kB 4.189e+06 4.189e+06 ... 4.145e+06 4.145e+06
* x (x) float64 35kB 4.823e+05 4.823e+05 ... 5.265e+05 5.265e+05
spatial_ref int64 8B ...
Data variables:
b02 (time, y, x) float64 629MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b03 (time, y, x) float64 629MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b04 (time, y, x) float64 629MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
scl (time, y, x) uint8 79MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
Attributes: (5)We can request the same data cube but in geographic projection ("EPSG:4326") as well. The xcube EOPF data store can reproject the datacube to any projection requested by the user.
%%time
ds = store.open_data(
data_id="sentinel-2-l2a",
bbox=bbox,
time_range=["2025-05-01", "2025-05-07"],
spatial_res=20 / 111320, # meters converted to degrees (approx.)
crs="EPSG:4326",
variables=["b02", "b03", "b04", "scl"],
)
ds
CPU times: user 3.06 s, sys: 140 ms, total: 3.2 s Wall time: 15.6 s
<xarray.Dataset> Size: 2GB
Dimensions: (time: 4, lon: 5566, lat: 4453)
Coordinates:
* time (time) datetime64[s] 32B 2025-05-01T09:50:29 ... 2025-05-06T...
* lon (lon) float64 45kB 14.8 14.8 14.8 14.8 ... 15.3 15.3 15.3 15.3
* lat (lat) float64 36kB 37.85 37.85 37.85 ... 37.45 37.45 37.45
spatial_ref int64 8B 0
Data variables:
b02 (time, lat, lon) float64 793MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b03 (time, lat, lon) float64 793MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b04 (time, lat, lon) float64 793MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
scl (time, lat, lon) uint8 99MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
Attributes: (5)%%time
ds.b04.isel(time=1).plot(vmin=0, vmax=0.2)
CPU times: user 13.4 s, sys: 2.49 s, total: 15.9 s Wall time: 5.89 s
<matplotlib.collections.QuadMesh at 0x7fb5c414b110>
Support for Common Band Names¶
Support has been added for common band names from the STAC EO extension in Sentinel-2 analysis mode.
The variables parameter now accepts standard spectral names such as blue, green, red, nir, and others.
The next cell demonstrates an example where we filter spectral bands using these common names.
%%time
ds = store.open_data(
data_id="sentinel-2-l2a",
bbox=bbox,
time_range=["2025-05-01", "2025-05-07"],
spatial_res=20 / 111320, # meters converted to degrees (approx.)
crs="EPSG:4326",
variables=["blue", "green", "red", "scl"],
)
ds
CPU times: user 3.47 s, sys: 85.3 ms, total: 3.55 s Wall time: 18.7 s
<xarray.Dataset> Size: 2GB
Dimensions: (time: 4, lon: 5566, lat: 4453)
Coordinates:
* time (time) datetime64[s] 32B 2025-05-01T09:50:29 ... 2025-05-06T...
* lon (lon) float64 45kB 14.8 14.8 14.8 14.8 ... 15.3 15.3 15.3 15.3
* lat (lat) float64 36kB 37.85 37.85 37.85 ... 37.45 37.45 37.45
spatial_ref int64 8B 0
Data variables:
blue (time, lat, lon) float64 793MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
green (time, lat, lon) float64 793MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
red (time, lat, lon) float64 793MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
scl (time, lat, lon) uint8 99MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
Attributes: (5)Access Sentinel-2 Level-1C ARDC¶
In this section, we demonstrate how to open and generate spatio-temporal data cubes from multiple Sentinel-2 Level-1C tiles.
The Level-1C collection uses the same opening parameters as the Level-2A collection, as shown below.
store.get_open_data_params_schema(data_id="sentinel-2-l1c")
<xcube.util.jsonschema.JsonObjectSchema at 0x7fb5607f7d90>
We set data_id to "sentinel-2-l1c" and leave variables unspecified, so all spectral bands are retrieved. We also do not set a crs, so the default "EPSG:4326" is used.
%%time
ds = store.open_data(
data_id="sentinel-2-l1c",
bbox=bbox,
time_range=["2025-05-01", "2025-05-07"],
spatial_res=20 / 111320, # meters converted to degrees (approx.)
)
ds
CPU times: user 5.74 s, sys: 250 ms, total: 5.99 s Wall time: 21.1 s
<xarray.Dataset> Size: 3GB
Dimensions: (time: 4, lon: 2783, lat: 2227)
Coordinates:
* time (time) datetime64[s] 32B 2025-05-01T09:50:29 ... 2025-05-06T...
* lon (lon) float64 22kB 14.8 14.8 14.8 14.8 ... 15.3 15.3 15.3 15.3
* lat (lat) float64 18kB 37.85 37.85 37.85 ... 37.45 37.45 37.45
spatial_ref int64 8B 0
Data variables: (12/13)
b02 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b03 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b04 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b08 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b05 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b06 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
... ...
b11 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b12 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b8a (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b01 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b09 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
b10 (time, lat, lon) float64 198MB dask.array<chunksize=(1, 1830, 1830), meta=np.ndarray>
Attributes: (5)And we can plot a spectral band for a time stamp, which triggers loading the data for this slice.
%%time
ds.b04.isel(time=1).plot(vmin=0, vmax=0.2)
CPU times: user 23.1 s, sys: 3.06 s, total: 26.2 s Wall time: 28.1 s
<matplotlib.collections.QuadMesh at 0x7fb36c48fed0>
Conclusion¶
This notebook highlighted the main features of the xcube EOPF Data Store for Sentinel-2, which enables seamless access to multiple EOPF Zarr products as analysis-ready data cubes (ARDCs).
Key takeaways:
- 3D spatio-temporal data cubes can be generated from multiple EOPF Sentinel Zarr samples.
- Supports access to both Sentinel-2 Level-2A and Level-1C collections.
- Data cubes can be requested with any CRS, spatial extent, temporal range, and spatial resolution.
- Setting
crs="native"preserves the original UTM grid; an error is raised if the requested spatial extent spans multiple UTM zones. - Common spectral bands, as defined by the STAC EO extension, are supported for variable selection.