
Dask Intro

Explore how to connect to the Dask Gateway, spin up and scale a cluster, and submit a simple workload.

EODC

Introduction

This notebook demonstrates how to connect to the Dask Gateway, spin up and scale a cluster, and submit a simple workload.

Import Modules

from dask_gateway import Gateway

Setup

The setup is very simple; you will just need the following lines. Each of our JupyterLab instances starts with the gateway configuration preloaded.

"""
gate = Gateway("https://dask.user.eopf.eodc.eu",
               proxy_address="tls://dask.user.eopf.eodc.eu:10000",
               auth="jupyterhub")
"""

# Dask Gateway automatically loads the correct configuration.
# So the code below is identical to the comment above
gate = Gateway()
cluster = gate.new_cluster()
cluster

Cluster Object

When running the lines above, you’ll be presented with the following interface:

Cluster UI

This UI allows you to quickly scale the number of workers you need.

Alternatively, you can scale from code: cluster.scale() requests a fixed number of workers, while cluster.adapt() lets the cluster grow and shrink within the given bounds depending on load.

cluster.scale(2)

cluster.adapt(minimum=1, maximum=2)

Next, you’ll have to create a Dask client object; this is the last step before we can start working with our cluster.

client = cluster.get_client()

client

This should also give you a URL pointing to your Dask Dashboard, which provides an overview of the workers and workloads in your Dask cluster.
There’s also a JupyterLab extension that shows the same overview directly inside JupyterLab.

Dask Dashboard

The following code-blocks outline a rudimentary workflow to get you going with Dask.

Here we create an array of random numbers, add it to its own transposition and then calculate the mean across a slice of the data.

import dask.array as da

x = da.random.random((10000, 10000), chunks=(1000, 1000))

y = x + x.T
z = y[::2, 5000:].mean(axis=1)
z

Until now, everything has been evaluated lazily. To actually compute our results:

result = z.compute()

In your dashboard you will see the task stream fill up, and once the computation is done, the result lives in local memory and you can keep working with it like any other array.
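The object returned by compute() is a plain NumPy array, so everything after this point is ordinary local computation. A small self-contained sketch of the same pattern, shrunk so it runs without a cluster:

```python
import dask.array as da
import numpy as np

# Same workflow as above, on a smaller array
x = da.random.random((1000, 1000), chunks=(250, 250))
y = x + x.T
z = y[::2, 500:].mean(axis=1)

result = z.compute()  # triggers the actual computation
print(type(result), result.shape)
```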

Task Stream

Cleanup

Make sure to shut down your cluster when you’re done working with it.

cluster.shutdown()

Summary

This notebook explained how to interact with the Dask services available through the EOPF Sample Service.