Introduction to CuPy-Xarray#
Negin Sobhani, Deepak Cherian, and Max Jones
negins@ucar.edu, dcherian@ucar.edu, and max@carbonplan.org
Overview#
In this tutorial, you will learn:#
Basics of CuPy-Xarray
Creating and handling Xarray DataArrays on GPUs
Data Transfer Between Host and Device
Prerequisites#
| Concepts | Importance | Notes |
|---|---|---|
| | Necessary | |
| | Necessary | |
| | Necessary | |
Time to learn: 10 minutes
Introduction#
Xarray is a powerful library for working with labeled multi-dimensional arrays in Python. It provides a convenient and intuitive way to manipulate large and complex datasets, and is built on top of NumPy. CuPy, on the other hand, is a library that allows for GPU-accelerated computing with Python and is compatible with NumPy.
When used together, Xarray and CuPy can provide an easy way to take advantage of GPU acceleration for scientific computing tasks.
Xarray can wrap custom duck array objects (i.e. NumPy-like arrays) that follow specific protocols.
CuPy-Xarray provides an interface for using CuPy in Xarray, providing accessors on the Xarray objects.
This tutorial showcases the use of cupy-xarray, which offers a cupy accessor that allows access to CuPy-specific features.
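To make the duck-array idea from the introduction more concrete, here is a minimal sketch of what "NumPy-like" roughly means in practice: the object exposes array metadata (ndim, shape, dtype) and at least one NumPy-compatibility hook (`__array_function__` with `__array_ufunc__`, or `__array_namespace__`). The function name `looks_like_duck_array` and the `ToyArray` class are hypothetical and exist only for illustration; this is a simplified approximation, not Xarray's own check.

```python
def looks_like_duck_array(obj):
    """Rough check: array metadata plus at least one NumPy-compatibility hook."""
    has_metadata = all(hasattr(obj, name) for name in ("ndim", "shape", "dtype"))
    has_protocol = (
        hasattr(obj, "__array_function__") and hasattr(obj, "__array_ufunc__")
    ) or hasattr(obj, "__array_namespace__")
    return has_metadata and has_protocol


class ToyArray:
    """Hypothetical minimal class, for illustration only."""

    ndim = 1
    shape = (3,)
    dtype = "float64"

    def __array_namespace__(self, api_version=None):
        return None  # a real array would return its array-API module here


print(looks_like_duck_array(ToyArray()))  # True
print(looks_like_duck_array([1, 2, 3]))   # False: a list has no shape or dtype
```

CuPy arrays provide this kind of metadata and these hooks, which is why Xarray can wrap them much like NumPy arrays.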
First, let’s import our packages:
# Import NumPy, CuPy, Xarray, and CuPy-Xarray
import cupy as cp
import numpy as np
import xarray as xr
import cupy_xarray # Adds .cupy to Xarray objects
Creating Xarray DataArray with CuPy#
In the previous tutorial, we learned how to create NumPy and CuPy arrays:
arr_cpu = np.random.rand(10, 10, 10)
arr_gpu = cp.random.rand(10, 10, 10)
We can create an Xarray DataArray from either a NumPy array or a CuPy array in the same way:
# create a DataArray using a NumPy array with three dimensions and 10 elements along each dimension
da_np = xr.DataArray(arr_cpu, dims=["x", "y", "time"])
da_np
<xarray.DataArray (x: 10, y: 10, time: 10)> array([[[9.86345808e-01, 4.17660665e-01, 1.88327552e-01, 8.90280942e-01, 2.21689274e-01, 3.17943609e-01, 5.00880587e-01, 7.53337533e-01, 3.59031996e-01, 1.91030893e-01], [1.33144043e-02, 5.02596284e-01, 5.42913172e-01, 5.23846968e-01, 4.05313585e-01, 9.46118837e-01, 1.06548298e-01, 3.47524404e-01, 1.52843324e-01, 8.48253778e-01], [4.33649929e-01, 6.23728979e-01, 6.47445402e-01, 9.03623126e-01, 9.05012105e-01, 4.06989322e-03, 4.19896664e-01, 8.60406673e-02, 7.41788571e-01, 6.22665340e-01], [1.74267952e-01, 6.14897148e-01, 5.01242328e-01, 6.66759345e-01, 8.44182632e-01, 3.19620076e-01, 7.90701915e-01, 2.43897985e-01, 8.25956047e-01, 6.06534832e-01], [5.28111326e-01, 7.42343565e-01, 8.05094324e-02, 8.84691476e-01, 1.69856723e-02, 3.32512453e-01, 6.67738160e-01, 7.06905069e-01, 5.16369945e-01, 2.71965903e-01], [2.81638568e-01, 2.89389278e-01, 8.19006807e-01, 3.53878654e-01, 9.21084664e-02, 5.69411698e-01, 8.56797393e-01, 3.24107223e-01, 8.15087813e-01, 4.70350855e-01], [5.70124339e-01, 7.92088214e-01, 9.33540441e-01, 9.88027072e-01, 9.05585677e-01, 5.28417548e-01, 4.40169554e-01, 1.40924601e-01, ... 
5.29797743e-01, 9.87589722e-01, 9.18635655e-01, 8.68580278e-01, 4.71548324e-01, 3.64257635e-01], [6.42229124e-01, 2.33643023e-02, 5.85033551e-01, 8.80436137e-02, 7.07996956e-01, 4.40586920e-01, 3.10391741e-01, 1.22763638e-02, 8.02412664e-01, 4.33761051e-01], [1.24780820e-01, 3.53875474e-01, 8.36031716e-01, 2.84138174e-02, 3.57476794e-01, 2.44890794e-02, 1.47504786e-02, 3.19465404e-01, 2.91984584e-01, 3.39490525e-01], [2.04021642e-01, 4.71267303e-01, 9.03187717e-02, 3.83928128e-01, 5.96265409e-01, 3.17287239e-01, 3.22413673e-01, 8.38235070e-01, 9.58316554e-01, 9.73589612e-01], [6.13802208e-01, 8.70356525e-01, 5.17350919e-01, 7.72374828e-03, 5.35340510e-01, 8.89268388e-01, 6.93943330e-01, 6.29953006e-01, 1.70230716e-01, 7.16573680e-01], [8.44214598e-01, 3.35186917e-01, 8.78891352e-01, 1.98027834e-01, 6.36005433e-01, 1.21753118e-01, 6.48103717e-01, 8.68341345e-01, 7.81023406e-01, 4.45064620e-01], [3.85731750e-01, 8.02230895e-01, 6.41415045e-01, 7.60886886e-01, 2.00746550e-01, 3.76787007e-01, 6.68073723e-01, 7.87222270e-01, 6.75273015e-01, 8.63705777e-01]]]) Dimensions without coordinates: x, y, time
# create a DataArray using a CuPy array with three dimensions and 10 elements along each dimension
da_cp = xr.DataArray(arr_gpu, dims=["x", "y", "time"])
da_cp
<xarray.DataArray (x: 10, y: 10, time: 10)> array([[[2.88196731e-01, 3.71102840e-01, 8.22413516e-01, 7.61603373e-01, 2.14247694e-01, 6.08972260e-01, 6.35605124e-01, 4.51735394e-02, 3.56580833e-01, 3.33245593e-01], [9.56686233e-02, 3.09634487e-01, 5.72034429e-01, 8.64203361e-01, 5.44551902e-01, 4.54445926e-01, 1.21606888e-01, 2.30160410e-01, 6.14639953e-01, 7.73246535e-01], [8.03011705e-01, 2.69969912e-01, 2.03781951e-01, 6.64806547e-01, 4.93709552e-01, 2.60248353e-01, 6.82195033e-01, 6.75837492e-01, 5.07293067e-01, 6.45924343e-01], [1.03968071e-01, 1.31787260e-01, 2.31666523e-02, 2.90727455e-01, 6.22514068e-02, 9.54996781e-01, 1.38868633e-01, 3.18043546e-01, 9.94141764e-01, 6.52825114e-01], [6.72144360e-01, 9.25109790e-01, 9.24907616e-01, 9.97835547e-01, 1.30089788e-01, 3.28381980e-01, 9.47761645e-01, 2.15451004e-01, 1.55072912e-01, 2.84564825e-01], [5.32157180e-01, 4.05812774e-01, 6.65152077e-01, 1.62793186e-01, 8.38375837e-01, 4.38498164e-01, 3.93970103e-01, 3.25181026e-01, 8.43314943e-01, 6.37218468e-01], [9.47935236e-01, 1.39071514e-01, 3.34994498e-01, 7.42907508e-01, 1.13865457e-01, 3.69531071e-01, 6.58907523e-01, 4.10997683e-01, ... 
5.01101857e-01, 6.76530919e-01, 6.01550513e-01, 1.91761020e-01, 2.01591335e-01, 3.73443454e-01], [8.72935075e-01, 9.28175014e-01, 7.03819938e-01, 4.25757273e-01, 6.80355431e-01, 1.22351044e-01, 8.22086635e-03, 9.23118431e-01, 8.00040998e-02, 3.51963004e-01], [5.30917733e-01, 1.73025731e-03, 5.46551386e-01, 3.41904305e-01, 6.11276326e-01, 7.83903426e-01, 7.67650251e-01, 9.27383669e-02, 5.99146336e-01, 1.44674661e-02], [9.32478257e-02, 6.51279678e-01, 3.40032365e-01, 6.66761485e-02, 3.88243075e-01, 3.06181721e-02, 5.58666002e-01, 3.10356676e-01, 6.46523629e-01, 1.19013418e-01], [1.81940990e-01, 3.89650142e-01, 9.98204973e-01, 4.39178186e-02, 6.88137446e-02, 7.61541679e-02, 6.26075251e-01, 9.14708720e-01, 4.45414011e-01, 5.16678456e-01], [8.51618677e-01, 6.81900815e-01, 6.66821786e-01, 8.75685884e-01, 2.90499242e-01, 3.25977864e-01, 3.67627054e-01, 3.93770674e-01, 7.40898577e-01, 3.50451112e-02], [7.06374026e-01, 7.19519511e-01, 1.79160522e-01, 8.81425785e-01, 3.51431945e-01, 4.11507382e-01, 6.86088790e-01, 3.04671156e-01, 5.70729870e-01, 7.76584760e-01]]]) Dimensions without coordinates: x, y, time
But how are these two DataArrays different from each other? How do we know which array is on CPU vs. GPU?
Checking for CuPy Arrays#
The cupy accessor provides the is_cupy property to check whether the underlying array is a CuPy array (on the device) or not (on the host). For example:
da_np.cupy.is_cupy
False
da_cp.cupy.is_cupy
True
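As a complement to is_cupy, the sketch below shows one way to report which library an object's underlying array comes from, without relying on the accessor. The helper `backend_name` is hypothetical (it is not part of Xarray or CuPy-Xarray), and it assumes that the array class's module path starts with the library name. A stand-in object is used so the sketch runs without Xarray or a GPU.

```python
from types import SimpleNamespace


def backend_name(obj):
    """Return the top-level module of the array stored in ``obj.data``."""
    return type(obj.data).__module__.split(".")[0]


# Hypothetical stand-in so the sketch runs without xarray or a GPU;
# in this tutorial you would pass da_np or da_cp instead.
stand_in = SimpleNamespace(data=[0.1, 0.2, 0.3])
print(backend_name(stand_in))  # a plain list is defined in "builtins"
```

Under the assumption above, calling the helper on da_np and da_cp would be expected to report numpy and cupy respectively.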
Accessing Device Information of the DataArray#
To access the underlying CuPy array, use the data property of the DataArray. It returns the CuPy array:
cupy_array = da_cp.data
type(cupy_array)
cupy.ndarray
In the previous tutorial, we learned that CuPy introduces the notion of a current device, and that the cupy.ndarray.device attribute identifies the device assigned to a CuPy array. The same approach works for a DataArray that wraps a CuPy array:
da_cp.data.device
<CUDA Device 0>
Data Transfer#
Transferring DataArrays to another Device#
In the previous lesson, we learned that code runs on Device 0 by default. With CuPy, we can switch the current device using the cp.cuda.Device() context manager, so that arrays created inside the context are allocated on that device. This is particularly valuable when your code is designed to use multiple GPUs. The same concept applies to DataArrays that wrap CuPy arrays:
# Requires at least two GPUs; arrays created inside the context live on Device 1
with cp.cuda.Device(1):
    x_on_gpu1 = cp.array([5, 7, 8, 5, 5])
    da_cp1 = xr.DataArray(x_on_gpu1, dims=['time'])
da_cp1.data.device
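The snippet above assumes a second GPU is present. The following sketch shows one possible way to guard against machines with a single GPU by falling back to Device 0. The helpers `pick_device_id` and `dataarray_on_device` are hypothetical names introduced for illustration; `cp.cuda.runtime.getDeviceCount()` is used to query the number of visible devices. CuPy and Xarray are imported lazily so that the device-selection logic can be exercised without a GPU.

```python
def pick_device_id(requested, n_devices):
    """Return `requested` if that device exists, otherwise fall back to 0."""
    if n_devices < 1:
        raise RuntimeError("no CUDA device available")
    return requested if 0 <= requested < n_devices else 0


def dataarray_on_device(values, dims, requested=1):
    """Create a CuPy-backed DataArray on the requested device if it exists."""
    # Imported lazily so pick_device_id stays usable without CuPy installed.
    import cupy as cp
    import xarray as xr

    device_id = pick_device_id(requested, cp.cuda.runtime.getDeviceCount())
    with cp.cuda.Device(device_id):
        # Arrays allocated inside this context live on `device_id`.
        return xr.DataArray(cp.asarray(values), dims=dims)


print(pick_device_id(1, n_devices=2))  # 1: the second GPU exists
print(pick_device_id(1, n_devices=1))  # 0: fall back to the only GPU
```

On a machine with CuPy and at least one GPU, `dataarray_on_device([5, 7, 8, 5, 5], dims=["time"]).data.device` could then be inspected as in the example above.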
Transferring Data between Host and Device#
Xarray provides DataArray.as_numpy to convert all kinds of arrays to NumPy arrays.
# Move data to host
da_np = da_cp.as_numpy()
da_np
<xarray.DataArray (x: 10, y: 10, time: 10)> array([[[2.88196731e-01, 3.71102840e-01, 8.22413516e-01, 7.61603373e-01, 2.14247694e-01, 6.08972260e-01, 6.35605124e-01, 4.51735394e-02, 3.56580833e-01, 3.33245593e-01], [9.56686233e-02, 3.09634487e-01, 5.72034429e-01, 8.64203361e-01, 5.44551902e-01, 4.54445926e-01, 1.21606888e-01, 2.30160410e-01, 6.14639953e-01, 7.73246535e-01], [8.03011705e-01, 2.69969912e-01, 2.03781951e-01, 6.64806547e-01, 4.93709552e-01, 2.60248353e-01, 6.82195033e-01, 6.75837492e-01, 5.07293067e-01, 6.45924343e-01], [1.03968071e-01, 1.31787260e-01, 2.31666523e-02, 2.90727455e-01, 6.22514068e-02, 9.54996781e-01, 1.38868633e-01, 3.18043546e-01, 9.94141764e-01, 6.52825114e-01], [6.72144360e-01, 9.25109790e-01, 9.24907616e-01, 9.97835547e-01, 1.30089788e-01, 3.28381980e-01, 9.47761645e-01, 2.15451004e-01, 1.55072912e-01, 2.84564825e-01], [5.32157180e-01, 4.05812774e-01, 6.65152077e-01, 1.62793186e-01, 8.38375837e-01, 4.38498164e-01, 3.93970103e-01, 3.25181026e-01, 8.43314943e-01, 6.37218468e-01], [9.47935236e-01, 1.39071514e-01, 3.34994498e-01, 7.42907508e-01, 1.13865457e-01, 3.69531071e-01, 6.58907523e-01, 4.10997683e-01, ... 
5.01101857e-01, 6.76530919e-01, 6.01550513e-01, 1.91761020e-01, 2.01591335e-01, 3.73443454e-01], [8.72935075e-01, 9.28175014e-01, 7.03819938e-01, 4.25757273e-01, 6.80355431e-01, 1.22351044e-01, 8.22086635e-03, 9.23118431e-01, 8.00040998e-02, 3.51963004e-01], [5.30917733e-01, 1.73025731e-03, 5.46551386e-01, 3.41904305e-01, 6.11276326e-01, 7.83903426e-01, 7.67650251e-01, 9.27383669e-02, 5.99146336e-01, 1.44674661e-02], [9.32478257e-02, 6.51279678e-01, 3.40032365e-01, 6.66761485e-02, 3.88243075e-01, 3.06181721e-02, 5.58666002e-01, 3.10356676e-01, 6.46523629e-01, 1.19013418e-01], [1.81940990e-01, 3.89650142e-01, 9.98204973e-01, 4.39178186e-02, 6.88137446e-02, 7.61541679e-02, 6.26075251e-01, 9.14708720e-01, 4.45414011e-01, 5.16678456e-01], [8.51618677e-01, 6.81900815e-01, 6.66821786e-01, 8.75685884e-01, 2.90499242e-01, 3.25977864e-01, 3.67627054e-01, 3.93770674e-01, 7.40898577e-01, 3.50451112e-02], [7.06374026e-01, 7.19519511e-01, 1.79160522e-01, 8.81425785e-01, 3.51431945e-01, 4.11507382e-01, 6.86088790e-01, 3.04671156e-01, 5.70729870e-01, 7.76584760e-01]]]) Dimensions without coordinates: x, y, time
Let’s confirm this isn’t a CuPy array anymore:
da_np.cupy.is_cupy
False
We can also convert an Xarray DataArray that wraps a NumPy array into one that wraps a CuPy array (moving the data to the device) using cupy.as_cupy():
# Move data to GPU
da_cp = da_np.cupy.as_cupy()
da_cp.cupy.is_cupy
True
Plotting#
Plotting a DataArray backed by a CuPy array works the same way as plotting one backed by a NumPy array; however, the data is first transferred to the CPU before being plotted.
da_cp.plot()
(array([111., 93., 96., 112., 88., 93., 115., 88., 98., 106.]),
array([5.01237631e-05, 9.98656087e-02, 1.99681094e-01, 2.99496578e-01,
3.99312063e-01, 4.99127548e-01, 5.98943033e-01, 6.98758518e-01,
7.98574003e-01, 8.98389488e-01, 9.98204973e-01]),
<BarContainer object of 10 artists>)
Summary#
In this notebook, we have learned about:
CuPy-Xarray basics
Creating and handling Xarray DataArrays on GPUs
Data transfer between host and device