Apply Custom Kernels with xarray.apply_ufunc
#
Negin Sobhani, Deepak Cherian, and Max Jones
negins@ucar.edu, dcherian@ucar.edu, and max@carbonplan.org
Overview#
In this tutorial, you learn:#
What
apply_ufunc
is and its importance in the xarray Python library.The basic usage of
apply_ufunc
to apply your function to a DataArray.Applying custom kernels to DataArray with CuPy
Prerequisites#
Concepts |
Importance |
Notes |
---|---|---|
Necessary |
||
Necessary |
Time to learn: 20 minutes
What is apply_ufunc
?#
apply_ufunc
is a powerful function provided by the xarray library, which is commonly used for data manipulation in the Python programming language. This function allows users to apply universal functions (ufuncs) on xarray data structures, including DataArray, Dataset, or Variable objects. With apply_ufunc, users can apply arbitrary functions that are compatible with raw numpy arrays, and the function will take care of aligning the input data, looping over dimensions, and maintaining metadata. Since apply_ufunc
operates on lower-level NumPy objects, it skips the overhead of using Xarray objects making it a good choice for performance-critical functions.
See the Xarray tutorial material on apply_ufunc for more.
Simple functions that act independently on each value should work without any additional arguments.
Simple Example#
In the example below, we calculate the saturation vapor pressure by using apply_ufunc()
.
But first, let’s create some sample data to work with.
We’ll use a 3-dimensional dataset (time, latitude, longitude) with random values:
import time
import numpy as np
import xarray as xr
import pandas as pd
import cupy as cp
import cupy_xarray # Adds .cupy to Xarray objects
np.random.seed(0)
# Create the time range.
date = pd.date_range('2010-01-01', '2020-12-31', freq='M')
# Create the latitude range.
lat = np.arange(-90, 90, 1)
# Create the longitude range.
lon = np.arange(-180, 180, 1)
# Create random data
data_np = np.random.rand(len(date), len(lat), len(lon))
data_cp = cp.array(data_np)
# -- Create DataArray with Numpy data
data_xr_np = xr.DataArray(data_np,
dims=['time', 'lat', 'lon'],
coords=[date, lat, lon],)
# -- Create DataArray with CuPy data
data_xr_cp = xr.DataArray(data_cp,
dims=['time', 'lat', 'lon'],
coords=[date, lat, lon],)
Now, let’s define our function that calculate the saturation vapor pressure using Clausius-Clapeyron equation:
def sat_p(t):
# return saturation vapor pressure
# using Clausius-Clapeyron equation
return 0.611*np.exp(17.67*(t-273.15)*((t-29.65)**(-1)))
start_time_np = time.time()
es_np=xr.apply_ufunc(sat_p,data_xr_np)
end_time_np = time.time()
time_np = end_time_np - start_time_np
type(es_np.data)
numpy.ndarray
start_time_cp = time.time()
es_cp=xr.apply_ufunc(sat_p,data_xr_cp)
end_time_cp = time.time()
time_cp = end_time_cp - start_time_cp
print("GroupBy with Xarray DataArrays using CuPy provides a", round(time_np / time_cp,2), "x speedup over NumPy.\n")
GroupBy with Xarray DataArrays using CuPy provides a 0.22 x speedup over NumPy.
Now, what is the output type? Does apply_ufunc
preserve the underlying data type?
type(es_cp.data)
cupy.ndarray
In the timing test, you might notice not much speed-up when using CuPy. But let’s run this cell another time:
start_time_cp = time.time()
es_cp=xr.apply_ufunc(sat_p,data_xr_cp)
end_time_cp = time.time()
time_cp = end_time_cp - start_time_cp
print("GroupBy with Xarray DataArrays using CuPy provides a", round(time_np / time_cp,2), "x speedup over NumPy.\n")
GroupBy with Xarray DataArrays using CuPy provides a 97.87 x speedup over NumPy.
Now, we can see much more speed-up using CuPy. The reason also explained in the first lesson.
When running these functions for the first time, you may experience a brief pause. This occurs as CuPy compiles the CUDA functions for the first time and cached them on disk for future use.
Elementwise Kernel with CuPy#
Elementwise Kernels in CuPy allow for operations to be performed on an element-by-element basis on CuPy arrays.
To create an elementwise kernel in CuPy, you need to use cupy.ElementwiseKernel
class. This class defines a CUDA kernel which can be invoked by the __call__
method of the instance.
This elementwise kernel takes three arguments:
a string defining the input type(s),
a string defining the output type(s),
and a string representing the operation to be performed, written in C syntax.
In this example, we want to calculate Relative Humidity using Revised Magnus coefficients by Alduchov and Eskridge.
Revised Magnus Coefficients by Alduchov and Eskridge: $\( RH = \left(\frac{{6.112 \cdot \exp\left(\frac{{17.67 \cdot (T_d - 273.15)}}{{T_d - 29.65}}\right)}}{{6.112 \cdot \exp\left(\frac{{17.67 \cdot (T - 273.15)}}{{T - 29.65}}\right)}}\right) \times 100 \% \)$
Here is the python code for calculating the relative humidity from temperature and dew point temperature:
def calculate_relative_humidity(temp, dew_point):
"""
Calculate Relative Humidity using Revised Magnus coefficients by Alduchov and Eskridge.
Args:
temp (float): Temperature in Celsius.
dew_point (float): Dew Point Temperature in Celsius.
Returns:
float: Relative Humidity in percentage.
"""
temp += 273.15 # Convert temperature to Kelvin
dew_point += 273.15 # Convert dew point temperature to Kelvin
es_temp = 6.112 * np.exp((17.67 * (dew_point - 273.15)) / (dew_point - 29.65)) # Saturation vapor pressure at dew point
es_dew = 6.112 * np.exp((17.67 * (temp - 273.15)) / (temp - 29.65)) # Saturation vapor pressure at temperature
relative_humidity = (es_dew / es_temp) * 100.0 # Calculate relative humidity in percentage
return relative_humidity
But for Elementwise
kernels we need to write it in C syntax:
step 1: Set the list of input and output arguments and their data types:
input arguments :
float32 temp
,float32 d_temp
output arguments :
float32 rh
step 2: Write the code body:
temp += 273.15;
dew_point += 273.15;
// Calculate saturation vapor pressure at dew point
float es_temp = 6.112 * exp((17.67 * (dew_point - 273.15)) / (dew_point - 29.65));
// Calculate saturation vapor pressure at temperature
float es_dew = 6.112 * exp((17.67 * (temp - 273.15)) / (temp - 29.65));
// Calculate relative humidity in percentage
float relative_humidity = (es_dew / es_temp) * 100.0;
step 3: Define the element-wise class and set the kernel name:
compute_call = cp.ElementwiseKernel(input_list, output_list, code_body, 'RH')
Now let’s test to see how this works in a real example:
The example above showed a 1 degree DataArray. What if we increase the data size to 0.5 degree?
# Create random data
data_cp = 20*(cp.random.rand(len(date), len(lat), len(lon)))
# -- Create Temp DataArray with CuPy data
temp = xr.DataArray(data_cp,
dims=['time', 'lat', 'lon'],
coords=[date, lat, lon],)
offset = 20* cp.random.rand(len(date), len(lat), len(lon))
# -- Create Wet Bulb Temp DataArray with CuPy data
temp_wet = xr.DataArray(data_cp-offset,
dims=['time', 'lat', 'lon'],
coords=[date, lat, lon],)
input_list = 'float64 temp, float64 dew_temp'
output_list = 'float64 rh'
code_body = '''
// Calculate saturation vapor pressure at dew point
float es_temp = 6.112 * exp((17.67 * (dew_temp)) / (dew_temp - 29.65));
// Calculate saturation vapor pressure at temperature
float es_dew = 6.112 * exp((17.67 * (temp)) / (temp - 29.65));
// Calculate relative humidity in percentage
rh = (es_dew / es_temp) * 100.0;
'''
## -- define the elementwise kernel:
compute_call = cp.ElementwiseKernel(input_list, output_list, code_body, 'RH')
kernel = compute_call(data_cp, data_cp-offset)
%%time
result = xr.apply_ufunc(
compute_call,
temp,
temp_wet,
)
##result
CPU times: user 1.24 ms, sys: 0 ns, total: 1.24 ms
Wall time: 1.24 ms
How much this computation took if we wanted to use pure Python?
%%time
relative_humidity = calculate_relative_humidity(temp, temp_wet)
CPU times: user 3.97 ms, sys: 0 ns, total: 3.97 ms
Wall time: 3.98 ms
We can see using the custom kernel method, we removed the pure Python overhead in between calculations, by creating a custom “elementwise” kernel that will run the entire computations on the GPU device.
Congratulations! You have now uncovered how to use apply_ufunc
with custom CUDA kernels.
Summary#
In this notebook, we have learned about:
What apply_ufunc is and its importance in the xarray Python library.
The basic usage of apply_ufunc to apply your function to a DataArray.
Applying custom kernels to DataArray with CuPy
Additional Resources#
Xarray apply_ufunc
CuPy User Guide
Xarray User Guide
Cupy-Xarray Github