Some xarray functions don't finish


#1

Hi everyone,

I’m using xarray functions on the ESDL cube (precipitation variable) and have run into problems with the ‘rolling’ and ‘chunk’ functions: they don’t finish when I use more than a few years of data.

I’ve tried to find solutions online, but as I don’t get any error message, I haven’t been successful. Do you have an idea about how I could work around this?

The notebook that I’m using can be found under this link: https://jupyterhub.earthsystemdatalab.net/user/matheino/lab/tree/calculate_spi_test3.ipynb.

Thanks in advance,
Matias


#2

Hi Matias,

Soon the ESDL Team will provide rechunked ESDL cubes that are optimized for time operations; the rolling function should then work properly.
For the moment, you can find a working example of a possible solution below.

Hope this helps!

Best,
Alicja from ESDL Team

import sys
import xarray as xr
import matplotlib.pyplot as plt
from dask.diagnostics import ProgressBar

# Open the cube lazily; no data is loaded into memory yet
ESDC = xr.open_zarr("/home/jovyan/work/datacube/ESDCv2.0.0/esdc-8d-0.25deg-1x720x1440-2.0.0.zarr")
# Aggregate precipitation to monthly sums
precip_monthly = ESDC['precipitation'].resample(time='1M', keep_attrs=True).sum('time')
# Restrict the series to 1991-2010
precip_monthly = precip_monthly.loc[dict(time=slice('1991-01-01', '2010-12-31'))]
# sys.getsizeof reports the size of the Python object only, not of the underlying lazy array data
print(sys.getsizeof(precip_monthly))
print(precip_monthly)

56
<xarray.DataArray 'precipitation' (time: 240, lat: 720, lon: 1440)>
dask.array<shape=(240, 720, 1440), dtype=float32, chunksize=(1, 720, 1440)>
Coordinates:
* time (time) datetime64[ns] 1991-01-31 1991-02-28 1991-03-31 …
* lat (lat) float32 89.875 89.625 89.375 89.125 88.875 88.625 88.375 …
* lon (lon) float32 -179.875 -179.625 -179.375 -179.125 -178.875 …

# '2011-11-01' lies outside the 1991-2010 slice, so method='nearest' selects the last time step
with ProgressBar():
    precip_monthly.sel(time='2011-11-01', method='nearest').plot.imshow(vmax=50)

[########################################] | 100% Completed | 0.1s
[########################################] | 100% Completed | 0.1s

[Figure output_3_1: map of monthly precipitation at the selected time step]

# Rechunk so that each chunk holds the full time axis for a 120 x 120 spatial tile
precip_monthly_rechunked = precip_monthly.chunk({'lat': 120, 'lon': 120, 'time': 240})
precip_monthly_rechunked

<xarray.DataArray 'precipitation' (time: 240, lat: 720, lon: 1440)>
dask.array<shape=(240, 720, 1440), dtype=float32, chunksize=(240, 120, 120)>
Coordinates:
* time (time) datetime64[ns] 1991-01-31 1991-02-28 1991-03-31 …
* lat (lat) float32 89.875 89.625 89.375 89.125 88.875 88.625 88.375 …
* lon (lon) float32 -179.875 -179.625 -179.375 -179.125 -178.875 …
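
To see why the rechunking helps with time operations, the chunk layouts in the two printouts above can be compared with some plain arithmetic. This is only a rough sketch based on the shapes and chunk sizes shown in the reprs; the helper `chunk_stats` is a hypothetical name and not part of xarray or dask, and 4 bytes per value is assumed because the dtype is float32.

```python
import math

def chunk_stats(shape, chunks, itemsize=4):
    """Return (total chunks, bytes per chunk, chunks crossed by one time series)."""
    # Number of chunks along each dimension (time, lat, lon)
    per_dim = [math.ceil(size / chunk) for size, chunk in zip(shape, chunks)]
    n_chunks = math.prod(per_dim)
    chunk_bytes = math.prod(chunks) * itemsize
    # A time series at a single (lat, lon) point crosses every chunk along axis 0
    series_chunks = per_dim[0]
    return n_chunks, chunk_bytes, series_chunks

shape = (240, 720, 1440)  # (time, lat, lon), taken from the repr
original = chunk_stats(shape, (1, 720, 1440))
rechunked = chunk_stats(shape, (240, 120, 120))

for name, stats in [("original", original), ("rechunked", rechunked)]:
    n_chunks, chunk_bytes, series_chunks = stats
    print(f"{name}: {n_chunks} chunks, {chunk_bytes / 1e6:.1f} MB each, "
          f"{series_chunks} chunk(s) per point time series")
```

With the original layout, a time series at one grid point has to touch all 240 chunks, while after rechunking it sits in a single chunk, which is the access pattern that rolling over time needs.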

precip_monthly_rechunked.sel(lat=53,lon=7., method='nearest').compute().sum()

<xarray.DataArray 'precipitation' ()>
array(1975.270263671875)
Coordinates:
lat float32 53.125
lon float32 7.125

precip_monthly_rechunked.sel(lat=53,lon=7., method='nearest').plot()

[<matplotlib.lines.Line2D at 0x7f7f65506940>]

[Figure output_7_1: monthly precipitation time series at the grid point nearest lat 53, lon 7]

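# Centered 3-month rolling mean; min_periods=1 keeps the edge months and months with missing neighbours
precip_3month = precip_monthly_rechunked.rolling(time=3, min_periods=1, center=True).mean(skipna=True)
fig, ax = plt.subplots(figsize=[14, 5], ncols=1)
# Smoothed series in red, original monthly series in blue
precip_3month.sel(lat=53, lon=7., method='nearest').plot(ax=ax, color='red')
precip_monthly_rechunked.sel(lat=53, lon=7., method='nearest').plot(ax=ax, color='blue')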

[<matplotlib.lines.Line2D at 0x7f7f64224630>]
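
For readers wondering what the rolling call above computes for each month, here is a minimal pure-Python sketch of a centered moving average that skips missing values. It is meant to illustrate the idea behind `rolling(time=3, min_periods=1, center=True).mean(skipna=True)`, not to reproduce xarray's implementation; the function `centered_rolling_mean` and the sample values are made up for illustration, and only odd window sizes are handled.

```python
import math

def centered_rolling_mean(values, window=3, min_periods=1):
    """Centered moving average that ignores NaNs (odd window sizes only)."""
    if window % 2 != 1:
        raise ValueError("this sketch only handles odd window sizes")
    half = window // 2
    result = []
    for i in range(len(values)):
        # Clip the window at the ends of the series
        start, stop = max(0, i - half), min(len(values), i + half + 1)
        valid = [v for v in values[start:stop] if not math.isnan(v)]
        if valid and len(valid) >= min_periods:
            result.append(sum(valid) / len(valid))
        else:
            # Too few valid values in the window
            result.append(math.nan)
    return result

# Hypothetical monthly sums with one missing month
monthly = [30.0, 60.0, math.nan, 90.0, 30.0]
smoothed = centered_rolling_mean(monthly)
print(smoothed)  # [45.0, 45.0, 75.0, 60.0, 60.0]
```

Because `min_periods=1`, the first and last months are averaged over the two values that exist, and the missing month is filled with the mean of its neighbours instead of becoming NaN.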


#3

Thank you for the quick reply and the information. This was very helpful!

Best,
Matias