Data availability + dropping layers with missing values

Another probably simple question. After subsetting the cube and calculating mean values for the extracted time period using:

using ESDL
using Dates
using DelimitedFiles
using Statistics

c = Cube("/home/jovyan/work/datacube/ESDCv2.0.0/esdc-8d-0.083deg-184x270x270-2.0.0.zarr/")

d = subsetcube(c, Lon = (73, 105),
               Lat = (25, 40),
               time = (Date(2000,1,1), Date(2018,12,31)))

skipmax(x) = mean(skipmissing(x))

d2 = mapslices(mean, d, dims=("Time"))

I would like to drop all layers not containing any values (all missing values). While it is easy to subset the cube using for example:

d3 = d2[:,:,1:5]

I fail to build a subset of non-contiguous elements, for example:

d3 = d2[:,:,[1, 3, 4]]

I don’t understand how Julia lets you drop or subset variables from such a multidimensional cube. In R, for example, I could do this by running:

d3 = d2[,-c(3,5,8:10)]

with or without the “-”. All the Julia examples I have found so far only considered two-dimensional arrays and only dropped contiguous entries.

As I was mainly interested in dropping entries/layers that contain only missing values (of which there were a lot in my example!), I also tried this:

using Missings
d3 = dropmissing(d2)

However, this does not work either, apparently because the dropmissing function is not available in Julia 1.1.0.

As mentioned, with the code above, many of the ESDL variables seem to only have missing values (throughout the time series). Is this due to the high resolution of the data-cube? Sorry, maybe this is somewhere in the documentation but I did not find it with a quick search. Will search again tomorrow.

If the resolution is the problem, wouldn’t it be possible to just fill the datacube by resampling the data to a higher spatial resolution? Just keep the same data value for neighboring pixels but at least make this data available?

First of all, I think there is a bug in your code: you probably wanted to apply the skipmax function you defined, but instead you call d2 = mapslices(mean, d, dims=("Time")), which applies the plain mean function, and that returns missing if any element of the vector is missing.

A short way to write this would be d2 = mapslices(mean ∘ skipmissing, d, dims=("Time")), where ∘ is the function composition operator.

If you do this, there should be far fewer variables that contain only missing values.
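To make the difference concrete, here is a minimal sketch on a plain vector rather than on the cube. The vector x and the helper safemean are made up for illustration and are not part of ESDL; safemean only guards the case where a slice has no valid values at all.

```julia
using Statistics

# Hypothetical time series with one gap
x = [1.0, missing, 3.0]

# The plain mean propagates the missing value
m_plain = mean(x)

# Composing with skipmissing averages only the valid entries
m_skip = (mean ∘ skipmissing)(x)

# Illustrative helper (an assumption, not an ESDL function): return missing
# explicitly when every entry is missing, otherwise the mean of the valid ones
safemean(v) = all(ismissing, v) ? missing : mean(skipmissing(v))
```

Here m_plain is missing while m_skip is 2.0, and safemean([missing, missing]) gives missing.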

Regarding the actual question, you should be able to use the subsetcube function to select only the layers that contain data. The advantage is that you don’t have to read the data into memory, as happens when you do d2[:,:,1:5].

So, for example:

# To select by variable name
d3 = subsetcube(d2, variable=["gross_primary", "net_ecosystem"])
# To select by variable index
d3 = subsetcube(d2, variable=CartesianIndex.([1,3,5,9]))

However, Julia currently has no dedicated syntax for removing columns or rows from an array the way R's negative indices do, although introducing one has been discussed.
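For an ordinary in-memory Julia Array (not an ESDL cube, where this indexing does not work yet), a common workaround is to build the list of indices to keep with setdiff. The following is only a sketch; the array A and its sizes are invented for illustration.

```julia
# A plain 2×3×5 array standing in for an in-memory cube (hypothetical data)
A = reshape(collect(1:30), 2, 3, 5)

# Non-contiguous selection: index the third dimension with a vector
B = A[:, :, [1, 3, 4]]

# "Everything except layers 3 and 5", the analogue of R's negative indices:
# take all valid indices of dimension 3 and remove the unwanted ones
keep = setdiff(axes(A, 3), [3, 5])
C = A[:, :, keep]
```

With five layers, keep ends up as [1, 2, 4], so C holds layers 1, 2 and 4 of A.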

I will also fix the case where this does not work:

d3 = d2[:,:,[1, 3, 4]]

but you should be able to work around it using subsetcube for now.
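If the data has already been read into a regular Julia Array, the layers that contain only missing values can be found with a small check per layer. This is a sketch under that assumption, using invented data; how best to obtain such an array from a cube is not covered here.

```julia
# Hypothetical 2×2×3 array that starts out entirely missing
A = Array{Union{Missing, Float64}}(missing, 2, 2, 3)
A[:, :, 1] .= 1.0      # layer 1 is fully populated
A[1, 1, 3] = 2.0       # layer 3 has a single valid value; layer 2 stays empty

# For each layer, test whether at least one value is not missing
hasdata = [any(!ismissing, view(A, :, :, k)) for k in axes(A, 3)]

# Indices of the layers worth keeping, and the reduced array
keep = findall(hasdata)
A_kept = A[:, :, keep]
```

Here keep is [1, 3], so the all-missing layer 2 is dropped.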