Reshape A Dask Array (obtained From A Dask Dataframe Column)
I am new to dask and am trying to figure out how to reshape a dask array that I've obtained from a single column of a dask dataframe and am running into errors. Wondering if anyone
Solution 1:
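Passing lengths=True makes Dask compute the partition lengths up front, so the resulting array has known chunks and can be reshaped: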
ddf['x'].to_dask_array(lengths=True).reshape([-1,1])
Solution 2:
Unfortunately, the length of a dataframe and of its partitions is generally lazy in Dask, and is only computed on explicit request. That means the array obtained from it doesn't know its length or chunk sizes either, so you can't reshape it. The following clunky code gets around this, but I feel there should be a simpler way.
Find the chunks:
chunks = tuple(ddf['x'].map_partitions(len).compute())
size = sum(chunks)
Create a new array object with the now-known chunks and size:
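import dask.array as da
a = ddf['x'].values
# dtype and shape are passed by keyword; in recent Dask versions the fifth positional argument is meta, not shape
arr = da.Array(a.dask, a.name, (chunks,), dtype=a.dtype, shape=(size,))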