How To Use Numpy To Calculate Mean And Standard Deviation Of An Irregular Shaped Array
Solution 1:
Don't make ragged arrays. Just don't. Numpy can't do much with them, and any code you might make for them will always be unreliable and slow because numpy doesn't work that way. It turns them into object dtypes (and recent versions refuse to build them at all unless you pass dtype=object explicitly):
Sample
array([[1, 2, 3], [1, 2]], dtype=object)
Which almost no numpy functions work on. In this case those objects are list objects, which makes your code even more confusing as you either have to switch between list and ndarray methods, or stick to list-safe numpy methods. This is a recipe for disaster, as anyone noodling around with the code later (even yourself, if you forget) will be dancing in a minefield.
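To see the problem concretely, here is a small sketch (not part of the original answer). It assumes a recent NumPy, where a ragged nested list has to be passed with an explicit dtype=object; older releases did the conversion silently or with a deprecation warning.

```python
import numpy as np

# Ragged input: rows of length 3 and 2. dtype=object is passed explicitly
# because recent NumPy versions refuse to guess it for ragged input.
Sample = np.array([[1, 2, 3], [1, 2]], dtype=object)

print(Sample.shape)      # (2,): one dimension holding two Python lists
print(Sample.dtype)      # object
print(type(Sample[0]))   # <class 'list'>

# Element-wise arithmetic is delegated to the list objects, which do not
# support it, so even a simple subtraction fails.
try:
    Sample - 1
    arithmetic_works = True
except TypeError:
    arithmetic_works = False

print(arithmetic_works)  # False
```

The array is really a 1D container of lists, which is why the rest of this answer moves the data into a regular numeric layout first.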
There are two things you can do with your data to make things work better:
The first method is to index and flatten:
i = np.cumsum(np.array([len(x) for x in Sample]))
flat_sample = np.hstack(Sample)
This preserves the index of the end of each sample in i, while keeping the sample as a 1D array.
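As a rough illustration of what the end indices buy you, the sketch below computes a mean per sample directly on the flat array. The names lengths, starts and per_sample_mean are made up for this example, and it assumes every sample is non-empty (np.add.reduceat does not handle empty segments the way you might expect).

```python
import numpy as np

Sample = [[1, 2, 3], [1, 2]]  # plain ragged list of lists

lengths = np.array([len(x) for x in Sample])
i = np.cumsum(lengths)            # end index of each sample: 3 and 5
flat_sample = np.hstack(Sample)   # the values 1, 2, 3, 1, 2 in one 1D array

# Start index of each sample (hypothetical helper, not in the answer).
starts = np.concatenate(([0], i[:-1]))   # 0 and 3

# Sum each segment in one vectorised call, then divide by its length.
per_sample_mean = np.add.reduceat(flat_sample, starts) / lengths
print(per_sample_mean)   # per-sample means: 2.0 and 1.5
```

Everything here runs on ordinary numeric arrays, so no object dtype is involved.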
The other method is to pad one dimension with np.nan and use nan-safe functions:
m = np.array([len(x) for x in Sample]).max()
nan_sample = np.array([x + [np.nan] * (m - len(x)) for x in Sample])
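A short sketch of why the nan-safe functions matter once the data is padded. The per-row statistics with axis=1 are an addition for illustration, not something the answer itself computes; note also that padding with np.nan forces a float dtype.

```python
import numpy as np

Sample = [[1, 2, 3], [1, 2]]

m = max(len(x) for x in Sample)
nan_sample = np.array([x + [np.nan] * (m - len(x)) for x in Sample])

print(nan_sample.shape)  # (2, 3)
print(nan_sample.dtype)  # float64

# The plain mean lets the padding poison the short row...
row_mean_plain = np.mean(nan_sample, axis=1)   # second entry is nan

# ...while the nan-aware versions skip the padded cells.
row_mean = np.nanmean(nan_sample, axis=1)      # 2.0 and 1.5
row_std = np.nanstd(nan_sample, axis=1)
print(row_mean_plain, row_mean, row_std)
```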
So to do your calculations, you can use flat_sample and apply the usual standardization:
new_flat_sample = (flat_sample - np.mean(flat_sample)) / np.std(flat_sample)
and use i to recreate your original array (or list of arrays, which I recommend; see np.split):
new_list_sample = np.split(new_flat_sample, i[:-1])
[array([-1.06904497, 0.26726124, 1.60356745]),
array([-1.06904497, 0.26726124])]
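The flatten, standardize and split steps can be bundled into one small helper. This is only a sketch: the function name standardize_ragged is hypothetical, and it assumes non-empty 1D samples whose overall standard deviation is not zero.

```python
import numpy as np

def standardize_ragged(samples):
    """Standardize a list of 1D sequences with the global mean and std."""
    ends = np.cumsum([len(x) for x in samples])
    flat = np.hstack(samples).astype(float)
    flat = (flat - flat.mean()) / flat.std()
    # np.split cuts before each index, so the final end index is dropped.
    return np.split(flat, ends[:-1])

result = standardize_ragged([[1, 2, 3], [1, 2]])
for row in result:
    print(row)
```

Returning a list of arrays rather than a ragged ndarray keeps the object dtype out of the picture entirely.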
Or use nan_sample, but you will need to replace np.mean and np.std with np.nanmean and np.nanstd:
new_nan_sample = (nan_sample - np.nanmean(nan_sample)) / np.nanstd(nan_sample)
array([[-1.06904497, 0.26726124, 1.60356745],
[-1.06904497, 0.26726124, nan]])
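If you need the ragged shape back after working on the padded array, one possible approach is to strip the nan cells row by row. The sketch below assumes the real data contains no nan values of its own, since those would be dropped too; the name unpadded is made up for the example.

```python
import numpy as np

Sample = [[1, 2, 3], [1, 2]]
m = max(len(x) for x in Sample)
nan_sample = np.array([x + [np.nan] * (m - len(x)) for x in Sample])

new_nan_sample = (nan_sample - np.nanmean(nan_sample)) / np.nanstd(nan_sample)

# Drop the padding again to get back a list of 1D arrays.
unpadded = [row[~np.isnan(row)] for row in new_nan_sample]
for row in unpadded:
    print(row)
```

The values match what the flattened route produces, because np.nanmean and np.nanstd see exactly the same five numbers.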
Solution 2:
@MichaelHackman (following the comment remark). That's weird, because when I compute the overall std and mean and then apply them, I obtain a different result (see code below).
import numpy as np
# Recent NumPy needs dtype=object to build a ragged array
Samples = np.array([[1, 2, 3],
                    [1, 2]], dtype=object)
c = np.hstack(Samples)  # gives [1, 2, 3, 1, 2]
mean, std = np.mean(c), np.std(c)
newSamples = np.asarray([(np.array(xi) - mean) / std for xi in Samples], dtype=object)
print(newSamples)
# two arrays: [-1.06904497, 0.26726124, 1.60356745] and [-1.06904497, 0.26726124]
edit: Add np.asarray(), put mean, std computation outside loop following Imanol Luengo's excellent comments (Thanks!)