
How To Use Numpy To Calculate Mean And Standard Deviation Of An Irregular Shaped Array

I have a numpy array that holds many samples of varying length: Samples = np.array([[1001, 1002, 1003], ... , [1001, 1002]]). I want to (

Solution 1:

Don't make ragged arrays. Just don't. NumPy can't do much with them, and any code you write for them will be unreliable and slow, because NumPy isn't designed to work that way. It stores them as an object-dtype array (and since NumPy 1.24 it refuses to create one at all unless you pass dtype=object explicitly):

Sample
array([[1, 2, 3], [1, 2]], dtype=object)

Almost no NumPy functions work on object arrays. In this case the objects are lists, which makes your code even more confusing: you either have to switch between list and ndarray methods, or stick to the NumPy functions that happen to be list-safe. This is a recipe for disaster, as anyone noodling around with the code later (even yourself, if you forget) will be dancing in a minefield.
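A minimal sketch of what "object dtype" means in practice, assuming NumPy 1.24 or later (where the explicit dtype=object is mandatory). Element-wise operations are handed to the Python lists stored in the array, so they follow list semantics rather than arithmetic:

```python
import numpy as np

# Ragged input: recent NumPy only builds this with an explicit dtype=object.
Sample = np.array([[1, 2, 3], [1, 2]], dtype=object)
print(Sample.shape, Sample.dtype)  # a 1-D array holding 2 Python lists

# Operations are delegated to the lists, so list semantics apply:
doubled = Sample * 2   # repeats each list rather than doubling its numbers
total = Sample.sum()   # concatenates the lists rather than adding numbers
print(doubled[0])      # [1, 2, 3, 1, 2, 3]
print(total)           # [1, 2, 3, 1, 2]

# Anything lists don't support fails outright.
try:
    Sample + 1
    add_failed = False
except TypeError as exc:
    add_failed = True
    print("Sample + 1 failed:", exc)
```

Nothing warns you in the first two cases; you simply get lists back where you expected numbers.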

There are two things you can do with your data to make things work better:

The first method is to index and flatten:

i = np.cumsum(np.array([len(x) for x in Sample]))
flat_sample = np.hstack(Sample)

This stores the end index of each sample in i, while keeping the data itself as a single 1D array.
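A self-contained sketch of the flatten-and-index step on small hypothetical data, showing how the end indices locate each sample. The per-sample mean/std part goes a step beyond the answer (which standardizes with the pooled statistics); it uses np.add.reduceat on the start offsets and assumes every sample is non-empty, since reduceat does not return zero for empty segments:

```python
import numpy as np

# Hypothetical sample data; a plain list of lists avoids object arrays entirely.
Sample = [[1, 2, 3], [1, 2]]

lengths = np.array([len(x) for x in Sample])  # samples have 3 and 2 values
i = np.cumsum(lengths)                         # end index of each sample: 3, 5
flat_sample = np.hstack(Sample)                # all values in one 1-D int array

# Sample k occupies flat_sample[start:i[k]], where start is the previous end (or 0).
starts = np.concatenate(([0], i[:-1]))         # start offsets: 0, 3
second = flat_sample[starts[1]:i[1]]           # the second sample, [1, 2]

# The same offsets give per-sample statistics without a Python loop.
sample_means = np.add.reduceat(flat_sample, starts) / lengths
sq_dev = (flat_sample - np.repeat(sample_means, lengths)) ** 2
sample_stds = np.sqrt(np.add.reduceat(sq_dev, starts) / lengths)
```

sample_stds uses the population standard deviation, the same convention as np.std's default ddof=0.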

The other method is to pad the shorter samples with np.nan up to the longest length and use NaN-safe functions:

m = np.array([len(x) for x in Sample]).max()
nan_sample = np.array([x + [np.nan] * (m - len(x)) for x in Sample])
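The padding line above relies on each sample being a Python list: x + [np.nan] * k is list concatenation. If the samples are NumPy arrays, + broadcasts element-wise addition instead, so the row comes out wrong or the operation raises. A sketch of an alternative (not the answer's own code) that fills a pre-allocated NaN array and works for either kind of sample:

```python
import numpy as np

# Hypothetical ragged data; each sample may be a list or a 1-D array.
Sample = [[1, 2, 3], np.array([1, 2])]

m = max(len(x) for x in Sample)                 # length of the longest sample
nan_sample = np.full((len(Sample), m), np.nan)  # float array pre-filled with NaN
for row, x in zip(nan_sample, Sample):
    row[:len(x)] = x                            # copy the real values; the tail stays NaN
```

The result is always a float array, since NaN has no integer representation.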

So to do your calculation, apply the same standardization to flat_sample:

new_flat_sample = (flat_sample - np.mean(flat_sample)) / np.std(flat_sample) 

Then use i to recreate your original array (or a list of arrays, which I recommend; see np.split):

new_list_sample = np.split(new_flat_sample, i[:-1])

[array([-1.06904497,  0.26726124,  1.60356745]),
 array([-1.06904497,  0.26726124])]
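The flatten, standardize and split steps can be wrapped into one reusable function. A minimal sketch; the name standardize_ragged is hypothetical, and it assumes at least one sample is given (np.hstack raises on an empty list):

```python
import numpy as np

def standardize_ragged(samples):
    """Hypothetical helper: standardize ragged samples with the pooled mean/std.

    Flattens the samples, standardizes the 1-D array, then splits it back
    into one array per sample using the cumulative end indices.
    """
    lengths = [len(x) for x in samples]
    flat = np.hstack(samples).astype(float)
    standardized = (flat - flat.mean()) / flat.std()
    ends = np.cumsum(lengths)
    # The last end index is the total length, so it is not a split point.
    return np.split(standardized, ends[:-1])

result = standardize_ragged([[1, 2, 3], [1, 2]])
for arr in result:
    print(arr)
```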

Or use nan_sample, but you'll need to replace np.mean and np.std with np.nanmean and np.nanstd:

new_nan_sample = (nan_sample - np.nanmean(nan_sample)) / np.nanstd(nan_sample)

array([[-1.06904497,  0.26726124,  1.60356745],
       [-1.06904497,  0.26726124,         nan]])
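A sketch of both pooled and per-sample standardization on the padded array. The per-sample variant (axis=1 with keepdims=True) goes beyond the answer, and dropping NaN to recover the rows assumes the real data contains no NaN of its own:

```python
import numpy as np

# NaN-padded version of the hypothetical data [[1, 2, 3], [1, 2]].
nan_sample = np.array([[1.0, 2.0, 3.0],
                       [1.0, 2.0, np.nan]])

# Pooled statistics: the NaN padding is ignored, matching the flattened approach.
new_nan_sample = (nan_sample - np.nanmean(nan_sample)) / np.nanstd(nan_sample)

# Per-sample statistics: reduce along axis=1, keeping the axis for broadcasting.
row_mean = np.nanmean(nan_sample, axis=1, keepdims=True)
row_std = np.nanstd(nan_sample, axis=1, keepdims=True)
per_row = (nan_sample - row_mean) / row_std

# Recover the unpadded rows by dropping the NaN entries.
rows = [r[~np.isnan(r)] for r in new_nan_sample]
```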

Solution 2:

@MichaelHackman (following up on your comment): that's odd, because when I compute the overall std and mean and then apply them, I get a different result (see code below).

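import numpy as np

# dtype=object is required for ragged input on NumPy >= 1.24
Samples = np.array([[1, 2, 3],
                    [1, 2]], dtype=object)
c = np.hstack(Samples)  # gives [1, 2, 3, 1, 2]
mean, std = np.mean(c), np.std(c)
# dtype=object keeps one array per sample (the rows have different lengths)
newSamples = np.asarray([(np.array(xi) - mean) / std for xi in Samples], dtype=object)
print(newSamples)
# one array per sample:
# [-1.06904497, 0.26726124, 1.60356745] and [-1.06904497, 0.26726124]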

Edit: added np.asarray() and moved the mean/std computation outside the loop, following Imanol Luengo's excellent comments (thanks!).
