
Read String Representation Of 2d Array From Csv Column Into A 2d Numpy Array

I have a pandas dataframe, for which one of the columns holds 2D numpy arrays corresponding to pixel data from grayscale images. These 2D numpy arrays have the shape (480, 640) or

Solution 1:

Construct a csv with array strings:

In [385]: arr = np.empty(1, object)                                             
In [386]: arr[0]=np.arange(12).reshape(3,4)                                     
In [387]: S = pd.Series(arr,name='x')                                           
In [388]: S                                                                     
Out[388]: 
0    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Name: x, dtype: object
In [389]: S.to_csv('series.csv')                                                
/usr/local/bin/ipython3:1: FutureWarning: The signature of `Series.to_csv` was aligned to that of `DataFrame.to_csv`, and argument 'header' will change its default value from False to True: please pass an explicit value to suppress this warning.
  #!/usr/bin/python3
In [390]: cat series.csv                                                        
0,"[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]"

load it:

In [391]: df = pd.read_csv('series.csv',header=None)                            
In [392]: df                                                                    
Out[392]: 
   0                                                1
0  0  [[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]

In [394]: astr=df[1][0]                                                         
In [395]: astr                                                                  
Out[395]: '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'

parse the string representation of the array:

In [396]: astr.split('\n')
Out[396]: ['[[ 0  1  2  3]', ' [ 4  5  6  7]', ' [ 8  9 10 11]]']
In [398]: astr.replace('[','').replace(']','').split('\n')
Out[398]: [' 0  1  2  3', '  4  5  6  7', '  8  9 10 11']
In [399]: [i.split() for i in _]
Out[399]: [['0', '1', '2', '3'], ['4', '5', '6', '7'], ['8', '9', '10', '11']]
In [400]: np.array(_, int)
Out[400]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

No guarantee that this is the prettiest or cleanest parsing, but it gives an idea of the work you have to do. I'm reinventing the wheel, but searching for a duplicate was taking too long.
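The parsing steps above can be collected into one small function. This is only a sketch: the name parse_array_string is made up for illustration, and it assumes a small integer array whose rows each fit on one line of the string.

```python
import numpy as np

def parse_array_string(text, dtype=int):
    """Parse the str() form of a small 2D array, e.g. '[[0 1]\n [2 3]]'.

    Hypothetical helper, not part of numpy or pandas.
    """
    # Drop the brackets, then treat each remaining line as one row.
    rows = text.replace('[', '').replace(']', '').split('\n')
    # Split each row on whitespace and let numpy convert the tokens.
    return np.array([row.split() for row in rows], dtype=dtype)

astr = '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'
arr = parse_array_string(astr)
print(arr.shape)
```

To apply it to a whole column, something like [parse_array_string(s) for s in df[1]] should do. Be aware that with numpy's default print options, arrays of more than 1000 elements are abbreviated with '...' and long rows are wrapped across several lines, so for images of shape (480, 640) the text written to the csv no longer holds all the values and this kind of parsing cannot recover them.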

If possible try to avoid saving such a dataframe as csv. csv format is meant for a clean 2d table, simple consistent columns separated by a delimiter.

And for the most part avoid dataframes/series like this. A Series can have object dtype. And each object element can be complex, such as a list, dictionary, or array. But I don't think pandas has special functions to handle those cases.

numpy also has object dtypes (as my arr), but a list is often just as good, if not better. Constructing such an array can be tricky. Math on such an array is hit or miss. Iteration on an object array is slower than iteration on a list.
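If the file does not have to be a csv, one alternative (not something suggested above, just an option to consider) is to pickle the Series, which keeps the array elements as arrays. A minimal sketch, assuming the same small Series as in the session above and a temporary file path chosen only for the demo:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same object Series as in the csv example.
arr = np.empty(1, object)
arr[0] = np.arange(12).reshape(3, 4)
S = pd.Series(arr, name='x')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'series.pkl')  # demo path, not from the original post
    S.to_pickle(path)
    restored = pd.read_pickle(path)

first = restored[0]
print(type(first), first.shape)
```

Pickle files are Python specific and should only be loaded from trusted sources, so this is a trade-off rather than a drop-in replacement for csv.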

===

re might work as well. For example replacing whitespace with comma:

In [408]: re.sub(r'\s+',',',astr)
Out[408]: '[[,0,1,2,3],[,4,5,6,7],[,8,9,10,11]]'

Still not quite right. There are leading commas that will choke eval.
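One way to get rid of the leading commas might be to strip the whitespace that follows each opening bracket before inserting the commas. The sketch below assumes the same small integer array string and uses ast.literal_eval instead of eval:

```python
import ast
import re

import numpy as np

astr = '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'

# Remove whitespace directly after '[' so no leading comma is produced.
no_lead = re.sub(r'\[\s+', '[', astr)
# Every remaining whitespace run separates two values or two rows.
with_commas = re.sub(r'\s+', ',', no_lead)
# literal_eval only accepts Python literals, which is safer than eval.
arr = np.array(ast.literal_eval(with_commas))
print(with_commas)
```

This still depends on the string holding every value, so it has the same limits as the split-based parsing.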

Solution 2:

data = pd.read_csv('new_dataset.csv')

Method1: data.values

Method2: data.to_numpy()

If data is a 2D DataFrame, either of the two methods above returns a 2D numpy array. Give it a try!


Here is a demo:

df = pd.DataFrame(data={"A": [np.random.randn(480, 640), np.random.randn(490, 640)], "B": np.arange(5, 7)})

print(type(df.to_numpy()[0, 0]))  # <class 'numpy.ndarray'>
print(df.to_numpy()[0, 0].shape)  # (480, 640)
print(type(df.to_numpy()[1, 0]))  # <class 'numpy.ndarray'>
print(df.to_numpy()[1, 0].shape)  # (490, 640)

I'm going to work in a while, you can try it first, and ask again if you have any questions.
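Note that the demo builds its DataFrame in memory. A quick check, assuming a small 3x4 array and an in-memory buffer in place of a real file, suggests that after a csv round trip the cell comes back as a string, so to_numpy() on its own would probably not recover the array:

```python
import io

import numpy as np
import pandas as pd

# An object column holding one 2D array, built as in Solution 1.
arr = np.empty(1, object)
arr[0] = np.arange(12).reshape(3, 4)
df = pd.Series(arr, name="A").to_frame()

in_memory_cell = df.to_numpy()[0, 0]

# Write to csv and read back without touching the disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
csv_cell = pd.read_csv(buffer).to_numpy()[0, 0]

print(type(in_memory_cell))
print(type(csv_cell))
```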

Solution 3:

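Add two columns to the data dataframe: the grayscale image converted to bytes with the array's tobytes() method (called tostring() in older NumPy, where it is now deprecated), and the original shape.

import cv2

grayscale_images = []
grayscale_shapes = []

for index, row in data.iterrows():
    img_path = row['Image path']
    cv_image = cv2.imread(img_path)
    gray = grayscale(cv_image)  # grayscale() is a helper defined elsewhere (not shown here)
    grayscale_images.append(gray.tobytes())  # raw pixel bytes, flattened
    grayscale_shapes.append(gray.shape)  # needed to undo the flattening later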

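Read the CSV, then recover the 2D numpy array using `np.fromstring()` and restore the original shape.

# np.fromstring() defaults to float, so pass the dtype the image was saved with.
# For raw bytes, np.frombuffer() is the non-deprecated equivalent.
imagedata = np.fromstring(df.loc[...])   # index the image cell
imagedata.shape = df.loc[...]            # index the corresponding shape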
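Raw bytes do not survive a text csv unchanged, so one possible variation on this idea is to base64-encode them before saving. The following is a sketch under several assumptions: an 8-bit image (a small uint8 array stands in for it), made-up column names image_b64, height and width, and an in-memory buffer instead of a file.

```python
import base64
import io

import numpy as np
import pandas as pd

# Stand-in for a grayscale image; real data would come from cv2.
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Hypothetical column names, chosen for this example only.
df = pd.DataFrame({
    "image_b64": [base64.b64encode(gray.tobytes()).decode("ascii")],
    "height": [gray.shape[0]],
    "width": [gray.shape[1]],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer, dtype={"image_b64": str})

# Decode the text back to bytes, then rebuild the array with its shape.
raw = base64.b64decode(loaded.loc[0, "image_b64"])
shape = (int(loaded.loc[0, "height"]), int(loaded.loc[0, "width"]))
restored = np.frombuffer(raw, dtype=np.uint8).reshape(shape)
print(restored.shape, restored.dtype)
```

np.frombuffer returns a read-only view of the bytes; call .copy() on the result if the image needs to be modified.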
