Read String Representation Of 2d Array From Csv Column Into A 2d Numpy Array
Solution 1:
Construct a csv with array strings:
In [385]: arr = np.empty(1, object)
In [386]: arr[0]=np.arange(12).reshape(3,4)
In [387]: S = pd.Series(arr,name='x')
In [388]: S
Out[388]:
0 [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Name: x, dtype: object
In [389]: S.to_csv('series.csv')
/usr/local/bin/ipython3:1: FutureWarning: The signature of `Series.to_csv` was aligned to that of `DataFrame.to_csv`, and argument 'header' will change its default value from False to True: please pass an explicit value to suppress this warning.
#!/usr/bin/python3
In [390]: cat series.csv
0,"[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]"
load it:
In [391]: df = pd.read_csv('series.csv',header=None)
In [392]: df
Out[392]:
   0                                                 1
0  0  [[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]
In [394]: astr=df[1][0]
In [395]: astr
Out[395]: '[[ 0 1 2 3]\n [ 4 5 6 7]\n [ 8 9 10 11]]'
parse the string representation of the array:
In [396]: astr.split('\n')
Out[396]: ['[[ 0  1  2  3]', ' [ 4  5  6  7]', ' [ 8  9 10 11]]']
In [398]: astr.replace('[','').replace(']','').split('\n')
Out[398]: [' 0  1  2  3', ' 4  5  6  7', ' 8  9 10 11']
In [399]: [i.split() for i in _]
Out[399]: [['0', '1', '2', '3'], ['4', '5', '6', '7'], ['8', '9', '10', '11']]
In [400]: np.array(_, int)
Out[400]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
No guarantee that that's the prettiest or cleanest parsing, but it gives an idea of the work you have to do. I'm reinventing the wheel, but searching for a duplicate was taking too long.
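As a minimal sketch, those steps could be collected into a helper (the function name is mine, and it assumes the string looks exactly like the numpy repr printed above):

import numpy as np

def parse_array_string(astr, dtype=int):
    # Strip the brackets, split on the embedded newlines, then split each
    # row on whitespace -- the same steps as the In [396]-In [400] session.
    rows = astr.replace('[', '').replace(']', '').split('\n')
    return np.array([row.split() for row in rows], dtype=dtype)

astr = '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'
print(parse_array_string(astr))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]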
If possible, try to avoid saving such a dataframe as csv. The csv format is meant for a clean 2d table: simple, consistent columns separated by a delimiter.
And for the most part, avoid dataframes/series like this. A Series can have object dtype, and each object element can be complex, such as a list, dictionary, or array. But I don't think pandas has special functions to handle those cases. numpy also has object dtypes (as my arr does), but a list is often just as good, if not better. Constructing such an array can be tricky. Math on such an array is hit or miss. Iteration on an object array is slower than iteration on a list.
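For what it's worth, a small sketch of why object arrays are awkward (the values are made up for illustration):

import numpy as np

# np.array([np.arange(4), np.arange(3)]) may refuse to build a ragged array,
# so allocate an object array first and assign the elements one by one.
a = np.empty(2, dtype=object)
a[0] = np.arange(4)
a[1] = np.arange(3)

print(a * 2)                  # math is delegated element by element (a Python-level loop)
print([x.sum() for x in a])   # iterating is no faster than iterating a plain list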
===
re might work as well. For example, replacing whitespace with a comma:
In [408]: re.sub('\s+',',',astr)
Out[408]: '[[,0,1,2,3],[,4,5,6,7],[,8,9,10,11]]'
Still not quite right. There are leading commas that will choke eval.
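One way to finish that approach, as a sketch: drop the comma that lands after each opening bracket, then hand the result to ast.literal_eval (a safer stand-in for eval):

import ast
import re
import numpy as np

astr = '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'

s = re.sub(r'\s+', ',', astr)   # '[[,0,1,2,3],[,4,5,6,7],[,8,9,10,11]]'
s = re.sub(r'\[,', '[', s)      # remove the leading commas that would choke eval
arr = np.array(ast.literal_eval(s))
print(arr)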
Solution 2:
data = pd.read_csv('new_dataset.csv')
Method 1: data.values
Method 2: data.to_numpy()
If data is a 2D DataFrame, then either of the above will give you a 2D numpy array. Have a try!
Here is a demo:
df = pd.DataFrame(data={"A": [np.random.randn(480, 640), np.random.randn(490, 640)], "B": np.arange(5, 7)})
print(type(df.to_numpy()[0, 0]))   # <class 'numpy.ndarray'>
print(df.to_numpy()[0, 0].shape)   # (480, 640)
print(type(df.to_numpy()[1, 0]))   # <class 'numpy.ndarray'>
print(df.to_numpy()[1, 0].shape)   # (490, 640)
I'm heading to work in a while; you can try it first and ask again if you have any questions.
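For reference, a minimal round trip with a plain numeric CSV (the file name follows the snippet above; header=None is my assumption that the file has no header row):

import numpy as np
import pandas as pd

data = pd.read_csv('new_dataset.csv', header=None)   # each CSV row is one row of the array

arr = data.to_numpy()     # same result as data.values
print(arr.shape)          # (n_rows, n_cols): a plain 2D numpy array
print(arr.dtype)          # a numeric dtype as long as every column is numeric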
Solution 3:
Add two columns to the data dataframe: the grayscale image converted to bytes using ndarray.tostring(), and the original shape.
grayscale_images = []
grayscale_shapes = []
for index, row in data.iterrows():
    img_path = row['Image path']
    cv_image = cv2.imread(img_path)
    gray = grayscale(cv_image)
    grayscale_images.append(gray.tostring())
    grayscale_shapes.append(gray.shape)
Read the CSV, then recover the 2d np array using np.fromstring() and reset the correct shape.
imagedata = np.fromstring(df.loc[...])   # index the image cell
imagedata.shape = df.loc[...]            # index the corresponding shape
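A minimal sketch of the byte round trip itself, leaving the CSV bookkeeping aside; note that tostring()/np.fromstring() are deprecated in current numpy in favour of tobytes()/np.frombuffer():

import numpy as np

gray = np.arange(12, dtype=np.uint8).reshape(3, 4)   # stand-in for a grayscale image

raw = gray.tobytes()      # what gray.tostring() returns (tostring is an alias)
shape = gray.shape        # must be stored alongside the bytes

restored = np.frombuffer(raw, dtype=np.uint8).reshape(shape)
print(np.array_equal(restored, gray))   # True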