Iterate Through Df Rows Faster
Solution 1:
It's hard to say exactly what you're trying to do. However, if you're looping through rows, chances are that there is a better way to do it.
For example, given a csv file that looks like this:
Event_Start_Time,TPRev,Subtest
4/12/19 06:00,"this. string. has dots.. in it.",{'A_Dict':'maybe?'}
6/10/19 04:27,"another stri.ng wi.th d.ots.",{'A_Dict':'aVal'}
You may want to:
- Format Event_Start_Time as datetime.
- Get the week number from Event_Start_Time.
- Remove all the dots (.) from the strings in column TPRev.
- Expand a dictionary contained in Subtest to its own column.
Instead of looping through the rows, consider doing things by column. It is like applying an operation to the first 'cell' of a column and having it replicate all the way down.
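As a rough illustration of the difference, here is a minimal sketch that removes the dots from TPRev both ways. The two-row frame is built by hand from the sample strings, and the names `looped` and `vectorized` are made up for this example. Both approaches give the same values, but the column-wise call avoids a Python-level loop.

```python
import pandas as pd

# Hand-built frame reusing the TPRev strings from the sample data
df = pd.DataFrame({"TPRev": ["this. string. has dots.. in it.",
                             "another stri.ng wi.th d.ots."]})

# Row by row: a Python-level loop that touches one cell at a time
looped = []
for _, row in df.iterrows():
    looped.append(row["TPRev"].replace(".", ""))

# Column-wise: one call applied to the whole column
vectorized = df["TPRev"].str.replace(".", "", regex=False)

print(vectorized.tolist() == looped)
```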
Code:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
  Event_Start_Time                            TPRev              Subtest
0    4/12/19 06:00  this. string. has dots.. in it.  {'A_Dict':'maybe?'}
1    6/10/19 04:27     another stri.ng wi.th d.ots.    {'A_Dict':'aVal'}
# format 'Event_Start_Time' as datetime
df['Event_Start_Time'] = pd.to_datetime(df['Event_Start_Time'], format='%d/%m/%y %H:%M')
# get the week number from 'Event_Start_Time'
df['Week_Number'] = df['Event_Start_Time'].dt.isocalendar().week
# replace all '.' (periods) in the 'TPRev' column
df['TPRev'] = df['TPRev'].str.replace('.', '', regex=False)
# parse the dictionary string in column 'Subtest' and expand its keys into new columns
# (eval runs arbitrary code, so only use it on data you trust)
df = pd.concat([df.drop(['Subtest'], axis=1), df['Subtest'].map(eval).apply(pd.Series)], axis=1)
print(df)
     Event_Start_Time                       TPRev  Week_Number A_Dict
0 2019-12-04 06:00:00  this string has dots in it           49 maybe?
1 2019-10-06 04:27:00    another string with dots           40   aVal
# df.info() prints its report itself and returns None, so no print() is needed
df.info()
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Event_Start_Time  2 non-null      datetime64[ns]
 1   TPRev             2 non-null      object
 2   Week_Number       2 non-null      UInt32
 3   A_Dict            2 non-null      object
dtypes: UInt32(1), datetime64[ns](1), object(2)
So you end up with a dataframe like this...
     Event_Start_Time                       TPRev  Week_Number A_Dict
0 2019-12-04 06:00:00  this string has dots in it           49 maybe?
1 2019-10-06 04:27:00    another string with dots           40   aVal
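The `eval` call used to parse Subtest will execute whatever the string contains. If the CSV is not fully trusted, one possible alternative is `ast.literal_eval` from the standard library, which only accepts Python literals. The sketch below assumes every cell in Subtest holds a valid dict literal. The frame is built by hand, and the names `parsed`, `expanded` and `result` are made up for this example.

```python
import ast
import pandas as pd

# Hand-built frame with the same Subtest strings as the sample data
df = pd.DataFrame({
    "TPRev": ["this string has dots in it", "another string with dots"],
    "Subtest": ["{'A_Dict':'maybe?'}", "{'A_Dict':'aVal'}"],
})

# literal_eval parses Python literals only; it does not execute code
parsed = df["Subtest"].map(ast.literal_eval)

# Build the new columns from the list of dicts, keeping the original index
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

# Swap the raw string column for the expanded columns
result = pd.concat([df.drop(columns="Subtest"), expanded], axis=1)
print(result)
```

Building the frame from a list of dicts also avoids calling `pd.Series` once per row, which can matter on larger data.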
Obviously you'll probably want to do other things. Look at your data. Make a list of what you want to do to each column or what new columns you need. Don't worry about how just yet, as chances are it's possible and has been done before; you just need to find the existing method.
You might write down something like "get the difference in days between the current row and the row beneath". Finally, search for how to do the formatting or calculation you require. Break the problem down.
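As one example of such a list item, the difference in days between the current row and the row beneath can be computed without a loop. This is a minimal sketch, assuming the rows should first be sorted by time. The column name `Days_To_Next` and the variable `next_start` are made up for illustration.

```python
import pandas as pd

# Hand-built frame with the two sample timestamps (day-first format)
df = pd.DataFrame({
    "Event_Start_Time": pd.to_datetime(
        ["4/12/19 06:00", "6/10/19 04:27"], format="%d/%m/%y %H:%M"
    )
})

# Sort so that "the row beneath" is always the later event
df = df.sort_values("Event_Start_Time").reset_index(drop=True)

# shift(-1) lines each row up with the row beneath it
next_start = df["Event_Start_Time"].shift(-1)

# Subtracting datetime columns gives timedeltas; .dt.days keeps the whole days
df["Days_To_Next"] = (next_start - df["Event_Start_Time"]).dt.days
print(df)
```

The last row has no row beneath it, so its value is missing (NaN).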