Pandas Dataframe Apply Function To Multiple Columns And Output Multiple Columns
Solution 1:
Write your transform_func the following way:
- it should have one parameter - the current row,
- this function can read individual columns from the current row and make any use of them,
- the returned object should be a Series with:
  - values - whatever you want to return,
  - index - target column names.
Example: Assuming that all 3 columns are of string type, concatenate the A and B columns and append a suffix ('_xx') to C:
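def transform_func(row):
    # Read the columns needed from the current row
    a, b, c = row.A, row.B, row.C
    # The index of the returned Series supplies the target column names
    return pd.Series([a + b, c + '_xx'], index=['new_A', 'new_B'])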
To get only the new values, apply this function to each row:
df.apply(transform_func, axis=1)
Note that the resulting DataFrame retains the index (row labels) of the original rows (we will make use of this feature in a moment).
Or if you want to add these new columns to your DataFrame, join your df with the result of the above application, saving the join result under the original df:
df = df.join(df.apply(transform_func, axis=1))
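Put together, the steps above can be sketched as a self-contained script. The sample DataFrame here is hypothetical (the answer does not show its data); only the A, B and C column names come from the example.

```python
import pandas as pd

# Hypothetical sample data: three string columns named as in the example.
df = pd.DataFrame({"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]})

def transform_func(row):
    # Read the columns needed from the current row.
    a, b, c = row.A, row.B, row.C
    # The Series index becomes the column names of the applied result.
    return pd.Series([a + b, c + "_xx"], index=["new_A", "new_B"])

# One output row per input row, carrying the same row labels as df.
new_cols = df.apply(transform_func, axis=1)

# join aligns on the row labels, so the new columns line up with their source rows.
result = df.join(new_cols)
print(result)
```

Because the join aligns on row labels, this should also work when df has a non-default index, provided the labels are unique.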
Edit following the comment as of 03:36:34Z
Using zip is probably the slowest option. A row-based function should be quicker and is a more intuitive construction. Probably the quickest way is to write 2 vectorized expressions, one for each column. In this case something like:
df['new_A'] = df.A + df.B
df['new_B'] = df.C + '_xx'
But generally the question is whether a row-based function can be expressed as vectorized expressions (as I did above). If it cannot, you can fall back to applying a row-based function.
To compare how fast each solution is, use %timeit.
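%timeit is an IPython/Jupyter magic, so it is not available in a plain script. A rough equivalent using the standard-library timeit module might look like the sketch below. The data size, the repeat count and the helper names row_based and vectorized are arbitrary choices for illustration, and the measured times will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: 1000 rows of short strings.
n = 1000
df = pd.DataFrame({"A": ["a"] * n, "B": ["b"] * n, "C": ["c"] * n})

def transform_func(row):
    return pd.Series([row.A + row.B, row.C + "_xx"], index=["new_A", "new_B"])

def row_based():
    # Calls transform_func once per row.
    return df.apply(transform_func, axis=1)

def vectorized():
    # Operates on whole columns at once.
    return pd.DataFrame({"new_A": df.A + df.B, "new_B": df.C + "_xx"})

# Total seconds for 3 calls of each variant.
t_row = timeit.timeit(row_based, number=3)
t_vec = timeit.timeit(vectorized, number=3)
print(f"row-based:  {t_row:.4f} s")
print(f"vectorized: {t_vec:.4f} s")
```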
Solution 2:
The question seems somewhat related to this question. I referenced the comment made by @spen.smith on this answer in coming up with this.
df = pd.DataFrame([[1,2,3], [2,3,4], [3,5,7]], columns = ['A', 'B', 'C'])
print(df)
   A  B  C
0  1  2  3
1  2  3  4
2  3  5  7
Rather than modifying the function's return value, just write the function as usual:
def add_subtract(args):
    # args is one row holding the two selected columns
    arg1, arg2 = args
    ret1 = arg1 + arg2
    ret2 = arg1 - arg2
    return ret1, ret2
Examine the output of using apply. The option result_type='expand' returns the result as a DataFrame instead of as a Series of tuples.
print(df[['B', 'C']].apply(add_subtract, axis=1, result_type='expand'))
    0  1
0   5 -1
1   7 -1
2  12 -2
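To make the effect of result_type='expand' concrete, the following sketch runs apply both with and without the option on the same data. The variable names as_tuples and expanded are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 4], [3, 5, 7]], columns=["A", "B", "C"])

def add_subtract(args):
    arg1, arg2 = args
    return arg1 + arg2, arg1 - arg2

# Default behaviour: one tuple per row, collected in a Series.
as_tuples = df[["B", "C"]].apply(add_subtract, axis=1)

# With expand: each tuple element gets its own column (labelled 0 and 1).
expanded = df[["B", "C"]].apply(add_subtract, axis=1, result_type="expand")

print(type(as_tuples).__name__, as_tuples.shape)
print(type(expanded).__name__, expanded.shape)
```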
We can then assign the columns of the apply output to two new series by transposing followed by accessing the values. Transposing is necessary because the default behavior of calling values treats each row as a list, whereas we want each column as a list. So the final expression is:
df['D'], df['E'] = df[['B', 'C']].apply(add_subtract, axis=1, result_type='expand').transpose().values
print(df)
   A  B  C   D  E
0  1  2  3   5 -1
1  2  3  4   7 -1
2  3  5  7  12 -2
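As a possible alternative to transposing, the expanded result can be assigned to a list of new column names in a single step. This is a sketch, not part of the original answer: the D2 and E2 column names are hypothetical and exist only to compare both approaches side by side, and the behaviour is assumed for a reasonably recent pandas version.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 4], [3, 5, 7]], columns=["A", "B", "C"])

def add_subtract(args):
    arg1, arg2 = args
    return arg1 + arg2, arg1 - arg2

expanded = df[["B", "C"]].apply(add_subtract, axis=1, result_type="expand")

# Approach from the answer: transpose so each output column becomes one row
# of the array, then unpack the two rows into two new columns.
df["D"], df["E"] = expanded.transpose().values

# Alternative (assumption): assign the expanded DataFrame to a list of new
# column names; its columns are matched to the names by position.
df[["D2", "E2"]] = expanded

print(df)
```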