
Pandas - Fast Way Of Accessing A Column Of Objects' Attribute

Let's say I have a custom class in python, that has the attribute val. If I have a pandas dataframe with a column of these objects, how can I access this attribute and make a new c

Solution 1:

You could use a list comprehension:

df['custom_val'] = [foo.val for foo in df['custom_object']]
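As a self-contained illustration of the line above, here is a minimal sketch. The `Foo` class is a hypothetical stand-in for the custom class from the question; only its `val` attribute matters.

```python
import pandas as pd


class Foo:
    """Hypothetical stand-in for the custom class; only `val` is used."""

    def __init__(self, val):
        self.val = val


# A small frame holding one column of Foo objects.
df = pd.DataFrame({"custom_object": [Foo(1.5), Foo(-0.25), Foo(3.0)]})

# Pull the attribute out of each object with a plain Python loop
# and assign the resulting list as a new column.
df["custom_val"] = [foo.val for foo in df["custom_object"]]

print(df["custom_val"].tolist())  # [1.5, -0.25, 3.0]
```

The list is assigned positionally, so it must have the same length as the DataFrame.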

Timings

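# Set-up 100k Foo objects.
import numpy as np
import pandas as pd

# Foo is not shown in the original; a minimal class with a `val` attribute is assumed.
class Foo:
    def __init__(self, val):
        self.val = val

vals = [np.random.randn() for _ in range(100000)]
foos = [Foo(val) for val in vals]
df = pd.DataFrame(foos, columns=['custom_object'])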

# 1) OP's apply method.
%timeit df['custom_object'].apply(lambda x: x.val)
# 10 loops, best of 3: 26.7 ms per loop

# 2) Using a list comprehension instead.
%timeit [foo.val for foo in df['custom_object']]
# 100 loops, best of 3: 11.7 ms per loop

# 3) For reference, with the original list of objects (slightly faster than 2 above).
%timeit [foo.val for foo in foos]
# 100 loops, best of 3: 9.79 ms per loop

# 4) And just on the original list of raw values themselves.
%timeit [val for val in vals]
# 100 loops, best of 3: 4.91 ms per loop

If you had the original list of values, you could just assign them directly:

# 5) Direct assignment to list of values.
%timeit df['v'] = vals
# 100 loops, best of 3: 5.88 ms per loop

Solution 2:

Setup code:

import operator
import random
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class SomeObj:
    val: int


df = pd.DataFrame(data={"col_1": [SomeObj(random.randint(0, 10000)) for _ in range(10000000)]})

Solution 1

df['col_1'].map(lambda elem: elem.val)

Time: ~3.2 seconds

Solution 2

df['col_1'].map(operator.attrgetter('val'))

Time: ~2.7 seconds
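For readers unfamiliar with `operator.attrgetter`, the following sketch (on a tiny, made-up three-row frame rather than the 10-million-row setup above) shows that it builds a callable that behaves like the lambda from the first variant:

```python
import operator
from dataclasses import dataclass

import pandas as pd


@dataclass
class SomeObj:
    val: int


# Tiny illustrative frame; the values are arbitrary.
df = pd.DataFrame({"col_1": [SomeObj(4), SomeObj(8), SomeObj(15)]})

# attrgetter("val") returns a callable equivalent to: lambda elem: elem.val
get_val = operator.attrgetter("val")

via_lambda = df["col_1"].map(lambda elem: elem.val)
via_getter = df["col_1"].map(get_val)

print(via_getter.tolist())            # [4, 8, 15]
print(via_lambda.equals(via_getter))  # True
```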

Solution 3

[elem.val for elem in df['col_1']]

Time: ~1.4 seconds

Note: Keep in mind that this solution returns a plain Python list rather than a pandas Series, so the DataFrame's index is not carried over. This may be an issue in certain situations.
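If a Series is needed, one possible workaround is to wrap the list and reuse the DataFrame's index. This is a sketch rather than part of the original answer; the small frame, its string index, and the name `"val"` given to the Series are made up for illustration.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class SomeObj:
    val: int


# A non-default index makes the difference between the two results visible.
df = pd.DataFrame(
    {"col_1": [SomeObj(3), SomeObj(7), SomeObj(11)]},
    index=["a", "b", "c"],
)

# The list comprehension yields a plain list with no index attached.
as_list = [elem.val for elem in df["col_1"]]

# Wrapping it in a Series with df.index restores the row labels.
as_series = pd.Series(as_list, index=df.index, name="val")

print(type(as_list).__name__)     # list
print(as_series.index.tolist())   # ['a', 'b', 'c']
```

Building the Series adds some overhead on top of the list comprehension, so the timing above would not apply unchanged.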

