When Should I Use Dt.column Vs Dt['column'] Pandas?

I was doing some calculations and row manipulations and realised that for some tasks, such as mathematical operations, both forms worked, e.g. d['c3'] = d.c1 / d.c2 and d['c3'] = d['c1']

Solution 1:

You should really just stop accessing columns as attributes and get into the habit of using square brackets []. This avoids errors when a column name is not a valid Python identifier (illegal characters, embedded spaces) or shares its name with a built-in DataFrame method, and it avoids ambiguity when, for instance, you have a column named index:

In[13]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,4), columns=[' a', 'mean', 'index', '2'])
df.columns.tolist()

Out[13]: [' a', 'mean', 'index', '2']

So if we now try to access column '2':

In[14]:
df.2
  File "<ipython-input-14-0490d6ae2ca0>", line 1
    df.2
       ^
SyntaxError: invalid syntax

It fails because 2 is not a valid Python attribute name, but df['2'] would work.
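As a minimal sketch of the bracket form, assuming the same column names as above (the frame is rebuilt here so the snippet runs on its own, and col_two is just an illustrative variable name):

```python
import numpy as np
import pandas as pd

# Same layout as the example frame: the last column is named with the string '2'
df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# Bracket access takes the label as an ordinary string, so any name is allowed
col_two = df['2']
print(type(col_two).__name__)
print(col_two.name)
```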

In[15]:

df.a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-b9872a8755ac> in <module>()
----> 1 df.a

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   3079             if name in self._info_axis:
   3080                 return self[name]
-> 3081             return object.__getattribute__(self, name)
   3082 
   3083     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'a'

So because this is really ' a' with a leading space, it fails with an AttributeError rather than returning the column (attribute access would also fail with a space anywhere in the column name).
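A small sketch of the same situation, assuming the column layout used above (the frame is rebuilt so the snippet is self-contained, and spaced is an illustrative name). The exact label, space included, works with brackets, while no attribute called a exists:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# The label must match exactly, including the leading space
spaced = df[' a']
print(spaced.name == ' a')

# There is no column named 'a', so attribute lookup finds nothing
print(hasattr(df, 'a'))
```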

In[16]:
df.mean

Out[16]: 
<bound method DataFrame.mean of           a      mean     index         2
0 -0.022122  1.858308  1.823314  0.238105
1 -0.461662  0.482116  1.848322  1.946922
2  0.615889 -0.285043  0.201804 -0.656065
3  0.159351 -1.151883 -1.858024  0.088460
4  1.066735  1.015585  0.586550 -1.898469>

This is more subtle: it looks like it did something, but in fact it just returns the bound method DataFrame.mean rather than the 'mean' column. IPython is simply printing the method's repr, which includes the whole DataFrame.
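To make the difference concrete, here is a short sketch under the same assumptions as the example frame (rebuilt here so it runs standalone; mean_col and col_means are illustrative names). The attribute is the method, and only the bracket form reaches the column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# Attribute access resolves to the DataFrame.mean method, not the column
print(callable(df.mean))

# Bracket access returns the column that happens to be called 'mean'
mean_col = df['mean']
print(type(mean_col).__name__)

# Calling the method still works and gives one mean per column
col_means = df.mean()
print(col_means.shape)
```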

In[17]:
df.index

Out[17]: RangeIndex(start=0, stop=5, step=1)

Here the intention is ambiguous: because index is an attribute of the DataFrame, that is what is returned instead of the column 'index'.
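A brief sketch of the two lookups side by side, again assuming the example column names (the frame is rebuilt so the snippet stands alone; row_labels and index_col are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# Attribute access gives the row labels of the frame
row_labels = df.index
print(type(row_labels).__name__)

# Bracket access gives the data column named 'index'
index_col = df['index']
print(type(index_col).__name__)
```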

So you should stop accessing columns as attributes and always use square brackets, as this avoids all of the problems above.
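If you want to see which columns in a frame are affected, something like the following rough check could help. Note that attribute_unsafe_columns is a hypothetical helper written for this illustration, not a pandas function, and the extra 'c1' column is only there to show a name that passes:

```python
import keyword

import pandas as pd


def attribute_unsafe_columns(df):
    """Hypothetical helper: list columns that df.<name> cannot reliably reach."""
    unsafe = []
    for name in df.columns:
        # The label must be a string that is a legal, non-keyword Python identifier
        is_identifier = (
            isinstance(name, str)
            and name.isidentifier()
            and not keyword.iskeyword(name)
        )
        # A legal identifier can still be shadowed by an existing DataFrame attribute
        if not is_identifier or hasattr(pd.DataFrame, name):
            unsafe.append(name)
    return unsafe


df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0]],
    columns=[' a', 'mean', 'index', '2', 'c1'],
)
unsafe = attribute_unsafe_columns(df)
print(unsafe)
```

With these column names, only 'c1' is left out of the returned list. Bracket access works for every column either way, so this is just a diagnostic.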
