When Should I Use dt.column vs dt['column'] in pandas?
Solution 1:
You should really just stop accessing columns as attributes and get into the habit of accessing them with square brackets []. This avoids errors where your column names contain characters that are not valid in a Python identifier, such as embedded spaces or a leading digit, errors where a column name is the same as a built-in DataFrame method, and ambiguous usage where, for instance, you have a column named index:
In[13]:
df = pd.DataFrame(np.random.randn(5,4), columns=[' a', 'mean', 'index', '2'])
df.columns.tolist()
Out[13]: [' a', 'mean', 'index', '2']
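To make these failure modes concrete, here is a minimal sketch of a hypothetical helper, attr_access_ok (it is not part of pandas), that roughly checks whether a name could be reached with attribute syntax. It assumes a name must be a valid, non-keyword Python identifier and must not collide with anything already defined on the DataFrame class; pandas' real lookup has more special cases than this.

```python
import keyword

import numpy as np
import pandas as pd


def attr_access_ok(df: pd.DataFrame, name) -> bool:
    """Hypothetical helper (not a pandas API): rough check for whether
    df.<name> could ever resolve to a column called `name`."""
    return (
        isinstance(name, str)
        and name.isidentifier()          # rules out ' a' and '2'
        and not keyword.iskeyword(name)  # rules out names like 'class'
        and not hasattr(type(df), name)  # rules out 'mean', 'index', ...
    )


df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])
for col in df.columns:
    print(repr(col), attr_access_ok(df, col))
```

Every column in this example DataFrame fails the check, which is why each attempt below goes wrong in a different way.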
So if we now try to access column '2':
In[14]:
df.2
File "<ipython-input-14-0490d6ae2ca0>", line 1
df.2
^
SyntaxError: invalid syntax
It fails because 2 is not a valid Python identifier, but df['2'] would work.
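A minimal sketch of the bracket form, reusing the column names from the example above. The key is just a string, so a name like '2' that can never appear after a dot is still valid syntax here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# df.2 is a SyntaxError, but a string key inside square brackets is always valid syntax
col_2 = df['2']
print(type(col_2).__name__, col_2.name)
```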
In[15]:
df.a
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-b9872a8755ac> in <module>()
----> 1 df.a
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   3079             if name in self._info_axis:
   3080                 return self[name]
-> 3081             return object.__getattribute__(self, name)
   3082 
   3083     def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'a'
This fails because the column is really ' a', with a leading space (attribute access would also fail if there were spaces anywhere in the column name), so pandas raises an AttributeError.
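A short sketch contrasting the two forms for the ' a' column, using the same example DataFrame. Attribute syntax can only ever ask for 'a', while square brackets accept the exact name, whitespace included.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

raised = False
try:
    _ = df.a  # attribute syntax looks up 'a', but the column is named ' a'
except AttributeError as exc:
    raised = True
    print('AttributeError:', exc)

# Square brackets take the exact string, leading space included
leading_space = df[' a']
print(repr(leading_space.name))
```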
In[16]:
df.mean
Out[16]:
<bound method DataFrame.mean of a mean index 2
0 -0.022122 1.858308 1.823314 0.238105
1 -0.461662 0.482116 1.848322 1.946922
2 0.615889 -0.285043 0.201804 -0.656065
3 0.159351 -1.151883 -1.858024 0.088460
4 1.066735 1.015585 0.586550 -1.898469>
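This is more subtle: it looks like it did something, but in fact it just returns the bound method DataFrame.mean rather than the column named 'mean'. IPython is only pretty-printing the method's repr, which happens to include the whole DataFrame.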
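A minimal sketch of the difference, again with the example column names: the attribute gives back the method object, and only square brackets reach the column. Calling the method computes something else entirely (the mean of every column).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

method = df.mean     # the DataFrame.mean method bound to df; nothing is computed yet
column = df['mean']  # the column actually named 'mean'

print(callable(method))       # True
print(type(column).__name__)  # Series

# Calling the method returns one mean per column, not the 'mean' column
col_means = df.mean()
```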
In[17]:
df.index
Out[17]: RangeIndex(start=0, stop=5, step=1)
Here the intent is ambiguous: because index is already an attribute of every DataFrame, pandas returns the row index instead of the column named 'index'.
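A short sketch showing both objects side by side with the example DataFrame. The attribute always means the row labels, so square brackets are the only unambiguous way to get the column called 'index'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

row_labels = df.index    # the row index: RangeIndex(start=0, stop=5, step=1)
index_col = df['index']  # the column named 'index'

print(type(row_labels).__name__, type(index_col).__name__)
```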
So you should stop accessing columns as attributes and always use square brackets, because that avoids all of the problems above.