
How Can I Add Summary Rows To A Pandas DataFrame Calculated On Multiple Columns By Agg Functions Like Mean, Median, Etc

I have some data with multiple observations for a given Collector, Date, Sample, and Type where the observation values vary by ID. import StringIO import pandas as pd data = '''Co

Solution 1:

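You could pass aggfunc=[np.mean, np.median] to pivot_table to compute both the means and the medians. Adding margins=True also gives you an "All" row and an "All" column holding the means and medians for each column and for each row. The final .stack(level=0) moves the aggregation-function level from the columns into the row index, so each group gets one "mean" row and one "median" row.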

import numpy as np

result = df.pivot_table(index=["Collector", "Date", "Sample", "Type"],
    columns="ID", values="Value", margins=True,
    aggfunc=[np.mean, np.median]).stack(level=0)

yields

ID                                          A      B      C     D      All
Collector Date       Sample Type                                          
Emily     2014-06-20 201    HV   mean    34.0  22.00  10.00  5.00  17.7500
                                 median  34.0  22.00  10.00  5.00  16.0000
          2014-06-23 203    HV   mean    33.0  35.00  13.00  1.00  20.5000
                                 median  33.0  35.00  13.00  1.00  23.0000
John      2014-06-22 221    HV   mean    40.0  39.00  11.00  2.00  23.0000
                                 median  40.0  39.00  11.00  2.00  25.0000
          2014-07-01 218    HV   mean    35.0  29.00  13.00  1.00  19.5000
                                 median  35.0  29.00  13.00  1.00  21.0000
All                              mean    35.5  31.25  11.75  2.25  20.1875
                                 median  34.5  32.00  12.00  1.50  17.5000
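
A self-contained version of the call above might look like the following sketch. The question's original data is not reproduced here, so the long-format frame df is rebuilt from the values in the table above; its layout, and storing Sample as text, are assumptions. The aggregations are passed as the strings "mean" and "median", which pandas accepts in place of np.mean and np.median.

```python
import pandas as pd

# Assumed long-format input, rebuilt from the pivoted values shown above
# (one Value per Collector/Date/Sample/Type/ID combination).
# Sample is stored as text here, which is an assumption.
groups = [
    ("Emily", "2014-06-20", "201", "HV", [34, 22, 10, 5]),
    ("Emily", "2014-06-23", "203", "HV", [33, 35, 13, 1]),
    ("John", "2014-06-22", "221", "HV", [40, 39, 11, 2]),
    ("John", "2014-07-01", "218", "HV", [35, 29, 13, 1]),
]
rows = [
    (collector, date, sample, kind, id_, value)
    for collector, date, sample, kind, values in groups
    for id_, value in zip("ABCD", values)
]
df = pd.DataFrame(
    rows, columns=["Collector", "Date", "Sample", "Type", "ID", "Value"]
)

# One block of columns per aggregation; margins=True adds the "All" row/column.
wide = df.pivot_table(
    index=["Collector", "Date", "Sample", "Type"],
    columns="ID",
    values="Value",
    margins=True,
    aggfunc=["mean", "median"],
)

# Move the aggregation level (column level 0) into the row index.
result = wide.stack(level=0)
print(result)
```

Depending on the pandas version, the column order of the stacked frame may differ from the listing above, so it is safer to address columns by label than by position.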

Yes, result contains more data than you asked for, but

result.loc['All']

has the additional values:

ID                          A      B      C     D      All
Date Sample Type                                          
                 mean    35.5  31.25  11.75  2.25  20.1875
                 median  34.5  32.00  12.00  1.50  17.5000

Or, you could further subselect result to get just the rows you are looking for:

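# Name the index level created by stack() so it can be selected by name
result.index.names = ['Collector', 'Date', 'Sample', 'Type', 'aggfunc']
# Keep every 'mean' row ...
mask = result.index.get_level_values('aggfunc') == 'mean'
# ... plus the last row, which is the overall ('All') median
mask[-1] = True
result = result.loc[mask]
print(result)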

yields

ID                                           A      B      C     D      All
Collector Date       Sample Type aggfunc                                   
Emily     2014-06-20 201    HV   mean     34.0  22.00  10.00  5.00  17.7500
          2014-06-23 203    HV   mean     33.0  35.00  13.00  1.00  20.5000
John      2014-06-22 221    HV   mean     40.0  39.00  11.00  2.00  23.0000
          2014-07-01 218    HV   mean     35.0  29.00  13.00  1.00  19.5000
All                              mean     35.5  31.25  11.75  2.25  20.1875
                                 median   34.5  32.00  12.00  1.50  17.5000
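
The positional mask[-1] = True relies on the overall median being the last row. A variant that selects by label instead could look like this sketch. The result frame below is a hand-built stand-in with a shortened three-level index and only the A and All columns (assumptions made to keep the example short), filled with values from the output above.

```python
import pandas as pd

# Hand-built stand-in for the stacked result (shortened index, two columns).
index = pd.MultiIndex.from_tuples(
    [
        ("Emily", "2014-06-20", "mean"),
        ("Emily", "2014-06-20", "median"),
        ("Emily", "2014-06-23", "mean"),
        ("Emily", "2014-06-23", "median"),
        ("John", "2014-06-22", "mean"),
        ("John", "2014-06-22", "median"),
        ("John", "2014-07-01", "mean"),
        ("John", "2014-07-01", "median"),
        ("All", "", "mean"),
        ("All", "", "median"),
    ],
    names=["Collector", "Date", "aggfunc"],
)
result = pd.DataFrame(
    {
        "A": [34.0, 34.0, 33.0, 33.0, 40.0, 40.0, 35.0, 35.0, 35.5, 34.5],
        "All": [17.75, 16.0, 20.5, 23.0, 23.0, 25.0, 19.5, 21.0, 20.1875, 17.5],
    },
    index=index,
)

collector = result.index.get_level_values("Collector")
aggfunc = result.index.get_level_values("aggfunc")

# Keep every mean row, plus the median of the "All" summary group.
mask = (aggfunc == "mean") | ((collector == "All") & (aggfunc == "median"))
subset = result.loc[mask]
print(subset)
```

Selecting by label keeps working if the rows are later reordered, at the cost of spelling out the condition.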

Solution 2:

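This might not be super clean, but you could add the summary rows by assigning to new index labels with .loc. Here table is the pivot table of Value by ID. Compute the mean and the median before assigning either one, so that the newly added Mean row is not included when the median is taken.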

In [131]: table_mean = table.mean()

In [132]: table_median = table.median()

In [134]: table.loc['Mean', :] = table_mean.values

In [135]: table.loc['Median', :] = table_median.values

In [136]: table
Out[136]: 
ID                                   A      B      C     D
Collector Date       Sample Type                          
Emily     2014-06-20 201    HV    34.0  22.00  10.00  5.00
          2014-06-23 203    HV    33.0  35.00  13.00  1.00
John      2014-06-22 221    HV    40.0  39.00  11.00  2.00
          2014-07-01 218    HV    35.0  29.00  13.00  1.00
Mean                              35.5  31.25  11.75  2.25
Median                            34.5  32.00  12.00  1.50
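
If enlarging a MultiIndex through .loc feels fragile, the same two rows can also be appended with pd.concat. This is an alternative technique rather than the method used above, sketched under the assumption that table holds the pivoted values shown in the output. Padding the unused index levels with empty strings, and storing Sample as text so it can share a level with that padding, are choices made here to mirror the display.

```python
import pandas as pd

# Assumed stand-in for `table`: the pivoted values from the output above.
# Sample is kept as text so it can share an index level with "" padding.
index = pd.MultiIndex.from_tuples(
    [
        ("Emily", "2014-06-20", "201", "HV"),
        ("Emily", "2014-06-23", "203", "HV"),
        ("John", "2014-06-22", "221", "HV"),
        ("John", "2014-07-01", "218", "HV"),
    ],
    names=["Collector", "Date", "Sample", "Type"],
)
table = pd.DataFrame(
    {
        "A": [34, 33, 40, 35],
        "B": [22, 35, 39, 29],
        "C": [10, 13, 11, 13],
        "D": [5, 1, 2, 1],
    },
    index=index,
)

# Compute both statistics from the untouched data rows.
summary = table.agg(["mean", "median"])

# Give the summary rows full-length index keys, padded with empty strings.
summary.index = pd.MultiIndex.from_tuples(
    [("Mean", "", "", ""), ("Median", "", "", "")],
    names=table.index.names,
)

with_summary = pd.concat([table, summary])
print(with_summary)
```

Because table itself is never modified, the statistics cannot accidentally include a previously appended summary row.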
