
Pandas: Sort Innermost Column Group-wise Based On Other Multilevel Column Excluding One Row

This is an extension to my previous question: Consider below df: In [68]: df = pd.DataFrame({'A': ['a'] * 11, ...: 'B': ['b'] * 11, ...: 'C':

Solution 1:

This solution uses helper columns for sorting. First extract the values with Series.str.get and convert them to numbers with to_numeric, then add a Boolean column that flags the row equal to the maximum value of its group:

lvls = list(x.index.names[:-1])
print (lvls)
['B', 'C']

x[('tmp', 'tmp')] = pd.to_numeric(x[('E','a')].str.get('value'), errors='coerce')

x[('max','tmp')] = x.groupby(lvls)[[('tmp','tmp')]].transform('max') == x[[('tmp','tmp')]]
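To make the two helper columns concrete, here is a minimal sketch on a small made-up Series. The index, the cell contents and the names cells, raw, tmp and is_max are assumptions chosen for illustration; they are not the data from the question.

```python
import pandas as pd

# Hypothetical data: two groups (C1, C2) of cells shaped like those in ('E', 'a').
idx = pd.MultiIndex.from_tuples(
    [('C1', 'D1'), ('C1', 'D2'), ('C1', 'D3'), ('C2', 'D1'), ('C2', 'D2')],
    names=['C', 'D'])
cells = pd.Series([{'value': '4', 'percentage': None},
                   {'value': 5, 'percentage': None},
                   {'value': 9, 'percentage': None},
                   {},
                   {'value': 'N/A', 'percentage': None}], index=idx)

# .str.get looks up the key in each dict; an absent key yields a missing value.
raw = cells.str.get('value')
# Numeric strings such as '4' are parsed; anything unparseable becomes NaN.
tmp = pd.to_numeric(raw, errors='coerce')
# True only where the row equals its group's maximum; NaN never compares equal.
is_max = tmp.groupby(level='C').transform('max') == tmp

print(pd.DataFrame({'tmp': tmp, 'max': is_max}))
```

Because NaN never compares equal to anything, rows with a missing or non-numeric value are always flagged False, so they sort together with the ordinary rows and ahead of the maximum.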

By default every value of the ascending parameter is True, so all sort keys are ascending and the flagged maximum row ends up last in each group:

x1 = x.sort_values(lvls + [('max','tmp'), ('tmp','tmp')])
print (x1)
                                            E   tmp    max
A                                           a   tmp    tmp
B C  D                                                    
b C1 D1    {'value': '4', 'percentage': None}   4.0  False
     D2      {'value': 5, 'percentage': None}   5.0  False
     D3      {'value': 9, 'percentage': None}   9.0   True
  C2 D3     {'value': 11, 'percentage': None}  11.0  False
     D1     {'value': 12, 'percentage': None}  12.0  False
     D4                                    {}   NaN  False
     D2     {'value': 33, 'percentage': None}  33.0   True
  C3 D1     {'value': 12, 'percentage': None}  12.0  False
     D3   {'value': '12', 'percentage': None}  12.0  False
     D2  {'value': 'N/A', 'percentage': None}   NaN  False
     D4     {'value': 24, 'percentage': None}  24.0   True

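Here the last value of ascending is changed from True to False, so the non-maximum rows of each group are sorted in descending order while the maximum row still comes last: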

x2 = x.sort_values(lvls + [('max','tmp'), ('tmp','tmp')],
                   ascending=[True] * len(lvls) + [True, False])
print (x2)

                                            E   tmp    max
A                                           a   tmp    tmp
B C  D                                                    
b C1 D2      {'value': 5, 'percentage': None}   5.0  False
     D1    {'value': '4', 'percentage': None}   4.0  False
     D3      {'value': 9, 'percentage': None}   9.0   True
  C2 D1     {'value': 12, 'percentage': None}  12.0  False
     D3     {'value': 11, 'percentage': None}  11.0  False
     D4                                    {}   NaN  False
     D2     {'value': 33, 'percentage': None}  33.0   True
  C3 D1     {'value': 12, 'percentage': None}  12.0  False
     D3   {'value': '12', 'percentage': None}  12.0  False
     D2  {'value': 'N/A', 'percentage': None}   NaN  False
     D4     {'value': 24, 'percentage': None}  24.0   True

Finally, remove the helper columns:

x1 = x1.drop([('max','tmp'), ('tmp','tmp')], axis=1)
x2 = x2.drop([('max','tmp'), ('tmp','tmp')], axis=1)
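The steps of this solution can also be wrapped in a single function that never adds helper columns to the original frame. The following is only a sketch under assumptions: the helper name sort_keep_max_last, its descending flag and the sample frame are hypothetical, and the sample reproduces just a few rows in the shape of the question's data.

```python
import numpy as np
import pandas as pd

def sort_keep_max_last(df, col, descending=False):
    """Hypothetical helper: sort rows inside each outer index group by the
    numeric 'value' stored in `col`, keeping the group's maximum row last."""
    lvls = list(df.index.names[:-1])                      # e.g. ['B', 'C']
    num = pd.to_numeric(df[col].str.get('value'), errors='coerce')
    keys = num.to_frame('num')                            # separate key frame
    keys['is_max'] = num.groupby(level=lvls).transform('max') == num
    keys['pos'] = np.arange(len(df))                      # original positions
    keys = keys.sort_values(lvls + ['is_max', 'num'],
                            ascending=[True] * len(lvls) + [True, not descending])
    return df.iloc[keys['pos'].to_numpy()]

# Hypothetical sample shaped like the question's frame.
idx = pd.MultiIndex.from_tuples(
    [('b', 'C1', 'D1'), ('b', 'C1', 'D2'), ('b', 'C1', 'D3'),
     ('b', 'C2', 'D1'), ('b', 'C2', 'D2'), ('b', 'C2', 'D3'), ('b', 'C2', 'D4')],
    names=['B', 'C', 'D'])
cells = [{'value': '4'}, {'value': 5}, {'value': 9},
         {'value': 12}, {'value': 33}, {'value': 11}, {}]
x = pd.DataFrame({('E', 'a'): cells}, index=idx)

asc = sort_keep_max_last(x, ('E', 'a'))
desc = sort_keep_max_last(x, ('E', 'a'), descending=True)
print(asc)
print(desc)
```

Sorting a separate key frame and reordering by position leaves x untouched, so there is nothing to drop afterwards. Rows whose value is missing are placed after the other non-maximum rows in both directions, because sort_values puts NaN last by default.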

Solution 2:

You can define a function that groups the multilevel column ('E', 'a') on levels B and C and returns the index labels in the required order: within each group the rows are sorted by value, and the row holding the total of the other rows (taken here as the group maximum) is kept last:

def sort_idx(s):
    idx = []
    for k, g in s.groupby(level=[0, 1], sort=False):
        i = g.idxmax()
        idx += [*g.drop(i).sort_values().index , i]
    return idx

s = pd.to_numeric(x[('E', 'a')].str['value'], errors='coerce')
x = x.loc[sort_idx(s)]

Result:

                                            E
A                                           a
B C  D                                       
b C1 D1    {'value': '4', 'percentage': None}
     D2      {'value': 5, 'percentage': None}
     D3      {'value': 9, 'percentage': None}
  C2 D3     {'value': 11, 'percentage': None}
     D1     {'value': 12, 'percentage': None}
     D4                                    {}
     D2     {'value': 33, 'percentage': None}
  C3 D1     {'value': 12, 'percentage': None}
     D3   {'value': '12', 'percentage': None}
     D2  {'value': 'N/A', 'percentage': None}
     D4     {'value': 24, 'percentage': None}
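For a quick check of the index-based approach without the full frame, the same function can be run on a small numeric Series. This is a sketch under assumptions: the Series s and its labels are made up, and sort_idx is repeated only so that the block runs on its own.

```python
import numpy as np
import pandas as pd

def sort_idx(s):
    # Same logic as above: per group, sorted non-maximum rows, then the maximum.
    idx = []
    for k, g in s.groupby(level=[0, 1], sort=False):
        i = g.idxmax()                                    # label of the group maximum
        idx += [*g.drop(i).sort_values().index, i]
    return idx

# Hypothetical values, already converted to numbers (NaN marks a missing value).
index = pd.MultiIndex.from_tuples(
    [('b', 'C1', 'D1'), ('b', 'C1', 'D2'), ('b', 'C1', 'D3'),
     ('b', 'C2', 'D1'), ('b', 'C2', 'D2'), ('b', 'C2', 'D3'), ('b', 'C2', 'D4')],
    names=['B', 'C', 'D'])
s = pd.Series([4, 5, 9, 12, 33, 11, np.nan], index=index)

order = sort_idx(s)
print(s.loc[order])
```

The returned list holds full index tuples, so it can be passed to .loc on any Series or DataFrame that shares the index, provided the index labels are unique.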
