Pandas Melt 2 Groups Of Columns

June 22, 2024 Post a Comment

I have a table as id name sec1 sec2 sec3 sec4 sec5 sec6 1 abc mp(6) up(7) sp(8) cp(7) 2 4 and i want output as id name tests values slots 1 abc

Solution 1:

First we start with set_index and stack().

We then extract the test value and place it with its own column and reset the index:

s = df.set_index(['id','name'],append=True).stack().to_frame('tests')
s['values'] = s['tests'].str.extract('(\d+)')[0]
df1 = s.dropna(subset=['values']).reset_index([1,2,3])
df1['tests'] = df1['tests'].str.replace('\(.*\)','')

print(df1)
   id name level_3 tests values
0   1  abc    sec1    mp      6
0   1  abc    sec2    up      7
0   1  abc    sec3    sp      8
0   1  abc    sec4    cp      7

Okay, now we need to something very similar to a cross join in SQL, that is repeat the dataframe according the len of the slots and add each value accordingly.

Note, this is making a product of your dataframe, a better way would be to use merge but you've not made your input clear, as in how multiple rows will appear.

Let's try this with pd.concat making use of the keys argument:

vals = s.loc[s['values'].isna(),'tests'].values

final = pd.concat([df1 ]* len(vals),0,keys=vals)\
            .reset_index(0).rename(columns={'level_0' : 'slots'})

print(final[['id','name','tests','values','slots']])

   id name tests values  slots
0   1  abc    mp      6      2
0   1  abc    up      7      2
0   1  abc    sp      8      2
0   1  abc    cp      7      2
0   1  abc    mp      6      4
0   1  abc    up      7      4
0   1  abc    sp      8      4
0   1  abc    cp      7      4

Avoiding a total product using merge:

vals = s.loc[s['values'].isna(), 'tests'].reset_index([2,3],drop=True)\
                   .to_frame('slots').reset_index(1)


print(pd.merge(df1,vals,on=['id']))


   id name level_3 tests values slots
0   1  abc    sec1    mp      6     2
1   1  abc    sec1    mp      6     4
2   1  abc    sec2    up      7     2
3   1  abc    sec2    up      7     4
4   1  abc    sec3    sp      8     2
5   1  abc    sec3    sp      8     4
6   1  abc    sec4    cp      7     2
7   1  abc    sec4    cp      7     4

Solution 2:

One way about it is to melt twice, then extract the text and numbers into separate columns:

(
    df.melt(["id", "name", "sec5", "sec6"], var_name="var1", value_name="tests")
      .melt(["id", "name", "var1", "tests"], value_name="slots")
      .assign(values=lambda df: df.tests.str.extract(r"(\d)"),
             tests=lambda df: df.tests.str.extract(r"([a-z]+)"))
      .loc[:, ["id", "name", "tests", "values", "slots"]]
)

   id name tests values  slots
01  abc    mp      6211  abc    up      7221  abc    sp      8231  abc    cp      7241  abc    mp      6451  abc    up      7461  abc    sp      8471  abc    cp      74

Python Tutorial for Beginners

Pandas Melt 2 Groups Of Columns

Solution 1:

Solution 2:

Post a Comment for "Pandas Melt 2 Groups Of Columns"