Full Outer Join Of Two Or More Data Frames

Given the following three Pandas data frames, I need to merge them similar to an SQL full outer join. Note that the key is multi-index type_N and id_N with N = 1,2,3: import pandas

Solution 1:

I'd suggest making life simpler by not using different names for the keys you want to merge on. Move each key pair into the index and give the index levels common names:

da = df_a.set_index(['type_1', 'id_1']).rename_axis(['type', 'id'])
db = df_b.set_index(['type_2', 'id_2']).rename_axis(['type', 'id'])
dc = df_c.set_index(['type_3', 'id_3']).rename_axis(['type', 'id'])

da.join(db, how='outer').join(dc, how='outer')

         name_1 name_2  name_3
type id
0    3     Alex    NaN     NaN
     7      NaN  Bryce   White
1    4      Amy   Bill  School
     5    Allen  Brian     NaN
     5    Allen    Joe     NaN
     5     Jane  Brian     NaN
     5     Jane    Joe     NaN
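For anyone who wants to run the join above end to end, here is a minimal sketch. The sample frames are an assumption, reconstructed from the output shown (the question's original data is not reproduced on this page), and `to_common_index` is a hypothetical helper name.

```python
import pandas as pd

# Assumed sample data, inferred from the printed result; not the asker's original frames.
df_a = pd.DataFrame({'type_1': [0, 1, 1, 1], 'id_1': [3, 4, 5, 5],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0], 'id_2': [4, 5, 5, 7],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0], 'id_3': [4, 7],
                     'name_3': ['School', 'White']})


def to_common_index(df, n):
    """Index frame n by (type_n, id_n) and rename the levels to ('type', 'id')."""
    return df.set_index([f'type_{n}', f'id_{n}']).rename_axis(['type', 'id'])


da = to_common_index(df_a, 1)
db = to_common_index(df_b, 2)
dc = to_common_index(df_c, 3)

# Chained outer joins on the shared (type, id) index.
joined = da.join(db, how='outer').join(dc, how='outer')
print(joined)
```

Because the key (1, 5) appears twice in both `df_a` and `df_b`, the join yields every pairing of those rows, which is why that key shows up four times.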

Here's an obnoxious way to get the original type_N and id_N columns back, filled in only where the corresponding name_N is present:

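import pandas as pd
from cytoolz.dicttoolz import merge

# Start from the joined frame built above
d = da.join(db, how='outer').join(dc, how='outer')

# One row per index entry, with the index levels (type, id) as columns
i = pd.DataFrame(d.index.values.tolist(), d.index, d.index.names)
# For each source frame j, blank the keys where name_j is missing, then suffix with _j
d = d.assign(**merge(
    i.mask(d[f'name_{j}'].isna()).add_suffix(f'_{j}').to_dict('list')
    for j in [1, 2, 3]
))

# Order columns by suffix first, then by name: id_1, name_1, type_1, id_2, ...
d[sorted(d.columns, key=lambda x: x.split('_')[::-1])]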

         id_1 name_1  type_1 id_2 name_2  type_2 id_3  name_3  type_3
type id
0    3      3   Alex     0.0  NaN    NaN     NaN  NaN     NaN     NaN
     7    NaN    NaN     NaN    7  Bryce     0.0    7   White     0.0
1    4      4    Amy     1.0    4   Bill     1.0    4  School     1.0
     5      5  Allen     1.0    5  Brian     1.0  NaN     NaN     NaN
     5      5  Allen     1.0    5    Joe     1.0  NaN     NaN     NaN
     5      5   Jane     1.0    5  Brian     1.0  NaN     NaN     NaN
     5      5   Jane     1.0    5    Joe     1.0  NaN     NaN     NaN
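If the only goal is to keep the type_N and id_N columns, a plainer alternative (a different technique from the one in this answer) may be to pass `drop=False` to `set_index`, so each frame keeps its key columns while also being indexed by them. A sketch, again assuming sample data reconstructed from the outputs on this page; `keyed` is a hypothetical helper name:

```python
import pandas as pd

# Assumed sample data, inferred from the printed results; not the asker's original frames.
df_a = pd.DataFrame({'type_1': [0, 1, 1, 1], 'id_1': [3, 4, 5, 5],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0], 'id_2': [4, 5, 5, 7],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0], 'id_3': [4, 7],
                     'name_3': ['School', 'White']})


def keyed(df, n):
    """Index by (type_n, id_n) but keep those columns in the frame (drop=False)."""
    return (df.set_index([f'type_{n}', f'id_{n}'], drop=False)
              .rename_axis(['type', 'id']))


d = keyed(df_a, 1).join(keyed(df_b, 2), how='outer').join(keyed(df_c, 3), how='outer')
print(d)
```

The key columns come out as NaN wherever a frame had no row for that key. The column order is type_N, id_N, name_N per frame, so the sorting step from the answer is only needed if a different order is wanted.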

Solution 2:

You could use 2 consecutive merges, first on df_a and df_b, and then on df_c:

In [49]: df_temp = df_a.merge(df_b, how='outer', left_on=['type_1', 'id_1'], right_on=['type_2', 'id_2'])

In [50]: df_temp.merge(df_c, how='outer', left_on=['type_2', 'id_2'], right_on=['type_3', 'id_3'])
Out[50]:
   type_1 id_1 name_1 type_2 id_2 name_2  type_3 id_3  name_3
0     0.0    3   Alex    NaN  NaN    NaN     NaN  NaN     NaN
1     1.0    4    Amy      1    4   Bill     1.0    4  School
2     1.0    5  Allen      1    5  Brian     NaN  NaN     NaN
3     1.0    5  Allen      1    5    Joe     NaN  NaN     NaN
4     1.0    5   Jane      1    5  Brian     NaN  NaN     NaN
5     1.0    5   Jane      1    5    Joe     NaN  NaN     NaN
6     NaN  NaN    NaN      0    7  Bryce     0.0    7   White
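One thing to watch with this approach: the second merge matches `df_c` against type_2 and id_2 only, so a key that exists in `df_a` and `df_c` but not in `df_b` would not line up. The sketch below illustrates this under stated assumptions: the sample frames are reconstructed from the output above, the extra `(0, 3, 'Gym')` row in `df_c` is hypothetical and added only to trigger the case, and the helper columns `type` and `id` are names introduced here.

```python
import pandas as pd

# Assumed sample data, inferred from the printed result; not the asker's original frames.
df_a = pd.DataFrame({'type_1': [0, 1, 1, 1], 'id_1': [3, 4, 5, 5],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0], 'id_2': [4, 5, 5, 7],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
# Hypothetical variant of df_c: the (0, 3) key exists in df_a but not in df_b.
df_c = pd.DataFrame({'type_3': [1, 0, 0], 'id_3': [4, 7, 3],
                     'name_3': ['School', 'White', 'Gym']})

df_temp = df_a.merge(df_b, how='outer',
                     left_on=['type_1', 'id_1'], right_on=['type_2', 'id_2'])

# Merging on type_2/id_2 alone: Alex's row has NaN there, so 'Gym' lands on its own row.
naive = df_temp.merge(df_c, how='outer',
                      left_on=['type_2', 'id_2'], right_on=['type_3', 'id_3'])

# Coalesce the keys first, so every row carries a key regardless of which frame it came from.
df_temp['type'] = df_temp['type_1'].fillna(df_temp['type_2'])
df_temp['id'] = df_temp['id_1'].fillna(df_temp['id_2'])
out = (df_temp.merge(df_c, how='outer',
                     left_on=['type', 'id'], right_on=['type_3', 'id_3'])
              .drop(columns=['type', 'id']))

print(len(naive), len(out))
```

With the data shown in the question's output this case does not arise, so the two-merge answer gives the expected result there.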

Solution 3:

Let's try creating a new key for this; I'm using reduce here to chain the merges.

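import functools
import pandas as pd

dfs = [df_a, df_b, df_c]
# Build a single tuple key from each frame's first two columns (type_N, id_N)
dfs = [x.assign(key=list(zip(x.iloc[:, 0], x.iloc[:, 1]))) for x in dfs]
merged_df = functools.reduce(lambda left, right: pd.merge(left, right, on='key', how='outer'), dfs)
merged_df.drop(columns='key')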
Out[110]: 
   type_1 id_1 name_1  type_2 id_2 name_2  type_3 id_3  name_3
0     0.0    3   Alex     NaN  NaN    NaN     NaN  NaN     NaN
1     1.0    4    Amy     1.0    4   Bill     1.0    4  School
2     1.0    5  Allen     1.0    5  Brian     NaN  NaN     NaN
3     1.0    5  Allen     1.0    5    Joe     NaN  NaN     NaN
4     1.0    5   Jane     1.0    5  Brian     NaN  NaN     NaN
5     1.0    5   Jane     1.0    5    Joe     NaN  NaN     NaN
6     NaN  NaN    NaN     0.0    7  Bryce     0.0    7   White
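Since the question is about two or more frames, the reduce pattern above could be wrapped in a small function that accepts any number of them. This is a sketch only: `full_outer_join` is a hypothetical name, it assumes the two key columns are always the first two columns of each frame (as the answer does), and the sample data is reconstructed from the output shown.

```python
import functools

import pandas as pd


def full_outer_join(frames):
    """Outer-join frames on a tuple key built from each frame's first two columns."""
    keyed = [f.assign(key=list(zip(f.iloc[:, 0], f.iloc[:, 1]))) for f in frames]
    merged = functools.reduce(
        lambda left, right: pd.merge(left, right, on='key', how='outer'),
        keyed,
    )
    # The helper key is no longer needed once the merges are done.
    return merged.drop(columns='key')


# Assumed sample data, inferred from the printed result; not the asker's original frames.
df_a = pd.DataFrame({'type_1': [0, 1, 1, 1], 'id_1': [3, 4, 5, 5],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0], 'id_2': [4, 5, 5, 7],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0], 'id_3': [4, 7],
                     'name_3': ['School', 'White']})

result = full_outer_join([df_a, df_b, df_c])
print(result)
```

Because every frame is merged on the same `key` column, a key present in only some of the frames still lines up across all of them.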
