Full Outer Join Of Two Or More Data Frames
Given the following three pandas data frames, I need to merge them in the manner of an SQL full outer join. Note that the key is a multi-index made of type_N and id_N, with N = 1, 2, 3:
import pandas as pd
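For anyone who wants to run the answers, here is a sketch of three frames that are consistent with the outputs shown in the solutions. The values are read off those outputs; the integer dtypes and the row order are assumptions.

```python
import pandas as pd

# Reconstructed from the outputs in the answers; dtypes and row order are assumed.
df_a = pd.DataFrame({
    'type_1': [0, 1, 1, 1],
    'id_1': [3, 4, 5, 5],
    'name_1': ['Alex', 'Amy', 'Allen', 'Jane'],
})
df_b = pd.DataFrame({
    'type_2': [1, 1, 1, 0],
    'id_2': [4, 5, 5, 7],
    'name_2': ['Bill', 'Brian', 'Joe', 'Bryce'],
})
df_c = pd.DataFrame({
    'type_3': [1, 0],
    'id_3': [4, 7],
    'name_3': ['School', 'White'],
})
```

Note that the key (1, 5) appears twice in both df_a and df_b, which is why a full outer join produces four rows for it.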
Solution 1:
I'd suggest making life less complicated by not using different names for the columns you want to merge on. Move them into an index with common level names, then join:
da = df_a.set_index(['type_1', 'id_1']).rename_axis(['type', 'id'])
db = df_b.set_index(['type_2', 'id_2']).rename_axis(['type', 'id'])
dc = df_c.set_index(['type_3', 'id_3']).rename_axis(['type', 'id'])
da.join(db, how='outer').join(dc, how='outer')
        name_1 name_2  name_3
type id
0    3    Alex    NaN     NaN
     7     NaN  Bryce   White
1    4     Amy   Bill  School
     5   Allen  Brian     NaN
     5   Allen    Joe     NaN
     5    Jane  Brian     NaN
     5    Jane    Joe     NaN
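Here's an obnoxious way to get those other columns back:
from cytoolz.dicttoolz import merge
# d is the joined frame from the previous step
d = da.join(db, how='outer').join(dc, how='outer')
# the index levels (type, id) as ordinary columns, aligned with d
i = pd.DataFrame(d.index.values.tolist(), d.index, d.index.names)
# for each source frame j, blank the key where name_j is missing, then suffix it with _j
d = d.assign(**merge(
    i.mask(d[f'name_{j}'].isna()).add_suffix(f'_{j}').to_dict('list')
    for j in [1, 2, 3]
))
# order the columns by suffix first, then by name
d[sorted(d.columns, key=lambda x: x.split('_')[::-1])]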
        id_1 name_1  type_1 id_2 name_2  type_2 id_3  name_3  type_3
type id
0    3     3   Alex     0.0  NaN    NaN     NaN  NaN     NaN     NaN
     7   NaN    NaN     NaN    7  Bryce     0.0    7   White     0.0
1    4     4    Amy     1.0    4   Bill     1.0    4  School     1.0
     5     5  Allen     1.0    5  Brian     1.0  NaN     NaN     NaN
     5     5  Allen     1.0    5    Joe     1.0  NaN     NaN     NaN
     5     5   Jane     1.0    5  Brian     1.0  NaN     NaN     NaN
     5     5   Jane     1.0    5    Joe     1.0  NaN     NaN     NaN
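A possibly simpler route to the same nine columns, offered as a sketch rather than as this answer's method: pass drop=False to set_index so that each frame keeps its own type_N and id_N columns while still being joined on a common index. The sample frames are reconstructed from the outputs above with assumed integer dtypes, and keep_keys is a hypothetical helper name.

```python
import pandas as pd

# Sample frames reconstructed from the outputs above; dtypes are assumed.
df_a = pd.DataFrame({'type_1': [0, 1, 1, 1], 'id_1': [3, 4, 5, 5],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0], 'id_2': [4, 5, 5, 7],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0], 'id_3': [4, 7],
                     'name_3': ['School', 'White']})


def keep_keys(df, n):
    """Index df on (type_n, id_n) under common level names, keeping both columns."""
    return (df.set_index([f'type_{n}', f'id_{n}'], drop=False)
              .rename_axis(['type', 'id']))


da, db, dc = (keep_keys(df, n)
              for n, df in enumerate([df_a, df_b, df_c], start=1))

# Outer-join on the shared (type, id) index; the original key columns come along.
d = da.join(db, how='outer').join(dc, how='outer')
print(d)
```

Because the key columns travel with each frame, they are NaN exactly where that frame had no matching row, with no masking step needed.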
Solution 2:
You could use two consecutive merges, first on df_a and df_b, and then on df_c:
In [49]: df_temp = df_a.merge(df_b, how='outer', left_on=['type_1', 'id_1'], right_on=['type_2', 'id_2'])
In [50]: df_temp.merge(df_c, how='outer', left_on=['type_2', 'id_2'], right_on=['type_3', 'id_3'])
Out[50]:
   type_1 id_1 name_1 type_2 id_2 name_2  type_3 id_3  name_3
0     0.0    3   Alex    NaN  NaN    NaN     NaN  NaN     NaN
1     1.0    4    Amy      1    4   Bill     1.0    4  School
2     1.0    5  Allen      1    5  Brian     NaN  NaN     NaN
3     1.0    5  Allen      1    5    Joe     NaN  NaN     NaN
4     1.0    5   Jane      1    5  Brian     NaN  NaN     NaN
5     1.0    5   Jane      1    5    Joe     NaN  NaN     NaN
6     NaN  NaN    NaN      0    7  Bryce     0.0    7   White
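One thing worth checking with chained merges: the second merge matches df_c against type_2/id_2 only, so a key that appears in df_a and df_c but not in df_b would not line up. That case does not occur in the data above. The sketch below uses hypothetical one-row frames (a, b, c, keyed and all values are made up for illustration) to compare the chained merges with a join on a shared index.

```python
import pandas as pd

# Hypothetical frames: key (0, 3) is present in a and c, but not in b.
a = pd.DataFrame({'type_1': [0], 'id_1': [3], 'name_1': ['Alex']})
b = pd.DataFrame({'type_2': [1], 'id_2': [4], 'name_2': ['Bill']})
c = pd.DataFrame({'type_3': [0], 'id_3': [3], 'name_3': ['Club']})

# Chained merges: the second merge keys on type_2/id_2,
# which are NaN on the row that came only from a.
chained = (
    a.merge(b, how='outer', left_on=['type_1', 'id_1'], right_on=['type_2', 'id_2'])
     .merge(c, how='outer', left_on=['type_2', 'id_2'], right_on=['type_3', 'id_3'])
)


def keyed(df, n):
    """Move (type_n, id_n) into an index with common level names."""
    return df.set_index([f'type_{n}', f'id_{n}']).rename_axis(['type', 'id'])


# Joining on a shared index matches all three frames on the same key.
joined = keyed(a, 1).join(keyed(b, 2), how='outer').join(keyed(c, 3), how='outer')

print(len(chained))  # Alex and Club land on separate rows
print(len(joined))   # Alex and Club share the (0, 3) row
```

If every key in df_c is guaranteed to appear in df_b as well, the chained version is fine as written.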
Solution 3:
Let us try creating a new key for this; I am using reduce here:
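import functools
dfs = [df_a, df_b, df_c]
# build a tuple key from the first two columns (type_N, id_N) of each frame
dfs = [x.assign(key=list(zip(x.iloc[:, 0], x.iloc[:, 1]))) for x in dfs]
# fold the list into one frame with successive outer merges on the key
merged_df = functools.reduce(lambda left, right: pd.merge(left, right, on='key', how='outer'), dfs)
# drop the helper key column
merged_df.drop(columns='key')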
Out[110]:
   type_1 id_1 name_1  type_2 id_2 name_2  type_3 id_3  name_3
0     0.0    3   Alex     NaN  NaN    NaN     NaN  NaN     NaN
1     1.0    4    Amy     1.0    4   Bill     1.0    4  School
2     1.0    5  Allen     1.0    5  Brian     NaN  NaN     NaN
3     1.0    5  Allen     1.0    5    Joe     NaN  NaN     NaN
4     1.0    5   Jane     1.0    5  Brian     NaN  NaN     NaN
5     1.0    5   Jane     1.0    5    Joe     NaN  NaN     NaN
6     NaN  NaN    NaN     0.0    7  Bryce     0.0    7   White
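The same idea can be wrapped in a small function that takes any number of frames. This is a sketch, not the answer's exact code: outer_join_on_first_two is a hypothetical name, it assumes the first two columns of every frame form the key (as the iloc calls above do), and the sample frames are reconstructed from the outputs with assumed integer dtypes.

```python
import functools

import pandas as pd

# Sample frames reconstructed from the outputs above; dtypes are assumed.
df_a = pd.DataFrame({'type_1': [0, 1, 1, 1], 'id_1': [3, 4, 5, 5],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0], 'id_2': [4, 5, 5, 7],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0], 'id_3': [4, 7],
                     'name_3': ['School', 'White']})


def outer_join_on_first_two(frames):
    """Full outer join of frames, keyed on each frame's first two columns."""
    # Add a temporary tuple key so differently named key columns can be matched.
    keyed = [f.assign(key=list(zip(f.iloc[:, 0], f.iloc[:, 1]))) for f in frames]
    merged = functools.reduce(
        lambda left, right: pd.merge(left, right, on='key', how='outer'),
        keyed,
    )
    # The helper key is no longer needed once the merges are done.
    return merged.drop(columns='key')


merged_df = outer_join_on_first_two([df_a, df_b, df_c])
print(merged_df)
```

Since every frame is merged on the same key column, a key missing from one frame in the middle of the list still matches the frames on either side of it.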