Merge DataFrames When One Index Contains the Other's Values (but They Are Not the Same)
For example, df1 has shape (533, 2176) and indices such as Elkford (5901003) DM 01010, while df2 has shape (743, 12) and indices such as 5901003. The number in the brackets of df1's indices will…
Solution 1:
file1.csv:
,col_1,col_2
5901001,a,-1
5901002,b,-2
5901003,c,-3
5901004,d,-4
5901005,e,-5
5901006,f,-6
5901007,g,-7
5901008,h,-8
5901009,i,-9
5901010,k,-10
Here df1.shape = (10, 2).
file2.csv:
,col_3
Elkford (Part 1) (5901003) DM 01010,1
Ahia (5901004) DM 01010,2
Canada (01) 20000,4
Fork (5901005) DM 01010,3
England (34) 20000,4
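Here df2.shape = (5, 1). Only three of these labels contain a 7-digit id in parentheses, so after filtering df2 has shape (3, 1).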
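To check the shapes quoted above without creating the two files, the same sample data can be loaded from in-memory strings. This is a minimal sketch: FILE1 and FILE2 are names chosen here for illustration, holding the CSV text shown above, and the regex is the same 7-digit pattern the script below uses.

```python
import io
import re

import pandas as pd

# Hypothetical in-memory copies of file1.csv and file2.csv
FILE1 = """,col_1,col_2
5901001,a,-1
5901002,b,-2
5901003,c,-3
5901004,d,-4
5901005,e,-5
5901006,f,-6
5901007,g,-7
5901008,h,-8
5901009,i,-9
5901010,k,-10
"""
FILE2 = """,col_3
Elkford (Part 1) (5901003) DM 01010,1
Ahia (5901004) DM 01010,2
Canada (01) 20000,4
Fork (5901005) DM 01010,3
England (34) 20000,4
"""

df1 = pd.read_csv(io.StringIO(FILE1), index_col=0)
df2 = pd.read_csv(io.StringIO(FILE2), index_col=0)

# True for labels that contain exactly seven digits inside parentheses;
# "(Part 1)", "(01)" and "(34)" do not qualify
has_id = [bool(re.search(r"\(\d{7}\)", label)) for label in df2.index]

print(df1.shape)    # (10, 2)
print(df2.shape)    # (5, 1)
print(sum(has_id))  # 3
```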
Run this script:
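import re
import pandas as pd

def extract_id(label):
    # Return the 7-digit id in parentheses, e.g. "Ahia (5901004) DM 01010" -> 5901004;
    # labels such as "Canada (01) 20000" have no such id and give None
    m = re.search(r'\((\d{7})\)', label)
    return int(m.group(1)) if m else None

df1 = pd.read_csv('file1.csv', index_col=0)
df2 = pd.read_csv('file2.csv', index_col=0)

indexes = df2.index.map(extract_id)
# filter out rows without an id (pd.notna handles both None and NaN)
mask = pd.notna(indexes)
df2 = df2[mask]
# replace the long labels with the integer ids so they align with df1's index
df2.index = indexes[mask].astype(int)

df = pd.concat([df1, df2], axis=1)
print(df)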
Output:
         col_1  col_2  col_3
5901001      a     -1    NaN
5901002      b     -2    NaN
5901003      c     -3    1.0
5901004      d     -4    2.0
5901005      e     -5    3.0
5901006      f     -6    NaN
5901007      g     -7    NaN
5901008      h     -8    NaN
5901009      i     -9    NaN
5901010      k    -10    NaN
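Here df.shape = (10, 2 + 1): df1's two columns plus df2's one. Rows of df1 with no matching id in df2 get NaN in col_3, which is why col_3 is shown as floats.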
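A vectorized alternative, sketched under the same assumption that every id is exactly seven digits in parentheses: Index.str.extract pulls the id out of every label in one call, and DataFrame.join aligns on it. This swaps the per-label extract_id function for pandas string methods; it is not the answer's original approach, and FILE1/FILE2 are again illustrative names for the sample CSV text.

```python
import io

import pandas as pd

# Inline copies of file1.csv and file2.csv so the sketch runs on its own
FILE1 = """,col_1,col_2
5901001,a,-1
5901002,b,-2
5901003,c,-3
5901004,d,-4
5901005,e,-5
5901006,f,-6
5901007,g,-7
5901008,h,-8
5901009,i,-9
5901010,k,-10
"""
FILE2 = """,col_3
Elkford (Part 1) (5901003) DM 01010,1
Ahia (5901004) DM 01010,2
Canada (01) 20000,4
Fork (5901005) DM 01010,3
England (34) 20000,4
"""

df1 = pd.read_csv(io.StringIO(FILE1), index_col=0)
df2 = pd.read_csv(io.StringIO(FILE2), index_col=0)

# One vectorized call: the 7-digit id inside "(...)" for every label, NaN where absent
ids = df2.index.str.extract(r"\((\d{7})\)", expand=False)
matched = ids.notna()

# Keep only the rows that have an id and re-index them by the integer id
df2_by_id = df2[matched].copy()
df2_by_id.index = ids[matched].astype(int)

# Left join keeps every row of df1; rows without a match get NaN in col_3
df = df1.join(df2_by_id, how="left")
print(df)
```

Passing how="inner" to join instead would keep only the ids present in both frames. The same pattern applies when, as in the question, it is df1 that carries the long labels: extract the ids from whichever frame holds them.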