Merge Dataframes That Have Indices That One Contains Another (but Not The Same)

September 07, 2022

For example, df1 has shape (533, 2176) and indices such as Elkford (5901003) DM 01010, while df2 has shape (743, 12) and indices such as 5901003; the number in the brackets of df1's indices wil

Solution 1:

file1.csv:

,col_1,col_2
5901001,a,-1
5901002,b,-2
5901003,c,-3
5901004,d,-4
5901005,e,-5
5901006,f,-6
5901007,g,-7
5901008,h,-8
5901009,i,-9
5901010,k,-10

Here df1.shape = (10, 2).

file2.csv:

,col_3
Elkford (Part 1) (5901003) DM 01010,1
Ahia (5901004) DM 01010,2
Canada (01)   20000,4
Fork (5901005) DM 01010,3
England (34)   20000,4

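Here df2.shape = (5, 1) as loaded. Only 3 of the 5 rows carry a 7-digit ID in parentheses, so df2.shape = (3, 1) once the rows without an ID are filtered out.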
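Before running the full script, it may help to see how a regular expression can pick the ID out of a label. The sketch below is only an illustration: the names ID_PATTERN and find_id are hypothetical, and it assumes that a valid ID is always exactly seven digits enclosed in parentheses, as in the sample data.

```python
import re

# Assumption: a valid ID is exactly seven digits wrapped in parentheses.
ID_PATTERN = re.compile(r"\((\d{7})\)")


def find_id(label):
    """Return the 7-digit ID as an int, or None when the label has none."""
    m = ID_PATTERN.search(label)
    return int(m.group(1)) if m else None


labels = [
    "Elkford (Part 1) (5901003) DM 01010",  # "(Part 1)" is skipped, the ID is found
    "Canada (01)   20000",                  # "(01)" is too short to count as an ID
]
ids = [find_id(label) for label in labels]
print(ids)
```

Other parenthesised groups such as (Part 1) or (01) do not match, which is why those rows end up without an ID.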

Run this script:

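import re

import pandas as pd


def extract_id(s):
    # return the 7-digit id found in parentheses, or None if there is none
    m = re.search(r'\((\d{7})\)', s)
    if m:
        return int(m.group(1))
    return None


df1 = pd.read_csv('file1.csv', index_col=0)
df2 = pd.read_csv('file2.csv', index_col=0)


indexes = df2.index.map(extract_id)
# True where an id was found (works for float and object indexes alike)
mask = indexes.notna()
# filter incorrect rows (without id)
df2 = df2[mask]
# convert index to integer ids so it matches the index of df1
df2.index = indexes[mask].astype('int64')

# align both frames on the shared integer index
df = pd.concat([df1, df2], axis=1)

print(df)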

Output:

        col_1  col_2  col_3
5901001     a     -1    NaN
5901002     b     -2    NaN
5901003     c     -3    1.0
5901004     d     -4    2.0
5901005     e     -5    3.0
5901006     f     -6    NaN
5901007     g     -7    NaN
5901008     h     -8    NaN
5901009     i     -9    NaN
5901010     k    -10    NaN

Here df.shape = (10, 2 + 1)
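As an alternative, the same idea could be written with pandas string methods and a left join instead of a Python-level function and concat. This is a minimal sketch, not the original answer's method: the small frames below are stand-ins for file1.csv and file2.csv, and df2_clean and merged are names introduced here for illustration.

```python
import pandas as pd

# Small stand-ins for file1.csv and file2.csv (assumed data, same layout).
df1 = pd.DataFrame(
    {"col_1": ["c", "d", "e", "f"], "col_2": [-3, -4, -5, -6]},
    index=[5901003, 5901004, 5901005, 5901006],
)
df2 = pd.DataFrame(
    {"col_3": [1, 2, 4, 3]},
    index=[
        "Elkford (Part 1) (5901003) DM 01010",
        "Ahia (5901004) DM 01010",
        "Canada (01)   20000",
        "Fork (5901005) DM 01010",
    ],
)

# Pull the 7-digit ID out of each label; labels without one become NaN.
ids = df2.index.to_series().str.extract(r"\((\d{7})\)", expand=False)

# Keep only rows that have an ID and use the integer ID as the new index.
has_id = ids.notna()
df2_clean = df2[has_id.to_numpy()].copy()
df2_clean.index = ids[has_id].astype("int64").to_numpy()

# A left join keeps every row of df1 and adds col_3 where an ID matches.
merged = df1.join(df2_clean, how="left")
print(merged)
```

A left join keeps exactly the rows of df1, whereas concat with axis=1 performs an outer join and would also keep any IDs that appear only in df2. Which behaviour is wanted depends on the data.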
