Pandas Merge (pd.merge) How To Set The Index And Join

I have two pandas DataFrames, dfLeft and dfRight, with the date as the index.

dfLeft:

            cusip  factorL
date
2012-01-03   XXXX      4.5
2012-01-03   YYYY      6.2
..

Solution 1:

Reset the indices and then merge on multiple (column-)keys:

dfLeft.reset_index(inplace=True)
dfRight.reset_index(inplace=True)
dfMerged = pd.merge(dfLeft, dfRight,
              left_on=['date', 'cusip'],
              right_on=['date', 'idc__id'],
              how='inner')

You can then set 'date' as the index again:

dfMerged.set_index('date', inplace=True)

Here's an example:

raw1 = '''
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
'''

raw2 = '''
2012-01-03    XYXX      45.
2012-01-03    YYYY      62.
2012-01-04    XXXX      -47.
2012-01-05    YYYY      61.
'''

import pandas as pd
from io import StringIO


df1 = pd.read_table(StringIO(raw1), header=None,
                    sep=r'\s+', parse_dates=[0], skiprows=1)
df2 = pd.read_table(StringIO(raw2), header=None,
                    sep=r'\s+', parse_dates=[0], skiprows=1)

df1.columns = ['date', 'cusip', 'factorL']
df2.columns = ['date', 'idc__id', 'factorL']

print(pd.merge(df1, df2,
               left_on=['date', 'cusip'],
               right_on=['date', 'idc__id'],
               how='inner'))

which gives

                  date cusip  factorL_x idc__id  factorL_y
0  2012-01-03 00:00:00  YYYY        6.2    YYYY         62
1  2012-01-04 00:00:00  XXXX        4.7    XXXX        -47
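The example above starts from frames where 'date' is already a column. As a rough sketch of the full round trip from the question (date-indexed frames in, date-indexed result out), something like the following should work. The sample values and the right-hand column name 'factorR' are assumptions made for illustration; they are not taken from the original data.

```python
import pandas as pd

# Hypothetical sample data: both frames are indexed by date, as in the question.
left_dates = pd.to_datetime(['2012-01-03', '2012-01-03', '2012-01-04', '2012-01-04'])
dfLeft = pd.DataFrame({'cusip': ['XXXX', 'YYYY', 'XXXX', 'YYYY'],
                       'factorL': [4.5, 6.2, 4.7, 6.1]},
                      index=pd.Index(left_dates, name='date'))

right_dates = pd.to_datetime(['2012-01-03', '2012-01-03', '2012-01-04', '2012-01-05'])
dfRight = pd.DataFrame({'idc__id': ['XYXX', 'YYYY', 'XXXX', 'YYYY'],
                        'factorR': [45., 62., -47., 61.]},  # 'factorR' is an assumed name
                       index=pd.Index(right_dates, name='date'))

# reset_index() returns a copy with 'date' as a column, so the inputs stay untouched.
dfMerged = (
    pd.merge(dfLeft.reset_index(), dfRight.reset_index(),
             left_on=['date', 'cusip'],
             right_on=['date', 'idc__id'],
             how='inner')
    .set_index('date')  # restore the date index on the result
)
print(dfMerged)
```

Using reset_index() without inplace=True avoids modifying dfLeft and dfRight as a side effect, which may be preferable if they are reused later.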

Solution 2:

You could append 'cuspin' and 'idc_id' as additional index levels to your DataFrames before you join (here's how it would work on the first couple of rows):

In [10]: dfL
Out[10]: 
           cuspin  factorL
date                      
2012-01-03   XXXX      4.5
2012-01-03   YYYY      6.2

In [11]: dfL1 = dfLeft.set_index('cuspin', append=True)

In [12]: dfR1 = dfRight.set_index('idc_id', append=True)

In [13]: dfL1
Out[13]: 
                   factorL
date       cuspin         
2012-01-03 XXXX        4.5
           YYYY        6.2

In [14]: dfL1.join(dfR1)
Out[14]: 
                   factorL  factorR
date       cuspin                  
2012-01-03 XXXX        4.5        5
           YYYY        6.2        6
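For reference, here is a self-contained sketch of the same idea. The sample values are made up to match the output above. One assumption worth flagging: as far as I can tell, recent pandas versions align two MultiIndexes by their shared level names, so if the second level is called 'cuspin' on one side and 'idc_id' on the other, the join may match on 'date' only. This sketch therefore renames the right-hand key column first so that both index levels carry the same names.

```python
import pandas as pd

# Hypothetical two-row sample, both frames indexed by date.
dates = pd.to_datetime(['2012-01-03', '2012-01-03'])
dfLeft = pd.DataFrame({'cuspin': ['XXXX', 'YYYY'], 'factorL': [4.5, 6.2]},
                      index=pd.Index(dates, name='date'))
dfRight = pd.DataFrame({'idc_id': ['XXXX', 'YYYY'], 'factorR': [5, 6]},
                       index=pd.Index(dates, name='date'))

# Append the identifier as a second index level on each side.
dfL1 = dfLeft.set_index('cuspin', append=True)
# Rename the right-hand key so both MultiIndexes have levels (date, cuspin).
dfR1 = (dfRight.rename(columns={'idc_id': 'cuspin'})
               .set_index('cuspin', append=True))

# join() is a left join on the index by default.
joined = dfL1.join(dfR1)
print(joined)

# Optionally move 'cuspin' back to a column, leaving date as the only index level.
flat = joined.reset_index('cuspin')
```

Note that join() keeps every row of dfL1 by default; pass how='inner' if only identifiers present on both sides should survive.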
