
Choose Which Columns To Concat In A Loop Of Files With Different Number Of Columns

I have a dictionary:

# file1 mentions 2 columns while file2 mentions 3
dict2 = ({'file1' : ['colA', 'colB'],'file2' : ['colY','colS','colX'], etc..})

First of all how to make the d

Solution 1:

So you have to select the column names to concatenate, e.g. the first 3 columns selected by position:

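import arcpy
import pandas as pd

for k, v in dict1.items():
    # read the fields listed in v from file k into a DataFrame
    df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(k, v), columns=v)
    # concatenate the first 3 columns into one space-separated string
    df['new'] = df.iloc[:, :3].astype(str).apply(' '.join, axis=1)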
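
To try the positional version without arcpy, here is a minimal sketch that swaps the SearchCursor for an in-memory list of tuples. The sample rows and the field names `colC` and `colD` are made up for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for what arcpy.da.SearchCursor would yield.
rows = [(1, "a", 2.5, "x"), (2, "b", 3.5, "y")]
fields = ["colA", "colB", "colC", "colD"]

df = pd.DataFrame.from_records(rows, columns=fields)

# Join the first three columns (by position) into one space-separated string.
df["new"] = df.iloc[:, :3].astype(str).apply(" ".join, axis=1)
print(df["new"].tolist())
```

If a file has fewer than three columns, `iloc[:, :3]` simply takes all of them, so the same line works for files of different widths.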

If you create a list of possible column names, use intersection to keep only the ones that exist in the DataFrame:

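for k, v in dict1.items():
    # columns=v is needed so the DataFrame columns carry the field names
    df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(k, v), columns=v)
    L = ['colA','colB','colS']
    # keep only the names from L that are present in this file
    cols = df.columns.intersection(L)
    df['new'] = df[cols].astype(str).apply(' '.join, axis=1)  # concatenation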
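
As an illustration of the intersection approach, the sketch below runs it over two small DataFrames with different columns. The frames are hypothetical stand-ins for the files, and `wanted` plays the role of `L`.

```python
import pandas as pd

wanted = ["colA", "colB", "colS"]  # candidate names; not every file has all of them

# Hypothetical stand-ins for two files with different schemas.
frames = {
    "file1": pd.DataFrame({"colA": [1, 2], "colB": ["a", "b"]}),
    "file2": pd.DataFrame({"colS": ["s1", "s2"], "colX": [9, 8], "colY": [0.5, 1.5]}),
}

for name, df in frames.items():
    # Only the names that actually exist in this frame survive.
    cols = df.columns.intersection(wanted)
    df["new"] = df[cols].astype(str).apply(" ".join, axis=1)
    print(name, list(cols), df["new"].tolist())
```

Selecting with `df[wanted]` directly would raise a `KeyError` for any file missing one of the names, which is what the intersection avoids.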

Or filter the columns with a boolean mask:

for k, v in dict1.items():
    # columns=v is needed so the DataFrame columns carry the field names
    df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(k, v), columns=v)
    L = ['colA','colB','colS']
    # True for every column whose name is in L
    mask = df.columns.isin(L)
    df['new'] = df.loc[:, mask].astype(str).apply(' '.join, axis=1)  # concatenation
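
A small sketch of the mask variant on made-up data. It shows that `isin` yields one boolean per column, so the selected columns keep the order they have in the DataFrame rather than the order of the list.

```python
import pandas as pd

# Hypothetical frame; note colS comes before colA here.
df = pd.DataFrame({"colS": ["s1", "s2"], "colX": [9, 8], "colA": [1, 2]})
wanted = ["colA", "colB", "colS"]

mask = df.columns.isin(wanted)  # one boolean per column
selected = df.loc[:, mask]      # columns colS and colA, in frame order

df["new"] = selected.astype(str).apply(" ".join, axis=1)
print(mask.tolist(), df["new"].tolist())
```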

EDIT:

If you want a different data structure that also holds a separate list of the columns to concatenate, one possible solution is a list of tuples:

L = [('file1', ['colA', 'colB'], ['colA','colB']), 
     ('file2', ['colY','colS','colX'], ['colY','colS'])]

for i, j, k in L:
    print(i)
    print(j)
    print(k)

file1
['colA', 'colB']
['colA', 'colB']
file2
['colY', 'colS', 'colX']
['colY', 'colS']

So your solution can be rewritten as:

for i, j, k in L:
    # i = file, j = fields to read, k = columns to concatenate
    df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(i, j), columns=j)
    df['new'] = df[k].astype(str).apply(' '.join, axis=1)  # concatenation
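
To check the list-of-tuples loop end to end without arcpy, the following sketch replaces the SearchCursor with a hypothetical `fake_cursor` helper that yields row tuples from an in-memory dict. `FAKE_FILES`, `fake_cursor` and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical in-memory "files": field name -> column values.
FAKE_FILES = {
    "file1": {"colA": [1, 2], "colB": ["a", "b"]},
    "file2": {"colY": [10, 20], "colS": ["s", "t"], "colX": [0.1, 0.2]},
}

def fake_cursor(name, fields):
    """Yield one tuple per row for the requested fields (stand-in for a search cursor)."""
    table = FAKE_FILES[name]
    return zip(*(table[f] for f in fields))

# (file, fields to read, columns to concatenate)
L = [("file1", ["colA", "colB"], ["colA", "colB"]),
     ("file2", ["colY", "colS", "colX"], ["colY", "colS"])]

results = {}
for name, fields, concat_cols in L:
    df = pd.DataFrame.from_records(list(fake_cursor(name, fields)), columns=fields)
    df["new"] = df[concat_cols].astype(str).apply(" ".join, axis=1)
    results[name] = df
    print(name, df["new"].tolist())
```

Storing each DataFrame in `results` keeps it available after the loop; in the loops above, `df` is overwritten on every iteration.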
