Choose Which Columns To Concat In A Loop Of Files With Different Number Of Columns
I have a dictionary:
# file1 mentions 2 columns while file2 mentions 3
dict2 = ({'file1' : ['colA', 'colB'],'file2' : ['colY','colS','colX'], etc..})
First of all how to make the d
Solution 1:
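So you need to select the column names to concatenate, e.g. the first 3 columns selected by position:
import arcpy
import pandas as pd

for k, v in dict1.items():
    # read the cursor rows into a DataFrame; columns=v keeps the field names as column labels
    df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(k, v), columns=v)
    # concatenate the first 3 columns (or fewer, if the file has fewer) row by row
    df['new'] = df.iloc[:, :3].astype(str).apply(' '.join, axis=1)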
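The positional variant can be tried without arcpy. The sketch below is only an illustration: the two in-memory DataFrames and the names `frames` and `joined` are assumptions standing in for the tables read by the search cursor. It shows that slicing with `iloc[:, :3]` does not fail when a file has only 2 columns; it simply takes what is there.

```python
import pandas as pd

# Hypothetical stand-ins for the tables that arcpy.da.SearchCursor would read
frames = {
    "file1": pd.DataFrame({"colA": [1, 2], "colB": ["x", "y"]}),
    "file2": pd.DataFrame({"colY": [3, 4], "colS": ["p", "q"], "colX": [5.5, 6.5]}),
}

joined = {}
for name, df in frames.items():
    # Take at most the first 3 columns; a 2-column frame just yields 2 columns
    df["new"] = df.iloc[:, :3].astype(str).apply(" ".join, axis=1)
    joined[name] = df["new"].tolist()

print(joined["file1"])  # ['1 x', '2 y']
print(joined["file2"])  # ['3 p 5.5', '4 q 6.5']
```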
If you have a list of possible column names, use intersection to keep only those that exist in the DataFrame:
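for k, v in dict1.items():
    # columns=v is needed so the DataFrame has named columns to match against
    df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(k, v), columns=v)
    L = ['colA','colB','colS']
    # keep only the names from L that are present in this file
    cols = df.columns.intersection(L)
    df['new'] = df[cols].astype(str).apply(' '.join, axis=1)  # concatenation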
Or filter the columns with a boolean mask:
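for k, v in dict1.items():
    # columns=v is needed so the DataFrame has named columns to match against
    df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(k, v), columns=v)
    L = ['colA','colB','colS']
    # True for every column whose name appears in L
    mask = df.columns.isin(L)
    df['new'] = df.loc[:, mask].astype(str).apply(' '.join, axis=1)  # concatenation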
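One detail worth checking is the order in which the values are joined. With the mask, the columns come out in the DataFrame's own order, not in the order of the list. The sketch below illustrates this on a made-up frame; `join_present` and `join_in_list_order` are hypothetical helper names, and the second one is an alternative (a plain list comprehension) for the case where the list order should win.

```python
import pandas as pd

def join_present(df, wanted, sep=" "):
    """Join the columns of df named in wanted, in df's own column order."""
    mask = df.columns.isin(wanted)
    return df.loc[:, mask].astype(str).apply(sep.join, axis=1)

def join_in_list_order(df, wanted, sep=" "):
    """Join the columns of df named in wanted, in the order given by wanted."""
    cols = [c for c in wanted if c in df.columns]
    return df[cols].astype(str).apply(sep.join, axis=1)

# Made-up data; colA is listed but absent, so both helpers skip it
df = pd.DataFrame({"colY": [1, 2], "colS": ["p", "q"], "colX": [0.5, 1.5]})
wanted = ["colS", "colY", "colA"]

by_mask = join_present(df, wanted).tolist()
by_list = join_in_list_order(df, wanted).tolist()
print(by_mask)  # ['1 p', '2 q']
print(by_list)  # ['p 1', 'q 2']
```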
EDIT:
If you want another data structure that also holds a separate list of the columns to concatenate, one possible solution is a list of tuples:
L = [('file1', ['colA', 'colB'], ['colA','colB']),
     ('file2', ['colY','colS','colX'], ['colY','colS'])]
for i, j, k in L:
    print (i)
    print (j)
    print (k)
file1
['colA', 'colB']
['colA', 'colB']
file2
['colY', 'colS', 'colX']
['colY', 'colS']
So your solution should be rewritten:
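for i, j, k in L:
    # i = file, j = fields to read, k = columns to concatenate
    # columns=j keeps the field names so that df[k] can select by name
    df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(i, j), columns=j)
    df['new'] = df[k].astype(str).apply(' '.join, axis=1)  # concatenation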
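To try the tuple-based loop without a geodatabase, the cursor can be swapped for plain rows. This is a minimal sketch under assumptions: `FAKE_TABLES`, `read_rows` and `results` are hypothetical names, and `read_rows` only imitates a cursor by returning one tuple per row.

```python
import pandas as pd

# Hypothetical rows standing in for what arcpy.da.SearchCursor(name, fields) would return
FAKE_TABLES = {
    "file1": [(1, "x"), (2, "y")],
    "file2": [(3, "p", 5.5), (4, "q", 6.5)],
}

def read_rows(name, fields):
    """Hypothetical reader: one tuple per row, values in the order of fields."""
    return list(FAKE_TABLES[name])

# (file, fields to read, columns to concatenate)
L = [
    ("file1", ["colA", "colB"], ["colA", "colB"]),
    ("file2", ["colY", "colS", "colX"], ["colY", "colS"]),
]

results = {}
for name, fields, concat_cols in L:
    # columns=fields labels the tuple positions so columns can be selected by name
    df = pd.DataFrame.from_records(read_rows(name, fields), columns=fields)
    df["new"] = df[concat_cols].astype(str).apply(" ".join, axis=1)
    results[name] = df["new"].tolist()

print(results)  # {'file1': ['1 x', '2 y'], 'file2': ['3 p', '4 q']}
```

Here colX is read for file2 but left out of the joined string, because it is not in the third element of the tuple.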