I have one csv test1.csv (I do not have headers in it!!!). I also have as you can see delimiter with pipe but also with exactly one tab after the eight column.
ug|s|b|city|bg|1|94|ON-05-0216 9.72|28|288
ug|s|b|city|bg|1|94|ON-05-0217 9.72|28|288
I have second file test2.csv with only delimiter pipe
ON-05-0216|100|50
ON-05-0180|244|152
ON-05-0219|269|146
So because only one value (ON-05-0216
) is being matched from the eight column from the first file and first column from the second file it means that I should have only one value in output file, but with addition of SUM column from the second and third column from second file (100+50).
So the final result is the following:
ug|s|b|city|bg|1|94|ON-05-0216 Total=150|9.72|28|288
or
ug|s|b|city|bg|1|94|ON-05-0216|Total=150 9.72|28|288
whatever is easier.
I though that the best way to use is with pandas. But I stuck with taking multiple delimiters from the first file and how to match columns without column names, so not sure how to continue further.
import pandas as pd
a = pd.read_csv("test1.csv", header=None)
b = pd.read_csv("test2.csv", header=None)
merged = a.merge(b,)
merged.to_csv("output.csv", index=False)
Thank you in advance
Use:
# Reading files
df1 = pd.read_csv('file1.csv', header=None, sep='|')
df2 = pd.read_csv('file2.csv', header=None, sep='|')
# splitting file on tab and concatenating with rest
ndf = pd.concat([df1.iloc[:,:7], df1[7].str.split('\t', expand=True), df1.iloc[:,8:]], axis=1)
ndf.columns = np.arange(11)
# adding values from df2 and bringing in format Total=sum
df2.columns = ['c1', 'c2', 'c3']
tot = df2.eval('c2+c3').apply(lambda x: 'Total='+str(x))
# Finding which rows needs to be retained
idx_1 = ndf.iloc[:,7].str.split('-',expand=True).iloc[:,2]
idx_2 = df2.c1.str.split('-',expand=True).iloc[:,2]
idx = idx_1.isin(idx_2) # Updated
ndf = ndf[idx].reset_index(drop=True)
tot = tot[idx].reset_index(drop=True)
# concatenating both CSV together and writing output csv
ndf.iloc[:,7] = ndf.iloc[:,7].map(str) + chr(9) + tot
pd.concat([ndf.iloc[:,:8],ndf.iloc[:,8:]], axis=1).to_csv('out.csv', sep='|', header=None, index=None)
# OUTPUT
# ug|s|b|city|bg|1|94|ON-05-0216 Total=150|9.72|28|288