I have two CSV files which are file1.csv, and file2.csv. Both have multiple columns as follows:
a. file1.csv
username,user id,access hash,name,group,group id
SreyTey1998,963229606,7854138709318981862,Smaradey Chan,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Srey_Tey_1,2079816779,6921382059939144796,Srey tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
sreytey123,5316691604,668712126044928206,Phat SreyTey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Sreytey168,5455045488,-714912998136226691,Vong Soksreytey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
SreyTey99,5653783510,-2575791274366210473,Oun Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
sreytey1919,5819100400,3174041461521242292,Tey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Sreytey6666,6001252515,1586106578669001327,Srey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
SreyTey7777,6026179841,5596849859821333867,Srey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Ahh_Nak86,5637888996,-1267155033181296023,Yìì Ng,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
b. file2.csv
username,user id,access hash,name,group,group id
SreyTey1998,963229606,7854138709318981862,Smaradey Chan,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Srey_Tey_1,2079816779,6921382059939144796,Srey tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
sreytey123,5316691604,668712126044928206,Phat SreyTey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Sreytey168,5455045488,-714912998136226691,Vong Soksreytey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
SreyTey99,5653783510,-2575791274366210473,Oun Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
sreytey1919,5819100400,3174041461521242292,Tey Tey,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
AhhLyn1213,808888756,2482753619838480608,Ly-លី🌈â¤ï¸,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahhly09,938983724,-8302570306911018211,方塔莉,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahh_vong,873218908,1743989214734522713,Mek Sreyvong,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahhnitaccd,5420585351,-6331445989210603589,NITA CCD,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
c. The output file as file2-nodups.csv, should be:
username,user id,access hash,name,group,group id
AhhLyn1213,808888756,2482753619838480608,Ly-លី🌈â¤ï¸,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahhly09,938983724,-8302570306911018211,方塔莉,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahh_vong,873218908,1743989214734522713,Mek Sreyvong,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
ahhnitaccd,5420585351,-6331445989210603589,NITA CCD,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
I have tried the following codes:
with open('file1.csv', 'r', encoding="utf8") as t1:
fileone = t1.readlines()
with open('file2.csv', 'r', encoding="utf8") as t2:
filetwo = t2.readlines()
# scans through the two files and writes differences to new csv
with open('file2-nodups.csv', 'w', encoding="utf8") as outFile:
for line in filetwo:
if line not in fileone:
outFile.write(line)
The above does not work - because the output file (file2-nodups.csv) has the same content as file2.csv
Very appreciate any advice.
I would like to confirm that the above codes are Working for the following data:
file1.csv:
username,user id,access hash,name,group,group id
asgie2,19933,29kd982hi4hh6h443,
47uuha,491920,kdsagku5kkajgjag,
james_sing,4002899,4asg37yragdh300asgdlk,
joe_naro,4989222,hgjhe84jkaglagjj,
48700245,hlvkiiwej8njnnrk320kc,
file2.csv:
username,user id,access hash,name,group,group id
misschue,87340a,hgeikka83llagea,
james_sing,4002899,4asg37yragdh300asgdlk,
michell22,4883140,cn2ukkfhiigakgd3yhg,
Output file:
username,user id,access hash,name,group,group id
misschue,87340a,hgeikka83llagea,
michell22,4883140,cn2ukkfhiigakgd3yhg,
But they dont work for data with more columns above. Very appreciate any advice.
UPDATE: Python does not read strange characters correctly; that is why all suggested codes dont work. Those characters appear in the fourth column, for example,
ž‰ ឆាំងឣីុ 💗11 ""H1""",Zisy Ly បោះដុំនឹងលក់រាយកម្មងáŸ
After replacing them with English or numbers or space characters, the codes work as expected.
The best solution to this is compare one column (second) which is unique to every line, and if its different, output those lines in file2.csv to the third file.
Here are my files:
a. file1.csv
username,user id,access hash,name,group,group id
Manithbun,892557579,-8406919653747059917,Manith Bun.,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,1128046905,-541590019988886083,M.r rithy phone shope ♥ï¸â™¥ï¸â™¥ï¸,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
PET_KIMHEANG,1209296259,-3836506661281845754,áž–áŸáž គឹមហ៊ាង៚,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,1215736274,-3314364090375867636,"ម៉ាញ ឆាំងឣីុ 💗11 ""H1""",Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,1296059051,289341865810496004,កូន ប៉ាប៉ា ក្រុមឡានបែនសំបូស្នáŸáž áŸ,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
oun_Rorng,1330317717,-1815409784220647876,Mss Rorng,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,1373051922,4933577293541799006,Soroth Phe,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Veasnalove9999,1721055020,-7406583441966710751,Veasna Love,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
b. file2.csv
username,user id,access hash,name,group,group id
Manithbun,892557579,-8406919653747059917,Manith Bun.,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,1128046905,-541590019988886083,M.r rithy phone shope ♥ï¸â™¥ï¸â™¥ï¸,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
PET_KIMHEANG,1209296259,-3836506661281845754,áž–áŸáž គឹមហ៊ាង៚,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,1215736274,-3314364090375867636,"ម៉ាញ ឆាំងឣីុ 💗11 ""H1""",Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,1296059051,289341865810496004,កូន ប៉ាប៉ា ក្រុមឡានបែនសំបូស្នáŸáž áŸ,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
oun_Rorng,1330317717,-1815409784220647876,Mss Rorng,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,1373051922,4933577293541799006,Soroth Phe,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Veasnalove9999,1721055020,-7406583441966710751,Veasna Love,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
Thithseda,5185517950,-6420126539020087321,Thith Seda,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,5708325040,7960869157828131294,Tong Meang,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
takoma_05,396559485,4653963679316835279,V P,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
c. nodups.csv (the lines in file2.csv which are not found in file1.csv)
Thithseda,5185517950,-6420126539020087321,Thith Seda,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
,5708325040,7960869157828131294,Tong Meang,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
takoma_05,396559485,4653963679316835279,V P,Zisy Ly បោះដុំនឹងលក់រាយកម្មង់ពីរោងចក្រផ្ទាល់📥លáŸážážœáŸážšáž›áž»áž™0967699965,1806798461
I have tried the following codes, but nothing is written to the output file:
import csv
reader1 = csv.reader(open("file1.csv", 'r', encoding="utf8"))
row1 = next(reader1)
reader2 = csv.reader(open("file2.csv", 'r', encoding="utf8"))
row2 = next (reader2)
with open("nodups.csv", 'w', encoding="utf8") as f:
for row in reader2:
if (row1[1] != row2[1]):
f.write(row)
Please advise how to make that work. Very appreciate any help.
RESOLVED: Compare two CSV files based on second column (user id) and print out the differences found in second file to third file:
with open('file1.csv', 'r', encoding="utf8") as check_file:
check_set = set([row.split(',')[1] for row in check_file])
with open('file2.csv', 'r', encoding="utf8") as filetwo, open('file3.csv', 'w', encoding="utf8") as out_file:
for line in filetwo:
if line.split(',')[1] not in check_set:
out_file.write(line)
Or use the followings with headers:
with open('file1.csv', 'r', encoding="utf8") as f1:
fileone = set([row.split(',')[1] for row in f1]) # Retrieve data in 2nd column for all rows in first CSV file
with open('file2.csv', 'r', encoding="utf8") as filetwo, open('nodups.csv', 'w', encoding="utf8") as out_file:
out_file.write('username,user id,access hash,name,group,group id' + '\n')
for line in filetwo: # Scan each line in second CSV file
if line.split(',')[1] not in fileone: # If the data of that line is not found in first file, write it into third file
out_file.write(line)