Clean a CSV file with Python



I have a CSV file that I'm trying to clean up with Python.
Its lines are separated by << \n >> or by empty lines.
I would like each line that does not end with << " >> to be merged with the line(s) that follow it, so that every record sits on a single line.
Here is a concrete example to make this clearer:

CSV FILE I HAVE

*"id","name","age","city","remark"\
"1","kevin","27","paris","This is too bad"\
"8","angel","18","london","Incredible !!!"\
"14","maria","33","madrid","i can't believe it."\
"16","john","28","new york","hey men,\
\nhow do you did this"\
"22","naima","35","istanbul","i'm sure it's false,\
\
\nit can't be real"
"35","marco","26","roma","you'r my hero!"\
"39","lili","37","tokyo","all you need to knows.\
\n\nthe best way to upgrade easely"\
...*

CSV FILE I WOULD LIKE TO HAVE

*"id","name","age","city","remark"\
"1","kevin","27","paris","This is too bad"\
"8","angel","18","london","Incredible !!!"\
"14","maria","33","madrid","i can't believe it."\
"16","john","28","new york","hey men,how do you did this"\
"22","naima","35","istanbul","i'm sure it's false, it can't be real"\
"35","marco","26","roma","you'r my hero!"\
"39","lili","37","tokyo","all you need to knows. the best way to upgrade easely"\
...*

Does anyone know how to do this?
Thank you in advance for your help!

This is the Python code I'm currently trying:

text = open("input.csv", "r", encoding='utf-8') 
  
text = ''.join([i for i in text])  
  
text = text.replace("\\n", "")
 
x = open("output.csv","w") 
  
x.writelines(text) 
x.close()

Solution

  • input.csv file content:

    "id","name","age","city","remark"
    "1","kevin","27","paris","This is too bad"
    "8","angel","18","london","Incredible !!!"
    "14","maria","33","madrid","i can't believe it."
    "16","john","28","new york","hey men,
    how do you did this"
    "22","naima","35","istanbul","i'm sure it's false,
    
    nit can't be real"
    "35","marco","26","roma","you'r my hero!"
    "39","lili","37","tokyo","all you need to knows.
    
    the best way to upgrade easely"
    

    A possible (quick and simple) solution is the following:

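    # Read the whole file into a single string
    with open('input.csv', 'r', encoding='utf-8') as file:
        data = file.read()
        
    # Protect the real record breaks (a newline between two quotes) with a
    # placeholder, drop every other newline, then restore the record breaks.
    # This assumes the text "||" never occurs in the data itself.
    clean_data = data.replace('"\n"', '"||"').replace("\n", "").replace('"||"', '"\n"')
        
    with open('output.csv', 'w', encoding='utf-8') as file:
        file.write(clean_data)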
    

    This produces the following output.csv content:

    "id","name","age","city","remark"
    "1","kevin","27","paris","This is too bad"
    "8","angel","18","london","Incredible !!!"
    "14","maria","33","madrid","i can't believe it."
    "16","john","28","new york","hey men,how do you did this"
    "22","naima","35","istanbul","i'm sure it's false,nit can't be real"
    "35","marco","26","roma","you'r my hero!"
    "39","lili","37","tokyo","all you need to knows.the best way to upgrade easely"