python newline data-cleaning

Removing whitespace and newlines in a txt file with Python


I managed to replace the things I wanted with a ; but now I'm struggling to remove the whitespace and newlines, so that all the data up to each ; ends up on a single line and the next record starts on a new line.

Code:

replacements = {'Geboren am': ';', 'Nato/a il': ';', 'Né(e) le': ';'}

with open('DATEN2.txt') as infile, open('DATENBEARBEITET2.txt', 'w') as outfile:
    for line in infile:
        # Swap each birth-date label for the record terminator ';'
        # (.items() works on both Python 2 and 3; .iteritems() is Python 2 only)
        for src, target in replacements.items():
            line = line.replace(src, target)
        outfile.write(line)

What the file looks like after the replacement:

       Kommissionen und Delegationen




                        06.12.1999 - 30.11.2003 




                    Begnadigungskommission (BeK-V)     



               ;

What it should look like:

Kommissionen und Delegationen, 06.12.1999 - 30.11.2003, Begnadigungskommission (BeK-V);

After a long time of searching I came to ask here whether someone knows the correct library or command to use for this kind of task. I'm really struggling to get to the next step.

Edit: Also, what were newlines before should turn into commas; see the sample output.


Solution

  • I assume you want to eliminate the extra whitespace - eliminating all of it would result in KommissionenundDelegationen,06.... You can do that with strip() and join():

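    replacements = {'Geboren am': ';', 'Nato/a il': ';', 'Né(e) le': ';'}
    
    lines = []
    with open('DATEN2.txt') as infile, open('DATENBEARBEITET2.txt', 'w') as outfile:
        for line in infile:
            # Drop leading/trailing whitespace, including the newline
            line = line.strip()
            if not line:
                continue  # skip lines that were only whitespace
            # .items() works on both Python 2 and 3 (.iteritems() is Python 2 only)
            for src, target in replacements.items():
                line = line.replace(src, target)
            lines.append(line)
        # Write everything collected so far as one comma-separated line
        outfile.write(', '.join(lines))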
    

    This builds a list of the lines that contain more than whitespace, each stripped of its surrounding whitespace and with the appropriate replacements made. The list is then joined using ', ' as the delimiter.
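
The snippet above joins every non-empty line, including the ';' markers, into one long line, so the result reads "..., Begnadigungskommission (BeK-V), ;" rather than the one-record-per-line layout shown in the question. A possible variant that splits on the marker first is sketched below. It is only a sketch under a few assumptions: Python 3, a file small enough to read into memory, and UTF-8 encoding (the question does not state the encoding). The names join_records and convert_file are hypothetical helpers introduced here for illustration.

```python
# Labels taken from the question; each one is turned into the record terminator ';'.
REPLACEMENTS = {'Geboren am': ';', 'Nato/a il': ';', 'Né(e) le': ';'}


def join_records(text, replacements=REPLACEMENTS):
    """Return one string per record, with fields joined by ', '.

    Hypothetical helper: a record is assumed to be everything up to a ';'.
    """
    for src, target in replacements.items():
        text = text.replace(src, target)

    chunks = text.split(';')
    records = []
    for i, chunk in enumerate(chunks):
        # Keep only lines with visible content, stripped of padding.
        fields = [part.strip() for part in chunk.splitlines() if part.strip()]
        if not fields:
            continue
        # Every chunk except the last was followed by a ';' in the input.
        terminator = ';' if i < len(chunks) - 1 else ''
        records.append(', '.join(fields) + terminator)
    return records


def convert_file(src_path, dst_path, encoding='utf-8'):
    """Read src_path, write one record per line to dst_path.

    The UTF-8 default is an assumption; adjust it to the real file encoding.
    """
    with open(src_path, encoding=encoding) as infile:
        records = join_records(infile.read())
    with open(dst_path, 'w', encoding=encoding) as outfile:
        outfile.write('\n'.join(records) + '\n')
```

With the file names from the question, this would be called as convert_file('DATEN2.txt', 'DATENBEARBEITET2.txt'). Any text after the last marker is kept as a final line without a trailing ';'.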