I managed to replace the things I wanted with a ;
but now I am struggling to remove the whitespace and newlines so that all the data up to each ;
ends up on a single line, with the next record starting on the next line.
Code:
    replacements = {'Geboren am':';', 'Nato/a il':';', 'Né(e) le':';'}
    with open('DATEN2.txt') as infile, open('DATENBEARBEITET2.txt', 'w') as outfile:
        for line in infile:
            # items() works on Python 2 and 3; iteritems() is Python 2 only
            for src, target in replacements.items():
                line = line.replace(src, target)
            outfile.write(line)
What the file looks like after the replacement:
Kommissionen und Delegationen
06.12.1999 - 30.11.2003
Begnadigungskommission (BeK-V)
;
What it should look like:
Kommissionen und Delegationen, 06.12.1999 - 30.11.2003, Begnadigungskommission (BeK-V);
After a long time of searching I came to ask here whether someone knows the right library or command to use for this kind of task. I'm really struggling to get to the next step.
Edit: Also, what were newlines before should turn into commas; see the sample output.
I assume you want to eliminate only the extra whitespace, since eliminating all of it would result in KommissionenundDelegationen,06... You can do that with strip() and join():
    replacements = {'Geboren am':';', 'Nato/a il':';', 'Né(e) le':';'}
    lines = []
    with open('DATEN2.txt') as infile, open('DATENBEARBEITET2.txt', 'w') as outfile:
        for line in infile:
            line = line.strip()
            if not line:
                continue  # skip lines that contain only whitespace
            # items() works on Python 2 and 3; iteritems() is Python 2 only
            for src, target in replacements.items():
                line = line.replace(src, target)
            lines.append(line)
        outfile.write(', '.join(lines))
This creates a list consisting of the lines that contain more than whitespace, with each line stripped of surrounding whitespace and with the appropriate replacements made. The list is then joined with a delimiter of ', '.
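Note that joining every line this way puts the whole file on one line and leaves a ", " in front of each ;. If each record should instead end at its ; and the next one start on a new line, as in the sample output, one possible variation is sketched below. It is only a sketch under a few assumptions: Python 3, a UTF-8 encoded file, and a marker line that contains nothing but the marker text, so that it becomes a bare ; after replacement. The names join_records and convert are made up for this illustration.

```python
replacements = {'Geboren am': ';', 'Nato/a il': ';', 'Né(e) le': ';'}


def join_records(lines, replacements, terminator=';'):
    """Yield one string per record.

    Assumption: a record ends at a line that, after replacement and
    stripping, consists of the terminator alone.
    """
    fields = []
    for line in lines:
        for src, target in replacements.items():
            line = line.replace(src, target)
        line = line.strip()
        if not line:
            continue  # skip lines that contain only whitespace
        if line == terminator:
            # close the current record and start a new one
            yield ', '.join(fields) + terminator
            fields = []
        else:
            fields.append(line)
    if fields:
        # trailing data that was never closed by a terminator
        yield ', '.join(fields)


def convert(in_path, out_path):
    # The encoding is an assumption; adjust it to match the actual file.
    with open(in_path, encoding='utf-8') as infile, \
            open(out_path, 'w', encoding='utf-8') as outfile:
        for record in join_records(infile, replacements):
            outfile.write(record + '\n')
```

Calling convert('DATEN2.txt', 'DATENBEARBEITET2.txt') would then write one record per line. If the marker lines carry extra text after the marker, the terminator check would need to be loosened accordingly.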