Search code examples
pythonpython-3.xlisttextslice

Python | Reformatting each line in a text file consistently


I have made my own corpus of misspelled words.

misspellings_corpus.txt:

English, enlist->Enlish
Hallowe'en, Halloween->Hallowean

I'm having an issue with my format. Thankfully, it is at least consistent.

Current format:

correct, wrong1, wrong2->wrong3

Desired format:

wrong1,wrong2,wrong3->correct
  • The order of wrong<N> isn't of concern,
  • There might be any number of wrong<N> words per line (separated by a comma: ,),
  • There's only 1 correct word per line (which should be to the right of ->).

Failed Attempt:

with open('misspellings_corpus.txt') as oldfile, open('new.txt', 'w') as newfile:
    for line in oldfile:
      correct = line.split(', ')[0].strip()
      print(correct)
      W = line.split(', ')[1].strip()
      print(W)
      wrong_1 = W.split('->')[0] # however, there might be loads of wrong words
      wrong_2 = W.split('->')[1]
      newfile.write(wrong_1 + ', ' + wrong_2 + '->' + correct)

Output new.txt (isn't working):

enlist, Enlish->EnglishHalloween, Hallowean->Hallowe'en

Solution: (Inspired by @alexis)

with open('misspellings_corpus.txt') as oldfile, open('new.txt', 'w') as newfile:
  for line in oldfile:
    #line = 'correct, wrong1, wrong2->wrong3'
    line = line.strip()
    terms = re.split(r", *|->", line)
    newfile.write(",".join(terms[1:]) + "->" + terms[0] + '\n')

Output new.txt:

enlist,Enlish->English
Halloween,Hallowean->Hallowe'en

Solution

  • Let's assume all the commas are word separators. I'll break each line on commas and arrows, for convenience:

    import re
    
    line = 'correct, wrong1, wrong2->wrong3'
    terms = re.split(r", *|->", line)
    new_line = ", ".join(terms[1:]) + "->" + terms[0]
    print(new_line)
    

    You can put that back in a file-reading loop, right?