Tags: python, spacing, punctuation

Spacing and pattern replacement


This is a two-part question:

Part 1

I want to reduce runs of multiple whitespace characters, and multiple paragraph breaks, to just one each.

Current code:

import re
# Read the input file
with open('input.txt', 'r') as file:
  inputfile = file.read()

# Replace extra whitespace with a single space.
#outputfile = re.sub(r'\s+', ' ', inputfile).strip()
outputfile = ' '.join(inputfile.split())

# Write outputfile
with open('output.txt', 'w') as file:
  file.write(outputfile)

Part 2:

Once the extra spaces are removed, I search for and replace spacing mistakes around punctuation.

For example, ' [ ' should become ' [':

Pattern1 = re.sub(' [ ', ' [', inputfile)

which throws an error:

raise error, v # invalid expression
error: unexpected end of regular expression

However, this works (for example, to join the words before and after a hyphen):

Pattern1 = re.sub(' - ', '-', inputfile)

I have many such punctuation cases to handle once the spacing issue is solved.

I don't want each pattern to operate on the output of the previous patterns.

Is there a better approach to trimming the spaces around punctuation?


Solution

  • For the first part, you can split the text on runs of newlines, collapse the whitespace within each line, and then join the lines back together with single newlines, like so:

    import re
    text = "\n".join(re.sub(r"\s+", " ", line) for line in re.split("\n+", text))
    print(text)
    

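    For the second part, you need to escape [ since it's a regex metacharacter (used to define character classes). An unescaped [ opens a character class that is never closed, which is why the pattern fails with "unexpected end of regular expression". Escape it like so: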

    import re
    text = re.sub(r"\[ ", "[", text)  # raw string so \[ reaches the regex engine unchanged
    text = re.sub(r" ]", "]", text)
    print(text)
    

    Note that you don't need to escape the ], because without a preceding unescaped [ it doesn't close a character class and so isn't special in this context.


    Alternatively, for the second part you don't need regular expressions at all: text = text.replace("[ ", "[").replace(" ]", "]") does the same job.
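
If you have many punctuation rules and don't want one rule to act on the output of another, one possible approach is to combine them into a single alternation and let re.sub call a function for each match. The sketch below is only an illustration under some assumptions: the REPLACEMENTS table, the function names collapse_whitespace and fix_punctuation, and the sample string are all hypothetical and not taken from the question. re.escape is used so that characters like [ are matched literally without hand-escaping.

```python
import re

# Hypothetical rule table (not from the original post): each key is a literal
# piece of mis-spaced punctuation, each value is its corrected form.
REPLACEMENTS = {
    " [ ": " [",
    " ] ": "] ",
    " - ": "-",
    " , ": ", ",
}

# re.escape() neutralises metacharacters such as "[" so every key is matched
# literally; longer keys go first so they win over shorter overlapping ones.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
)


def collapse_whitespace(text):
    """Collapse whitespace runs inside each line and squeeze newline runs to one."""
    lines = re.split(r"\n+", text.strip())
    return "\n".join(re.sub(r"\s+", " ", line).strip() for line in lines)


def fix_punctuation(text):
    """Apply every rule in one left-to-right pass over the original text."""
    return PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], text)


sample = "see  [  note 1 ]  for the well - known   case\n\n\nnext   paragraph"
cleaned = fix_punctuation(collapse_whitespace(sample))
print(cleaned)
```

With the sample above this prints "see [note 1] for the well-known case" followed by "next paragraph" on the next line. Because the text is scanned once, replaced text is never re-examined by another rule. The trade-off is that matches cannot overlap, so a space shared by two adjacent rules is consumed by whichever rule matches first.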