Search code examples
pythonstringreplacesubstringpattern-matching

Alter data with pattern matching for python strings


I'm doing my bachelor thesis and I need to visualize data with python / jupyter. Now I'm facing the problem that the .txt file I got is nearly executable in python but not completely.

What I'm doing now is just reading this whole file as a string and after that I manipulate is such that it is executable with exec().

Yes I now, that's critical but since I am working on a virtual machine and I now that this data is not malicious I decided to do it like this.

The remaining problem is that I get lists like this:

sample = [[0 0 0 0 0]
          [0 0 0 0 0]
          [0 0 0 0 0]
          [0 0 1 0 0]]

and I need to add a comma to get a list like:

sample = [[0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 1, 0, 0]]

I found re.sub() which seems helpful but I'm wondering how. The 'lists' are now stored as a string.

I would be grateful for any advice :)

Hints for the correct function to use or maybe the use of this function would be very helpful.


Solution

  • Assuming the file contents are in a string called file_contents, you can use

    file_contents = re.sub(r'](?!=])', '],', file_contents)
    

    to add a comma after each ] not followed by another ]. Similarly,

    file_contents = re.sub(r'(?<=\d) (?=\d)', ', ', file_contents)
    

    will add commas to the spaces surrounded by digits.