
Regular expression to pull a value from a line of an incoming file


I'm really new to both Python and regular expressions, but I have to use them for my internship. I am reading in a file and using regular expressions to pick out the important parts of it.

In particular I am having trouble with one line. In the file it looks like this:

  TOWNHOME_PTS_COST                price_per_household_lin_this_x

I want to extract the second string in the line, but the mix of letters and underscores is making it hard for me to match. I have tried running the following, but it doesn't seem to work. Additionally, I have been using the .group(1) method to pull values out of other files, but for this line it just returns a blank. I'm pretty sure these two issues are related.

import re

myString ="          TOWNHOME_PTS_COST         price_per_household_lin_this_x"
mapName = re.match(r"[\s]*TOWNHOME_PTS_COST[\s]*([a-z]||_)*", myString)
if(mapName):
    print("Found It!")
    print(mapName.group(0))
else:
    print("Not working")

Output:

Found It!
TOWNHOME_PTS_COST                price

I would like to get the entire second string, price_per_household_lin_this_x. I have also tried doubling up on ([a-z]||_)* and placing more * inside it, but every variation returns only price for the second string. Thanks for your help!
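To see what the original pattern actually matched, you can inspect both groups. This is a small sketch that reuses the question's string and pattern. The comments describe how Python's `re` handles an alternation that contains an empty branch. The variable name `broken` is just illustrative.

```python
import re

line = "          TOWNHOME_PTS_COST         price_per_household_lin_this_x"

# ([a-z]||_) has THREE alternatives: [a-z], an empty branch, and _.
# At the first "_", [a-z] fails and the empty branch succeeds without
# consuming anything, so the * loop stops right after "price".
broken = re.match(r"[\s]*TOWNHOME_PTS_COST[\s]*([a-z]||_)*", line)

print(broken.group(0).split()[-1])  # price
# group(1) keeps only the LAST repetition, which was the empty match.
print(repr(broken.group(1)))        # ''
```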


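Solution

In ([a-z]||_), each | starts a new alternative, so the group has three branches: [a-z], an empty branch, and _. At the first underscore the empty branch matches without consuming anything, which ends the * loop after price and leaves group(1) empty. Use one character class, [a-z_], and put the + inside the parentheses so that group(1) captures the whole name:

  • Code:

    import re
    
    myString = "          TOWNHOME_PTS_COST         price_per_household_lin_this_x"
    # [a-z_] is a single character class: any lowercase letter or an underscore.
    # The + sits inside the group, so group(1) holds the whole run of characters.
    mapName = re.match(r"\s*TOWNHOME_PTS_COST\s+([a-z_]+)", myString)
    if mapName:
        print("Found It!")
        print(mapName.group(1))
    else:
        print("Not working")
    

    Output:

    Found It!
    price_per_household_lin_this_x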
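Where the `+` sits relative to the parentheses matters once you read the value with `group(1)` instead of `group(0)`. The sketch below compares the two placements and also shows a regex-free alternative. The variable names are illustrative, and the `split()` version assumes the line always contains exactly two whitespace-separated tokens.

```python
import re

line = "          TOWNHOME_PTS_COST         price_per_household_lin_this_x"

# Quantifier OUTSIDE the group: the group is re-entered once per character,
# and group(1) keeps only the last repetition.
outside = re.match(r"\s*TOWNHOME_PTS_COST\s*([a-z_])+", line)
print(outside.group(1))  # x

# Quantifier INSIDE the group: the group wraps the whole run of characters.
inside = re.match(r"\s*TOWNHOME_PTS_COST\s*([a-z_]+)", line)
print(inside.group(1))   # price_per_household_lin_this_x

# No regex at all: split on whitespace (assumes exactly two tokens).
key, value = line.split()
print(value)             # price_per_household_lin_this_x
```

If the value might also contain digits or uppercase letters, `\w+` (letters, digits and underscore) or `\S+` (any run of non-whitespace characters) is a looser alternative to `[a-z_]+`.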