
Python startswith() returns the content of the entire text file


I'm trying to extract every number that follows the string PAX:. Each PAX: entry sits on a line that starts with RCT.

In the data below, the value I want to extract is 2.

Data originally as follows:

"                                 T44-39                                 "
"RCT# 26798                                                       PAX: 2"
"STORE# 6                    TERMINAL# 3                         ONLINE"

Code of first attempt was as follows:

with open("e-journal.txt","r") as rf:
    with open("e-journal_py output.txt","w") as wf:
        for line in rf:
            line = line.strip()
            if line.startswith('"RCT#'):
                pax = line.split()
                pax2 = pax[3]
                print (pax2)

However, each line starts and ends with ", so I revised the code to remove the " characters.

After using the replace function, print returns the following:

T44-39                                 \nRCT# 26798                                                       PAX: 2\nSTORE# 6                    TERMINAL# 3                         ONLINE\n                        

Second attempt at code is as follows:

with open("e-journal.txt","r") as rf:
    with open("e-journal_py output.txt","w") as wf:
        data = rf.read()
        data = data.replace('"','')
        with open(data) as data:
            for line in data:
                line = line.strip()
                if line.startswith("RCT"):
                    pax = line.split()
                    pax2 = pax[1]

The revised code removes the " at the beginning and end of each line, but it also outputs the content of the entire text file instead of the PAX number. How do I revise the code to return the number that follows the string PAX:?

Also, given that there is no code to print, I'm not sure what prompted the code to return the entire data set.


Solution

  • Your first attempt was the most sensible. It already returned 2", so all you had to do was to remove the trailing ".

    You can use the rstrip string method to do that. Simply change

    pax2 = pax[3]
    

    to

    pax2 = pax[3].rstrip('"')
    

    or if you want to treat it as an integer, instead of a string, add int() around it:

    pax2 = int(pax[3].rstrip('"'))
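
Putting the pieces together, here is a minimal sketch of the whole extraction, assuming the file layout shown in the question. The function name extract_pax and the inline sample string are hypothetical, introduced only for illustration. It strips the surrounding quotes before testing the line, and it looks up the token after PAX: rather than relying on a fixed index, so it should still work if the number of fields on the RCT# line changes.

```python
def extract_pax(text):
    """Return the PAX count from every RCT# line in `text` as a list of ints."""
    counts = []
    for line in text.splitlines():
        # Drop surrounding whitespace and the literal double quotes around each line.
        line = line.strip().strip('"').strip()
        if not line.startswith("RCT#"):
            continue
        fields = line.split()
        # Find the token after "PAX:" instead of assuming it is at index 3.
        if "PAX:" in fields[:-1]:
            # Assumes the value after "PAX:" is always a whole number.
            counts.append(int(fields[fields.index("PAX:") + 1]))
    return counts


# Sample data copied from the question (hypothetical stand-in for e-journal.txt).
sample = '''"                                 T44-39                                 "
"RCT# 26798                                                       PAX: 2"
"STORE# 6                    TERMINAL# 3                         ONLINE"
'''
print(extract_pax(sample))  # [2]
```

With the real file, you could call extract_pax(rf.read()) inside the existing with block. Iterating over text.splitlines() also avoids the open(data) call from the second attempt: open() expects a file name, so passing it the file's contents most likely raised an error whose message echoed the whole string, which would explain why the entire file appeared without any print call.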