I'm trying to extract all instances of the number that follows the string PAX: in my data. The string PAX: is preceded by a string that starts with RCT. In the data below, I would be trying to extract 2.
Data originally as follows:
" T44-39 "
"RCT# 26798 PAX: 2"
"STORE# 6 TERMINAL# 3 ONLINE"
Code of first attempt was as follows:
with open("e-journal.txt","r") as rf:
    with open("e-journal_py output.txt","w") as wf:
        for line in rf:
            line = line.strip()
            if line.startswith('"RCT#'):
                pax = line.split()
                pax2 = pax[3]
                print (pax2)
However, each line started and ended with ", so I attempted to remove the " characters by revising the code. After using the replace function, print returns the following:
T44-39 \nRCT# 26798 PAX: 2\nSTORE# 6 TERMINAL# 3 ONLINE\n
Second attempt at code is as follows:
with open("e-journal.txt","r") as rf:
    with open("e-journal_py output.txt","w") as wf:
        data = rf.read()
        data = data.replace('"','')
        with open(data) as data:
            for line in data:
                line = line.strip()
                if line.startswith("RCT"):
                    pax = line.split()
                    pax2 = pax[1]
The revised code removes the " at the beginning and end of each line, but it also returns the content of the entire text file. In other words, the startswith check does not return the number of PAX. How do I revise the code to return the number that follows the string PAX?
Also, given that there is no code to print, I'm not sure what prompted the code to return the entire data set.
Your first attempt was the most sensible. It already returned 2", so all you had to do was remove the trailing ".
You can use the rstrip string method to do that. Simply change
pax2 = pax[3]
to
pax2 = pax[3].rstrip('"')
or, if you want to treat it as an integer instead of a string, wrap it in int():
pax2 = int(pax[3].rstrip('"'))
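Putting it together, here is a minimal sketch of the whole loop, assuming every line looks like the three sample lines in the question. The names extract_pax and sample are made up for illustration. It strips the surrounding quotes before testing the line, and it looks up the token after PAX: rather than relying on a fixed index. It also hints at why the second attempt probably dumped the whole file: open(data) treats the file's text as a file name, and the resulting error message echoes that name back. Looping over the lines directly avoids this.

```python
def extract_pax(lines):
    """Return the PAX number from every line that starts with RCT#."""
    results = []
    for line in lines:
        # Drop surrounding whitespace first, then the enclosing double quotes.
        line = line.strip().strip('"').strip()
        if line.startswith("RCT#"):
            fields = line.split()
            # Take the token right after "PAX:" instead of a hard-coded index.
            if "PAX:" in fields[:-1]:
                results.append(int(fields[fields.index("PAX:") + 1]))
    return results


# Sample lines copied from the question, including the quotes and newlines.
sample = [
    '" T44-39 "\n',
    '"RCT# 26798 PAX: 2"\n',
    '"STORE# 6 TERMINAL# 3 ONLINE"\n',
]
print(extract_pax(sample))  # prints [2]
```

To run it on the real file, pass the open file object instead of the sample list, for example extract_pax(rf) inside the with open("e-journal.txt","r") as rf: block, since iterating over a file yields its lines. Note that int() raises ValueError if the token after PAX: is not a number.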