I am new to Python (using Python 3.6). I have a read.txt file containing information about a firm. The file starts with several report characteristics:
CONFORMED PERIOD REPORT: 20120928 #this is 1 line
DATE OF REPORT: 20121128 #this is another line
and then starts all the text about the firm..... #lots of lines here
I am trying to extract both dates (['20120928','20121128']) and to flag whether certain strings occur in the text ('1' if the string is present, '0' otherwise). Ultimately, I want a list containing both dates followed by the 1s and 0s for the different strings, something like: ['20120928','20121128','1','0']. My code is the following:
import re

exemptions = []  # vector I want
with open('read.txt', 'r') as f:
    line2 = f.read()  # read the txt file
    for line in f:
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # add the line without "CONFORMED PERIOD REPORT", just the date
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", ""))  # idem above
var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
if len(var1) > 0:  # if the string appears, the list has length > 0
    exemptions.append('1')
else:
    exemptions.append('0')
var2 = re.findall("string2", line2, re.I)
if len(var2) > 0:
    exemptions.append('1')
else:
    exemptions.append('0')
print(exemptions)
If I run this code, I obtain ['1','0']: the dates are missing, but the string checks are correct (string1 exists, so '1'; string2 does not, so '0'). What I don't understand is why the dates are not reported. Importantly, when I change that line to "line2 = f.readline()", I obtain ['20120928','20121128','0','0']. The dates are fine now, but I know that string1 exists, so it seems the rest of the file is not being read. If I omit "line2 = f.read()" altogether, I get a 0 for each line of the file, apart from my desired output. How can I get rid of these 0s?
My desired output would be: ['20120928','20121128','1','0']
Sorry for bothering. Thank you anyway!
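The behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming a small in-memory stand-in for read.txt (the sample text and all variable names are hypothetical). It shows that read() leaves nothing for a later "for line in f" loop, and that readline() stores only the first line:

```python
import io

# Hypothetical stand-in for read.txt; io.StringIO behaves like an open text file.
sample = (
    "HEADER LINE\n"
    "CONFORMED PERIOD REPORT:\t20120928\n"
    "DATE OF REPORT:\t20121128\n"
    "body text\n"
)

f = io.StringIO(sample)
whole_text = f.read()                         # consumes everything; position is at the end
lines_after_read = [line for line in f]       # nothing is left to iterate over

f.seek(0)                                     # rewind to the start of the file
lines_after_seek = [line for line in f]       # all four lines are available again

f.seek(0)
first_line = f.readline()                     # consumes only the first line
lines_after_readline = [line for line in f]   # the remaining three lines

print(lines_after_read)           # []
print(len(lines_after_seek))      # 4
print(repr(first_line))           # 'HEADER LINE\n'
print(len(lines_after_readline))  # 3
```

This would explain both results: after read() the loop body never runs, so no dates are appended, and after readline() the string search only sees the first line of the file.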
In the end, I solved it as follows:
import re

exemptions = []  # vector I want
with open('read.txt', 'r') as f:
    line2 = ""  # create an empty string outside the "for line" loop
    for line in f:
        line2 = line2 + line  # append each line to the string created above
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # add the line without "CONFORMED PERIOD REPORT", just the date
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", ""))  # idem above
var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
if len(var1) > 0:  # if the string appears, the list has length > 0
    exemptions.append('1')
else:
    exemptions.append('0')
var2 = re.findall("string2", line2, re.I)
if len(var2) > 0:
    exemptions.append('1')
else:
    exemptions.append('0')
print(exemptions)
This is what I have so far. It works for me, although I guess using BeautifulSoup might make the code more efficient. Next step :)
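A more compact variant of the same idea is sketched below. It reads the text once, takes the dates from the split lines, and checks each string with re.search. The function name extract_flags, the sample text, and the assumption that each date follows the first colon on its line are all hypothetical, not part of the original code:

```python
import re


def extract_flags(text, patterns):
    """Return the report dates followed by '1'/'0' for each pattern (hypothetical helper)."""
    result = []
    for line in text.splitlines():
        if "CONFORMED PERIOD REPORT" in line or "DATE OF REPORT" in line:
            # Assumption: the date is whatever follows the first colon on the line.
            result.append(line.partition(":")[2].strip())
    for pattern in patterns:
        # re.search stops at the first match, which is enough for a yes/no flag.
        result.append('1' if re.search(pattern, text, re.I) else '0')
    return result


# Made-up sample text standing in for the contents of read.txt.
sample = (
    "CONFORMED PERIOD REPORT:\t20120928\n"
    "DATE OF REPORT:\t20121128\n"
    "Some text about the firm mentioning String1 here.\n"
)

print(extract_flags(sample, ["string1", "string2"]))
# ['20120928', '20121128', '1', '0']
```

With a real file, the text could be obtained once via "with open('read.txt', 'r') as f: text = f.read()" and then passed to the function, so the file is never read twice.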