I'm trying to parse a text file in which each line holds fields separated by semicolons, like this:
4037;HKO_2005;OBJECT-ORIENTED PROGRAMMING ;18.12.2011;5
4037;HKO_2009;DATABASES I ;2.5.2011;5
4037;HKO_2011;ALGORITHMS I ;7.5.2011;5
4037;HKO_2038;PROGRAMMING BASICS IN JAVA ;22.5.2010;5
to a list of lists like this:
['4037', 'HKO_2005', 'OBJECT-ORIENTED PROGRAMMING', '18.12.2011', '5'],
['4037', 'HKO_2009', 'DATABASES I', '2.5.2011', '5'],
['4037', 'HKO_2011', 'ALGORITHMS I', '7.5.2011', '5'],
['4037', 'HKO_2038', 'PROGRAMMING BASICS IN JAVA', '22.5.2010', '5']
Right now the code I'm using for testing looks like this:
class Main:
    def inputFile(self):
        with open('data.txt', 'r') as data:
            self.stuff = data.readlines()
        self.separate = [elem.strip().split(';') for elem in self.stuff]
        print(self.separate)

justdoit = Main()
justdoit.inputFile()
My problem is what you already saw: the text file didn't appear to have double newlines until I pasted it here. With my code, each blank line returned by readlines() becomes a list containing a single empty string, like this:
['4037', 'HKO_2005', 'OBJECT-ORIENTED PROGRAMMING ', '18.12.2011', '5'],
[''],
['4037', 'HKO_2009', 'DATABASES I ', '2.5.2011', '5'],
[''],
['4037', 'HKO_2011', 'ALGORITHMS I ', '7.5.2011', '5'],
[''],
['4037', 'HKO_2038', 'PROGRAMMING BASICS IN JAVA ', '22.5.2010', '5']
['']
I believe I can later strip the blanks from the course names with rstrip(), but the newlines are giving me a headache. Earlier I was getting an IndexError because of this and I had no idea the text file had double newlines. How can I effectively ignore or remove these extra newlines before the lists are created?
You can add a condition to the list comprehension so that lines which are empty after stripping are skipped (an empty string is falsy):
self.separate = [elem.strip().split(';') for elem in self.stuff if elem.strip()]
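If you also want to get rid of the trailing blanks in the course names in the same pass, here is a minimal sketch. The helper name parse_lines and the sample list are only illustrative and not part of your code; it assumes every non-blank line has the same semicolon-separated layout as in your example.

```python
def parse_lines(lines):
    """Split semicolon-separated lines into lists of fields, skipping blank lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            # The line was empty or whitespace only, so skip it.
            continue
        # Strip each field so 'DATABASES I ' becomes 'DATABASES I'.
        rows.append([field.strip() for field in line.split(';')])
    return rows


# Illustrative input that mimics what readlines() returns for the file.
sample = [
    "4037;HKO_2005;OBJECT-ORIENTED PROGRAMMING ;18.12.2011;5\n",
    "\n",
    "4037;HKO_2009;DATABASES I ;2.5.2011;5\n",
    "\n",
]

print(parse_lines(sample))
```

In your class you could pass self.stuff (or the open file object itself, since iterating over a file yields its lines) to a function like this instead of using the list comprehension.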