I have a plain text file with the following contents:
@M00964: XXXXX
YYY
+
ZZZZ
@M00964: XXXXX
YYY
+
ZZZZ
@M00964: XXXXX
YYY
+
ZZZZ
and I would like to read this into a list, split into items according to the ID code @M00964, i.e.:
['@M00964: XXXXX
YYY
+
ZZZZ'
'@M00964: XXXXX
YYY
+
ZZZZ'
'@M00964: XXXXX
YYY
+
ZZZZ']
I have tried using
in_file = open(fileName,"r")
sequences = in_file.read().split('@M00964')[1:]
in_file.close()
but this removes the ID sequence @M00964. Is there any way to keep this ID sequence in?
As an additional question, is there any way of maintaining whitespace in a list (rather than having \n symbols)?
My overall aim is to read in this set of items, take the first 2, for example, and write them back to a text file maintaining all of the original formatting.
For your specific example, where every record is exactly four lines long, can't you just do something like the following:
with open(fileName, 'r') as in_file:
    lines = in_file.readlines()
# Each record is 4 lines long, so join every group of 4 lines into one string
new_list = [''.join(lines[i*4:(i+1)*4]) for i in range(len(lines) // 4)]
# The same records with the newline characters removed
list_no_n = [item.replace('\n', '') for item in new_list]
print(new_list)
print(list_no_n)
[EXPANDED FORM]
new_list = []
for i in range(len(lines) // 4):  # One iteration per group of 4 lines
    # Join four lines together into a string and add it to new_list
    new_list.append(''.join(lines[i*4:(i+1)*4]))
[Writing to new file]
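with open(filename, 'w') as output_file:  # filename should be a new output path, not the input file
    # The items in new_list keep their own '\n' characters, so writing them
    # unchanged preserves the original formatting.
    # Use new_list[:2] instead to write only the first two records.
    output_file.writelines(new_list)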
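If the records are not guaranteed to be exactly four lines long, another option is to keep the original split on the ID and simply put the ID back on the front of each piece. The following is only a sketch: the function name split_keep_id and the inline sample text are made up for illustration, and it assumes the string @M00964 never appears anywhere except at the start of a record.

```python
def split_keep_id(text, record_id="@M00964"):
    """Split text into records, keeping the leading ID on each record.

    split_keep_id is a hypothetical helper; it assumes record_id only
    ever occurs at the start of a record.
    """
    # str.split drops the separator, so prepend it to each piece again.
    # The first piece is whatever precedes the first ID (empty here), so skip it.
    return [record_id + chunk for chunk in text.split(record_id)[1:]]


# Sample text standing in for in_file.read()
sample = (
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
)

records = split_keep_id(sample)

# Take the first two records; joining them gives text ready to write to a file
first_two = "".join(records[:2])
```

The \n you see when printing the list is just how Python displays a newline character inside a string; the whitespace is still there, and passing first_two to output_file.write() produces real line breaks in the output file.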