I have a txt file as follows,
#onetwothree.txt
>one
QWERTYUIOP
>two
ASDFGHJKL
>three
ZXCVBNM
...
and I want to split that txt file into several files as follows,
#one.txt
>one
QWERTYUIOP
and
#two.txt
>two
ASDFGHJKL
and
#three.txt
>three
ZXCVBNM
Here is the code I wrote:
import re
with open("onetwothree.txt") as file:
    name=re.findall(r'\>[^\n]+',file.read())
    sequence=re.findall(r'name[ind][^/n]+' for ind in enumerate(name), file.read())
.
.
.
I know that there is something wrong with the following part:
sequence=re.findall(r'name[ind][^/n]+' for ind in enumerate(name), file.read())
I want to make a list using re.findall and enumerate, and the following list is what I want to get:
>>>print (seq)
["QWERTYUIOP","ASDFGHJKL","ZXCVBNM"]
How can I fix this line so that it works correctly?
sequence=re.findall(r'name[ind][^/n]+' for ind in enumerate(name), file.read())
First of all, you can't read a file twice using read(). The first call consumes the whole file, so the second call returns an empty string unless you rewind first with file.seek(0).
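A minimal sketch of that behaviour, assuming io.StringIO as a stand-in for the opened file so the snippet runs without onetwothree.txt on disk (the variable names are only illustrative):

```python
import io

# io.StringIO stands in for an open text file so the example is self-contained.
handle = io.StringIO(">one\nQWERTYUIOP\n")

first = handle.read()   # consumes everything up to the end of the stream
second = handle.read()  # the position is already at the end, so nothing is left

handle.seek(0)          # rewind to the beginning
third = handle.read()   # the full content is available again

print(repr(first), repr(second), repr(third))
```

In practice it is usually simpler to call read() once, store the result in a variable, and run every regex against that variable.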
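Also, I think you have misunderstood re.findall. It takes a pattern and a string (plus optional flags), so it cannot accept a generator expression as its first argument.
You can accomplish the task in one go, without calling findall twice, by capturing the name and the sequence as two groups: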
import re

s = '''>one
QWERTYUIOP
>two
ASDFGHJKL
>three
ZXCVBNM
''' # replace this with file.read()
# each parenthesised part of the regex is a group; findall returns one tuple per match
res = re.findall(r">([^\n]+)\n(\w+)", s)
print(res)
#[('one', 'QWERTYUIOP'), ('two', 'ASDFGHJKL'), ('three', 'ZXCVBNM')]
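To get from those tuples to the separate files and the seq list from the question, something along these lines might work. This is only a sketch: split_records is a hypothetical helper name, it assumes each sequence sits on a single line made of word characters, and it writes into a temporary directory so it can be run safely.

```python
import re
import tempfile
from pathlib import Path

def split_records(text, out_dir):
    """Write each '>name' record to <out_dir>/<name>.txt and return the sequences."""
    out_dir = Path(out_dir)
    sequences = []
    # Same pattern as above: group 1 is the header name, group 2 the sequence line.
    for name, seq in re.findall(r">([^\n]+)\n(\w+)", text):
        name = name.strip()  # drop stray whitespace around the header name
        (out_dir / f"{name}.txt").write_text(f">{name}\n{seq}\n")
        sequences.append(seq)
    return sequences

sample = ">one\nQWERTYUIOP\n>two\nASDFGHJKL\n>three\nZXCVBNM\n"

# A temporary directory is used here; pass a real folder to keep the files.
with tempfile.TemporaryDirectory() as tmp:
    seq = split_records(sample, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
    one_content = (Path(tmp) / "one.txt").read_text()

print(seq)
```

For the real file, read it once with open("onetwothree.txt") and pass the resulting string as text. If a sequence can span several lines, the second group of the pattern would need to be adjusted.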