
How do I get my dictionary to print out more values from my regex loop instead of printing only the last one?


I'm still a beginner coder, and I have been using Python to learn about regexes and to write the results to .txt files.

This is what I have so far:


    bizarre_protein = "MTLWARPSSKRGWYWHIRSSSHEEEGYFVWEEPSTLAVSFLYCWHIPSWHATSWHIRSSSRVADEGWRAPSPLYW"
    import re
    pattern = re.compile("[W][A-Z][A-Z][P|R|N][S]{1}")

    for m in re.finditer(pattern, bizarre_protein):
      print(m.start(),m.end(),m.group(0))

    #start with pattern find W then add 2 A-Z, P|R|N and the S

    some_protein = {"motif_start": [m.start(), m.start(), m.start(), m.start(), m.start()], "motif_sequence":[m.group(0), m.group(0), m.group(0), m.group(0), m.group(0)]}
    text_lines = [ ]
    text_line = "index\t"

    for column in some_protein.keys():
      text_line = text_line + column + "\t"
      print(text_line)
    text_lines.append(text_line)

    for i in range(0,len(some_protein[column])):
      text_line= str(i) + "\t"
      for column in some_protein.keys():
        text_line += str(some_protein[column][i])
        text_line += "\t"
        print(text_line)
      text_lines.append(text_line)

    out_handle = open("bizarre_protein.txt","w")
    for line in text_lines:
      line = line.rstrip("\t")
      print(line)
      line = line + "\n"
      ignoreme = out_handle.write(line)

    ignoreme = out_handle.close()

This is the result I get. It does write to the .txt file I created, but I need it to output all the rows (from 3, WARPS through 66, WRAPS), not just the last one. I have tried quite a few things, but none of them have worked. How do I get it to list all of the rows instead of just the last one? Thanks in advance.

3 8 WARPS
14 19 WHIRS
29 34 WEEPS
43 48 WHIPS
53 58 WHIRS
66 71 WRAPS
# this is what I need in the txt file ^

index   motif_start motif_sequence  
0   66  WRAPS
1   66  WRAPS
2   66  WRAPS
3   66  WRAPS
4   66  WRAPS

# this is all I get ^
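A minimal sketch of why every row shows 66 and WRAPS: a `for` loop does not store its values anywhere, and once it finishes, the loop variable `m` still refers to the last match. Every `m.start()` and `m.group(0)` in the dictionary line is therefore evaluated against that same final match. The sequence and pattern below are copied from the question.

```python
import re

bizarre_protein = "MTLWARPSSKRGWYWHIRSSSHEEEGYFVWEEPSTLAVSFLYCWHIPSWHATSWHIRSSSRVADEGWRAPSPLYW"
pattern = re.compile("[W][A-Z][A-Z][P|R|N][S]{1}")

for m in re.finditer(pattern, bizarre_protein):
    pass  # nothing is stored; each iteration just rebinds m to the next match

# After the loop, m is simply the last match object
print(m.start(), m.group(0))  # 66 WRAPS

# So this builds five copies of the same value, which is what the table shows
starts = [m.start(), m.start(), m.start(), m.start(), m.start()]
print(starts)  # [66, 66, 66, 66, 66]
```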

Solution

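  • Is this the result you expect? Your dictionary ends up with five copies of the last match because `m` is only read after the `for` loop has finished, when it refers to the final match (66, WRAPS). Writing each row inside the loop avoids that: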

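    import re
    
    bizarre_protein = "MTLWARPSSKRGWYWHIRSSSHEEEGYFVWEEPSTLAVSFLYCWHIPSWHATSWHIRSSSRVADEGWRAPSPLYW"
    # W, then exactly two letters, then P, R or N, then S.
    # Inside [...] a "|" is a literal character, so use [PRN] rather than [P|R|N].
    pattern = re.compile(r"W[A-Z]{2}[PRN]S")
    
    with open("bizarre_protein.txt", "w") as f:
        f.write("index\tmotif_start\tmotif_sequence\n")
        # Write each match inside the loop so every match becomes its own row
        for i, m in enumerate(pattern.finditer(bizarre_protein)):
            print(m.start(), m.end(), m.group(0))
            f.write("{}\t{}\t{}\n".format(i, m.start(), m.group(0)))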
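To keep the dictionary layout from the question, the same idea applies: start with empty lists, append each match's values inside the loop, and only then build the table rows from those lists. This is a minimal sketch that reuses the question's `some_protein`, `text_lines` and `out_handle` names; the `columns` variable and joining the fields with `"\t".join` are illustrative choices, not taken from the question.

```python
import re

bizarre_protein = "MTLWARPSSKRGWYWHIRSSSHEEEGYFVWEEPSTLAVSFLYCWHIPSWHATSWHIRSSSRVADEGWRAPSPLYW"
pattern = re.compile(r"W[A-Z]{2}[PRN]S")

# Start with empty lists and append to them inside the loop,
# so every match is recorded rather than only the last one.
some_protein = {"motif_start": [], "motif_sequence": []}
for m in pattern.finditer(bizarre_protein):
    some_protein["motif_start"].append(m.start())
    some_protein["motif_sequence"].append(m.group(0))

# Header row first, then one tab-separated row per match
columns = list(some_protein.keys())
text_lines = ["\t".join(["index"] + columns)]
for i in range(len(some_protein["motif_start"])):
    row = [str(i)] + [str(some_protein[c][i]) for c in columns]
    text_lines.append("\t".join(row))

# The with block closes the file automatically
with open("bizarre_protein.txt", "w") as out_handle:
    for line in text_lines:
        print(line)
        out_handle.write(line + "\n")
```

Running it prints a header plus six rows, indexed 0 to 5, one for each match listed in the question, and writes the same lines to `bizarre_protein.txt`.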