python, python-2.7, iterator, fastq

Create dictionary using 3 lines as values


I have FASTQ files that I want to parse. Below is one example 'read' out of the thousands in each file:

@PSI179204_0037:4:1:2139:945#0/2
AGAGATCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGGGCGCGAGCCTGAACCAGCCAAGTAGCGTGAGGGACGACTGCCCTACGGGTTGTAAACCTCTTTTGTTCGGGAATAAAGTGCGGCACGCGTGCCGGTTTGTATGTCCCGTTCGAATAG
+PSI179204_0037:4:1:2139:945#0/2
ghhhhhhhhhhhfhdhhhfhhhhhgeeghhhdghfgheh[hhfhfhhhhehghffcahhhhfgcfgeaegd_ah_aaOa[a[aW___W^`a`b`da`ZXO]N^``BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB^C

My aim is to store them in a dictionary as shown below (each line has been shortened):

{'@PSI179204_0037:4': ['AGAGATCCTACG', '+PSI179204_0', 'ghhhhhhhhh']}

I saw on here that you can use a line as the key and then call next(filename) to use the following line as the value, so I tried this with three next(filename) calls, as shown in the code below:

file1 = open(inputfile1, 'r')
file2 = open(inputfile2, 'r')
File1dict = {}
File2dict = {}



for key in file1:
    File1dict[key.strip()] = next(file1) = next(file1) = next(file1)
    print (File1dict)

for key in file2:
    File2dict[key.strip()] = next(file2) = next(file2) = next(file2)    
    print (File2dict)

Currently I am getting the following error:

  File "Dict_maybesworking.py", line 31
    File1dict[key.strip()] = next(file1) = next(file1) = next(file1)
SyntaxError: can't assign to function call

Does anyone know how to make this code work or, if that is not possible, an alternative?

Whole script below:

from __future__ import print_function
from collections import defaultdict
from itertools import groupby
import argparse 
from itertools import izip

parser = argparse.ArgumentParser() # create the argument parser, as described in the Python argparse tutorial
parser.add_argument("-r1", type=str, action='store',  dest='input1', help="input the forward read file") # allows input of the forward read
parser.add_argument("-r2", type=str, action='store', dest='input2', help="input the reverse read file") # allows input of the reverse read
parser.add_argument("-v", "--verbose", action="store_true", help=" Increases the output, only needs to be used to provide feedback for debugging")
parser.add_argument("-u", type=str, action='store', dest='unique', help="Unique instrument number for fastq file required") # allows input of the unique instrument number
parser.add_argument("-n", action="count", default=0, help="Allows for up to 5 mismatches, Default is 0")
parser.add_argument("-o", "--output", help="Directs the output to a name of your choice")
args = parser.parse_args()

Uni = str(args.unique)
inputfile1 = str(args.input1)
inputfile2 = str(args.input2)
output = str(args.output)  
output_file= open(output, "w")
Unmatched_1 = open('Unmatched_1', "a")
Unmatched_2 = open('Unmatched_2', "a")
file1 = open(inputfile1, 'r')
file2 = open(inputfile2, 'r')
File1dict = {}
File2dict = {}



for key in file1:
    File1dict[key.strip()] = [file1.next(), file1.next(), file1.next()]
    print (File1dict)

for key in file2:
    File2dict[key.strip()] = [file2.next(), file2.next(), file2.next()] 
    print (File2dict)

Command line use:

python Dict_maybesworking.py -r1 Real_test_1 -r2 Real_test_2 -u PSI179204 -o file_result

Solution

  • Since file objects are iterable, you can iterate as you do now to get the key, then slice the next 3 lines from the same iterable to get the value, e.g.:

    from itertools import islice
    
    with open('file1') as fin:
        stripped_lines = (line.strip() for line in fin)
        f1dict = {key: list(islice(stripped_lines, 3)) for key in stripped_lines}
    

    Note that the for key in stripped_lines consumes one line at a time, but the list(islice(stripped_lines, 3)) then consumes the next 3 lines from the same generator, so that the next iteration of the for picks up the line after those, and so on.

    For example:

    >>> from itertools import islice
    >>> r = range(20)
    >>> i = iter(r)
    >>> {key: list(islice(i, 3)) for key in i}
    {0: [1, 2, 3], 8: [9, 10, 11], 4: [5, 6, 7], 12: [13, 14, 15], 16: [17, 18, 19]}
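
As for the SyntaxError itself: in a chained assignment such as a = b = c, everything to the left of the last = is an assignment target, and a call like next(file1) cannot be a target. Collecting the three lines into a list avoids that. Below is a minimal sketch of the same islice idea applied to FASTQ-shaped input. The helper name read_fastq_records and the two shortened reads are made up for illustration, and an in-memory StringIO stands in for a real file so the snippet runs on its own (it should work on both Python 2.7 and Python 3).

```python
from io import StringIO
from itertools import islice


def read_fastq_records(handle):
    """Map each header line to a list of the three lines that follow it.

    Hypothetical helper: assumes every record is exactly four lines long.
    """
    stripped = (line.strip() for line in handle)
    records = {}
    for header in stripped:
        if not header:
            continue  # skip blank lines between or after records
        # islice pulls the next three lines from the same generator,
        # so the outer loop resumes at the following header line.
        records[header] = list(islice(stripped, 3))
    return records


# Two shortened, made-up reads standing in for a real FASTQ file.
sample = StringIO(
    u"@read1\nAGAG\n+read1\nghhh\n"
    u"@read2\nTTCA\n+read2\nhhfh\n"
)
fastq_records = read_fastq_records(sample)
print(fastq_records["@read1"])
```

Because the lines are taken in fixed groups of four, a quality line that happens to start with @ is never mistaken for a header. With a real file, pass the object returned by open(inputfile1) instead of the StringIO.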