There is this sed command that transliterates the ASCII quality codes of a FASTQ file into bar symbols:
sed -e 'n;n;n;y/!"#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJKL/▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇██████/' myfile.fastq
I have been looking for ways to do the same in Python, but I have not found a solution I can use. Maybe pysed or re.sub would work, but I do not even know how to write the ASCII characters in a string without Python mixing them up.
So, you want to transliterate the characters in the quality line (the 4th line of each record) of your FASTQ file?
You can use str.translate with a translation table built with str.maketrans:
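#!/usr/bin/env python3
# Triple quotes let the source string contain both ' and " without escaping.
lut = str.maketrans('''!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKL''',
                    '''▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇██████''')
with open('/path/to/fastq') as f:
    # Index 3 is the 4th line: the quality string of the first record.
    quality = f.readlines()[3].strip()
    print(quality.translate(lut))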
For a sample file from Wikipedia:
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
the Python script above will produce:
▁▁▁▂▁▁▁▁▂▂▂▂▂▂▁▁▁▂▂▂▁▁▁▁▁▂▃▃▂▂▂▂▂▂▁▁▂▂▂▂▄▄▇▇▇▆▆▆▆▆▆▇▇▇▇▇▇▇▄▄
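The script above prints only the quality line of the first record, whereas the sed command prints every line and transliterates each fourth one. A minimal sketch of that behaviour follows, assuming strict four-line records (no wrapped sequence or quality lines); the function name render_quality_bars and the demo record are made up for illustration.

```python
LUT = str.maketrans(
    '''!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKL''',
    '▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇██████',
)


def render_quality_bars(lines, lut=LUT):
    """Yield every line unchanged, except that each 4th line is transliterated.

    Assumes strict four-line FASTQ records (hypothetical helper, not a parser).
    """
    for index, line in enumerate(lines):
        line = line.rstrip('\n')
        if index % 4 == 3:  # 0-based index 3, 7, 11, ... is the quality line
            line = line.translate(lut)
        yield line


# Tiny made-up record; any iterable of lines works, including an open file.
demo = ['@r1\n', 'ACGT\n', '+\n', '!5CL\n']
for out in render_quality_bars(demo):
    print(out)
# prints:
# @r1
# ACGT
# +
# ▁▄▇█
```

For a real file, something like `with open(path) as f: ...render_quality_bars(f)` should work, since a file object iterates line by line and never has to be read into memory whole.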
However, note that according to the FASTQ format description on Wikipedia, your translation table is incorrect. The character ! represents the lowest quality while ~ is the highest (not L as you have).
Also note that quality value characters directly map the ASCII character range !-~ to the quality value. In other words, we can build the translation table programmatically:
span = ord('█') - ord('▁') + 1
src = ''.join(chr(c) for c in range(ord('!'), ord('~')+1))
dst = ''.join(chr(ord('▁') + span*(ord(c)-ord('!'))//len(src)) for c in src)
lut = str.maketrans(src, dst)
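As a sketch of how this could be packaged, the same construction can be wrapped in a function whose upper bound is adjustable. The name build_quality_lut and its parameters are hypothetical; indexing into a string of the eight bar characters is equivalent to the code-point arithmetic above because the bars occupy consecutive code points.

```python
def build_quality_lut(lowest='!', highest='~', bars='▁▂▃▄▅▆▇█'):
    """Map the ASCII range lowest..highest evenly onto the given bar characters."""
    src = ''.join(chr(c) for c in range(ord(lowest), ord(highest) + 1))
    # Integer scaling: position i of len(src) lands in one of len(bars) buckets.
    dst = ''.join(bars[len(bars) * i // len(src)] for i in range(len(src)))
    return str.maketrans(src, dst)


lut = build_quality_lut()
print('!5I~'.translate(lut))  # prints ▁▂▄█
```

With the full !-~ range, a typical high score such as I only reaches the fourth bar. If your data is known to stay within a narrower range, passing a lower ceiling (for example highest='J', an assumption about your instrument rather than part of the format) spreads the bars over the scores that actually occur.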