I'm very new to this and honestly don't understand much yet. Can someone help me write code to get the sum of column 3? Sorry if this is a silly question, I hope you can help me. Thanks!
It's a tab file.
#Open file (must be a .tab file)
file = open("chromosome_length.tab")
#According to the READ ME file, chromosome 17 is the mitochondrial chromosome.
##Print line 17
lines_to_print = [16]
for index, line in enumerate(file):
    if index in lines_to_print:
        print("Mitochondrial chromosome:")
        print(line)
#How long are the chromosomes?
with open("chromosome_length.tab") as f:
    lines = f.read().split('\n')
values = [int(i.split()[2]) for i in lines]
print(sum(values))
#Error:
Traceback (most recent call last):
  File "/Users/vc/Downloads/assig.py", line 19, in <module>
    values = [int(i.split()[2]) for i in lines]
  File "/Users/vc/Downloads/assig.py", line 19, in <listcomp>
    values = [int(i.split()[2]) for i in lines]
IndexError: list index out of range
Process finished with exit code 1
FILE:
3 NC_001135 316620
4 NC_001136 1531933
5 NC_001137 576874
You can do this:
with open('chromosome_length.tab') as f:
    lines = f.read().split('\n')
values = [int(i.split()[2]) for i in lines if i]
print(sum(values))
Explanation:
We open the file chromosome_length.tab in read mode, read all of its text, and split that text on the newline character (\n).
At this point, the lines list looks something like this:
[
"1 NC1234 1234",
"2 NC4321 5678",
...
]
To get the 3rd column of each line, we iterate through each line in lines and split it on whitespace (split() with no arguments also splits on tabs), which gives ["1", "NC1234", "1234"]. We then take the 3rd column with [2] and convert it to int.
So the values list now holds all the numbers: [1234, 5678, ...]
Finally, we use the built-in function sum() to add up the numbers in values and print the total.
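If you would rather not read the whole file into memory, here is a minimal sketch of the same idea that goes through the file line by line. The function name sum_third_column is just something made up for illustration, and it assumes every line with at least three whitespace-separated fields has an integer in the third one. The sample rows are the three shown in the question.

```python
import os
import tempfile


def sum_third_column(path):
    """Sum the integers in the 3rd whitespace-separated column of a file."""
    total = 0
    with open(path) as f:
        for line in f:
            fields = line.split()  # no argument: splits on tabs and spaces
            if len(fields) < 3:
                continue  # skip blank or too-short lines
            total += int(fields[2])
    return total


# Build a small sample file from the rows shown in the question.
sample = "3\tNC_001135\t316620\n4\tNC_001136\t1531933\n5\tNC_001137\t576874\n"
with tempfile.NamedTemporaryFile("w", suffix=".tab", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

print(sum_third_column(sample_path))  # 2425427
os.remove(sample_path)
```

With the real file you would just call sum_third_column("chromosome_length.tab"). Checking the number of fields, rather than only checking for an empty string, also skips lines that contain nothing but spaces.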
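UPD: The problem was the empty string '' at the end of the lines list, which typically appears when the file ends with a newline. Splitting '' gives an empty list, so [2] raises the IndexError. Adding the filter if i to the list comprehension solved this issue.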
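To see where that empty string comes from, here is a small self-contained demonstration. The sample text is made up from two rows of the file in the question, and it assumes the real file ends with a newline, as most text files do.

```python
# Two rows from the question, with a newline after the last row.
text = "3\tNC_001135\t316620\n4\tNC_001136\t1531933\n"

lines = text.split('\n')
print(lines)        # the last element is the empty string ''
print(''.split())   # [] (so ''.split()[2] raises IndexError)

# Skipping empty strings avoids the error.
values = [int(i.split()[2]) for i in lines if i]
print(sum(values))  # 1848553

# str.splitlines() does not produce the trailing empty string.
print(text.splitlines())
```

Note that a line containing only spaces would still get past if i. If that can happen in your file, if i.strip() is a stricter filter.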
Hope that helps :)