
How to get the length of the column #3? Python


I'm super new to this and honestly don't understand that much. Can someone help me write code to get the sum of column #3? Sorry if this is too silly; I hope you can help me. Thanks.

It's a tab file.

#Open file (must be a .tab file)

file = open("chromosome_length.tab")

#According to the READ ME file, chromosome 17 is the mitochondrial chromosome.

##Print line 17

lines_to_print = [16]

for index, line in enumerate(file):
  if ( index in lines_to_print):
    print("Mitochondrial chromosome:")
    print(line)

#How long are the chromosomes?

with open("chromosome_length.tab") as f:
    lines = f.read().split('\n')

values = [int(i.split()[2]) for i in lines]
print(sum(values))

#Error:

Traceback (most recent call last):
  File "/Users/vc/Downloads/assig.py", line 19, in <module>
    values = [int(i.split()[2]) for i in lines]
  File "/Users/vc/Downloads/assig.py", line 19, in <listcomp>
    values = [int(i.split()[2]) for i in lines]
IndexError: list index out of range

Process finished with exit code 1

FILE:

3   NC_001135   316620
4   NC_001136   1531933
5   NC_001137   576874

Solution

  • You can do this:

    with open('chromosome_length.tab') as f:
        lines = f.read().split('\n')
    
    values = [int(i.split()[2]) for i in lines if i]
    print(sum(values))
    

    Explanation:

    We open the file chromosome_length.tab in read mode, read all the text, and split it on newlines (\n).
    At this point, we have something like this in our lines list:

    [
        "1 NC1234 1234",
        "2 NC4321 5678",
        ...
    ]
    

    To get the 3rd column of each line, we iterate through each line in lines and split it on whitespace (split() with no arguments splits on spaces and tabs alike), which gives ["1", "NC1234", "1234"]. We then take the 3rd column with index [2] and convert it to int.

    So, we have all the values in our values list: [1234, 5678, ...]

    Finally, we use the built-in function sum() to add up the values in the values list and print the total.
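
For illustration, the steps above can be traced on a small in-memory sample. This is only a sketch: sample_text is a hypothetical stand-in built from the three rows shown in the question, and it assumes the columns are tab-separated. The real script reads the text from chromosome_length.tab instead.

```python
# Hypothetical sample: the three rows from the question, assumed tab-separated.
sample_text = "3\tNC_001135\t316620\n4\tNC_001136\t1531933\n5\tNC_001137\t576874"

# Step 1: split the text into lines.
lines = sample_text.split('\n')

# Step 2: split one line on whitespace to see its columns.
first_fields = lines[0].split()  # ['3', 'NC_001135', '316620']

# Step 3: take the 3rd column (index 2) of every non-empty line as an int.
values = [int(line.split()[2]) for line in lines if line]

# Step 4: add the values up.
total = sum(values)
print(values)
print(total)
```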


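    UPD: The problem was the empty string '' at the end of the lines list, which appears when the text ends with a newline. Calling split() on that empty string returns an empty list, so [2] raises IndexError. Adding the filter if i to the list comprehension solves this issue.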
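
A small sketch of where the empty string comes from, using made-up sample rows rather than the real file. The helper sum_column is a hypothetical name introduced here for illustration; it uses line.strip() instead of a bare truthiness check so that whitespace-only lines are skipped as well.

```python
# Made-up sample text that ends with a newline, as many files do.
text = "1\tNC1234\t1234\n2\tNC4321\t5678\n"

# Splitting on '\n' leaves an empty string after the final newline.
lines = text.split('\n')

# An empty string has no fields, so indexing [2] on this would raise IndexError.
empty_fields = ''.split()


def sum_column(lines, column=2):
    """Sum an integer column, skipping blank or whitespace-only lines."""
    return sum(int(line.split()[column]) for line in lines if line.strip())


total = sum_column(lines)
print(lines)
print(total)
```

Because iterating over an open file yields its lines one at a time, the same helper could also be given the file object directly inside a with open(...) block, which avoids reading the whole file into memory first.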


    Hope that helps :)