
How to write "between" and "until" to extract integers from a text file?


I have a lot of lines like below:

_:9:_:SER _ 9 SER S 3 0.000 66.878  
_:11:_:LEU _ 11 LEU E 8 0.000 67.168    
_:108:_:ARG _ 108 ARG   1 0.000 62.398  

Each item is separated by a space. Because different lines have different numbers (e.g. 9, 11, 108), the subsequent values are not aligned at the same column positions. In the 3rd line, there are 3 spaces between ARG and 1 because this line lacks the single-letter item (S or E in the other lines), and a space is used to fill its place.

I need to extract two pieces of information

  1. the 9, 11, 108 values at 1st, 2nd and 3rd lines, respectively

  2. the 3, 8, 1 values (before the 0.000) at 1st, 2nd and 3rd lines, respectively

I want to use python to write a generalised script to extract that information, instead of a lengthy if-then loop to consider one-digit, two-digit and three-digit cases individually.

My idea is something like this:

  1. extract the integer values between the first and the second :

  2. extract the integer values after the 5th space, until another space is detected after that value.


Solution

  • If you can get each line as a string, you can do this:

    your_line = "_:108:_:ARG _ 108 ARG   1 0.000 62.398"
    splitted = your_line.split()
    # splitted = ['_:108:_:ARG', '_', '108', 'ARG', '1', '0.000', '62.398']
    

    Essentially, split() makes a list of strings from the original string, cut at the delimiter you pass in. If you don't give split() an argument, it splits on any run of whitespace, so the three spaces in this line count as a single separator.

    Now you can easily extract the information you want:

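    info1 = int(splitted[2])   # 108
    # Count from the end: splitted[4] would be 'S' or 'E' on lines that have that item.
    info2 = int(splitted[-3])  # 1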
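
The "between the first and the second :" idea from the question can also be written directly as a regular expression. The following is only a sketch. It assumes that every line ends with the same three items as the samples (the wanted integer, then two decimal numbers), and the names `FIRST_FIELD`, `parse_line`, `first_number` and `second_number` are made up for illustration.

```python
import re

# Integer between the first and the second colon, e.g. "_:108:_:ARG" -> "108".
FIRST_FIELD = re.compile(r"^[^:]*:(\d+):")


def parse_line(line):
    """Return (first_number, second_number) for one line of the file."""
    match = FIRST_FIELD.match(line)
    fields = line.split()  # splits on any run of whitespace
    if match is None or len(fields) < 3:
        raise ValueError(f"unexpected line format: {line!r}")
    first_number = int(match.group(1))
    # Assumption: the last three items are always present, so the wanted
    # integer is third from the end even when the S/E item is missing.
    second_number = int(fields[-3])
    return first_number, second_number


lines = [
    "_:9:_:SER _ 9 SER S 3 0.000 66.878  ",
    "_:11:_:LEU _ 11 LEU E 8 0.000 67.168    ",
    "_:108:_:ARG _ 108 ARG   1 0.000 62.398  ",
]
results = [parse_line(line) for line in lines]
print(results)  # [(9, 3), (11, 8), (108, 1)]
```

To run this over a file, loop over the open file object instead of the `lines` list and skip blank lines before calling `parse_line`.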