python data-extraction

How to extract a range of data from my special data structure in Python


I have a data file with a special structure similar to the one below:

#F A 1 1 1 3 3 2
2 1 0.002796 0.000005 0.000008 -4.938531 1.039083
3 1 0.002796 0.000005 0.000007 -4.938531 1.039083
4 0 0.004961 -0.000008 -0.000002 -4.088534 0.961486
5 0 0.004961 0.000006 -0.000002 -4.079798 0.975763

The first column is only a description and can be ignored. I want to (1) separate the lines whose second column is 1 from those whose second column is 0, then (2) extract the lines whose 5th number (in the first data line, 0.000008) falls in a specific range, take the 6th number of each such line (in that example, -4.938531), average all the captured 6th values, and finally write the results to a new file. I wrote the code below, which does not yet cover the first task and also does not work. Could anyone help me debug it or suggest a different method?

A=0.0 #to be used for separating data as mentioned in the first task
B=0.0 #to be used for separating data as mentioned in the first task
with open('inputdatafile') as fin, open('outputfile','w') as fout:
 for line in fin:
    if line.startswith("#"):
        continue
    else:
        col = line.split()
        6th_val=float(col[-2])
        2nd_val=int(col[1])
        if (str(float(col[6])) > 0.000006 and str(float(col[6])) < 0.000009):
            fout.write(" ".join(col) + "\n")
        else:
            del line

Solution

    1. Variable names in Python can't start with a digit, so change 6th_val to val_6 and 2nd_val to val_2.
    2. str(float(col[6])) produces a string, which can't be compared with the float 0.000006, so change every str(float(...)) > xxx to float(...) > xxx. Also note that in these 7-field rows col[6] is the last field; the 5th number is col[-3].
    3. You don't have to delete line; the garbage collector takes care of it, so remove 'del line'.
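
To illustrate point 2, here is a minimal sketch, assuming Python 3 (in Python 2 the mixed comparison does not raise, but its result does not reflect the numeric values). The names `text`, `comparison_failed`, `value`, and `in_range` are only for illustration:

```python
text = "0.000008"  # a field as returned by line.split()

# Comparing the string form against a float raises TypeError in Python 3.
try:
    str(float(text)) > 0.000006
    comparison_failed = False
except TypeError:
    comparison_failed = True

# Converting once to float and comparing numbers works as intended.
value = float(text)
in_range = 0.000006 < value < 0.000009
```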

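      A=0.000006  # lower bound for the 5th value
      B=0.000009  # upper bound for the 5th value
      S=0.0       # running sum of the captured 6th values
      C=0         # number of captured lines
      with open('inputdatafile') as fin, open('outputfile','w') as fout:
        for line in fin:
          if line.startswith("#"):
            continue
          else:
            col = line.split()
            if col[1] == '1':
              val_6=float(col[-2])
              val_5=float(col[-3])  # float, not int: the field looks like 0.000008
              if val_5 > A and val_5 < B:
                fout.write(" ".join(col) + "\n")
                S += val_6
                C += 1
        if C:  # avoid dividing by zero when no line matched
          fout.write("Average 6th: %f\n" % (S/C))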
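
The code above only handles the lines whose second column is 1. As a possible way to cover the first task as well, the sketch below groups the matching lines by the value of the second column and averages the 6th value per group. The function name `average_sixth_by_flag` and its parameters are hypothetical, and it assumes every data row has seven whitespace-separated fields, as in the sample from the question:

```python
def average_sixth_by_flag(lines, low, high):
    """Return {flag: (matching_lines, average_of_6th_values)} for flags '0' and '1'."""
    rows = {"0": [], "1": []}
    for line in lines:
        col = line.split()
        # Skip comment lines, blank or short rows, and rows with an unexpected flag.
        if line.startswith("#") or len(col) < 7 or col[1] not in rows:
            continue
        val_5 = float(col[4])  # 5th number, used for the range check
        val_6 = float(col[5])  # 6th number, the value to average
        if low < val_5 < high:
            rows[col[1]].append((line.strip(), val_6))

    result = {}
    for flag, matches in rows.items():
        values = [v for _, v in matches]
        # None signals that no line in this group fell inside the range.
        average = sum(values) / len(values) if values else None
        result[flag] = ([text for text, _ in matches], average)
    return result


# Sample rows taken from the question; the header line is skipped as a comment.
sample = [
    "#F A 1 1 1 3 3 2",
    "2 1 0.002796 0.000005 0.000008 -4.938531 1.039083",
    "3 1 0.002796 0.000005 0.000007 -4.938531 1.039083",
    "4 0 0.004961 -0.000008 -0.000002 -4.088534 0.961486",
    "5 0 0.004961 0.000006 -0.000002 -4.079798 0.975763",
]
result = average_sixth_by_flag(sample, 0.000006, 0.000009)
```

Because the function only iterates over `lines`, an open file object can be passed in place of the list, and the returned lines and averages can then be written to one output file per group.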