python list file readfile

Problems reading files with letters and numbers


I'm trying to work with a file that contains a header, a set of numbers separated by double spaces, and some text at the end (as shown in the image below).

[Image: screenshot of the file]

My goal is to extract these numbers so that I can build a graph with them. Another problem is that the program's decimal separator is a comma, while Python uses a period.

I feel like this is pretty easy to do, but my stupidity limits me.


Solution

  • I can't give you a precise answer since we don't have access to the exact file, so I'm using an example file instead.

    I'm using this as sample.txt. Check the Pastebin here.

    So the code goes:

    # the with block closes the file automatically
    with open("sample.txt", "r") as f:
        file_lines = f.read().splitlines()
    header_lines = file_lines[1]
    # split takes the separator as its first argument and returns a list
    headers = header_lines.split("  ")
    numbers_line = file_lines[2]
    # strip removes spaces from the start and end of "                1  2 3"
    numbers_line = numbers_line.strip().split("  ")
    # in my example data starts at 4th line and ends at 8th line (inclusive)
    data_line_start = 4
    data_line_end = 8
    data_lines = file_lines[data_line_start-1:data_line_end]
    # clean up data_lines: remove spaces from the start and end of each line
    data_lines = [j.strip() for j in data_lines]
    # data_lines => DATA LINES
    # ['0.03592  0.04902  0.0248  0.0327  0.0520  0.0318', '0.0553  0.06602  0.0548  0.0232  0.0710  0.0782', '0.08413  0.04402  0.0348  0.0654  0.0612  0.0428', '0.0543  0.06202  0.0148  0.0732  0.0810  0.0882', '0.0443  0.04102  0.0343  0.0556  0.0652  0.0928']
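    # we still need to split each line using a double space as the separator
    data_array = []
    for data_line in data_lines:
        # replace swaps a decimal comma for a period so float() accepts it;
        # it changes nothing when the file already uses periods
        data_line_formatted = [float(k.replace(",", ".")) for k in data_line.split("  ")]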
        data_array.append(data_line_formatted)
    print("HEADERS")
    print(headers)
    print("NUMBERS LINE")
    print(numbers_line)
    print("DATA ARRAY")
    print(data_array)
    

    OUTPUT:

    HEADERS
    ['Plate:', 'PLate1', '1.3', 'PlateFormat', 'EndPoint', 'Absorbance', 'Reduced', 'FALSE', '1', '1', '410', '1', '12', '96', '1', '5']
    NUMBERS LINE
    ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
    DATA ARRAY
    [[0.03592, 0.04902, 0.0248, 0.0327, 0.052, 0.0318], [0.0553, 0.06602, 0.0548, 0.0232, 0.071, 0.0782], [0.08413, 0.04402, 0.0348, 0.0654, 0.0612, 0.0428], [0.0543, 0.06202, 0.0148, 0.0732, 0.081, 0.0882], [0.0443, 0.04102, 0.0343, 0.0556, 0.0652, 0.0928]]
    

    You can use the open() function to open the file, read it into a list of lines stored in the file_lines variable, and then use Python string methods to format the data. The script above might not work for your file as-is, but you can adapt it to your needs. Let me know if it helped.
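
If you'd rather not hardcode the start and end lines, here is a minimal sketch of another approach: keep only the lines where every token converts to a number. It also handles the decimal comma mentioned in the question. The function name parse_number_rows and the sample lines are made up for illustration, since the real file isn't available.

```python
def parse_number_rows(lines):
    """Return rows of floats from the lines that contain only numbers."""
    rows = []
    for line in lines:
        # split() with no argument splits on any run of whitespace,
        # so single spaces, double spaces, and tabs all work
        tokens = line.split()
        if not tokens:
            continue  # blank line
        try:
            # swap the decimal comma for a period before converting
            row = [float(tok.replace(",", ".")) for tok in tokens]
        except ValueError:
            # header or trailing text: at least one token is not a number
            continue
        rows.append(row)
    return rows


# hypothetical lines mimicking the layout described in the question
sample_lines = [
    "Plate:  Plate1  Absorbance",
    "0,03592  0,04902  0,0248",
    "0,0553  0,06602  0,0548",
    "",
    "Original Filename: test",
]
rows = parse_number_rows(sample_lines)
print(rows)
```

Note that a line made up entirely of numbers, such as a row of column indices (1 2 3 ...), would also be kept, so you may still need to drop it by position or by checking the row length.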