python floating-point type-conversion tensor scientific-notation

How can I parse a text file with numbers in scientific notation (in tensor format) and convert them to floats?


I have multiple .txt files in this format:

[tensor([[1.7744e+02, 4.7730e+02, 1.2396e+02, 1.1678e+02, 5.9988e-01],
         [7.8410e+02, 1.7532e+02, 6.2769e+02, 2.1083e+02, 9.9969e-01],
         device='cuda:0')]

I want to remove tensor, the [] and () characters, and device='cuda:0', and convert the scientific notation to decimal so that the output looks like this:

177.44 477.30 123.96 116.78 0.59988
784.10 175.32 627.69 210.83 0.99969

This is my program:

for i in os.listdir():
    if i.endswith(".txt"):
        with open(i, "r+") as f:
            content = f.readlines()

            f.truncate(0)
            f.seek(0)

            for line in content:
                if not line.startswith("[tensor(["):
                    f.write(line)
                elif not line.startswith('        '):
                    f.write(line)
                elif not line.startswith("device='"):
                    f.write(line)

The tensor text is gone, but all the other characters remain. How can I remove them (and also the whitespace at the beginning of each line)?


Solution

  • Hi, you can leverage numpy.matrix's ability to parse a string laid out like an array into a matrix. If you need an array rather than a matrix, convert the result with numpy.array.

    import numpy as np

    # Data definition
    data = """[tensor([[1.7744e+02, 4.7730e+02, 1.2396e+02, 1.1678e+02, 5.9988e-01],
             [7.8410e+02, 1.7532e+02, 6.2769e+02, 2.1083e+02, 9.9969e-01],
             device='cuda:0')]"""

    # Cleaning step: strip newlines, spaces, the tensor wrapper and the device suffix
    elementsToRemove = ['\n', ' ', '[tensor(', 'device=', "'cuda:0')"]

    cleanData = data
    for el in elementsToRemove:
        cleanData = cleanData.replace(el, '')

    # np.matrix drops the brackets in a string and splits rows on ';',
    # so mark the row boundary explicitly; otherwise all ten values
    # end up in a single row
    cleanData = cleanData.replace('],[', ';')

    # Convert to numeric using np.matrix, then to a plain array
    numericData_matrix = np.matrix(cleanData)
    numericData_array = np.array(numericData_matrix)
    

    Hope this solves your problem!
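
If you would rather not list every string to strip, the numbers can also be pulled out with a regular expression and written back as plain decimals. The following is only a minimal sketch using the standard library. It assumes every value is printed in scientific notation and that each tensor row sits on its own line, as in the sample from the question. The names SCI_NUMBER, extract_rows, to_decimal_text and convert_folder are made up for this example.

```python
import re
from decimal import Decimal
from pathlib import Path

# Matches tokens such as 1.7744e+02 or 5.9988e-01 (assumption: every value
# in the file is printed in scientific notation).
SCI_NUMBER = re.compile(r"[-+]?\d+(?:\.\d*)?[eE][-+]?\d+")


def extract_rows(text):
    """Return one list of number tokens (strings) per line that contains any."""
    rows = []
    for line in text.splitlines():
        tokens = SCI_NUMBER.findall(line)
        if tokens:  # a line like "device='cuda:0')]" has no numbers and is skipped
            rows.append(tokens)
    return rows


def to_decimal_text(text):
    """Rewrite each row as space-separated fixed-point numbers."""
    # Decimal keeps exactly the digits that were printed, so 4.7730e+02
    # becomes 477.30 rather than 477.3.
    return "\n".join(
        " ".join(format(Decimal(tok), "f") for tok in row)
        for row in extract_rows(text)
    )


def convert_folder(folder="."):
    """Overwrite every .txt file in `folder` with its cleaned-up rows."""
    for path in Path(folder).glob("*.txt"):
        path.write_text(to_decimal_text(path.read_text()) + "\n")


sample = """[tensor([[1.7744e+02, 4.7730e+02, 1.2396e+02, 1.1678e+02, 5.9988e-01],
         [7.8410e+02, 1.7532e+02, 6.2769e+02, 2.1083e+02, 9.9969e-01],
         device='cuda:0')]"""

print(to_decimal_text(sample))
# 177.44 477.30 123.96 116.78 0.59988
# 784.10 175.32 627.69 210.83 0.99969

# Plain floats, if the values are needed for further computation
values = [[float(tok) for tok in row] for row in extract_rows(sample)]
```

Note that convert_folder replaces the contents of the files in place, so try it on a copy of the folder first.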