Tags: python, pandas, numpy, ascii, readlines

Reading ASCII data with non-uniform lines - Python


I am trying to read ASCII data with non-uniform lines, e.g.

 4  0.0790926412 -0.199457773  0.325952223  0.924105917  48915.3072 -2086.17061
  73540.4807 10
 4  0.0245689377 -0.805261448 -0.152373497  0.573006386 -39801.696  49084.2418
  16665.3857 10
 4  0.0427767979 -0.0185129676 -0.143135691 -0.989529911  38770.6518
 -70784.7024  32640.6307 10
 4  0.0262684678  0.137741 -0.820259709 -0.555158921  25293.3918 -51148.4003
 -126522.859 10
 4  0.145932295  0.466618154 -0.00805648931 -0.88442218  90951.8483  19221.4234
 -40205.3438 10
 4  0.0907820906  0.584060054 -0.671576188  0.455915866 -78193.2124 -31269.5848
  47260.338 10
 4  0.0794897928  0.654042761  0.537625452  0.532153117  24643.9195  39614.3788
  97184.4856 10
 4  0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439
 -13991.5163 10
 4  0.0295554749 -0.53757783 -0.3710939  0.757165368  20106.124 -171013.738
 -14052.1145 10
 4  0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726
 -53145.1259  36948.4058 10

The problem is that I need to read every two lines as one. I have tried pandas.read_csv and numpy.genfromtxt, but they read the lines separately, as independent rows. I also tried to merge every 2 lines, without success, because, as you can see, a record is sometimes split into 7 and 2 columns and sometimes into 6 and 3 columns. There are 9 columns in total to read.


Solution

  • Something like this should work.

    Put your data in a string, or in a file, and manipulate it with Python. Once the data is in the shape you want, load it with pandas.

    string1 = '''4  0.0790926412 -0.199457773  0.325952223  0.924105917  48915.3072 -2086.17061
      73540.4807 10
     4  0.0245689377 -0.805261448 -0.152373497  0.573006386 -39801.696  49084.2418
      16665.3857 10
     4  0.0427767979 -0.0185129676 -0.143135691 -0.989529911  38770.6518
     -70784.7024  32640.6307 10
     4  0.0262684678  0.137741 -0.820259709 -0.555158921  25293.3918 -51148.4003
     -126522.859 10
     4  0.145932295  0.466618154 -0.00805648931 -0.88442218  90951.8483  19221.4234
     -40205.3438 10
     4  0.0907820906  0.584060054 -0.671576188  0.455915866 -78193.2124 -31269.5848
      47260.338 10
     4  0.0794897928  0.654042761  0.537625452  0.532153117  24643.9195  39614.3788
      97184.4856 10
     4  0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439
     -13991.5163 10
     4  0.0295554749 -0.53757783 -0.3710939  0.757165368  20106.124 -171013.738
     -14052.1145 10
     4  0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726
     -53145.1259  36948.4058 10'''
    
    lines = string1.splitlines()
    result = ""
    for index, line in enumerate(lines):
        # Odd-indexed lines finish a record, so end the merged line there.
        if index % 2 != 0:
            result += line + "\n"
        else:
            result += line
    print(result)

    Output:

    4  0.0790926412 -0.199457773  0.325952223  0.924105917  48915.3072 -2086.17061  73540.4807 10
     4  0.0245689377 -0.805261448 -0.152373497  0.573006386 -39801.696  49084.2418  16665.3857 10
     4  0.0427767979 -0.0185129676 -0.143135691 -0.989529911  38770.6518 -70784.7024  32640.6307 10
     4  0.0262684678  0.137741 -0.820259709 -0.555158921  25293.3918 -51148.4003 -126522.859 10
     4  0.145932295  0.466618154 -0.00805648931 -0.88442218  90951.8483  19221.4234 -40205.3438 10
     4  0.0907820906  0.584060054 -0.671576188  0.455915866 -78193.2124 -31269.5848  47260.338 10
     4  0.0794897928  0.654042761  0.537625452  0.532153117  24643.9195  39614.3788  97184.4856 10
     4  0.0896920622 -0.517384933 -0.609729743 -0.600451889 -17455.9074 -17601.0439 -13991.5163 10
     4  0.0295554749 -0.53757783 -0.3710939  0.757165368  20106.124 -171013.738 -14052.1145 10
     4  0.0189505245 -0.773354757 -0.0747623556 -0.629549847 -71468.2726 -53145.1259  36948.4058 10
    

    Or if you read it from a file:

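    # Read the original file; the with-block closes it automatically.
    with open('/path/original.txt', 'r') as data:
        lines = data.read().splitlines()
    result = ""
    for index, line in enumerate(lines):
        # Odd-indexed lines finish a record, so end the merged line there.
        if index % 2 != 0:
            result += line + "\n"
        else:
            result += line
    # Write the merged records to a new file.
    with open('/path/new_data.txt', 'w') as new_data:
        new_data.write(result)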
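
Since the question says the break can fall after 6 or 7 columns but every record always has 9 values, another option is to ignore the line breaks entirely and regroup the values nine at a time. The following is only a rough sketch under that assumption: the helper name `read_wrapped_records`, its `n_columns` parameter, and the two-record `sample` are made up for illustration and are not part of the answer above.

```python
import io

N_COLUMNS = 9  # assumption taken from the question: 9 values per record


def read_wrapped_records(source, n_columns=N_COLUMNS):
    """Hypothetical helper: regroup whitespace-separated values into rows.

    `source` is any open text file or file-like object. Line breaks are
    treated like any other whitespace, so it does not matter where a
    record was wrapped.
    """
    tokens = source.read().split()
    if len(tokens) % n_columns != 0:
        raise ValueError(
            f"{len(tokens)} values cannot be split into rows of {n_columns}"
        )
    values = [float(token) for token in tokens]
    # Slice the flat list into consecutive chunks of n_columns values.
    return [
        values[start:start + n_columns]
        for start in range(0, len(values), n_columns)
    ]


# Two records from the question: one wrapped as 7 + 2, one as 6 + 3.
sample = io.StringIO(
    " 4  0.0790926412 -0.199457773  0.325952223  0.924105917  48915.3072 -2086.17061\n"
    "  73540.4807 10\n"
    " 4  0.0427767979 -0.0185129676 -0.143135691 -0.989529911  38770.6518\n"
    " -70784.7024  32640.6307 10\n"
)
rows = read_wrapped_records(sample)
print(len(rows), len(rows[0]))
```

The resulting list of rows can be passed to `pandas.DataFrame(rows)` if a DataFrame is needed. Note that every value is converted to float here, so the first and last columns come out as 4.0 and 10.0 rather than integers.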