
Python Re-ordering the lines in a dat file by string


Sorry if this is a repeat but I can't find it for now.

Basically I am opening and reading a dat file which contains a load of paths that I need to loop through to get certain information.

Each of the lines in the base.dat file contains m.somenumber. For example some lines in the file might be:

Volumes/hard_disc/u14_cut//u14m12.40_all.beta/beta8
Volumes/hard_disc/u14_cut/u14m12.50_all.beta/beta8
Volumes/hard_disc/u14_cut/u14m11.40_all.beta/beta8

I need to be able to re-write the dat file so that all the lines are re-ordered from the largest m.number to the smallest m.number. Then when I loop through PATH in database (shown in code) I am looping through in decreasing m.

Here is the relevant part of the code

import os
import re

import numpy as np

# Read the list of paths, one per line.
with open('base8.dat', 'r') as base:
    database = base.read().splitlines()

counter = 0
mu_list = np.array([])
delta_list = np.array([])
offset = 0.00136
beta = 0


for PATH in database:
    if os.path.exists(PATH + '/CHI/optimal_spectral_function_CHI.dat'):

        n1_array = np.loadtxt(PATH + '/AVERAGES/av-err.n.dat')
        n7_array = np.loadtxt(PATH + '/AVERAGES/av-err.npx.dat')
        n1_mean = n1_array[0]
        delta = round(float(5.0 + offset - (n1_array[0]*2. + 4.*n7_array[0])), 6)

        with open(PATH + '/params10', 'r') as par:
            for line in par:
                counter = counter + 1
                if re.match('mu', line):
                    # Third whitespace-separated field, minus its trailing character.
                    mu = line.split()[2][:-1]
                    print(mu, delta, PATH)

                    mu_list = np.append(mu_list, mu)
                    delta_list = np.append(delta_list, delta)

        optimal_counter = 0

print(delta_list, mu_list)

I have checked the possible flagged repeat but I can't seem to get it to work for mine because my file doesn't technically contain strings and numbers. The 'number' I need to sort by is contained in the string as a whole:

Volumes/data_disc/u14_cut/from_met/u14m11.40_all.beta/beta16

and I need to sort the entire line by just the m(somenumber) part


Solution

  • Assuming that the number part of your line has the form of a float you can use a regular expression to match that part and convert it from string to float.

    After that you can use this information to sort all the lines read from your file. I added an invalid line to show how invalid data is handled.

    As a quick example I would suggest something like this:

    import re
    
    # TODO: Read file and get list of lines
    
    l = ['Volumes/hard_disc/u14_cut/u14**m12.40**_all.beta/beta8',
        'Volumes/hard_disc/u14_cut/u14**m12.50**_all.beta/beta8',
        'Volumes/hard_disc/u14_cut/u14**m11.40**_all.beta/beta8',
        'Volumes/hard_disc/u14_cut/u14**mm11.40**_all.beta/beta8']
    
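    # Match the number between the literal "**m" and "**" markers.
    regex = r'^.+\*{2}m(?P<criterion>[0-9.]*)\*{2}.+$'
    p = re.compile(regex)
    
    criterion_list = []
    
    for s in l:
        m = p.match(s)
        if m:
            crit = m.group('criterion')
            try:
                crit = float(crit)
            except ValueError:
                # Captured text such as '' or '1.2.3' is not a valid float.
                crit = 0
        else:
            # No match: treat the line as invalid.
            crit = 0
        criterion_list.append(crit)
    
    
    # Pair each criterion with its line, then sort by the criterion (ascending).
    tuples_list = list(zip(criterion_list, l))
    output = [element[1] for element in sorted(tuples_list, key=lambda t: t[0])]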
    print(output)
    
    # TODO: Write output to new file or overwrite existing one.
    

    Giving:

    ['Volumes/hard_disc/u14_cut/u14**mm11.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m11.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m12.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m12.50**_all.beta/beta8']
    

    This snippet starts after all lines have been read from the file and stored in a list (called l here). The sample strings mark the m-part with ** on either side, and the regex relies on those markers: the group criterion captures the float part contained in **m12.50**, which you can check on regex101. Iterating through all the lines gives you a new list containing the captured groups as floats. If the regex does not match a given string, or converting the group to a float fails, crit is set to zero so that those invalid lines end up at the very beginning of the sorted list.

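    After that, zip() is used to get a list of tuples, each containing an extracted float and the corresponding string. You can then sort this list of tuples by each tuple's first element and collect the corresponding strings in a new list, output. The sort is ascending; passing reverse=True to sorted() gives the largest-to-smallest order asked for in the question.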
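
The paths in the question do not carry the ** markers, so here is a minimal sketch of the same idea applied to the plain lines, sorted from the largest m to the smallest. It is only one possible variant: the pattern assumes the number always appears as "m", digits, and then an underscore (as in u14m12.40_all.beta), and the names M_VALUE, m_value, sort_by_m and rewrite_sorted are hypothetical helpers, not part of the original code. Lines without a recognisable number are placed last rather than first.

```python
import re

# Assumed pattern: an "m" directly followed by a decimal number and an
# underscore, as in "u14m12.40_all.beta".
M_VALUE = re.compile(r'm(\d+(?:\.\d+)?)_')


def m_value(line):
    """Return the m-number found in a line as a float, or -inf if none."""
    match = M_VALUE.search(line)
    return float(match.group(1)) if match else float('-inf')


def sort_by_m(lines):
    """Order lines from the largest m-number to the smallest."""
    return sorted(lines, key=m_value, reverse=True)


def rewrite_sorted(path):
    """Hypothetical helper: rewrite a file with its lines sorted by m-number."""
    with open(path) as f:
        lines = f.read().splitlines()
    with open(path, 'w') as f:
        f.writelines(line + '\n' for line in sort_by_m(lines))


paths = [
    'Volumes/hard_disc/u14_cut/u14m12.40_all.beta/beta8',
    'Volumes/hard_disc/u14_cut/u14m12.50_all.beta/beta8',
    'Volumes/hard_disc/u14_cut/u14m11.40_all.beta/beta8',
]
for p in sort_by_m(paths):
    print(p)
```

For the three sample paths this prints the m12.50 line first, then m12.40, then m11.40. Calling rewrite_sorted('base8.dat') would overwrite the file in place, so it is worth trying it on a copy first. Using a key function with sorted() avoids building the intermediate list of tuples, but the zip() approach above works just as well.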