Sorry if this is a repeat but I can't find it for now.
Basically I am opening and reading a dat
file which contains a load of paths that I need to loop through to get certain information.
Each of the lines in the base.dat
file contains m.somenumber
. For example some lines in the file might be:
Volumes/hard_disc/u14_cut//u14m12.40_all.beta/beta8
Volumes/hard_disc/u14_cut/u14m12.50_all.beta/beta8
Volumes/hard_disc/u14_cut/u14m11.40_all.beta/beta8
I need to be able to re-write the dat file so that all the lines are re-ordered from the largest m.number to the smallest m.number. Then when I loop through PATH in database (shown in code) I am looping through in decreasing m.
Here is the relevant part of the code
base = open('base8.dat', 'r')
database= base.read().splitlines()
base.close()
counter=0
mu_list=np.array([])
delta_list=np.array([])
ofsset = 0.00136
beta=0
for PATH in database:
if os.path.exists(str(PATH)+'/CHI/optimal_spectral_function_CHI.dat'):
n1_array = numpy.loadtxt(str(PATH)+'/AVERAGES/av-err.n.dat')
n7_array= numpy.loadtxt(str(PATH)+'/AVERAGES/av-err.npx.dat')
n1_mean = n1_array[0]
delta=round(float(5.0+ofsset-(n1_array[0]*2.+4.*n7_array[0])),6)
par = open(str(PATH)+"/params10", "r")
for line in par:
counter= counter+1
if re.match("mu", line):
mioMU= re.findall('\d+', line.translate(None, ';'))
mioMU2=line.split()[2][:-1]
mu=mioMU2
print mu, delta, PATH
mu_list=np.append(mu_list, mu)
delta_list=np.append(delta_list,delta)
optimal_counter=0
print delta_list, mu_list
I have checked the possible flagged repeat but I can't seem to get it to work for mine because my file doesn't technically contain strings and numbers. The 'number' I need to sort by is contained in the string as a whole:
Volumes/data_disc/u14_cut/from_met/u14m11.40_all.beta/beta16
and I need to sort the entire line by just the m(somenumber) part
Assuming that the number part of your line has the form of a float you can use a regular expression to match that part and convert it from string to float.
After that you can use this information in order to sort all the lines read from your file. I added a invalid line in order to show how invalid data is handled.
As a quick example I would suggest something like this:
import re
# TODO: Read file and get list of lines
l = ['Volumes/hard_disc/u14_cut/u14**m12.40**_all.beta/beta8',
'Volumes/hard_disc/u14_cut/u14**m12.50**_all.beta/beta8',
'Volumes/hard_disc/u14_cut/u14**m11.40**_all.beta/beta8',
'Volumes/hard_disc/u14_cut/u14**mm11.40**_all.beta/beta8']
regex = r'^.+\*{2}m{1}(?P<criterion>[0-9\.]*)\*{2}.+$'
p = re.compile(regex)
criterion_list = []
for s in l:
m = p.match(s)
if m:
crit = m.group('criterion')
try:
crit = float(crit)
except Exception as e:
crit = 0
else:
crit = 0
criterion_list.append(crit)
tuples_list = list(zip(criterion_list, l))
output = [element[1] for element in sorted(tuples_list, key=lambda t: t[0])]
print(output)
# TODO: Write output to new file or overwrite existing one.
Giving:
['Volumes/hard_disc/u14_cut/u14**mm11.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m11.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m12.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m12.50**_all.beta/beta8']
This snippets starts after all lines are read from the file and stored into a list (list called l
here). The regex group criterion
catches the float part contained in **m12.50**
as you can see on regex101. So iterating through all the lines gives you a new list containing all matching groups as floats. If the regex does not match on a given string or casting the group to a float fails, crit
is set to zero in order to have those invalid lines at the very beginning of the sorted list later.
After that zip()
is used to get a list of tules containing the extracted floats and the according string. Now you can sort this list of tuples based on the tuple's first element and write the according string to a new list output
.