
python 2.7 - reorder the output of lists with a new 'order' constraint


I know some basics of C++, but I am a beginner in Python.

I have a piece of working code (see below) and I'd like to add a constraint for formatting its output, and I cannot figure out how to do it...

Let me first explain what the program does:

I have an input file colors.csv that contains a list of colors, one color per line. Each color is defined by its name and its colorimetric coordinates X, Y and Z, and the file looks like this:

Colorname, X1, Y1, Z1
Colorname2, X2, Y2, Z2
...etc.

Given any list of XYZ coordinates contained in another input file, targets.csv, the program gives me a list of solutions in an output file, output.txt.

Each solution is calculated by first triangulating the point cloud with tetgen and then computing the barycentric coordinates of the target point in its containing tetrahedron (but it doesn't matter to explain everything here...).

The solution has the form:

target, name0, density0, name1, density1, name2, density2, name3, density3

There are always only 4 names and associated densities.

It will look for example like this:

122 ,PINKwA,0.202566115168,GB,0.718785775317,PINK,0.0647284446787,TUwA,0.0139196648363

123 ,PINKwA,0.200786239192,GB,0.723766147717,PINK,0.0673550497794,TUwA,0.00809256331169

124 ,PINKwA,0.19900636349,GB,0.72874651935,PINK,0.0699816544755,TUwA,0.00226546268446

125 ,OR0A,0.00155317194109,PINK,0.0716160265958,PINKwA,0.195962072115,GB,0.730868729348

126 ,OR0A,0.00409427478508,PINK,0.0726192660009,PINKwA,0.192113520109,GB,0.731172939105

127 ,OR0A,0.00663537762906,PINK,0.073622505406,PINKwA,0.188264968103,GB,0.731477148862

What would I like to do now?

For practical reasons, I would like my output to follow a certain order. I would like a "priority list" to govern the order of the name, density pairs in the output.

My current program outputs the color names in an order that I don't understand. In any case, I need these color names to be in a specific order: for example, PINK should always come first, PINKwA second, etc.

Instead of:

127 ,OR0A,0.00663537762906,PINK,0.073622505406,PINKwA,0.188264968103,GB,0.731477148862

I want:

127 ,PINK,0.073622505406,PINKwA,0.188264968103,OR0A,0.00663537762906,GB,0.731477148862

Because my priority list says:

0, PINK
1, PINKwA
2, OR0A
3, GB

How could I simply add this function to the code below? Any idea?

EDITED CODE (works...):

import tetgen, geometry
from pprint import pprint
import random, csv
import numpy as np

all_colors = [(name, float(X), float(Y), float(Z))
              for name, X, Y, Z in csv.reader(open('colors.csv'))]

priority_list = {name: int(i)
                 for i, name in csv.reader(open('priority.csv'))}

# background is marked SUPPORT
support_i = [i for i, color in enumerate(all_colors) if color[0] == 'SUPPORT']
if len(support_i)>0:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None

tg, hull_i = geometry.tetgen_of_hull([(X,Y,Z) for name, X, Y, Z in all_colors])
colors = [all_colors[i] for i in hull_i]

print ("thrown out: "
       + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))

targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]

# open the output file once, instead of re-opening it for every target
output = open('output.txt', 'a')

for target in targets:
    name, X, Y, Z, BG = target
    target_point = support + (np.array([X,Y,Z]) - support)/(1-BG)
    tet_i, bcoords = geometry.containing_tet(tg, target_point)

    # every line starts with the target name
    output.write(target[0])

    if tet_i is not None:
        names = [colors[i][0] for i in tg.tets[tet_i]]
        # sort (index, name) pairs by priority; the index keeps each
        # density attached to its own color name
        sorted_indices = sorted(enumerate(names), key=lambda (i, name): priority_list[name])

        for i, name in sorted_indices:
            output.write(',%s,%s' % (name, bcoords[i]))

    # one line per target, with or without a solution
    output.write('\n')

output.close()

Solution

  • First, you'll need to encode your priority list as a dictionary in your Python code:

    priority_list = {
        'PINK': 0,
        'PINKwA': 1,
        'OR0A': 2,
        'GB': 3,
    }
    

    This will let you quickly retrieve the rank for a given color name. Then, you can use the key argument to sorted to sort your names by their priority. Critically, though, you need to retrieve not the sorted names but the indices of the sorted names, much like numpy.argsort does (http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html).

    sorted_indices = sorted(enumerate(names), key=lambda (i, name): priority_list[name])
    

    The enumerate builtin annotates each name with its index in the original list of names, and then the sorted builtin sorts the resulting (i, name) pairs based on their rank in the priority list. Then we can write the names out to the file, followed by the corresponding element (using the index value) from the bcoords array.

    for i, name in sorted_indices:
        output.write(',%s,%s' % (name, bcoords[i]))
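
    To see the reordering on its own, here is a small self-contained sketch using the row for target 127 from the question. The names and bcoords values are copied from that output line rather than computed by tetgen, and the key is written as lambda pair: ... instead of lambda (i, name): ... only so that the snippet also runs on Python 3, which no longer accepts tuple parameters.

```python
# Priority ranks from the question; a lower rank is written first.
priority_list = {'PINK': 0, 'PINKwA': 1, 'OR0A': 2, 'GB': 3}

# Stand-ins for the tetrahedron's vertex names and barycentric coordinates,
# copied from the question's output line for target 127 (assumed values).
names = ['OR0A', 'PINK', 'PINKwA', 'GB']
bcoords = [0.00663537762906, 0.073622505406, 0.188264968103, 0.731477148862]

# Sort (index, name) pairs by the rank of the name. The index survives the
# sort, so each name can still be matched with its own density.
sorted_indices = sorted(enumerate(names), key=lambda pair: priority_list[pair[1]])

# Build the output line piece by piece, as the write() calls would.
parts = ['127 ']
for i, name in sorted_indices:
    parts.append(',%s,%s' % (name, bcoords[i]))
line = ''.join(parts)
print(line)
```

    The names come out in the order PINK, PINKwA, OR0A, GB, each still followed by its own density, which is the ordering asked for in the question.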
    

    So, here's what I'd make the final block in your code look like:

    names = [colors[i][0] for i in tg.tets[tet_i]]
    output.write(target[0])
    for i, name in sorted(enumerate(names), key=lambda (i, name): priority_list[name]):
        output.write(',%s,%s' % (name, bcoords[i]))
    output.write('\r\n')
    output.close()
    

    Here I changed your file output strategy to be a bit more Pythonic. In general, building output by adding strings together is discouraged; it's better to create a format string and fill in the variables (you can also use .format() on the string to do this). Also, you can make multiple calls to .write() and they will simply continue to write bytes to the file, so there is no need to build one long string to write out all at once. Finally, there is no need to call str on '\r\n', as it's already a string.
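
    If the priority list lives in a file instead, as with the priority.csv read in the edited question code, two details may be worth guarding against. This is only a sketch under assumptions: the file is assumed to hold rank, name rows exactly as listed in the question (with a space after the comma), and load_priorities and rank_of are hypothetical helper names, not part of the original code. Passing skipinitialspace=True makes csv.reader drop that space, so the key is 'PINK' rather than ' PINK', and dict.get with a fallback rank keeps a name that is missing from the list from raising KeyError.

```python
import csv

def load_priorities(lines):
    """Build {name: rank} from 'rank, name' rows (hypothetical helper)."""
    # skipinitialspace=True drops the blank that follows each comma.
    reader = csv.reader(lines, skipinitialspace=True)
    return dict((name, int(rank)) for rank, name in reader)

def rank_of(name, priority_list):
    """Rank of a name; unlisted names sort after every listed one."""
    return priority_list.get(name, len(priority_list))

# In the real program `lines` would be open('priority.csv'); a list of
# strings is used here so the sketch runs without any file on disk.
priority_list = load_priorities(['0, PINK', '1, PINKwA', '2, OR0A', '3, GB'])

names = ['OR0A', 'UNLISTED', 'PINKwA', 'GB']
ordered = sorted(enumerate(names), key=lambda pair: rank_of(pair[1], priority_list))
print(ordered)
```

    Whether an unlisted name should be pushed to the end or treated as an error is a design choice; the fallback here simply keeps the program writing output.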