Tags: python, list, directory, copy, snakemake

For each element in a list, create a new list named with a string + that element, containing the same elements


I apologize if this is an oddly presented question. I'm having difficulty nailing down how to frame this challenge in Python and, ultimately, how to incorporate it into Snakemake.

Let's say we have a few lists of elements that correspond to files:

gp1 = ["a", "b", "c"]
gp2 = ["x", "y", "z"]

And we have an existing list that names all the elements:

all = ["a", "b", "c", "x", "y", "z"]

For each "gp#" list I would like to create a new set of lists, one named after each element, i.e. for gp1 the result would be three lists as follows:

gpa = ["a", "b", "c"]
gpb = ["a", "b", "c"]
gpc = ["a", "b", "c"]

and we would have the equivalent for gp2:

gpx = ["x", "y", "z"]
gpy = ["x", "y", "z"]
gpz = ["x", "y", "z"]

With these lists in hand I can then use all as a wildcard and do something like:

mkdir {all} #generate a directory for each element
cp gp{all} {all} #copy the files associated with the elements from each list into each respective corresponding directory

(i.e., this would be the "shell/run" portion of two Snakemake rules at some point.)

This would result in a directory structure as follows:

a
|_a
|_b
|_c
b
|_a
|_b
|_c
c       
|_a
|_b
|_c 
x       
|_x
|_y
|_z
y      
|_x
|_y
|_z
z      
|_x
|_y
|_z

Again, apologies if this is confusing; I'm happy to clarify as comments come in. I'm having a difficult time conceptualizing how to approach this.


Solution

  • A dictionary is the usual way to create "variable names" dynamically, since semantically related names are stored together in one object. Otherwise, you would have to pick these variables out from among all the other globals you may have.

    For your case, this may work:

    gp1_dict = {"gp" + name: gp1 for name in gp1}
    gp2_dict = {"gp" + name: gp2 for name in gp2}
    
    print(gp1_dict)
    print(gp2_dict)
    
    # Outputs:
    # {'gpa': ['a', 'b', 'c'], 'gpb': ['a', 'b', 'c'], 'gpc': ['a', 'b', 'c']}
    # {'gpx': ['x', 'y', 'z'], 'gpy': ['x', 'y', 'z'], 'gpz': ['x', 'y', 'z']}
    

    You can now refer to each list by its key, e.g.:

    print(gp1_dict["gpa"])
    print(gp2_dict["gpx"])
    
    # Outputs
    # ['a', 'b', 'c']
    # ['x', 'y', 'z']
    

    If there are more groups (gp1, gp2, gp3, and so on) and you have them in an iterable (a list, for example), you can create a dictionary of dictionaries following the same idea.
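
A dictionary of dictionaries along those lines might look like the sketch below. The outer keys "gp1" and "gp2" are labels assumed here for illustration; the answer does not specify them, and any hashable label would do.

```python
gp1 = ["a", "b", "c"]
gp2 = ["x", "y", "z"]

# Assumed labels for the groups (hypothetical; not part of the original answer).
groups = {"gp1": gp1, "gp2": gp2}

# Outer key: group label. Inner key: "gp" + element. Inner value: the whole group.
nested = {
    label: {"gp" + name: members for name in members}
    for label, members in groups.items()
}

print(nested["gp1"]["gpa"])  # ['a', 'b', 'c']
print(nested["gp2"]["gpz"])  # ['x', 'y', 'z']
```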

    Also, if no value appears in more than one list, you may want a single dictionary instead (note that the for clauses in a comprehension run in the same order as the equivalent nested loops):

    gps = [gp1, gp2]
    gps_dict = {"gp" + name: gp_x for gp_x in gps for name in gp_x}
    
    print(gps_dict['gpa'])
    print(gps_dict['gpx'])
    
    # Outputs
    # ['a', 'b', 'c']
    # ['x', 'y', 'z']
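
One thing to keep in mind with this kind of comprehension: every key of a group points at the same list object, so mutating one entry changes them all. If independent lists are needed, copying with list(...) is one option. The sketch below illustrates the difference; the names shared and independent are made up for the example.

```python
gp1 = ["a", "b", "c"]
gp2 = ["x", "y", "z"]
gps = [gp1, gp2]

# Shared: all keys of a group refer to the very same list object.
shared = {"gp" + name: gp for gp in gps for name in gp}
print(shared["gpa"] is shared["gpb"])  # True

# Independent: list(gp) makes a new shallow copy for each key.
independent = {"gp" + name: list(gp) for gp in gps for name in gp}
independent["gpa"].append("extra")
print(independent["gpb"])  # ['a', 'b', 'c']
print(gp1)                 # ['a', 'b', 'c']
```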
    

    And you can easily iterate over all the names, for example:

    for key in gps_dict:
        print(key)
    
    # Outputs
    # gpa
    # gpb
    # gpc
    # gpx
    # gpy
    # gpz
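
To connect this back to the directory layout in the question, here is a minimal pure-Python sketch of the mkdir/cp step. It assumes each element name is also the name of a file in a single source directory; build_layout, source_dir and target_dir are hypothetical names, and the demo uses empty placeholder files in a temporary directory. Something similar could probably be adapted to the run portion of a Snakemake rule, but that part is not shown here.

```python
import shutil
import tempfile
from pathlib import Path

gp1 = ["a", "b", "c"]
gp2 = ["x", "y", "z"]
gps_dict = {"gp" + name: gp for gp in (gp1, gp2) for name in gp}


def build_layout(mapping, source_dir: Path, target_dir: Path) -> None:
    """Create one directory per element and copy its group's files into it."""
    for key, members in mapping.items():
        element = key[len("gp"):]  # "gpa" -> "a"
        dest = target_dir / element
        dest.mkdir(parents=True, exist_ok=True)
        for member in members:
            shutil.copy(source_dir / member, dest / member)


# Demo in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    target = Path(tmp) / "target"
    source.mkdir()
    for name in gp1 + gp2:
        (source / name).touch()

    build_layout(gps_dict, source, target)

    # Record the resulting layout: directory name -> sorted file names.
    layout = {
        d.name: sorted(f.name for f in d.iterdir())
        for d in sorted(target.iterdir())
    }

print(layout["a"])  # ['a', 'b', 'c']
print(layout["z"])  # ['x', 'y', 'z']
```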