python-3.x pandas iterator hardware

Slow iteration with pandas


I'm using the following code to generate all the chords with 6 elements or fewer, with 12 possible notes for each element. So the number of chords generated should be: 12^6 + 12^5 + 12^4 + 12^3 + 12^2 + 12 = 3,257,436. Right?
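
A quick sanity check of that arithmetic, as a minimal sketch. The names `NOTES` and `MAX_LEN` are illustrative and do not appear in the script; the count assumes that repeated notes are allowed and that order matters, which is what `itertools.product` generates.

```python
# Count tuples of length 1..6 drawn from 12 notes,
# with repetition allowed and order significant.
NOTES = 12      # illustrative name: number of possible notes
MAX_LEN = 6     # illustrative name: largest chord size

# Number of tuples for each chord size k is NOTES ** k.
per_length = {k: NOTES ** k for k in range(1, MAX_LEN + 1)}
total = sum(per_length.values())

print(per_length)
print(total)
```

Note that `range(7)` in the script below starts at 0, and `itertools.product` with `repeat=0` yields a single empty tuple, so the loop produces one more row than this total.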

I estimate it will take 30 hours to finish on my notebook, assuming the processing speed doesn't change over time. I created a free virtual machine on Google Cloud (8 vCPUs, 8 GB of RAM) and ran the script there, but it has been running for almost 4 hours already.

So I'm wondering whether there is a way to speed up the process. I couldn't use the VMs with 16 vCPUs, and I don't know what I can do to improve my script.

def calculando_todos_acordes_e_diferencas():
    import pandas as pd
    import itertools                          
    anagrama=[]
    for i in range(1,13):
        anagrama.append(i)

    tst=[[[0],[0]]]
    df=pd.DataFrame(tst, columns=["notas","diferencas"])
    count_name=-1

    for qntd_notas in range(7):
        for i in itertools.product((anagrama), repeat=qntd_notas) :
            diferencas=[]
            count=-1
            for primeiro in i :
                count=count+1
        
        
                if i.index(primeiro) != len(i)-1 :
                    for segundo in i[count+1:]:
                        diferenca= segundo - primeiro
                        if diferenca < 0 :
                            diferenca=diferenca* -1
                        diferencas.append(diferenca)

            # if len(df.index) == 100000 :
            #     count_name=count_name+1
            #     df=df.append({"notas":list(i),"diferencas":diferencas},ignore_index=True)
            #     df.to_csv("acordes e diferencas pt %s.csv" %(count_name), index=False)
            #     df=pd.DataFrame(tst, columns=["notas","diferencas"])

            df=df.append({"notas":list(i),"diferencas":diferencas},ignore_index=True)
    
    df.to_csv("acordes e diferencas TOTAL2.csv", index=False)
            #else:
            
     
calculando_todos_acordes_e_diferencas()

Solution

  • If I understand correctly, what you want are the combinations of notes (order ignored, no repeated notes) for group sizes of 1 to 6. This does not yield 3.2 million possibilities, but only 2,509.

    What you are looking for is a powerset, limited here to subsets of at most 6 notes. This is achieved very quickly with itertools, and the documentation has a recipe for it, which I adapted here for your needs:

    from itertools import chain, combinations
    
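    def powerset(iterable, maximum=6):
        # Lazily yield every subset of size 1..maximum.
        # A falsy maximum (None or 0) means "up to all elements".
        s = list(iterable)
        if not maximum:
            maximum = len(s)
        return chain.from_iterable(combinations(s, r) for r in range(1, maximum + 1))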
    

    Then use:

    chords = list(powerset(range(12), maximum=6))
    

    And voilà... runs in 200µs, not 30 hours ;)

    If you really want the permutations, replace combinations with permutations in the above code. Runs in ~100µs.
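
To tie this back to the two columns in the question, here is a minimal sketch. It assumes that "diferencas" means the absolute difference of every pair of notes in a chord, which is what the question's inner loops appear to compute. The helper name `pairwise_differences` and the row layout are illustrative, not part of the original answer. The block also counts the three possible readings of the problem: combinations, permutations, and the question's product with repetition. `math.comb` and `math.perm` require Python 3.8 or later.

```python
from itertools import chain, combinations
from math import comb, perm


def powerset(iterable, maximum=6):
    """Subsets of size 1..maximum (same recipe as in the answer)."""
    s = list(iterable)
    if not maximum:
        maximum = len(s)
    return chain.from_iterable(combinations(s, r) for r in range(1, maximum + 1))


def pairwise_differences(chord):
    """Absolute difference for every pair of notes (hypothetical helper).

    combinations(chord, 2) visits pairs in the same order as the
    question's nested loops: (0,1), (0,2), ..., (1,2), ...
    """
    return [abs(b - a) for a, b in combinations(chord, 2)]


# Collect all rows in a plain list instead of appending to a DataFrame
# inside the loop. Notes are numbered 1..12 as in the question.
rows = [
    {"notas": list(chord), "diferencas": pairwise_differences(chord)}
    for chord in powerset(range(1, 13), maximum=6)
]

# Number of chords under each interpretation, for sizes 1..6.
n_combinations = sum(comb(12, r) for r in range(1, 7))  # order ignored, no repeats
n_permutations = sum(perm(12, r) for r in range(1, 7))  # order matters, no repeats
n_product = sum(12 ** r for r in range(1, 7))           # order matters, repeats allowed

print(len(rows), n_combinations)
print(n_permutations, n_product)
```

If pandas is installed, `pd.DataFrame(rows)` followed by `to_csv` builds the table in a single step, which avoids the cost of growing a DataFrame one row at a time.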