Search code examples
pythonpandasdataframedata-analysis

Creating/Manipulating a pandas dataframe from multiple lists?


I need to create a dataframe with four columns for further processing. Each column has a range from 0-100 with increments of 1 (those are percentages). That's the easy part. The not-so-easy part (for me. for others probably rather trivial) is, that the checksum of each row always has to be 100. That obviously expands the number of rows from 100 to 100*100*100*100.

For example:

100 0 0 0
99  1 0 0
99  0 0 1
87  4 5 4
...

I've tried creating four lists first

lstA = list(range(0, 101, 1))
lstB = list(range(0, 101, 1))

and then a df out of them, but i'm at a loss how i'm supposed to run that checksum operation on the dataframe, while also sorting the columns and appending the resulting additional rows accordingly.


Solution

  • You need to loop over the different possibilities for the first 3 values and compute the fourth one

    out = []
    for i in range(101):
        for j in range(101-i):
            for k in range(101-i-j):
                out.append([i, j, k, 100-i-j-k])
                    
    df = pd.DataFrame(out)
    

    NB. note that number of valid combinations is 176851.

    Output:

              0  1  2    3
    0         0  0  0  100
    1         0  0  1   99
    2         0  0  2   98
    3         0  0  3   97
    4         0  0  4   96
    ...     ... .. ..  ...
    176846   98  2  0    0
    176847   99  0  0    1
    176848   99  0  1    0
    176849   99  1  0    0
    176850  100  0  0    0
    
    [176851 rows x 4 columns]