I need to create a dataframe with four columns for further processing. Each column has a range from 0-100 with increments of 1 (those are percentages). That's the easy part. The not-so-easy part (for me. for others probably rather trivial) is, that the checksum of each row always has to be 100. That obviously expands the number of rows from 100
to 100*100*100*100
.
For example:
100 0 0 0
99 1 0 0
99 0 0 1
87 4 5 4
...
I've tried creating four lists first
lstA = list(range(0, 101, 1))
lstB = list(range(0, 101, 1))
and then a df out of them, but i'm at a loss how i'm supposed to run that checksum operation on the dataframe, while also sorting the columns and appending the resulting additional rows accordingly.
You need to loop over the different possibilities for the first 3 values and compute the fourth one
out = []
for i in range(101):
for j in range(101-i):
for k in range(101-i-j):
out.append([i, j, k, 100-i-j-k])
df = pd.DataFrame(out)
NB. note that number of valid combinations is 176851.
Output:
0 1 2 3
0 0 0 0 100
1 0 0 1 99
2 0 0 2 98
3 0 0 3 97
4 0 0 4 96
... ... .. .. ...
176846 98 2 0 0
176847 99 0 0 1
176848 99 0 1 0
176849 99 1 0 0
176850 100 0 0 0
[176851 rows x 4 columns]