I need to find all possible combinations of the following variables, each of which has a given number of observations:
Variable Obs
e.g. (black, pink), (black, pink, yellow), (black, pink, yellow, red), (red, green), ... Order is not important, so combinations that contain the same elements, such as (black, pink) and (pink, black), must be kept only once.
At the end, I also need to calculate the total number of observations for each combination.
What is the fastest method that is also the least prone to errors?
I read about tuples, but I am not able to write the code myself.
You can use tuples (to install: ssc install tuples), as in the example below. Note that I use postfile with a temporary name for the handle and a temporary file for the results. After the loop is complete, I open the temporary file colors and use gsort to sort in descending order of the count.
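* Build every non-empty combination of the names; tuples stores them in
* the locals tuple1, tuple2, ... and their number in the local ntuples
tuples black pink yellow red green

* Number of observations for each name
scalar black=1
scalar pink=2
scalar yellow=6
scalar red=15
scalar green=17

* Temporary handle and temporary file that collect the results
tempname colors_handle
tempfile colors
postfile `colors_handle' str40 colors cnt using `colors', replace

* For each combination, add up the counts of its members and post one row
forvalues i = 1/`ntuples' {
    scalar sum = 0
    foreach n of local tuple`i' {
        scalar sum = sum + `n'
    }
    post `colors_handle' ("`tuple`i''") (sum)
}
postclose `colors_handle'

* Load the results and sort by total count, largest first
use `colors', clear
gsort -cnt
list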
Output:
    colors                       cnt
 1. black pink yellow red green   41
 2. pink yellow red green         40
 3. black yellow red green        39
 4. yellow red green              38
 5. black pink red green          35
 6. pink red green                34
 7. black red green               33
 8. red green                     32
 9. black pink yellow green       26
10. pink yellow green             25
11. black pink yellow red         24
12. black yellow green            24
13. yellow green                  23
14. pink yellow red               23
15. black yellow red              22
16. yellow red                    21
17. black pink green              20
18. pink green                    19
19. black green                   18
20. black pink red                18
21. green                         17
22. pink red                      17
23. black red                     16
24. red                           15
25. black pink yellow              9
26. pink yellow                    8
27. black yellow                   7
28. yellow                         6
29. black pink                     3
30. pink                           2
31. black                          1
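As a sanity check, the 31 rows above are exactly the non-empty subsets of five names (2^5 - 1 = 31), so each unordered combination appears once. The examples in the question all have two or more colors, so if single colors are not wanted, the following is a minimal sketch of a variant. It assumes the min() option described in the tuples help file is available in your installed version (check help tuples), and it uses the scalar() pseudofunction so that a variable with the same name in memory cannot be picked up by mistake. The scalar name combo_sum is just an illustrative choice.

```stata
* Sketch: keep only combinations of at least two names.
* Assumption: the installed tuples version supports the min() option.
tuples black pink yellow red green, min(2)
display "`ntuples' combinations"

* Same counts as in the example above
scalar black  = 1
scalar pink   = 2
scalar yellow = 6
scalar red    = 15
scalar green  = 17

forvalues i = 1/`ntuples' {
    scalar combo_sum = 0
    foreach n of local tuple`i' {
        * scalar() forces the name to be read as a scalar, not a variable
        scalar combo_sum = combo_sum + scalar(`n')
    }
    display "`tuple`i''" _col(40) combo_sum
}
```

With five names, dropping the five singletons should leave 26 combinations. To save and sort the results rather than display them, the same postfile and gsort steps as in the example above can be wrapped around this loop.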