I want to create a variable that is one or zero if a company (companyid
below) is "multicolor" in each year. Below is my data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 companyid int year float(red blue green)
"001045" 2015 0 1 0
"001045" 2015 0 1 0
"001045" 2015 0 1 0
"001045" 2015 0 1 0
"001045" 2017 1 0 0
"001045" 2017 1 0 0
"001049" 2019 0 1 0
"001049" 2019 0 0 1
"001055" 2018 1 0 0
"001055" 2018 0 1 0
"001055" 2018 0 0 1
So for example, company #001055 is red, blue, and green for 2018 so this 'multicolor' variable should equal to one.
Additionally, I also want to create variables for the different combinations. I.e. a red-blue var = 1 if a company is red and blue = 1 in each year.
I was trying to do something with bysort companyid year: gen multicolor = 1 if red == 1 & blue == 1 & green == 1
but I realize that has a lot missing in what I want to accomplish.
The overall goal is to reduce multiple year observations so I have one observation per year per company.
This single year/company record would have the info if that company was red, green, blue, or the exact mix of these colors if it is mixed. Below would be the example of data that I want to create from the data above.
input str6 companyid int year float(red blue green r-b-g red-blue blue-green ...more...)
"001045" 2015 0 1 0 0 0 0 ...
"001045" 2017 1 0 0 0 0 0 ...
"001049" 2019 0 0 0 0 0 1 ...
"001055" 2018 0 0 0 1 0 0 ...
I think this is a lot easier than you are fearing. First, collapse
to maximum values by company and year. Then you have the individual values of red blue green
. Second, concatenate the values, so that "110"
is red and blue but not green, and so on.
tabulate
would generate all the indicators corresponding to combinations found in the data.
In effect, the 3 colors and 2 possibilities permit binary encoding, and the string is a binary number too.
The correspondence for true 1 and false 0 that maximum over 0s and 1s means "any" and that minimum over 0s and 1s means "all" is obvious once understood, but worth explaining otherwise. For a Stata context, see this FAQ
clear
input str6 companyid int year float(red blue green)
"001045" 2015 0 1 0
"001045" 2015 0 1 0
"001045" 2015 0 1 0
"001045" 2015 0 1 0
"001045" 2017 1 0 0
"001045" 2017 1 0 0
"001049" 2019 0 1 0
"001049" 2019 0 0 1
"001055" 2018 1 0 0
"001055" 2018 0 1 0
"001055" 2018 0 0 1
end
collapse (max) red blue green, by(companyid year)
egen colors = concat(red blue green)
list
+-----------------------------------------------+
| compan~d year red blue green colors |
|-----------------------------------------------|
1. | 001045 2015 0 1 0 010 |
2. | 001045 2017 1 0 0 100 |
3. | 001049 2019 0 1 1 011 |
4. | 001055 2018 1 1 1 111 |
+-----------------------------------------------+