Search code examples
rdummy-variablecategorization

Categorizing variable and make it a dummy in R


Guten Tag community :)

I'm currently working with the ESGscore of some companies. The ESGscore varies between 0 and 100.

I want to categorize the ESGscore into 4 segments:

0 - 25 --> poor --> 4
>25 - 50 --> medium --> 3
>50 - 75 --> good --> 2
75 - 100 --> excellent --> 1

The problem with dummy.code is that it is rearranging the ESGscore. So for example the ESGscore of the AIR PRODUCTS & CHEMICALS INC is always 'excellent' but the output shows me, that it is just medium.

This is what the CODE looks like:

Datensatz_final_so$ESG.Kategorien <- ifelse(Datensatz_final_so$ESGscore <= 25, "4",
                                            ifelse(Datensatz_final_so$ESGscore > 25 & Datensatz_final_so$ESGscore <= 50, "3",
                                                   ifelse(Datensatz_final_so$ESGscore > 50 & Datensatz_final_so$ESGscore <= 75, "2",
                                                          ifelse(Datensatz_final_so > 75, "1", 0))))``
    # Create ESGscore dummy #
    Dummy.ESG <- dummy.code(Datensatz_final_so$ESG.Kategorien)

colnames(Dummy.ESG) <- c("poor", "medium", "good", "excellent")

# Connect data and dummy #
Datensatz_final <- cbind(Datensatz_final, Dummy.ESG)

Do you know how to fix that?

One way is to rearrange colnames to

colnames(Dummy.ESG) <- c("good", "excellent", "poor", "medium")

but it is creating the problem, that R picks medium for reference in the analysis.

Thank you in advance! :)

DATA EXAMPLE:

    structure(list(Company = c("AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC", 
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC", 
"AIR PRODUCTS & CHEMICALS INC", "AIR PRODUCTS & CHEMICALS INC", 
"AIR PRODUCTS & CHEMICALS INC", "HESS CORP", "HESS CORP", "HESS CORP", 
"HESS CORP", "HESS CORP", "HESS CORP", "HESS CORP", "APACHE CORP", 
"APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP", "APACHE CORP", 
"APACHE CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP", 
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "AVERY DENNISON CORP", 
"AVERY DENNISON CORP", "AVERY DENNISON CORP", "BALL CORP", "BALL CORP", 
"BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP", "BALL CORP", 
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", 
"CHEVRON CORP", "CHEVRON CORP", "CHEVRON CORP", "ECOLAB INC", 
"ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC", "ECOLAB INC", 
"ECOLAB INC", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", 
"EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", "EXXON MOBIL CORP", 
"FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", "FMC CORP", 
"FMC CORP", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", 
"HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", "HALLIBURTON CO", 
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE", 
"HELMERICH & PAYNE", "HELMERICH & PAYNE", "HELMERICH & PAYNE", 
"HELMERICH & PAYNE"), Year = c(2011, 2012, 2013, 2014, 2015, 
2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 
2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 
2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 
2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 2012, 2013, 2014, 
2015, 2016, 2017, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2011, 
2012, 2013, 2014, 2015, 2016, 2017), gvkey = c(1209, 1209, 1209, 
1209, 1209, 1209, 1209, 1380, 1380, 1380, 1380, 1380, 1380, 1380, 
1678, 1678, 1678, 1678, 1678, 1678, 1678, 1913, 1913, 1913, 1913, 
1913, 1913, 1913, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 2991, 
2991, 2991, 2991, 2991, 2991, 2991, 4213, 4213, 4213, 4213, 4213, 
4213, 4213, 4503, 4503, 4503, 4503, 4503, 4503, 4503, 4510, 4510, 
4510, 4510, 4510, 4510, 4510, 5439, 5439, 5439, 5439, 5439, 5439, 
5439, 5581, 5581, 5581, 5581, 5581, 5581, 5581), ESGscore = c(84.2750015258789, 
81.9225006103516, 77.4024963378906, 80.1125030517578, 78.6449966430664, 
76.3775024414062, 79.2699966430664, 69.4899978637695, 65.8300018310547, 
64.4300003051758, 74.3000030517578, 75.7600021362305, 71.4599990844727, 
74.6900024414062, 55.8300018310547, 56.0900001525879, 57.5, 60.75, 
60.8800010681152, 67.379997253418, 71.9899978637695, 82.9000015258789, 
77.3899993896484, 76.9300003051758, 78.7399978637695, 76.2283325195312, 
74.2125015258789, 68.3600006103516, 64.4100036621094, 65.6600036621094, 
63.75, 67.7300033569336, 67.5699996948242, 74.4300003051758, 
68.5699996948242, 86.5100021362305, 84.3099975585938, 82.6600036621094, 
82.3399963378906, 88.4100036621094, 90.0800018310547, 92.25, 
74.6999969482422, 72.3600006103516, 68.3899993896484, 67.9300003051758, 
65.629997253418, 74.9000015258789, 74.8600006103516, 81.6999969482422, 
79.370002746582, 79.0899963378906, 75.25, 81.9499969482422, 81.0199966430664, 
88.3399963378906, 59.8199996948242, 55.6500015258789, 52.2999992370605, 
51.8499984741211, 56.9199981689453, 66.620002746582, 65.3300018310547, 
85.9800033569336, 83.9499969482422, 85.1100006103516, 67.4300003051758, 
76.4400024414062, 69.9199981689453, 78.4599990844727, 19.0599994659424, 
17.5200004577637, 18.1200008392334, 23.5025005340576, 35.5349998474121, 
36.7350006103516, 41.1725006103516)), row.names = c(NA, -77L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • Your sample data does not include a variable called ESG.Kategorien but it does include ESGscore. The following should give you what you want:

    Datensatz_final_so$Dummy <- cut(Datensatz_final_so$ESGscore, breaks=c(0, 25, 50, 75, 100), labels=c("poor", "medium", "good", "excellent"))
    table(Datensatz_final_so$Dummy)
    # 
    #      poor    medium      good excellent 
    #         4         3        38        32 
    levels(Datensatz_final_so$Dummy)
    # [1] "poor"      "medium"    "good"      "excellent" 
    

    Note your original categorization places 75 in good AND excellent.