
Add character factor column in R based on integer in previous column?


I have a data frame of hail events and their associated property damage amounts. I want to add a column (DAMAGE_PROP) with the factor levels "yes" and "no" that indicates whether there was property damage, based on the existing column DAMAGE_PROPERTY_NUM. With the code I'm currently using, I get an "unused arguments" error (levels = 2, labels = c("yes", "no")).

Can you help? Here is the code snippet and the table of listed damages:

 AZ_HAIL$DAMAGE_PROP <- as.factor(AZ_HAIL$DAMAGE_PROPERTY_NUM, 
                              levels = 2, 
                              labels = c("yes", "no"))
table(AZ_HAIL$DAMAGE_PROPERTY_NUM)

      $0  $250  $400  $500  $1000  $2000  $5000  $6000  $10000  $15000  $20000  etc.
    1163     2     1     5      5      4     12      1       7       1       3
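The "unused arguments" error comes from as.factor() itself: in base R it is defined with a single argument, x, so levels and labels are rejected before any conversion happens. The short check below illustrates this; the damage vector is a hypothetical stand-in for the real column.

```r
# as.factor() has exactly one formal argument, x.
print(names(formals(as.factor)))

# Hypothetical stand-in for AZ_HAIL$DAMAGE_PROPERTY_NUM.
damage <- c(0, 250, 0, 500)

# Extra arguments are rejected during argument matching, so the call errors out.
msg <- tryCatch(
  as.factor(damage, levels = 2, labels = c("yes", "no")),
  error = conditionMessage
)
print(msg)
```

The levels and labels arguments belong to factor(), not as.factor().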

I also tried factor() instead of as.factor(), following its documented usage

factor(x = character(), levels, labels = levels,
       exclude = NA, ordered = is.ordered(x), nmax = NA)

but this attempt fails with a syntax error:

AZ_HAIL$DAMAGE_PROP <- factor(AZ_HAIL$DAMAGE_PROPERTY_NUM = character(), 
    Error: unexpected '=' in "AZ_HAIL$DAMAGE_PROP <- factor(AZ_HAIL$DAMAGE_PROPERTY_NUM ="
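The unexpected '=' error is a syntax issue. In the usage line, x = character() names the argument x and shows its default value; inside a call, the left side of = must be an argument name, so the column belongs on the right. A minimal sketch, using a small hypothetical AZ_HAIL in place of the real data:

```r
# Small hypothetical stand-in for the real AZ_HAIL data frame.
AZ_HAIL <- data.frame(DAMAGE_PROPERTY_NUM = c(0, 10000, 0, 500))

# Argument name on the left of =, the data on the right.
# The logical test gives FALSE/TRUE, which levels/labels map to "no"/"yes".
AZ_HAIL$DAMAGE_PROP <- factor(x = AZ_HAIL$DAMAGE_PROPERTY_NUM > 0,
                              levels = c(FALSE, TRUE),
                              labels = c("no", "yes"))
str(AZ_HAIL)
```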

I have also tried

    answer <- factor(c("yes", "no"))
    type <- unlist(lapply(answer, function(x) ifelse(as.numeric(substr(as.character(x), 2,   nchar(as.character(x))))>0, 'yes', 'no')))
    damage <- data.frame(answer=answer, type=factor(type))
    is.factor(AZ_HAIL$DAMAGE_PROP)
    damage
Result:
      answer type
    1    yes <NA>
    2     no <NA>
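The NA values come from parsing the wrong vector: substr(x, 2, ...) is meant to strip a leading "$" from damage strings, but here it is applied to "yes" and "no", giving "es" and "o", which as.numeric() turns into NA. Assuming the damage column is stored as text such as "$250" (the table suggests this, but it is an assumption), the parsing can be applied to that column directly. Because substr() and nchar() are vectorized, lapply() is not needed. The values below are hypothetical.

```r
# Hypothetical damage values stored as text with a leading "$" (assumed format).
damage_txt <- c("$0", "$250", "$0", "$10000")

# Why the original attempt gave NA: substr("yes", 2, 3) is "es", not a number.
suppressWarnings(as.numeric(substr("yes", 2, nchar("yes"))))

# Strip the "$" from every element at once; substr() and nchar() are vectorized.
amount <- as.numeric(substr(damage_txt, 2, nchar(damage_txt)))

# Label each event and fix the level order so "no" comes first.
damage_prop <- factor(ifelse(amount > 0, "yes", "no"), levels = c("no", "yes"))
print(damage_prop)
```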

Solution

  • You could use DAMAGE_PROPERTY_NUM > 0, which gives FALSE/TRUE, and then pass labels = c("no", "yes"). The labels are matched to the sorted levels, FALSE before TRUE, so "no" corresponds to FALSE (no damage) and "yes" to TRUE.

    > AZ_HAIL |>
    +   transform(DAMAGE_PROP=factor(DAMAGE_PROPERTY_NUM > 0, labels=c('no', 'yes')))
        DAMAGE_PROPERTY_NUM DAMAGE_PROP
    1                     0          no
    2                     0          no
    ...
    16                    0          no
    17                10000         yes
    18                    0          no
    19                    0          no
    20                    0          no
    21                    0          no
    22                    0          no
    23                  500         yes
    24                    0          no
    25                    0          no
    26                    0          no
    27                    0          no
    28                    0          no
    29                    0          no
    ...
    

    Data:

    set.seed(42)
    AZ_HAIL <- data.frame(
      DAMAGE_PROPERTY_NUM=sample(c(0, 250, 400, 500, 1000, 2000, 5000, 6000, 10000, 
                                   15000, 20000), 1500,
           prob=c(1163, 2, 1, 5, 5, 4, 12, 1, 7, 1, 3), replace=TRUE))
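One caveat with the approach above: factor() derives the levels from the data, so if a subset contains no damaged events (or only damaged ones), DAMAGE_PROPERTY_NUM > 0 has a single distinct value and two labels no longer fit. Passing levels = c(FALSE, TRUE) explicitly keeps both categories. A sketch with a hypothetical all-zero subset:

```r
# Hypothetical subset in which no event caused property damage.
no_damage <- data.frame(DAMAGE_PROPERTY_NUM = c(0, 0, 0))

# Without explicit levels, factor() sees only FALSE, so two labels do not fit.
msg <- tryCatch(
  factor(no_damage$DAMAGE_PROPERTY_NUM > 0, labels = c("no", "yes")),
  error = conditionMessage
)
print(msg)

# Fixing levels = c(FALSE, TRUE) keeps both categories even when one is absent.
no_damage$DAMAGE_PROP <- factor(no_damage$DAMAGE_PROPERTY_NUM > 0,
                                levels = c(FALSE, TRUE),
                                labels = c("no", "yes"))
print(table(no_damage$DAMAGE_PROP))
```

Keeping the empty "yes" level also means table() reports a zero count for it instead of dropping the category.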