Tags: r, dplyr, filter, conditional-statements, conditional-formatting

R and dplyr: a value is not recognized and the condition is not applied


I have a data frame of animal sightings (more than 300 rows), with species of whales, dolphins, pinnipeds and penguins.

I want to create a new column, reino, with the value misticeto for whales, odontoceto for dolphins, pinipede for pinnipeds and penguin for penguins.

But when I do that, one specific whale (Ballena franca austral (Eubalaena australis)) ends up with NA in the reino column.

I have tried many things, but nothing worked.

library(data.table)
library(dplyr)
> packageVersion("dplyr")
[1] ‘1.1.4’


dt = data.table(especie= c('Ballena franca austral (Eubalaena australis)','Ballena barbada no identificada (Parvorden Mysticeti)', 'Ballena jorobada (Megaptera novaeangliae)', 'Ballena rorcual no identificada (Balaenoptera sp.)', 'Ballena sei (Balaenoptera borealis)', 'Ballena fin (Balaenoptera physalus)','Delfín austral (Lagenorhynchus australis)', 'Delfín no identificado (Familia Delphinidae)', 'Tonina overa (Cephalorhynchus commersonii)', 'Delfín oscuro (Lagenorhynchus obscurus)','Lobo marino de dos pelos no identificado (Arctocephalus sp.)', 'Lobo marino no identificado (Familia Otariidae)', 'Lobo marino de dos pelos sudamericano (Arctocephalus australis)', 'Lobo marino de un pelo sudamericano (Otaria flavescens)', 'Lobo marino de dos pelos antártico (Arctocephalus gazella)','Lobo marino de dos pelos sudamericano (Arctocephalusaustralis)','Pingüino patagónico (Spheniscus magellanicus)', 'Pingüino no identificado (Familia Spheniscidae)', 'Cormorán Imperial (Phalacrocorax albiventer)', 'Pingüino penacho amarillo austral (Eudyptes chrysocome)', 'Pingüino rey (Aptenodytes patagonicus)'))

dt$especie <- as.character(dt$especie)

I want to do this:

dt$reino[dt$especie %in% c('Ballena franca austral (Eubalaena australis)','Ballena barbada no identificada (Parvorden Mysticeti)', 'Ballena jorobada (Megaptera novaeangliae)', 'Ballena rorcual no identificada (Balaenoptera sp.)', 'Ballena sei (Balaenoptera borealis)', 'Ballena fin (Balaenoptera physalus)', 'Ballena barbada no identificada(Parvorden Mysticeti)')] <- 'misticeto'

dt$reino[dt$especie %in% c('Delfín austral (Lagenorhynchus australis)', 'Delfín no identificado (Familia Delphinidae)', 'Tonina overa (Cephalorhynchus commersonii)', 'Delfín oscuro (Lagenorhynchus obscurus)')] <- 'odontoceto'

dt$reino[dt$especie %in% c('Lobo marino de dos pelos no identificado (Arctocephalus sp.)', 'Lobo marino no identificado (Familia Otariidae)', 'Lobo marino de dos pelos sudamericano (Arctocephalus australis)', 'Lobo marino de un pelo sudamericano (Otaria flavescens)', 'Lobo marino de dos pelos antártico (Arctocephalus gazella)','Lobo marino de dos pelos sudamericano (Arctocephalusaustralis)')] <- 'pinipede'

dt$reino[dt$especie %in% c('Pingüino patagónico (Spheniscus magellanicus)', 'Pingüino no identificado (Familia Spheniscidae)', 'Cormorán Imperial (Phalacrocorax albiventer)', 'Pingüino penacho amarillo austral (Eudyptes chrysocome)', 'Pingüino rey (Aptenodytes patagonicus)')] <- 'penguin'

The first time I ran the code, this warning appeared:

Warning message: Unknown or uninitialised column: reino.

I then ran the rest of the code, and the new column reino had NA only for Ballena franca austral (Eubalaena australis).
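The warning quoted above is what tibble emits when `$` is used on a column that does not exist yet, which is what happens the first time `dt$reino[...] <- ...` runs. As a minimal sketch (the name `sightings` and the three-row toy data are assumptions for illustration), creating the column before filling it avoids the warning. This does not by itself explain the NA for one species.

```r
library(tibble)

# Toy data with three species names taken from the question.
sightings <- tibble(especie = c(
  "Ballena sei (Balaenoptera borealis)",
  "Tonina overa (Cephalorhynchus commersonii)",
  "Lobo marino de un pelo sudamericano (Otaria flavescens)"
))

# Create the column up front so that sightings$reino exists before it is subset.
sightings$reino <- NA_character_

sightings$reino[sightings$especie %in% "Ballena sei (Balaenoptera borealis)"] <- "misticeto"
sightings$reino[sightings$especie %in% "Tonina overa (Cephalorhynchus commersonii)"] <- "odontoceto"
sightings$reino[sightings$especie %in% "Lobo marino de un pelo sudamericano (Otaria flavescens)"] <- "pinipede"

sightings
```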

I tried the following, but none of it worked:

> class(dt)
[1] "tbl_df"     "tbl"        "data.frame"

## 1
dt %>% left_join(data.frame(especie = 'Ballena franca austral (Eubalaena australis)', reino = 'misticeto'), by = 'especie')

## 2
dt %>% mutate(reino = case_match(especie, 'Ballena franca austral (Eubalaena australis)' ~ 'misticeto'))

## 3
misticeto <- filter (dt, especie %in% c('Ballena barbada no identificada (Parvorden Mysticeti)','Ballena barbada no identificada(Parvorden Mysticeti)', 'Ballena fin (Balaenoptera physalus)','Ballena jorobada (Megaptera novaeangliae)', 'Ballena rorcual no identificada (Balaenoptera sp.)','Ballena sei (Balaenoptera borealis)','Ballena franca austral (Eubalaena australis)'))

## 4
dt <- dt %>%
  mutate(reino = case_when(especie %in% c('Ballena franca austral (Eubalaena australis)','Ballena barbada no identificada (Parvorden Mysticeti)','Ballena barbada no identificada(Parvorden Mysticeti)', 'Ballena fin (Balaenoptera physalus)','Ballena jorobada (Megaptera novaeangliae)', 'Ballena rorcual no identificada (Balaenoptera sp.)','Ballena sei (Balaenoptera borealis)') ~ 'misticeto', 
                           
                           especie %in% c('Delfín austral (Lagenorhynchus australis)', 'Delfín no identificado (Familia Delphinidae)', 'Delfín oscuro (Lagenorhynchus obscurus)', 'Tonina overa (Cephalorhynchus commersonii)') ~ 'odontoceto',
                           
                           especie %in% c('Lobo marino de dos pelos antártico (Arctocephalus gazella)', 'Lobo marino de dos pelos no identificado (Arctocephalus sp.)', 'Lobo marino de dos pelos sudamericano (Arctocephalus australis)', 'Lobo marino de dos pelos sudamericano (Arctocephalusaustralis)', 'Lobo marino de un pelo sudamericano (Otaria flavescens)') ~ 'pinipede',
                           
                           especie %in% c('Cormorán Imperial (Phalacrocorax albiventer)', 'Pingüino no identificado (Familia Spheniscidae)', 'Pingüino patagónico (Spheniscus magellanicus)', 'Pingüino penacho amarillo austral (Eudyptes chrysocome)','Pingüino rey (Aptenodytes patagonicus)'))) ~ 'pinguin'

But when I do this, dt stops being a data frame and is listed as a Value instead (it is "excluded"):

> class(dt)
[1] "formula"
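The "formula" class is consistent with how the parentheses fall in attempt 4: `mutate(` and `case_when(` are closed before the last `~`, so R reads the whole pipeline as the left-hand side of a formula and never evaluates it. A minimal sketch of the same effect on a two-row toy data frame (the names `toy`, `broken` and `fixed` are made up for illustration):

```r
library(dplyr)

toy <- data.frame(especie = c("Ballena sei (Balaenoptera borealis)",
                              "Tonina overa (Cephalorhynchus commersonii)"))

# Misplaced parentheses: mutate() is closed before the last `~`, so the whole
# pipeline becomes the left-hand side of a formula and is not evaluated.
broken <- toy %>%
  mutate(reino = case_when(
    especie %in% "Ballena sei (Balaenoptera borealis)" ~ "misticeto",
    especie %in% "Tonina overa (Cephalorhynchus commersonii)")) ~ "odontoceto"
class(broken)  # "formula"

# Parentheses closed after the last right-hand side: a data frame comes back.
fixed <- toy %>%
  mutate(reino = case_when(
    especie %in% "Ballena sei (Balaenoptera borealis)" ~ "misticeto",
    especie %in% "Tonina overa (Cephalorhynchus commersonii)" ~ "odontoceto"
  ))
class(fixed)   # "data.frame"
```

Putting each condition on its own line and the closing `))` on a separate line, as in `fixed`, makes this kind of slip easier to spot.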

The second problem is that:

  • I noticed that one dolphin and one penguin ended up in the misticeto (whale) group.
misticeto<-filter(dt, reino == "misticeto");misticeto

> table(misticeto$especie)

Ballena barbada no identificada (Parvorden Mysticeti)  Ballena barbada no identificada(Parvorden Mysticeti) 
                                                   23                                                     1 
                  Ballena fin (Balaenoptera physalus)          Ballena franca austral (Eubalaena australis) 
                                                    4                                                    33 
            Ballena jorobada (Megaptera novaeangliae)    Ballena rorcual no identificada (Balaenoptera sp.) 
                                                   13                                                     9 
                  Ballena sei (Balaenoptera borealis)               Delfín oscuro (Lagenorhynchus obscurus) 
                                                    8                                                     1 
               Pingüino rey (Aptenodytes patagonicus) 
                                                    1 

# Delfín oscuro (Lagenorhynchus obscurus) and Pingüino rey (Aptenodytes patagonicus) shouldn't be there.

  • Also, not all dolphins are labeled odontoceto in the reino column; one is missing (the one that ended up in the misticeto group).
odontocetos<-filter(dt, reino == "odontoceto");odontocetos
> table(odontocetos$especie)

   Delfín austral (Lagenorhynchus australis) Delfín no identificado (Familia Delphinidae) 
                                          25                                           12 
  Tonina overa (Cephalorhynchus commersonii) 
                                           8 

# The dolphin that ended up in the misticeto group, 'Delfín oscuro (Lagenorhynchus obscurus)', is missing here.

Can someone help me, please? I don't understand what is happening.

Thanks!

UPDATE

Here is what R shows me; this is where I am copying the species names from:

> table(dt$especie)

          Ballena barbada no identificada (Parvorden Mysticeti) 
                                                             24 
                            Ballena fin (Balaenoptera physalus) 
                                                              4 
                   Ballena franca austral (Eubalaena australis) 
                                                             33 
                      Ballena jorobada (Megaptera novaeangliae) 
                                                             13 
             Ballena rorcual no identificada (Balaenoptera sp.) 
                                                              9 
                            Ballena sei (Balaenoptera borealis) 
                                                              8 
                   Cormorán Imperial (Phalacrocorax albiventer) 
                                                              1 
                      Delfín austral (Lagenorhynchus australis) 
                                                             25 
                   Delfín no identificado (Familia Delphinidae) 
                                                             12 
                        Delfín oscuro (Lagenorhynchus obscurus) 
                                                              1 
     Lobo marino de dos pelos antártico (Arctocephalus gazella) 
                                                              1 
   Lobo marino de dos pelos no identificado (Arctocephalus sp.) 
                                                             42 
Lobo marino de dos pelos sudamericano (Arctocephalus australis) 
                                                             29 
 Lobo marino de dos pelos sudamericano (Arctocephalusaustralis) 
                                                              1 
        Lobo marino de un pelo sudamericano (Otaria flavescens) 
                                                             22 
                Lobo marino no identificado (Familia Otariidae) 
                                                             31 
                Pingüino no identificado (Familia Spheniscidae) 
                                                              3 
                  Pingüino patagónico (Spheniscus magellanicus) 
                                                             27 
        Pingüino penacho amarillo austral (Eudyptes chrysocome) 
                                                              1 
                         Pingüino rey (Aptenodytes patagonicus) 
                                                              1 
                     Tonina overa (Cephalorhynchus commersonii) 

To me, Ballena franca austral (Eubalaena australis) looks exactly the same here as in the code.
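Two strings can print identically and still differ in a character that is not visible. One way to check, sketched here in base R, is to compare their Unicode code points position by position. The non-breaking space in `from_data` is an assumption made up for the example; the actual difference in the real data, if there is one, is unknown.

```r
# Hypothetical example: `from_data` has a non-breaking space (U+00A0)
# where `typed` has an ordinary space (U+0020).
typed     <- "Ballena franca austral (Eubalaena australis)"
from_data <- "Ballena franca\u00a0austral (Eubalaena australis)"

typed == from_data               # FALSE, although both look alike when printed

# Convert each string to its integer code points and find where they differ.
cp_typed <- utf8ToInt(typed)
cp_data  <- utf8ToInt(from_data)
pos <- which(cp_typed != cp_data)
pos                              # 15

# Show the offending code point in the usual U+XXXX notation.
sprintf("U+%04X", cp_data[pos])  # "U+00A0"
```

On the real data, `from_data` would be a value taken from the column, for example one element of `unique(dt$especie)`, compared against the string typed in the script.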


Solution

  • Try this:

    library(data.table)
    library(dplyr)
    
    dt <- data.table(especie = c("Ballena franca austral (Eubalaena australis)", "Ballena barbada no identificada (Parvorden Mysticeti)", "Ballena jorobada (Megaptera novaeangliae)", "Ballena rorcual no identificada (Balaenoptera sp.)", "Ballena sei (Balaenoptera borealis)", "Ballena fin (Balaenoptera physalus)", "Delfín austral (Lagenorhynchus australis)", "Delfín no identificado (Familia Delphinidae)", "Tonina overa (Cephalorhynchus commersonii)", "Delfín oscuro (Lagenorhynchus obscurus)", "Lobo marino de dos pelos no identificado (Arctocephalus sp.)", "Lobo marino no identificado (Familia Otariidae)", "Lobo marino de dos pelos sudamericano (Arctocephalus australis)", "Lobo marino de un pelo sudamericano (Otaria flavescens)", "Lobo marino de dos pelos antártico (Arctocephalus gazella)", "Lobo marino de dos pelos sudamericano (Arctocephalusaustralis)", "Pingüino patagónico (Spheniscus magellanicus)", "Pingüino no identificado (Familia Spheniscidae)", "Cormorán Imperial (Phalacrocorax albiventer)", "Pingüino penacho amarillo austral (Eudyptes chrysocome)", "Pingüino rey (Aptenodytes patagonicus)"))
    
    library(textclean)
    dt_2 <- dt %>% mutate(especie = replace_non_ascii(especie))
    
    misticeto <- replace_non_ascii(c("Ballena franca austral (Eubalaena australis)", "Ballena barbada no identificada (Parvorden Mysticeti)", "Ballena jorobada (Megaptera novaeangliae)", "Ballena rorcual no identificada (Balaenoptera sp.)", "Ballena sei (Balaenoptera borealis)", "Ballena fin (Balaenoptera physalus)", "Ballena barbada no identificada(Parvorden Mysticeti)"))
    
    odontoceto <- replace_non_ascii(c("Delfín austral (Lagenorhynchus australis)", "Delfín no identificado (Familia Delphinidae)", "Tonina overa (Cephalorhynchus commersonii)", "Delfín oscuro (Lagenorhynchus obscurus)"))
    
    pinipede <- replace_non_ascii(c("Lobo marino de dos pelos no identificado (Arctocephalus sp.)", "Lobo marino no identificado (Familia Otariidae)", "Lobo marino de dos pelos sudamericano (Arctocephalus australis)", "Lobo marino de un pelo sudamericano (Otaria flavescens)", "Lobo marino de dos pelos antártico (Arctocephalus gazella)", "Lobo marino de dos pelos sudamericano (Arctocephalusaustralis)"))
    
    penguin <- replace_non_ascii(c("Pingüino patagónico (Spheniscus magellanicus)", "Pingüino no identificado (Familia Spheniscidae)", "Cormorán Imperial (Phalacrocorax albiventer)", "Pingüino penacho amarillo austral (Eudyptes chrysocome)", "Pingüino rey (Aptenodytes patagonicus)"))
    
    
    dt_2 <- dt_2 %>%
      mutate(reino = case_when(
        especie %in% misticeto ~ "misticeto",
        especie %in% odontoceto ~ "odontoceto",
        especie %in% pinipede ~ "pinipede",
        especie %in% penguin ~ "penguin"
      ))
    
    

    My guess is that the problem is invisible non-ASCII characters. Remove them from all text, both the especie column and the lookup vectors. If this does not work, you may have other issues and I will delete this answer.
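If the invisible characters turn out to be unusual spaces, a base R alternative to textclean is to normalize whitespace only, which keeps accented letters such as "í" and "ü" intact. This is a sketch under that assumption: `normalize_spaces` is a hypothetical helper, not a function from any package, and the sample strings are made up for illustration.

```r
# Hypothetical helper: replace any run of Unicode separator or whitespace
# characters with a single plain space, then trim both ends.
normalize_spaces <- function(x) {
  x <- gsub("[\\p{Z}\\s]+", " ", x, perl = TRUE)
  trimws(x)
}

# Made-up raw values: a non-breaking space in the first, a doubled space
# and a trailing space in the second.
especie_raw <- c("Ballena franca\u00a0austral (Eubalaena australis)",
                 "Delf\u00edn oscuro  (Lagenorhynchus obscurus) ")
lookup <- c("Ballena franca austral (Eubalaena australis)",
            "Delf\u00edn oscuro (Lagenorhynchus obscurus)")

especie_raw %in% lookup                    # FALSE FALSE
normalize_spaces(especie_raw) %in% lookup  # TRUE TRUE
```

As with `replace_non_ascii()` in the answer, the same cleaning should be applied to the column and to the lookup vectors so that both sides of `%in%` are comparable. Zero-width characters are not spaces and would need a separate rule.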