Search code examples
rsortingextract

Get two columns by filtering value of three numeric columns in R


I have this dataframe

df<-structure(list(estado = c("Aguascalientes", "Aguascalientes", 
"Aguascalientes", "Aguascalientes", "Aguascalientes", "Aguascalientes", 
"Aguascalientes", "Aguascalientes"), municipio = c("Aguascalientes", 
"Aguascalientes", "Aguascalientes", "Aguascalientes", "Aguascalientes", 
"Aguascalientes", "Aguascalientes", "Aguascalientes"), `Presidente Municipal` = c("C. ZEFERINO MUÑOZ", 
"C. AURELIO PADILLA", "C. FELIPE RUIZ", "C. GABRIEL CARMONA", 
"C. EVARISTO FEMAT", "C. FRANCISCO ARMENGOL", "LIC. MARIA TERESA JIMENEZ ESQUIVEL", 
"C. LEONARDO MONTAÑEZ  CASTRO"), Sexo = c("H", "H", "H", "H", 
"H", "H", "M", "H"), Partido = c(NA, NA, NA, NA, NA, NA, "PAN", 
"COAL. POR AGUASCALIE"), a = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), b = c(1903, 
1927, 1900, 1925, 1906, 1907, 15, 15), c = c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
    d = c(NA, NA, NA, NA, 1913, 1908, 2019, 2021), e = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), f = c(NA, NA, NA, NA, NA, NA, 14, 14), g = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), h = c(NA, NA, NA, NA, NA, NA, 2021, 2024), id_estado = c(1, 
    1, 1, 1, 1, 1, 1, 1), id_municipio = c(1, 1, 1, 1, 1, 1, 
    1, 1), cve_inegi = c(1001, 1001, 1001, 1001, 1001, 1001, 
    1001, 1001)), row.names = c(NA, -8L), class = "data.frame")

I want to have two extra columns that capture values from columns b, d, h but only in two columns.

For example, in row 1 b is 1903 and d is NA, I want my extra two columns to have 1903 and NA. For row 7 I want to get the value of d 2019 and for h 2021.

Something like:

df2<-structure(list(estado = c("Aguascalientes", "Aguascalientes", 
"Aguascalientes", "Aguascalientes", "Aguascalientes", "Aguascalientes", 
"Aguascalientes", "Aguascalientes"), municipio = c("Aguascalientes", 
"Aguascalientes", "Aguascalientes", "Aguascalientes", "Aguascalientes", 
"Aguascalientes", "Aguascalientes", "Aguascalientes"), `Presidente Municipal` = c("C. ZEFERINO MUÑOZ", 
"C. AURELIO PADILLA", "C. FELIPE RUIZ", "C. GABRIEL CARMONA", 
"C. EVARISTO FEMAT", "C. FRANCISCO ARMENGOL", "LIC. MARIA TERESA JIMENEZ ESQUIVEL", 
"C. LEONARDO MONTAÑEZ  CASTRO"), Sexo = c("H", "H", "H", "H", 
"H", "H", "M", "H"), Partido = c(NA, NA, NA, NA, NA, NA, "PAN", 
"COAL. POR AGUASCALIE"), a = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), b = c(1903, 
1927, 1900, 1925, 1906, 1907, 15, 15), c = c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
    d = c(NA, NA, NA, NA, 1913, 1908, 2019, 2021), e = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), f = c(NA, NA, NA, NA, NA, NA, 14, 14), g = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), h = c(NA, NA, NA, NA, NA, NA, 2021, 2024), id_estado = c(1, 
    1, 1, 1, 1, 1, 1, 1), id_municipio = c(1, 1, 1, 1, 1, 1, 
    1, 1), cve_inegi = c(1001, 1001, 1001, 1001, 1001, 1001, 
    1001, 1001),X = c(1903, 1927, 1900, 1925, 1906, 1907, 2019, 2021), Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 1913, 1908, 2021, 2024)), row.names = c(NA, -8L), class = "data.frame")

So far, I have managed to filter values for my three columns b, d, h, but I'm not sure how to extract the values keeping the rest of the columns. As I'm working with years post 1900 it is valid to assume all numeric values are more than 1900 or NA.This is my attempt:

df2<-df[,c( 7,9,13)][, sapply([,c( 7,9,13)], function(col) is.numeric(col) 
                      && any(col >= 1900)), drop = FALSE]

Solution

  • An approach using dplyr::if_else

    library(dplyr)
    
    cbind(df, 
          with(df, if_else(b >= 1900, data.frame(X=b, Y=d), data.frame(X=d, Y=h))))
    

    output

              estado      municipio               Presidente Municipal Sexo
    1 Aguascalientes Aguascalientes                  C. ZEFERINO MUÑOZ    H
    2 Aguascalientes Aguascalientes                 C. AURELIO PADILLA    H
    3 Aguascalientes Aguascalientes                     C. FELIPE RUIZ    H
    4 Aguascalientes Aguascalientes                 C. GABRIEL CARMONA    H
    5 Aguascalientes Aguascalientes                  C. EVARISTO FEMAT    H
    6 Aguascalientes Aguascalientes              C. FRANCISCO ARMENGOL    H
    7 Aguascalientes Aguascalientes LIC. MARIA TERESA JIMENEZ ESQUIVEL    M
    8 Aguascalientes Aguascalientes       C. LEONARDO MONTAÑEZ  CASTRO    H
                   Partido  a    b  c    d  e  f  g    h id_estado id_municipio
    1                 <NA> NA 1903 NA   NA NA NA NA   NA         1            1
    2                 <NA> NA 1927 NA   NA NA NA NA   NA         1            1
    3                 <NA> NA 1900 NA   NA NA NA NA   NA         1            1
    4                 <NA> NA 1925 NA   NA NA NA NA   NA         1            1
    5                 <NA> NA 1906 NA 1913 NA NA NA   NA         1            1
    6                 <NA> NA 1907 NA 1908 NA NA NA   NA         1            1
    7                  PAN NA   15 NA 2019 NA 14 NA 2021         1            1
    8 COAL. POR AGUASCALIE NA   15 NA 2021 NA 14 NA 2024         1            1
      cve_inegi    X    Y
    1      1001 1903   NA
    2      1001 1927   NA
    3      1001 1900   NA
    4      1001 1925   NA
    5      1001 1906 1913
    6      1001 1907 1908
    7      1001 2019 2021
    8      1001 2021 2024