Gender detection in R


Is there a way to do gender detection from a list of European names in R? Thanks in advance. As an example, I have this list of name-surname pairs:

    namesurname <- c("Hassan Al-Khayr", "Flores Juberías Carlos", "Géza Lévai",
                     "Miklós Lipták", "László Péter", "László Váradi",
                     "Sándor Molnár", "Csaba Attila Nemes", "Zoltán Károly",
                     "István Bajza")

Solution

  • The {genderizeR} package wraps calls to the genderize.io API. It extracts candidate given names (not surnames) from each text string and looks up the gender associated with them in genderize.io's data, which is drawn from a large body of social media metadata, so it is fairly robust for current naming conventions.

    library(genderizeR)
    
    namesurname <- c("Hassan Al-Khayr", "Flores Juberías Carlos", "Géza Lévai", "Miklós Lipták", "László Péter", "László Váradi", "Sándor Molnár", "Csaba Attila Nemes", "Zoltán Károly", "István Bajza")
    
    # Look up the given-name candidates found in each string via the genderize.io API
    df_gender <- findGivenNames(x = namesurname, textPrepare = TRUE)
    # Assign a gender to each original string using the lookup table built above
    genderize(x = namesurname, genderDB = df_gender)
    
                          text givenName gender genderIndicators
     1:        Hassan Al-Khayr    hassan   male                3
     2: Flores Juberías Carlos    carlos   male                2
     3:             Géza Lévai      <NA>   <NA>                0
     4:          Miklós Lipták    miklós   male                1
     5:           László Péter    lászló   male                2
     6:          László Váradi    lászló   male                1
     7:          Sándor Molnár    molnár   male                2
     8:     Csaba Attila Nemes    attila   male                3
     9:          Zoltán Károly    zoltán   male                2
    10:           István Bajza    istván   male                2
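
The output above is not perfect: "Géza Lévai" got no match, and for "Sándor Molnár" the surname "molnár" was picked up as the given name. A rough way to spot rows worth checking by hand is to flag every row where nothing matched, or where the matched name is not the first word of the string. The sketch below uses base R only and re-types the two relevant columns from the output above rather than calling the API again. The names `res`, `first_token` and `needs_review` are made up for illustration, and the "first word" rule is only a heuristic, not something genderizeR does itself.

```r
# Columns copied by hand from the output shown above (no API call is made here).
res <- data.frame(
  text = c("Hassan Al-Khayr", "Flores Juberías Carlos", "Géza Lévai",
           "Miklós Lipták", "László Péter", "László Váradi",
           "Sándor Molnár", "Csaba Attila Nemes", "Zoltán Károly",
           "István Bajza"),
  givenName = c("hassan", "carlos", NA, "miklós", "lászló", "lászló",
                "molnár", "attila", "zoltán", "istván"),
  stringsAsFactors = FALSE
)

# First whitespace-separated word of each string, lower-cased
# to match the lower-case givenName column.
first_token <- tolower(sub("\\s.*$", "", res$text))

# Flag rows with no match at all, or where the matched name
# is not the first word (heuristic: assumes given name comes first).
res$needs_review <- is.na(res$givenName) | res$givenName != first_token

# Rows to inspect manually
res[res$needs_review, ]
```

This will also flag legitimate cases such as a surname-first ordering or a match on a middle name, so treat the flag as a prompt for a manual look rather than as an error indicator.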