Tags: r, read.csv, korean-nlp

Error while reading a CSV file containing Korean text


I am trying to read a CSV file in which one column contains Korean text, using the lines below:

Sys.setlocale(category="LC_ALL", locale = "Korean")
old <- read.csv("Past-Korean.csv", encoding = "utf-8",header=T,na.strings=c("")) 

But I am getting this error:

Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, numerals = numerals,  : 
  invalid multibyte string at '<ec><8b><9c>스템 ë¬¸ì œ'

I am able to read Chinese and Japanese files using similar syntax, but I am facing an issue while reading Korean. Can anyone help me here?


Solution

  • In the absence of sample data I can't test this, but would you mind trying this approach?

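    library(readr)

    # Optional: print the Korean locale to check that readr accepts "ko".
    locale("ko")

    # Read the file as UTF-8, using Korean date names.
    df <- read_csv(file = "your_csv_file.csv",
                   locale = locale(date_names = "ko", encoding = "UTF-8"))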
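
  • A base R alternative, in case readr is not an option. In `read.csv`, `encoding` only marks the strings that were read as UTF-8, whereas `fileEncoding` declares the encoding of the file so that R can re-encode it while reading. The sketch below has not been run against the original file. It writes a small hypothetical sample (the column names `id` and `issue` and the temporary file are assumptions, not the asker's data) and reads it back with `fileEncoding = "UTF-8"`. It also assumes the session's native encoding can represent Korean, for example a UTF-8 locale or the Korean locale set in the question.

```r
# Hypothetical sample data. "\uc2dc\uc2a4\ud15c \ubb38\uc81c" is the Korean
# phrase that appears in the error message, written as Unicode escapes so
# that the script itself stays ASCII.
sample_lines <- c("id,issue",
                  "1,\uc2dc\uc2a4\ud15c \ubb38\uc81c",
                  "2,")

# Write the lines as raw UTF-8 bytes to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(enc2utf8(sample_lines), tmp, useBytes = TRUE)

# fileEncoding declares the encoding of the file on disk so that R can
# re-encode it; encoding alone would only mark the strings after reading.
df <- read.csv(tmp, fileEncoding = "UTF-8", header = TRUE,
               na.strings = "", stringsAsFactors = FALSE)

print(df)
```

    For the real file, replace `tmp` with "Past-Korean.csv". If the file starts with a byte-order mark, `fileEncoding = "UTF-8-BOM"` may be worth trying instead.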