Search code examples
runicodeunzipimport-from-excel

read_excel returns Unicode characters instead of a table


I would like to get a table from an Excel sheet using readxl package. I have the following code:

dir = tempfile()
download.file("http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip", dir)
unzip(dir, list = TRUE)
file = unzip(dir, list = TRUE)[2, 1]
xl = read_excel(file, sheet = 1)
xl

And the result that I am getting is:

# A tibble: 7 × 11
                                                                                                                                                                                                                                                                                                                                                      `42857`
                                                                                                                                                                                                                                                                                                                                                        <chr>
1 <U+041E><U+0431><U+043B><U+0438><U+0433><U+0430><U+0446><U+0438><U+0438> <U+043C><U+0435><U+0436><U+0434><U+0443><U+043D><U+0430><U+0440><U+043E><U+0434><U+043D><U+044B><U+0445> <U+0444><U+0438><U+043D><U+0430><U+043D><U+0441><U+043E><U+0432><U+044B><U+0445> <U+043E><U+0440><U+0433><U+0430><U+043D><U+0438><U+0437><U+0430><U+0446><U+0438><U+0439>
2                                                                                                                                                                                                                                                                                                                                  <U+2116> <U+043F>/<U+043F>
3                                                                                                                                                                                                                                                                                                                                                        <NA>
4                                                                                                                                                                                                                                                                                                                                                           1
5                                                                                                                                                                                                                                                                                                                                                           2
6                                                                                                                                                                                                                                                                                                                                                           3
7                                                                                                                                                                                                                                                                                                                                                           4
# ... with 10 more variables: X__1 <chr>, X__2 <chr>, X__3 <chr>, X__4 <chr>, X__5 <chr>, X__6 <chr>, X__7 <chr>, X__8 <chr>, X__9 <chr>, X__10 <chr>

Can anybody suggest a solution to this please?

After some corrections:

# A tibble: 7 x 11
                                         `42857`         X__1         X__2         X__3                          X__4              X__5
                                           <chr>        <chr>        <chr>        <chr>                         <chr>             <chr>
1 Облигации международных финансовых организаций         <NA>         <NA>         <NA>                          <NA>              <NA>
2                                          № п/п          НИН         ISIN Торговый код Краткое наименование эмитента Дней до погашения
3                                           <NA>         <NA>         <NA>         <NA>                          <NA>              <NA>
4                                              1 KZP01Y05E384 KZ2D00002623  EABRK281217     Евразийский банк развития               236
5                                              2 KZP02Y05E382 KZ2D00002631  EABRK250418     Евразийский банк развития               353
6                                              3 KZP03Y05E380 KZ2D00002763  EABRK180819     Евразийский банк развития               826
7                                              4 KZP04Y05E388 KZ2D00002771  EABRK180919     Евразийский банк развития               856
# ... with 5 more variables: X__6 <chr>, X__7 <chr>, X__8 <chr>, X__9 <chr>, X__10 <chr>

Solution

  • I think you need to change your locale settings, you can do this using Sys.setlocale("LC_CTYPE", "russian"), Once I have changed the settings to russian I am able to see the UTF-8 to russian alphabets, though I am however not sure about the region you are in .

    To go back to prior settings I have used Sys.setlocale("LC_CTYPE", "english") , under these settings I am getting U+XXX values for the russain (cyrillic) alphabets.

    I am having below column names with russian locale:

    > names(xl)
     [1] "№ п/п"                                      
     [2] "НИН"                                        
     [3] "ISIN"                                       
     [4] "Торговый код"                               
     [5] "Краткое наименование эмитента"              
     [6] "Дней до погашения"                          
     [7] "без дисконта, \"чистая\""                   
     [8] "без дисконта, \"грязная\""                  
     [9] "с  дисконтом, \"грязная\" (для целей биржи)"
    [10] "Цена прошлого периода, % (без дисконта)"    
    [11] "Ставка купона, %"     
    

    Table:

    enter image description here

    Please backup your locale settings before you work this solution.

    You can do :

    oldloc <- Sys.getlocale("LC_CTYPE")
    
    Sys.setlocale("LC_CTYPE", "ru_RU")
    
    Sys.setlocale("LC_CTYPE", oldloc)
    

    Try it may be it will help you. Thanks