How to accept accented characters in imported shapefile using terra?


I am importing some shapefiles that have bilingual (English/French) province and territory names, which include some accented characters. I want to keep these accented characters, but they are currently being replaced with symbols. How can I do that?

One of the shapefiles I'm using can be downloaded here: https://www12.statcan.gc.ca/census-recensement/alternative_alternatif.cfm?l=eng&dispext=zip&teng=lfsa000b21a_e.zip&k=%20%20%20158240&loc=//www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/files-fichiers/lfsa000b21a_e.zip

I have tried the following:

statdat_fsa <- terra::vect("C:/Users/me/Desktop/lfsa000b21a_e.shp", layer_options = "UTF-8")

and

statdat_fsa <- terra::vect("C:/Users/me/Desktop/lfsa000b21a_e.shp", encoding = "UTF-8")

Both of these fail with an "unused argument" error. For the first call it reads: Error in .local(x, ...) : unused argument (layer_options = "UTF-8").

I then convert the result to an sf object so that I can view the attribute table more easily in R:

fsa_sf <- sf::st_as_sf(statdat_fsa)  # so I can see the attribute table

If you scroll down in this table, you can quickly see the PRNAME entries where the accented characters have been replaced with symbols. How can I get R to keep the accented characters instead?


Solution

  • terra assumes "UTF-8", but this file appears to be "latin1" encoded (per dog's answer). You can specify that with opts="ENCODING=LATIN1".

    v <- terra::vect("lfsa000b21a_e.shp", opts="ENCODING=LATIN1")
    as.data.frame(v[93])
    #  CFSAUID        DGUID PRUID                        PRNAME LANDAREA
    #1     B3N 2021A0011B3N    12 Nova Scotia / Nouvelle-Écosse   7.5457
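
To illustrate why the names come out garbled when Latin-1 text is treated as UTF-8, here is a minimal base-R sketch that does not need the shapefile. It builds the bytes of "Nouvelle-Écosse" by hand as Latin-1 (an assumption made purely for illustration; the variable names are hypothetical), shows that they are not valid UTF-8, and then re-encodes them with iconv(). It is not a substitute for the opts= approach above, which fixes the problem at read time.

```r
# "Nouvelle-Écosse" as Latin-1 bytes: É is the single byte 0xC9
raw_latin1 <- c(charToRaw("Nouvelle-"), as.raw(0xC9), charToRaw("cosse"))
x <- rawToChar(raw_latin1)

# These bytes are not a valid UTF-8 sequence, so a reader that
# assumes UTF-8 cannot display the accented character correctly
is_utf8_before <- validUTF8(x)

# Declare the source encoding explicitly and convert to UTF-8
fixed <- iconv(x, from = "latin1", to = "UTF-8")
is_utf8_after <- validUTF8(fixed)

# In UTF-8, É becomes the two bytes 0xC3 0x89
fixed_bytes <- charToRaw(fixed)
```

The same idea applies to the shapefile: the bytes in the .dbf are fine, and only the declared encoding needs to match what is actually stored.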