R package CMD check: unable to identify non-ASCII character


I've written a small function that returns the categories of the ICD-10, since I use them frequently. The function works as expected, but when I try to integrate it into my package, R CMD check gives the following warning. I replaced the German umlauts 'ö', 'ä', 'ü' with the Unicode notation \uxxxx, but that does not seem to help. Did I miss some other non-ASCII character? I cannot seem to find it.

R CMD check warning

W  checking R files for non-ASCII characters ... 
   Found the following file with non-ASCII characters:
     ICD_10.R
   Portable packages must use only ASCII characters in their R code,
   except perhaps in comments.
   Use \uxxxx escapes for other characters.

Function

#' Get ICD-10 Codes as Character Vector
#' @description Returns a character vector of length 11 for all ICD-10 categories
#' @param lang Language for the character vector, currently available in English and German (lang = "ger"), Default: "eng"
#' @return Character vector containing the 11 categories for mental disorders (F codes F01-F99)
#'
#' @author Bjoern 
#'
#' @examples
#' get_ICD_10_cats() # returns english ICD-10 Cats
#' get_ICD_10_cats("ger") # returns the german ones
#' @export
get_ICD_10_cats <- function(lang="eng") {
  eng <- c("F01-F09 Mental disorders due to known physiological conditions",
                    "F10-F19 Mental and behavioral disorders due to psychoactive substance use",
                    "F20-F29 Schizophrenia, schizotypal, delusional, and other non-mood psychotic disorders",
                    "F30-F39 Mood \u005Baffective\u005D disorders",
                    "F40-F48 Anxiety, dissociative, stress-related, somatoform and other nonpsychotic mental disorders",
                    "F50-F59 Behavioral syndromes associated with physiological disturbances and physical factors",
                    "F60-F69 Disorders of adult personality and behavior",
                    "F70-F79 Intellectual disabilities",
                    "F80-F89 Pervasive and specific developmental disorders",
                    "F90-F98 Behavioral and emotional disorders with onset usually occurring in childhood and adolescence",
                    "F99-F99 Unspecified mental disorder")

  ger <-  c(
    "F01-F09 Organische, einschließlich symptomatischer psychischer St\u00F6rungen",
    "F10-F19 Psychische und Verhaltensst\u00F6rungen durch psychotrope Substanzen",
    "F20-F29 Schizophrenie, schizotype und wahnhafte St\u00F6rungen",
    "F30-F39 Affektive St\u00F6rungen",
    "F40-F48 Neurotische, Belastungs- und somatoforme St\u00F6rungen",
    "F50-F59 Verhaltensauff\u00E4lligkeiten mit k\u00F6rperlichen St\u00F6rungen und Faktoren",
    "F60-F69 Pers\u00F6nlichkeits- und Verhaltensst\u00F6rungen",
    "F70-F79 Intelligenzst\u00F6rung",
    "F80-F89 Entwicklungsst\u00F6rungen",
    "F90-F98 Verhaltens- und emotionale St\u00F6rungen mit Beginn in der Kindheit und Jugend",
    "F99-F99 Nicht n\u00E4her bezeichnete psychische St\u00F6rungen"
  )
  if(tolower(lang) %in% c("ger", "de")) return(ger) else return(eng)

}

Solution (edit, in a nutshell)

Thanks to Dirk Eddelbuettel and the dang package there is a perfect solution to find non-ASCII characters in your package:

remotes::install_github("eddelbuettel/dang")
dang::checkPackageAsciiCode(dir = ".")

This reports the non-ASCII character that was missed, in my case ß, which can be replaced with "\u00DF".
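
To work out the escape for a character without looking it up, base R's utf8ToInt() gives the code point, which can then be formatted as \uxxxx. A minimal sketch: the helper name to_unicode_escapes is made up for illustration, and it only handles code points up to U+FFFF.

```r
# Hypothetical helper: replace every non-ASCII character in a single
# string with its \uxxxx escape (code points above U+FFFF are not handled).
to_unicode_escapes <- function(x) {
  cps <- utf8ToInt(x)                        # code points of the string
  chars <- intToUtf8(cps, multiple = TRUE)   # one element per character
  out <- ifelse(cps > 127L, sprintf("\\u%04X", cps), chars)
  paste(out, collapse = "")
}

# The input is written with an escape so that this snippet stays ASCII-only.
escaped <- to_unicode_escapes("einschlie\u00DFlich")
cat(escaped, "\n")   # prints: einschlie\u00DFlich
```

The printed text can be pasted into the string literal in the R file as-is.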


Solution

  • Two things:

    • With R 4.2.* and consistent use of UTF-8, this may no longer be an issue if UTF-8 encoding is declared; it may be worth a try.

    • Finding such an offending non-ASCII character can be a pain; at one point in 2020 I needed exactly that and extracted the base R code into a function, checkPackageAsciiCode, in my dang package, which contains a (somewhat random) collection of functions.

    If you have your package on a public repo (GitHub maybe?) I can take a closer look.

    Tschoe mit oe, but in 7bit

    Edit: Otherwise, and from just eyeballing, you have a remaining 'ß' in 'einschließlich' that you may want to try replacing with a \uxxxx sequence.
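
For a quick check without installing anything, base R can flag the offending lines, because iconv() returns NA for strings that cannot be converted to ASCII. This is a rough sketch, not the code dang uses: find_non_ascii is a hypothetical helper, and the two sample lines stand in for what readLines() would return for the real file.

```r
# Hypothetical helper: report which lines contain non-ASCII characters.
find_non_ascii <- function(lines) {
  # iconv() yields NA for elements that cannot be represented in ASCII
  bad <- is.na(iconv(lines, from = "UTF-8", to = "ASCII"))
  data.frame(line = which(bad), text = lines[bad], stringsAsFactors = FALSE)
}

# Sample input: line 1 is properly escaped source text, line 2 contains a
# real 'ß' (built from an escape so that this snippet itself stays ASCII).
src <- c(
  'a <- "St\\u00F6rungen"',
  paste0('b <- "einschlie', "\u00DF", 'lich"')
)
res <- find_non_ascii(src)
print(res$line)   # [1] 2
```

The tools package that ships with R offers something similar: tools::showNonASCIIfile() prints the lines of a file that contain non-ASCII bytes, e.g. tools::showNonASCIIfile("R/ICD_10.R") if the file sits in the package's R directory.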