r, sorting, natural-sort

Natural sorting with R differs on deployment (maybe OS/Locale issue)


I am using the package "naturalsort" (https://github.com/kos59125/naturalsort). As far as I know, natural sorting is not implemented well elsewhere in R, so I was happy to find this package.

I use the function naturalsort to sort file names the way Windows Explorer does, which works great locally.

But when I use it in my production environment, deployed with Docker on Google Cloud Run, the sort order changes. I don't know whether this is due to a different locale (I am from Denmark) or to OS differences between my Windows PC and the Docker/Google Cloud Run deployment.

I have created an example that is ready to run in R:

######## Code start ###########
library(plumber)
library(naturalsort) # for natural sorting of file names

#* Retrieve sorted string list
#* @get /sortstrings
#* @param nothing
function(nothing) {
  
  print(nothing)
  
  test <- c("0.jpg", "file (4_5_1).jpeg", "1 tall thin image.jpeg",
            "8.jpeg", "8.jpg", "file (2.1.2).jpeg", "file (0).jpeg", "3.jpeg",
            "file (1).jpeg", "file (2.1.1).jpeg", "file (0) (3).jpeg", "file (2).jpeg",
            "file (2.1).jpeg", "file (4_5).jpeg", "file (4).jpeg", "file (39).jpeg")
  
  print("Direct sort")
  print(naturalsort(text = test))
  
  sorted_strings <- naturalsort(text = test)
  
  return(sorted_strings) 
}
######## Code end ###########

I would expect it to sort the file names as shown below, which it does locally, both when I run the script directly and when I call it through the plumber API ("Run API"):

c("0.jpg",
"1 tall thin image.jpeg",
"3.jpeg",
"8.jpeg",
"8.jpg",
"file (0) (3).jpeg",
"file (0).jpeg",
"file (1).jpeg",
"file (2).jpeg",
"file (2.1).jpeg",
"file (2.1.1).jpeg",
"file (2.1.2).jpeg",
"file (4).jpeg",
"file (4_5).jpeg",
"file (4_5_1).jpeg",
"file (39).jpeg")

But in the deployed environment it sorts them like this instead:

c("0.jpg",
"1 tall thin image.jpeg",
"3.jpeg",
"8.jpeg",
"8.jpg",
"file (0) (3).jpeg",
"file (0).jpeg",
"file (1).jpeg",
"file (2.1.1).jpeg",
"file (2.1.2).jpeg",
"file (2.1).jpeg",
"file (2).jpeg",
"file (4_5_1).jpeg",
"file (4_5).jpeg",
"file (4).jpeg",
"file (39).jpeg")

This is not the order Windows Explorer uses.


Solution

  • Try fixing the collating sequence prior to the naturalsort call. It varies by locale and can affect how strings are compared (and therefore sorted).

    ## Get initial value
    lcc <- Sys.getlocale("LC_COLLATE")
    
    ## Use fixed value
    Sys.setlocale("LC_COLLATE", "C")
    
    sorted_strings <- naturalsort(text = test)
    
    ## Restore initial value
    Sys.setlocale("LC_COLLATE", lcc)
    

    You can find some details in ?sort, ?Comparison, and ?locales.
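
    If the set/restore pattern above is needed in several places, it could be wrapped in a small helper so that the original collation is restored even when the sorting call fails. The following is only a sketch: with_c_collation is a hypothetical name (it is not part of naturalsort or base R), and the demonstration uses base sort() rather than naturalsort so that it runs without extra packages.

```r
# Hypothetical helper (not part of naturalsort or base R):
# evaluate `expr` with LC_COLLATE temporarily set to "C".
with_c_collation <- function(expr) {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the previous collation when the function exits, even on error.
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  # `expr` is a lazily evaluated argument, so it is first evaluated here,
  # after the collation has been switched.
  force(expr)
}

# Useful for comparing the two environments: run this locally and in the container.
print(Sys.getlocale("LC_COLLATE"))

# Demonstration with base sort(): in the "C" locale strings are compared
# byte by byte, and ")" (0x29) sorts before "." (0x2E).
demo <- with_c_collation(sort(c("file (2.1).jpeg", "file (2).jpeg")))
print(demo)
```

    With naturalsort loaded, the call from the question would then presumably become with_c_collation(naturalsort(text = test)). Whether that reproduces the Windows Explorer order in the container still has to be checked in the deployment itself.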