regexp_replace breaks on German umlauts (ü, ö, ä)

I am writing a dbt macro in SQL to clean names. I wanted an elegant way to capitalize the first letter of each name, but my

regexp_replace(name_col, '(\w)(\w*)', x -> upper(x[1]) || lower(x[2]))

breaks on the German umlauts ä, ö, ü.

For example, the last name schöneberger becomes SchöNeberger with the expression above, not Schöneberger.

Does anyone know what to write so that Schöneberger and other names with umlauts are capitalized correctly as well?

Solution

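Athena uses Trino syntax, and Trino's regex functions use Java pattern syntax. In Java, \w matches only ASCII word characters ([a-zA-Z_0-9]) by default, so ö is not matched at all. The name is therefore split into two separate matches, sch and neberger, and each one gets its first letter capitalized. Java also supports the Perl-style character classes based on Unicode properties, including \p{L}, which matches any Unicode letter. So this will work for you: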

regexp_replace(name_col, '(\p{L})(\p{L}*)', x -> upper(x[1]) || lower(x[2]))

Proof: https://regex101.com/r/N84wjS/2
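
Since the patterns follow Java syntax, the difference between \w and \p{L} can also be reproduced in plain Java. The following is only a minimal sketch: the class name NameCapitalizer and the helper capitalize are hypothetical, and java.util.regex is used here as a stand-in that may differ in details from the regex engine Trino runs internally.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics the SQL lambda:
// x -> upper(x[1]) || lower(x[2])
class NameCapitalizer {

    // Replaces every match of `regex` (which must have two capture groups)
    // with upper(group 1) followed by lower(group 2).
    static String capitalize(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1).toUpperCase(Locale.ROOT)
                    + m.group(2).toLowerCase(Locale.ROOT);
            // quoteReplacement keeps '$' and '\' in names from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "sch\u00f6neberger"; // "schöneberger"

        // ASCII-only \w: the umlaut splits the name into two matches.
        System.out.println(capitalize(name, "(\\w)(\\w*)"));

        // \p{L}: the umlaut counts as a letter, so there is a single match.
        System.out.println(capitalize(name, "(\\p{L})(\\p{L}*)"));
    }
}
```

With the \w pattern the helper returns SchöNeberger, the same result as in the question. With \p{L} it returns Schöneberger.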