Tags: dataframe, julia, uppercase, lowercase

Julia DataFrames: How to efficiently convert column data and column name to uppercase/lowercase at once?


I have a DataFrame that looks like this:

5×4 DataFrame
 Row │ Col_1      Col_2     Col_a   Col_z  
     │ Float64    Float64   String  String 
─────┼─────────────────────────────────────
   1 │ 0.201256   0.418266  aabbcc  xxyyzz
   2 │ 0.804066   0.136453  aabbcc  xxyyzz
   3 │ 0.442338   0.305655  aabbcc  xxyyzz
   4 │ 0.0676846  0.113499  aabbcc  xxyyzz
   5 │ 0.380939   0.773559  aabbcc  xxyyzz

but with many String columns. What is an efficient (and preferably one-liner) solution to convert both column data and column names to uppercase for only these columns? So to get something like:

5×4 DataFrame
 Row │ Col_1      Col_2     COL_A   COL_Z  
     │ Float64    Float64   String  String 
─────┼─────────────────────────────────────
   1 │ 0.201256   0.418266  AABBCC  XXYYZZ
   2 │ 0.804066   0.136453  AABBCC  XXYYZZ
   3 │ 0.442338   0.305655  AABBCC  XXYYZZ
   4 │ 0.0676846  0.113499  AABBCC  XXYYZZ
   5 │ 0.380939   0.773559  AABBCC  XXYYZZ

Solution

  • If df is your data frame, there are two options.

    If you do not need to keep the column order:

    select(df, Not(names(df, AbstractString)), names(df, AbstractString) .=> ByRow(uppercase) .=> uppercase)
    

    If you need to keep the column order:

    select(df, [n => eltype(v) <: AbstractString ? ByRow(uppercase) => uppercase : n for (n, v) in pairs(eachcol(df))])
    

    (Both solutions assume that your data contains no missing values, as in your question.)
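
To make the difference between the two options visible, here is a minimal sketch on a made-up data frame in which a String column comes first. The data and the names `strcols`, `out1`, `out2`, `dfm`, `mcols`, and `outm` are illustrative only and not part of the question. The last part is one possible way to relax the no-missing assumption: it assumes that `passmissing` (from Missings.jl, which DataFrames.jl re-exports as far as I know) is available and that selecting columns by `Union{Missing, AbstractString}` fits your data.

```julia
using DataFrames

# Made-up stand-in for the question's data, with a String column first
df = DataFrame(Col_a = ["aabbcc", "aabbcc"], Col_1 = [0.1, 0.2])

# Option 1: non-String columns come first, transformed String columns go last
strcols = names(df, AbstractString)
out1 = select(df, Not(strcols), strcols .=> ByRow(uppercase) .=> uppercase)

# Option 2: each column stays where it was
out2 = select(df, [n => eltype(v) <: AbstractString ? ByRow(uppercase) => uppercase : n for (n, v) in pairs(eachcol(df))])

# Sketch for String columns that allow missing (an assumption, not covered above):
# widen the type selector and let passmissing skip the missing entries
dfm = DataFrame(x = [1.0, 2.0], s = ["ab", missing])
mcols = names(dfm, Union{Missing, AbstractString})
outm = select(dfm, Not(mcols), mcols .=> ByRow(passmissing(uppercase)) .=> uppercase)
```

With this input, `names(out1)` puts `Col_1` before `COL_A`, while `names(out2)` keeps `COL_A` first. Without the widened selector, a column whose element type is `Union{Missing, String}` is not matched by `AbstractString` and would be left unchanged by both options.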