
Julia: Apply function to every cell within a DataFrame (without losing column names)


I am diving into Julia, hence my "novice"-question.

Coming from R and Python, I am used to applying simple functions (arithmetic or otherwise) to entire data.frames and pandas.DataFrames, respectively.

#both R and Python
df - 1               # returns all values -1, given all values are numeric
df == "someString"   # returns a boolean df

A slightly more complex case:

#python
df = df.applymap(lambda v: v - 1 if v > 1 else v)
#R
df[] <- lapply(df, function(x) ifelse(x>1,x-1,x))

The thing is, I don't know how to do this in Julia, and I can't easily find analogous solutions on the web. Since Stack Overflow answers tend to show up first when searching with Google, I am asking here: how do I do this in Julia?

Thanks for your help!

PS:

So far I have come up with the following solution, but it loses my column names.

DataFrame(colwise(x -> x .-1, df))

# seems like too much code for only subtracting 1, and it loses the column names

Solution

  • Please update your DataFrames.jl installation to version 1.4.2.

    You can do all you want using broadcasting like this:

    julia> df = DataFrame(rand(2,3), :auto)
    2×3 DataFrame
     Row │ x1        x2        x3
         │ Float64   Float64   Float64
    ─────┼──────────────────────────────
       1 │ 0.720264  0.759493  0.998702
       2 │ 0.726994  0.560153  0.243982
    
    julia> df .+ 1
    2×3 DataFrame
     Row │ x1       x2       x3
         │ Float64  Float64  Float64
    ─────┼───────────────────────────
       1 │ 1.72026  1.75949  1.9987
       2 │ 1.72699  1.56015  1.24398
    
    julia> df .< 0.5
    2×3 DataFrame
     Row │ x1     x2     x3
         │ Bool   Bool   Bool
    ─────┼─────────────────────
       1 │ false  false  false
       2 │ false  false   true
    
    julia> df2 = string.(df)
    2×3 DataFrame
     Row │ x1                  x2                  x3
         │ String              String              String
    ─────┼────────────────────────────────────────────────────────────
       1 │ 0.7202642575401104  0.7594928463144177  0.9987024771396766
       2 │ 0.7269944483236035  0.5601527006649413  0.2439815742224939
    
    julia> parse.(Float64, df2)
    2×3 DataFrame
     Row │ x1        x2        x3
         │ Float64   Float64   Float64
    ─────┼──────────────────────────────
       1 │ 0.720264  0.759493  0.998702
       2 │ 0.726994  0.560153  0.243982
    

    Is this what you wanted?
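
The conditional case from the question (subtract 1 only where the value is greater than 1) can be handled with the same broadcasting approach. Here is a minimal sketch, assuming a recent DataFrames.jl release; the sample data and the helper name `decrement` are made up for illustration and are not part of the original answer:

```julia
using DataFrames

# Hypothetical sample data, for illustration only
df = DataFrame(a = [0.5, 2.0, 3.0], b = [1.0, 4.0, 0.0])

# Hypothetical helper: subtract 1 only from values greater than 1
decrement(v) = v > 1 ? v - 1 : v

# Broadcast the function over every cell; column names are preserved
res1 = decrement.(df)

# Same idea with ifelse, close to the R version in the question
res2 = ifelse.(df .> 1, df .- 1, df)

# Column-by-column alternative using mapcols
res3 = mapcols(col -> decrement.(col), df)
```

All three forms should return a new data frame with columns `a` and `b`, leaving `df` unchanged. The string comparison from the question would be written in the same style as `df .== "someString"`.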