# Row-wise median for Julia DataFrames

I want to compute the median of every row in a dataframe. Some columns contain NaN values, and some rows are entirely NaN. The problems with `median` are:

- if there are any NaN values in a vector, it returns NaN. In this case I would like to skip NaNs (like in Pandas).
- it is undefined for empty vectors (it throws an error). In this case I want to return NaN (like in Pandas).
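A minimal illustration of the two behaviours listed above, using only the `Statistics` standard library. The variable names are made up for the example.

```julia
using Statistics

# Any NaN in the input makes the result NaN
with_nan = median([1.0, NaN, 3.0])

# An empty vector has no median, so the call throws
threw = try
    median(Float64[])
    false
catch err
    err isa ArgumentError
end
```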

I came up with the following solution:

```
using DataFrames, Statistics

df = DataFrame(rand(100, 10), :auto)
df[1, :x3] = NaN
df[20, [:x3, :x6]] .= NaN
df[5, :] .= NaN

safemedian(y) = all(isnan.(y)) ? NaN : median(filter(!isnan, y))

x = select(df, AsTable(:) => ByRow(safemedian∘collect) => "median")
```

This works, but it's rather slow.

**Question 1)** Is there a way to speed this up?

I think the `collect` call is causing the sluggish performance. But I need `collect`, otherwise I get an error:

```
safemedian(y) = all(isnan.(y)) ? NaN : median(filter(!isnan, y))
x = select(df, AsTable(:) => ByRow(safemedian) => "median")
# results in ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved
```

This is because `AsTable(:)` passes each row as a named tuple.
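The error can be reproduced without DataFrames. In the sketch below, `row` is a hypothetical named tuple standing in for what `AsTable(:)` hands to the function; it shows the broadcast being rejected, what `collect` returns, and that a predicate-based reduction such as `any(isnan, row)` iterates the values without broadcasting.

```julia
# Hypothetical row, shaped like what AsTable(:) hands to the function
row = (x1 = 0.3, x2 = NaN, x3 = 0.7)

# Broadcasting over a NamedTuple is rejected
broadcast_failed = try
    isnan.(row)
    false
catch err
    err isa ArgumentError
end

# collect turns the NamedTuple's values into a Vector{Float64}
v = collect(row)

# Reductions that take a predicate iterate the values and need no broadcast
any_nan = any(isnan, row)
```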

**Question 2)** Is there a way to pass rows as vectors instead?

This way I could pass the row to any function that expects a vector (for example the `nanmedian` function from the NaNStatistics.jl package). Note that I would not need `collect` if the `AsVector(:)` method was implemented (see [here]). Unfortunately it didn't get the go-ahead, and I'm not sure what the alternative is.

**Question 3)** This one is more philosophical. Coming from Python/Pandas, some operations in Julia are hard to figure out. Pandas, for example, handles NaNs seamlessly (for better or worse). In Julia I artificially replace the missing values in my dataframe using `mapcols!(x -> coalesce.(x, NaN), df)`. This is because many package functions (and functions I've written) are implemented for `AbstractArray{T} where {T<:Real}` and not `AbstractArray{Union{T, Missing}} where {T<:Real}` (i.e. they don't propagate missings). But since there is no `skipnan` yet there is a `skipmissing` function in Julia, I'm thinking I've got it all wrong. Is the idiomatic way to keep missing values in Julia and handle them where appropriate? Or is it ok to use NaNs (and keep the type fixed as, say, `Float64`)?

Solution

Try:

```
# Drop NaNs from each row, then take the median (NaN if nothing is left)
filter.(!isnan, eachrow(Matrix(df))) .|>
    v -> isempty(v) ? NaN : median(v)
```
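The same idea can be written as a named function mapped over the rows of a plain matrix, such as the one `Matrix(df)` returns. This is only a sketch: `nanmedian_row` is a hypothetical helper name, and the small matrix is made-up data chosen so the results are easy to check by hand.

```julia
using Statistics

# Hypothetical helper: skip NaNs, return NaN when nothing is left
function nanmedian_row(row)
    vals = filter(!isnan, row)
    return isempty(vals) ? NaN : median(vals)
end

# Stand-in for Matrix(df): one clean row, one partly NaN, one all NaN
m = [1.0 2.0 3.0;
     NaN 4.0 6.0;
     NaN NaN NaN]

# eachrow yields views, so only the filtered copies are allocated
meds = map(nanmedian_row, eachrow(m))
```

With a dataframe, the resulting vector could then be assigned as a new column, for example `df.median = meds`.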

Each library has idiosyncrasies which melt away with practice. Coming from Pandas, you are familiar with it, so it feels natural. After a while, it is entirely possible you would find that things which are natural in Julia feel awkward in Pandas.

For more perspective, ask this question on Discourse, which has very recently had a thread directly on these issues.
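On the third question, one pattern is to keep `missing` in the data and skip it at the point of use with `skipmissing`. The sketch below is an illustration rather than a recommendation; `missingmedian` is a hypothetical name, and returning `missing` for an all-missing row is an assumption about the desired behaviour.

```julia
using Statistics

# Hypothetical helper: median over the non-missing values,
# or missing when every value is missing
function missingmedian(v)
    vals = skipmissing(v)
    return isempty(vals) ? missing : median(vals)
end

some_missing = missingmedian([1.0, missing, 3.0])
all_missing = missingmedian([missing, missing])
```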
