I have DataFrames that have whitespace in their column names, because the CSV files they were generated from had whitespace in the names as well. The DataFrames were generated with the lines
csvnames::Array{String,1} = filter(x -> endswith(x, ".csv"), readdir(CSV_DIR))
dfs::Dict{String, DataFrame} = Dict( csvnames[i] => CSV.File(CSV_DIR * csvnames[i]) |> DataFrame for i in 1:length(csvnames))
The DataFrames have column names such as "Tehtävä 1", but none of the following expressions work when I try to access the column (here ecols is a DataFrame):
plot = axes.plot(ecols[Symbol("Tehtävä 1")])
produces the error TypeError("float() argument must be a string or a number, not 'PyCall.jlwrap'")
plot = axes.plot(ecols[:Tehtävä_1])
produces the error ERROR: LoadError: ArgumentError: column name :Tehtävä_1 not found in the data frame; existing most similar names are: :Tehtävä 1
plot = axes.plot(ecols[:Tehtävä 1])
raises the error ERROR: LoadError: MethodError: no method matching typed_hcat(::DataFrame, ::Symbol, ::Int64)
It therefore seems that I have no way of plotting DataFrame columns that have spaces in their names. Printing them works just fine, as the line
println(ecols[Symbol("Tehtävä 1")])
produces an array of floats: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], which it is supposed to. Is Matplotlib just incompatible with DataFrames with whitespace in their column names and, if it is, how could I remove all whitespace from the column names of a Julia DataFrame?
I forgot to mention one very crucial point: the DataFrame contains missing values, which Matplotlib can't handle. This was causing the first error (the TypeError). I would still very much like to know if there is a way of getting rid of any whitespace in the table column names, possibly during the construction of the DataFrame.
The first approach works just fine, but it seems you are not using PyPlot.jl correctly (in particular, you try to create a variable called plot, which will shadow the plot function from PyPlot.jl).
To see that it works run:
julia> df = DataFrame(Symbol("Tehtävä 1") => 1.0:5.0)
5×1 DataFrame
│ Row │ Tehtävä 1 │
│ │ Float64 │
├─────┼───────────┤
│ 1 │ 1.0 │
│ 2 │ 2.0 │
│ 3 │ 3.0 │
│ 4 │ 4.0 │
│ 5 │ 5.0 │
julia> plot(df[Symbol("Tehtävä 1")])
1-element Array{PyCall.PyObject,1}:
PyObject <matplotlib.lines.Line2D object at 0x000000003F9EE0B8>
and a plot is shown as expected.
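Regarding the missing values mentioned in the update to the question: a minimal sketch of two ways to turn such a column into plain floats before handing it to Matplotlib. The data here are made up for illustration, and the `df[!, col]` indexing assumes DataFrames.jl 0.19 or newer.

```julia
using DataFrames

# Hypothetical example data: a column name with a space and one missing entry.
df = DataFrame(Symbol("Tehtävä 1") => [1.0, missing, 3.0])

col = df[!, Symbol("Tehtävä 1")]

# Option 1: drop the missing entries entirely.
ys_dropped = collect(skipmissing(col))

# Option 2: replace missing with NaN so the positions of the other points are kept.
ys_nan = coalesce.(col, NaN)

# With PyPlot loaded, either vector can then be passed on, e.g. plot(ys_nan).
```

Which option fits depends on whether the x positions of the remaining points matter: dropping shifts later points to the left, while NaN keeps the indices intact.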
EDIT
If you want to remove whitespace from the column names of a data frame df, write:
names!(df, Symbol.(replace.(string.(names(df)), Ref(r"\s"=>""))))
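In later DataFrames.jl releases `names!` was deprecated in favor of `rename!`, so the following is a sketch of the same idea under that assumption. It also shows the `normalizenames` keyword of `CSV.File`, which addresses the "during construction" part of the question by turning names into valid Julia identifiers (whitespace becomes an underscore rather than being removed). The CSV content is made up and read from an `IOBuffer` only to keep the example self-contained.

```julia
using CSV, DataFrames

# Hypothetical CSV content with spaces in the header.
csvtext = "Tehtävä 1,Tehtävä 2\n1.0,2.0\n3.0,4.0\n"

# Variant 1: read as-is, then strip all whitespace from the column names.
df = CSV.File(IOBuffer(csvtext)) |> DataFrame
rename!(df, Symbol.(replace.(string.(names(df)), Ref(r"\s" => ""))))

# Variant 2: let CSV.jl normalize the names while reading.
df2 = CSV.File(IOBuffer(csvtext); normalizenames=true) |> DataFrame

println(names(df))   # names with whitespace removed
println(names(df2))  # names normalized to valid identifiers
```

After the second variant the column can be accessed with a plain symbol literal such as `:Tehtävä_1`, which avoids the `Symbol("...")` construction altogether.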