Given column names and column types like these:
col_names = ["A", "B", "C"]
col_types = ["String", "Int64", "Bool"]
I want to create an empty DataFrame like this:
desired_DF = DataFrame(A = String[], B = Int64[], C = Bool[]) #But I cannot specify every column name and type like this every time.
How do I do this?
I am looking for either a code snippet that does this or, if you like the solution I've copied below, an explanation of how it works.
I've seen a solution here. It works, but I do not understand it, especially the third line, and in particular the semicolon at the beginning and the three dots at the end.
col_names = [:A, :B] # needs to be a vector of Symbols
col_types = [Int64, Float64]
# Create a NamedTuple (A=Int64[], ....) by doing
named_tuple = (; zip(col_names, type[] for type in col_types )...)
df = DataFrame(named_tuple) # 0×2 DataFrame
Also, is there perhaps an even more elegant way to do this?
Let us start with the input:
julia> col_names = ["A", "B", "C"]
3-element Vector{String}:
"A"
"B"
"C"
julia> col_types = [String, Int64, Bool]
3-element Vector{DataType}:
String
Int64
Bool
Note the difference: col_types needs to hold types, not strings. col_names is fine the way you proposed it.
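If the type names really do arrive as strings, as in the question, one option is to map them to types through an explicit lookup table. This is only a minimal sketch: the names `type_lookup` and `col_type_names` are hypothetical and not part of the original post, and the table has to list every type you expect to see.

```julia
# Hypothetical lookup table from type names to actual Julia types.
# Only the names listed here are accepted; anything else raises a KeyError.
type_lookup = Dict("String" => String, "Int64" => Int64, "Bool" => Bool)

# The string input from the question (the variable name is an assumption).
col_type_names = ["String", "Int64", "Bool"]

# Translate each name into its type.
col_types = [type_lookup[name] for name in col_type_names]
```

An explicit table avoids evaluating arbitrary strings as code, at the cost of having to maintain the list of allowed types.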
Now there are many ways to solve your problem. Let me show you the simplest one in my opinion:
First, create a vector of vectors that will be columns of your data frame:
julia> [T[] for T in col_types]
3-element Vector{Vector}:
String[]
Int64[]
Bool[]
Now you just need to pass it to the DataFrame constructor, with this vector of vectors as the first argument and the column names as the second argument:
julia> DataFrame([T[] for T in col_types], col_names)
0×3 DataFrame
 Row │ A       B      C
     │ String  Int64  Bool
─────┴─────────────────────
and you are done.
If you do not have column names, you can generate them automatically by passing :auto as the second argument:
julia> DataFrame([T[] for T in col_types], :auto)
0×3 DataFrame
 Row │ x1      x2     x3
     │ String  Int64  Bool
─────┴─────────────────────
This is a simple way to get what you want.
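Since the question mentions not wanting to spell this out every time, the constructor call can be wrapped in a small function. The sketch below assumes the DataFrames.jl package is installed; the helper name `empty_dataframe` and the length check are my own additions for illustration, not something DataFrames.jl provides.

```julia
using DataFrames

# Hypothetical helper (the name is an assumption): build a 0-row DataFrame
# from a vector of column names and a vector of column element types.
function empty_dataframe(col_names::AbstractVector{<:AbstractString},
                         col_types::AbstractVector)
    length(col_names) == length(col_types) ||
        throw(ArgumentError("col_names and col_types must have the same length"))
    # One empty, correctly typed vector per column, then the names.
    return DataFrame([T[] for T in col_types], col_names)
end

df = empty_dataframe(["A", "B", "C"], [String, Int64, Bool])

# The columns are typed, so rows can be appended later.
push!(df, ("x", 1, true))
```

Because every column is created as T[], the element types are fixed up front, and pushing a row with values of the wrong type will raise an error instead of silently changing the column type.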
Now let us decompose the approach you mentioned above:
(; zip(col_names, type[] for type in col_types )...)
To understand it you need to know how keyword arguments can be passed to functions. See this:
julia> f(; kwargs...) = kwargs
f (generic function with 1 method)
julia> f(; [(:a, 10), (:b, 20), (:c, 30)]...)
pairs(::NamedTuple) with 3 entries:
:a => 10
:b => 20
:c => 30
Now, the expression from your example:
(; zip(col_names, type[] for type in col_types )...)
uses exactly this mechanism. Since there is no function name before the parentheses, a NamedTuple is created instead of a function being called (this is how Julia syntax works). The zip part just produces the (name, value) tuples, like the ones I passed to my example function above. Note that the names must be Symbols here, so the string col_names from above are converted with Symbol.(col_names):
julia> collect(zip(Symbol.(col_names), type[] for type in col_types ))
3-element Vector{Tuple{Symbol, Vector}}:
(:A, String[])
(:B, Int64[])
(:C, Bool[])
So the example is the same as passing:
julia> (; [(:A, String[]), (:B, Int64[]), (:C, Bool[])]...)
(A = String[], B = Int64[], C = Bool[])
Which is, given what we have said, the same as passing:
julia> (; :A => String[], :B => Int64[], :C => Bool[])
(A = String[], B = Int64[], C = Bool[])
Which is, in turn, the same as just writing:
julia> (; A = String[], B = Int64[], C = Bool[])
(A = String[], B = Int64[], C = Bool[])
So this is the explanation of how and why the example you quoted works. However, I believe that what I propose is simpler.
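To check the NamedTuple explanation on its own, here is a small sketch that needs only Base Julia. It assumes the string col_names from above and converts them to Symbols first; the variable names `splatted` and `literal` are mine, chosen for illustration.

```julia
col_names = ["A", "B", "C"]
col_types = [String, Int64, Bool]

# Splat (name, value) tuples after the leading semicolon to build a NamedTuple.
# Names must be Symbols, so the strings are converted with Symbol.(...).
splatted = (; zip(Symbol.(col_names), (T[] for T in col_types))...)

# The same NamedTuple written out by hand.
literal = (A = String[], B = Int64[], C = Bool[])

# Both forms have identical field names and field types.
same_type = typeof(splatted) == typeof(literal)
```

Comparing the types rather than the values is deliberate: all the vectors are empty, so a value comparison alone would not show that the element types match.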