I have a DataFrame with several columns, say column1, column2, ..., column100. How do I select only a subset of the columns? For example, (not column1) should return all of the columns column2...column100.
data[[colnames(data) .!= "column1"]])
doesn't seem to work.
I don't want to mutate the DataFrame. I just want to select all the columns that don't have a particular column name, as in my example.
EDIT 2/7/2021: as people still seem to find this on Google, I'll say right at the top that current DataFrames (1.0+) allows both Not() selection, supported by InvertedIndices.jl, and string types as column names, including regex selection with the r"" string macro. Examples:
julia> df = DataFrame(a1 = rand(2), a2 = rand(2), x1 = rand(2), x2 = rand(2), y = rand(["a", "b"], 2))
2×5 DataFrame
 Row │ a1        a2        x1        x2        y
     │ Float64   Float64   Float64   Float64   String
─────┼────────────────────────────────────────────────
   1 │ 0.784704  0.963761  0.124937  0.37532   a
   2 │ 0.814647  0.986194  0.236149  0.468216  a
julia> df[!, r"2"]
2×2 DataFrame
 Row │ a2        x2
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.963761  0.37532
   2 │ 0.986194  0.468216
julia> df[!, Not(r"2")]
2×3 DataFrame
 Row │ a1        x1        y
     │ Float64   Float64   String
─────┼────────────────────────────
   1 │ 0.784704  0.124937  a
   2 │ 0.814647  0.236149  a
Finally, the names function has a method which takes a type as its second argument, which is handy for subsetting DataFrames by the element type of each column:
julia> df[!, names(df, String)]
2×1 DataFrame
 Row │ y
     │ String
─────┼────────
   1 │ a
   2 │ a
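As a small sketch of the same idea (the toy df below is made up for illustration and is not the one from the transcript above), names should also accept an abstract type, plus the usual column selectors such as Not or a regex, so you can build the list of names first and index with it afterwards:

```julia
using DataFrames

# Made-up example data, not the df from the transcript above
df = DataFrame(a1 = [0.1, 0.2], x1 = [1, 2], y = ["a", "b"])

# An abstract type matches every column whose element type is a subtype of it
numeric_cols = names(df, Real)      # picks up a1 (Float64) and x1 (Int)

# names also takes the usual column selectors, such as Not or a regex
non_y_cols = names(df, Not(:y))
x_cols = names(df, r"^x")

# The resulting vector of names can be used directly as a column index
numeric_df = df[!, numeric_cols]
```

Keeping the names in a variable like this is mostly a matter of taste; passing the selector straight into the brackets works too.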
In addition to indexing with square brackets, there's also the select function (and its mutating equivalent select!), which basically takes the same input as the column index in []-indexing as its second argument:
julia> select(df, Not(r"a"))
2×3 DataFrame
 Row │ x1        x2        y
     │ Float64   Float64   String
─────┼────────────────────────────
   1 │ 0.124937  0.37532   a
   2 │ 0.236149  0.468216  a
Original answer below
As @Reza Afzalan said, what you're trying to do returns an array of strings, while column names in DataFrames are symbols.
Given that Julia doesn't have conditional list comprehensions, the nicest thing you could do, I guess, would be:
data[:, filter(x -> x != :column1, names(data))]
This will give you the data set with column 1 removed (without mutating it). You could extend this to checking against lists of names as well:
data[:, filter(x -> !(x in [:column1,:column2]), names(data))]
UPDATE: As Ian says below, for this use case the Not syntax is now the best way to go.
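Applied to the question as asked, that could look roughly like the sketch below. The three-column data here is just a hypothetical stand-in for the asker's 100-column frame, and it assumes DataFrames 1.0+:

```julia
using DataFrames

# Hypothetical stand-in for the asker's `data`
data = DataFrame(column1 = 1:3, column2 = 4:6, column3 = 7:9)

# Everything except column1; `:` in the row position returns a copy
without_first = data[:, Not(:column1)]

# Several columns can be excluded by wrapping a vector of names in Not
only_third = select(data, Not(["column1", "column2"]))

# `data` itself is left untouched by both calls
ncol(data)
```

Neither call mutates data, which is what the question asked for; select! would be the in-place variant.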
More generally, conditional list comprehensions are also available by now, so you could do:
data[:, [x for x in names(data) if x != :column1]]
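One thing to watch out for: in DataFrames 1.0+, names returns strings rather than symbols, so on a current version the comparison would need to be against a String (or you could use propertynames, which returns symbols). A minimal sketch, with a made-up three-column data:

```julia
using DataFrames

# Made-up example data
data = DataFrame(column1 = 1:2, column2 = 3:4, column3 = 5:6)

# names(data) is a Vector{String} here, so compare against a String
kept = [x for x in names(data) if x != "column1"]
result = data[:, kept]

# propertynames returns Symbols, if those are preferred
kept_syms = [x for x in propertynames(data) if x != :column1]
result_syms = data[:, kept_syms]
```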