It is possible that this question is already discussed somewhere on this site but I am not able to find it after several searches. Let us say my dataframe has only 2 columns which can be generated with below code
df=data.frame(brand=c('Audi','BMW','Audi','Toyota','Audi','Toyota'),
Model=c('A3','x', 'A4','Corolla','A5','Camry'))
Now I am interested in seeing for each brand, what are the models available. I do not want to sum or mean anything. This should be a simple solution but I am not able to find it.
Best I could find was using dcast
from reshape2
package. I do
dcast(df,brand+Model~.)
and I get this
brand Model .
1 Audi A3 A3
2 Audi A4 A4
3 Audi A5 A5
4 BMW x x
5 Toyota Camry Camry
6 Toyota Corolla Corolla
The first two columns are exactly what I want but I am thinking I am unnecessarily generating a third column and then discarding a later. Also, there is a chance this can be done without any additional library.
So my question is, is there a better or more efficient way of handling this?
(The actual data I have gives the count in the last column under '.' I do not know why the simplified example I created does not give the count but instead just repeats the value in second column. I will just take that up as a separate question if I have to!)
You can simply order the data.frame
by brand
and Model
. If you have duplicated values, you can use unique()
to only get distinct combinations.
unique(df[order(df$brand, df$Model), ])
returns
brand Model
1 Audi A3
3 Audi A4
5 Audi A5
2 BMW x
6 Toyota Camry
4 Toyota Corolla