Part of my data looks as follows:
> q[,c(1,3)]
Year Language
1 1 C++
2 1 C
3 1 C++
4 1 C
5 1 C++
6 1 JavaScript
7 1 C++
8 2 C++
9 2 inny
10 2 C++
11 2 Java
12 3 Java
13 3 Java
14 3 JavaScript
15 3 Java
16 3 JavaScript
17 3 .NET
18 3 inny
19 3 R
20 3 Python
21 3 .NET
22 3 Python
23 3 Java
24 3 Java
25 3 Java
26 3 Java
27 3 Java
28 3 Java
29 3 C#
30 3 C++
31 3 JavaScript
32 3 C++
33 3 JavaScript
34 3 Java
35 3 Java
36 3 Python
37 3 C#
38 4 R
39 4 C
40 4 Java
41 4 Python
42 4 C++
43 4 .NET
44 4 C#
45 5 inny
46 5 JavaScript
47 5 C#
48 5 Python
49 5 R
50 2 C
The entire dataset, named q, also has other columns that are not relevant here.
For each year, I want to find the language or languages that occurred most often.
Sometimes several languages are tied for the highest count, which is why I want to list every such language.
Expected output:
Year Language
1 1 C++
2 2 C++
3 3 Java
4 4 .NET
5 4 C
6 4 C#
7 4 C++
8 4 Java
9 4 Python
10 4 R
11 5 C#
12 5 inny
13 5 JavaScript
14 5 Python
15 5 R
Using dplyr:
library(dplyr)
q %>% group_by(Year) %>% summarise(language = names(which(table(Language) == max(table(Language)))))
Output:
Year language
<int> <chr>
1 1 C++
2 2 C++
3 3 Java
4 4 .NET
5 4 C
6 4 C#
7 4 C++
8 4 Java
9 4 Python
10 4 R
11 5 C#
12 5 inny
13 5 JavaScript
14 5 Python
15 5 R
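
An alternative sketch of the same idea, in case it helps: count each Year/Language pair first, then keep every row whose count equals the maximum within its year. This keeps one row per result, so it sidesteps the warning that, as far as I know, dplyr 1.1.0 and later raise when summarise() returns more than one row per group. The data frame q_small and the result name top_langs are made up for illustration; only the column names Year and Language come from the question.

```r
library(dplyr)

# Hypothetical miniature stand-in for q, with the same two columns
q_small <- data.frame(
  Year = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  Language = c("C++", "C", "C++", "Java", "R", "R", "R", "C", "C"),
  stringsAsFactors = FALSE
)

top_langs <- q_small %>%
  count(Year, Language) %>%   # one row per Year/Language pair, frequency in column n
  group_by(Year) %>%
  filter(n == max(n)) %>%     # keep every language tied for the top count in that year
  ungroup() %>%
  select(Year, Language)

print(top_langs)
```

With this sample, year 1 yields only C++, while years 2 and 3 each yield two tied languages. Replacing q_small with q should give the same result as the summarise() version, with the column named Language rather than language.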