im working on the Google Data Analytics Capstone Project using my own path and i ended up having a large dataset 86k rows which look something like this
Date loc loc_type variable value
---------------------------------------
1 1 1 1 (num)
---------------------------------------
1 1 1 2 (num)
---------------------------------------
1 1 1 3 (num)
---------------------------------------
1 1 1 4 (num)
---------------------------------------
1 1 1 5 (num)
---------------------------------------
1 1 1 6 (num)
---------------------------------------
1 1 2 1 (num)
---------------------------------------
1 1 2 2 (num)
---------------------------------------
1 1 2 3 (num)
---------------------------------------
1 1 2 4 (num)
---------------------------------------
1 1 2 5 (num)
---------------------------------------
1 1 2 6 (num)
---------------------------------------
1 2 1 1 (num)
---------------------------------------
1 2 1 2 (num)
---------------------------------------
i have no clue where to start but i want something like this
Date loc loc_type var1 var2 var3 var4 var 5 var6
------------------------------------------------------------------
1 1 1 (num) (num) (num) (num) (num) (num)
------------------------------------------------------------------
1 1 2 (num) (num) (num) (num) (num) (num)
------------------------------------------------------------------
1 2 1 (num) (num) (num) (num) (num) (num)
------------------------------------------------------------------
1 2 2 (num) (num) (num) (num) (num) (num)
------------------------------------------------------------------
1 3 1 (num) (num) (num) (num) (num) (num)
------------------------------------------------------------------
1 3 2 (num) (num) (num) (num) (num) (num)
------------------------------------------------------------------
2 1 1 (num) (num) (num) (num) (num) (num)
------------------------------------------------------------------
using tidyr
in r:
# data mockup
df <- data.frame(Date = 1,
loc = rep(1:6, each = 12),
loc_type = rep(1:2, each = 6),
variable = 1:6,
value = 1:72)
df |> tidyr::pivot_wider(names_from = variable, names_prefix = "var", values_from = value)
yields:
Date loc loc_type var1 var2 var3 var4 var5 var6
<dbl> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 1 2 3 4 5 6
2 1 1 2 7 8 9 10 11 12
3 1 2 1 13 14 15 16 17 18
4 1 2 2 19 20 21 22 23 24
5 1 3 1 25 26 27 28 29 30
6 1 3 2 31 32 33 34 35 36
7 1 4 1 37 38 39 40 41 42
8 1 4 2 43 44 45 46 47 48
9 1 5 1 49 50 51 52 53 54
10 1 5 2 55 56 57 58 59 60
11 1 6 1 61 62 63 64 65 66
12 1 6 2 67 68 69 70 71 72