Hi i have the data like this
date 2020/06/10 2020/06/10 2020/06/10 2020/06/11 2020/06/11 2020/06/11
id x y z x y z
10432 0 0 0 0 0 0
10668 0 0 0 0 0 0
11088 0 0 0 0 0 0
And i want my output like this
id date x y z
10432 2020/06/10 0 0 0
10432 2020/06/11 0 0 0
10668 2020/06/10 0 0 0
10668 2020/06/11 0 0 0
11088 2020/06/10 0 0 0
11088 2020/06/11 0 0 0
I would like to get the output in R
This data is extremely messy, but with some work I have managed to get your desired output.
This is what the sample data you provided looks like after I imported it using read_table2()
from the readr
package:
library(readr) # 1.3.1
df <- read_table2("date 2020/06/10 2020/06/10 2020/06/10 2020/06/11 2020/06/11 2020/06/11
id x y z x y z
10432 0 0 0 0 0 0
10668 0 0 0 0 0 0
11088 0 0 0 0 0 0 ")
df[, 8] <- NULL
> df
# A tibble: 4 x 7
date `2020/06/10` `2020/06/10_1` `2020/06/10_2` `2020/06/11` `2020/06/11_1` `2020/06/11_2`
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 id x y z x y z
2 10432 0 0 0 0 0 0
3 10668 0 0 0 0 0 0
4 11088 0 0 0 0 0 0
Note that in R you can't have repeating column names so it changes them slightly. We have to resolve this. (You might not have to do these steps exactly, I cannot know precisely what your columns are called without you providing the data with dput()
.)
Now I paste together the current column names with the 'id' row, and then tidy the column names so that we can work with them later on:
names(df) <- c("id", paste(df[1, -1], names(df)[-1], sep = "_"))
library(stringr) # 1.4.0
names(df)[-1] <- str_remove(names(df)[-1], "_[1-9]$") # this gets rid of
# the "_1" or "_2" R added automatically when I imported the data
We now no longer need the first row as its contents have become repeated in the column names.
df <- df[-1,]
> df
# A tibble: 3 x 7
id `x_2020/06/10` `y_2020/06/10` `z_2020/06/10` `x_2020/06/11` `y_2020/06/11` `z_2020/06/11`
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 10432 0 0 0 0 0 0
2 10668 0 0 0 0 0 0
3 11088 0 0 0 0 0 0
Now we can actually work on this df
and go from wide to long format:
library(tidyr) # 1.1.0
df %>% pivot_longer(-id, names_to = c(".value", "date"),
names_pattern = "(.)_(..../../..)")
# A tibble: 6 x 5
id date x y z
<chr> <chr> <chr> <chr> <chr>
1 10432 2020/06/10 0 0 0
2 10432 2020/06/11 0 0 0
3 10668 2020/06/10 0 0 0
4 10668 2020/06/11 0 0 0
5 11088 2020/06/10 0 0 0
6 11088 2020/06/11 0 0 0
So there we have it - a tidying data solution using the readr
, stringr
and tidyr
packages from tidyverse
- a very nice example, I think, of what tidyverse
was precisely developed for! :)
DATA
Should you need it, here is the tidy df
that we have just before we implement pivot_longer()
on it:
structure(list(id = c("10432", "10668", "11088"), `x_2020/06/10` = c("0",
"0", "0"), `y_2020/06/10` = c("0", "0", "0"), `z_2020/06/10` = c("0",
"0", "0"), `x_2020/06/11` = c("0", "0", "0"), `y_2020/06/11` = c("0",
"0", "0"), `z_2020/06/11` = c("0", "0", "0")), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))