Search code examples
rdataframedplyrdense-rank

Issues with "group_by" and "mutate"


I have the following dataframe:

df = structure(list(Day = c(19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 
21L, 21L), Month = c(9, 9, 9, 9, 9, 9, 9, 9, 9, 9), Year = c(2004, 
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004), Date = c("2004-09-19", 
"2004-09-20", "2004-09-20", "2004-09-20", "2004-09-20", "2004-09-21", 
"2004-09-21", "2004-09-21", "2004-09-21", "2004-09-21"), Outlet = c("Le Monde", 
"Financial Times", "Corriere della Sera", "Frankfurter Allgemeine Zeitung", 
"El Pais", "Financial Times", "La Tribune", "Financial Times", 
"Borsen-Zeitung", "Borsen-Zeitung"), Country = c("France", "International", 
"Italy", "Germany", "Spain", "Germany", "France", "Germany", 
"Germany", "Germany")), row.names = c("text1", "text2", "text3", 
"text4", "text5", "text6", "text7", "text8", "text9", "text10"
), class = "data.frame")

       Day Month Year       Date                         Outlet       Country
text1   19     9 2004 2004-09-19                       Le Monde        France
text2   20     9 2004 2004-09-20                Financial Times International
text3   20     9 2004 2004-09-20            Corriere della Sera         Italy
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany
text5   20     9 2004 2004-09-20                        El Pais         Spain
text6   21     9 2004 2004-09-21                Financial Times       Germany
text7   21     9 2004 2004-09-21                     La Tribune        France
text8   21     9 2004 2004-09-21                Financial Times       Germany
text9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany
text10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany


I want to have an idex for every row that has the same Date as below:


       Day Month Year       Date                         Outlet       Country   ID
text1   19     9 2004 2004-09-19                       Le Monde        France    1
text2   20     9 2004 2004-09-20                Financial Times International    2
text3   20     9 2004 2004-09-20            Corriere della Sera         Italy    2
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany    2
text5   20     9 2004 2004-09-20                        El Pais         Spain    2
text6   21     9 2004 2004-09-21                Financial Times       Germany    3
text7   21     9 2004 2004-09-21                     La Tribune        France    3
text8   21     9 2004 2004-09-21                Financial Times       Germany    3
text9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany    3
text10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany    3

To do it, I do:

library(dplyr)

df %>% group_by(Date) %>% mutate(id = row_number())

      Day Month  Year Date       Outlet                         Country          id
       
 1    19     9  2004 2004-09-19 Le Monde                       France            1
 2    20     9  2004 2004-09-20 Financial Times                International     1
 3    20     9  2004 2004-09-20 Corriere della Sera            Italy             2
 4    20     9  2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany           3
...

Yet, it doesn't work. I don't manage to understand why.

Can anyone help me solve this?

Thanks a lot!


Solution

  • We can use cur_group_id after grouping by 'Date'

    library(dplyr)
    df %>%
        group_by(Date) %>% 
        mutate(ID = cur_group_id()) %>%
        ungroup
    

    Or another option is match without grouping

    df %>%
       mutate(ID = match(Date, unique(Date)))
    

    Or using base R

    df$ID <- with(df, match(Date, unique(Date)))