Search code examples
rdplyrpivot

How can I aggregate the data on the same line


After a pivot_wider(), I get the following data.frame.
How can I aggregate the data on the same line?

  CODE_C   CODE_P LIB_COMPOSANT LIB_PRODUIT `2020-01-01` `2020-02-01` `2020-03-01` `2020-04-01` `2020-05-01` `2020-06-01`
  <chr>    <chr>  <chr>         <chr>              <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1 FABR**01 FABR** Abricot, 82   Abricot             1.32        NA           NA           NA           NA           NA   
2 FABR**01 FABR** Abricot, 82   Abricot            NA            1.10        NA           NA           NA           NA   
3 FABR**01 FABR** Abricot, 82   Abricot            NA           NA            3.33        NA           NA           NA   
4 FABR**01 FABR** Abricot, 82   Abricot            NA           NA           NA            4.71        NA           NA   
5 FABR**01 FABR** Abricot, 82   Abricot            NA           NA           NA           NA            4.38        NA   
6 FABR**01 FABR** Abricot, 82   Abricot            NA           NA           NA           NA           NA            3.25

To reproduce the data frame :

structure(list(CODE_C = c("FABR**01", "FABR**01", "FABR**01", 
"FABR**01", "FABR**01", "FABR**01"), CODE_P = c("FABR**", "FABR**", 
"FABR**", "FABR**", "FABR**", "FABR**"), LIB_COMPOSANT = c("Abricot, 82", 
"Abricot, 82", "Abricot, 82", "Abricot, 82", "Abricot, 82", "Abricot, 82"
), LIB_PRODUIT = c("Abricot", "Abricot", "Abricot", "Abricot", 
"Abricot", "Abricot"), `2020-01-01` = c(1.32446153846154, NA, 
NA, NA, NA, NA), `2020-02-01` = c(NA, 1.09984615384615, NA, NA, 
NA, NA), `2020-03-01` = c(NA, NA, 3.33157894736842, NA, NA, NA
), `2020-04-01` = c(NA, NA, NA, 4.70916279069767, NA, NA), `2020-05-01` = c(NA, 
NA, NA, NA, 4.37848648648649, NA), `2020-06-01` = c(NA, NA, NA, 
NA, NA, 3.24713953488372)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

The initial pivot_wider() looks like this :

pivot_wider(DONNEES_COMPOSANT,
            names_from = date,
            values_from = PRIX)

Expected output :

 CODE_C   CODE_P LIB_COMPOSANT LIB_PRODUIT `2020-01-01` `2020-02-01` `2020-03-01` `2020-04-01` `2020-05-01` `2020-06-01`
  <chr>    <chr>  <chr>         <chr>              <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1 FABR**01 FABR** Abricot, 82   Abricot             1.32        1.10           3.33           4.71           4.38           3.25   


Solution

  • I'll infer that your input data is

    DONNEES_COMPOSANT <- structure(list(CODE_C = c("FABR**01", "FABR**01", "FABR**01", "FABR**01", "FABR**01", "FABR**01"), CODE_P = c("FABR**", "FABR**", "FABR**", "FABR**", "FABR**", "FABR**"), LIB_COMPOSANT = c("Abricot, 82", "Abricot, 82", "Abricot, 82", "Abricot, 82", "Abricot, 82", "Abricot, 82"), LIB_PRODUIT = c("Abricot", "Abricot", "Abricot", "Abricot", "Abricot", "Abricot"), date = c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01"), PRIX = c(1.32446153846154, 1.09984615384615,  3.33157894736842, 4.70916279069767, 4.37848648648649, 3.24713953488372)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
    DONNEES_COMPOSANT
    # # A tibble: 6 × 6
    #   CODE_C   CODE_P LIB_COMPOSANT LIB_PRODUIT date        PRIX
    #   <chr>    <chr>  <chr>         <chr>       <chr>      <dbl>
    # 1 FABR**01 FABR** Abricot, 82   Abricot     2020-01-01  1.32
    # 2 FABR**01 FABR** Abricot, 82   Abricot     2020-02-01  1.10
    # 3 FABR**01 FABR** Abricot, 82   Abricot     2020-03-01  3.33
    # 4 FABR**01 FABR** Abricot, 82   Abricot     2020-04-01  4.71
    # 5 FABR**01 FABR** Abricot, 82   Abricot     2020-05-01  4.38
    # 6 FABR**01 FABR** Abricot, 82   Abricot     2020-06-01  3.25
    

    To get what you want, we need to specify your first four columns as id_cols:

    pivot_wider(DONNEES_COMPOSANT, id_cols = c(CODE_C, CODE_P, LIB_COMPOSANT, LIB_PRODUIT), names_from = "date", values_from = "PRIX")
    # # A tibble: 1 × 10
    #   CODE_C   CODE_P LIB_COMPOSANT LIB_PRODUIT `2020-01-01` `2020-02-01` `2020-03-01` `2020-04-01` `2020-05-01` `2020-06-01`
    #   <chr>    <chr>  <chr>         <chr>              <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
    # 1 FABR**01 FABR** Abricot, 82   Abricot             1.32         1.10         3.33         4.71         4.38         3.25