Search code examples
rduplicatesaggregate

I'm trying to create a new column from an existing column and then delete any duplicate IDS


I'm trying to create a new column from an existing column and then delete any duplicate IDS. My df looks like this: This is what my dataframe currently looks like, it's three columns of unique ID, year, and annual income. The annual income column has annual income for years starting in 2006 and ending in 2023 all in one column, which has created many duplicates of the IDs. However I only want unique/distinct IDs.

dataframe2 <- data.frame(ID = c("1", "2", "3", "4", "4", "5", "5", "5" "6", "7"), 
                        year = c("2006", "2006", "2007", "2006", "2007", "2006", "2007", 
                        "2008",
"2010", 2014"),
                        Annual_Income = c("23389", "607002", "45000", "80000", '67000", 
                        "140000", "23389", "607002", "45000", "80000"))

I've tried a few different options, mainly creating seperate tables per year and  merging them back together, but that seems very messy and unnecessary. My main concern is keeping all the income data per year and setting it up so that its all in one row and then having no duplicate IDs.

I want the df to look like the below:

<pre>  ID. 2006 2007 2008 2009 2010 2014

#1 1 23389 NA NA NA NA NA <br/>
#2 2 607002 NA NA NA NA NA <br/>
#3 3 NA 45000 NA NA NA NA <br/>
#4 4 80000 67000 NA NA NA NA<br/>
#5 5 140000 23389 607002 NA NA NA <br/>
#6 6 NA NA NA NA 45000 NA <br/>
#7 7 NA NA NA NA NA 80000 <code>

I have read a few of the similar posts on here and none quite answer my question (i'm new to R studio having used SAS and Stata in the past). Thanks in advance


Solution

  • A tidyverse solution:

    library(tidyr)
    pivot_wider(dataframe2, id_cols = ID, names_from = year, values_from = Annual_Income)