R - merge and resulting factor levels


I want to merge two data frames so that each factor in the merged data frame keeps only the levels it actually uses. For example:

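# stringsAsFactors = TRUE is needed on R >= 4.0.0 to get the factor columns shown below
df1 <- data.frame(country=c("AA", "BB"), stringsAsFactors=TRUE)
df2 <- data.frame(country=c("AA", "BB", "CC"), name=c("Country A", "Country B", "Country C"), stringsAsFactors=TRUE)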
df3 <- merge(df1, df2, by="country")

Then:

> df3
  country      name
1      AA Country A
2      BB Country B

which is what I expected.

However, why does the factor 'name' have 3 levels when there are only 2 rows of data?

> str(df3)
'data.frame':   2 obs. of  2 variables:
 $ country: Factor w/ 2 levels "AA","BB": 1 2
 $ name   : Factor w/ 3 levels "Country A","Country B",..: 1 2

How do I get rid of 'Country C' in df3?

> table(df3)
       name
country Country A Country B Country C
     AA         1         0         0
     BB         0         1         0

Solution

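  • merge() keeps the full set of levels from df2$name, so the unused level 'Country C' is carried over into df3 even though no row uses it. You can drop unused levels from every factor in the data frame with droplevels():

    table(droplevels(df3))
    #        name
    # country Country A Country B
    #      AA         1         0
    #      BB         0         1

    Alternatively, rebuild the single factor so that its levels come only from the values still present:

    df3$name <- factor(df3$name)
    table(df3)
    #        name
    # country Country A Country B
    #      AA         1         0
    #      BB         0         1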
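
Both approaches above can be checked end to end with a short script. This is a minimal sketch, not part of the original answer: it assumes the columns are factors (hence the explicit `stringsAsFactors = TRUE`, since R 4.0.0 and later no longer convert strings to factors by default), and `df3_clean` is a name introduced here only for illustration.

```r
# Build the example frames with factor columns.
# stringsAsFactors = TRUE is an assumption made explicit for R >= 4.0.0.
df1 <- data.frame(country = c("AA", "BB"), stringsAsFactors = TRUE)
df2 <- data.frame(country = c("AA", "BB", "CC"),
                  name = c("Country A", "Country B", "Country C"),
                  stringsAsFactors = TRUE)
df3 <- merge(df1, df2, by = "country")

# The merged name column still carries all three levels from df2$name.
levels(df3$name)

# droplevels() on a data frame drops unused levels from every factor column.
df3_clean <- droplevels(df3)   # hypothetical name, used only in this sketch
levels(df3_clean$name)

# factor() re-derives the levels of one column from the values present.
df3$name <- factor(df3$name)
nlevels(df3$name)
```

droplevels() is probably the more convenient choice when a data frame has several factor columns, while factor() touches only the column you name.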