duplicate 'row.names' are not allowed - R

I am new to R and am trying to run a differential gene expression analysis. I want to store the gene names as row names so that I can create a DGEList object.

asthma <- read.csv("Asthma_3 groups-Our study gene expression.csv")
head(asthma, 10)

asthma <- na.omit(asthma)

countdata <- asthma[,-1]

rownames(countdata) <- asthma[,1]

I am getting this error:

Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed


  • The first column in asthma likely contains duplicate values. Two options I can think of:

    1. Can the first column be combined with another column to generate a new column with unique values that can be used as the rownames?
    2. If not, you can probably use make.names() with unique = TRUE to make the values unique.
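
Option 1 could be sketched roughly as follows, assuming the data has a second identifier column that distinguishes the repeated values. The column names `gene` and `probe` and all values here are hypothetical and are not taken from the asker's file.

```r
# Hypothetical data: 'gene' repeats, but 'gene' + 'probe' together are unique
df <- data.frame(gene  = c("A", "A", "B"),
                 probe = c("p1", "p2", "p1"),
                 count = c(1, 2, 3))

# Combine the two identifier columns into a single key
key <- paste(df$gene, df$probe, sep = "_")

# anyDuplicated() returns 0 when every value is unique
stopifnot(anyDuplicated(key) == 0)

# Keep only the numeric column and use the combined key as row names
countdata <- df[, "count", drop = FALSE]
rownames(countdata) <- key
```

If the combined key still contains duplicates, the `stopifnot()` check fails before the row names are assigned, which makes the problem visible early.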

    Here is a reproducible example.

    df = data.frame(col1 = c('A', 'A', 'B'), col2 = c(1, 2, 3))

    That defines a data.frame that looks like this

      col1 col2
    1    A    1
    2    A    2
    3    B    3

    The data.frame by default has rownames 1, 2, 3. If you try this

    rownames(df) = df[,1] 

    you get an error because df[,1] contains 'A' twice, and row names must be unique. You can use make.names to create unique row names like this

    unique.col1 = make.names(df[,1], unique=TRUE)

    This results in

    "A"   "A.1" "B"  

    Note that the .1 was added to the second A to make it different from the first A. Then define the rownames as unique.col1:

    rownames(df) = unique.col1

    The data.frame df now looks like this

        col1 col2
    A      A    1
    A.1    A    2
    B      B    3
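
Before renaming anything, it may help to see which values are actually repeated. It is also worth knowing that make.names() rewrites characters that are not valid in R variable names (a hyphen becomes a dot, for example), whereas make.unique() only appends a suffix. The sketch below uses hypothetical gene identifiers to show the difference; it is an illustration, not the asker's data.

```r
# Hypothetical gene IDs: one contains a hyphen and one is repeated
ids <- c("HLA-A", "HLA-A", "TP53")

# Which values occur more than once?
dup_ids <- unique(ids[duplicated(ids)])

# make.names() also replaces characters that are invalid in R names
via_make_names <- make.names(ids, unique = TRUE)

# make.unique() keeps the original spelling and only appends .1, .2, ...
via_make_unique <- make.unique(ids)

print(dup_ids)          # "HLA-A"
print(via_make_names)   # "HLA.A"   "HLA.A.1" "TP53"
print(via_make_unique)  # "HLA-A"   "HLA-A.1" "TP53"
```

Either function leaves the duplicated rows in the table as separate entries. Whether they should stay separate or be merged is an analysis decision that the renaming does not make for you.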