I am new to R and am trying to run a differential gene expression analysis. I want to store the gene names as row names so that I can create a DGEList object.
asthma <- read.csv("Asthma_3 groups-Our study gene expression.csv")
head(asthma, 10)
dim(asthma)
asthma <- na.omit(asthma)
distinct(asthma)
countdata <- asthma[,-1]
head(countdata)
rownames(countdata) <- asthma[,1]
I am getting this error:
Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed
The first column in asthma likely has duplicate values. One option is to make the names unique with make.names(). Here is a reproducible example.
df = data.frame(col1 = c('A', 'A', 'B'), col2 = c(1, 2, 3))
df
That defines a data.frame that looks like this:
  col1 col2
1    A    1
2    A    2
3    B    3
The data.frame by default has rownames 1, 2, 3. If you try this
rownames(df) = df[,1]
you get an error because df[,1] has 'A' twice, so it can't be used as row names without modification. You can use make.names to create row names with unique values like this:
unique.col1 = make.names(df[,1], unique = TRUE)
unique.col1
This results in
[1] "A"   "A.1" "B"
Note that the .1 was added to the second A to make it different from the first A. Then define the row names as unique.col1:
rownames(df) = unique.col1
df
The data.frame df now looks like this:
    col1 col2
A      A    1
A.1    A    2
B      B    3
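Before renaming anything, it may help to see which names are duplicated. The sketch below uses a toy data.frame (the gene names "HLA-A" and "TP53" and the column names gene and count are made up for illustration, they are not from your file). It also shows two things worth knowing: make.names() replaces characters that are not valid in R names, such as the hyphen, while make.unique() only appends suffixes; and, as a possible alternative for count data, rowsum() can add up the rows that share a name. Whether summing duplicates is appropriate for your data is an assumption you would need to check.

```r
# Toy data; gene names and column names are hypothetical
df <- data.frame(gene = c("HLA-A", "HLA-A", "TP53"),
                 count = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Names that occur more than once in the gene column
dup_genes <- unique(df$gene[duplicated(df$gene)])

# make.names() makes names syntactically valid AND unique,
# so the hyphen is replaced by a dot
names_syntactic <- make.names(df$gene, unique = TRUE)

# make.unique() leaves the original characters alone and only adds suffixes
names_unique <- make.unique(df$gene)

# Alternative: sum the counts of rows sharing a gene name (one row per gene);
# the result uses the gene names as row names
collapsed <- rowsum(df["count"], group = df$gene)
```

With real gene symbols, make.unique() may be the safer choice if you want the original spelling kept in the row names.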