Remove the first column name in a data frame from fread() in R


I am trying to remove the first name from colnames generated via fread(). The first column name acts only as the title of the row names. Later on in the workflow, this "title" really messes up my data since it's treated as one of the rows, so somehow, I need it to be ignored or non-existent.

A subset of my DGE_file looks like this:

            GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
1: 0610009B22Rik                  1                  0
2: 0610009E02Rik                  0                  0

I tried to remove the first column name like this:

library(Matrix)
library("data.table")

# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)

colnames(DGE_file)<-colnames(DGE_file)[-1]
DGE_file<- as.matrix(DGE_file)

which understandably enough yields the error:

> colnames(DGE_file)<-colnames(DGE_file)[-1]
Error in setnames(x, value) : 
  Can't assign 10000 names to a 10001 column data.table

I have already tried to replace it with NA but it yielded an error in downstream processing that I couldn't work around.

How can I remove the title "GENE" or make it "invisible" in downstream processing?


Solution

  • The following should work:

    library(Matrix)
    library("data.table")
    
    # Read in the dge file
    DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)
    # Set the first column name to the empty string.
    names(DGE_file)[1] <- ""
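
If the end goal is a matrix, as the as.matrix() call in the question suggests, another option may be to move the gene names into the row names instead of blanking the header. This is only a minimal sketch: the small table below is a hypothetical stand-in for DGE.txt, and it assumes the first column is named GENE and is read as character (not as a factor). It relies on the rownames argument of as.matrix() for data.table objects, which needs a reasonably recent data.table version.

```r
library(data.table)

# Hypothetical stand-in for fread(file = "DGE.txt"), mirroring the layout
# shown in the question.
DGE_file <- data.table(
  GENE = c("0610009B22Rik", "0610009E02Rik"),
  ATGGCGAACCTACATCCC = c(1L, 0L),
  ATGGCGAGGACTCAAAGT = c(0L, 0L)
)

# Use the GENE column as row names; it is dropped from the matrix body,
# so only the count columns remain as columns.
DGE_mat <- as.matrix(DGE_file, rownames = "GENE")

# The gene names are now row names and the barcodes are the only column names.
print(rownames(DGE_mat))
print(colnames(DGE_mat))
```

With this approach there is no "GENE" header left to be misread downstream, because the matrix holds only the count columns.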