Fill = T won't work with single letters (?) [R]


I'm using 'fill = T' on a file that has single letters separated by commas:

    Pred
1   T,T
2   NA
3   D
4   NA
5   NA
6   T
7   P,B
8   NA
9   NA  

using the command:

sift <- read.table("/home/pred.txt", header=F, fill=TRUE, sep=',', stringsAsFactors=F)

I was hoping sift would turn out as:

    V1 V2
1    T  T
2 <NA>    
3    D    
4 <NA>   
5 <NA>   
6    T   
7    P  B
8 <NA>   
9 <NA>

However, it comes out like:

    V1 
1    T 
2 <NA>    
3    D    
4 <NA>   
5 <NA>   
6    T   
7    P 
8 <NA>   
9 <NA> 

This code works when there are multiple sampleIDs (separated by a comma) in each row, but not for single letters. Does 'fill' work for single letters? Stupid question, I know.


Solution

  • So here is a workaround:

    url  <- "https://dl.dropboxusercontent.com/s/bjb241s16t63ev8/pred.txt?dl=1&token_hash=AAEBzfCGgoeHgNTvhMSVoZK6qRGrdwwuDZB3h8lWTZNtkA"
    df.1 <- read.table(url,header=F,sep=",",fill=T,stringsAsFactors=F)
    dim(df.1)
    # [1] 149792      1     <-- 149,792 rows and ** 1 ** column
    
    df.2 <- read.table(url,header=F,sep=",",fill=T,stringsAsFactors=F, 
                       col.names=c("V1","V2"))
    dim(df.2)
    # [1] 149633      2     <-- 149,633 rows and ** 2 ** columns
    
    head(df.2[which(nchar(df.2$V2)>0),])
    #      V1 V2
    # 1000  T  T
    # 2419  T  T
    # 3507  T  T
    # 3766  T  D
    # 4308  T  D
    # 4545  T  D
    

    read.table(...) creates a data frame whose number of columns is determined by the first 5 lines of input. Since the first 5 lines in your file have only 1 field each, that's what you get. Evidently, with sep="," and fill=T, read.table(...) then adds the "extra" fields from longer lines as extra rows, which is why df.1 has more rows than df.2.

    The workaround explicitly sets the number of columns by specifying column names. The names can be anything, as long as there are as many of them as the widest line has fields (here, length(col.names) == 2).
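
If the maximum number of fields is not known in advance, one option is to count it first with count.fields(). The following is only a minimal sketch: the sample lines and the temporary file are made up for illustration (they are not the original pred.txt), and colClasses = "character" is an added assumption so that a column holding only "T" is not converted to logical.

```r
# Hypothetical sample: the first 5 lines have one field, later lines have two.
lines <- c("T", "NA", "D", "NA", "NA", "T,T", "P,B", "NA")

# Write the sample to a temporary file so both functions read a real file.
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Default behaviour: the column count is taken from the first 5 lines only.
wrapped <- read.table(tmp, header = FALSE, sep = ",", fill = TRUE,
                      stringsAsFactors = FALSE)

# count.fields() returns the number of fields on each line;
# the maximum is the number of columns the data frame needs.
n_cols <- max(count.fields(tmp, sep = ","))

# Pass that many column names so read.table() allocates enough columns.
sift <- read.table(tmp, header = FALSE, sep = ",", fill = TRUE,
                   stringsAsFactors = FALSE,
                   colClasses = "character",
                   col.names = paste0("V", seq_len(n_cols)))

dim(wrapped)  # a single column, with the overflow on extra rows
dim(sift)     # one row per input line, n_cols columns
```

This reads the file twice, which is usually acceptable for a file of this size; hard-coding col.names as in the workaround above avoids the extra pass when the width is already known.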