r csv data-cleaning

Extracting Column data from .csv and turning every 10 consecutive rows into corresponding columns


Below is the code I am trying to implement. I want to extract each run of 10 consecutive row values and turn it into its own column.

This is what the data looks like: https://drive.google.com/file/d/0B7huoyuu0wrfeUs4d2p0eGpZSFU/view?usp=sharing

I have been trying, but temp1 and temp2 come out empty. Please help.

library(Hmisc)     #for increment function

myData <- read.csv("Clothing_&_Accessories.csv",header=FALSE,sep=",",fill=TRUE) # reading the csv file

extract<-myData$V2 # extracting the desired column

x<-1    
y<-1

temp1 <- NULL       #initialisation    
temp2 <- NULL       #initialisation    
data.sorted <- NULL #initialisation

limit<-nrow(myData)  # Calculating no of rows

while (x! = limit) {    
  count <- 1    
    for (count in 11) {    
      if (count > 10) {    
         inc(x) <- 1    
         break    # gets out of for loop
      }    
      else {    
         temp1[y]<-data_mat[x]  # extracting by every row element    
      }
      inc(x) <- 1  # increment x
     inc(y) <- 1  # increment y                    
   }
   temp2<-temp1
   data.sorted<-rbind(data.sorted,temp2)  # turn rows into columns 
}

Solution

  • Your code is more complex than it needs to be. You can do this with a single for loop and no external packages, like this:

    myData <- as.data.frame(matrix(c(rep("a", 10), "", rep("b", 10)), ncol=1), stringsAsFactors = FALSE)
    
    newData <- data.frame(row.names=1:10)
    for (i in 1:((nrow(myData)+1)/11)) {
      start <- 11*i - 10
      newData[[paste0("col", i)]] <- myData$V1[start:(start+9)]
    }
    

    You don't actually need all this, though. You can simply remove the empty lines, split the vector into chunks of size 10 (as explained here), and then turn the resulting list into a data frame.

    vec <- myData$V1[nchar(myData$V1)>0]
    
    as.data.frame(split(vec, ceiling(seq_along(vec)/10)))
    
    #    X1 X2
    # 1   a  b
    # 2   a  b
    # 3   a  b
    # 4   a  b
    # 5   a  b
    # 6   a  b
    # 7   a  b
    # 8   a  b
    # 9   a  b
    # 10  a  b
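
One caveat with the split() approach: as.data.frame() needs every chunk to have the same length, so it would fail if the last group had fewer than 10 values. As a rough sketch of one way around that, the helper below (chunk_to_columns is a hypothetical name, not part of the original answer or of any package) drops blank entries, pads the tail with NA, and reshapes with matrix(). The toy input with an incomplete second group is also an assumption made for illustration.

```r
# Hypothetical helper (not from the original answer): reshape a vector into
# columns of `size` values each, after dropping blank separator entries.
chunk_to_columns <- function(x, size = 10) {
  x <- as.character(x)
  x <- x[!is.na(x) & nzchar(x)]        # drop NA and empty-string entries
  n_cols <- ceiling(length(x) / size)  # number of output columns needed
  length(x) <- n_cols * size           # pad the tail with NA if it is short
  # matrix() fills column by column, so each run of `size` values is a column
  out <- as.data.frame(matrix(x, nrow = size), stringsAsFactors = FALSE)
  names(out) <- paste0("col", seq_len(n_cols))
  out
}

# Toy input (assumed): 10 "a" values, a blank line, then only 7 "b" values
vec <- c(rep("a", 10), "", rep("b", 7))
res <- chunk_to_columns(vec)
print(res)
```

For the file in the question, the same call would presumably be chunk_to_columns(myData$V2), assuming V2 is the column of interest and blank rows separate the groups.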