r csv leading-zero

Reading only one column with leading 0s and retaining it in a CSV file with over 300 columns


I have a large CSV file, a preview of which is shown here:

ID,NUMBER,RLNUMBER,START_DATE,ID1,ID2,....................................................,ID305
1,0100000109,623,2012-01-01,TT,06,........................................................,ADD
2,200000109,515,2013-09-23,FF,009,........................................................,BCC
3,0600000109,611,2014-11-15,HH,90,..........................................................,DGG

As you can see, the column NUMBER has some values with a leading '0' and some without. The same is true of the column ID2.

My requirement is that I have to move the contents of this CSV file to another CSV file. The contents of the OUTPUT CSV file should look something like this:

ID,NUMBER,RLNUMBER,START_DATE,ID1,ID2,....................................................,ID305
1,0100000109,623,2012-01-01,TT,6,........................................................,ADD
2,200000109,515,2013-09-23,FF,9,........................................................,BCC
3,0600000109,611,2014-11-15,HH,90,..........................................................,DGG

Notice that the values of column NUMBER are retained in the output CSV file along with their leading '0', while all values in column ID2 have had their leading '0' stripped.

For this, I think I need to read only the column NUMBER as type 'character', let every other column be read with its default type, and then write the resulting data frame to the output CSV file.

I know that using

    data_frame <- read.csv("filename", colClasses = c("integer", "character", "integer", ......))

I can specify vector types for each column while reading the input CSV file. But doing this for more than 300 columns is very difficult. So is there any other way to do this?

I'm very new to R (just started today), and any help would be greatly appreciated.


Solution

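  • You could try the following (as far as I understand, you only need to control the type of the NUMBER column). Naming the entry in colClasses sets the type for that column only; the remaining columns are still detected automatically, so ID2 is read as an integer and loses its leading zeros: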

    data_frame <- read.csv("filename", colClasses=c("NUMBER" = "character"))
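To cover the writing step as well, here is a minimal end-to-end sketch. It assumes a shortened seven-column stand-in for the real file, and the temporary file paths and the `output_lines` variable are made up for illustration. `write.csv` with `quote = FALSE` is one possible way to produce the output, not necessarily the only one.

```r
# Build a small stand-in for the real input (hypothetical temp files,
# with only seven of the 300+ columns).
input_path  <- tempfile(fileext = ".csv")
output_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,NUMBER,RLNUMBER,START_DATE,ID1,ID2,ID305",
  "1,0100000109,623,2012-01-01,TT,06,ADD",
  "2,200000109,515,2013-09-23,FF,009,BCC",
  "3,0600000109,611,2014-11-15,HH,90,DGG"
), input_path)

# Only NUMBER is forced to character; all other column types are guessed,
# so ID2 is read as integer and its leading zeros disappear.
data_frame <- read.csv(input_path, colClasses = c(NUMBER = "character"))

# row.names = FALSE avoids an extra index column;
# quote = FALSE writes NUMBER without surrounding double quotes.
write.csv(data_frame, output_path, row.names = FALSE, quote = FALSE)

# Read the result back as plain text to inspect it.
output_lines <- readLines(output_path)
print(output_lines)
```

Note that `quote = FALSE` is only safe if no field contains a comma or a double quote; otherwise leave quoting on and accept quoted strings in the output.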