
How to remove the extra commas from a csv file?


I am trying to read a CSV file in R with the read.transactions() function from the arules package.

When opened in Notepad++, the CSV file shows extra commas in place of every missing value, so I have to delete those commas manually before passing the file to read.transactions(). For example, the actual CSV file looks like this in Notepad++:

D115,DX06,Slz,,,,
HC,,,,,,
DX06,,,,,,
DX17,PG,,,,,
DX06,RT,Dty,Dtcr,,

I want it to look like this when it is passed to read.transactions():

D115,DX06,Slz
HC
DX06
DX17,PG
DX06,RT,Dty,Dtcr

Is there a way to make that change in read.transactions() itself, or in some other way? There is a further problem: the extra commas are not visible in R (the output shown above is from Notepad++).

So how can we remove them in R when we can't see them?


Solution

  • A simple way to create a new file without the trailing commas is:

    file_lines <- readLines("input.txt")
    writeLines(gsub(",+$", "", file_lines),
               "without_commas.txt")
    

    In the gsub command, ",+$" matches one or more (+) commas (,) at the end of a line ($).

    Since you're using Notepad++, you could just do the substitution in that program: Search > Replace, replace ,+$ with nothing, Search Mode=Regular Expression.
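As a rough end-to-end sketch of the approach above, the snippet below writes the sample rows from the question to a temporary file, reads them back with readLines() (which returns the raw text, so the trailing commas are visible in R), strips them, and writes a cleaned file. The temporary files and the variable names raw_lines, input_path, clean_lines and clean_path are illustrative assumptions, not part of the original answer. The final step assumes the file should be read in basket format with a comma separator, and it is skipped if arules is not installed.

```r
# Sample rows copied from the question, trailing commas included.
raw_lines <- c(
  "D115,DX06,Slz,,,,",
  "HC,,,,,,",
  "DX06,,,,,,",
  "DX17,PG,,,,,",
  "DX06,RT,Dty,Dtcr,,"
)

# Write them to a temporary file so the example is self-contained
# (hypothetical path; substitute your own CSV file).
input_path <- tempfile(fileext = ".csv")
writeLines(raw_lines, input_path)

# readLines() returns each line as plain text, so the trailing commas show up.
file_lines <- readLines(input_path)
print(file_lines)

# Remove one or more commas at the end of each line.
clean_lines <- gsub(",+$", "", file_lines)
print(clean_lines)

# Save the cleaned lines to a second temporary file.
clean_path <- tempfile(fileext = ".csv")
writeLines(clean_lines, clean_path)

# Optional: load the cleaned file as basket-format transactions.
# Assumes one transaction per line, items separated by commas.
if (requireNamespace("arules", quietly = TRUE)) {
  trans <- arules::read.transactions(clean_path, format = "basket", sep = ",")
  arules::inspect(trans)
}
```

Note that the pattern only removes commas at the end of a line; an empty field in the middle of a row would be left in place.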