Search code examples
csvbatch-filecommandjrepl

remove thousand separator of number in CSV (batch command, using JREPL)


I have the following csv where I have to replace the thousand comma separator with nothing. In example below, when I have the amount "1,000.00" I should have 1000.00 (no comma, no quotes) instead.

I use JREPL to remove header from my csv

jrepl "(?:.*\n){1,1}([\s\S]*)" "$1" /m /f "csv/Transactions.csv" /o "csv/Transactionsfeed.csv")

I was wondering if I could do the process of removing header + dealing with the thousand comma in one step. I am also opened to the option of doing it with another command in a second step...

Tnx ID,Trace ID - Gateway,Profile,Customer PIN,Customer,Ext. ID,Identifier,Amount,Chrg,Curr,Processor,Type,Status,Created By,Date Created,RejectReason
1102845,3962708,SL,John,Mohammad Alo,NA,455015*****9998,900.00,900.00,$,Un,Credit Card,Rejected,Internet,2016-05-16 06:54:10,"-330: Fail by bank, try again later(refer to acquirer)"
1102844,3962707,SL,John,Mohammad Alo,NA,455015*****9998,"1,000.00","1,000.00",$,Un,Credit Card,Rejected,Internet,2016-05-16 06:52:26,"-330: Fail by bank, try again later(refer to acquirer)"

Solution

  • Yes, there is a very efficient and fairly compact and straight-forward solution:

    jrepl "\q(\d{1,3}(?:,\d{3})*(?:\.\d*)*)\q" "$1.replace(/,/g,'')" /x /j /jendln "if (ln==1) $txt=false" /f "csv/Transactions.csv" /o "csv/Transactionsfeed.csv"
    

    The /JENDLN JScript expression strips the header line by setting $txt to false if it is the first line.

    The search string matches any quoted number that contains commas as thousand separators, and $1 is the number without the quotes.

    The replace string is a JScript expression that replaces all commas in the matching $1 number with nothing.

    EDIT

    Note that the above will likely work with any CSV that you are likely to have. However, it would fail if you have a quoted field that contains a quoted number string literal. Something like the following would yield a corrupted CSV with the code above:

    ...,"some text ""123,456.78"" more text",...
    

    This issue can be fixed with a bit more regex code. You only want to modify a quoted number if the opening quote is preceded by a comma or the beginning of the line, and the closing quote should be followed by a comma or the end of line.

    A look-ahead assertion can be used for the trailing comma/EOL. But JREPL does not support look-behind. So the leading comma/BOL must be captured and preserved in the replacement

    jrepl "(^|,)\q(\d{1,3}(?:,\d{3})*(?:\.\d*)*)\q(?=$|,)" "$1+$2.replace(/,/g,'')" /x /j /jendln "if (ln==1) $txt=false" /f "csv/Transactions.csv" /o "csv/Transactionsfeed.csv"
    

    EDIT in response to changing requirement in comment

    The following will simply remove all quotes and commas from quoted CSV fields. I don't like this concept, and I suspect there is a much better way to handle this for import into mysql, but this is what the OP is asking for.

    jrepl "(^|,)(\q(?:[^\q]|\q\q)*\q)(?=$|,)" "$1+$2.replace(/,|\x22/g,'')" /x /j /jendln "if (ln==1) $txt=false" /f "csv/Transactions.csv" /o "csv/Transactionsfeed.csv"