Tags: regex, perl, multiline, regex-negation

Regex to keep only first N columns in every line of CSV file


I'm using Perl to process a CSV file.

How can I drop everything on every line from the fifth comma onward, while keeping the newline character?

E.g. "a,b,c,d,e,f,g,h,i,\n" would become "a,b,c,d,e\n".

$entire_csv_file_contents =~ s/what do I write here?//gm;

Because the data will not contain quoted fields or similar complications, there is no need to use Text::CSV here.


Solution

For example:

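    $entire_this_is_not_csv_file_contents =~ s/^((?:[^,\n]+,){4}[^,\n]+).*/$1/gm;

    The \n in each character class matters: [^,] on its own also matches newlines, so on a multi-line string a match can run past the end of a short line and delete data from the line after it.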
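A small self-contained script can show why the character class should exclude \n. The sample data and the variable names `$contents`, `$trimmed` and `$naive` are made up for illustration; the script runs the capture-based substitution once with `[^,\n]` and once with plain `[^,]`, using the copy-then-substitute idiom so it also works on Perl 5.8.

```perl
use strict;
use warnings;

# Made-up sample: a long line, a line with exactly five fields, a short line.
my $contents = "a,b,c,d,e,f,g,h,i,\n"
             . "1,2,3,4,5\n"
             . "x,y\n";

# Copy first, then substitute (no /r needed, so this works on Perl 5.8).
# /m lets ^ match at the start of every line; excluding \n from the
# character classes keeps each match inside a single line.
(my $trimmed = $contents) =~ s/^((?:[^,\n]+,){4}[^,\n]+).*/$1/gm;

# Same idea with plain [^,]: the fifth field of line 2 swallows the
# newline and the "x" of line 3, so ",y" is wrongly deleted.
(my $naive = $contents) =~ s/^(([^,]+,){4}[^,]+).*/$1/gm;

print $trimmed;    # prints "a,b,c,d,e\n1,2,3,4,5\nx,y\n"
print $naive;      # prints "a,b,c,d,e\n1,2,3,4,5\nx\n"
```

Lines with five or fewer fields do not match the pattern at all (or match with an empty `.*`), so they come through unchanged.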
    

    If you don't need Perl 5.8.x compatibility, you can use the \K escape (available since Perl 5.10). Everything matched before \K is left out of the replacement, so no capturing is necessary (thanks to amon for the suggestion):

    $entire_this_is_not_csv_file_contents =~ s/^(?:[^,\n]+,){4}[^,\n]+\K.*//gm;
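A minimal runnable sketch of the \K variant, assuming Perl 5.10 or later. The sample string and the names `$contents` and `$trimmed` are illustrative only; the character classes exclude \n so a match stays on one line.

```perl
use strict;
use warnings;
use 5.010;    # \K was added in Perl 5.10

# Made-up sample data with two lines longer than five fields.
my $contents = "a,b,c,d,e,f,g,h,i,\n1,2,3,4,5,6\n";

# The first five fields are matched before \K and therefore kept;
# only ".*" (the rest of the line from the fifth comma on) is
# replaced with the empty string.
(my $trimmed = $contents) =~ s/^(?:[^,\n]+,){4}[^,\n]+\K.*//gm;

print $trimmed;    # prints "a,b,c,d,e\n1,2,3,4,5\n"
```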
    

    Also, if fields may be empty (as in "a,,c"), replace each "+" with "*". With "+", a line that has an empty field among its first five does not match at all and is left untouched (also thanks to amon).
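The effect of switching "+" to "*" can be checked with a short script. The input lines are invented for illustration; the capture-based form is used here, with \n excluded from the character classes as a precaution against matches spilling across lines.

```perl
use strict;
use warnings;

# Made-up lines containing empty fields.
my $contents = "a,,c,,e,f\n,,,,,x\n";

# [^,\n]* also matches an empty field, so both lines are trimmed.
(my $with_star = $contents) =~ s/^((?:[^,\n]*,){4}[^,\n]*).*/$1/gm;

# [^,\n]+ cannot match an empty field, so neither line matches and
# the input comes back unchanged.
(my $with_plus = $contents) =~ s/^((?:[^,\n]+,){4}[^,\n]+).*/$1/gm;

print $with_star;    # prints "a,,c,,e\n,,,,\n"
print $with_plus;    # prints "a,,c,,e,f\n,,,,,x\n"
```

The second sample line has five empty leading fields, so keeping them leaves just the four separating commas.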