
How to handle CSV and multi-line labels in gnuplot?


Let's assume I have the following data, exported from LibreOffice as CSV, so I assume it is in valid CSV format. When I import this CSV into LibreOffice again, I correctly see the multi-line text in the cell.

Data: MultilineLabels.csv

1,Simple,1.3
2,Single line,2.3
3,"Multiline
label",3.3
4,Simple again,4.3
5,Multiline\nlabel,5.3
6,Simple again,6.3

However, if I want to plot this with the following gnuplot script:

Script:

### How to handle CSV and multi-line labels in gnuplot?
reset session

FILE = "MultilineLabels.csv"
set datafile separator comma

set format x "\n"

plot FILE u 1:3:xtic(2) w lp pt 7 lc "red"
### end of script

I get the following output:

Result:

[Image: resulting plot]

So the point and label at x=3, i.e. lines 3 and 4 of the CSV, are not plotted, for an obvious reason: gnuplot simply reads the file as plain text, line by line, and has no special CSV input filter, so the quoted field spanning two lines ends up split across two incomplete records.

In principle, I could use some external tool (or maybe even gnuplot itself) to replace every newline inside matching double quotes with the two characters \n.
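A minimal gnuplot-only sketch of that idea, under the assumption that the file content is already available as one string: the hypothetical variable `s` below holds two records inline instead of being read from `MultilineLabels.csv`. It walks through the string character by character, toggles a flag at every double quote, and writes the two characters \n in place of every newline that occurs while the flag is set.

```gnuplot
# Hypothetical sketch: replace newlines inside double quotes by the two characters \n.
# Assumption: the CSV content is already one string s (defined inline here;
# inside double quotes, \" is a quote character and \n is a real newline).
s = "3,\"Multiline\nlabel\",3.3\n4,Simple again,4.3"

inQuote = 0    # 1 between an opening and a closing double quote, else 0
out = ''       # result; a newline inside quotes becomes '\n' (backslash + n)
do for [i=1:strlen(s)] {
    ch = s[i:i]
    if (ch eq '"') { inQuote = !inQuote }
    if (inQuote && (ch eq "\n")) {
        out = out.'\n'
    } else {
        out = out.ch
    }
}
print out
# 3,"Multiline\nlabel",3.3
# 4,Simple again,4.3
```

A CSV-escaped double quote ("") inside a field toggles the flag twice and therefore leaves it unchanged. Looping over single characters in gnuplot is likely to be slow for large files.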

Is this the only solution, or are there better ones?


Solution

  • Parsing a CSV file can certainly get more complicated than in the simple example below. Linux users probably have dedicated tools for this.

    I prefer gnuplot-only solutions (hence platform-independent), although they probably cannot compete with specialized external tools in terms of speed and efficiency.

    Here is a very "simple" but not very robust gnuplot-only solution: if a line contains an odd number of double quotes, it is joined with the following line, with \n in between (definitely room for improvement!). Note that this only handles a quoted field spanning exactly two lines. For this to work, you need to load the data 1:1 into a datablock, and since the script indexes datablock lines, you need gnuplot>=5.2.0.

    Data: SO73704046.csv

    1,Simple,1.3
    2,Single line,2.3
    3,"Multiline
    label",3.3
    4,Simple again,4.3
    5,Multiline\nlabel,5.3
    6,Simple again,6.3
    

    Script: (requires gnuplot>=5.2.0)

    ### How to handle CSV and multi-line labels in gnuplot?
    reset session
    
    FILE = 'SO73704046.csv'
    
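    # load the file 1:1 into a datablock (Windows: type, Linux/MacOS: cat)
    FileToDatablock(f,d) = GPVAL_SYSNAME[1:7] eq "Windows" ? \
                           sprintf('< echo   %s ^<^<EOD  & type "%s"',d,f) : \
                           sprintf('< echo "\%s   <<EOD" ; cat  "%s"',d,f)     # Linux/MacOS (';' runs echo and cat in sequence)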
    load FileToDatablock(FILE,'$DataCSV')
    
    oddDQ(s) = int(sum[j=1:strlen(s)] (s[j:j] eq '"'))%2    # returns 1 if string contains odd number of double quotes, otherwise 0
    
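    # copy $DataCSV to $Data; a line with an odd number of double quotes is joined
    # with the next line: its last character is dropped (presumably a trailing
    # carriage return) and a literal \n (single quotes) is inserted in between
    set print $Data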
        c = 1
        while c<=|$DataCSV| {
            if (oddDQ($DataCSV[c])) {
                s = $DataCSV[c]
                print s[1:strlen(s)-1].'\n'.$DataCSV[c+1]
                c=c+2
            }
            else {
                print $DataCSV[c]
                c=c+1
            }
        }
    set print
    
    set datafile separator comma
    set format x "\n"
    
    plot $Data u 1:3:xtic(2) w lp pt 7 lc "red"
    ### end of script
    

    Result:

    Datablock $Data:

    1,Simple,1.3
    2,Single line,2.3
    3,"Multiline\nlabel",3.3
    4,Simple again,4.3
    5,Multiline\nlabel,5.3
    6,Simple again,6.3
    

    [Image: resulting plot]
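Since the joining rule above only handles a quoted field that spans exactly two lines, here is a hypothetical variant (not part of the original answer, a sketch only): it keeps appending the next line with a literal \n as long as the accumulated record contains an odd number of double quotes, so a label spanning three or more lines is joined as well. It reuses `oddDQ()` from the script, uses a small inline datablock instead of a file, and, unlike the script, does not drop the last character of a line before joining.

```gnuplot
# Hypothetical variant (not from the answer above): join lines until the
# record contains an even number of double quotes, so that a quoted field
# spanning three or more lines is handled as well.
oddDQ(s) = int(sum[j=1:strlen(s)] (s[j:j] eq '"'))%2

# small inline test data instead of a file: one label spans three lines
$DataCSV <<EOD
1,Simple,1.3
2,"Multi
line
label",2.3
3,Simple again,3.3
EOD

# inner loop: while inside an open quoted field, append the next line with a literal \n
set print $Data
c = 1
while (c <= |$DataCSV|) {
    s = $DataCSV[c]
    c = c + 1
    while (oddDQ(s) && c <= |$DataCSV|) {
        s = s.'\n'.$DataCSV[c]
        c = c + 1
    }
    print s
}
set print
print $Data
# 1,Simple,1.3
# 2,"Multi\nline\nlabel",2.3
# 3,Simple again,3.3
```

The inner `while` also stops at the end of the datablock, so an unbalanced quote in the last line does not index past `|$DataCSV|`.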