gnuplot, non-numeric repeated x values


I have a data set (filename 'data') like this:

a 10.1
b 10.1
c 10.2
b 15.56
a 3.20

and I would like to plot this data as points. When I try:

plot 'data' using 2:xticlabels(1)

I get a plot with 5 x-axis values (a, b, c, b, a), but I would like only 3 (a, b, c; the order is not important), with all 5 y values plotted. Is that possible?

My real data file looks like this:

2-8-16-17-18   962.623408
2-3-4-5-6      -97.527840
2-8-9-10-11    962.623408
2-8-9-10-11    937.101308
2-3-4-5-6       37.101308

and has about a thousand records.


I don't know how to use mgilson's code, but he gave me an idea. I added an extra first column (an index) to the data file:

1 a 10.1 
2 b 10.1 
3 c 10.2 
2 b 15.56 
1 a 3.20

after which plotting in gnuplot is easy:

plot 'data' u 1:3 

I use Perl, so my script looks like this:

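#!/usr/bin/perl
use strict;
use warnings;

my %non_numeric;          # maps each label to its integer index
my $index_number = 0;
while (my $line = <>)
{
   my @columns = split(" ", $line);
   next unless @columns >= 2;   # skip blank or malformed lines
   my ($col1, $col2) = @columns;
   # assign the next index the first time a label is seen
   if( not exists $non_numeric{$col1} )
   {
      $index_number++;
      $non_numeric{$col1} = $index_number;
   }
   print "$non_numeric{$col1}\t$col1\t$col2\n";
}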
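
For readers who prefer Python, the same indexing idea might be sketched as below. This is a rough equivalent rather than the script above: the function names index_labels and xtics_command are made up for illustration, and the generated set xtics line is an optional extra (not part of the original approach) that would put the labels back on the x axis when plotting with u 1:3. It assumes whitespace-separated "label value" lines.

```python
def index_labels(lines):
    """Map each distinct label to 1, 2, 3, ... in order of first appearance."""
    index = {}
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        label, value = fields[0], fields[1]
        if label not in index:
            index[label] = len(index) + 1
        rows.append("%d\t%s\t%s" % (index[label], label, value))
    return index, rows


def xtics_command(index):
    """Build a gnuplot 'set xtics' line that labels each index position."""
    pairs = ", ".join('"%s" %d' % (label, pos) for label, pos in index.items())
    return "set xtics (%s)" % pairs


sample = ["a 10.1", "b 10.1", "c 10.2", "b 15.56", "a 3.20"]
index, rows = index_labels(sample)
print("\n".join(rows))       # the indexed data file
print(xtics_command(index))  # optional line to paste into the gnuplot script
```

The tic order in the generated command follows first appearance on Python 3.7 or newer; gnuplot does not care about the order of the pairs in any case.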

Solution

  • I doubt that you can come up with a gnuplot-only solution. However, this should work as long as you have Python 2.6 or newer installed on your system (the with statement used below needs 2.6, or 2.5 with "from __future__ import with_statement"). It works with your test data.

    import sys
    import collections
    
    data = collections.defaultdict(list)
    keys = []
    
    # build a mapping which maps values to xticlabels (hereafter "keys")
    # Keep a second keys list so we can figure out the order we put things into
    # the mapping (dict)
    with open(sys.argv[1]) as f:
        for line in f:
            key,value = line.split()
            data[key.strip()].append( value )
            keys.append(key.strip())
    
    def unique(seq):
        """
        Simple function to make a sequence unique while preserving order.
        Returns a list
        """
        seen = set()
        seen_add = seen.add
        return [ x for x in seq if x not in seen and not seen_add(x) ]
    
    keys = unique(keys) #make keys unique
    
    #write the keys alongside 1 element from the corresponding list.
    for k in keys:
        sys.stdout.write( '%s %s\n' % (k, data[k].pop()) )
    
    # Two blank lines tells gnuplot the following is another dataset
    sys.stdout.write('\n\n')
    
    # Write the remaining data lists in order assigning x-values
    # for each list (starting at 0 and incrementing every time we get
    # a new key)
    for i,k in enumerate(keys):
        v = data[k]
        for item in v:
            sys.stdout.write( '%d %s\n' % (i, item) )
    

    Now the script to plot this:

    set style line 1 lt 1 pt 1
    plot '<python pythonscript.py data' i 0 u 2:xticlabels(1) ls 1,\
         '' i 1 u 1:2 ls 1 notitle
    

    Here's how this works. When you do something like plot ... u 2:xticlabels(1), gnuplot implicitly assigns sequential integer x-values to the data points (starting at 0). The python script re-arranges the data to make use of this fact.

    Basically, I create a mapping which maps the "keys" in the first column to a list of the values that correspond to that key. In other words, in your dummy datafile, the key 'a' maps to the list of values [10.1, 3.2]. However, python dictionaries (mappings) aren't guaranteed to be ordered (before Python 3.7), so I keep a second list which maintains the order (so that your axes are labelled 'a', 'b', 'c' instead of 'c', 'a', 'b', for instance). I make that keys list unique so that I can use it to print the necessary data.

    I write the data in 2 passes. The first pass prints only one value from each list along with its "key". The second pass prints the rest of the values along with the x-value that gnuplot will implicitly assign to that key. Between the two datasets, I insert 2 blank lines so that gnuplot can tell them apart using the index keyword (here abbreviated to i).

    Now we just need to plot the two datasets accordingly. First we set a linestyle so that both passes will have the same style when plotted. Then we plot index 0 (the first dataset) with the xticlabels, and index 1 using the x-value, y-value pairs the python script calculated (u 1:2). Sorry the explanation is long (and that the original version was slightly buggy). Good luck and happy gnuplotting!
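
To make the two passes concrete, here is a condensed sketch of the same idea applied to the dummy data from the question. It is not the script above verbatim: the function name two_pass is made up for illustration, and it assumes Python 3.7 or newer, where a plain dict keeps insertion order, so the separate keys list is not needed.

```python
def two_pass(lines):
    """Return the two gnuplot datasets as one string (sketch, Python 3.7+)."""
    groups = {}  # label -> list of values; dicts keep insertion order in 3.7+
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, value = line.split()
        groups.setdefault(key, []).append(value)

    out = []
    # Pass 1: one value per label, so xticlabels(1) sees each label once.
    for key, values in groups.items():
        out.append("%s %s" % (key, values.pop()))
    # Two blank lines separate gnuplot datasets (index 0 and index 1).
    out.extend(["", ""])
    # Pass 2: remaining values at the x position gnuplot gave that label.
    for i, values in enumerate(groups.values()):
        for value in values:
            out.append("%d %s" % (i, value))
    return "\n".join(out) + "\n"


sample = ["a 10.1", "b 10.1", "c 10.2", "b 15.56", "a 3.20"]
print(two_pass(sample), end="")
```

For the dummy data, the first dataset comes out as "a 3.20", "b 15.56", "c 10.2" (the last value seen for each label, because pop() takes from the end of the list), and the second as "0 10.1", "1 10.1". The label c contributes nothing to the second dataset because its only value was already used in the first pass.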