I am currently analysing two-character combinations in texts and I want to visualize their frequencies in a heatmap using gnuplot. My input file has the following format (COUNT stands for the actual number of occurrences of that combination):
a a COUNT
a b COUNT
...
z y COUNT
z z COUNT
Now I'd like to create a heatmap (like the first one shown on this site). On both the x axis and the y axis I'd like to display the characters a to z, i.e.:
a
b
...
z
a b ... z
I am pretty new to gnuplot, so I tried plot "input.dat" using 2:1:3 with images, which results in the error message "Can't plot with an empty x range". My naive approach of running set xrange['a':'z'] did not help either.
There are several related questions on SO, but they either deal with numeric x values (e.g. Heatmap with Gnuplot on a non-uniform grid) or with different input data formats (e.g. gnuplot: label x and y-axis of matrix (heatmap) with row and column names).
So my question is: What is the easiest way to transform my input file into a nice gnuplot heatmap?
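You need to convert the letters to integers. The image style needs numeric x and y coordinates, and letters are not valid numbers, so no usable points remain (hence the "empty x range" error). It might be possible to do the conversion inside gnuplot, but it would probably be messy.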
My solution would be to use a quick Python script to convert the data file (let's say it is called data.dat):
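#!/usr/bin/env python3
# Convert "a b COUNT" lines into "0 1 COUNT" so gnuplot gets numeric coordinates.
with open('data.dat', 'r') as i:
    with open('data2.dat', 'w') as o:
        for line in i:
            line = line.split()
            if len(line) < 3:
                continue  # skip blank or malformed lines
            # map 'a'..'z' to 0..25 (first char -> x, second char -> y)
            x = str(ord(line[0].lower()) - ord('a'))
            y = str(ord(line[1].lower()) - ord('a'))
            o.write("%s %s %s\n" % (x, y, line[2]))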
This takes a file like this:
a a 1
a b 2
a c 3
b a 4
b b 5
b c 6
c a 7
c b 8
c c 9
and converts it to:
0 0 1
0 1 2
0 2 3
1 0 4
1 1 5
1 2 6
2 0 7
2 1 8
2 2 9
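One caveat, as far as I know: with image expects a complete regular grid. If some letter pairs never occur in your texts, the input file may simply omit them, which leaves holes that gnuplot cannot treat as an image. Below is a minimal sketch (my own addition, not the script above) that writes all 26x26 cells and uses 0 for any pair missing from the input. The script name fill_grid.py and its command-line arguments are hypothetical; it assumes the same "a b COUNT" input format and defaults to the data.dat/data2.dat names used here.

```python
#!/usr/bin/env python3
# Sketch: convert "a b COUNT" lines into a complete 26x26 grid of "x y COUNT"
# lines, writing 0 for any letter pair that does not appear in the input.
import os
import string
import sys

LETTERS = string.ascii_lowercase  # 'a'..'z' -> indices 0..25


def full_grid(lines):
    """Return 'x y count' strings for all 26*26 pairs (missing pairs get '0')."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[(fields[0].lower(), fields[1].lower())] = fields[2]
    grid = []
    for xi, first in enumerate(LETTERS):       # x = first character
        for yi, second in enumerate(LETTERS):  # y = second character
            grid.append("%d %d %s" % (xi, yi, counts.get((first, second), "0")))
    return grid


if __name__ == "__main__":
    # Hypothetical CLI: fill_grid.py [input] [output], defaulting to the names above.
    src = sys.argv[1] if len(sys.argv) > 1 else "data.dat"
    dst = sys.argv[2] if len(sys.argv) > 2 else "data2.dat"
    if os.path.exists(src):
        with open(src) as i, open(dst, "w") as o:
            o.write("\n".join(full_grid(i)) + "\n")
    else:
        print("input file %r not found" % src, file=sys.stderr)
```

The output keeps the same ordering as the example above (first character in the outer loop), so it can be plotted with the same gnuplot script.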
Then you can plot it in gnuplot:
#!/usr/bin/env gnuplot
set terminal pngcairo
set output 'test.png'
set xtics ("a" 0, "b" 1, "c" 2)
set ytics ("a" 0, "b" 1, "c" 2)
set xlabel 'First Character'
set ylabel 'Second Character'
set title 'Character Combination Counts'
plot 'data2.dat' with image
It's a little clunky to set the tics manually that way, but it works fine.
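With all 26 letters, typing the tic labels by hand gets tedious. A small helper like the hypothetical one below can print the two set commands in exactly the form used above; you could save its output (for example as tics.gp, a name I made up) and replace the manual set xtics/set ytics lines with load 'tics.gp'.

```python
#!/usr/bin/env python3
# Sketch: generate gnuplot tic commands such as
#   set xtics ("a" 0, "b" 1, ..., "z" 25)
# instead of typing all 26 labels by hand.
import string


def tics_command(axis, letters=string.ascii_lowercase):
    """Return 'set <axis>tics (...)' with one "letter" index pair per letter."""
    labels = ", ".join('"%s" %d' % (c, i) for i, c in enumerate(letters))
    return "set %stics (%s)" % (axis, labels)


if __name__ == "__main__":
    # Print both commands, e.g. redirect into tics.gp and `load` it from gnuplot.
    print(tics_command("x"))
    print(tics_command("y"))
```

For example, python3 make_tics.py > tics.gp (script name again hypothetical) produces a file that gnuplot can load before the plot command.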