Suppose I have the following data:
> print(data)
date gdp unemp_rate cpi_index rpi_index var1 var2 var3 var4
1 8/31/2009 23:00:00 0.002000575 0.0 0.006539081 0.008466604 0.041601305 0.193230747 0.002260496 0.016428674
2 12/1/2009 0:00:00 0.003890642 0.0 0.007278347 0.011660448 0.012048193 0.022703903 0.003004489 0.015541372
3 3/1/2010 0:00:00 0.005088272 0.2 0.007439852 0.011065007 0.028750000 -0.222946928 0.002789741 0.015225019
4 5/31/2010 23:00:00 0.009771946 -0.1 0.012874890 0.019151847 0.015448707 0.137959240 0.000843318 0.003532365
5 8/31/2010 23:00:00 0.006467518 -0.1 0.002928654 0.004474273 0.016524217 0.119414245 0.001498776 0.006978383
6 12/1/2010 0:00:00 0.000247441 0.1 0.010128833 0.011135857 -0.021860987 -0.098281638 -0.002076772 0.013506623
7 3/1/2011 0:00:00 0.005362386 -0.1 0.014669842 0.017180617 -0.008997135 -0.104039862 0.000737306 0.005057618
8 5/31/2011 23:00:00 0.002284132 0.1 0.015393251 0.017323517 -0.003816573 0.108217236 0.001119267 0.006603190
9 8/31/2011 23:00:00 0.006963250 0.4 0.006089083 0.005534270 0.019330121 0.191878865 0.001567801 0.006242028
10 12/1/2011 0:00:00 -0.000147759 0.1 0.009548705 0.010160881 -0.010990888 -0.079442157 0.002012014 0.010126316
11 3/1/2012 0:00:00 0.000677327 -0.2 0.003463403 0.004191115 -0.000230322 -0.091365159 0.004426378 0.009523975
12 5/31/2012 23:00:00 -0.001779548 -0.2 0.008184866 0.010851419 0.012325059 0.013284528 0.010746497 0.013690636
13 8/31/2012 23:00:00 0.008329224 -0.1 0.002730592 0.003715937 0.017636684 0.072261170 0.001646379 0.018186905
14 12/1/2012 0:00:00 -0.003377040 -0.1 0.012079435 0.011929247 -0.006708783 -0.005000292 0.000966773 0.012595370
15 3/1/2013 0:00:00 0.005957449 0.0 0.004513875 0.005691057 -0.000787978 -0.137470909 0.000978465 0.015526088
16 5/31/2013 23:00:00 0.006427064 0.0 0.007228126 0.009296686 0.018419422 0.225735629 0.001693297 0.014078954
17 8/31/2013 23:00:00 0.007166400 -0.2 0.003024506 0.004805767 0.024889381 0.189354653 0.002163410 0.012134669
18 12/1/2013 0:00:00 0.004061822 -0.4 0.006102001 0.006377043 0.011171074 0.039039948 0.004515371 0.011188655
19 3/1/2014 0:00:00 0.006772674 -0.4 0.000928235 0.005544554 0.022735763 -0.085462281 0.004334033 0.021969445
20 5/31/2014 23:00:00 0.007517419 -0.5 0.007057474 0.008270973 0.039503209 0.093873476 0.004611893 0.015191039
21 8/31/2014 23:00:00 0.006551699 -0.3 0.000405809 0.003515625 0.039508032 0.085234886 0.004022014 0.011791335
I want to create a correlation matrix/heatmap with var1, var2, var3 and var4 down the left-hand side and gdp, unemp_rate, cpi_index and rpi_index across the top.
I have sketched out what I mean using Excel:
I have tried to use packages such as corrplot and corrgram to construct the correlation matrix, but so far have had no success. I do not need the output to look exactly like the Excel sketch; I just need var1, var2, var3 and var4 on the left-hand side and gdp, unemp_rate, cpi_index and rpi_index on the top. It doesn't have to be corrplot or corrgram either: any other package that gives the desired output is perfectly fine.
Any help on this would be greatly appreciated.
Thanks in advance!
Here's the dput(data) output if you want to load this into R:
> dput(data)
structure(list(date = structure(c(16L, 1L, 6L, 11L, 17L, 2L,
7L, 12L, 18L, 3L, 8L, 13L, 19L, 4L, 9L, 14L, 20L, 5L, 10L, 15L,
21L), .Label = c("12/1/2009 0:00:00", "12/1/2010 0:00:00", "12/1/2011 0:00:00",
"12/1/2012 0:00:00", "12/1/2013 0:00:00", "3/1/2010 0:00:00",
"3/1/2011 0:00:00", "3/1/2012 0:00:00", "3/1/2013 0:00:00", "3/1/2014 0:00:00",
"5/31/2010 23:00:00", "5/31/2011 23:00:00", "5/31/2012 23:00:00",
"5/31/2013 23:00:00", "5/31/2014 23:00:00", "8/31/2009 23:00:00",
"8/31/2010 23:00:00", "8/31/2011 23:00:00", "8/31/2012 23:00:00",
"8/31/2013 23:00:00", "8/31/2014 23:00:00"), class = "factor"),
gdp = c(0.002000575, 0.003890642, 0.005088272, 0.009771946,
0.006467518, 0.000247441, 0.005362386, 0.002284132, 0.00696325,
-0.000147759, 0.000677327, -0.001779548, 0.008329224, -0.00337704,
0.005957449, 0.006427064, 0.0071664, 0.004061822, 0.006772674,
0.007517419, 0.006551699), unemp_rate = c(0, 0, 0.2, -0.1,
-0.1, 0.1, -0.1, 0.1, 0.4, 0.1, -0.2, -0.2, -0.1, -0.1, 0,
0, -0.2, -0.4, -0.4, -0.5, -0.3), cpi_index = c(0.006539081,
0.007278347, 0.007439852, 0.01287489, 0.002928654, 0.010128833,
0.014669842, 0.015393251, 0.006089083, 0.009548705, 0.003463403,
0.008184866, 0.002730592, 0.012079435, 0.004513875, 0.007228126,
0.003024506, 0.006102001, 0.000928235, 0.007057474, 0.000405809
), rpi_index = c(0.008466604, 0.011660448, 0.011065007, 0.019151847,
0.004474273, 0.011135857, 0.017180617, 0.017323517, 0.00553427,
0.010160881, 0.004191115, 0.010851419, 0.003715937, 0.011929247,
0.005691057, 0.009296686, 0.004805767, 0.006377043, 0.005544554,
0.008270973, 0.003515625), var1 = c(0.041601305, 0.012048193,
0.02875, 0.015448707, 0.016524217, -0.021860987, -0.008997135,
-0.003816573, 0.019330121, -0.010990888, -0.000230322, 0.012325059,
0.017636684, -0.006708783, -0.000787978, 0.018419422, 0.024889381,
0.011171074, 0.022735763, 0.039503209, 0.039508032), var2 = c(0.193230747,
0.022703903, -0.222946928, 0.13795924, 0.119414245, -0.098281638,
-0.104039862, 0.108217236, 0.191878865, -0.079442157, -0.091365159,
0.013284528, 0.07226117, -0.005000292, -0.137470909, 0.225735629,
0.189354653, 0.039039948, -0.085462281, 0.093873476, 0.085234886
), var3 = c(0.002260496, 0.003004489, 0.002789741, 0.000843318,
0.001498776, -0.002076772, 0.000737306, 0.001119267, 0.001567801,
0.002012014, 0.004426378, 0.010746497, 0.001646379, 0.000966773,
0.000978465, 0.001693297, 0.00216341, 0.004515371, 0.004334033,
0.004611893, 0.004022014), var4 = c(0.016428674, 0.015541372,
0.015225019, 0.003532365, 0.006978383, 0.013506623, 0.005057618,
0.00660319, 0.006242028, 0.010126316, 0.009523975, 0.013690636,
0.018186905, 0.01259537, 0.015526088, 0.014078954, 0.012134669,
0.011188655, 0.021969445, 0.015191039, 0.011791335)), .Names = c("date",
"gdp", "unemp_rate", "cpi_index", "rpi_index", "var1", "var2",
"var3", "var4"), class = "data.frame", row.names = c(NA, -21L
))
You can try the heatmap.2 function from the gplots package. I like it for heatmaps, and it will give something very similar to the graph you are after (I rounded to two decimal places in the example below; use as many digits as you want):
First, some data manipulation:
mycor <- cor(data[-1])               # correlations between all columns except date
mycor <- round(mycor[5:8, 1:4], 2)   # keep rows var1-var4 and columns gdp-rpi_index
#the data to plot
> mycor
gdp unemp_rate cpi_index rpi_index
var1 0.53 -0.31 -0.54 -0.39
var2 0.33 -0.03 -0.08 -0.10
var3 -0.18 -0.49 -0.31 -0.23
var4 -0.04 -0.29 -0.51 -0.45
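If you prefer not to rely on column positions (5:8 and 1:4 above), cor() also accepts a second argument y: cor(x, y) returns the correlations between the columns of x (as rows) and the columns of y (as columns), which is exactly the rectangular layout you want. A minimal sketch; cross_cor() is a hypothetical helper name, and the usage comment assumes your data frame is called data as in the question:

```r
# Hypothetical helper (not from any package): correlations between two groups
# of columns. Rows of the result are row_vars, columns are col_vars.
cross_cor <- function(df, row_vars, col_vars, digits = 2) {
  round(cor(df[row_vars], df[col_vars]), digits)
}

# Usage, assuming the question's data frame is loaded as `data`:
# mycor <- cross_cor(data,
#                    row_vars = c("var1", "var2", "var3", "var4"),
#                    col_vars = c("gdp", "unemp_rate", "cpi_index", "rpi_index"))
```

This should give the same 4 x 4 table as the positional subsetting, but it keeps working if the column order of data changes.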
And for the plot:
#libraries needed
library(gplots)
library(RColorBrewer)  # not used below; only needed if you want brewer.pal() palettes
#create the colours you need. In your case red, white and again red.
#You can specify any combination you want.
#If you want to widen the white band, try the below with
#c('red','white','white','red') and see what happens
my_palette <- colorRampPalette(c('red','white','red'))
#use the function below to plot the heatmap according to the mycor table
#Rowv=FALSE and Colv=FALSE switch off clustering, so rows and columns are not reordered
heatmap.2(mycor, cellnote=mycor, main='Correlation', notecol='black',
          density.info='none', trace='none', col=my_palette, dendrogram='none',
          Rowv=FALSE, Colv=FALSE, margins=c(10,6))
It is an easy-to-use function: you can specify the colours you need, and there are many other arguments you can use to change the output if you want something different. Check ?heatmap.2.
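Since you mentioned that any package is fine, here is an alternative sketch with ggplot2 instead of gplots (this assumes ggplot2 is installed; it is not part of the heatmap.2 approach above). It reshapes the mycor matrix to long format with base R's as.table() and draws one labelled tile per cell, with var1 to var4 on the left (var1 at the top) and the macro variables along the top. cor_to_long() and plot_cor_tiles() are hypothetical helper names:

```r
# Convert a correlation matrix to long format: one row per (row var, column var) pair.
# as.data.frame(as.table(m)) gives columns Var1 (row names), Var2 (column names), Freq (value).
cor_to_long <- function(m) {
  long <- as.data.frame(as.table(m))
  names(long) <- c("row_var", "col_var", "r")
  # Reverse the row factor so the first row of the matrix is drawn at the top.
  long$row_var <- factor(long$row_var, levels = rev(rownames(m)))
  long
}

# Draw one tile per cell, labelled with the correlation value (requires ggplot2).
plot_cor_tiles <- function(m) {
  long <- cor_to_long(m)
  ggplot2::ggplot(long, ggplot2::aes(x = col_var, y = row_var, fill = r)) +
    ggplot2::geom_tile(colour = "grey80") +
    ggplot2::geom_text(ggplot2::aes(label = r)) +
    # Red for strong correlations of either sign, white near zero (like my_palette).
    ggplot2::scale_fill_gradient2(low = "red", mid = "white", high = "red",
                                  midpoint = 0, limits = c(-1, 1)) +
    ggplot2::scale_x_discrete(position = "top") +  # column labels along the top
    ggplot2::labs(x = NULL, y = NULL, fill = "Correlation")
}

# Usage, assuming `mycor` was built as above:
# plot_cor_tiles(mycor)
```

The fill scale is fixed to limits = c(-1, 1) so the colours mean the same thing across plots; drop limits if you would rather stretch the colours over the observed range.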