I am looking for a package or method that would let me download index compositions from various websites. Index compositions change rarely and are easy to find, but I can't find any CSV files available online.
How can I load, say, the CAC 40 definition?
PS: What I care about are the names/ISIN/SICOVAM codes, not really the weights in the index.
You can find the composition of the CAC 40 on Wikipedia, and download and process it with the XML package. The function readHTMLTable() is particularly useful, since it will find and parse all tables on the page. In this case the relevant table is the second, hence the index [[2]] in the code. Try:
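library(XML)
url <- "http://en.wikipedia.org/wiki/CAC_40"
# readHTMLTable() returns a list of tables; the constituents table is the second
dat <- readHTMLTable(url)[[2]]
# Show the first three columns of the first six rows
head(dat[, 1:3])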
        Company           ICB Sector Ticker symbol
1         Accor               hotels            AC
2   Air Liquide  commodity chemicals            AI
3        Alstom industrial machinery           ALO
4 ArcelorMittal                steel            MT
5           AXA  full line insurance            CS
6   BNP Paribas                banks           BNP
The same code also works for the FTSE 100:
url <- "http://en.wikipedia.org/wiki/FTSE_100_Index"
dat <- readHTMLTable(url)[[2]]
head(dat[, 1:3])
                   Company          Sector Market cap (£bn)
1        Royal Dutch Shell     Oil and gas              135
2                     HSBC         Banking              129
3                       BP     Oil and gas               85
4           Vodafone Group       Telecomms               83
5          GlaxoSmithKline Pharmaceuticals               73
6 British American Tobacco         Tobacco               69
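Both examples pick the table by position ([[2]]), which may stop working if the page layout changes. As a rough sketch, assuming the column headers stay more stable than the table order, you could select the table by its column names instead and then save it as a CSV, since that is what the question asks for. The helper find_table() and the output file name are hypothetical, and the sample list below stands in for the result of readHTMLTable(), using rows from the CAC 40 output above:

```r
# Hypothetical helper (not part of the XML package): return the first data
# frame in `tables` whose column names include every name in `required`.
find_table <- function(tables, required) {
  for (tbl in tables) {
    if (is.data.frame(tbl) && all(required %in% names(tbl))) {
      return(tbl)
    }
  }
  stop("no table contains columns: ", paste(required, collapse = ", "))
}

# Stand-in for the list returned by readHTMLTable(); with the real page this
# would be something like:
#   tables <- readHTMLTable(url, stringsAsFactors = FALSE)
tables <- list(
  data.frame(Note = "unrelated table", stringsAsFactors = FALSE),
  data.frame(
    Company = c("Accor", "Air Liquide", "Alstom"),
    `Ticker symbol` = c("AC", "AI", "ALO"),
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
)

# Select the constituents table by header rather than by position.
dat <- find_table(tables, c("Company", "Ticker symbol"))

# Assumed output location; a temporary directory avoids touching the
# working directory.
out_file <- file.path(tempdir(), "cac40_constituents.csv")
write.csv(dat, out_file, row.names = FALSE)

# Read the file back to confirm the round trip.
check <- read.csv(out_file, stringsAsFactors = FALSE, check.names = FALSE)
print(check)
```

If no table has the requested columns, find_table() stops with an error instead of silently returning the wrong table, which makes a layout change on the page easier to notice.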