I am using the tm.plugin.webmining package to get the latest news about a company, say Microsoft, using the following command:
corpus<-WebCorpus(GoogleBlogSearchSource(stock))
When I run meta(corpus[[1]]), I get:
Metadata:
  author       : character(0)
  datetimestamp: 2014-07-17 20:28:10
  description  : Microsoft Layoffs – What it Means for MSFT StockInvestorplace.comWhile the layoffs are obviously going to be hardest on the workers, as investors we still have to take a rational and objective look at the corporation to see what it means for MSFT – particularly if you are personally a Microsoft stock holder ...Why Microsoft (MSFT) Stock Is Up TodayTheStreet.comEarnings Preview: Microsoft Corporation (MSFT), Apple Inc (AAPL), Facebook ...International Business TimesWhat Do Microsoft's Layoff Plans Tell Us About Satya Nadella's Vision?Motley FoolTech Insider -Insider Monkey (blog)all 2,176 news articles »
  heading      : Microsoft Layoffs – What it Means for MSFT Stock - Investorplace.com
  id           : tag:news.google.com,2005:cluster=http://investorplace.com/2014/07/microsoft-layoffs-means-msft-stock/
  language     : character(0)
  origin       : http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNEadqFvThyxvJU3O5uHa6wiyoWNEw&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52778559643673&ei=Cr3LU8jGNMnNkwX_lYCICQ&url=http://investorplace.com/2014/07/microsoft-layoffs-means-msft-stock/
So the different attributes are clearly there, but when I run
Headers<-sapply(corpus,FUN=function(x){attr(x,"heading")})
Headers is a list of 100 items with NULL values. I am pretty sure this exact code was working a few days ago. What changed in between is that I reinstalled the packages on a new system and updated R from 3.1.0 to 3.1.1.
What can I do to get separate lists of headers, descriptions, timestamps, etc., which I later want to convert into a 100 x 3 data frame?
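With the newest R and freshly installed packages, attr(x,"heading") most likely returns NULL because newer tm releases (0.6 and later, as far as I know) keep document metadata in a separate list rather than in attributes. Use meta() with a tag instead. Please try the following code: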
headers<-meta(corpus,tag="heading")
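To get from there to the data frame you asked about, something along these lines may work. This is only a sketch, assuming tm 0.6 or later: get_tag is a hypothetical helper (not part of tm), and the crude corpus that ships with tm stands in for your WebCorpus so that the example runs offline.

```r
library(tm)

# Stand-in corpus (assumption): replace with your own WebCorpus object.
data("crude")
corpus <- crude

# Hypothetical helper: pull one metadata tag from every document as a
# character vector, using NA where the tag is empty (e.g. character(0)).
get_tag <- function(corp, tag) {
  vapply(meta(corp, tag = tag, type = "local"), function(v) {
    if (length(v) == 0) NA_character_ else as.character(v)[1]
  }, character(1), USE.NAMES = FALSE)
}

# One row per document, one column per metadata tag.
news <- data.frame(
  heading       = get_tag(corpus, "heading"),
  description   = get_tag(corpus, "description"),
  datetimestamp = get_tag(corpus, "datetimestamp"),
  stringsAsFactors = FALSE
)

dim(news)
```

With a 100-document corpus this should give a 100 x 3 data frame. The timestamps come out as character strings here; convert them with as.POSIXct() afterwards if you need real date-times.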