Tags: r, machine-learning, tree, random-forest, data-analysis

Error in prune.tree: can not prune single node tree


I'm trying to do some fake news analysis on a data frame called news_df. I fit a very simple model and tried to run cross-validation to find the optimal tree size, but R says that it cannot prune a single-node tree. Any idea why this might be happening?

library(tree)

news_df <- structure(list(title = c("China's Xi says will support Interpol raising its profile", 
"Clinton says Trump may have violated U.S. law on Cuba", "House Oversight head Chaffetz to leave Congress after 2018"
), text = c("BEIJING (Reuters) - China will support Interpol, raising the profile and leadership of the global police cooperation agency, Chinese President Xi Jinping said on Tuesday at the opening of Interpol s general assembly in Beijing, state media reported. Last year, Interpol elected a senior Chinese public security official, Vice Public Security Minister Meng Hongwei, as its president, prompting rights groups to ask whether Beijing could try and use the position to go after dissidents abroad.", 
"CHICAGO (Reuters) - U.S. Democratic presidential nominee Hillary Clinton said on Thursday that Republican opponent Donald Trump may have violated U.S. law, following a news report that one of his companies attempted to do business in Cuba. Newsweek said on Thursday that a hotel and casino company controlled by Trump secretly conducted business with Cuba that was illegal under U.S. sanctions in force during Fidel Castro’s presidency of the Communist-ruled island.", 
"WASHINGTON (Reuters) - U.S. Representative Jason Chaffetz, who chairs a House committee with broad investigative powers, on Wednesday announced his plans to leave Congress after the 2018 midterm elections, saying he had no intention of running for any political office. "
), type = c("Real", "Real", "Fake")), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"), na.action = structure(c(`8971` = 8971L), class = "omit"))

fit1 <- tree(type~. , data = news_df)
cv.trees <- cv.tree(fit1) #error here
plot(cv.trees$size, cv.trees$dev, type = "b")

Solution

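  • I believe the default in the tree package (minsize = 10 in tree.control()) is that a node needs at least ten observations before it can be split. Your data has only three observations, so the root is never split, the fitted tree is a single node, and cv.tree() has nothing to prune. Also, tree() accepts at most 32 levels per factor predictor, so it will probably reject the title and text variables once you add more observations, since nearly every row will have its own unique value.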
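
To make the two limits above concrete, here is a minimal sketch in base R that checks a data frame before it is passed to tree(). The helper check_tree_inputs() and the toy data frames are hypothetical and only for illustration; the thresholds 10 and 32 are taken from the answer above and hard-coded as assumptions, not read from the package.

```r
# Hypothetical helper (not part of the tree package): flags the two problems
# described above. min_obs and max_levels are assumed values from the answer.
check_tree_inputs <- function(df, response, min_obs = 10, max_levels = 32) {
  predictors <- setdiff(names(df), response)
  # Count distinct values of each character/factor predictor (0 for others).
  n_levels <- vapply(
    df[predictors],
    function(col) {
      if (is.character(col) || is.factor(col)) length(unique(col)) else 0L
    },
    integer(1)
  )
  list(
    enough_rows     = nrow(df) >= min_obs,
    too_many_levels = names(n_levels)[n_levels > max_levels]
  )
}

# Toy data shaped like the question: three rows, free-text predictor.
small <- data.frame(
  title = c("a", "b", "c"),
  type  = c("Real", "Real", "Fake"),
  stringsAsFactors = FALSE
)
res_small <- check_tree_inputs(small, "type")

# Toy data with more rows: enough observations, but every title is unique.
big <- data.frame(
  title = paste("headline", 1:40),
  type  = rep(c("Real", "Fake"), 20),
  stringsAsFactors = FALSE
)
res_big <- check_tree_inputs(big, "type")

print(res_small)
print(res_big)
```

On a fitted model, checking nrow(fit1$frame) == 1 before calling cv.tree() should tell you whether the tree ever split, since the frame component holds one row per node. For free-text columns such as title and text, you would likely need to derive numeric features (word counts, term frequencies, and so on) rather than pass the raw strings as predictors.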