Tags: api, web-scraping, data-mining, yahoo-api

Classifying question categories on Yahoo! Answers


Now I have a task that seems easy but is actually challenging. I need to build a data set of questions, and I classify the questions into two categories:

  1. Factoid questions: "Who is the current president of France?"
  2. Free questions: "Can you rate the cameras below for me, please?"

Now I need to know what percentage of the questions on Yahoo! Answers falls into each category, so that I can balance my data set accordingly, but I don't know a good way to compute this statistic. Doing it manually seems impossible. Does anyone have an idea? I would be really grateful. Thanks.


Solution

  • You mean telling one from the other automatically, without any categorization on the site's end? That's probably going to be impossible to do reliably.

    I think the best you can do is compare some surface metrics. "Free" questions will probably tend to get more answers with more text; they would be discussed more heavily if Y!Answers had a discussion system. "Factoid" questions may start with "What is..." more often, and so on.

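    Maybe fetch 100 random questions, classify them by hand, and write down the percentages. A random sample gives you an estimate of the overall split without having to label everything.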
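The surface cues mentioned above (factoid-style openers such as "What is...", and longer answer threads for free questions) could be combined into a rough pre-sorting heuristic. The sketch below is only an illustration under stated assumptions: the `Question` record, its field names, the opener and opinion word lists, and the thresholds (`many_answers`, `long_thread_chars`) are all hypothetical choices, not anything provided by Yahoo! Answers, and the output is a guess that still needs manual checking.

```python
import re
from dataclasses import dataclass


# Hypothetical record for one question; the field names are assumptions,
# not fields of any Yahoo! Answers API.
@dataclass
class Question:
    text: str
    answer_count: int
    total_answer_chars: int


# Assumed openers that tend to signal a factoid question.
FACTOID_OPENERS = re.compile(
    r"^\s*(who|what|when|where|which|how many|how much)\b", re.IGNORECASE
)
# Assumed words that tend to signal a request for opinions or personal help.
OPINION_WORDS = re.compile(
    r"\b(you|your|me|my|opinion|rate|recommend|should i)\b", re.IGNORECASE
)


def guess_category(q: Question, many_answers: int = 5,
                   long_thread_chars: int = 2000) -> str:
    """Return 'factoid' or 'free' from simple surface cues (rough heuristic)."""
    score = 0
    if FACTOID_OPENERS.match(q.text):
        score += 1  # starts like "Who is..." / "What is..."
    if OPINION_WORDS.search(q.text):
        score -= 1  # addresses the reader or asks for an opinion
    if q.answer_count > many_answers:
        score -= 1  # many contributions hint at a free question
    if q.total_answer_chars > long_thread_chars:
        score -= 1  # lots of answer text hints at a free question
    # Anything without a positive factoid signal defaults to "free".
    return "factoid" if score > 0 else "free"


print(guess_category(Question("Who is the current president of France?", 3, 200)))
print(guess_category(Question("Can you rate the cameras below for me, please?", 8, 3500)))
```

With the two example questions from the post, the first prints `factoid` and the second prints `free`. A heuristic like this is at best a way to speed up manual labeling; it should not be trusted to produce the final percentages on its own.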
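The suggestion to hand-label about 100 random questions is a sampling estimate, so it helps to attach an uncertainty to the percentage you write down. Below is a minimal sketch, assuming you already have a list of question IDs (how you collect them is left open; the IDs here are hypothetical integers). It draws a uniform random sample without replacement and computes the observed share of one category with a Wilson score interval (`z = 1.96` gives roughly 95% coverage). The Wilson interval is one standard choice for a proportion; it is swapped in here for illustration and is not something the answer prescribes.

```python
import math
import random


def draw_sample(question_ids, k=100, seed=0):
    """Pick k question IDs uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(list(question_ids), k)


def proportion_with_ci(labels, category="factoid", z=1.96):
    """Share of `category` among hand-made labels, plus a Wilson score interval."""
    n = len(labels)
    p = sum(1 for lab in labels if lab == category) / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half, centre + half


# Hypothetical pool of IDs; replace with the IDs you actually collected.
sample_ids = draw_sample(range(1000), k=100)

# After labeling the sampled questions by hand (made-up labels shown here):
labels = ["factoid"] * 37 + ["free"] * 63
share, low, high = proportion_with_ci(labels)
print(f"factoid share: {share:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With 100 labels and a share near 0.4, the interval spans roughly plus or minus 9 percentage points; because the width shrinks with the square root of the sample size, labeling about four times as many questions would roughly halve it.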