I am currently using pywikibot to obtain the categories of a given Wikipedia page (e.g., support-vector machine) as follows.
import pywikibot as pw
print([i.title() for i in list(pw.Page(pw.Site('en'), 'support-vector machine').categories())])
The result I get is:
[
'Category:All articles with specifically marked weasel-worded phrases',
'Category:All articles with unsourced statements',
'Category:Articles with specifically marked weasel-worded phrases from May 2018',
'Category:Articles with unsourced statements from June 2013',
'Category:Articles with unsourced statements from March 2017',
'Category:Articles with unsourced statements from March 2018',
'Category:CS1 maint: Uses editors parameter',
'Category:Classification algorithms',
'Category:Statistical classification',
'Category:Support vector machines',
'Category:Wikipedia articles needing clarification from November 2017',
'Category:Wikipedia articles with BNF identifiers',
'Category:Wikipedia articles with GND identifiers',
'Category:Wikipedia articles with LCCN identifiers'
]
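As you can see, the results include a lot of Wikipedia tracking and maintenance categories, such as the "unsourced statements" and "identifiers" entries in the list above.
However, the only categories I am interested in are the informative ones: 'Category:Classification algorithms', 'Category:Statistical classification', and 'Category:Support vector machines'.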
I am wondering if there is a way to get all tracking or maintenance Wikipedia categories, so that I can remove them from the results and keep only the informative categories.
Alternatively, please suggest any other way of eliminating them from the results.
I am happy to provide more details if needed.
pywikibot currently does not expose the API features for filtering hidden categories. You can do that manually by checking for the hidden key in categoryinfo:
import pywikibot as pw
site = pw.Site('en', 'wikipedia')
print([
    cat.title()
    for cat in pw.Page(site, 'support-vector machine').categories()
    if 'hidden' not in cat.categoryinfo
])
gives:
['Category:Classification algorithms',
'Category:Statistical classification',
'Category:Support vector machines']
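If you also want the list of tracking categories themselves (as asked in the question), the same check can be wrapped in a small helper. This is only a sketch: split_categories is a hypothetical name, and it assumes each item exposes title() and a categoryinfo mapping, as the pywikibot category objects in the snippet above do.

```python
def split_categories(categories):
    """Split category objects into (visible, hidden) lists of titles.

    Assumption: every item has a ``title()`` method and a ``categoryinfo``
    mapping that contains the key 'hidden' only for hidden categories.
    """
    visible, hidden = [], []
    for cat in categories:
        # Hidden (tracking/maintenance) categories carry a 'hidden' key.
        target = hidden if 'hidden' in cat.categoryinfo else visible
        target.append(cat.title())
    return visible, hidden
```

It could then be called as visible, hidden = split_categories(pw.Page(site, 'support-vector machine').categories()). Note that reading categoryinfo may cost one API request per category, so this can be slow for pages with many categories.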
See https://www.mediawiki.org/wiki/Help:Categories#Hidden_categories and https://en.wikipedia.org/wiki/Wikipedia:Categorization#Hiding_categories for more info.
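As an alternative that bypasses pywikibot, the MediaWiki API can do the filtering server-side: prop=categories accepts clshow=!hidden. The following is a minimal sketch using only the standard library. The function names and the User-Agent string are placeholders of my own, and it does not handle API continuation, so it assumes the page has no more visible categories than a single request returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(title):
    """Return the query string asking for the non-hidden categories of a page."""
    return urlencode({
        "action": "query",
        "prop": "categories",
        "titles": title,
        "clshow": "!hidden",   # let the server drop hidden categories
        "cllimit": "max",
        "redirects": 1,        # follow redirects to the actual article
        "format": "json",
        "formatversion": 2,    # 'pages' is returned as a list
    })


def visible_categories(title):
    """Fetch the titles of the visible categories of one page (needs network)."""
    request = Request(
        API_URL + "?" + build_query(title),
        headers={"User-Agent": "category-filter-example/0.1"},  # placeholder
    )
    with urlopen(request) as response:
        data = json.load(response)
    page = data["query"]["pages"][0]
    # A page without visible categories has no 'categories' key.
    return [cat["title"] for cat in page.get("categories", [])]
```

Calling visible_categories('Support-vector machine') should return only the non-hidden categories, with a single request instead of one categoryinfo lookup per category.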