I want to get the out-links of Wikipedia articles. By out-links I mean the links listed in the "What links here" section of a Wikipedia article.
For instance, consider the data mining Wikipedia article. The "What links here" section of this article is at: https://en.wikipedia.org/wiki/Special:WhatLinksHere/Data_mining
I tried to use pywikibot as follows.
    import pywikibot as pw

    site = pw.Site('en', 'wikipedia')
    print([
        cat.title()
        for cat in pw.Page(site, 'data mining').categories()
        if 'hidden' not in cat.categoryinfo
    ])
However, it seems that the categories returned by pywikibot are different from the out-links of Wikipedia articles. Therefore, I am wondering how to do this in Python.
Note: I am not limited to pywikibot and am happy to explore other libraries such as mediawiki.
I am happy to provide more details if needed.
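Try the Page.backlinks() and Page.embeddedin() methods. Page.backlinks() yields the pages that link to the given page, and Page.embeddedin() yields the pages that transclude it; together they cover what the "What links here" page shows. You could also directly use the equivalent modules of MediaWiki's API: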
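As a rough sketch of the direct API route, the snippet below queries the `list=backlinks` module (or `list=embeddedin`) using only the standard library. The endpoint URL, the User-Agent string, the namespace filter and the helper names (`build_params`, `titles_from_response`, `fetch_links_here`) are my own assumptions for illustration, not part of pywikibot or of the original question, so adjust them to your needs.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the English Wikipedia API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(title, module="backlinks", limit=50, cont=None):
    """Build query parameters for list=backlinks or list=embeddedin."""
    # Each list module uses its own parameter prefix (bl... / ei...).
    prefix = {"backlinks": "bl", "embeddedin": "ei"}[module]
    params = {
        "action": "query",
        "format": "json",
        "list": module,
        prefix + "title": title,
        prefix + "limit": str(limit),
        prefix + "namespace": "0",  # assumption: only article namespace
    }
    if cont:
        # Merge the "continue" object from the previous response
        # to request the next batch of results.
        params.update(cont)
    return params


def titles_from_response(data, module="backlinks"):
    """Extract page titles from a decoded API response."""
    return [item["title"] for item in data.get("query", {}).get(module, [])]


def fetch_links_here(title, module="backlinks", max_pages=100):
    """Collect up to max_pages titles that link to (or embed) `title`."""
    titles, cont = [], None
    while len(titles) < max_pages:
        query = urllib.parse.urlencode(build_params(title, module, cont=cont))
        req = urllib.request.Request(
            API_URL + "?" + query,
            headers={"User-Agent": "whatlinkshere-example/0.1"},  # hypothetical
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        titles.extend(titles_from_response(data, module))
        cont = data.get("continue")
        if not cont:
            break
    return titles[:max_pages]


# Example (needs network access):
# print(fetch_links_here("Data mining", max_pages=20))
```

With pywikibot itself, the same idea should be roughly `[p.title() for p in pw.Page(site, 'Data mining').backlinks(total=20)]`, where `total` caps how many pages are fetched.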