mediawiki wikipedia wikipedia-api mediawiki-api

Is there a way to parse a wiki talk page?


I am looking to extract the comments made by editors on a Wikipedia talk page, along with each editor's name and the timestamp (e.g., https://en.wikipedia.org/wiki/Talk:Coronavirus). Is there any meaningful way to do this at all? Can the comments be extracted while preserving the tree structure, i.e., whether a comment was made in response to another comment?

Thank you!


Solution

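  • Only approximately. There are tools that attempt it, such as python-mwchatter, but a talk page is freeform wikitext: signatures and reply indentation are conventions that editors follow by hand, not enforced markup, so there is no reliable method of extracting structure from it.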
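
To give a feel for what "approximately" means, here is a minimal heuristic sketch in plain Python. It is not how python-mwchatter works internally; it only assumes the common talk page conventions that a reply is indented with one more leading ":" (or "*") than the comment it answers, and that a comment ends with a signature containing a [[User:...]] link followed by a timestamp such as "10:00, 1 March 2020 (UTC)". The names Comment and parse_thread are made up for this example, and the input is assumed to be raw wikitext that you have already fetched.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed default signature timestamp format, e.g. "10:00, 1 March 2020 (UTC)".
TS_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")
# Link to a user page or user talk page; captures the user name.
USER_RE = re.compile(r"\[\[User(?: talk)?:([^|\]#/]+)", re.IGNORECASE)
# Leading indentation markers used for replies.
INDENT_RE = re.compile(r"^[:*#]*")


@dataclass
class Comment:
    author: Optional[str]   # None if no user link precedes the timestamp
    timestamp: str
    text: str               # raw line, still including the signature markup
    depth: int              # number of leading indentation characters
    replies: List["Comment"] = field(default_factory=list)


def parse_thread(wikitext: str) -> List[Comment]:
    """Build a reply tree from signed, single-line comments (heuristic)."""
    roots: List[Comment] = []
    stack: List[Comment] = []  # current reply path, shallowest first
    for line in wikitext.splitlines():
        if line.startswith("=="):
            stack.clear()      # a section heading starts a new thread
            continue
        depth = len(INDENT_RE.match(line).group())
        body = line[depth:].strip()
        ts = TS_RE.search(body)
        if ts is None:
            continue           # blank, unsigned, or continuation line: skipped here
        users = USER_RE.findall(body[:ts.start()])
        author = users[-1].strip() if users else None
        comment = Comment(author, ts.group(), body, depth)
        # The parent is the nearest earlier comment with a smaller indent.
        while stack and stack[-1].depth >= depth:
            stack.pop()
        (stack[-1].replies if stack else roots).append(comment)
        stack.append(comment)
    return roots
```

Calling parse_thread on the wikitext of one talk page returns the top-level comments, each with its nested replies. Expect it to go wrong on real pages: multi-line comments lose every line except the signed one, unsigned comments are dropped, customized signatures or localized timestamps will not match, and editors frequently indent inconsistently, which is exactly why the result can only be approximate.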