I'm trying to get the links from a particular page in the order they are presented on the page, or reasonably close. I believe I found the correct API call for this, the parse request; however, I'm getting a lot of what I consider "junk" links, which are really links that appear inside references. For example, for Albert Einstein, I make the request (http://en.wikipedia.org/w/api.php?action=parse&format=json&page=Albert%20Einstein&redirects=&prop=links) and get links that occur only in the references, like E. T. Whittaker and JSTOR. For my purposes, these links in references are "junk".
Alternatively, I looked at the query command, but query with prop=links just gives me the links alphabetized, which loses part of the information I wanted to look at. That API query also includes the "junk" links from within references.
Is there any way for me to tell the parse command to ignore links that are within reference tags, or do I need to retrieve the text using the API and do the parsing myself client-side?
I don't think there is a direct way to do this. One workaround would be to get the text of the page, remove the code that actually shows the references ({{reflist}} or <references />) and then use the API to parse that. This will add a "junk" link to Help:Cite errors/Cite error refs without references, but it's easy to ignore that one page.
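The workaround could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the helper names (strip_reference_lists, drop_cite_error_link, api_post, links_without_references) and the User-Agent string are made up for this example, the JSON shape assumed is the default format=json output of action=parse, and the regular expressions only handle simple {{reflist}} calls without nested templates. It has not been checked against how current Wikipedia handles pages that lack a reference list.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
CITE_ERROR_PAGE = "Help:Cite errors/Cite error refs without references"

# Simple {{reflist}} / {{Reflist|30em}} calls (no nested templates in the arguments).
REFLIST_TEMPLATE = re.compile(r"\{\{\s*reflist\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE)
# <references />, <references group="n" />, or a paired <references>...</references>.
REFERENCES_TAG = re.compile(
    r"<references\b[^>]*/>|<references\b[^>]*>.*?</references\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_reference_lists(wikitext):
    """Remove the markup that displays the reference list; <ref> tags stay."""
    text = REFLIST_TEMPLATE.sub("", wikitext)
    return REFERENCES_TAG.sub("", text)


def drop_cite_error_link(titles):
    """Ignore the one "junk" link that the missing reference list introduces."""
    return [t for t in titles if t != CITE_ERROR_PAGE]


def api_post(params):
    """POST to the API (the page text can be too long for a GET URL)."""
    data = urllib.parse.urlencode(params).encode("utf-8")
    # Placeholder User-Agent; replace with one that identifies your client.
    headers = {"User-Agent": "links-in-order-sketch/0.1 (placeholder contact)"}
    request = urllib.request.Request(API_URL, data=data, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def links_without_references(page):
    """Fetch the wikitext, strip the reference list, and re-parse for links."""
    source = api_post({
        "action": "parse", "format": "json",
        "page": page, "redirects": "", "prop": "wikitext",
    })
    wikitext = source["parse"]["wikitext"]["*"]
    parsed = api_post({
        "action": "parse", "format": "json",
        "title": page, "text": strip_reference_lists(wikitext), "prop": "links",
    })
    titles = [link["*"] for link in parsed["parse"]["links"]]
    return drop_cite_error_link(titles)

# Example call (needs network access):
#     links_without_references("Albert Einstein")
```

The stripping and filtering steps are kept as separate pure functions so they can be checked on a small piece of wikitext without calling the API.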