There is a "Recent changes" feed available on the Wikipedia homepage.
The same list is also available as an Atom feed. It is also possible to watch a single user by going to their user account page and selecting the feed. But is there a way to get the feed with one or two users excluded?
Update: Using xmllint I can extract the author names.
wget https://hunspell.s3.amazonaws.com/temp/out.txt
xmllint --xpath "//*[name() = 'feed']/*[name() = 'entry']/*[name() = 'author']/*[name() = 'name']" out.txt
But I want to exclude one or two authors from this feed. For example, Clarityfiend and Shortride.
Update:
When I tried the xpath command, it worked well when both names were in English (ASCII), but it failed when one of the names was in Unicode (Devanagari):
wget https://hunspell.s3.amazonaws.com/todel/out.txt
worked:
xpath -e "/feed/entry[author/name!='Aditya tamhankar' and author/name!='Sushant Madhale']" out.txt > a.txt
did not work:
xpath -e "/feed/entry[author/name!='Aditya tamhankar' and author/name!='संतोष गोरे']" out.txt > filtered.txt
The entry by the second author is still present in the filtered output:
grep 'संतोष गोरे' filtered.txt
The following xmllint command handles Unicode correctly, but it displays one record incorrectly:
# (t1='Aditya tamhankar' ; t2='संतोष गोरे'; echo 'setns x=http://www.w3.org/2005/Atom'; echo "cat /x:feed/x:entry[not(x:author/x:name[.='$t1'] | x:author/x:name[.='$t2'])]/descendant::*[self::x:updated or self::x:title or descendant-or-self::x:name]/text()") | xmllint --shell out.txt | tail -n +4 | gawk '{ if(NR % 6 == 0){ print $0 "¬"} else { print $0 }}' |gawk 'BEGIN{FS="\n -------\n" ; RS="\n -------¬\n"; OFS="||"} { print $2,$1,$3 }END{ print FNR}'
All records except this one are correct:
152.238.27.63
/ >
||2021-07-15T20:14:03Z||
19
I suggest using the xpath tool from your terminal (Ubuntu package libxml-xpath-perl). It is built on the Perl XML::XPath module, which implements XPath 1.0, and that is sufficient for this filter:
wget -O - https://hunspell.s3.amazonaws.com/temp/out.txt | xpath -e "/feed/entry[author/name!='Clarityfiend' and author/name!='Shortride']" > filtered.txt
UPD: If you get an out-of-memory error for the input buffer, download the feed to a file instead of piping it through standard output:
wget https://hunspell.s3.amazonaws.com/temp/out.txt
xpath -e "/feed/entry[author/name!='Clarityfiend' and author/name!='Shortride']" out.txt > filtered.txt
The XPath query selects every entry whose author name is neither Clarityfiend nor Shortride. The matching entries are saved in filtered.txt.
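If the command-line tools have trouble with non-ASCII names, the same filter can be sketched in Python with only the standard library. This is a minimal sketch, not a drop-in replacement for the commands above: the function name filter_feed and the inline SAMPLE feed are hypothetical, and it assumes the feed uses the standard Atom namespace (http://www.w3.org/2005/Atom) with author names stored in author/name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
NS = {"a": ATOM_NS}


def filter_feed(xml_text, excluded):
    """Return the feed as a string, without entries written by excluded authors.

    filter_feed is a hypothetical helper; `excluded` is a set of author names.
    """
    # Serialize Atom elements without an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    root = ET.fromstring(xml_text)
    # findall returns a list, so removing entries while looping is safe.
    for entry in root.findall("a:entry", NS):
        names = {
            n.text.strip()
            for n in entry.findall("a:author/a:name", NS)
            if n.text
        }
        if names & excluded:
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")


# Tiny made-up feed for illustration; real input would be the contents of out.txt.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>One</title><author><name>Aditya tamhankar</name></author></entry>
  <entry><title>Two</title><author><name>संतोष गोरे</name></author></entry>
  <entry><title>Three</title><author><name>152.238.27.63</name></author></entry>
</feed>"""

if __name__ == "__main__":
    print(filter_feed(SAMPLE, {"Aditya tamhankar", "संतोष गोरे"}))
```

To run it against the downloaded file, read out.txt as bytes (so the parser can honor the file's own encoding declaration) and write the result with UTF-8 encoding. Names are compared as exact strings, so the spelling passed in must match the feed character for character.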