I want to collect user names from member-list pages like this: http://www.marksdailyapple.com/forum/memberslist/
I want to get every username from all the pages,
and I want to do this on Linux, with bash.
Where should I start? Could anyone give me some tips?
This is what my Xidel was made for:
xidel http://www.marksdailyapple.com/forum/memberslist/ -e 'a.username' -f '(//a[@rel="Next"])[1]'
With that simple line, it parses the pages with a proper HTML parser, uses a CSS selector to find all links with names, uses XPath to find the next-page link, and repeats this until all pages are processed.
You can also write it using only CSS selectors:
xidel http://www.marksdailyapple.com/forum/memberslist/ -e 'a.username' -f 'div#pagination_top span.prev_next a'
Or with pattern matching. There you basically just copy the HTML elements you want to find from the page source and replace the text content with {.}:
xidel http://www.marksdailyapple.com/forum/memberslist/ -e '<a class="username">{.}</a>*' -f '<a rel="next">{.}</a>'
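To make the "extract, find the next page, repeat" loop more concrete, here is a rough bash-only sketch of the same idea. It is only an illustration, not how Xidel works internally: it uses grep/sed regular expressions instead of a real HTML parser, so it is much more fragile. The function names (extract_usernames, next_link, crawl) are made up for this example, and it assumes that each link sits on one line, that attributes use double quotes, and that the next-page href is an absolute address.

```shell
#!/usr/bin/env bash
# Illustration only: regex matching on HTML is fragile; prefer a real parser.

# Print the text of every <a ... class="username" ...>name</a> read from stdin.
extract_usernames() {
  grep -o '<a [^>]*class="username"[^>]*>[^<]*</a>' \
    | sed -e 's/<[^>]*>//g'
}

# Print the href of the first <a ... rel="next" ...> tag read from stdin
# (rel is matched case-insensitively). Prints nothing if there is no such link.
next_link() {
  grep -io '<a [^>]*rel="next"[^>]*>' \
    | head -n 1 \
    | sed -n 's/.*href="\([^"]*\)".*/\1/p'
}

# crawl FETCH START [MAX_PAGES]
# FETCH is any command that prints a page when given its address.
# MAX_PAGES (default 1000) guards against a next link that loops forever.
crawl() {
  local fetch=$1 page=$2 max=${3:-1000} n=0 html
  while [ -n "$page" ] && [ "$n" -lt "$max" ]; do
    html=$("$fetch" "$page") || break
    printf '%s\n' "$html" | extract_usernames
    page=$(printf '%s\n' "$html" | next_link)
    n=$((n + 1))
  done
}

# Hypothetical usage against a live site (relative hrefs would need resolving):
#   fetch_url() { curl -fsS "$1"; }
#   crawl fetch_url 'http://www.marksdailyapple.com/forum/memberslist/' > usernames.txt
```

Passing the fetch command as an argument keeps the loop independent of how pages are downloaded, so it can be tried on saved HTML files with cat before pointing it at a real site.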