
Positive Scraping


BACKGROUND: I'm part of a music sharing group on Facebook (links to gs/yt, no pirating). Each day, up to 20 members each post a link to a song that they think the community will like, and so we grow in enjoying that genre of music. At the end of each month, we would like to create a list of the titles of all the songs posted that month.

PROBLEM: Facebook does not offer this ability. It offers only a very light search function (no sub-filters), and even that search only returns posts whose text contains the search string. For example, a search for "B.B. King" returns posts with that string in them. Worse, if a post links to a B.B. King song but the poster did not type 'B.B. King' in the post, the search will not return that link. Facebook also does not allow tagging of posts.

MESSING AROUND: Using a Chrome plugin called Stylish, I am (sometimes) able to hide most elements of a certain class on Facebook pages (aka custom user-agent style) and visually collect what I need. Unfortunately, this did not work with my example.

I did narrow down the name of the class of the div whose innerHTML contains the track and artist info I need.

GOAL: I would like to create a Chrome plugin that will scrape the page for all instances of this div (with this class name), and then store its innerHTML content in an array that I can later export. Doing so will allow me to create a full month's list and post it as a file in the Facebook group's 'File' tab.

Point me in the right direction and I'll start tinkering!

Thanks in advance.


Solution

  • You need not even resort to scraping! The Graph API is here to help:

    https://developers.facebook.com/docs/reference/api/group/

    So, assuming you don't have a Facebook App yet, here's how you can check out what you can get:

    https://developers.facebook.com/tools/explorer/?method=GET&path=me/groups

    This is the Graph Explorer, a simple tool that shows you the data available via the Graph API. Click Get Access Token, check the user_groups box, then accept the Permissions dialog.

    This will return a JSON object containing all the groups that you are a member of. Grab the id of the group whose links you want to collect, and go to its Graph API feed node:

    https://graph.facebook.com/114817635246802/feed
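
    If you would rather do the id lookup and URL building in code, here is a minimal JavaScript sketch. It is only an illustration: findGroupId and buildFeedUrl are hypothetical helper names, and both the { data: [{ id, name }] } response shape and the access_token query parameter are assumptions you should check against what the Graph Explorer actually shows you.

    ```javascript
    // Sketch only: assumes the me/groups response looks like
    // { data: [{ id: "...", name: "..." }, ...] }.
    function findGroupId(groupsResponse, groupName) {
      const groups = groupsResponse.data || [];
      const match = groups.find((g) => g.name === groupName);
      return match ? match.id : null; // null when no group has that name
    }

    // Sketch only: assumes the token is passed as an access_token query parameter.
    function buildFeedUrl(groupId, accessToken) {
      const url = new URL(
        "https://graph.facebook.com/" + encodeURIComponent(groupId) + "/feed"
      );
      url.searchParams.set("access_token", accessToken);
      return url.toString();
    }
    ```

    For example, buildFeedUrl(findGroupId(response, "My Music Group"), token) would give you the feed URL for a group with that (made-up) name.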

    You'll need to use an access token here. You can just copy and paste the one from the Graph Explorer. This will return a JSON object containing the most recent posts in the group, as well as Pagination links. Using these, you can retrieve a full list in JSON containing all the links to your music.
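
    Following the pagination links could look roughly like the sketch below. Treat it as a starting point, not the definitive way to do it: collectLinks and fetchJson are hypothetical names, and the name and link post fields and the paging.next field are assumptions about the feed's JSON that you should confirm in the Graph Explorer first.

    ```javascript
    // Sketch only. fetchJson is a hypothetical helper: given a URL, it
    // resolves to the parsed JSON body of the response.
    async function collectLinks(firstUrl, fetchJson, maxPages = 100) {
      const links = [];
      let url = firstUrl;
      // maxPages is a safety cap so a bad "next" link cannot loop forever.
      for (let i = 0; url && i < maxPages; i++) {
        const page = await fetchJson(url);
        const posts = page.data || [];
        if (posts.length === 0) break; // an empty page means there is nothing older
        for (const post of posts) {
          // Assumed fields: posts that share a link carry `link` and often `name`.
          if (post.link) links.push({ title: post.name || "", link: post.link });
        }
        url = page.paging && page.paging.next ? page.paging.next : null;
      }
      return links;
    }

    // In a browser (or a runtime with a global fetch), fetchJson could be:
    const fetchJson = (url) => fetch(url).then((res) => res.json());
    ```

    Calling collectLinks(feedUrl, fetchJson) with the feed URL (including your access token) should then resolve to an array of { title, link } objects that you can turn into your monthly list.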

    Now, things you'll want to learn:

    - How to create a Facebook App
    - How to generate your own access token
    - How to make API requests programmatically

    Read this tutorial, and if you are even vaguely familiar with JavaScript and HTML, you'll have something working in about 10 minutes. Good luck!