
Facebook API: Scraping Problems as well as unknown Hebrew Unicode


I created a Facebook app to integrate with my Python scripts. I have permission to read the user's notifications and messages.

The problem is that when I get a notification, I can't extract its text cleanly:

notification of Somebody is {u'title_text': u'Texas HoldEm Poker: Hurry! CLAIM your $25,000 FREE CHIPS now!'}

As you can see, the surrounding {u'title_text': u'...'} doesn't belong there. How can I get only the message text inside?

The second problem is that when I try to get a message in Hebrew, it looks like this:

{u'title_text': u'\u200e\u200e\u05d7\u05df \u05d1\u05dc\u05d7\u05e0\u05e1\u200e posted on Danie's\'s timeline\u200e: "Have a lot of good luck"'}

The "\u200e\u200e\u05d7\u05df \u05d1\u05dc\u05d7\u05e0\u05e1\u200e" part is someone's name in Hebrew. How can I get it to display as the name itself instead of escape sequences?

Thank you.

Edit: I found that the encoding is UTF-8 and that I need to put "u" before the string. But what if my program receives the string at runtime? How do I add the "u" to an existing string? Thanks.

Edit: Updated Code:

def insertNewNotification(notification_list, owner):
    for notification in notification_list:
        notification = repr(notification['title_text'])
        notification = str(notification)
        notification = unicode(notification, 'unicode-escape')
        notification = notification.encode("UTF-8").decode("UTF-8")
        print "notification of " + owner + " is " + notification
        response = json.load(urllib.urlopen((url + "add_notification&message=" + notification + "&owner=" + owner).encode("UTF-8")))
    return 1

Solution

  • {u'title_text': u'Texas HoldEm Poker: Hurry! CLAIM your $25,000 FREE CHIPS now!'} is the representation (repr()) of a dictionary with a Unicode key and a Unicode value.

    u"" is a Unicode literal in Python. You type the u prefix only inside Python source code; a string that your program receives at runtime never needs it added.

    You probably have code like the following somewhere: print "notification of Somebody is", data

    You should use print data['title_text'] instead.

    Learn the difference between:

    >>> s = u"\u2324"
    >>> s
    u'\u2324'
    >>> s.encode('utf-8')
    '\xe2\x8c\xa4'
    >>> print repr(s)
    u'\u2324'
    >>> print s
    ⌤
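
Putting the advice above together, here is a minimal sketch of how the question's loop body could handle each notification: index the dictionary instead of calling repr() on it, and encode to UTF-8 only when building the request URL. The helper names `notification_text` and `build_add_notification_url`, the `base_url` parameter, and the example.invalid address are hypothetical; the query layout is simply copied from the question's code. Stripping U+200E (LEFT-TO-RIGHT MARK, an invisible formatting character) is optional and assumes you do not need it for display. The sketch is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: hypothetical helpers, not part of the Facebook API.
from __future__ import print_function

try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

LEFT_TO_RIGHT_MARK = u'\u200e'  # invisible mark seen in the Hebrew sample


def notification_text(notification):
    """Return the display text of one notification dict."""
    text = notification[u'title_text']  # index the dict; do not repr() it
    return text.replace(LEFT_TO_RIGHT_MARK, u'')  # optional cleanup


def build_add_notification_url(base_url, message, owner):
    """Percent-encode the Unicode values as UTF-8 for use in a URL."""
    query = urlencode([
        ('message', message.encode('utf-8')),
        ('owner', owner.encode('utf-8')),
    ])
    # Assumption: same "add_notification&message=...&owner=..." layout
    # as in the question's code.
    return base_url + 'add_notification&' + query


if __name__ == '__main__':
    sample = {u'title_text':
              u'\u200e\u200e\u05d7\u05df \u05d1\u05dc\u05d7\u05e0\u05e1\u200e posted'}
    text = notification_text(sample)
    # The resulting URL is pure ASCII, so it prints safely on any terminal.
    print(build_add_notification_url(
        u'http://example.invalid/api?action=', text, u'Somebody'))
```

The text stays a Unicode string for as long as the program works with it; bytes appear only at the boundary, here when `urlencode` percent-encodes the UTF-8 form. Printing `text` itself shows the Hebrew name, provided the terminal's encoding can represent it.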