
Python ics language encoding


I have this .ics file that I would like to import into Python.

I am using the requests and ics libraries to do so.

When I try printing the contents of the calendar (which include Hebrew letters), I get gibberish. However, when I tried the same procedure with another calendar that also includes Hebrew letters, it worked perfectly fine.

This is my code:

import requests as rq
from ics import Calendar

url = "https://example.com/cal/"

c = Calendar(rq.get(url).text)

for event in list(c.events):
    print(event)

I am getting this output:

BEGIN:VEVENT
DTSTAMP:20210322T123000Z
DESCRIPTION: ×××× ××××¢× ××××©× ×'\nק×××¦× 11\nתר××× ×ספר 1\n×תר××/ת: ×ר×× ××\n14:30-16:30
DTEND:20210322T143000Z
LOCATION:××××× 805
DTSTART:20210322T123000Z
SUMMARY:תר××× 234114
TRANSP:OPAQUE
UID:202002.234114.LG11.1.×.14:30 
URL:
END:VEVENT

How can I fix this?


Solution

  • requests' .text decodes the response body with an encoding it infers from the HTTP headers or, failing that, guesses from the content (previously using chardet, these days using charset_normalizer). That inference can be wrong, and the result is gibberish.

    A quick search suggests ISO-8859-8 as a likely legacy encoding for Hebrew. (EDIT: as figured out in the comments, the file was actually plain UTF-8.)

    Try

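    import requests
    from ics import Calendar

    resp = requests.get(url)
    resp.raise_for_status()
    # The file turned out to be UTF-8; ISO-8859-8 was only the initial guess.
    text = resp.content.decode("utf-8")
    c = Calendar(text)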
    

    to be explicit about the encoding?
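
To see why a wrong decode produces output like the one in the question, here is a minimal offline sketch. It assumes the server sent UTF-8 bytes that were then decoded as ISO-8859-1; the many "×" characters in the output are consistent with that, but it is an inference, not something confirmed here. The sample string and variable names are made up for illustration.

```python
# Hypothetical sample text standing in for a Hebrew field of the calendar.
original = "שלום"

# What would travel over the wire if the server encodes as UTF-8.
raw = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 never raises, because every byte maps
# to some character, but the result is mojibake full of "×" (byte 0xD7).
garbled = raw.decode("iso-8859-1")

# Decoding the same bytes with the right codec gives the text back.
fixed = raw.decode("utf-8")

# Already-garbled text can be repaired by reversing the wrong decode,
# as long as no bytes were lost along the way.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(fixed == original, repaired == original)
```

With requests, setting resp.encoding = "utf-8" before reading resp.text should have the same effect as calling resp.content.decode("utf-8") yourself.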