
Issue with *.ics files splitting strings over more than one line (Python)


I have tried every method I could find and always get the same result, but there must be a fix for this.

I am downloading an ICS file from a website, where one of the lines, "SUMMARY", is split in two. When I load this into a string, the two lines are automatically joined into one string, unless there is a "\n".

So I have tried to replace both "\n" and "\r", but that makes no difference to my issue.

Code

from icalendar import Calendar, Event
from datetime import datetime
import icalendar
import urllib.request
import re
from clear import clear_screen

cal = Calendar()

def download_ics():
    url = "https://www.pogdesign.co.uk/cat/download_ics/7d903a054695a48977d46683f29384de"
    file_name = "pogdesign.ics"
    urllib.request.urlretrieve(url, file_name)

def get_start_time(time):
    time = datetime.strftime(time, "%A - %H:%M")
    return time

def get_time(time):
    time = datetime.strftime(time, "%H:%M")
    return time

def check_Summary(text):
    #newline = re.sub('[\r\n]', '', text)
    newline = text.translate(str.maketrans("", "", "\r\n"))
    return newline

def main():
    download_ics()
    clear_screen()
    e = open('pogdesign.ics', 'rb')
    ecal = icalendar.Calendar.from_ical(e.read())
    for component in ecal.walk():
        if component.name == "VEVENT":
            summary = check_Summary(component.get("SUMMARY"))
            print(summary)
            print("\t Start : " + get_start_time(component.decoded("DTSTART")) + " - " + get_time(component.decoded("DTEND")))

            print()
    e.close()

if __name__ == "__main__":
    main()

Output

Young Sheldon S06E11 - Ruthless, Toothless, and a Week ofBed Rest
     Start : Friday - 02:00 - 02:30

The Good Doctor S06E11 - The Good Boy
     Start : Tuesday - 04:00 - 05:00

National Treasure: Edge of History S01E08 - Family Tree
     Start : Thursday - 05:59 - 06:59

National Treasure: Edge of History S01E09 - A Meeting withSalazar
     Start : Thursday - 05:59 - 06:59

The Last of Us S01E03 - Long Long Time
     Start : Monday - 03:00 - 04:00

The Last of Us S01E04 - Please Hold My Hand
     Start : Monday - 03:00 - 04:00

Anne Rice's Mayfair Witches S01E04 - Curiouser and Curiouser
     Start : Monday - 03:00 - 04:00

Anne Rice's Mayfair Witches S01E05 - The Thrall
     Start : Monday - 03:00 - 04:00

The Ark S01E01 - Everyone Wanted to Be on This Ship
     Start : Thursday - 04:00 - 05:00

I have looked through all kinds of solutions, like converting the text to "utf-8" and "ISO-8859-8". I have tried some functions I found in icalendar, and I have even asked ChatGPT for help.

As you can see in the output, two summaries are affected: "Young Sheldon S06E11 - Ruthless, Toothless, and a Week ofBed Rest" and "National Treasure: Edge of History S01E09 - A Meeting withSalazar".

In the downloaded ICS file, each of these summaries is on two separate lines, and I cannot manage to keep them split, or to stop them from being joined at all...


Solution

  • So far as the icalendar.Calendar class is concerned, that ical is incorrectly formatted.

    icalendar.Calendar.from_ical() hands the data to the content-line parser in the icalendar.parser module (the Contentline/Contentlines classes), whose from_ical() unfolds lines like this:

        def from_ical(cls, ical, strict=False):
            """Unfold the content lines in an iCalendar into long content lines.
            """
            ical = to_unicode(ical)
            # a fold is carriage return followed by either a space or a tab
            return cls(uFOLD.sub('', ical), strict=strict)
    

    where uFOLD is re.compile('(\r?\n)+[ \t]')

    That means it removes each run of newlines together with the single space or tab character that follows it, rather than replacing the match with a space. The ical file you're retrieving has e.g.

    SUMMARY:Young Sheldon S06E11 - \\nRuthless\\, Toothless\\, and a Week of\r\n Bed Rest\r\n
    

    so when of\r\n Bed is matched it becomes ofBed.

    This line-folding format is defined in RFC 2445 (since superseded by RFC 5545, which keeps the same rule), which gives the example

    For example the line:

    DESCRIPTION:This is a long description that exists on a long line.
    

    Can be represented as:

    DESCRIPTION:This is a lo
     ng description
      that exists on a long line.
    

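    Because a fold may fall in the middle of a word ("lo" / "ng" above), unfolding has to delete the line break and the single leading whitespace character without inserting anything, which makes clear that the implementation in from_ical() is correct. The feed is at fault for dropping the space between words when it folds.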

    If you're quite sure that the source ical will always fold lines between words (and drop the space there), you could adjust for that by adding a space after each line fold before parsing, like:

        ecal = icalendar.Calendar.from_ical(e.read().replace(b'\r\n ', b'\r\n  '))
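
The one-liner above only catches folds written as CRLF plus a space. A slightly more general sketch, using only the standard re module, is shown below. It reuses the uFOLD pattern quoted earlier to reproduce the "ofBed" result, then pads every fold (CRLF or bare LF, followed by a space or a tab) before unfolding. The helper name pad_folds and the shortened sample SUMMARY line are made up for illustration and are not part of icalendar.

```python
import re

# Pattern quoted above from icalendar's parser: one or more line breaks
# followed by a single space or tab mark a fold, and the match is deleted.
uFOLD = re.compile('(\r?\n)+[ \t]')

# Hypothetical helper pattern (not part of icalendar): a line break
# followed by the single space or tab that marks a fold, on raw bytes.
FOLD_BYTES = re.compile(rb'(\r?\n)([ \t])')


def pad_folds(raw: bytes) -> bytes:
    """Add one extra space after every fold marker.

    Assumes the feed only folds between words and drops the space there;
    a fold in the middle of a word would gain a spurious space.
    """
    return FOLD_BYTES.sub(rb'\1\2 ', raw)


# Shortened, made-up sample in the same shape as the feed's SUMMARY line.
raw = b'SUMMARY:A Week of\r\n Bed Rest\r\n'

# Plain unfolding, as the library does it: the words run together.
plain = uFOLD.sub('', raw.decode('utf-8'))

# Padding the folds first leaves one space behind after unfolding.
fixed = uFOLD.sub('', pad_folds(raw).decode('utf-8'))

print(repr(plain))
print(repr(fixed))
```

In the question's main(), the padded bytes would then be passed on as icalendar.Calendar.from_ical(pad_folds(e.read())). The same caveat as above applies: if the feed ever folds inside a word, or already keeps the space before the fold, this adds an unwanted extra space.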