bash awk icalendar

Process DTEND in a partial ICS file


What I have is an already processed ICS file that contains these three lines for each event:

SUMMARY:"Description"
DTSTART;VALUE=DATE:20230612
DTEND;VALUE=DATE:20230201

The dates can be in two forms:

DATE:20230201
DATE:20220908T153000

I need a bash script that subtracts one day from the DTEND date if it is in the short form. For example, DTEND;VALUE=DATE:20230201 should become DTEND;VALUE=DATE:20230131 after the script has run. Dates in the long format should be left unchanged.

I tried with awk:

#!/bin/bash

# Input file
input_file="input_file.txt"
output_file="output_file.txt"

# Process the file with awk
awk '
/^DTEND;/ {
    # Extract the date part after "DTEND;"
    split($0, arr, ";")
    date = arr[length(arr)]
    
    # Check if the date format is YYYYMMDD (8 digits)
    if (match(date, /^[0-9]{8}$/)) {
        # Parse year, month, day
        year = substr(date, 1, 4)
        month = substr(date, 5, 2)
        day = substr(date, 7, 2)
        
        # Calculate the previous day using date command
        prev_date = strftime("%Y%m%d", mktime(year " " month " " day " 00 00 00") - 86400)
        
        # Replace the original date with the previous day date
        sub(date, prev_date, $0)
    }
}
{ print }
' "$input_file" > "$output_file"

echo "Processing complete. Output saved to $output_file"

But after running the script, nothing happened to the dates in the short format. I guess I am missing something.

Any help would be great ;-) P.S.: There is no need to use awk; sed or any other command available in bash would be fine too.

Additional Info:
I also tried another approach: treat the last 4 digits of the date as a 4-digit integer and subtract 1 from it. For that I need a case selection for month boundaries:
case 0201 then 0131
and so forth.
I can ignore the January 1st problem (which would also change the year, since 20220101 - 1 -> 20211231) because there cannot be a DTEND on the first of January.
But I could not get this second approach to work either.


Solution

  • You split the line on ; (semicolon), so you get two elements, and the second is something like VALUE=DATE:20230201. That string never matches ^[0-9]{8}$, so the if block is never entered. Split on : (colon) instead. You can also use a stricter regex to select only the lines of interest:

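    awk -F: -v OFS=: '/^DTEND;VALUE=DATE:[0-9]{8}\r?$/ {
      # $2 is the date; r keeps a trailing \r (DOS line ending), if any
      y=substr($2,1,4); m=substr($2,5,2); d=substr($2,7,2); r=substr($2,9)
      # start from noon so that a 23- or 25-hour DST day cannot skip or repeat a date
      $2 = strftime("%Y%m%d", mktime(y " " m " " d " 12 0 0") - 86400) r}
      1' "$input_file" > "$output_file"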
    

    Note the \r? at the end of the regexp and the r=substr($2,9) variable: together they make this work with both the UNIX file format (lines terminated with \n) and the DOS file format (lines terminated with \r\n).

    Note: just like your own version, this requires GNU awk for the mktime and strftime functions.
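
If GNU awk is not available, the month-boundary idea from the question can also be done in plain bash. The following is only a minimal sketch, not a tested drop-in replacement: the function names prev_day and shift_dtend are made up for illustration, and it assumes that every short-form line looks exactly like DTEND;VALUE=DATE:YYYYMMDD with a valid calendar date.

```bash
#!/bin/bash

# Hypothetical helper: print the day before a YYYYMMDD date using only
# bash arithmetic (no GNU awk or GNU date needed).
prev_day() {
    local date=$1
    # 10# forces base 10 so that values like "08" are not read as octal
    local y=$((10#${date:0:4})) m=$((10#${date:4:2})) d=$((10#${date:6:2}))
    if (( d > 1 )); then
        d=$((d - 1))
    else
        # Step back to the last day of the previous month
        if (( m == 1 )); then
            m=12
            y=$((y - 1))
        else
            m=$((m - 1))
        fi
        case $m in
            4|6|9|11) d=30 ;;
            2)  # Gregorian leap-year rule
                if (( (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 )); then
                    d=29
                else
                    d=28
                fi ;;
            *)  d=31 ;;
        esac
    fi
    printf '%04d%02d%02d' "$y" "$m" "$d"
}

# Hypothetical filter: read lines on stdin, rewrite only short-form DTEND
# lines, and pass everything else through unchanged.
shift_dtend() {
    local line cr
    local re='^(DTEND;VALUE=DATE:)([0-9]{8})$'
    while IFS= read -r line || [[ -n $line ]]; do
        cr=''
        # Remember and strip a trailing carriage return (DOS line endings)
        if [[ $line == *$'\r' ]]; then
            cr=$'\r'
            line=${line%$'\r'}
        fi
        if [[ $line =~ $re ]]; then
            line="${BASH_REMATCH[1]}$(prev_day "${BASH_REMATCH[2]}")"
        fi
        printf '%s%s\n' "$line" "$cr"
    done
}
```

With these definitions, the file could be processed as shift_dtend < "$input_file" > "$output_file". Because the date is handled as plain integers, there are no time-zone or daylight-saving effects, and the January 1st case rolls the year back as well.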