I am trying to parse an RSS feed and condense the info on each line so that I still have the date and time of the entry, but without the seconds or wasted spaces, because I am feeding the file to the xscreensaver text crawl, which has limited readable screen width. I could change my code to add the 2 heading lines only after the text is formatted, if that would make this much easier. Thanks for any ideas...
The input file at this point looks like this:
ABC World News Feed
RSS Data retrieved from https:--abcnews.go.com-abcnews-headlines
05-24 18:48:16 Truckers' strike leads to fuel shortages in Brazil
05-24 18:48:16 The marathon atop the world's deepest lake
Remove character positions 12 through 17 (the colon at position 12, the two seconds digits, and the padding spaces after them) from each title line, but not from the heading lines.
So the result should look like:
ABC World News Feed
RSS Data retrieved from https:--abcnews.go.com-abcnews-headlines
05-24 18:48 Truckers' strike leads to fuel shortages in Brazil
05-24 18:48 The marathon atop the world's deepest lake
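If the padding after the seconds really is fixed-width, the requirement can also be taken literally: print the two heading lines untouched and cut characters 12 through 17 out of every later line. This is a minimal sketch, assuming a hypothetical file named feed.txt whose title lines carry exactly four padding spaces after the seconds (so character 18 is still a space and the title starts at 19):

```shell
#!/bin/sh
# Hypothetical sample input: four padding spaces after the seconds are an
# assumption, so columns 12-17 hold ":16" followed by three spaces.
printf '%s\n' \
  'ABC World News Feed' \
  'RSS Data retrieved from https:--abcnews.go.com-abcnews-headlines' \
  "05-24 18:48:16    Truckers' strike leads to fuel shortages in Brazil" \
  "05-24 18:48:16    The marathon atop the world's deepest lake" > feed.txt

# Print lines 1-2 unchanged, then drop columns 12-17 from line 3 onward.
{
  head -n 2 feed.txt
  tail -n +3 feed.txt | cut -c 1-11,18-
}
```

Because cut works on fixed columns, this gives wrong output as soon as the padding width varies, so it is only safe when the layout is guaranteed.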
My take would be to replace a colon followed by two digits followed by at least one space with a single space:
$ sed 's/:[[:digit:]][[:digit:]] \{1,\}/ /' file
ABC World News Feed
RSS Data retrieved from https:--abcnews.go.com-abcnews-headlines
05-24 18:48 Truckers' strike leads to fuel shortages in Brazil
05-24 18:48 The marathon atop the world's deepest lake
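A quick way to check a pattern like this is to run it on a small sample. This sketch uses a hypothetical file feed.txt with several padding spaces after the seconds, and spells "at least one space" as a space followed by \{1,\}. A plain " *" would also accept zero spaces, so the leftmost match would be the ":48" in the hours and minutes instead of the ":16":

```shell
#!/bin/sh
# Hypothetical sample input with padding spaces after the seconds.
printf '%s\n' \
  'ABC World News Feed' \
  'RSS Data retrieved from https:--abcnews.go.com-abcnews-headlines' \
  "05-24 18:48:16    Truckers' strike leads to fuel shortages in Brazil" \
  "05-24 18:48:16    The marathon atop the world's deepest lake" > feed.txt

# ':' + two digits + one or more spaces  ->  a single space.
# The headings are untouched: the colon in "https:" is not followed by digits.
sed 's/:[[:digit:]][[:digit:]] \{1,\}/ /' feed.txt
```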
If you want to be really specific about the position, you can anchor the search to the start of the line with ^ and use parentheses with the backreference \1. Here the dot . matches any single character:
$ sed 's/^\(..-.. ..:..\):[[:digit:]][[:digit:]] */\1 /' file
ABC World News Feed
RSS Data retrieved from https:--abcnews.go.com-abcnews-headlines
05-24 18:48 Truckers' strike leads to fuel shortages in Brazil
05-24 18:48 The marathon atop the world's deepest lake
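Since the heading lines are always the first two, another option is to restrict the substitution with a line address, so sed never touches lines 1 and 2 at all. A sketch combining that with the anchored pattern, again on a hypothetical feed.txt; the 3,$ address assumes there are exactly two heading lines:

```shell
#!/bin/sh
# Hypothetical sample input with padding spaces after the seconds.
printf '%s\n' \
  'ABC World News Feed' \
  'RSS Data retrieved from https:--abcnews.go.com-abcnews-headlines' \
  "05-24 18:48:16    Truckers' strike leads to fuel shortages in Brazil" \
  "05-24 18:48:16    The marathon atop the world's deepest lake" > feed.txt

# '3,$' applies the substitution from line 3 to the last line only;
# lines 1-2 (the headings) are printed unchanged.
sed '3,$s/^\(..-.. ..:..\):[[:digit:]][[:digit:]] \{1,\}/\1 /' feed.txt
```

If the headings are instead added after the titles are formatted, the 3,$ address can simply be dropped.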