This question pertains to regular expressions that can be processed by bash.
I have a regular expression that finds all dates in a text written as d.m.yyyy, dd.m.yyyy, d.mm.yyyy or dd.mm.yyyy, provided the date is delimited on both sides by a tab or by at least two whitespace characters:
(?<=\t|\s{2,})(\d{1,2}\.\d{1,2}\.\d{4})(?=\t|\s{2,})
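For reference, here is how the date part of this pattern looks in the POSIX extended syntax that bash's [[ ... =~ ... ]] understands: \d becomes [0-9], and the lookarounds have to be dropped. This is a minimal sketch. date_re and is_date are hypothetical names, and the check covers only the shape of the date, not whether day and month are in range.

```shell
#!/bin/bash
# Bash's =~ uses POSIX extended regular expressions (ERE): no lookarounds and,
# portably, no \d, so the date part of the question's pattern uses [0-9].
date_re='^([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{4})$'

# is_date TOKEN: succeeds if TOKEN looks like d.m.yyyy, dd.m.yyyy, d.mm.yyyy or dd.mm.yyyy
is_date() {
    [[ $1 =~ $date_re ]]
}

for t in 27.6.2021 1.12.2020 27.06.21 2021-06-27; do
    if is_date "$t"; then
        echo "$t: date"
    else
        echo "$t: no match"
    fi
done
```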
How can I replace every match of this capture group (let's assume it is the first one) with the same date formatted according to ISO 8601, i.e. in the notation yyyy-mm-dd?
Because the delimiting tabs or runs of at least two spaces are inside lookaround assertions, they are not part of my capture group, so they should remain exactly as they are in the original string.
The problem breaks down into two parts:
1. How do I address the n-th match of $1?
2. How do I rearrange the three dot-separated components?
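Part 2 is straightforward once a single date has been isolated: split it at the dots and print the pieces in reverse order. A minimal sketch, assuming the argument already has the d.m.yyyy shape; to_iso is a hypothetical helper name. The 10# prefix is there because bash's printf would otherwise read a zero-padded 08 or 09 as an invalid octal number.

```shell
#!/bin/bash
# to_iso DATE: convert a d.m.yyyy-style date (one or two digits for day and
# month) to yyyy-mm-dd. Assumes DATE has already been validated by a regex.
to_iso() {
    local d m y
    IFS=. read -r d m y <<< "$1"   # split on the dots
    # 10# forces base 10 so that "08" or "09" is not rejected as invalid octal
    printf '%04d-%02d-%02d\n' "$((10#$y))" "$((10#$m))" "$((10#$d))"
}

to_iso 27.6.2021   # 2021-06-27
to_iso 8.09.2021   # 2021-09-08
```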
If you want to process it with bash, would you please try the following:
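#!/bin/bash
str=$'foo\t27.6.2021  bar'   # example input line (the date needs a tab or 2+ spaces on both sides)
pat=$'^(.*)(\t| {2,})([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{4})(\t| {2,})(.*)$'
# note: the greedy ^(.*) means only the last qualifying date on the line is converted
if [[ $str =~ $pat ]]; then
    a=("${BASH_REMATCH[@]:1}")   # copy the captured groups, skipping "${BASH_REMATCH[0]}" (the entire match)
    y=${a[4]}; a[4]=${a[2]}; a[2]=$y   # swap year and day: a = prefix, delim, year, month, day, delim, rest
    # 10# forces base 10, so a zero-padded "08" or "09" is not rejected as an invalid octal number
    printf "%s%s%04d-%02d-%02d%s%s\n" "${a[0]}" "${a[1]}" "$((10#${a[2]}))" "$((10#${a[3]}))" "$((10#${a[4]}))" "${a[5]}" "${a[6]}"
fi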
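As mentioned in the comments, bash's =~ operator uses POSIX extended regular expressions, which support neither lookarounds nor (portably) shorthand classes such as \d. Instead, you need to capture the whole line as substrings (the text before the date, the delimiters, the date components and the text after it) and reassemble them.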
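The script above converts a single date per line, while the question asks for all matches. One way to extend the same idea, sketched below as a hypothetical iso_dates function, is to match repeatedly and move each converted date out of the string that is still being searched. It assumes bash's =~ gives the leftmost group the longest possible match (POSIX leftmost-longest), so each pass picks the last remaining date and the work proceeds from right to left.

```shell
#!/bin/bash
# iso_dates LINE: rewrite every d.m.yyyy-style date that has a tab or at least
# two spaces on both sides as yyyy-mm-dd, leaving the delimiters untouched.
iso_dates() {
    local rest=$1 out='' iso
    # [1]           = everything up to and including the leading delimiter
    # [3] [4] [5]   = day, month, year
    # [6]           = trailing delimiter plus everything after it
    local pat=$'^(.*(\t| {2,}))([0-9]{1,2})[.]([0-9]{1,2})[.]([0-9]{4})((\t| {2,}).*)$'
    # The greedy (.*) selects the LAST date still in $rest. The leading
    # delimiter stays in $rest, so one delimiter can serve as the trailing
    # delimiter of one date and the leading delimiter of the next, as the
    # lookarounds in the question allow.
    while [[ $rest =~ $pat ]]; do
        # 10# avoids octal interpretation of zero-padded values like "08"
        printf -v iso '%04d-%02d-%02d' \
            "$((10#${BASH_REMATCH[5]}))" "$((10#${BASH_REMATCH[4]}))" "$((10#${BASH_REMATCH[3]}))"
        out=$iso${BASH_REMATCH[6]}$out   # prepend the converted date and its tail
        rest=${BASH_REMATCH[1]}          # keep searching only the unconverted head
    done
    printf '%s\n' "$rest$out"
}

iso_dates $'foo\t27.6.2021  bar\t1.12.2020\tbaz'
```

With the sample line, the call prints foo, a tab, 2021-06-27, two spaces, bar, a tab, 2020-12-01, a tab and baz; a date surrounded by single spaces only is left unchanged.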