I need to separate the keys and values in text that looks like this:
Student ID: 0
Department ID = 18432
Name XYZ
Subjects:
Computer Architecture
Advanced Network Security 2
In the above example, Student ID, Department ID and Name are the keys, and 0, 18432 and XYZ are the values. The keys are separated from the values by :, = or multiple spaces. I tried a regex such as
$line =~ /(([\w\(\)]*\s)*)([=:\s?]?)\s*(\S.*)?$/;
$key = $2;
$colon=$3;
$value = $4;
The problem I am facing is telling apart words separated by a single space from words separated by more than one space.
For the line Student ID: 0, the output I get is key Student and value ID: 0, while I want key Student ID and value 0. For lines like Subjects: and Computer Architecture, the keys should be Subjects and Computer Architecture respectively. Later logic appends any line that has no value or colon to the previous key, so the result will look like Subjects=Computer Architecture;Advanced Network Security 2
Update: Thanks, Ikegami, for pointing out that I should use a lookbehind. But I still seem to have a problem solving it.
$line=~/^(?: ( [^:=]+ ) (?<!\s\s)\s* [:=]\s*|\s*)(.*)$/x;
So when I write (?<!\s\s)\s* [:=]\s*|\s*
I mean: when there are two or more spaces, consume all the spaces; when there are no two consecutive spaces, look for : or = and consume the surrounding spaces. So if I pass the line below to the expression, shouldn't I get $1=Name and $2=ABC XYZ?
Name ABC XYZ
What I get instead is an empty key and the value Name ABC XYZ.
If
Name Eric Brine
Computer Architecture x86
means
key: Name Eric value: Brine
key: Computer Architecture value: x86
then you want
# Requires 5.10
if (/
   ^
   (?: (?<key> [^:=]+ (?<!\s) ) \s* [:=] \s* (?<val> .* )
   |   (?<key> .+ (?<!\s) ) \s+ (?<val> \S+ )
   )
   \s* $
/x) {
   my $key = $+{key};
   my $val = $+{val};
   ...
}
or
if (/
   ^
   (?: ( [^:=]+ (?<!\s) ) \s* [:=] \s* ( .* )
   |   ( .+ (?<!\s) ) \s+ ( \S+ )
   )
   \s* $
/x) {
   my ($key,$val) = defined($1) ? ($1,$2) : ($3,$4);
   ...
}
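As a quick check of this first reading, the sketch below wraps the pattern in a small helper and prints the key and value for a few lines. The helper name `split_last_word` and the list of sample lines are assumptions made for illustration only; the pattern ends with `\s* $`, as in the named-capture version.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (key, value), or an empty list when the line
# has neither a [:=] separator nor any whitespace to split on.
sub split_last_word {
    my ($line) = @_;
    if ($line =~ /
        ^
        (?: ( [^:=]+ (?<!\s) ) \s* [:=] \s* ( .* )
        |   ( .+ (?<!\s) ) \s+ ( \S+ )
        )
        \s* $
    /x) {
        return defined($1) ? ($1, $2) : ($3, $4);
    }
    return;
}

for my $line ('Student ID: 0', 'Department ID = 18432',
              'Name Eric Brine', 'Computer Architecture x86') {
    my ($key, $val) = split_last_word($line);
    print "key: $key value: $val\n";
}
# key: Student ID value: 0
# key: Department ID value: 18432
# key: Name Eric value: Brine
# key: Computer Architecture value: x86
```

A line with no separator and no whitespace at all, such as a bare `Subjects`, does not match, so the caller has to decide what to do with it.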
If
Name Eric Brine
Computer Architecture x86
means
key: Name value: Eric Brine
key: Computer value: Architecture x86
then you want
# Requires 5.10
if (/
   ^
   (?: (?<key> [^:=]+ (?<!\s) ) \s* [:=]
   |   (?<key> \S+ ) \s
   )
   \s*
   (?<val> .* )
/x) {
   my $key = $+{key};
   my $val = $+{val};
   ...
}
or
if (/
   ^
   (?: ( [^:=]+ (?<!\s) ) \s* [:=]
   |   ( \S+ ) \s
   )
   \s*
   ( .* )
/x) {
   my $key = defined($1) ? $1 : $2;
   my $val = $3;
   ...
}
Note that you can remove all the spaces and line breaks from the pattern if you also drop the /x flag. For example, the last snippet can be written as:
if (/^(?:([^:=]+(?<!\s))\s*[:=]|(\S+)\s)\s*(.*)/) {
   my $key = defined($1) ? $1 : $2;
   my $val = $3;
   ...
}
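If, as the question describes, a key is separated from its value by `:`, `=`, or a run of two or more spaces, and a line with no separator continues the previous key, one possible sketch is below. It uses a different pattern from the ones above rather than a rewrite of them. The function name `parse_records`, the `\s{2,}` rule, and the `;` joiner are assumptions drawn from the question's description, and the `Name` line is written with several spaces on the assumption that the original input had them.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an array ref of [key, value] pairs in input order.
sub parse_records {
    my @lines = @_;                    # copy, so trimming does not touch the caller's data
    my @pairs;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;       # trim both ends
        next if $line eq '';
        # Lazy key, then the first separator: [:=] with optional spaces,
        # or a run of two or more whitespace characters.
        if ($line =~ /^(.+?)(?:\s*[:=]\s*|\s{2,})(.*)$/) {
            push @pairs, [$1, $2];
        }
        elsif (@pairs) {
            # No separator: append the whole line to the previous key's value.
            my $prev = $pairs[-1];
            $prev->[1] = $prev->[1] eq '' ? $line : "$prev->[1];$line";
        }
    }
    return \@pairs;
}

my $pairs = parse_records(
    'Student ID: 0',
    'Department ID = 18432',
    'Name     XYZ',
    'Subjects:',
    'Computer Architecture',
    'Advanced Network Security 2',
);
print "$_->[0]=$_->[1]\n" for @$pairs;
# Student ID=0
# Department ID=18432
# Name=XYZ
# Subjects=Computer Architecture;Advanced Network Security 2
```

One limitation of this sketch: a continuation line that itself contains `:`, `=`, or two consecutive spaces is treated as a new key rather than appended.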