I have a converted dictionary that I use in Pure Data. Each entry consists of three things: the word, its pronunciation, and a semicolon that ends the entry. In the converted dictionary some semicolons are missing, so I want AWK to find the places where they are missing and insert them for me. I have used delimiters before, but this case is difficult for me, so any help would be appreciated. In the text file below, the first three entries are good and the last three are wrong: the semicolon at the end is missing. I think the AWK delimiter would be the boundary between a line of non-capital letters and a line of capital letters, and the action would be to insert a semicolon there if one is not already present. How can I put this into AWK code?
ELFKIN
Elf
kin;
ELFLAND
Elf
land
;
ELFLOCK
Elf
lock
;
ELGIN
El
gin
ELICIT
E
lic
it
ELICIT
E
lic
it
I have used delimiters before, but I do not know how to specify a "between" condition in AWK. The delimiter is the boundary between non-capital letters and capital letters, and a semicolon should go there. So the code would look something like awk 'length($0)>1 && line with all capitals: put a semicolon before this line' or awk 'line with non-capitals: if the next line is capitals, put a semicolon after this line'. I have tried this:
awk 'length($0>1) && /[:^, upper:]/{l=l";"}NR>1{print l}{l=$0}END{print l}' file2
This does not work properly.
Or am I heading in the wrong direction?
I would use GNU AWK for this task in the following way. Let the content of file.txt be
ELFKIN
Elf
kin;
ELFLAND
Elf
land
;
ELFLOCK
Elf
lock
;
ELGIN
El
gin
ELICIT
E
lic
it
ELICIT
E
lic
it
then
awk 'BEGIN{RS=""}{print gensub(/([[:lower:]])\n([[:upper:]])/,"\\1;\n\\2","g")}' file.txt
gives output
ELFKIN
Elf
kin;
ELFLAND
Elf
land
;
ELFLOCK
Elf
lock
;
ELGIN
El
gin;
ELICIT
E
lic
it;
ELICIT
E
lic
it
Explanation: setting RS to the empty string engages paragraph mode; as file.txt has no blank lines, the whole file is treated as one record. Then I use the gensub string function (a GNU AWK extension) to replace all (g as in globally) occurrences of a lowercase letter followed by a newline followed by an uppercase letter with the first of those letters followed by a semicolon, a newline, and the second letter. Note that the final entry is left unchanged, because no uppercase letter follows it.
(tested in GNU Awk 5.1.0)
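If gensub is not available, the same lowercase-then-uppercase rule can be sketched line by line, which should work in other awk implementations too. This is only a sketch under some assumptions: the function name add_semicolons is hypothetical, the letters are plain ASCII (hence [a-z] and [A-Z]), and the last entry of the file is assumed to need a terminating semicolon as well.

```shell
# Hypothetical helper (the name add_semicolons is illustrative only).
# Reads the file(s) given as arguments, or standard input if none are given.
add_semicolons() {
    awk '
    NR > 1 {
        # The previous line ends in a lowercase letter and this line starts
        # with an uppercase letter: the previous entry lacks its semicolon.
        if (prev ~ /[a-z]$/ && $0 ~ /^[A-Z]/) prev = prev ";"
        print prev
    }
    { prev = $0 }   # remember the current line until the next one is seen
    END {
        # Assumption: the final entry should be terminated too.
        if (NR > 0) {
            if (prev ~ /[a-z]$/) prev = prev ";"
            print prev
        }
    }' "$@"
}
```

Calling it as add_semicolons file.txt prints the corrected text. Lines that already end in a semicolon, or that consist of a semicolon alone, are left as they are, because the rule only fires when the previous line ends in a lowercase letter.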