awk sed zsh

zsh - caching quoted strings in an array, efficiently


I'm trying to find quoted strings in a file. Occasionally, those strings contain special characters, including backslash-escaped quotes (e.g. \").

I'm using zsh on macOS Catalina (with GNU sed, not BSD sed, although awk and similar tools are fine too). What's the most efficient way to cache those values in an array?

Sample Input:

a file that contains...

The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence

Expected Output:

the array below...

echo -E - ${array[@]}
"quick" "\(brown)" "over \n\"the $?@%\""

EDIT

I'm willing to forgo the efficient part, and just focus on something that will work.

Also, I’m not trying to handcuff anyone to awk or sed. The script needs to be able to run on a vanilla macOS system; any commands available there are fine.

EDIT

So here's where I'm currently at...

while read line; do 
    echo -E - $line | sed 's/\\*(/\\\(/g' | awk -F\" '{print $2}'
done < SampleInput 

...which outputs:

quick
over n

At this point, I need two things to be fixed to print the values that I'd be storing in the array:

(1) I need to preserve the special characters.

(2) I need to keep more than just the second field. I think I need to count the quotes while ignoring escaped ones, then print every other field.

From there, loading those printed fields into an array using xargs shouldn't be too hard to figure out.

I've had some similar questions recently, so I think it's possible to preserve the special characters; the ugly part will be skipping every other field.

Eventually I'll get this, but I would appreciate the help from anyone who knows these commands better.

Thanks in advance.


Solution

  • Here is an attempt with awk, but it needs more testing; I only tested it against the sample input.

    > cat test.awk
    
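    # Split the input on double quotes, so each record is the text between two quotes.
    BEGIN { RS="\"" }
    # While in printing mode (inside a string), print the record as-is.
    p { printf "%s", $0 }
    # A trailing backslash means the quote was escaped: restore it, keep the mode.
    ($0 ~ /\\$/) { if (p) { printf "%s", "\"" }; next }
    # An unescaped quote toggles printing mode; a newline precedes each new string.
    { if (p) { p=0 } else { p=1; printf "\n" } }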
    

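    p is the printing mode and RS (the record separator) is set to the double quote, so each record is the text between two quotes. We do not switch the printing mode when the quote is escaped, that is, when the record ends with a backslash; in that case the quote is printed back and the string continues.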

    > cat file
    The "quick" "\(brown)" fox
    jumps "over \n\"the $?@%\"" fence
    > awk -f test.awk file
    
    quick
    \(brown)
    over \n\"the $?@%\"
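
The output above drops the surrounding quotes and starts with a blank line, while the question asks for an array whose elements keep their quotes. Here is a rough sketch of one way to get there. It is a variation on test.awk, not the script itself: the extract function, the s buffer and the temporary file are names made up for illustration, and it assumes no quoted string spans more than one line. Like test.awk, it would misread an escaped backslash directly before a closing quote (\\"). It was only traced against the sample input.

```shell
# Recreate the sample input; the quoted 'EOF' keeps backslashes and $ literal.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
The "quick" "\(brown)" fox
jumps "over \n\"the $?@%\"" fence
EOF

# extract (hypothetical helper): print each quoted string, quotes included,
# one per line. Same idea as test.awk, but fragments are buffered in s.
extract() {
  awk '
    BEGIN { RS = "\"" }
    # Fragment ends with a backslash: the quote after it was escaped,
    # so put the quote back and keep collecting.
    /\\$/ { if (p) s = s $0 "\""; next }
    # Unescaped quote while inside a string: the string is complete.
    p { print "\"" s $0 "\""; s = ""; p = 0; next }
    # Unescaped quote while outside a string: a string starts here.
    { p = 1 }
  ' "$1"
}

# read -r keeps backslashes; IFS= keeps leading and trailing blanks.
array=()
while IFS= read -r line; do
  array+=("$line")
done < <(extract "$tmp")
rm -f "$tmp"

# One element per line.
printf '%s\n' "${array[@]}"
```

The read loop is used because it behaves the same in zsh and bash. In zsh alone, the (f) parameter expansion flag, which splits on newlines, should allow a shorter form along the lines of array=("${(@f)$(extract file)}"). Once the array is loaded, the echo -E - ${array[@]} command from the question prints the elements on one line.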