Various ways of splitting strings not respecting spaces and replacing with newlines in a bash script


DISCLAIMER: Total bash noob so sorry for anything blatantly dumb in this question. I am really starting to like it, however, it reminds me of my first months with C when I was 7 or 8 years old :)

Introduction: This is a portion of a script which will be used to do certain automated recovery/restoration tasks on machines specified by name (e.g. --device mba) should they require a clean install. Not only will it be a massive time saver (despite the time it's taking me to write), it also serves as a way to learn the language and its features, capabilities and limitations.

I set up my devices array of key-value pairs as follows (side question: is devices=() necessary? I stopped being lazy and tried it myself, and no, it is not necessary):

# Devices
devices=()
devices[0]="mba=Lee's Macbook Air"
devices[1]="mms=Lee's Mac Mini Server"
# devices[2]="mml=Lee's Mac Mini (Living Room)"
# devices[3]="mmb=Lee's Mac Mini (Bedroom)"
# devices[4]="mps=Lee's Mac Pro Server"

My first attempt, which makes the most sense, is to just iterate the array and output the device parameter, but it's only suitable for this method. In other parts of the script I need to split the string to get the short name.

for device in ${devices[@]}; do

    echo $device

done

Makes no sense! Why is it splitting on spaces?

Second attempt (I like this one, as later I'm going to need to get the key so that I can perform additional actions based on it; the portion of the method shown here is simply called when the --devices flag is set when calling the script, and then exits):

for device in ${devices[@]//=/}; do

    echo ${device[0]} ${device[1]} 

done

Or:

for device in ${devices[@]}; do

    deviceKeyValuePair=(${device//=/})

    echo ${deviceKeyValuePair[0]} ${deviceKeyValuePair[1]} 

done

Produces the following:

mbaLee's Macbook Air mmsLee's Mac Mini Server

3rd attempt:

for device in ${devices[@]}; do

    deviceKeyValuePair=(`echo $device | tr "[:alnum:]" "[:alnum:]"`)

    echo ${deviceKeyValuePair[0]} ${deviceKeyValuePair[1]} 

done

Produces:

mba=Lee's Macbook Air mms=Lee's Mac Mini Server

I've also tried the TR method:

deviceKeyValuePair=(`echo $device | tr "[:alnum:]" "[:alnum:]"`)

Output:

mba=Lee's Macbook Air mms=Lee's Mac Mini Server

I could go with associative arrays, but they are only available in BASH 4, and even though all machines have been updated to El Capitan this week, BASH is still version 3.2.57(1), so it's a no-go. I'm also not a fan of this method... simply an anal dislike of iterating an array and then doing a lookup based on a key. It's just a me thing, very weird at times! I do it with .NET Dictionaries and Lists all the time. The following would most likely work:

declare -A devices

devices["mba"]="Lee's Macbook Air"
devices["mms"]="Lee's Mac Mini Server"
devices["mml"]="Lee's Mac Mini (Living Room)"
devices["mmb"]="Lee's Mac Mini (Bedroom)"
devices["mps"]="Lee's Mac Pro Server"

for device in ${!devices[@]}; do

    echo ${device} ${devices[${device}]}

done

Is it safe to upgrade bash on OS X El Capitan? Will I break anything? Actually I would still like to know...

Figured it out: IFS!!!!

function listDevices() {

    oldIFS=IFS

    IFS=''

    for device in ${devices[@]}; do

        deviceKeyValuePair=(${device//=/})

        printf "${deviceKeyValuePair[0]}=${deviceKeyValuePair[1]}\n"

    done    

    IFS=$oldIFS

    exit 0
}

The above would be my preferred way of doing it but produces:

mbaLee's Macbook Air= mmsLee's Mac Mini Server= mmlLee's Mac Mini (Living Room)= mmbLee's Mac Mini (Bedroom)= mpsLee's Mac Pro Server=

I guess my understanding of the //=/ in (${device//=/}) is incorrect? I thought it was bash's own built-in way of splitting a string based on the specified delimiter. It seems more like the bash way of removing a character, so that's another question, as I can't find a reference to it in the bash string manipulation page!

Anyway I've settled on the following for now:

function listDevices() {

    # Pointless since this method exits once complete
    # oldIFS=IFS

    IFS=''

    for device in ${devices[@]}; do

        deviceKey=${device%%=*}
        deviceName=${device##*=} 

        echo "$deviceKey: ${deviceName}"

    done    

    # Pointless since this method exits once complete
    # IFS=$oldIFS

    exit 0
}

Which outputs:

mba: Lee's Macbook Air mms: Lee's Mac Mini Server mml: Lee's Mac Mini (Living Room) mmb: Lee's Mac Mini (Bedroom) mps: Lee's Mac Pro Server

I've spent ages writing this post and there are still questions to be answered (implied and explicit), so those are two reasons for not scrapping it. Plus, it will serve to educate other noobs who don't read manuals fully. I have a reason, in the form of severe ADHD without the hyperactivity (so ADD, but not everyone knows what that means): I never read manuals for devices, nor did I even look at a Lego® instruction leaflet, except for Lego® Technics. Also, someone may show me a far nicer way not involving IFS...

Thanks guys!!!

Closing comment: I've got a long way to go to become proficient with bash and shell scripting! Plus I am learning Groovy and SmartThings variants and Lua all at the same time. That's ADD for you.

EDIT: Hopefully this satisfies Cyrus' concerns:

#!/bin/bash    

# Devices
devices[0]="mba=Lee's Macbook Air"
devices[1]="mms=Lee's Mac Mini Server"
devices[2]="mml=Lee's Mac Mini (Living Room)"
devices[3]="mmb=Lee's Mac Mini (Bedroom)"
devices[4]="mps=Lee's Mac Pro Server"

function listDevices() {

    # Pointless since this method exits once complete
    # oldIFS=IFS

    IFS=''

    for device in ${devices[@]}; do

        deviceKey=${device%%=*}
        deviceName=${device##*=} 

        echo "$deviceKey: ${deviceName}"

    done    
    # Pointless since this method exits once complete
    # IFS=$oldIFS

    exit 0
}

listDevices

Actually it doesn't. I'll make more of an effort in future, try to be less verbose and ensure other users can copy and paste the code into a file and execute it.


Solution

  • One of the standard gotchas in shell scripting is that (almost) anything not in quotes will be split into words (based on the characters in IFS) and also wildcard-expanded into a list of matching filenames. Unless you specifically want this to happen (and you usually don't), you should put things like variable references in double-quotes. (Setting IFS to "" effectively disables splitting but leaves wildcard expansion in place, so it's not as good. It also has other, potentially unpleasant, side effects.) So leave IFS alone and write your loop command like this:

    for device in "${devices[@]}"; do
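
To make the difference concrete, here is a minimal sketch that counts how many words each form of the expansion produces. The helper count_words, the scratch directory, and the trimmed two-entry array are assumptions made up for this illustration; they are not part of the original script.

```shell
#!/bin/bash
# Illustrative data only: two entries in the question's "key=name" format.
devices=("mba=Lee's Macbook Air" "mms=Lee's Mac Mini Server")

# count_words is a hypothetical helper that reports how many arguments it received.
count_words() { echo "$#"; }

unquoted=$(count_words ${devices[@]})    # unquoted: every space starts a new word
quoted=$(count_words "${devices[@]}")    # quoted: one word per array element

# With IFS empty there is no splitting, but an unquoted "*" is still expanded
# to filenames. A scratch directory holding two files makes that visible.
scratch=$(mktemp -d)
touch "$scratch/one" "$scratch/two"
pattern='*'
globbed=$(cd "$scratch" && IFS='' && count_words $pattern)
rm -rf "$scratch"

echo "unquoted:     $unquoted words"   # unquoted:     7 words
echo "quoted:       $quoted words"     # quoted:       2 words
echo "IFS='' + '*': $globbed words"    # IFS='' + '*': 2 words
```

The last line is the reason an empty IFS is only a partial fix: a device name containing a wildcard character could still be replaced by filenames, whereas the double-quoted form passes it through untouched.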
    

    Second, a substitution like ${device//=/} will replace each "=" with ... nothing. Essentially, it will delete the equal signs out of the string. If you used ${device//=/ } that would turn them into spaces, but that's not what you want either, because then they're indistinguishable from the spaces within the right-hand-side. The approach you settled on is almost right, but deviceKey=${device%%=*} trims starting at the leftmost "=", and deviceName=${device##*=} trims through the rightmost "=". If there's always exactly one "=" in the string this is ok, but if the device name might possibly contain an "=", you should use deviceName=${device#*=} (note only one "#") so it trims the smallest match, i.e. through the leftmost "=". BTW, this is one of the few cases where it's safe to leave variable references unquoted, but IMO it's easier and safer to just double-quote anyway than to try to remember exactly when it's safe and when it isn't. Thus:

    deviceKey="${device%%=*}"
    deviceName="${device#*=}"
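
As a quick illustration of why the number of "#" characters matters, the sketch below runs all three expansions on one string. The entry "lab=Rack=7 Test Box" is a hypothetical device whose name contains "="; it does not appear in the original list.

```shell
#!/bin/bash
# Hypothetical entry whose name itself contains "=" (not from the original list).
device="lab=Rack=7 Test Box"

deviceKey="${device%%=*}"      # longest "=*" suffix removed: text before the FIRST "="
greedyName="${device##*=}"     # longest "*=" prefix removed: text after the LAST "="
deviceName="${device#*=}"      # shortest "*=" prefix removed: text after the FIRST "="

echo "key:    $deviceKey"      # key:    lab
echo "greedy: $greedyName"     # greedy: 7 Test Box
echo "name:   $deviceName"     # name:   Rack=7 Test Box
```

With exactly one "=" in the string, greedyName and deviceName would be identical, which is why the double-"#" version appears to work on the original data.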
    

    And actually, there is an array-based way to do the splitting:

    IFS="=" read -r -a deviceKeyValuePair <<<"$device"
    

    ...here the double-quotes around "$device" prevent word-splitting and wildcard expansion before the string is handed to read (and -r keeps any backslashes literal). read splits the string into an array based on IFS. Note that the IFS setting is used as a prefix to the read command, so it only applies to that one command (i.e. you don't have to reset it afterward). But like your version, this has a problem if the device name contains "=", because it'll split the name into extra array elements... so actually I'd recommend not using this.
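
If a read-based split is still attractive, one possible variant is to give read two plain variable names instead of an array: read assigns whatever is left over, delimiters included, to the last name. The sketch below assumes a hypothetical helper called splitDevice and a made-up entry containing "="; I have not checked edge cases such as a name that ends in "=".

```shell
#!/bin/bash
# splitDevice is a hypothetical helper, not part of the original script.
# It sets the globals deviceKey and deviceName from one "key=name" string.
splitDevice() {
    # IFS applies only to this read; -r stops backslashes being treated as escapes.
    # With two variable names, everything after the first "=" lands in deviceName.
    IFS="=" read -r deviceKey deviceName <<<"$1"
}

splitDevice "mba=Lee's Macbook Air"
echo "$deviceKey: $deviceName"     # mba: Lee's Macbook Air

splitDevice "lab=Rack=7 Test Box"  # hypothetical name containing "="
echo "$deviceKey: $deviceName"     # lab: Rack=7 Test Box
```

The parameter-expansion version remains the simpler choice; this is only meant to show that the extra-elements problem comes from reading into an array, not from read itself.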