Search code examples
functionzshifs

Use values in input csv to search a second csv and return values


I have an input csv with 3 columns and many rows. For each row, I want to use the string in each column to sequentially search a second csv. When a match is found I want to return columns from the row where the match occurred. My code works for the most part, but for some reason refuses to return the output for the 3rd column of each row before moving to the next. I'm stumped. TYIA!

Input.csv:
A1,A2,A3
B1,B2,B3

Search.csv:
A1,Name2,C2,UniqueID1,C2,C3,C4
A3,Name2,C2,UniqueID2,C2,C3,C4
A2,Name4,C2,UniqueID3,C2,C3,C4
B2,Name4,C2,UniqueID4,C2,C3,C4
B3,Name3,C2,UniqueID5,C2,C3,C4
B1,Name2,C2,UniqueID6,C2,C3,C4

Desired Output:  
/Name2/UniqueID1/  
/Name4/UniqueID3/  
/Name2/UniqueID2/  
/Name2/UniqueID6/  
/Name4/UniqueID4/  
/Name3/UniqueID5/  

Current Output:  
/Name2/UniqueID1/  
/Name4/UniqueID3/  
/Name2/UniqueID6/  
/Name4/UniqueID4/  
#!/bin/zsh

# Function to process CSV files
process_csv() {
    local input_file="$1"
    local search_column="$2"
    local search_file="$3"
    local output_column1="$4"
    local output_column2="$5"

    # Loop through the input CSV file
    while IFS= read -r input_line || [[ -n "$input_line" ]]; do
        # Extract values from input CSV using IFS as a separator
        IFS=',' read -r -A input_values <<< "$input_line"
        input_var1="${input_values[1]}"  # Adjust column index as needed
        input_var2="${input_values[2]}"  # Adjust column index as needed
        input_var3="${input_values[3]}"  # Adjust column index as needed

        # Initialize a variable to store the result for the current row
        row_result=""

        # Search for each input variable sequentially
        search_for_input_var "$input_var1" "$search_file" "$search_column" "$output_column1" "$output_column2" row_result
        search_for_input_var "$input_var2" "$search_file" "$search_column" "$output_column1" "$output_column2" row_result
        search_for_input_var "$input_var3" "$search_file" "$search_column" "$output_column1" "$output_column2" row_result

        # Output the concatenated result for the current row
        #echo "Matches found for '$input_var1', '$input_var2', '$input_var3': $row_result"
    done < "$input_file"
}

# Function to search for an input variable in the search CSV file
search_for_input_var() {
    local input_var="$1"
    local search_file="$2"
    local search_column="$3"
    local output_column1="$4"
    local output_column2="$5"
    local result_var="$6"  # Result variable passed by name

    # Reset the result for the current input variable
    result_var=""

    # Loop through the search CSV file
    while IFS= read -r search_line; do
        # Extract values from search CSV using IFS as a separator
        IFS=',' read -r -A search_values <<< "$search_line"
        search_var="${search_values[$search_column]}"  # Adjust column index as needed

        # Check if the current input value matches the search value
        if [ "$input_var" = "$search_var" ]; then
            # Concatenate desired columns with '/' separator 
            result="/${search_values[$output_column1]}/${search_values[$output_column2]}/"
            echo $result
            if [ -n "$result_var" ]; then
                result_var="${result_var}/$result"
            else
                result_var="$result"
            fi
            break  # Stop searching the search CSV once a match is found
        fi
    done < "$search_file"
}

# Usage: ./process_csv.sh Input.csv 1 Search.csv 2 4

input_csv="$1"
search_column="$2"
search_csv="$3"
output_column1="$4"
output_column2="$5"

process_csv "$input_csv" "$search_column" "$search_csv" "$output_column1" "$output_column2"

Solution

  • As we discovered in the comments, the input csv files do not have line terminations that match the platform, so the last value ends up with some extraneous whitespace at the end. One option is to trim whitespace from the values after parsing the `csv' string:

        IFS=',' read -r -A input_values <<< "$input_line"
        input_values=(${(MS)input_values##[[:graph:]]*[[:graph:]]})
    

    This particular substitution (from here) is a little overdone - it will trim all leading and trailing whitespace from all of the values in the input_values array. But that might help with other ways that Excel likes to confuse things when creating csv files.

    You could remove just the CR character being added here with this, after reading input_line:

        input_line=${input_line%$'\r'}