Search code examples
pythonralteryx

Replace a value on one data set with a value from another data set with a dynamic lookup


This question relates primarly to Alteryx, however if it can be done in Python, or R in Alteryx workflow using the R tool then that would work as well.

I have two data sets.

Address (contains address information: Line1, Line2, City, State, Zip)

USPS (contains USPS abbreviations: Street to ST, Boulevard to BLVD, etc.)

Goal: Look at the string on the Address data set for Line1. IF it CONTAINS one of the types of streets in the USPS data set, I want to replace that part of the string with its proper abbreviation which is in a different column of the USPS data set.

Example, 123 Main Street would become 123 Main St

What I have tried: Imported the two data sets. Union the two data sets with the instruction of Output All Fields for When Fields Differ. Added a formula, but this is where I am getting stuck. So far it reads:

if [Addr1] Contains(Sting, Target)

Not sure how to have it look in the USPS for one of the values. I am also not certain if this sort of dynamic lookup can take place.

If this can be done in python (I know very basic Python so I don't have code for this yet because I do not know where to start other than importing the data) I can use python within Alteryx.

Any assistance would be great. Please let me know if you need additional information.

Thank you in advance.


Solution

  • The overall logic that can be used is here: Using str_detect (or some other function) and some way to loop through a list to essentially perform a vlookup

    However, in order to expand to Alteryx, you would need to add the Alteryx R tool. Also, some of the code would need to be changed to use the syntax that Alteryx likes.

    read in the data with:

    read.Alteryx('#Link Number', mode = 'data.frame')
    

    After, the above linked question will provide the overall framework for the logic. Reiterated here:

    usps[] = lapply(usps, as.character)
    ##Copies the original address data to a new column that will
    ##be altered.  Preserves the orignal formatting for rollback 
    ##if necessary
    vendorData$new_addr1 = as.character(vendorData$Addr1)
    
    ##Loops through the dictionary replacing all of the common names
    ##with their USPS approved abbreviations for the Addr1 field.
    for(i in 1:nrow(usps)) {
      vendorData$new_addr1 = str_replace_all(
        vendorData$new_addr1,
        pattern = paste0("\\b", usps$Abbreviation[i], "\\b"),
        replacement = usps$USPS_Abbrv_updated[i]
      )
    }
    

    Finally, in order to be able to see the output, we would need to write a statement that will output it in one of the 5 output slots the R tool has. Here is the code for that:

    write.Alteryx(data, #)