I have mismatched ZIP codes in Dataprep. I need to add two digits to columns where it was entered improperly.
In Dataprep I get a suggestion to replace:
'{start}{digit}{3}{end}' with ' '
In the replace dialog I can only put a string, not '{start}{digit}{3}{end}'
because it reads it as text. How do I replace:
'{start}{digit}{3}{end}'
with 12 + '{start}{digit}{3}{end}'
Replacing or inserting, either would work but I haven't found a way to insert either.
Any help at all? Thanks.
What you are looking for is the Merge Transform. It allows two merge several columns or values into a new column, also allowing the possibility to insert a delimiter between the values that you are merging.
From your question, I understand that you want to turn a data value like 345
into 12345
, i.e. attaching 12
as a prefix. To do so, you can run the merge()
formula just like this:
merge(['12',zip])
In this case, zip
is just the name of the column that you want to merge with the value 12
. Write the name of your column in your specific case.
The image below shows the results of running the merge()
formula.
UPDATE:
Given the case that you have both correct and incorrect ZIP codes in your data, you can apply a conditional logic function IF
in order to determine which formula to apply to your data. The IF()
function works like this:
IF(test_expression, true_expression, false_expression)
So in your case, you should evaluate over the length of the items in the column and only add 12
when there are exactly three digits in the row, as you said in your comment. The formula I used is the following one:
if(len(zip) == 3, merge(['12',zip]), zip)
And the results (compared when using IF()
[right_new_zip column] and when not using it [wrong_new_zip column]) are these ones:
Finally, let me refer you to this nice documentation page where you will find all the information you need about the language used in Dataprep and the functions you can find useful, categorized by type (Logical, Aggregate, String, Date, etc.).