Search code examples
excelvalidationexcel-formulaexcel-2010data-cleaning

Excel conditional formating based on the multiple cells and values


I am trying to implement various conditional formatting to a specific data base. Looked for answer around here but can not find anything similar. Might not be possible but it is worth a try.
I am preforming various data cleansing and validation.
Here is the case: (small sample, working with 100k data entries in this particular file)

enter image description here

Ultimately what I want is the formula that will compare the low-level Description characters after the last "UNDERSCORE" to the characters after last "UNDERSCORE" of the higher level(highlighted). If it does not match then highlight the cell?

Asking for too much, yes, no, maybe? I am open to any other suggestions on how can I perform various data cleaning and validation!

Thank you!


Solution

  • If you must use the last "UNDERSCORE" character, and can't depend on the suffixes being four characters, the formula becomes quite complex. For simplicity's sake, I assumed the higher level is always missing the last five characters of the lower level, if you must go by the last "DASH" character, then this will be a lot longer.

    Use this formula to highlight the cells, defining the two names LEVELS and DESCRS to be the two columns:

    =IFNA(MID(B2,FIND("[]",SUBSTITUTE(B2,"_","[]",LEN(B2)-LEN(SUBSTITUTE(B2,"_",""))))+1,999)<>MID(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),FIND("[]",SUBSTITUTE(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),"_","[]",LEN(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1))-LEN(SUBSTITUTE(INDEX(DESCRS,MATCH(LEFT(A2,LEN(A2)-5),LEVELS,0),1),"_",""))))+1,999),FALSE)
    

    This uses a very nice trick with SUBSTITUTE to find the last occurrence of a character.

    BTW, I would probably write a Perl program to parse the data and find errors.