Search code examples
excelxpathexcel-formulaxpath-1.0filterxml

How to fix the #SPILL! Error by displaying only the second value?


I have a column with some info displayed like that:

Product Info
I am the 3rd product from 2020
I was created in 1995 and I went public in 2021
I am a not sure if I'm from 2019 2020 2021

I have a formula to extract the year in the above column that is:

=IFERROR(FILTERXML("<k><m>"&SUBSTITUTE([@[Product Name]]," ","</m><m>")&"</m></k>","//m[.=number() and string-length()=4]"),"")

The problem with this formula is that it works fine with the first case, but it gives me a #SPILL! Error on the other two cases. My ideal output would be:

Product Info Year
I am the 3rd product from 2020 2020
I was created in 1995 and I went public in 2021 2021
I am a not sure if I'm from 2019 2020 2021
  • Basically, for the first case, just return the 4 digits. EVERY time that I only have one sequence of 4 digits, I want to return that sequence.
  • For the second case, I want to return ONLY the second year. EVERY time I have 2 sequences of 4 digits, I want to return ONLY the second year.
  • For the third case, I want to return nothing. EVERY time I have more than 2 sequences of 4 digits, I want to return blank.

The last thing I tried to add was position()>5 and that would cut off the 1995 in the second example, but I would continue having the Error on the third example. Also, my list is quite huge, and I am not sure if the position()>5 thing would work for ALL products that fall in the same second example.

I am not very good with XPATH, so any help would be greatly appreciated. Thank you!


Solution

  • Disclaimer: Below solution is written on the assumption that when 'count of years < 3', return the last given year. If 'count >= 3' then only return the last year if years come in pairs of two. Hence the use of 'modulus 2 == 0'.


    You can expand the xpath for sure if you so desire. However, I'd rewrite it a little bit. Each predicate, the structure between the opening and closing square brackets, is a filter of a given nodelist. To write multiple of these structures is in fact anding such predicates. To get a better understanding of what most common xpath 1.0 functions can do within FILTERXML(), I'd like to redirect you to this post.

    So to write a consecutive pattern of predicates I'd opt for:

    • [.*0=0] - First return a filtered nodelist of all numbers where a node multiplied by zero equals zero;
    • [string-length()=4] - Then return only those that are 4 characters long‡‡;
    • [position() = last() and (position() = 1 or position() mod 2 = 0)] - The 3rd and last predicate is the trickiest for your query. This is done with a first check that position() = last() meaning the node needs to be the last node in the filtered nodelist of step 2 and (position() = 1 or position() mod 2 = 0) means we want to check that this node is also at the 1st index or the modulus 2 of the indexed position equals 0‡‡‡.

    enter image description here

    Formula in B2:

    =IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(A2," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=4][position() = last() and (position() = 1 or position() mod 2 = 0)]"),"")
    

    Whilst the above would work for Excel 2013 and higher‡‡‡‡, you do talk about spilled behaviour. If you happen to work with the current channel in ms365 you could also try:

    =LET(x,TEXTSPLIT(A2," "),y,--FILTER(x,ISNUMBER(-(x&"**0"))*(LEN(x)=4),{1,2,3}),z,COUNT(y),IF(OR(z=1,MOD(z,2)=0),TAKE(y,,-1),""))
    

    If you need to simply return the last year if 'count < 3' then you can use xpath "//s[.*0=0][string-length()=4][position()<3 and position() = last()]" or ms365 formula =LET(x,TEXTSPLIT(A2," "),y,FILTER(x,ISNUMBER(-(x&"**0"))*(LEN(x)=4),""),IF(COUNTA(y)>2,"",TAKE(y,,-1))).

    ‡‡ Note that you can be more strict about this if you'd wish to validate that a year is between say 1900-2050 or so. One could replace the 1st and 2nd predicate with [.*1>1899][.*1<2051].

    ‡‡‡ Note that the order or writing your and/or statements in xpath do matter. We need to use explicit parentheses to control the precedence. See this

    ‡‡‡‡ This is not true for Excel Online or Excel for Mac