Search code examples
xpathgoogle-sheetsgoogle-sheets-formulaxpath-2.0xpath-1.0

Xpath, fetching table with text and images in Google sheets


I'm trying to parse this table into Google Sheets: https://exvius.gamepedia.com/Chaining/Bolting_Strike And getting the title text from where there are images.

I can't figure out how to get the text from the full table, as well as img/@alt in cases where it's available. I can get the table with

=IMPORTXML("https://exvius.gamepedia.com/Chaining/Bolting_Strike","//table[@class='wikitable']/tbody/tr[position()>=3]")

And only the image texts

=IMPORTXML("https://exvius.gamepedia.com/Chaining/Bolting_Strike","//table[@class='wikitable']/tbody/tr[position()>=3]/td/a/img/@alt")

But I can't seem to do both, is that a limitation of Google Sheets IMPORTXML?

I've tried with OR and other bool operators with no luck. Tried with axes but that was also a no go for me.


Solution

  • I propose something like this :

    GP

    Sheet

    Description:

    In B1 we have the url of the webpage. In B3 we have the following formula to import the first part of the table :

    =QUERY(IMPORTHTML(B1;"table";1);"select Col1,Col2,Col3 OFFSET 2";0)
    

    Columns L to O contain the following formulas to get the element names and the ability names (which will be used as a key in a VLOOKUP step). 4 formulas because an ability could have 2 element names. In L3,M3,N3,03 we have :

    =IMPORTXML(B1;"//td/a[1]/img[@srcset]/ancestor::td[1]/preceding::a[1][@title]")
    =IMPORTXML(B1;"//td/a[1]/img[@srcset]/@alt")
    =IMPORTXML(B1;"//td/a[2]/img[@srcset]/ancestor::td[1]/preceding::a[1][@title]")
    =IMPORTXML(B1;"//td/a[2]/img[@srcset]/@alt")
    

    Formula in E4 is a one liner where the results of 2 VLOOKUP are merged together. We use VLOOKUP to pair each ability name with an element.

    =ARRAYFORMULA(REGEXREPLACE(ARRAYFORMULA(IFERROR(VLOOKUP(C4:INDIRECT("C"&COUNTA(C:C)+2);L:M;2;FALSE);"")&"|"&ARRAYFORMULA(IFERROR(VLOOKUP(C4:INDIRECT("C"&COUNTA(C:C)+2);N:O;2;FALSE);"")));"^\||\|$";"")) 
    

    To finish, in H3 we have the last part of the table :

    =QUERY(IMPORTHTML(B1;"table";1);"select Col5,Col6 OFFSET 2";0)
    

    The rest (colours, borders,..) is standard and conditionnal formatting.

    Side note : I'm based in Europe so you might have to change ; with , in the formulas.