Search code examples
google-sheetsxpathsplitgoogle-query-languagetextjoin

Formatting IMPORTXML Xpath query into readable data for Google Sheets


I am importing XML data to a google sheet which has the following structure: edit: URL: https://sonicstate.com/news/tools/revive_stats/client_camp_3785.xml

<data>
    <campaignId>15802</campaignId>
    <campaignName>Some name</campaignName>
      <startDate>
       <year>2021</year>
       <month>12</month>
       <day>02</day>
     </startDate>
     <endDate>
       <year>2021</year>
       <month>12</month>
       <day>13</day>
     </endDate>
</data>
<data>
.... another record
</data>

I want the results of multiple rows to be imported with date values concatenated so they can appear in manageable rows in the sheet I can then query while the other values will appear in their own cells with a row for each record. eg

15802 | Some name | 2021/12/02 | 2021/12/13

15803 | Another name | 2021/11/30 | 2021/12/04

I have tried:

IMPORTXML("myurl" , "//data/campaignId | //data/campaignName | //data/startDate/year | //data/startDate/month|//data/startDate/day")

But each value returns on a separate row with cells for year, month, day eg:

15802
Some Name
Year | Month | Day
15802
Another Name
Year | Month | Day
etc

I also tried:


IMPORTXML("myurl" , "concat(//data/campaignId , //data/campaignName , //data/startDate/year,'/', //data/startDate/month,'/',//data/startDate/day")

But that only returns a single record. I'm struggling to find the right terms to search for what I am trying to achieve. Also Sheets XMLIMPORT uses XPath 1.0 which limits the functions available


Solution

  • try:

    =INDEX(SUBSTITUTE(TRIM(SPLIT(FLATTEN(SPLIT(QUERY(FLATTEN(QUERY(TRANSPOSE(
     IFERROR(IF(0=MOD(ROW(A:A)-1, 5)-{0, 1, 4, 3, 2}, {"♦","","","",""}&TEXT(
     IMPORTXML(A1, "//data/campaignId|//data/campaignName|//data/startDate/year|//data/startDate/month|//data/startDate/day"), 
     {"@","@","@","@","@"})&{"♠","♠","","♣","♣"}, ))),,9^9)),,9^9), "♦")), "♠")), "♣ ", "/"))
    

    enter image description here

    formula explanation


    shorter fx:

    =INDEX(SUBSTITUTE(TRIM(SPLIT(FLATTEN(SPLIT(QUERY(IFNA(
     CHOOSE(MATCH(MOD(SEQUENCE(999)-1, 5), {0, 1}), "♦", )&
     IMPORTXML(A1, "//data/campaignId|//data/campaignName|//data/startDate/year|//data/startDate/month|//data/startDate/day")&
     CHOOSE(MATCH(MOD(SEQUENCE(999)-1, 5), {0, 1, 2, 3, 4}), "♠","♠","♣","♣","")),,9^9), "♦")), "♠")), "♣ ", "/"))
    

    enter image description here