Search code examples
google-sheetsgoogle-sheets-querygoogle-sheets-formula

Filter IMPORTHTML data


When I import data, it comes in this format (image 1), with blank spaces. I would like to know if there is any way to adjust so that these blanks disappear, the two models expected (image 2 and 3) if there was any way to reach them would be important to me.

Remembering that all dates have / and all times have :

I tried to filter from QUERY, but when trying to "Select Col1, Col2, Col4 Where Col2 is not null" the dates disappear and only the times remain, I tried via REGEXMATCH to separate the dates from the times using / and : but also I was not successful.

I also tried it via IMPORTXML, but some data ends up not being imported correctly on some pages of the site, for IMPORTHTML these errors do not happen. The XML's I used were:

"//tr[@class='no-date-repetition-new' and ..//td[@class='team team-a']] | //tr[@class='no-date-repetition-new live-now' and ..//td[@class='team team-a']]"
"//td[@class='team team-a']/a | //td[@class='team team-a strong']/a"

The current formula is as follows:
=IMPORTHTML("https://int.soccerway.com/national/austria/1-liga/20192020/regular-season/r54328/","table",1)

IMPORTHTML Original: enter image description here

Expected formats:
enter image description here --- enter image description here


Solution

  • Rather than filtering what you need is to restructure the imported data.

    Anyway, I think that the easier solution to get the final result is to use multiple IMPORTXML formulas.

    URL
    A1: https://int.soccerway.com/national/austria/1-liga/20192020/regular-season/r54328/

    Headers
    A2: //table[contains(@class,'matches')]/thead/tr/th

    Day
    A3: //td[contains(@class,'date')]/parent::tr

    Teams and Score
    A4: //td[contains(@class,'team-a')]/parent::tr

    A6: =transpose(IMPORTXML($A$1,A2))
    A7: =IMPORTXML($A$1,A3)
    B7: =IMPORTXML(A1,A4)

    You might want to replace the formula on A6 by static values in order to place them properly.