Search code examples
google-sheetsgoogle-sheets-formulaqgis

Google ImportXML from QGIS metadata file


I am trying to capture elements of an qmd file (that is xml markup) using Google Sheets importxml. Based on How to use importXML function with a file from Google Drive? I think I've got the file importing correctly but can't seem to capture any of the tags.

Here's what I am trying -

=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier")

Here's what the qmd/xml file looks like

<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
<qgis version="3.9.0-Master">
  <identifier>Z:/My Drive/Mangoesmapping/Spatial Projects/2019/DSC/132_Ongoing_Asset_Updates/Working/Sewerage_Updates/Sewerage_Manholes_InspectionShafts.TAB</identifier>
  <parentidentifier>Sewerage Manhole Infrastructure</parentidentifier>
  <language>AUS</language>
  <type>dataset</type>
  <title>Sewerage Manholes within Douglas Shire Council</title>
  <abstract>Sewerage Manholes within Douglas Shire Council. Most data has been updated based on field work, review of existing AsCon files and discussion with council staff responsible for the assets in 2018/2019. In Port Douglas most of the infrastructure has been surveyed in. </abstract>
  <keywords vocabulary="gmd:topicCategory">
    <keyword>Infrastructure</keyword>
    <keyword>Sewerage</keyword>

If I use

=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","*")

I get enter image description here

But I really would like to just get the elements I want by placing the importxml for each tag in the cell I need it in.


Solution

    • You want to retrieve ### of <identifier>###</identifier> from https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download

    I could understand like above. If my understanding is correct, how about this answer?

    Issue:

    In your question, the formula of =importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier") uses \\identifier as the xpath. From your data you want to retrieve the values, it seems that you are trying to retrieve ### of <identifier>###</identifier>.

    In this case, in order to Selects nodes in the document from the current node that match the selection no matter where they are, // is required to be used instead of \\. This can be seen at the document of here.

    Modified formula:

    So =importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier") can be modified as follows.

    =importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","//identifier")
    

    As other xpath, from your data in your question, you can also use the xpath of /qgis/identifier instead of //identifier. So you can also use the following formula.

    =importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","/qgis/identifier")
    

    References: