Search code examples
xmlweb-scrapinggoogle-sheetsgoogle-sheets-formulaimporterror

Google Sheets =importXML() returns "Imported Content is Empty"


I'm trying to import the most common Library of Congress identifier for a list of books using Google Sheets. The XML file for the ISBN is http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true. XML is pasted below for ease. I want to get lcc/mostPopular[@nsfa] but the formula =importxml("http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true","lcc/mostPopular[@nsfa]") returns "Imported content is empty."

Am I entering the xpath_query wrong?

I know the link is valid because I can import the whole thing with =importdata("http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true"), but that gives a garbled mess of data in the spreadsheet.

<classify xmlns="http://classify.oclc.org">
<response code="0"/>
<!-- Classify is a product of OCLC Online Computer Library Center: http://classify.oclc.org -->
<work author="Piper, John, 1946-" editions="4" eholdings="97" format="Book" holdings="184" itemtype="itemtype-book" owi="769061307" title="Bloodlines : race, cross, and the Christian">696100305</work>
<authors>
<author lc="n78072014" viaf="109537817">Piper, John, 1946-</author>
</authors>
<orderBy>thold desc</orderBy>
<input type="isbn">1433528525</input>
<recommendations>
<ddc>
<mostPopular holdings="280" nsfa="270.089" sfa="270.089"/>
<mostRecent holdings="280" sfa="270.089"/>
<latestEdition holdings="280" sf2="22" sfa="270.089"/>
</ddc>
<lcc>
<mostPopular holdings="280" nsfa="BT738.27" sfa="BT738.27"/>
<mostRecent holdings="280" sfa="BT738.27"/>
</lcc>
</recommendations>
</classify>

Solution

  • try:

    =REGEXEXTRACT(QUERY(FLATTEN(SPLIT(QUERY(IMPORTDATA(
     "http://classify.oclc.org/classify2/Classify?isbn=1433528525&summary=true"),,9^9), 
     "<lcc>", 0)), 
     "where Col1 contains 'mostPopular' offset 1"), 
     "nsfa=""([^\s]+)""")
    

    enter image description here