
How to continue the API data input process into Python's DataFrame when a NoneType / missing value is found


I am trying to access data from the Goodreads API. However, when the iteration encounters a NoneType value, the input process stops and does not proceed to the next iteration as I intended it to.

I could think of either (a rough sketch of both ideas follows below):

  1. skipping the NoneType values in the current row during input, then cleaning up all rows that still contain null data afterwards, or
  2. deleting the entire current row as soon as the iteration finds a NoneType value.
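
For what it's worth, here is a minimal, self-contained sketch of the two ideas; the toy table and the values in it are made up purely for illustration:

import pandas as pd

# Toy stand-in for the real table; None marks a value the API did not return.
table = pd.DataFrame({'isbn': ['111', None], 'title': ['Book A', 'Book B']}, index=[6, 7])

# Idea 1: skip None values while building a row (incomplete rows are cleaned up later).
texts = ['111', None, 'eng']                          # pretend element texts
joined = '-'.join(t for t in texts if t is not None)  # '111-eng'

# Idea 2: drop the whole current row as soon as a missing value turns up.
table = table.drop(7)

print(joined)
print(table)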

Below is the complete code:

import pandas as pd
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl

# SSL context for the HTTPS request below (a default context is assumed here)
ctx = ssl.create_default_context()

columns = ['id',
           'title',
           'authors/author/name',
           'average_rating',
           'isbn',
           'isbn13',
           'language_code',
           'num_pages',
           'ratings_count',
           'text_reviews_count',]
index = list(range(6,7))
table = pd.DataFrame(index=index, columns=columns)


#Here's the mining to DataFrame
for row in index:
    for path in columns:        
        serviceurl1 = 'https://www.goodreads.com/book/show.xml?'
        parameters1 = {'key': 'J9l5JsnPRm..............',
                      'id': row,
                      'format': 'xml',
                      }
        url = serviceurl1 + urllib.parse.urlencode(parameters1)
        access = urllib.request.urlopen(url, context = ctx)
        data = access.read().decode('utf-8')


        #DATA MINING PROCESS
        #parsing the 'data' string into readable xml
        tree = ET.fromstring(data)

        inlst = tree.findall('book/%s' %path)
        instr = []
        for element in inlst:
            instr.append(element.text)
        datum = '-'.join(instr)

        table[path][row] = datum      

print(table)

Below is the part where I want the skipping/deleting of null data to happen:

for element in inlst:
    instr.append(element.text)
datum = '-'.join(instr)

Below is the XML from which I want to extract the data:

<book>
  <id>6</id>
  <title>Harry Potter and the Goblet of Fire (Harry Potter, #4)</title>
  <isbn></isbn>
  <isbn13></isbn13>
</book>

As can be seen from the above XML, 'isbn' and 'isbn13' are empty, and the program stops when the iteration accesses them.
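
For reference, this is most likely the exact failure: ElementTree reports the text of an empty element as None rather than as an empty string, and str.join raises a TypeError when it meets a None item, so the join in the loop above blows up:

import xml.etree.ElementTree as ET

book = ET.fromstring('<book><isbn></isbn></book>')
print(book.find('isbn').text)       # None, not ''
'-'.join([book.find('isbn').text])  # TypeError: sequence item 0: expected str instance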

As for the first approach, I only know how to delete rows that contain null data, but I don't know how to skip a value when the iteration finds null data. As for the second approach, I don't know how to do it at all.

Is there any way to implement either of these, or is there another suggestion? Thanks all, I appreciate the help.


Solution

  • Below is a snippet containing the condition that checks for NoneType:

    for element in inlst:
        if element.text:
            instr.append(element.text)
    datum = '-'.join(instr)
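
    Note that the truthiness check (if element.text:) skips empty strings as well as None values. If only genuinely missing text should be skipped while empty strings are kept, an explicit comparison can be used instead:

    for element in inlst:
        if element.text is not None:   # skip only missing text, keep empty strings
            instr.append(element.text)
    datum = '-'.join(instr)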
    

    Also posting the whole testing code:

    import pandas as pd
    import urllib.request, urllib.parse, urllib.error
    import xml.etree.ElementTree as ET
    
    columns = ['id',
               'title',
               'authors/author/name',
               'average_rating',
               'isbn',
               'isbn13',
               'language_code',
               'num_pages',
               'ratings_count',
               'text_reviews_count',]
    index = list(range(6,7))
    table = pd.DataFrame(index=index, columns=columns)
    
    
    #Here's the mining to DataFrame
    for row in index:
        for path in columns:        
    
            data = '''
            <book>
              <id>6</id>
              <title>Harry Potter and the Goblet of Fire (Harry Potter, #4)</title>
              <isbn>4</isbn>
              <isbn13></isbn13>
            </book>
            '''
    
    
            #DATA MINING PROCESS
            #parsing the 'data' string into readable xml
            tree = ET.fromstring(data)
    
            inlst = tree.findall('%s' %path)
            instr = []
            for element in inlst:
                if element.text:
                    instr.append(element.text)
            datum = '-'.join(instr)
    
            table[path][row] = datum      
    
    print(table)
    

    Run the above snippet and observe the output; I think it's exactly aligned to the case you were asking about.

    I also updated the findall call as below, since in the test string the root element is the book itself, so the paths are relative to it:

    inlst = tree.findall('%s' %path)
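
    If rows that still end up with missing fields should then be removed (the first idea in the question), one possible follow-up, assuming empty strings mark the missing values in the filled table, is:

    import numpy as np

    # Treat empty strings as missing, then drop rows that lack the chosen fields.
    table = table.replace('', np.nan)
    table = table.dropna(subset=['isbn', 'isbn13'])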