Search code examples
pythonsocrata

Trying to get all fields in a Socrata python API call


I am using the python api (sodapy) to get data from https://dev.socrata.com/foundry/data.energystar.gov/ebvx-pb7r. How do I get all the fields shown on this page through the api? Specifically, I need the additional_model_information field, though I think there will be others. I have tried:

  1. "where additional_model_information is not null": the field is included, but I also want the rows that have null.
  2. "select='additional_model_information' the field is returned, but nothing else.
  3. "select=*" does not add any fields.
  4. "where='additional_model_information is not null or additional_model_information is null'" This seems to work.
  5. "select=list all fields" Should work, but seems unwieldy.
  6. Could make 2 calls, one to get the * data and a second to get other fields.

I suspect I am missing something. Any help is appreciated.

import pprint
import sodapy

client = sodapy.Socrata(domain='data.energystar.gov', app_token=None)
rows = client.get('ebvx-pb7r', where='additional_model_information is not null or additional_model_information is null')
for row in rows:
    pprint.pprint(row)

Solution

  • If you just leave off the where parameter entirely, you'll an unfiltered version of the dataset, which it sounds like is what you want. You'll also want to include the limit parameter to make sure you get all of the records, since the total count is 1058, which is larger than the default page size of 1000:

    import pprint
    import sodapy
    
    client = sodapy.Socrata(domain='data.energystar.gov', app_token=None)
    rows = client.get('ebvx-pb7r', limit=5000)
    for row in rows:
        pprint.pprint(row)
    

    I think part of the confusion may come from the fact that if there is no value for additional_model_information for a given record, the JSON object that we return will omit that field.