
Is there a way to exclude datasets in CKAN plugin ckanext-datajson output?


We have installed ckanext-datajson to export our datasets in a format compatible with the U.S. Project Open Data metadata specification v1.1. However, some datasets are confidential and should not appear in the output. We thought about making those datasets private, which does exclude them, but it also prevents anyone outside the organization from seeing the dataset, which we don't want.

Does anyone know a way to prevent datasets from outputting to JSON that doesn't involve making them private?


Solution

  • This answer requires some coding; if that is an option for you, you might find it helpful.

    I don't know specifically about the datajson extension, but we had a similar use case for the dcat extension. ckanext-dcat provides metadata about all datasets as RDF through an endpoint (/catalog.xml). We had one user who wanted to exclude specific datasets, so we implemented a custom extension with a modified version of this endpoint that filters out certain datasets. Specifically, we filter out all datasets that have the value harvest-fisbroker for the (custom) attribute berlin_source.

    Under the hood, the DCAT catalog endpoint works by calling package_search. package_search accepts an fq parameter (filter query), which is passed on to Solr and can be used to filter the result by any indexed attribute; a leading minus sign, as in -berlin_source:harvest-fisbroker, excludes the matching datasets. We set this parameter in our extension as part of a data_dict, which then gets passed to the dcat_catalog_show action, which in turn calls package_search to generate its response.

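    from flask import Blueprint, make_response
    from ckan.plugins import toolkit
    from ckanext.dcat.utils import CONTENT_TYPES
    
    def read_catalog(format):
        # Pass the usual catalog parameters through and add a filter query
        # that excludes every dataset whose berlin_source is harvest-fisbroker.
        data_dict = {
            'page': toolkit.request.params.get('page'),
            'modified_since': toolkit.request.params.get('modified_since'),
            'format': format,
            'fq': '-berlin_source:harvest-fisbroker',
        }
    
        # dcat_catalog_show calls package_search internally and serializes the result.
        response = toolkit.get_action('dcat_catalog_show')({}, data_dict)
    
        # Wrap the serialized catalog in a response with the matching content type.
        response = make_response(response)
        response.headers['Content-type'] = CONTENT_TYPES[format]
    
        return response
    
    
    # Serve the filtered catalog under its own URL, next to the stock /catalog.<format>.
    no_fisbroker_api = Blueprint('no_fisbroker_api', __name__)
    no_fisbroker_api.add_url_rule(u'/catalog_no_fb.<format>',
                            methods=[u'GET'], view_func=read_catalog)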
    
    

    The code above is stripped down to the bare minimum to illustrate the idea. The complete code for our custom endpoint (implemented as a Flask Blueprint) is a little too much to show here, but you can find it in the repository for our extension: no_fisbroker_blueprint.py
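
    If more than one attribute/value pair should be excluded, the fq string can be assembled programmatically. The following is only a minimal sketch in plain Python: the helper name build_exclusion_fq is hypothetical (it is not part of CKAN or ckanext-dcat), and it assumes that every field named is indexed in Solr so that a negated clause on it behaves like the berlin_source filter above.

```python
def build_exclusion_fq(exclusions):
    """Build a Solr filter query that excludes datasets by field value.

    `exclusions` maps a field name to the value to filter out.
    Hypothetical helper for illustration; not part of CKAN.
    """
    clauses = []
    for field, value in exclusions.items():
        # Quote the value so spaces, colons, etc. are matched literally,
        # escaping backslashes and double quotes inside it first.
        escaped = str(value).replace('\\', '\\\\').replace('"', '\\"')
        # A leading minus negates the clause, i.e. excludes matches.
        clauses.append('-{}:"{}"'.format(field, escaped))
    return ' '.join(clauses)


fq = build_exclusion_fq({'berlin_source': 'harvest-fisbroker'})
print(fq)  # -berlin_source:"harvest-fisbroker"

# Inside a CKAN extension, the string would go into the data_dict, e.g.
# data_dict = {'fq': fq}, before calling the action as in the view above.
```

    Whether ckanext-datajson offers a place to inject such a filter is not something this sketch assumes; it only illustrates the filter-query side of the approach.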