
Create Wikidata items from records in OpenRefine (and not rows)?


I read that the OpenRefine Wikidata plugin always operates in rows mode.

My data is in records mode: each record is a serial/magazine, and the rows in a record are the various formats of that serial/magazine (typically the paper and electronic versions). Each row has a unique ISSN identifier. Wikidata has only one item for the serial/magazine (my record), with no separate items for each format (my rows).

When reconciling the data to Wikidata, typically all rows of a record match the same Wikidata item, or none of them match, or sometimes only one row matches (e.g. if Wikidata knows the ISSN of only one format, say the paper one, but not the others).


What I would like to do is create a Wikidata item for each record for which no reconciliation result was found (in other words, in which no row has matched), rather than one item per row. When creating this item, I would like to add the ISSNs of all the rows in the record.
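
To make the goal concrete, here is a minimal Python sketch of the selection logic, written outside OpenRefine. The row layout (record key, ISSN, matched item id) and every value in it are made up for illustration; this is not how OpenRefine stores its data.

```python
from collections import defaultdict

def records_needing_new_item(rows):
    """Return {record_key: [issn, ...]} for records in which no row matched."""
    issns = defaultdict(list)
    matched = defaultdict(bool)
    for record_key, issn, qid in rows:
        issns[record_key].append(issn)
        # A record counts as matched as soon as any of its rows has an item id.
        matched[record_key] = matched[record_key] or qid is not None
    return {key: values for key, values in issns.items() if not matched[key]}

# Hypothetical rows: (record key, ISSN, reconciled item id or None).
rows = [
    ("Journal A", "1111-1111", "Q100"),  # paper version, matched
    ("Journal A", "2222-2222", None),    # electronic version, unmatched
    ("Journal B", "3333-3333", None),
    ("Journal B", "4444-4444", None),
]
print(records_needing_new_item(rows))
```

Here only "Journal B" should get a new item, carrying both of its ISSNs; "Journal A" already has a match through one of its rows.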

Is it possible to do that, and if so, how?

Thanks


Solution

  • Yes, it is possible. You need to perform the reconciliation on the first column instead:

    • As mentioned in the documentation, use the Fill down operation on the first column, which defines your records;
    • Reconcile that column to Wikidata;
    • Then apply the Create one new item for similar cells action (in the Reconcile -> Actions menu);
    • Create a schema where the first column is used as the subject id.

    Assuming the values in your first column are initially distinct (which is the case in your example), this will create one item per record.

    In your example, because your first column contains ISSNs and not titles, I would first create a root column with titles instead (before the process explained above). In rows mode, use a facet to keep only the first row of each record by selecting non-blank values in the first column, then copy the title column and move the copy to the first position. This should ensure that reconciliation picks up existing items. Note that if the same title is used by multiple journals, this will create a single item for all of them, unless you add other properties to your reconciliation configuration (such as ISSN).
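
    As a rough illustration of why filling down and then creating one new item for similar cells yields one item per record, here is a small Python sketch. It only mimics the behaviour described above; the function names, the placeholder ids, and the sample titles are assumptions for the example, not part of OpenRefine.

```python
def fill_down(values):
    """Copy the last non-blank value into each blank cell below it."""
    filled, last = [], None
    for value in values:
        if value not in (None, ""):
            last = value
        filled.append(last)
    return filled

def one_new_item_per_distinct_value(values):
    """Give every cell that shares a value the same placeholder item id."""
    ids = {}
    return [ids.setdefault(value, f"new-{len(ids) + 1}") for value in values]

# Hypothetical first column in records mode: blank cells belong to the record above.
first_column = ["Journal A", "", "Journal B", ""]
filled = fill_down(first_column)
print(filled)
print(one_new_item_per_distinct_value(filled))
```

    After filling down, both rows of a record carry the same value, so they share one placeholder id. This also shows the caveat above: two different journals with the same title would end up with the same id.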