I'm starting a project on knowledge bases and wanted to start by downloading a recent dump of Wikidata. I found a data dump called "truthy", but I am not sure if I can trust it.
My understanding from pop culture is that a "truthy" statement is one that is not true and based only on intuition and perception. Thanks, Mr. Colbert.
Why would Wikidata produce a "truthy" data dump where the data is not accurate?
What's also confusing is that there are conflicting definitions. For example, here is the definition of "truthy" data directly from the Wikimedia organization:
Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy.
To me, that quote means that a truthy statement (fact triple) is the preferred one.
This other webpage says this about "truthy":
This contains only “truthy” or “best” statements, without qualifiers or references.
What am I to make of this? Is this "truthy" data reliable and believable or not?
In Wikidata, each statement has an associated rank: preferred, normal, or deprecated. The default is normal rank, but anyone (registered or anonymous users) can change the rank to one of the other values. No rules are enforced on how ranks are assigned. Generally, deprecated rank is used for values that have been shown to be wrong. Preferred rank is often used for the most up-to-date value in a time series.
The "truthy" data dump does not contain any statements with deprecated rank. For a given property, if there are statements with both normal and preferred rank, only the statements with preferred rank are in the dump; if no statement is preferred, the normal-rank statements are included.
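As a rough illustration of the rank rule described above, here is a minimal Python sketch. It is not how Wikidata actually builds the dump; the function name `truthy_statements`, the dictionary layout, and the example values are all assumptions made up for this illustration.

```python
from collections import defaultdict


def truthy_statements(statements):
    """Keep the best non-deprecated statements per property (illustrative only)."""
    # Group the non-deprecated statements by property.
    by_property = defaultdict(list)
    for st in statements:
        if st["rank"] != "deprecated":
            by_property[st["property"]].append(st)

    result = []
    for group in by_property.values():
        # If any statement is preferred, only preferred ones count as truthy;
        # otherwise all remaining (normal-rank) statements do.
        preferred = [st for st in group if st["rank"] == "preferred"]
        result.extend(preferred if preferred else group)
    return result


# Hypothetical statements for a single item (values are invented).
example = [
    {"property": "P1082", "value": 1000, "rank": "normal"},
    {"property": "P1082", "value": 1200, "rank": "preferred"},
    {"property": "P1082", "value": 9999, "rank": "deprecated"},
    {"property": "P17", "value": "Q183", "rank": "normal"},
]

kept = truthy_statements(example)
print([(s["property"], s["value"]) for s in kept])
# prints [('P1082', 1200), ('P17', 'Q183')]
```

In this sketch the deprecated value is dropped, the preferred value for P1082 hides the normal one, and the only statement for P17 is kept because nothing outranks it.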
If you want to get in touch with the Wikidata community, go to the Wikidata project chat. If you prefer to communicate directly with the developers of Wikidata/Wikibase, go to this page.