I'm planning a system that combines various data sources and lets users run simple queries against them. One part of the system needs to act as an abstraction layer that knows all connected data sources: the user should not need to know about the underlying data "providers". A data provider could be anything: a relational DBMS, a bug tracking system, ..., a weather station. Providers are hooked up to the query system through a common API that defines how to "offer" data. The types of queries a given data provider understands are described by its "offer" (e.g. I know these entities, I can give you aggregates of type X for relationship Y, ...).
My concern right now is the unification of the data: the various data providers need to agree on a common vocabulary (e.g. the name of the entity "customer" could vary across different systems). This requires a high-level representation of the entities and their relationships.
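To make the vocabulary problem concrete, here is a minimal sketch in Python of a lookup that maps provider-specific entity names onto one shared term. The provider names ("crm", "billing", "bugtracker"), the local names, and the function to_common_term are all made up for illustration; a real system would probably load this mapping from the shared model rather than hard-code it.

```python
# Hypothetical mapping: (provider, local entity name) -> shared vocabulary term.
COMMON_VOCABULARY = {
    ("crm", "client"): "customer",
    ("billing", "account_holder"): "customer",
    ("bugtracker", "reporter"): "user",
}


def to_common_term(provider: str, local_name: str) -> str:
    """Translate a provider-specific entity name into the shared term."""
    try:
        return COMMON_VOCABULARY[(provider, local_name)]
    except KeyError:
        # Fail loudly so an unmapped name is noticed instead of silently passed through.
        raise KeyError(
            f"no common term for {local_name!r} from provider {provider!r}"
        ) from None
```

With this, both to_common_term("crm", "client") and to_common_term("billing", "account_holder") resolve to "customer", so the query layer only ever sees the shared name.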
So far I have the following requirements:
I need to be able to define objects and their properties/attributes. Further, arbitrary relations between these objects need to be represented: a verb that defines the nature of the relation (e.g. "knows"), the multiplicity (e.g. 1:n) and the direction/navigability of the relation.
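As a rough illustration of these requirements, independent of RDF or any other framework, the model could be sketched in plain Python as below. The class names (Entity, Relation, Multiplicity) and the bidirectional flag are my own assumptions, not part of any existing library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Multiplicity(Enum):
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:n"
    MANY_TO_MANY = "n:m"


@dataclass(frozen=True)
class Entity:
    """An object type with a name and a list of attribute names."""
    name: str
    attributes: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Relation:
    """A named, directed relation between two entities."""
    source: Entity
    verb: str                    # nature of the relation, e.g. "knows"
    target: Entity
    multiplicity: Multiplicity
    bidirectional: bool = False  # True if also navigable from target to source

    def describe(self) -> str:
        arrow = "<->" if self.bidirectional else "->"
        return (
            f"{self.source.name} {arrow} {self.target.name} "
            f"[{self.verb}, {self.multiplicity.value}]"
        )


person = Entity("Person", ("name",))
language = Entity("Language", ("iso_code",))
knows = Relation(person, "knows", language, Multiplicity.ONE_TO_MANY)
print(knows.describe())  # Person -> Language [knows, 1:n]
```

Any candidate modeling language would need to express at least what these three classes capture: the verb, the multiplicity, and the navigability.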
It occurs to me that RDF is a viable option, but is it "the right tool" for this job?
What other solutions/frameworks exist for semantic data modeling that have a machine-readable representation, and why would they be better suited for this task?
I'm grateful for every opinion and pointer to helpful resources.
If you need cardinality restrictions on relations (for example "a Person knows 1:n Languages"), then RDF, even together with RDF Schema, is not enough (see http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#richerschemas). You will need an ontology language such as OWL (at least OWL-DL for cardinalities greater than 1: http://www.w3.org/TR/owl-guide/#owl_cardinality).
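For illustration, the "at least one" half of that example can be written as an owl:minCardinality restriction. The small Python helper below only renders such an axiom as Turtle text; the helper name min_cardinality_axiom and the default ":" namespace are assumptions, the prefix declarations for rdfs, owl, and xsd are left out, and it does not state that the values of "knows" must be Languages (that would need a separate range or qualified restriction).

```python
def min_cardinality_axiom(cls: str, prop: str, minimum: int) -> str:
    """Render an OWL minimum-cardinality restriction on `prop` for `cls` as Turtle."""
    if minimum < 0:
        raise ValueError("cardinality must be a non-negative integer")
    return (
        f":{cls} rdfs:subClassOf [\n"
        f"    a owl:Restriction ;\n"
        f"    owl:onProperty :{prop} ;\n"
        f'    owl:minCardinality "{minimum}"^^xsd:nonNegativeInteger\n'
        f"] ."
    )


# Every Person has at least one value for the "knows" property.
print(min_cardinality_axiom("Person", "knows", 1))
```

An upper bound would be expressed the same way with owl:maxCardinality.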