I am new to SANSA-Stack and I am using a SPARQL query to perform some operations on a triples RDD. I SELECT specific variables, but in the resulting DataFrame the column names are changed to unexpected, seemingly arbitrary values.
val query = s""" PREFIX ns0: <https://www.example.com/discovery/catalog/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ColumnRef
WHERE
{
{<https://www.example.com/db/h2/fred/2020/table/FRED.FRED.US_REGIONS> ns0:column ?ColumnRef .}
}
"""
val result: org.apache.spark.sql.DataFrame = triples.sparql(query)
result.show()
In the output of result.show(), the column is named o instead of ColumnRef:
+--------------------+
| o|
+--------------------+
|https://www.examp...|
|https://www.examp...|
|https://www.examp...|
|https://www.examp...|
+--------------------+
I am new to this technology stack, please let me know what I am doing wrong.
Here is a temporary workaround that works for my purposes and returns a DataFrame with the expected column names. By decomposing rdd.sparql into its individual steps, you can use the rewrite object to obtain the column mappings: https://gist.github.com/JNKHunter/c16caa882993facb31a273ec274cb8e3
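If the result happens to contain exactly one column per selected variable, in SELECT order, a plain positional rename with Spark's Dataset.toDF(colNames: String*) may be enough, with no need for the mappings from the gist. The sketch below assumes spark-sql is on the classpath. It fakes the SANSA result with a hand-built DataFrame whose single column is named o, and the IRIs in it are made up. renameToSelectVars is a hypothetical helper, not part of SANSA.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not a SANSA API): positionally renames the columns of a
// SPARQL result DataFrame to the SELECT variable names. Only safe when there is
// exactly one column per selected variable, in the same order as the SELECT clause.
def renameToSelectVars(df: DataFrame, selectVars: Seq[String]): DataFrame = {
  require(
    df.columns.length == selectVars.length,
    s"expected ${selectVars.length} column(s) but got: ${df.columns.mkString(", ")}"
  )
  df.toDF(selectVars: _*)
}

val spark = SparkSession.builder().master("local[1]").appName("rename-demo").getOrCreate()
import spark.implicits._

// Stand-in for triples.sparql(query): a single column named "o" (made-up IRIs)
val result = Seq(
  "https://www.example.com/col/1",
  "https://www.example.com/col/2"
).toDF("o")

val renamed = renameToSelectVars(result, Seq("ColumnRef"))
renamed.show(false)
```

The require guard fails loudly instead of silently mislabeling columns when SANSA returns extra columns, which, as the warning notes, it often does.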
Warning: SANSA queries often return more than one column for each selected SPARQL variable. In those cases this code simply concatenates those columns, since the columns after the first tend to contain empty strings. Concatenation may not be what you want, and I have yet to discover what those empty-string columns represent.
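The concatenation described in the warning could look roughly like the sketch below. It assumes you already have an ordered mapping from each SPARQL variable to the generated columns that belong to it. Obtaining that mapping from SANSA's rewrite object is what the gist does, and that part is not reproduced here. The mapping, the column names o and o_1, and the helper collapseByVariable are all hypothetical, and spark-sql must be on the classpath. concat_ws("", ...) is used instead of concat because concat returns null as soon as any input is null, whereas concat_ws skips nulls.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws}

// Hypothetical helper (not a SANSA API): for each SPARQL variable, concatenates all
// generated columns mapped to it into one column named after the variable.
// A Seq of pairs is used instead of a Map so the output column order is predictable.
def collapseByVariable(df: DataFrame, mapping: Seq[(String, Seq[String])]): DataFrame = {
  val collapsed: Seq[Column] = mapping.map { case (sparqlVar, generatedCols) =>
    // concat_ws skips nulls; empty strings add nothing to the concatenated value
    concat_ws("", generatedCols.map(c => col(c)): _*).alias(sparqlVar)
  }
  df.select(collapsed: _*)
}

val spark = SparkSession.builder().master("local[1]").appName("collapse-demo").getOrCreate()
import spark.implicits._

// Made-up stand-in for a SANSA result: the value sits in "o", the extra column holds ""
val result = Seq(
  ("https://www.example.com/col/1", ""),
  ("https://www.example.com/col/2", "")
).toDF("o", "o_1")

val mapping = Seq("ColumnRef" -> Seq("o", "o_1"))
val collapsed = collapseByVariable(result, mapping)
collapsed.show(false)
```

If the extra columns ever turn out to carry meaningful values (for example datatype or language information), concatenating them will corrupt the result, so it is worth inspecting them before relying on this.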