For example, I have a SQL query
SELECT id, "delta.`/example/table/path`" FROM delta.`/example/table/path` WHERE str LIKE "%delta.`/example/table/path`"
How can I only replace the occurrences of
FROM delta.`/example/table/path`
to
FROM new_table_name
By walking down the logical plan tree lp
generated by
val plan = spark.sessionState.sqlParser.parsePlan(query)
var lp = spark.sessionState.analyzer
.executeAndCheck(plan, new QueryPlanningTracker)
I know that all table are represented by LogicalRelation
node and from there, I lost direction.
Project [ID#1619]
+- Join Inner, (fk#1620 = ID#1643)
:- SubqueryAlias T1
: +- SubqueryAlias spark_catalog.delta.`T1`
: +- Relation [ID#1619,fk#1620] parquet
+- SubqueryAlias T2
+- SubqueryAlias spark_catalog.delta.`T2`
+- Relation [ID#1643,amount#1644,cost#1645] parquet
Does anyone have suggestion on how can I only replace the table name? Thank you in advance!
I have tried to retrieve all the table names by walking down the tree
def getTableNamesOnlyWrapper(root: LogicalPlan): Set[String] = {
var nameSet: Set[String] = Set()
def getTableNamesOnly(root: LogicalPlan, prevAlias: String): Unit = {
root match {
case a: SubqueryAlias => {
getTableNamesOnly(a.child, a.alias)
}
case r: LogicalRelation => {
val name = r.catalogTable.map(_.identifier.unquotedString).getOrElse(prevAlias)
nameSet += name
}
case p: LogicalPlan =>
for (c <- p.children) {
getTableNamesOnly(c, prevAlias)
}
}
}
getTableNamesOnly(root, "")
println("kebing name set is: " + nameSet)
nameSet
}
And then match the table in the original SQL and replace them with new_table_name
. However, this doesn't handle the case where the same table name appear as string literals.
SELECT id, "delta.`/example/table/path`" FROM delta.`/example/table/path` WHERE str LIKE "%delta.`/example/table/path`"
String literals appear as an instance of the Literal expression.
In pattern matching you can use StringLiteral to simplify.
There are many combinations of structure in the Spark plans that aren't easy to predict and may change in each release. To navigate / change things you should prefer the in-built operations on QueryPlan/TreeNode (transformAllExpressionsWithSubqueries for literals seems appropriate and transformWithSubqueries to wrap).
It's often best to look at the source code for examples of transformations, the query optimisations are great for this kind of insight, CTESubstitution looks pretty complete for dealing with cte's, subqueries, aliases etc.