I know how to compare two lists in Scala using zip + forall.
My question is: how do we compare two DataFrame schemas? That is, we want to match column names together with their nullable property.
My idea is to use a hash map to store {column name: nullable} and do the comparison. I guess it works, but is there a more idiomatic way?
First you should retrieve the elements you want to compare, as Tom Lous said in his answer:
val s1 = df1.schema.fields.map(f => (f.name, f.nullable))
val s2 = df2.schema.fields.map(f => (f.name, f.nullable))
Then you can use the diff method (available on arrays and other Scala sequences), which returns the elements of s1 that are not in s2. If it returns an empty collection, no difference was found; otherwise there is one:
s1.diff(s2).isEmpty
This returns true if no difference was found, false otherwise.
Note that s1.diff(s2) also reports no difference when a field is present in s2 but not in s1, because it only returns the elements of s1 that are missing from s2. So you may need to add a second condition that compares lengths:
s1.diff(s2).isEmpty && s1.length == s2.length
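To see why the length check matters, here is a minimal sketch that needs no Spark session. The (name, nullable) pairs are made-up stand-ins for what df.schema.fields.map(f => (f.name, f.nullable)) would produce, and the object and helper names (SchemaDiffSketch, sameFields) are hypothetical, chosen only for illustration:

```scala
object SchemaDiffSketch {
  // Hypothetical (column name, nullable) pairs standing in for real schema fields
  val s1: Seq[(String, Boolean)] = Seq(("id", false), ("name", true))
  val s2: Seq[(String, Boolean)] = Seq(("id", false), ("name", true), ("age", true))

  // diff is one-directional: it keeps the elements of the left side
  // that do not appear on the right side
  val leftOnly: Seq[(String, Boolean)] = s1.diff(s2)  // empty, although s2 has an extra field
  val rightOnly: Seq[(String, Boolean)] = s2.diff(s1) // contains only ("age", true)

  // Same check as in the answer: empty diff plus equal lengths
  def sameFields(a: Seq[(String, Boolean)], b: Seq[(String, Boolean)]): Boolean =
    a.diff(b).isEmpty && a.length == b.length
}
```

With these values, sameFields(s1, s2) is false because of the extra column, while sameFields(s1, s1.reverse) is true. So this check ignores column order; if order matters to you, comparing the two sequences directly with == may be the simpler option.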