I have a procedure that makes multiple calls to an API. Each call uses a token from the returned JSON, which I pass back to a function that calls the API again to get the next "paginated" file.
This all works fine until the final call, because my statement expects a column (JSON value) that no longer exists once the end of the paginated collection is reached.
How can I test for the existence of the field before I attempt a dataframe.select on a column that is not returned, which makes my procedure fail?
Schema Example
root
|-- d: struct (nullable = true)
| |-- __next: string (nullable = true)
| |-- results: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- __metadata: struct (nullable = true)
| | | | |-- type: string (nullable = true)
| | | | |-- uri: string (nullable = true)
| | | |-- assignmentClass: string (nullable = true)
| | | |-- assignmentIdExternal: string (nullable = true)
| | | |-- compInfoNav: struct (nullable = true)
My code
df = df.select(col('d.__next').alias("nexttoken"), explode(col('d.results')).alias("result"))
Essentially, at some point during the loop the __next value disappears, but I still run this code, so it cannot find the column and errors.
Any help would be appreciated.
Since you want to check for the existence of the __next field before using DataFrame.select(), you can use the following code. It works specifically for the schema that you have provided.
from pyspark.sql.functions import col, explode
d_fields = df.schema['d'].dataType.fieldNames()
# d_fields is a list of strings
# In your case, d_fields is ['__next', 'results']
if '__next' in d_fields:
    df = df.select(col('d.__next').alias("nexttoken"), explode(col('d.results')).alias("result"))
df.schema['d'].dataType.fieldNames() returns a list of all the fields present in the d struct column, so you can use an if statement to check whether '__next' is in this list. At some point in the loop, when the d.__next field is no longer available, the condition evaluates to False, the select is skipped, and no error is thrown.
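If you need the same check for fields nested at other depths, one possible generalization is sketched below. It is only an illustration under some assumptions: the helper name has_nested_field is hypothetical, and it walks the plain-dict form of the schema (the shape that df.schema.jsonValue() produces) so that it can be tried without a Spark session. The sample dict is a hand-written, trimmed stand-in for the schema of the last page, not real output.

```python
def has_nested_field(schema_json, path):
    """Return True if the dotted `path` names a field in a Spark schema
    given in the dict form produced by `df.schema.jsonValue()`."""
    node = schema_json
    for name in path.split("."):
        # Only struct nodes have named child fields; simple types are plain strings.
        if not isinstance(node, dict) or node.get("type") != "struct":
            return False
        match = next((f for f in node["fields"] if f["name"] == name), None)
        if match is None:
            return False
        node = match["type"]
    return True


# Hand-written stand-in for df.schema.jsonValue() on the last page,
# where d.__next is absent (trimmed to the relevant fields).
last_page_schema = {
    "type": "struct",
    "fields": [
        {"name": "d", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "results", "nullable": True, "metadata": {},
                 "type": {"type": "array", "containsNull": True,
                          "elementType": "string"}},
            ],
        }},
    ],
}

print(has_nested_field(last_page_schema, "d.__next"))   # False
print(has_nested_field(last_page_schema, "d.results"))  # True
```

In the loop this would be called as has_nested_field(df.schema.jsonValue(), "d.__next"). Note that on the last page you probably still want to explode d.results even though there is no next token, so an else branch that selects only the results may be needed.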