Tags: java, apache-spark, apache-spark-sql, spark-java

How to get index/position of column in dataframe (Spark sql Java)


I am using Spark with Java (not Scala or Python).

I have to change my code so that my Spark query selects all columns rather than a specific set of columns (i.e. using select *). Before, when I selected a specific set of columns, it was easy to know the exact position/index of each column because they came back in the order of my select. Now that I am selecting everything, I no longer know the exact order.

I need the position/index of particular columns so that I can use .isNullAt(), which requires a position/index rather than the string column name.

I am wondering: does dataframe.columns() give me an array whose indexes/positions are exactly the ones expected by the dataframe methods that take an index/position? If so, I could search that array for my string column name to get back the correct index.
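
To make it concrete, here is a minimal sketch of what I have in mind (the inline table and the column name my_column_name are just placeholders for my real data):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ColumnIndexLookup {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("column-index-lookup")
                    .master("local[*]")
                    .getOrCreate();

            // Inline table standing in for a real "select *" query.
            Dataset<Row> df = spark.sql(
                    "SELECT * FROM VALUES (1, 'a'), (2, CAST(NULL AS STRING)) AS t(id, my_column_name)");

            // columns() returns the column names in schema order, which
            // matches the positions that Row accessors such as isNullAt() use.
            List<String> columns = Arrays.asList(df.columns());
            int idx = columns.indexOf("my_column_name");
            System.out.println("my_column_name is at index " + idx); // prints 1

            spark.stop();
        }
    }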


Solution

  • From your question I'm guessing you're trying to get the index of a field in a row so you can check nullity.

    Indeed, you could use ds.columns(), since it returns the columns in order, and take the index from there.

    Nevertheless, I would advise a different approach, since it keeps the logic inside the row processing and is more robust. You can use .fieldIndex(String fieldName) (see the full sketch below):

    row.isNullAt(row.fieldIndex("my_column_name"))
    

    For more detail, see the Javadoc: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Row.html#fieldIndex(java.lang.String)
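
    For completeness, here is a minimal self-contained sketch of fieldIndex used during row processing; the inline table and the column name my_column_name are hypothetical stand-ins for your real query:

        import org.apache.spark.api.java.function.ForeachFunction;
        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;

        public class FieldIndexExample {
            public static void main(String[] args) {
                SparkSession spark = SparkSession.builder()
                        .appName("field-index-example")
                        .master("local[*]")
                        .getOrCreate();

                // Inline table standing in for the real "select *" query.
                Dataset<Row> df = spark.sql(
                        "SELECT * FROM VALUES (1, 'a'), (2, CAST(NULL AS STRING)) AS t(id, my_column_name)");

                // The cast disambiguates the Java ForeachFunction overload
                // from the Scala foreach overload.
                df.foreach((ForeachFunction<Row>) row -> {
                    // fieldIndex resolves the position from the row's own schema,
                    // so it stays correct no matter what order "select *" yields.
                    if (row.isNullAt(row.fieldIndex("my_column_name"))) {
                        System.out.println("Row with id=" + row.getInt(0)
                                + " has a null my_column_name");
                    }
                });

                spark.stop();
            }
        }

    Because the lookup happens per row from the row's own schema, this keeps working even if upstream code changes which columns the query returns or in what order.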