Tags: python, apache-spark, pyspark, row

PySpark: create a Row with non-alphanumeric characters in a field name


Is there a way in PySpark to create a Row whose field names contain non-alphanumeric characters?

E.g.

from pyspark.sql import Row
Row(my-field='myvalue')  # SyntaxError: my-field is not a valid Python identifier
Row(**{'my-field': 'myvalue'})  # I was expecting this workaround to work, but
# it gives "TypeError: Can not infer schema for type: <class 'str'>"

Solution

  • It is possible:

    >>> from pyspark.sql import Row
    >>> P = Row("foo-bar", "date")  # use it as a class factory
    >>> P("a", "b")
    Row(foo-bar='a', date='b')
    

    Mind you, not every serialization format (e.g. Parquet, ORC) will deal properly with certain special characters in column names. It is safer to stick to letters, digits, and underscores.