I have the following PySpark DataFrame df:
df.printSchema()
|-- yearday: integer (nullable = true)
|-- month: integer (nullable = true)
|-- dayofweek: integer (nullable = true)
|-- year: integer (nullable = true)
When I apply VectorAssembler, the features are converted into string values instead of the original integer values.
from pyspark.ml.feature import VectorAssembler
vectorAssembler = VectorAssembler(inputCols=['yearday', 'month', 'dayofweek', 'year'], outputCol='features')
df = vectorAssembler.transform(df)
df.select(['features']).show()
This is what the output looks like:
How can I get integers in features?
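I suspect this is only a display effect: the values are not strings, show() just renders each vector as text. Note that Spark ML vectors always store their elements as doubles, so an integer input such as 256 will appear as 256.0. Try the code below to confirm what type the vectors contain.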
from pyspark.ml.param import TypeConverters
print(TypeConverters.toList(df.select('features').take(1)[0][0]))