I'm trying to run PCA on a matrix of n unnamed columns of doubles (the CSV file has no header row). My code is:
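SparkSession spark = SparkSession
    .builder()
    .appName("PCAExample")   // placeholder app name
    .getOrCreate();
Dataset<Row> data = spark.read().format("csv")
    .option("sep", ",")
    .option("inferSchema", "true")
    .option("header", "false")
    .load("data.csv");       // placeholder path
PCAModel pca = new PCA()
    .setInputCol("features")
    .setOutputCol("pcaFeatures")
    .setK(3)                 // placeholder number of components
    .fit(data);
Dataset<Row> result = pca.transform(data).select("pcaFeatures");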
Running this throws "java.lang.IllegalArgumentException: Field "features" does not exist." I've found the posts "How to merge multiple feature vectors in DataFrame?" and "How to work with Java Apache Spark MLlib when DataFrame has columns?", which led me to the VectorAssembler docs: https://spark.apache.org/docs/latest/ml-features.html#vectorassembler
In each of those examples, the feature columns are passed to the assembler by name, one at a time. I haven't been able to figure out how to use VectorAssembler to turn all n of my unnamed columns into features. Any insight would be appreciated.
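For context on the exception: when the header option is false, Spark's CSV reader names the columns _c0, _c1, and so on, in order, so the loaded Dataset has no column called "features" for PCA to read. The plain-Java sketch below only mimics that name lookup to show why it fails; HeaderlessCsvColumns, defaultNames and requireField are hypothetical names, not Spark API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration (not Spark code): a headerless CSV gets the
// column names _c0, _c1, ..., so looking up "features" finds nothing.
final class HeaderlessCsvColumns {

    // Hypothetical helper: the names assigned to n unnamed CSV columns.
    static List<String> defaultNames(int n) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            names.add("_c" + i);
        }
        return names;
    }

    // Hypothetical helper: fails with the same message quoted in the question.
    static void requireField(List<String> schema, String field) {
        if (!schema.contains(field)) {
            throw new IllegalArgumentException("Field \"" + field + "\" does not exist.");
        }
    }

    public static void main(String[] args) {
        List<String> schema = defaultNames(3);
        System.out.println(schema);        // [_c0, _c1, _c2]
        requireField(schema, "_c0");       // present, no exception
        try {
            requireField(schema, "features");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Field "features" does not exist.
        }
    }
}
```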
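Update: I found the columns() method, which returns every column name of the Dataset as a String[]. Passing that array to VectorAssembler.setInputCols assembles all n columns into a single "features" vector column that PCA can use: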
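SparkSession spark = SparkSession.builder().appName("PCAExample").getOrCreate();
Dataset<Row> data = spark.read().format("csv")
    .option("sep", ",")
    .option("inferSchema", "true")
    .option("header", "false")
    .load("data.csv");  // placeholder path
// Assemble every column name returned by columns() into one vector column
VectorAssembler assembler = new VectorAssembler()
    .setInputCols(data.columns())
    .setOutputCol("features");
Dataset<Row> output = assembler.transform(data);
PCAModel pca = new PCA()
    .setInputCol("features").setOutputCol("pcaFeatures")
    .setK(3)  // placeholder number of components
    .fit(output);
Dataset<Row> result = pca.transform(output).select("pcaFeatures");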
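What the assembler step produces can be illustrated without Spark. The sketch below is a plain-Java approximation, not Spark's implementation: each row is assumed to be a map from column name to double, and the hypothetical assemble helper packs the values of the listed columns into one array in the order given, which is roughly what passing data.columns() to setInputCols does for every row (Spark itself may store the result as a sparse vector).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (not Spark's implementation) of assembling
// every column: a row's n separate doubles become one n-element vector.
final class AssembleAllColumns {

    // Hypothetical helper: pack the listed columns of one row, in list order.
    static double[] assemble(List<String> inputCols, Map<String, Double> row) {
        double[] features = new double[inputCols.size()];
        for (int i = 0; i < features.length; i++) {
            Double value = row.get(inputCols.get(i));
            if (value == null) {
                // This sketch simply rejects missing values
                throw new IllegalArgumentException("No value for column " + inputCols.get(i));
            }
            features[i] = value;
        }
        return features;
    }

    public static void main(String[] args) {
        // One row of a headerless CSV line "1.0,2.5,4.0" after schema inference
        Map<String, Double> row = new LinkedHashMap<>();
        row.put("_c0", 1.0);
        row.put("_c1", 2.5);
        row.put("_c2", 4.0);
        // Analogue of data.columns(): all column names, in schema order
        List<String> allColumns = new ArrayList<>(row.keySet());
        System.out.println(Arrays.toString(assemble(allColumns, row))); // [1.0, 2.5, 4.0]
    }
}
```

Because the names come back in schema order, entry i of each vector corresponds to column _c{i}, and PCA then sees every row as a single n-dimensional point.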