java apache-spark dataset filtering apache-spark-dataset

How to filter a column and delete a row in a Spark Dataset using Java


I need to filter a Dataset for special characters and remove every row where one is found. I tried replacing the special characters with " ", but that didn't work either.

Dataset<Row> dataset;
dataset.withColumn("nameColumn", functions.regexp_replace(dataset.col("nameColumn"), "[^\\p{ASCII}]", ""));

Solution

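  • You can just filter them out. In Java, negate the condition with functions.not, because the ! operator cannot be applied to a Column. Both where and withColumn return a new Dataset instead of changing the original, so keep the result:

    
    Dataset<Row> filteredDs = dataset.where(functions.not(functions.col("nameColumn").rlike("[^\\p{ASCII}]")));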
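
To see which values that pattern would keep or drop, here is a minimal plain-Java sketch that applies the same regex with java.util.regex, without Spark. The class name AsciiRowFilter, its two methods, and the sample names are hypothetical and only for illustration. It assumes rlike matches anywhere in the string, which is what Matcher.find() does here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the Spark API.
final class AsciiRowFilter {
    // Same pattern as in the answer: any single character outside the ASCII range.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // True when the value contains at least one non-ASCII character.
    // find() looks for a match anywhere in the string.
    static boolean hasNonAscii(String value) {
        return NON_ASCII.matcher(value).find();
    }

    // Keeps only the values that consist entirely of ASCII characters.
    static List<String> keepAsciiOnly(List<String> values) {
        return values.stream()
                .filter(v -> !hasNonAscii(v))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data (made up): two names contain accented characters.
        List<String> names = Arrays.asList("Alice", "Jos\u00e9", "Bob", "na\u00efve");
        System.out.println(keepAsciiOnly(names)); // prints [Alice, Bob]
    }
}
```

Note that ASCII punctuation such as @ or # is not matched by this pattern, so a value like "a@b" would be kept. If "special characters" should include those as well, the character class would have to be changed accordingly.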