Find and remove non-printing characters from data

I was expecting both the records to return in case of following query. But I got back only second record.

WITH students_results(student_id, result) AS (VALUES
(1,  '️सिनेमा'  ),
(2, 'सिनेमा'))
SELECT
    student_id,
    result
FROM students_results
where result = 'सिनेमा'

The first record has a non printing unicode at the beginning \ufe0f

How do I find and remove such characters from data?

Solution

This is how you find such data.

WITH students_results(student_id, result) AS (VALUES
 (1, '️सिनेमा'),
 (2, 'सिनेमा')
)
SELECT student_id
FROM students_results
WHERE char2hexint(result) like '%FE0F%'

As for removing it, this is not quite what Athena is for. As you will probably be aware, it's an analytics, not an ETL service. I would advise to cleanse your data using AWS Glue, or even locally if it's applicable to your use case.