We have a Samba server that backs up to an S3 bucket. It turns out that a large number of file names contain inappropriate characters, and the AWS CLI won't transfer those files. Using the "worst offender", I built a quick regex, tested in Rubular against another file name, to try to generate a list of files that need to be fixed:
([ä¸æ–‡ç½‘页我们的团队å™é¹â€“¦]+)
The command I'm running is:
find . -regextype awk -regex ".*/([ä¸æ–‡ç½‘页我们的团队å™é¹â€“¦]+)"
This returns only a small list of files that contain the above string in order, not files that contain the individual characters anywhere in the name. This leads me to believe that either my regextype is incorrect or something is wrong with the formatting of the character list. I've also tried the emacs and egrep types, as they seem most similar to the regex I've used outside of a Unix environment, with no luck.
My test file name is: this-is-my€™s'-test-_ folder-name.
which, according to my Rubular tests, should be returned but isn't. Any help would be greatly appreciated.
Your regex .*/([ä¸æ–‡ç½‘页我们的团队å™é¹â€“¦]+)
has to match the whole path (that is how find's -regex works), so everything after a slash must consist solely of the listed characters. Your test file name starts with ordinary characters, so it isn't matched.
You might try something more like .*[ä¸æ–‡ç½‘页我们的团队å™é¹â€“¦]+.*
instead.
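To see the difference between the two pattern shapes, here is a minimal sketch. It assumes GNU find (for -regextype) and uses a shortened, hypothetical character set containing only "€" and "™" plus made-up file names in a scratch directory. Results with multibyte characters can also depend on your locale, so treat it as an illustration, not a drop-in command.

```shell
# Scratch directory with one clean name and one name containing "€" and "™".
# Both file names and the two-character set are illustrative assumptions.
tmp=$(mktemp -d)
touch "$tmp/clean-name" "$tmp/this-is-my€™s-test"

# Original shape: everything after a slash must come from the set,
# so a name that starts with ordinary characters is not matched.
anchored=$(find "$tmp" -type f -regextype awk -regex '.*/[€™]+')

# Suggested shape: characters from the set may appear anywhere in the path.
anywhere=$(find "$tmp" -type f -regextype awk -regex '.*[€™]+.*')

printf 'anchored: %s\nanywhere: %s\n' "$anchored" "$anywhere"
rm -rf "$tmp"
```

The first find should list nothing, while the second should list only the file whose name contains the special characters. Note that the second pattern matches against the whole path, so a parent directory containing one of the characters will cause every file beneath it to be listed as well.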