Is there a way to check if a string is alphanumeric in Erlang?

I am collecting tweets from Twitter using Erlang and I am trying to save only the hashtags to a database. However, when I convert the bitstrings to list-strings, all the tweets with non-Latin letters turn into strange symbols. Is there any way to check whether a string contains only alphanumeric characters in Erlang?

Solution

The easiest way is to use regular expressions.

StringAlphanum = "1234abcZXYM".
StringNotAlphanum = "1ZXYMÄ#kMp&?".

re:run(StringAlphanum, "^[0-9A-Za-z]+$").
>> {match,[{0,11}]}

re:run(StringNotAlphanum, "^[0-9A-Za-z]+$").
>> nomatch

You can easily make a function out of it...

%% true if the whole string matches [0-9A-Za-z]+, false otherwise.
isAlphaNum(String) ->
    case re:run(String, "^[0-9A-Za-z]+$") of
        {match, _} -> true;
        nomatch    -> false
    end.
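
If you would rather not depend on a regular expression, the same check can be sketched with plain guards. This is only a sketch: the module name alnum_sketch and the function names are made up for illustration. One difference to be aware of: in the regex above, $ by default also matches just before a trailing newline, so "abc\n" passes the regex but is rejected here.

```erlang
-module(alnum_sketch).
-export([is_alphanum_ascii/1]).

%% Hypothetical helper: true if String is a non-empty list that
%% contains only ASCII digits and letters.
is_alphanum_ascii([]) ->
    false;
is_alphanum_ascii(String) when is_list(String) ->
    lists:all(fun is_alnum_char/1, String).

%% Classify a single element of the list by its code point.
is_alnum_char(C) when C >= $0, C =< $9 -> true;
is_alnum_char(C) when C >= $A, C =< $Z -> true;
is_alnum_char(C) when C >= $a, C =< $z -> true;
is_alnum_char(_) -> false.
```

With the two strings from above, alnum_sketch:is_alphanum_ascii(StringAlphanum) gives true and alnum_sketch:is_alphanum_ascii(StringNotAlphanum) gives false.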

But in my opinion, the better way is to solve the underlying problem: the correct interpretation of Unicode binary strings.

If you want to represent Unicode characters correctly, do not use binary_to_list. Use the unicode module instead. A UTF-8 encoded binary cannot be interpreted naively byte by byte, because a single character may span several bytes. For example, the most significant bit of a byte tells you whether it belongs to a multi-byte character.
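
To make the bit-level point concrete, here is a small sketch that classifies a single byte by its leading bits, following the UTF-8 layout. The module name utf8_byte_sketch and the returned atoms are made up for illustration and are not part of any library.

```erlang
-module(utf8_byte_sketch).
-export([classify/1]).

%% Hypothetical helper: tell what role a byte (0..255) plays in UTF-8,
%% judging only by its leading bits.
classify(Byte) when is_integer(Byte), Byte >= 0, Byte =< 255 ->
    case <<Byte>> of
        <<2#0:1,     _:7>> -> ascii;          % 0xxxxxxx: one-byte character
        <<2#10:2,    _:6>> -> continuation;   % 10xxxxxx: inside a multi-byte character
        <<2#110:3,   _:5>> -> lead_2_bytes;   % 110xxxxx: starts a 2-byte character
        <<2#1110:4,  _:4>> -> lead_3_bytes;   % 1110xxxx: starts a 3-byte character
        <<2#11110:5, _:3>> -> lead_4_bytes;   % 11110xxx: starts a 4-byte character
        _                  -> invalid         % 11111xxx never occurs in UTF-8
    end.
```

For the two bytes 195 and 164, which together encode ä in UTF-8, classify(195) gives lead_2_bytes and classify(164) gives continuation, while classify(105) (the letter i) gives ascii.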

I took the following example from this site. Let's define a UTF-8 string:

Utf8String = <<195, 164, 105, 116, 105>>.

Interpreted naively as a list of bytes, it yields:

binary_to_list(Utf8String).
"Ã¤iti"

Interpreted with Unicode support:

unicode:characters_to_list(Utf8String, utf8).
"äiti"
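
Note that unicode:characters_to_list/2 does not always return a list: for input that is not valid UTF-8 it returns an {error, ...} tuple, and for input that is cut off in the middle of a character an {incomplete, ...} tuple. Since tweets come from outside, it may be worth handling those cases. A minimal sketch, where the module name hashtag_sketch and the function decode_hashtag/1 are made up for illustration:

```erlang
-module(hashtag_sketch).
-export([decode_hashtag/1]).

%% Hypothetical helper: decode a UTF-8 binary into a list of code points.
%% Returns {ok, Chars} on success and error for invalid or truncated input.
decode_hashtag(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            {ok, Chars};
        {error, _Decoded, _Rest} ->
            error;                      % Bin contains bytes that are not valid UTF-8
        {incomplete, _Decoded, _Rest} ->
            error                       % Bin ends in the middle of a character
    end.
```

For the binary from above, hashtag_sketch:decode_hashtag(Utf8String) gives {ok, "äiti"}, that is the code points [228,105,116,105]. A truncated binary such as <<195>> gives error.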