Problem trying to extract words from string in PHP

I'm trying to extract all words from a string into an array, but I am having some problems with non-breaking spaces (&nbsp;).

This is what I do:

//Clean data to text only
$data = strip_tags($data);
$data = htmlentities($data, ENT_QUOTES, 'UTF-8');
$data = html_entity_decode($data, ENT_QUOTES, 'UTF-8');
$data = htmlspecialchars_decode($data);
$data = mb_strtolower($data, 'UTF-8');

//Clean up text from special chrs I don't want as words
$data = str_replace(',', '', $data);
$data = str_replace('.', '', $data);
$data = str_replace(':', '', $data);
$data = str_replace(';', '', $data);
$data = str_replace('*', '', $data);
$data = str_replace('?', '', $data);
$data = str_replace('!', '', $data);
$data = str_replace('-', ' ', $data);
$data = str_replace("\n", ' ', $data);
$data = str_replace("\r", ' ', $data);
$data = str_replace("\t", ' ', $data);
$data = str_replace("\0", ' ', $data);
$data = str_replace("\x0B", ' ', $data);
$data = str_replace("&nbsp;", ' ', $data);

//Clean up duplicated spaces
do {
   $data = str_replace('  ', ' ', $data);
} while(strpos($data, '  ') !== false);

//Make array
$clean_data = explode(' ', $data);

echo "<pre>";
var_dump($clean_data);
echo "</pre>";

This outputs:

array(58) {
  [0]=>
  string(5) " "
  [1]=>
  string(5) " "
  [2]=>
  string(11) "anläggning"
  [3]=>
  string(3) "med"
  [4]=>
  string(3) "den"
  [5]=>
  string(10) "erfarenhet"
  [6]=>
  string(3) "som"
}

If I check the source of the output, I see that the first two array values are &nbsp;.
No matter how I try, I can't remove this from the string. Any ideas?

UPDATE:
After some tweaking of the code, I managed to get the following output:

array(56) {
  [0]=>
  string(1) "�" //Notice the change: the string length is now 1 instead of 5, but it's still garbage.
  [1]=>
  string(1) "�"
  [2]=>
  string(11) "anläggning"
  [3]=>
  string(3) "med"
  [4]=>
  string(3) "den"
  [5]=>
  string(10) "erfarenhet"
  [6]=>
  string(3) "som"
  [7]=>
  string(5) "finns"
  [8]=>
  string(4) "inom"

Thanks!

ANSWER (for lazy people):

Even though this is a slightly different approach to the problem, and it never really explains why I had the problems described above (like the leftover &nbsp; and other weird extra spaces), I like it, and it is a lot better than my original code.

Thanks to all who contributed to this!

//Clean data to text only
$data = strip_tags($data);
$data = html_entity_decode($data, ENT_QUOTES, 'UTF-8');
$data = htmlspecialchars_decode($data);
$data = mb_strtolower($data, 'UTF-8');

//Clean up text from special chrs
$data = str_replace(array("-"), ' ', $data);    

$clean_data = str_word_count($data, 1, 'äöå');

echo "<pre>";
var_dump($clean_data);
echo "</pre>";
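
As far as I know, str_word_count works on bytes rather than on UTF-8 characters, so passing 'äöå' only happens to work because it whitelists the bytes those letters are made of. A minimal sketch of a Unicode-aware alternative, assuming PHP 7.0 or later, UTF-8 input, and the mbstring extension, is to split on anything that is not a letter or a digit. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample text; it starts with a U+00A0 no-break space.
$data = "\u{00A0} Anläggning med den erfarenhet, som finns-inom!";

$data = mb_strtolower($data, 'UTF-8');

// Split on every run of characters that is neither a Unicode letter (\p{L})
// nor a Unicode digit (\p{N}). The /u modifier makes PCRE treat the pattern
// and the subject as UTF-8. PREG_SPLIT_NO_EMPTY drops empty pieces.
$words = preg_split('/[^\p{L}\p{N}]+/u', $data, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

Unlike str_word_count, this keeps digits and splits on apostrophes and hyphens, so it is not a drop-in replacement. preg_split returns false when the input is not valid UTF-8, which is worth checking for with real data.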

Solution

Ok, the only thing you have to do is replace &nbsp; with a space, as you already do (but only if the string really still contains &nbsp;; check @Andy E's answer to make sure that your data does not contain any HTML entities):

$data = str_replace("&nbsp;", ' ', $data);
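
One caveat, shown here as a minimal sketch: once html_entity_decode has run with 'UTF-8', the entity is no longer the text &nbsp; but the single character U+00A0, which a search for "&nbsp;" cannot match. The snippet assumes PHP 7.0 or later (for the "\u{00A0}" escape) and UTF-8 data; the sample string is made up for illustration.

```php
<?php
// Hypothetical input containing both an entity and a literal U+00A0 character.
$data = "&nbsp;anl&auml;ggning\u{00A0}med den";

// Decode entities first, so "&nbsp;" becomes the U+00A0 character.
$data = html_entity_decode($data, ENT_QUOTES, 'UTF-8');

// Then replace the character itself (bytes C2 A0 in UTF-8) with a plain space.
$data = str_replace("\u{00A0}", ' ', $data);

// Remove the plain space left over at the start of the string.
$data = trim($data);

var_dump($data);
```

On PHP versions before 7.0 the same character can be written as "\xC2\xA0".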

Then you can use str_word_count to get the words:

$words = str_word_count($data, 1, 'äöåÄÖÅ');

P.S.: What is the point of calling htmlentities first and then reverting it with html_entity_decode anyway?
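
To illustrate the point, here is a small sketch, assuming valid UTF-8 input and the same flags on both calls. The round trip should hand back exactly the string that went in, so a literal &nbsp; in the data survives it untouched. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample with a literal "&nbsp;" entity, quotes, a tag-like
// token, and a non-ASCII letter.
$original = "&nbsp;anläggning 'med' <den>";

// Encode: every special character becomes an entity, e.g. "&" becomes "&amp;".
$encoded = htmlentities($original, ENT_QUOTES, 'UTF-8');

// Decode: the entities created above are turned back into characters.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// Compare the round-tripped string with the input.
var_dump($decoded === $original);
```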

Update: Example:

$str = '      anläggning med den      erfahrenhet som åååÅ ÅÅ';
print_r(str_word_count($str, 1, 'äöåÄÖÅ'));

prints

Array
(
    [0] => anläggning
    [1] => med
    [2] => den
    [3] => erfahrenhet
    [4] => som
    [5] => åååÅ
    [6] => ÅÅ
)

Reading documentation helps :)