I'm trying to parse minimal mark-up text by lines. Currently I have a for loop that parses letter by letter. See the code below:
Text:
<element id="myE">
This is some text that
represents accurately the way I
have written my html
file.
</element>
code:
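var list = document.getElementById("myE").innerHTML;
var tallie = 0;
// Walk the string one character at a time.
for (var i = 0; i < list.length; i++) {
  if (/*list[i] == " "*/ true) {
    tallie += 1; // count each character that passes the test
    console.log(list[i]);
  }
}
console.log(tallie);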
As expected, the text embedded in the element renders in the DOM as though it were a continuous, properly formatted string. But what I'm finding is that the console recognizes the difference between a non-breaking space and a new line: it prints " " for the first and a pair of quotes split across two lines for the second.
Since the console appears to know the difference, it seems there should be a way to test for it. If you uncomment the commented-out condition, the loop will start testing for non-breaking spaces. I think there is another way to do this using the character encoding string (not &nbsp;, another one). It seems reasonable then to expect to be able to find a character code for a breaking space. Unfortunately, I cannot find one.
Long story short, how can I achieve true line-by-line parsing of an HTML file?
Newline characters are encoded as \n. Sometimes you will also find the combination of carriage return and newline, \r\n (see Wikipedia on Newline). These should not be confused with a non-breaking space, written &nbsp; in HTML, which is used when you want the browser to display a space without word wrapping at that point, or when you want the browser not to collapse multiple spaces together.
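To make the distinction concrete, here is a minimal sketch of line-by-line parsing, assuming the text has already been read into a string (in the browser it would come from document.getElementById("myE").innerHTML; a literal stands in for it here). The helper names splitLines and describeWhitespace are made up for illustration and are not part of any library.

```javascript
// Hypothetical helper: split text into lines on either \n or \r\n.
function splitLines(text) {
  return text.split(/\r?\n/);
}

// Hypothetical helper: name a whitespace character by its character code.
function describeWhitespace(ch) {
  switch (ch.charCodeAt(0)) {
    case 10: return "newline";            // \n
    case 13: return "carriage return";    // \r
    case 32: return "space";              // ordinary space
    case 160: return "non-breaking space"; // \u00A0, i.e. &nbsp;
    default: return "other";
  }
}

// Stand-in for the element's innerHTML (assumed content, mirroring the question).
const sample =
  "\nThis is some text that\nrepresents accurately the way I\nhave written my html\nfile.\n";

// Drop the empty lines produced by the leading and trailing line breaks.
const lines = splitLines(sample).filter((line) => line.trim() !== "");

lines.forEach((line, index) => console.log(index + 1, line));
```

Inside the original character loop, the same test is simply list[i] === "\n". A regular space has character code 32 and a non-breaking space has code 160, so the three can be told apart with charCodeAt.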