Trying to get this charset:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
When I print the response headers:
console.log(response.headers);
I don't get the charset; it should be inside the Content-Type:
{
server: 'Apache',
'content-type': 'text/html',
expires: 'Mon, 19 Jan 2015 11:53:58 GMT',
'content-language': 'en', etag: '"95c66e83dfd2080ec86ec4e20964788d"',
'x-pal-host': 'pal115.telhc.bbc.co.uk:80',
'content-length': '120599',
date: 'Mon, 19 Jan 2015 11:53:44 GMT', connection: 'keep-alive',
...
}
How can I get the charset on HTML 4 websites in Node.js? Thanks in advance.
<meta> tags are not headers, so their values will not show up in the HTTP response's headers property. You'll need to parse the response body. This does raise an issue: how do you know how to parse something without knowing its encoding?
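One way around the chicken-and-egg problem is to keep the body as raw bytes and scan only the start of it as latin1, which maps every byte to exactly one character and so never fails to decode. The sketch below assumes the body is already available as a Buffer and that the page uses an ASCII-compatible encoding; charsetFromMeta and the 1024-byte limit are hypothetical choices for illustration, not part of any Node.js API.

```javascript
// Hypothetical helper: look for a charset declared in a <meta> tag.
// `body` is assumed to be a Buffer holding the undecoded response body.
function charsetFromMeta(body) {
  // latin1 is a byte-for-byte mapping, so ASCII markup such as <meta ...>
  // survives even though the real encoding is not known yet.
  const head = body.subarray(0, 1024).toString('latin1');

  // Inspect each <meta ...> tag in turn and return the first charset found.
  for (const tag of head.match(/<meta\b[^>]*>/gi) || []) {
    // Covers both <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    const m = /charset\s*=\s*["']?\s*([\w.:-]+)/i.exec(tag);
    if (m) return m[1].toLowerCase();
  }
  return null; // no declaration in the scanned prefix
}
```

A regular expression is only an approximation of real HTML parsing (it will happily match a tag inside a comment, for example), but it is often enough for picking a decoder.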
This is roughly how web browsers handle files that don't properly define their character set in the Content-Type header, last time I checked:
1. [...] UTF-8.
2. [...] UTF-8 but they happen to be commonly used in Shift JIS, then you're probably dealing with that.
3. [...] <meta http-equiv="Content-Type"> [...] <meta charset=""> [...]
You can see why it's a good idea to always include the Content-Type header with a character set. For your application, you could leave out step 2 if you're not too worried about some documents being garbled.
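As a rough sketch of that simplified approach (no statistical guessing), the function below checks a byte order mark, then the Content-Type header, then a <meta> declaration, and finally falls back to a default. The order of checks, the helper names, and the 'utf-8' default are assumptions made for this example rather than a description of what any particular browser does.

```javascript
// Hypothetical helper: map a byte order mark to an encoding name.
function charsetFromBom(body) {
  if (body.length >= 3 && body[0] === 0xef && body[1] === 0xbb && body[2] === 0xbf) return 'utf-8';
  if (body.length >= 2 && body[0] === 0xfe && body[1] === 0xff) return 'utf-16be';
  if (body.length >= 2 && body[0] === 0xff && body[1] === 0xfe) return 'utf-16le';
  return null;
}

// Hypothetical helper: pull the charset parameter out of a Content-Type value.
function charsetFromHeader(contentType) {
  const m = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType || '');
  return m ? m[1].toLowerCase() : null;
}

// `headers` is an object with lower-cased keys, like response.headers in Node.js;
// `body` is a Buffer holding the undecoded response body.
function detectCharset(headers, body, fallback = 'utf-8') {
  // Scan the first bytes as latin1 (one character per byte) for a <meta> charset.
  const head = body.subarray(0, 1024).toString('latin1');
  const meta = /<meta\b[^>]*?charset\s*=\s*["']?\s*([\w.:-]+)/i.exec(head);

  return charsetFromBom(body)
    || charsetFromHeader(headers['content-type'])
    || (meta ? meta[1].toLowerCase() : null)
    || fallback; // assumed default when nothing is declared
}
```

For the response in the question, detectCharset(response.headers, body) would skip the header (it carries no charset) and pick up the value from the <meta> tag. The returned name can then be handed to whatever decoder you use.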
As a nice example, Stack Overflow itself sets a Content-Type header with a character set, and so it has no (need for) <meta charset> or <meta http-equiv="content-type"> tags.