My Tcl source files are in utf-8. Tclhttpd would not send national characters properly, so I modified it a bit. However, I also send binary stuff like jpg images and sometimes binary chunks are present in my otherwise utf-8 HTML. I have difficulty calculating the proper Content-length to match exactly what the browser receives (otherwise some trailing characters clobber the next-request headers or the browser keeps waiting 30 sec per request, until a timeout).
In other words, how can I find out how many bytes puts $socket wrote into the socket?
I have discovered a particular 11-byte sequence that messes up counting:
proc dump3 string {
    binary scan $string c* c
    binary scan $string H* hex
    return [sdump $string]\n$c\n$hex
};#dump3
proc Httpd_ReturnData {sock type content {code 200} {close 0}} {
    global Httpd
    upvar #0 Httpd$sock data
    #...skip non-pertinent code...
    set content \x4f\x4e\xc2\x00\x03\xff\xff\x80\x00\x3c\x2f
    #content=ONÂÿÿ�</
    #79 78 -62 0 3 -1 -1 -128 0 60 47
    #4f4ec20003ffff80003c2f
    puts content=[dump3 $content]
    puts utf8=[dump3 [encoding convertto utf-8 $content]]
    if {[catch {
        puts "string length=[string length $content] type=$type"
        puts "stringblength=[string bytelength $content]"
        set len [string length $content]
        if [string match -nocase *utf-8* $type] {
            fconfigure $sock -encoding utf-8
            set len [string bytelength $content]
        }
        puts "len=$len fcon=[fconfigure $sock]"
        HttpdRespondHeader $sock $type $close $len $code
        HttpdSetCookie $sock
        puts $sock ""
        if {$data(proto) != "HEAD"} {
            ##fconfigure $sock -translation binary -blocking $Httpd(sockblock)
            ##native: -translation {auto crlf}
            fconfigure $sock -translation lf -blocking $Httpd(sockblock)
            puts -nonewline $sock $content
        }
        Httpd_SockClose $sock $close
    } err]} {
        HttpdCloseFinal $sock $err
    }
}
The output on console is:
content=ONÂÿÿ�</
79 78 -62 0 3 -1 -1 -128 0 60 47
4f4ec20003ffff80003c2f
utf8=ONÃ�ÿÿÂ�</
79 78 -61 -126 0 3 -61 -65 -61 -65 -62 -128 0 60 47
4f4ec3820003c3bfc3bfc280003c2f
string length=11 type=text/html;charset=utf-8
stringblength=17
len=17 fcon=-blocking 0 -buffering full -buffersize 16384 -encoding utf-8 -eofchar {{} {}} -translation {auto crlf} -peername {128.0.0.71 128.0.0.71 55305} -sockname {128.0.0.8 gen 8016}
HttpdRespondHeader 17
The resulting Content-Length: 17 is too large, so the browser keeps waiting. If only I could know beforehand how many bytes puts will make out of my string, the rest would be easy. Is there a way?
For data going over HTTP, the content length should be the number of bytes in the data as observed on the wire. When working with Httpd_ReturnData you need to ensure that you provide it the binary data to transfer; it does not handle encoding the data for you.
Sending binary data with a correct length is easy:
set binaryData [...]
Httpd_ReturnData $sock "application/octet-stream" $binaryData
# There are many other binary content types; application/octet-stream is just the most universal one
# Choose the right one for your application, of course
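As a minimal sketch of how such binary data might get into a Tcl value, assuming it comes from a file such as a JPEG: reading through a binary-mode channel gives a string with exactly one character per byte, so string length is the byte count. The helper name readBinaryFile and the four-byte demo file are made up for illustration and are not part of tclhttpd.

```tcl
# Hypothetical helper: read a file as raw bytes, with no encoding
# conversion and no end-of-line translation.
proc readBinaryFile {path} {
    set f [open $path rb]          ;# "rb" opens the channel in binary mode
    try {
        return [read $f]
    } finally {
        close $f
    }
}

# Demo: a throwaway file holding four known bytes (requires Tcl 8.6+).
set f [file tempfile demoPath]
fconfigure $f -translation binary
puts -nonewline $f [binary format H* ffd8ffe0]
close $f

set binaryData [readBinaryFile $demoPath]
puts "bytes read: [string length $binaryData]"
file delete $demoPath
```

The resulting value can be passed as the content argument unchanged, since it already is the byte sequence that should appear on the wire.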
To send text data with a length, you need to do a little more work with encoding convertto:
set textData [...]
Httpd_ReturnData $sock "text/plain; charset=utf-8" \
    [encoding convertto utf-8 $textData]
# Similarly, text/plain is a decent fallback here too
(Yes, if you choose a different encoding then you should mention that in both places. You probably ought to use UTF-8 for all text content in this day and age.)
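To make the length calculation concrete, here is a small sketch using the 11-character sample from the question. Once the text has been encoded, the result is a byte string, so string length on it is the number of bytes that should go into Content-Length. The variable names are only illustrative.

```tcl
# The 11-character sample string from the question.
set content "\x4f\x4e\xc2\x00\x03\xff\xff\x80\x00\x3c\x2f"

# Encode first; the result holds one character per output byte.
set body [encoding convertto utf-8 $content]

puts "characters: [string length $content]"   ;# 11
puts "wire bytes: [string length $body]"      ;# 15, matching the utf8 hex dump in the question
```

The 15 here agrees with the 15 values shown in the question's utf8= dump, whereas the 17 that was sent as Content-Length does not.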
If you can pull the data from a file, you should do so; Httpd_ReturnFile is more efficient than Httpd_ReturnData as it can move the data using efficient data transfer techniques. If sending a text file, you need to be careful to describe the encoding of the file correctly. By far the easiest way to do that is by convention, such as deciding that all text files on your system are UTF-8.
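The following sketch illustrates why the declared charset has to match the file, under the assumption that the file's bytes are sent unmodified: the size on disk of a file written as UTF-8 equals the length of the UTF-8 encoded text, not the number of characters. The sample text and temporary file are invented for the demonstration.

```tcl
set text "na\u00efve caf\u00e9\n"       ;# 11 characters, two of them non-ASCII

# Write the file explicitly as UTF-8 with no line-ending translation
# (file tempfile requires Tcl 8.6+).
set f [file tempfile path]
fconfigure $f -encoding utf-8 -translation lf
puts -nonewline $f $text
close $f

set onDisk  [file size $path]
set encoded [string length [encoding convertto utf-8 $text]]
puts "characters=[string length $text] file size=$onDisk encoded length=$encoded"
file delete $path
```

If the same file were described to the browser with a different charset, the byte count would still be right but the text would be decoded incorrectly.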
You should virtually never use string bytelength, as it reports in units of one of Tcl's internal-only encodings (a lightly-denormalized almost-UTF-8). In your example it reports 17 rather than the 15 bytes that encoding convertto utf-8 produces, because the internal form stores each of the two NUL characters as two bytes. The measure it returns is only correct when you're doing something very unusual, like generating C code that needs to know the sizes of buffers holding strings that will be fed into Tcl's implementation, which is very much not what you're doing. (I've only done that sort of thing once in more than 20 years of using Tcl, and I've never heard of another legitimate use.) I believe it is deprecated precisely because so many people use it in subtly buggy ways.
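Tying this back to the question of how many bytes puts writes, here is a rough sketch of the alternative: encode the text yourself, take string length of the result, and write it to a channel in binary translation so that nothing is re-encoded on the way out. A temporary file stands in for the socket here so that the bytes written can be counted; applying the same fconfigure to $sock inside a modified Httpd_ReturnData is an assumption about your setup, not something tclhttpd is confirmed to require.

```tcl
set content "\x4f\x4e\xc2\x00\x03\xff\xff\x80\x00\x3c\x2f"

# Encode once, and derive the declared length from the encoded bytes.
set body [encoding convertto utf-8 $content]
set len  [string length $body]            ;# value for Content-Length

# A temp file stands in for the socket (file tempfile requires Tcl 8.6+).
set chan [file tempfile path]
fconfigure $chan -translation binary      ;# no re-encoding, no EOL rewriting
puts -nonewline $chan $body
close $chan

set written [file size $path]
puts "declared=$len written=$written"
file delete $path
```

With the channel in binary mode and the body already encoded, the declared and written counts agree, which is the property the browser depends on.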