java coldfusion chunked-encoding coldfusion-11

ColdFusion - HTTP chunk missing first character

This question is a continuation of my previous question about HTTP chunked transfer in ColdFusion. I am now using java.net.URL to read the chunks, and this is what I have tried:

<cfset local.objURL = createObject("java", "java.net.URL")
                         .init(javaCast("string", "https://test.com/abc.xml"))>

<!--- Open Connection --->
<cfset local.objConnection = local.objURL.openConnection()>

<!--- Input Stream --->
<cfset local.inputStream = local.objConnection.getInputStream()>

<!--- Read Chunks --->
<cfloop condition="true">
    <!--- Get Chunk Length --->
    <cfset local.chunkLength = local.inputStream.read()>
    <cfif local.chunkLength LT 0>
        <cfbreak>
    </cfif>

    <!--- Byte Array --->
    <cfset local.chunk = getByteArray(local.chunkLength)>
    <cfset local.offset = 0>

    <!--- Read Chunk Data --->
    <cfloop condition="local.offset LT local.chunkLength">
        <cfset local.bytesRead = local.inputStream.read(local.chunk, local.offset, local.chunkLength - local.offset)>
        <cfif local.bytesRead LT 0>
            <cfbreak>
        </cfif>
        <cfset local.offset += local.bytesRead>
    </cfloop>
    <!--- Chunk --->
    <cfdump var="#charsetEncode( local.chunk, 'utf-8' )#"><br />
</cfloop>

Using the code above, I am able to read the data, but the first character of each chunk is missing. For example:

The first chunk should be <?xml version="1.0" encoding="utf-8" ?> <root>, but I only get ?xml version="1.0" encoding="utf-8" ?> <root>

Any suggestions?

Solution

I don't think that this part is correct:

<!--- Get Chunk Length --->
<cfset local.chunkLength = local.inputStream.read()>
<cfif local.chunkLength LT 0>
    <cfbreak>
</cfif>

You expect the chunk length to be at the start of the stream. Why? Is this your own protocol? If you are talking about HTTP chunking, you should first check whether the HTTP response header Transfer-Encoding actually has the value chunked. Otherwise it's simply wrong to assume the content is chunked. Also, you only read one byte. That would mean the chunk length can be at most 255 bytes, which is not very flexible. HTTP chunks can be longer than that, and the chunk size is made up of all the hexadecimal digits up to a line break, such as 1234\r\n.
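To make the framing concrete, here is a minimal Java sketch of decoding a chunked body by hand. It assumes you are reading the raw bytes yourself (for example from a socket); the stream returned by HttpURLConnection.getInputStream() has already had the chunk framing removed, so you would not normally need this there. The class and method names (ChunkedBodySketch, readLine, decode, decodeToString) are made up for illustration, and trailer headers after the last chunk are simply ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ColdFusion or the JDK.
class ChunkedBodySketch {

    // Reads one line terminated by \n and returns it without the \r\n.
    static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) >= 0) {
            if (b == '\n') break;
            if (b != '\r') line.append((char) b);
        }
        return line.toString();
    }

    // Decodes a chunked body: <hex size>\r\n<data>\r\n ... 0\r\n\r\n
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            // Drop optional chunk extensions after ';'.
            int semi = sizeLine.indexOf(';');
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            // The size is hexadecimal and may have any number of digits.
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) break; // last chunk

            byte[] chunk = new byte[size];
            int offset = 0;
            while (offset < size) {
                int n = in.read(chunk, offset, size - offset);
                if (n < 0) throw new IOException("stream ended inside a chunk");
                offset += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the \r\n that follows the chunk data
        }
        return body.toByteArray();
    }

    // Convenience wrapper for in-memory input; assumes a UTF-8 payload.
    static String decodeToString(byte[] raw) {
        try {
            return new String(decode(new ByteArrayInputStream(raw)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "6\r\n<root>\r\n7\r\n</root>\r\n0\r\n\r\n";
        System.out.println(decodeToString(raw.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Note that the size line is read in full before any payload byte is touched, which is the difference from reading a single byte as the length.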

I strongly suspect that the read() above is always consuming your < and returning a chunkLength of 60, which is the ASCII value of <.
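The effect is easy to reproduce without any network access. The sketch below uses an in-memory stream standing in for the response body: a bare read() returns the first payload byte as an int, whereas a plain buffered loop with no "length" read keeps every byte. The names FirstByteSketch, firstByte, readAll and readAllAsString are hypothetical, and UTF-8 is assumed for the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of ColdFusion or the JDK.
class FirstByteSketch {

    // What a bare read() returns: the next byte of the payload as 0..255.
    static int firstByte(String text) {
        ByteArrayInputStream in =
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        return in.read();
    }

    // Reads everything until end of stream without treating any byte as a length.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) >= 0) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Convenience wrapper for in-memory input.
    static String readAllAsString(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        try {
            return new String(readAll(new ByteArrayInputStream(raw)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?> <root>";
        System.out.println(firstByte(xml));       // the code point of '<'
        System.out.println(readAllAsString(xml)); // the full text, '<' included
    }
}
```

The same loop shape should carry over to the CFML version: drop the single-byte read() and read into a fixed-size buffer until read returns a negative value.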