I'm seeing occasional missing data with a datagram channel in a tool I'm developing. UDP is part of the requirement here, so I'm mostly just trying to troubleshoot the behavior I'm seeing. The tool is being developed with Java 7 (another requirement), but the computer on which I'm seeing the behavior is running a Java 8 JRE.
I have a decorator class that decorates a call to DatagramChannel.send with some additional behavior, but the call effectively boils down to this:
public int send( ByteBuffer buffer, SocketAddress target ) throws IOException
{
    // some additional decorating code that can't be shared follows
    int bytesToWrite = buffer.remaining();
    int bytesWritten = decoratedChannel.send(buffer, target);
    if (bytesWritten != bytesToWrite) {
        // log the occurrence
    }
    // always hand the channel's result back to the caller
    return bytesWritten;
}
There is an additional bit of decoration above this that performs our own fragmentation (as part of the requirements of the remote host). Thus the source data is always guaranteed to be at most 1000 bytes (well within the limit for an Ethernet frame). The decorated channel is also configured for blocking I/O.
What I'm seeing on rare occasions is that this routine (and thus the DatagramChannel's send method) is called, but no data is seen on the wire (which is monitored with Wireshark). Even in this case, the send routine returns the number of bytes that should have been written (so bytesWritten == bytesToWrite).
I understand that UDP has reliability issues (for which we have our own data reliability mechanism that accounts for data loss and other issues), but I'm curious about the behavior of the DatagramChannel implementation. If send is returning the number of bytes written, should I not at least see a corresponding frame in Wireshark? Otherwise, I would expect the native implementation to throw an exception, or at least not return the number of bytes I expected to write.
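For what it's worth, the return value of send can be checked in isolation. The sketch below is not part of the tool; the class and method names (SendReturnDemo, sendToClosedPort) are made up for illustration. It sends a 1000-byte datagram over loopback to a UDP port that was just closed, so nothing is listening. As far as I can tell, a blocking send on an unconnected channel still reports the full byte count in that situation, which suggests the count only says the datagram was handed to the OS network stack, not that a frame reached the wire or a receiver.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

// Hypothetical demo class, not taken from the original tool.
class SendReturnDemo {

    // Sends payloadSize zero bytes to a loopback UDP port with no listener
    // and returns whatever DatagramChannel.send reports.
    static int sendToClosedPort(int payloadSize) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();

        // Let the OS pick a free port, then close the socket so that
        // nothing is listening on that port any more.
        int port;
        try (DatagramSocket probe = new DatagramSocket(0, loopback)) {
            port = probe.getLocalPort();
        }

        // Channels are blocking by default, matching the setup in the question.
        try (DatagramChannel channel = DatagramChannel.open()) {
            ByteBuffer buffer = ByteBuffer.allocate(payloadSize);
            return channel.send(buffer, new InetSocketAddress(loopback, port));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("send returned " + sendToClosedPort(1000));
    }
}
```

If the returned value equals the payload size here, a matching return value in the real tool cannot be taken as proof that a frame was transmitted.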
I actually ended up discovering the cause with more fiddling in Wireshark. I was unintentionally filtering out ARP requests, which seem to be the cause of the problem, as mentioned in this answer:
ARP queues only one outbound IP datagram for a specified destination address while that IP address is being resolved to a MAC address. If a UDP-based application sends multiple IP datagrams to a single destination address without any pauses between them, some of the datagrams may be dropped if there is no ARP cache entry already present. An application can compensate for this by calling the Iphlpapi.dll routine SendArp() to establish an ARP cache entry, before sending the stream of packets.
It appears the ARP entries were going stale very quickly, and each occasional ARP request would cause a dropped packet. I increased the ARP timeout for the interface on the PC, and packets are now dropped much less often.
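Plain Java has no direct equivalent of SendArp() that I know of. One possible workaround, sketched below under assumptions, is to probe the destination with InetAddress.isReachable before sending a burst: any IP traffic to an on-link host makes the OS resolve its MAC address, so a successful probe should leave a fresh ARP cache entry behind. The class and method names (ArpWarmUp, warmUp) are hypothetical, and I have not verified this against the setup described above. It only helps if the remote host answers the probe (an ICMP echo or a TCP connection attempt to port 7, depending on platform and privileges) and if the entry does not expire again before the burst is sent.

```java
import java.io.IOException;
import java.net.InetAddress;

// Hypothetical helper, not part of the original tool.
class ArpWarmUp {

    // Probes the target so the OS has a reason to resolve its MAC address.
    // Returns true if the host answered within timeoutMillis, false if it
    // did not answer or the probe failed with an I/O error.
    static boolean warmUp(InetAddress target, int timeoutMillis) {
        try {
            return target.isReachable(timeoutMillis);
        } catch (IOException e) {
            // A network error occurred; the caller should not assume
            // that an ARP cache entry exists.
            return false;
        }
    }
}
```

A caller could invoke warmUp(remoteAddress, 200) before the first fragment of a burst and fall back to the existing reliability mechanism when it returns false. The 200 ms timeout is an arbitrary illustrative value.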