RTI DDS subscriber not getting data from publisher


Short story: My DDS subscriber cannot see data from my DDS publisher. What am I missing?

Long story:

QNX 6.4.1 VM A -- Broken Publisher. IP ends with .113
QNX 6.4.1 VM B -- Working Publisher. IP ends with .114
Windows 7      -- Subscriber/Main Dev box. IP ends with .203
RTI DDS 5.0    -- Middleware version

I have a QNX VM (hosted on the network, not on my machine) that is publishing some data via RTI DDS. The data never shows up in my Windows 7 subscriber application.

Interestingly enough, I can put the same code on VM B, and the subscriber gets data. Thinking this must be a Windows 7 firewall issue, I swapped VM A's IP address with VM B's, but this did not help.

Using Wireshark, I can see some heartbeat traffic from VM A, but no data. From VM B, I see the heartbeat traffic and the data. Below is a sanitized Wireshark snippet.

[Image: Wireshark output]
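
For anyone who wants to check "heartbeats but no data" outside the Wireshark GUI, here is a minimal Python sketch that lists the RTPS submessage kinds in one UDP payload. It assumes the standard RTPS wire layout (a 20-byte header starting with `RTPS`, then submessages with a 4-byte header each) and the submessage IDs from the RTPS specification. The function name `submessage_kinds` is made up for illustration, and vendor-specific submessages are reported by their hex ID only.

```python
import struct

# Submessage IDs as defined by the RTPS specification.
SUBMESSAGE_NAMES = {
    0x01: "PAD", 0x06: "ACKNACK", 0x07: "HEARTBEAT", 0x08: "GAP",
    0x09: "INFO_TS", 0x0C: "INFO_SRC", 0x0D: "INFO_REPLY_IP4",
    0x0E: "INFO_DST", 0x0F: "INFO_REPLY", 0x12: "NACK_FRAG",
    0x13: "HEARTBEAT_FRAG", 0x15: "DATA", 0x16: "DATA_FRAG",
}

# "RTPS" (4) + protocol version (2) + vendor ID (2) + GUID prefix (12)
RTPS_HEADER_LEN = 20


def submessage_kinds(payload):
    """Return the submessage names found in one RTPS UDP payload, in order."""
    if len(payload) < RTPS_HEADER_LEN or payload[:4] != b"RTPS":
        raise ValueError("not an RTPS message")
    kinds = []
    offset = RTPS_HEADER_LEN
    while offset + 4 <= len(payload):
        sub_id = payload[offset]
        flags = payload[offset + 1]
        # Bit 0 of the flags selects the byte order: 1 means little-endian.
        fmt = "<H" if flags & 0x01 else ">H"
        (length,) = struct.unpack_from(fmt, payload, offset + 2)
        kinds.append(SUBMESSAGE_NAMES.get(sub_id, hex(sub_id)))
        # A zero length means "runs to the end of the message",
        # except for PAD and INFO_TS, which may really be empty.
        if length == 0 and sub_id not in (0x01, 0x09):
            break
        offset += 4 + length
    return kinds
```

Running this over the payloads from each VM and counting the results should show whether `DATA` ever appears from VM A, or only `HEARTBEAT` and friends.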

NDDS_DISCOVERY_PEERS is set to include both multicast and the explicit IP address of the other side of each conversation. The QoS profiles are the same, and the RTI Analyzer indicates the Match Analysis was successful (all green).

VM A: NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.203

VM B: NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.203

Windows 7 box: NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.113,udpv4://BLAH.114
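
To double-check what each peer list actually contains, a small Python sketch like the one below can split the value and label each entry. It only handles the simple `transport://address` form used above, not the full peer descriptor syntax, and the function name `classify_peers` is hypothetical. Entries that are not literal IP addresses (host names, or sanitized values like the ones in this post) are reported as "unparsed".

```python
import ipaddress


def classify_peers(peers):
    """Split a comma-separated peer list into (transport, address, kind) tuples."""
    result = []
    for entry in peers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        transport, sep, address = entry.partition("://")
        if not sep:
            # No "transport://" prefix: treat the whole entry as the address.
            transport, address = "", entry
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            kind = "unparsed"  # host name or sanitized address
        else:
            if ip.is_multicast:
                kind = "multicast"
            elif ip.is_loopback:
                kind = "loopback"
            else:
                kind = "unicast"
        result.append((transport, address, kind))
    return result
```

For example, `classify_peers(os.environ.get("NDDS_DISCOVERY_PEERS", ""))` (after `import os`) prints the same breakdown on each machine, which makes it easy to compare the three lists side by side.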

When I add other people's machines to the NDDS_DISCOVERY_PEERS line, they can see DDS traffic from VM A with DDS SPY on their Windows 7 boxes. My Windows 7 box cannot.

The Windows 7 event log does not appear to show the firewall or WFP dropping the data packets.

RTI DDS Spy run from my Windows 7 machine shows that VM A (0A061071) writers are visible on the network, but no data is being received. It also shows that the readers in my subscriber on my Windows 7 machine are visible, though they show up at an odd IP address.

Bonus question (out of curiosity only, NOT the primary question): why does traffic on my local machine show up in DDS SPY as 192.168.11.1 instead of my machine's IP or even 127.0.0.1?

[Image: RTI DDS SPY output]

Main Question: What am I missing?

Update: route print on my Windows 7 box appears to show that I have joined a multicast group with VM A. netsh interface ip show joins seemed to concur.

Investigation Update:

  1. I rebooted the VM to no effect.

  2. I rebooted the Windows box to no effect.

  3. I removed the multicast from the NDDS_DISCOVERY_PEERS environment variables on both sides to no effect.

  4. The Windows 7 box has three network interfaces (plus loopback): 1 LAN connection and 2 (unrelated) VM adapters. We are working with the LAN connection. The QNX VM has one network interface (plus loopback). Note that the working VM and the broken VM use different Ethernet drivers, as they are slightly different flavors of QNX 6.4.1. The broken one has wm0 as the interface, and the working one has en0. I don't believe this is the issue, but it is a difference.

  5. I ran DDS SPY on the "broken" QNX VM while it was playing back, and I got DDS data. I don't have a good method to sniff the network between where the VM is hosted and my Windows 7 machine to see if it makes it out of the interface, but looking at the transmitted packet count out of the ethernet interface on the QNX VM indicates that it is definitely transmitting something, and the Wireshark captures on the Windows 7 machine itself show that at least some traffic is making it through.

  6. Other folks on the LAN here can see the DDS traffic from the "broken" VM, which leads me to believe it is a Windows setup issue rather than a broken VM. I just can't see what it could be. I've re-checked the firewall and found nothing. I would have thought that if it were a firewall issue, the problem would have gone away when I swapped IP addresses between VM A and VM B. In any case, the Windows 7 firewall is currently off, and the problem persists.

  7. Below are several screens of Wireshark output. I skipped a bunch between the third and the fourth, as after the fourth, the traffic tended to look like the bottom of the fourth until the end.

[Image 1]
[Image 2]
[Image 3]
(Skipped a bunch here)
[Image 4]
(Pretty much continues on like the last 11 lines above)

What else should I try?

Update: To answer Rose's question below, using rtiddsping -publisher on the bad VM and rtiddsping -subscriber works appropriately.

I wonder if this issue is caused by the weird IP address. The IP address it happens to advertise and somehow latch on to belongs to a local VM Ethernet adapter (unrelated to VM A). See the screenshot below.

[Image: Windows 7 ipconfig output]

The address I would like it to attach to is 10.6.6.203. Any way I can specify that?
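
As a quick sanity check on a multi-homed box like this one, the Python sketch below asks the OS which local IPv4 address it would use as the source when talking to a given peer. It says nothing about which addresses the DDS participant itself announces; it only reflects the routing table. The function name `local_address_for` is hypothetical, and the port number is arbitrary because no packet is actually sent.

```python
import socket


def local_address_for(peer_ip, port=7400):
    """Return the local IPv4 address the OS would use to reach peer_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends nothing; it only makes the OS
        # pick a route and bind a matching local address.
        sock.connect((peer_ip, port))
        return sock.getsockname()[0]
```

Calling it with VM A's address on the Windows 7 box should return 10.6.6.203 if the LAN connection is the route in use. If it returns one of the VM adapter addresses instead, the routing table is worth a closer look.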


Solution

  • More than a year later this happened to me again with a different virtual machine. I had it working yesterday, so I was very suspicious. I scoured all my code changes for the past 24 hours for issues, but didn't find any. Then I decided to see if IT had pushed any patches to my computer.

    Guess what? The Windows Firewall had been aggressively updated since the day before. Rules were missing or changed, etc. The log said packets were being dropped. I opened the firewall filters up a bit, and suddenly, everything worked again. I hesitate to close this issue, as I am not 100% sure this was exactly the same as last year, but it feels very similar. I suspect that last year the settings in the firewall were not logging the packet drops.

    Long and short of it: if DDS suddenly stops working, check your firewall settings.
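
When checking firewall rules, it helps to know which UDP ports to look for. The Python sketch below computes them, assuming the default port mapping from the RTPS specification (port base 7400, domain gain 250, participant gain 2, offsets 0, 10, 1 and 11). A QoS profile can override these values, and the function name `rtps_ports` is made up for illustration, so treat the result as a starting point rather than a guarantee.

```python
# Default well-known port parameters from the RTPS specification.
PB, DG, PG = 7400, 250, 2       # port base, domain gain, participant gain
D0, D1, D2, D3 = 0, 10, 1, 11   # offsets for the four port kinds


def rtps_ports(domain_id, participant_id=0):
    """Return the default RTPS UDP ports for a domain and participant index."""
    base = PB + DG * domain_id
    return {
        "discovery_multicast": base + D0,
        "discovery_unicast": base + D1 + PG * participant_id,
        "user_multicast": base + D2,
        "user_unicast": base + D3 + PG * participant_id,
    }
```

For domain 0 and the first participant on a host, this gives 7400 and 7410 for discovery and 7401 and 7411 for user data. Each additional participant on the same host moves the unicast ports up by 2, so a firewall rule that only allows a single port can break a second application on the same machine.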