Tags: sockets, ethernet, lwip, stm32f4

lwip stm32 - http requests failing


I am running FreeRTOS and lwIP 1.4.1 with the socket API on an STM32 processor (STM32F407). Overall it works fine: I can send and receive data over UDP and TCP.

But once every 3 to 7 days I see strange behavior.

My Problem

Every 3 to 7 days my client (Windows 10, which sends 1-2 HTTP requests per second) fails to send those requests. When this happens, about 10 requests fail in a row. In rare cases, the stack does not recover at all.

My Guess

I think I may have misconfigured something in my lwIP config, because the stack is widely used and shouldn't have any bugs in this area.

My Ethernet settings

Server and client are directly connected, with no switch, hub or router in between.

server (stm32/lwip):

  • static, 192.168.168.2
  • netmask, 255.255.255.0

client (win10) eth0:

  • static, 192.168.168.1
  • netmask, 255.255.255.0

client (win10) eth1:

  • dhcp, to normal working network

My Tries

At the moment I have tests running that send ~7-8 requests per second, but the error does not occur any more often. I played around with the lwIP config:

  • more memory for the stack
  • more pbufs
  • bigger pbufs
  • with/without backlog

None of this improved the connection problem. Could the problem be caused by the client frequently reusing the same port numbers?
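For anyone repeating these experiments, the knobs listed above map roughly onto the following lwipopts.h options in lwIP 1.4.x. This is only a sketch: the option names are real lwIP options, but every value is an illustrative assumption and is not taken from the linked lwipopts.h.

```c
/* Illustrative lwipopts.h fragment (lwIP 1.4.x option names).
 * All values are assumptions chosen for the example only. */

/* "more memory for the stack": size of the lwIP heap in bytes */
#define MEM_SIZE                    (20 * 1024)

/* "more pbufs": number of buffers in the pbuf pool used for RX */
#define PBUF_POOL_SIZE              24

/* "bigger pbufs": size of each pool buffer, here one full Ethernet frame */
#define PBUF_POOL_BUFSIZE           1524

/* "with/without backlog": enable the listen backlog for TCP listeners */
#define TCP_LISTEN_BACKLOG          1
#define TCP_DEFAULT_LISTEN_BACKLOG  4

/* Number of simultaneously active TCP connections (including TIME_WAIT) */
#define MEMP_NUM_TCP_PCB            10

/* Statistics, to check whether the heap or any pool actually runs out */
#define LWIP_STATS                  1
#define MEM_STATS                   1
#define MEMP_STATS                  1
```

With the statistics enabled, the "max" and "err" counters for the heap and the pools show whether any of them ever hits its limit, which helps separate a memory problem from a driver problem.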

Here is the relevant part of the lwIP debugging output:

tcp debugging output

https://pastebin.com/a9JabhET

Here is the Wireshark log:

orig screenshot

Whole Wireshark log:

https://www.file-upload.net/download-12682664/debug_tcp_00001_20170828172950.html

And here is my lwipopts.h:

lwip configuration:

https://pastebin.com/cW0v4hF6


Solution

  • I have a new job and am no longer working on this issue.

    Before I started my new job I could show that it was not a memory problem in lwIP: I defined unreasonably large pbufs and memory pools, and they never reached their limits.

    The problem was in the DMA driver for the ETH peripheral. Once the driver reached the end of its DMA memory chain, the chain elements were never freed, so I ran into RBU (Receive Buffer Unavailable) conditions. The RBU flag was never reset, and the ETH DMA driver hung in the RBU interrupt, even when there were enough lwIP buffers for the DMA chain to write to. So I added a sledgehammer fix to the DMA driver and disabled the RBU interrupt: I poll the RBU flag in several places, clear it if needed, and start reading from ETH again.

    I think since then the problem is more or less "solved". Not nice, but it worked.

    I later got some information from a coworker at my old workplace: the RBU interrupt and its clearing did not work because the CAN stack we used did not work well with FreeRTOS. On busy systems the CAN stack used well over 90% of CPU time, which led to the strange behaviour in the ETH driver and lwIP.
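A minimal sketch of the polling workaround described above, not the original driver code. It assumes the STM32F4 Ethernet DMA register layout, where the RBU status and interrupt-enable bits are both bit 7 of ETH_DMASR and ETH_DMAIER. The struct `eth_dma_regs_t` and the function names are hypothetical stand-ins so the logic can be compiled and checked on a host; on the target you would use the CMSIS `ETH` peripheral instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the STM32F4 ETH_DMASR / ETH_DMAIER layout. */
#define DMASR_RBUS   (1UL << 7)  /* receive buffer unavailable status */
#define DMAIER_RBUIE (1UL << 7)  /* receive buffer unavailable interrupt enable */

/* Hypothetical host-side model of the three DMA registers used here. */
typedef struct {
    volatile uint32_t DMARPDR;   /* receive poll demand register */
    volatile uint32_t DMASR;     /* DMA status register */
    volatile uint32_t DMAIER;    /* DMA interrupt enable register */
} eth_dma_regs_t;

/* Clear status flags. Real DMASR bits are write-1-to-clear, so on the
 * target this would be `ETH->DMASR = flags;`. The host model has to
 * emulate that behaviour by masking the bits out. */
void eth_dma_clear_status(eth_dma_regs_t *regs, uint32_t flags)
{
    regs->DMASR &= ~flags;
}

/* Disable only the RBU interrupt; other interrupt enables are untouched. */
void eth_rbu_irq_disable(eth_dma_regs_t *regs)
{
    regs->DMAIER &= ~DMAIER_RBUIE;
}

/* Poll for a pending RBU condition. If set, clear the flag and issue a
 * receive poll demand so the DMA leaves the suspended state.
 * Returns true if a recovery was performed.
 * Assumption: the caller has already handed the RX descriptors back to
 * the DMA, otherwise the condition will recur immediately. */
bool eth_rbu_recover(eth_dma_regs_t *regs)
{
    if ((regs->DMASR & DMASR_RBUS) == 0) {
        return false;
    }
    eth_dma_clear_status(regs, DMASR_RBUS);
    regs->DMARPDR = 0;  /* any write resumes reception on real hardware */
    return true;
}
```

In a driver, `eth_rbu_irq_disable()` would be called once during initialisation, and `eth_rbu_recover()` from the RX task and any other place where a stalled receiver should be noticed. As the coworker's follow-up suggests, this only treats the symptom if another task is starving the Ethernet handling of CPU time.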