
Keepalived gets into a bad state where a single packet is repeatedly flooded


I have two servers running Keepalived with failover and load balancing using direct routing. The setup works fine for some time, but eventually it stops responding. When I look at tcpdump, I see a flood of messages like this:

15:14:55.943992 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944173 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944183 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944370 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944379 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944571 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944581 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944755 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944764 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944952 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944967 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945140 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945150 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945322 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945331 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945506 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945514 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945701 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945710 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0

10.31.109.208 is my address. The packets continue even after I close my browser. Restarting Keepalived or Nginx does not fix the issue; rebooting is the only thing that seems to fix it. When this happens, the server can't even talk to itself on that interface, which makes me think it's not a routing issue.


Solution

  • Follow the instructions here. They're old, but they still apply: http://gcharriere.com/blog/?p=339

    You need to add an iptables PREROUTING rule on the second (backup) system so that packets don't bounce back and forth between the two machines.

    Something like this, with 192.168.9.100 being the VIP:

    iptables -A PREROUTING -t nat -d 192.168.9.100 -p tcp -j REDIRECT
    

    Make sure to remove it whenever that machine becomes the master. iptables will add the same rule more than once without complaint, so check that the rule doesn't already exist before adding it.
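The add/remove handling described above could be sketched as a small state script. This is only a rough sketch, not the method from the linked post: the `VIP` value is the example address from above, and the function names, the `IPTABLES` override variable, and the idea of passing `MASTER` or `BACKUP` as the first argument are assumptions made for illustration. It relies on `iptables -C`, which checks whether a rule exists and needs a reasonably recent iptables.

```shell
#!/bin/sh
# Sketch only: the VIP, function names, and argument convention are assumptions.
VIP="${VIP:-192.168.9.100}"
# Allow the iptables binary to be overridden (hypothetical convenience).
IPTABLES="${IPTABLES:-iptables}"

# Succeeds only if the REDIRECT rule for the VIP is already present.
rule_exists() {
    "$IPTABLES" -t nat -C PREROUTING -d "$VIP" -p tcp -j REDIRECT 2>/dev/null
}

# Add the rule only when it is missing, so it never gets duplicated.
add_rule() {
    rule_exists || "$IPTABLES" -t nat -A PREROUTING -d "$VIP" -p tcp -j REDIRECT
}

# Delete every copy of the rule, in case it was added more than once earlier.
remove_rule() {
    while rule_exists; do
        "$IPTABLES" -t nat -D PREROUTING -d "$VIP" -p tcp -j REDIRECT || break
    done
}

# First argument is the new state; anything else is ignored.
case "${1:-}" in
    MASTER) remove_rule ;;
    BACKUP) add_rule ;;
esac
```

If you save this as, say, /etc/keepalived/vip-redirect.sh (a made-up path), you could probably wire it up in the vrrp_instance block with notify_master "/etc/keepalived/vip-redirect.sh MASTER" and notify_backup "/etc/keepalived/vip-redirect.sh BACKUP". It has to run as root, and you may want the same handling as BACKUP for the fault state, depending on your setup.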