CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



Thread: Checkpoint VRRP failover issues

  1. #1
    Join Date
    2012-08-06
    Posts
    63
    Rep Power
    9

    Default Checkpoint VRRP failover issues

    Hi there,

    First, let me say that I don't know everything about the Checkpoint products. So far, though, I have managed to find my way around. Bear with me.

    Now to our setup and the problem.

    Two IP290s (6.2-GA) running R71 form a VRRP cluster (if that is the right term). There's VRID1 (not used anywhere else in our network) with about 20 addresses/VLANs in it. There is also a synchronization link: an extra VLAN on our core switches, since the CPs are at different sites.

    Code:
    # cphaprob state
    Cluster Mode:   Sync only (IPSO cluster)
    Number     Unique Address  Firewall State (*)
    1 (local)  192.168.97.8    Active
    2          192.168.97.9    Active
    (*) In IP Clustering/VRRP FW-1 also monitors the cluster status

    When I have large NFS transfers across our switches (same VLAN, firewall not involved) our network starts to buckle. But only after a certain time.

    Meaning: on the Checkpoint VRRP some VLANs become MASTER on the backup member.
    1) Not all become master,
    2) As far as I can tell the previous master in that case also does not relinquish its master role,
    therefore creating a hell of a mess.

    Looking at the logs of the VRRP backup I see that it first becomes MASTER then BACKUP router again very quickly. It does this a few times, then seems to stay in MASTER mode for some networks whereas it should not be. The log of the actual master router/fw is clean. There are no signs of any vrrp_vr_... things going on.

    Usually the mess is on the transit network to the internal LAN. Therefore I don't have packet captures because it's difficult to start captures on some inside host when the firewall is not letting you through... Obviously next time I believe this could happen, I can start that in advance.

    Now, I am not sure if the problem is related to our switches too or if it solely is a firewall issue. In any case the switch shows the VRRP MAC address (00-00-5E-...) behind the wrong port, obviously.

    What could lead to such an issue in your opinion? What would be the most common pitfalls in the CP configuration? Is there anything in the CP config that I can optimize to definitely factor out these split brains?

    Best regards,

    Marki


    PS. I tried inspecting the setup now that everything works again:

    Code:
    STANDBY# fw monitor -e "dst=224.0.0.18;"
    <no output>
    Code:
    STANDBY# tcpdump -s 9999 -i eth2 ip proto 112
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth2, link-type EN10MB (Ethernet), capture size 9999 bytes
    14:55:25.582177  I IP 192.168.98.9 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 253, authtype none, intvl 1s, length 20

    tcpdump only shows VRRP hellos on interface eth2. fw monitor shows no packets toward the multicast address.

    Oddly, on the VRRP MASTER, neither fw monitor nor tcpdump shows any VRRP packets at all.

    Yet, the ones I see on the standby must come from somewhere....???

    All of this isn't making me feel very well right now.

  2. #2
    Join Date
    2005-08-14
    Location
    Gig Harbor, WA, USA
    Posts
    2,499
    Rep Power
    18

    Default Re: Checkpoint VRRP failover issues

    If the secondary does not see VRRP Hello packets of a higher priority, then it assumes the master has died and starts advertising VRRP Hello packets.
    If you're only seeing VRRP Hello packets from some interfaces but not others, that's an issue you'll need to sort out in your network.
    However, the gateway should never be "partially" master: it should either be all master or all backup.
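
    As a rough illustration of the first point: in standard VRRPv2 (RFC 3768), a backup waits for the Master_Down_Interval, which is three advertisement intervals plus a small priority-dependent skew, before it takes over. The sketch below only computes that timer. The backup priority of 100 is a made-up value (the thread does not state it), and IPSO's implementation may differ in detail.

```python
def skew_time(priority: int) -> float:
    """Skew_Time in seconds per RFC 3768: (256 - Priority) / 256."""
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be between 1 and 255")
    return (256 - priority) / 256


def master_down_interval(priority: int, advert_interval: float = 1.0) -> float:
    """Seconds of silence after which a backup assumes the master is gone."""
    return 3 * advert_interval + skew_time(priority)


# Hypothetical backup priority of 100; the 1 s interval matches "intvl 1s"
# in the tcpdump output from the first post.
print(master_down_interval(priority=100, advert_interval=1.0))  # 3.609375
```

    So with a 1 s interval, losing only a few consecutive hellos on one VLAN is enough for the backup to claim MASTER on that VLAN.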

    I believe there is a fix for this in the current IPSO 6.2 MR4/MR4a release. It's not clear whether you're running it, since you did not post the build number.
    http://phoneboy.org
    Unless otherwise noted, views expressed are my own

  3. #3
    Join Date
    2012-08-06
    Posts
    63
    Rep Power
    9

    Default Re: Checkpoint VRRP failover issues

    Quote Originally Posted by PhoneBoy View Post
    If you're only seeing VRRP Hello packets from some interfaces but not others, that's an issue you'll need to sort out in your network.
    Not seeing packets because they get lost on the network is one thing. Seeing packets arrive on one Checkpoint but not seeing them leave the other Checkpoint is what worries me (see the "PS." section). (This is MR2.)
    BTW the TX and RX advertisement counters for the master and backup router respectively in 'show vrrp stats' seem to increase correctly.
    Last edited by jeronimo; 2013-12-31 at 22:49.

  4. #4
    Join Date
    2006-09-26
    Posts
    3,199
    Rep Power
    18

    Default Re: Checkpoint VRRP failover issues

    Quote Originally Posted by jeronimo View Post
    When I have large NFS transfers across our switches (same VLAN, firewall not involved) our network starts to buckle. But only after a certain time.
    I last touched the Nokia about five years ago, with IPSO 4.2 and NGx R65.

    However, from what you stated above, it looks like you have a network issue that is not related to the firewall. The firewall VRRP master/backup state is flapping because you have a layer-2 issue somewhere in your network.

    To confirm that it is not a firewall/Checkpoint issue, I suggest you do the following during a maintenance window:

    1- perform cpstop on both Nokia firewalls,
    2- enable IP forwarding in IPSO with "ipsofwd admin on" or something like that to turn the Nokia into a routing device. Do this on both Nokias,
    3- at this point, Checkpoint is not part of the equation anymore. Only IPSO is in play here. Use "show vrrp" and "show vrrp interface" and see if your interface is still flapping. If the interface is still flapping, then you know you have a layer-2 issue.

    Keep in mind that this is VRRP, NOT ClusterXL, so when you stop the Checkpoint processes, VRRP is still in play. At that point, VRRP should communicate without any interference from the Checkpoint firewall. If you still have a VRRP issue, then the problem is in your layer 2.

    If your VRRP is stable at that point, then run "cpstart" on both firewalls and continue troubleshooting with Checkpoint back in the picture.

    my 2c.

  5. #5
    Join Date
    2012-08-06
    Posts
    63
    Rep Power
    9

    Default Re: Checkpoint VRRP failover issues

    Thanks cciesec2006 for your opinion.
    Indeed I too believe this is rather network than firewall related.

    However, I still wonder why I don't see what I'd like to see using tcpdump which doesn't show outgoing VRRP hellos on the interfaces.
    The backup however receives VRRP HELLOs but according to tcpdump, only on one interface.

    Both master and backup firewalls have interfaces with only tagged VLANs on them
    - eth1c...
    - eth2c...
    - ...
    - eth6c...
    This setup is symmetrical on both firewalls.
    Running tcpdump on eth2 of the backup router shows VRRP HELLOs from an IP that should be on eth1r1 on the master...
    It seems all really messed up. But if it really were that messed up, probably nothing would be working, so it's probably more correctly working than it seems.

    In any case, the VRRP counters I previously mentioned constantly increase.
    I must be doing something wrong or something here is really buggy.
    Last edited by jeronimo; 2014-01-01 at 13:41.

  6. #6
    Join Date
    2006-09-26
    Posts
    3,199
    Rep Power
    18

    Default Re: Checkpoint VRRP failover issues

    Quote Originally Posted by jeronimo View Post
    Thanks cciesec2006 for your opinion.
    Indeed I too believe this is rather network than firewall related.

    However, I still wonder why I don't see what I'd like to see using tcpdump which doesn't show outgoing VRRP hellos on the interfaces.
    The backup however receives VRRP HELLOs but according to tcpdump, only on one interface.

    Both master and backup firewalls have interfaces with only tagged VLANs on them
    - eth1r...
    - eth2r...
    - ...
    - eth6r...
    This setup is symmetrical on both firewalls.
    Running tcpdump on eth2 of the backup router shows VRRP HELLOs from an IP that should be on eth1r1 on the master...
    It seems all really messed up. But if it really were that messed up, probably nothing would be working, so it's probably more correctly working than it seems.

    In any case, the VRRP counters I previously mentioned constantly increase.
    I must be doing something wrong or something here is really buggy.

    1- That's why I suggested that you perform "cpstop" on both firewalls, to remove Checkpoint from the equation so that only VRRP is in play.
    2- You are not seeing outgoing VRRP hellos from the master interface. That is because the Checkpoint policy is blocking you from doing so. Again, you can use step #1 to eliminate Checkpoint from the equation, or you can add the following rule at the very top of the firewall policy:
    source: All Firewalls
    destination: vrrp_224.0.0.18
    service: vrrp
    track: log
    This rule will allow VRRP traffic between the firewall cluster members.

    3- It could be that your firewall topology is not configured correctly. Again, you can perform step #1 to eliminate Checkpoint from the equation

  7. #7
    Join Date
    2012-08-06
    Posts
    63
    Rep Power
    9

    Default Re: Checkpoint VRRP failover issues

    Quote Originally Posted by cciesec2006 View Post
    you are not seeing outgoing VRRP hello from the master interface because the checkpoint policy is blocking you from doing so
    Hmm, I don't get this part. Using tcpdump, I am sniffing the wire, am I not? So if checkpoint would block the outgoing HELLOs, and that would be the reason why tcpdump doesn't show them, why would there be incoming HELLOs on the backup?

    Oh, and the first object in the FW rules is a rule that allows service VRRP from the FW Cluster Object to anywhere. That rule doesn't seem to be hit though.
    Last edited by jeronimo; 2014-01-01 at 10:08.

  8. #8
    Join Date
    2006-09-26
    Posts
    3,199
    Rep Power
    18

    Default Re: Checkpoint VRRP failover issues

    Quote Originally Posted by jeronimo View Post
    Hmm, I don't get this part. Using tcpdump, I am sniffing the wire, am I not? So if checkpoint would block the outgoing HELLOs, and that would be the reason why tcpdump doesn't show them, why would there be incoming HELLOs on the backup?
    I don't know your environment. I am only suggesting a way to troubleshoot this. That's why I suggest you do "cpstop" on both firewalls. If that is not possible, make sure that from FW1 you can ping FW2 and vice versa. That will confirm that everything is reachable by both firewalls.

    Quote Originally Posted by jeronimo View Post
    Oh, and the first object in the FW rules is a rule that allows service VRRP from the FW Cluster Object to anywhere. That rule doesn't seem to be hit though.
    I suggest you do what I advised earlier like this

    source: All Firewalls
    destination: vrrp_224.0.0.18
    service: vrrp
    track: log
    This rule will allow VRRP traffic between the firewall cluster members.

    Do you also log the global implied rules? If traffic originating from the firewall is set to "First" in the global implied rules, it is matched there and you will not see hits on the explicit rule.

  9. #9
    Join Date
    2005-08-14
    Location
    Gig Harbor, WA, USA
    Posts
    2,499
    Rep Power
    18

    Default Re: Checkpoint VRRP failover issues

    Independent of figuring out the Layer 2 issue, which definitely causes the split brain issue, I still recommend upgrading to MR4/MR4a to resolve the issue where the gateway becomes partially active/standby.

  10. #10
    Join Date
    2012-08-06
    Posts
    63
    Rep Power
    9

    Default Re: Checkpoint VRRP failover issues

    Okidoki,

    I have now made headway.
    First, some VLAN config on the switches was wrong. Damn, who made those? ;-)

    But it was only marginally wrong, like:
    VLAN a -> port x untagged (oops)
    VLAN a -> port y tagged
    VLAN b -> port y tagged
    instead of only
    VLAN a -> port y tagged
    VLAN b -> port y tagged
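
    A stray membership like this is easy to miss by eye. As a minimal sketch, assuming you can export the switch's VLAN/port table into (vlan, port, mode) tuples, a set difference flags it. The names a, b, x and y are the placeholders from the listing above; the export step itself is not shown and depends on the switch.

```python
def stray_memberships(intended, actual):
    """Return (vlan, port, mode) entries configured on the switch but not intended."""
    return sorted(set(actual) - set(intended))


def missing_memberships(intended, actual):
    """Return (vlan, port, mode) entries that are intended but not configured."""
    return sorted(set(intended) - set(actual))


# Placeholder names taken from the listing above.
intended = {("a", "y", "tagged"), ("b", "y", "tagged")}
actual = intended | {("a", "x", "untagged")}

print(stray_memberships(intended, actual))    # [('a', 'x', 'untagged')]
print(missing_memberships(intended, actual))  # []
```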

    Still, I can't get any tcpdump filter (be it 'vrrp' or 'ip proto 112') to match the VRRP packets, but the packets seem to be there (outgoing and incoming):

    Code:
    # tcpdump -s 9999 -i eth2 | grep -i vrrp
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth2, link-type EN10MB (Ethernet), capture size 9999 bytes
    17:52:37.824995  O vlan 56, p 0, IP 172.31.x > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 253, authtype none, intvl 1s, length 20
    17:52:37.825001  O vlan 55, p 0, IP 172.16.y > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 253, authtype none, intvl 1s, length 20
    17:52:37.825006  O vlan 14, p 0, IP 192.168.a > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 253, authtype none, intvl 1s, length 20
    17:52:37.825012  O vlan 37, p 0, IP 192.168.b > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 253, authtype none, intvl 1s, length 20
    \o/

    Maybe the corrected switch config will cause fewer problems now, but I still doubt it, because I don't see what that has to do with large NFS transfers (lots of small files).

  11. #11
    Join Date
    2005-08-14
    Location
    Gig Harbor, WA, USA
    Posts
    2,499
    Rep Power
    18

    Default Re: Checkpoint VRRP failover issues

    Lots of small files may mean lots of small packets, which are considered the "worst case" type of packet in terms of performance.
    If you're getting a lot of them, you could be running into some sort of performance limitation, either on the firewalls themselves or in the networking gear.
    On the IP290, I would monitor for interface errors and the like.
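
    To put a number on "worst case": the sketch below computes the theoretical maximum Ethernet frame rate for minimum-size and maximum-size untagged frames. The 1 Gbit/s link speed is an assumption (the thread does not state it), and real NFS traffic will fall somewhere between the two extremes.

```python
# Preamble/SFD (8 bytes) plus inter-frame gap (12 bytes) accompany every frame on the wire.
PER_FRAME_OVERHEAD_BYTES = 20


def max_frames_per_second(link_bps: int, frame_bytes: int) -> int:
    """Upper bound on frames per second for a given frame size (including FCS)."""
    bits_on_wire = (frame_bytes + PER_FRAME_OVERHEAD_BYTES) * 8
    return link_bps // bits_on_wire


GIGABIT = 1_000_000_000  # assumed link speed, not stated in the thread

print(max_frames_per_second(GIGABIT, 64))    # 1488095
print(max_frames_per_second(GIGABIT, 1518))  # 81274
```

    The same bandwidth can therefore carry roughly 18 times more packets when they are small, and per-packet work is what loads a firewall or switch CPU.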

    The fact you're not seeing VRRP packets with tcpdump may be an artifact of SecureXL being on (and a bug).

  12. #12
    Join Date
    2012-08-06
    Posts
    63
    Rep Power
    9

    Default Re: Checkpoint VRRP failover issues

    There simply seems to be a problem with tcpdump capturing tagged vrrp packets:

    Code:
    # tcpdump -s 9999 -i eth2c1 'vrrp'
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth2c1, link-type EN10MB (Ethernet), capture size 9999 bytes
    ^C
    0 packets captured
    278 packets received by filter
    Code:
    # tcpdump -s 9999 -i eth2c1 'vlan and vrrp'
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth2c1, link-type EN10MB (Ethernet), capture size 9999 bytes
    17:42:30.851982  I vlan 49, p 6, IP 172.xxx > 224.0.0.18: VRRPv2, Advertisement, vrid 49, prio 255, authtype none, intvl 1s, length 20
    17:42:31.280764  O vlan 56, p 0, IP 172.yyy > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 253, authtype none, intvl 1s, length 20
    ....
    ^C
    28 packets captured
    42 packets received by filter
    0 packets dropped by kernel
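
    The difference between the two filters comes down to byte offsets. A plain 'vrrp' or 'ip proto 112' filter looks for the IPv4 EtherType directly after the MAC addresses, whereas the 'vlan' keyword makes tcpdump expect an 802.1Q tag there and shift the rest of the decoding by four bytes. The sketch below imitates both checks on hand-built frames. It is a simplified model, not tcpdump's actual BPF code, and the MAC and IP addresses are placeholders.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_VLAN = 0x8100
IPPROTO_VRRP = 112


def build_frame(vlan_id=None):
    """Build a minimal Ethernet frame carrying an IPv4 header with protocol 112."""
    dst_mac = bytes.fromhex("01005e000012")  # multicast MAC for 224.0.0.18
    src_mac = bytes.fromhex("020000000001")  # placeholder source MAC
    tag = b"" if vlan_id is None else struct.pack("!HH", ETHERTYPE_VLAN, vlan_id & 0x0FFF)
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40, 0, 0, 255, IPPROTO_VRRP, 0,
        bytes([192, 0, 2, 1]),    # placeholder source IP
        bytes([224, 0, 0, 18]),   # VRRP multicast group
    )
    return dst_mac + src_mac + tag + struct.pack("!H", ETHERTYPE_IPV4) + ip_header


def matches_ip_proto(frame, proto, ethertype_offset=12):
    """Model of 'ip proto N': IPv4 EtherType at the expected offset, then protocol byte."""
    (ethertype,) = struct.unpack_from("!H", frame, ethertype_offset)
    if ethertype != ETHERTYPE_IPV4:
        return False
    return frame[ethertype_offset + 2 + 9] == proto  # protocol is byte 9 of the IP header


def matches_vlan_and_ip_proto(frame, proto):
    """Model of 'vlan and ip proto N': require an 802.1Q tag, then decode 4 bytes later."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_VLAN:
        return False
    return matches_ip_proto(frame, proto, ethertype_offset=16)


tagged = build_frame(vlan_id=56)
print(matches_ip_proto(tagged, IPPROTO_VRRP))           # False
print(matches_vlan_and_ip_proto(tagged, IPPROTO_VRRP))  # True
```

    This matches what the captures show: the frames reach the filter (278 packets received) but the untagged-offset check never matches them.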

    Also, upgrading the IPs doesn't seem like a bad idea, because support for the R71 that we have installed seems to run out very soon.
    At the same time we can probably upgrade to the latest IPSO release, even if all builds are still supported.

  13. #13
    Join Date
    2005-08-14
    Location
    Gig Harbor, WA, USA
    Posts
    2,499
    Rep Power
    18

    Default Re: Checkpoint VRRP failover issues

    Quote Originally Posted by jeronimo View Post
    There simply seems to be a problem with tcpdump capturing tagged vrrp packets:
    VLAN + VRRP, guess that requires a different tcpdump filter string :)
    Generally speaking, you should be using either R75.47 or R77(.10) at this point.

  14. #14
    Join Date
    2012-08-06
    Posts
    63
    Rep Power
    9

    Default Re: Checkpoint VRRP failover issues

    Quote Originally Posted by PhoneBoy View Post
    VLAN + VRRP, guess that requires a different tcpdump filter string :)
    The funny thing is, doesn't matter if I capture on the tagged (ethXcY) or parent interface (ethX), I always need to specify 'vlan' and the packets always look as if they are tagged, and even when sniffing on ethXcY I get packets from all the VLANs whereas I'd expect sniffing ethXcY would show packets from that VLAN untagged to the sniffer. I hope that sentence wasn't too long :D

  15. #15
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,668
    Rep Power
    11

    Default Re: Checkpoint VRRP failover issues

    Quote Originally Posted by jeronimo View Post
    The funny thing is, doesn't matter if I capture on the tagged (ethXcY) or parent interface (ethX), I always need to specify 'vlan' and the packets always look as if they are tagged, and even when sniffing on ethXcY I get packets from all the VLANs whereas I'd expect sniffing ethXcY would show packets from that VLAN untagged to the sniffer. I hope that sentence wasn't too long :D
    Yup, I don't know if that is a FreeBSD-ism or an IPSO-ism, but that is the way it's always been. It must have to do with where tcpdump lives in the packet chain, as if it always sits in front of the physical interface even when running on a virtual interface (-ni eth0c3 as an example). From what I've seen, SPLAT and Gaia do what you would expect.

    BTW, you can also give vlan the VLAN number, for example:

    vlan 600 and net 224.0.0.0/8
