CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



Thread: R70.3 Failover during policy install

  1. #1
    Join Date
    2006-05-20
    Posts
    52
    Rep Power
    18

    R70.3 Failover during policy install

    We recently upgraded one of our firewall clusters from R65 to R70.3.

    The two firewalls in the cluster are named FW3 and FW4. We've found that when we push policy, FW3 always becomes the active member unless it is down for some other reason. To phrase it another way: when FW4 is active and we push policy, we get a failover event back to FW3.

    This is less than ideal, as a failover causes a 1-2 second interruption and we run a rather high-traffic audio response system.

    In SmartDashboard, under Cluster Properties "Cluster Members", FW3 shows a higher priority than FW4. In the ClusterXL tab, under "Upon Cluster Member recovery", we have "Maintain current active Cluster Member" selected.

    In R65 a policy install wouldn't cause a failover. Has anybody else seen this behavior in R70.3?

  2. #2
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,252
    Rep Power
    18

    Re: R70.3 Failover during policy install

    Try filtering for control (wrench) events in the Type column of SmartView Tracker and look at the time of your last policy push. These events should show which critical HA device "failed" during the policy install and caused the failover. You can see the list of all critical devices and their current states by running cphaprob -i list on the gateway.

    Let us know what you find there and we can troubleshoot further. I certainly hope it wasn't a "filter" (no security policy loaded) failure.
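
    If you'd rather not eyeball the full device list, here is a rough sketch that prints only the critical devices that are not in the OK state. failed_pnotes is a made-up helper name, and the parsing assumes each device is reported with a "Device Name: ..." line followed by a "Current state: ..." line; check that against the output on your own version before relying on it.

```shell
#!/bin/sh
# Sketch only: failed_pnotes is a hypothetical helper, not a Check Point tool.
# Assumption: the device list has "Device Name: <name>" and
# "Current state: <state>" lines for every registered critical device.
failed_pnotes() {
    awk -F': *' '
        # Remember the most recent device name seen.
        /^[ \t]*Device Name:/ {
            sub(/[ \t\r]+$/, "", $2)
            name = $2
        }
        # Print the device when its state is anything other than OK.
        /^[ \t]*Current state:/ {
            sub(/[ \t\r]+$/, "", $2)
            if ($2 != "OK") print name ": " $2
        }
    '
}

# Usage on a gateway (reads the device list from stdin):
#   cphaprob -i list | failed_pnotes
```

    No output means every device reported OK at that moment, so run it right around a policy push if you want to catch the one that flaps.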

  3. #3
    Join Date
    2009-04-14
    Location
    Ohio
    Posts
    405
    Rep Power
    15

    Re: R70.3 Failover during policy install

    If your cluster is really busy, CPU usage is high, or your policy is large, the cluster members may briefly time out with each other during a policy install, fall out of sync, and change state.

    To fix this, you can have the firewalls freeze their current state (active or standby) during a policy push by running the following command on each firewall gateway:

    fw ctl set int fwha_freeze_state_machine_timeout 30

    This will freeze the 'state' of the firewall for up to 30 seconds. To have this setting survive a reboot, edit $FWDIR/boot/modules/fwkern.conf and add the following line:
    fwha_freeze_state_machine_timeout=30

    I have this on one of my really busy firewall clusters and it works great. It does not affect normal failover situations such as an interface going down; it only applies during a policy install.
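
    If you want to script the fwkern.conf step across several gateways, here is a minimal sketch. persist_kernel_param is a made-up helper name (not a Check Point tool), and it assumes plain name=value lines and a value without slashes or spaces. It rewrites the line if the parameter is already present, so running it twice does not leave duplicates.

```shell
#!/bin/sh
# Sketch only: persist_kernel_param is a hypothetical helper.
# Arguments: <path to fwkern.conf> <parameter name> <value>
persist_kernel_param() {
    conf_file=$1
    name=$2
    value=$3

    # Create the file if it does not exist yet.
    [ -f "$conf_file" ] || : > "$conf_file"

    if grep -q "^${name}=" "$conf_file"; then
        # Parameter already present: rewrite its value in place.
        tmp="${conf_file}.tmp.$$"
        sed "s/^${name}=.*/${name}=${value}/" "$conf_file" > "$tmp" &&
            cat "$tmp" > "$conf_file"
        rm -f "$tmp"
    else
        # Make sure the last existing line ends with a newline before appending.
        if [ -s "$conf_file" ] && [ -n "$(tail -c 1 "$conf_file")" ]; then
            echo >> "$conf_file"
        fi
        printf '%s=%s\n' "$name" "$value" >> "$conf_file"
    fi
}

# Usage on each gateway (first line applies it now, second survives a reboot):
#   fw ctl set int fwha_freeze_state_machine_timeout 30
#   persist_kernel_param "$FWDIR/boot/modules/fwkern.conf" fwha_freeze_state_machine_timeout 30
# Check the running value afterwards with:
#   fw ctl get int fwha_freeze_state_machine_timeout
```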

  4. #4
    Join Date
    2006-05-20
    Posts
    52
    Rep Power
    18

    Re: R70.3 Failover during policy install

    Thanks, bmolnar. This is what Check Point support suggested as well, and it seems like a good fix. We'll try this during our next scheduled maintenance.

  5. #5
    Join Date
    2008-12-16
    Posts
    4
    Rep Power
    0

    Re: R70.3 Failover during policy install

    Yesterday we experienced the exact same failover during a policy install (HA pair running R70.3). However, it was on our lab units, which run at 1% utilization and have a tiny (100 line) rule base. We had failed over to the backup unit to test a NAT issue and, just as this post stated, a policy push failed it back to the A unit. (Shocked us.) Sounds more like a bug than a resource overrun issue. I wonder if R71 has this same "feature"?

  6. #6
    Join Date
    2005-08-29
    Location
    Upstate NY
    Posts
    2,720
    Rep Power
    21

    Re: R70.3 Failover during policy install

    There are a number of things that can cause this. I usually see it with convoluted OSPF routing tables. I have one customer where it happens in one data center but not the other, with the same gateways, management and rule base. The only thing different is the routing.

  7. #7
    Join Date
    2008-12-16
    Posts
    4
    Rep Power
    0

    Re: R70.3 Failover during policy install

    No dynamic routing is configured on this set. The lab HA pair uses static routes only.

  8. #8
    Join Date
    2005-08-29
    Location
    Upstate NY
    Posts
    2,720
    Rep Power
    21

    Re: R70.3 Failover during policy install

    OK, are you actively trying to be my new "I can break anything" customer? (I'm assuming you are the same mcatkinson I'm thinking of.)

    Do try the freeze_state and see if that fixes it. Let me know either way.

  9. #9
    Join Date
    2006-05-20
    Posts
    52
    Rep Power
    18

    Re: R70.3 Failover during policy install

    The freeze_state fixed it on both of our clusters, each of which has a very large rulebase.

  10. #10
    Join Date
    2009-04-14
    Location
    Ohio
    Posts
    405
    Rep Power
    15

    Re: R70.3 Failover during policy install

    Quote: Originally posted by fdamstra
    The freeze_state fixed it on both of our clusters, each of which has a very large rulebase.
    Glad that worked for you, and thanks for letting us know.

  11. #11
    Join Date
    2006-12-04
    Posts
    1,316
    Rep Power
    19

    Re: R70.3 Failover during policy install

    Quote: Originally posted by mcatkinson
    Sounds more like a bug than a resource overrun issue. I wonder if R71 has this same "feature"?
    This feature was already implemented in R60 (according to the CP SK) > http://www.cpug.org/forums/clusterin...html#post47359

    And it will be a standard CP FW feature in every new major version...

    Solution ID: sk32488
    Version: R70, R70.1, R70.20, NGX R60, NGX R61, NGX R62, NGX R65
    Last edited by serlud; 2010-09-01 at 06:51.

  12. #12
    Join Date
    2006-02-09
    Location
    Charleston, SC
    Posts
    1,172
    Rep Power
    20

    Re: R70.3 Failover during policy install

    Well, this sucks! I have just experienced this issue myself with a new site that went live yesterday.

    Regardless of which cluster member is listed as the top priority in the cluster properties, fw1 always takes over the cluster on policy push. What's the point of being able to change the priorities in Dashboard if it's just going to ignore it and fail over anyway?

    I shouldn't have to put in extra parameters to prevent this from occurring, it shouldn't be happening at all. Jim, do you know if someone at CP is already looking into why this occurs?
    There's no place like 127.0.0.1

  13. #13
    Join Date
    2006-12-04
    Posts
    1,316
    Rep Power
    19

    Re: R70.3 Failover during policy install

    Do you really think that CP cares about this issue or other limitations/bugs (example > fwkern.conf will be empty..)?

    They (CP) have produced an SK and also documented this limitation=feature=bug in the official CP docs; that is enough..

    Of course customers can write an RFE and wait another 5 years until it is implemented.

    But with every new release CP will introduce a new limitation=feature=bug just to keep this process working... (example: in R71, case sensitivity does not work any more....)

    PS: We have only had this bug since R65, because we use only the Open Server platform for all our GWs. CP introduced this bug in R60 (2005), at the same time the first CP UTM-1 appliances were produced...

    Bug history >
    Release Notes for Check Point NGX (R60). Last Update — May 23, 2005 page 80

    State synchronization during policy installation may in certain cases cause a cluster
    member to initiate a failover. To prevent this situation, modify the enforcement module
    global parameter fwha_freeze_state_machine_timeout. This parameter sets the number
    of seconds during policy installation in which no state synchronization will be
    performed. Set this parameter to the shortest period which eliminates the issue; the
    recommended value is 30 seconds.


    Enterprise Suite NGX R61 Known Limitations Supplement Last Update — February 7, 2007 page 35

    State synchronization during policy installation may in certain cases cause a cluster
    member to initiate a failover. To prevent this situation, modify the enforcement
    module global parameter fwha_freeze_state_machine_timeout. This parameter sets
    the number of seconds during policy installation in which no state synchronization
    will be performed. Set this parameter to the shortest period which eliminates the
    issue; the recommended value is 30 seconds.


    Till end of life....
    Last edited by serlud; 2010-09-02 at 05:17.

  14. #14
    Join Date
    2005-08-29
    Location
    Upstate NY
    Posts
    2,720
    Rep Power
    21

    Re: R70.3 Failover during policy install

    Quote: Originally posted by lammbo
    I shouldn't have to put in extra parameters to prevent this from occurring, it shouldn't be happening at all. Jim, do you know if someone at CP is already looking into why this occurs?
    Not that I know of. Everyone I have encountered with this has taken either the freeze_state route or just accepted that if you push policy, the cluster's state will reset.

    That all said, I only have visibility into my RFEs and none at all into bug reports.
    Please let your SE know your feelings on this and they can let the PTB know.

    Just because an SK/Limitation is published doesn't mean we will not "fix" it. I have a pretty crazy one now (build a cluster this way, stand on head, push policy with your left hand...) with a really simple workaround, but the end user wants a fix, so a fix he will get. You do have to be a little insistent sometimes, and that is where your SE comes in.
