I have noticed that when ever i add a new rule and install the policy on cluster XL nettwork goes down for about 10-12 seconds. Has anybody any idea about what could be the problem
CPUG: The Check Point User Group | |
Resources for the Check Point Community, by the Check Point Community.
| |
First, I hope you're all well and staying safe. | |
|
I have noticed that when ever i add a new rule and install the policy on cluster XL nettwork goes down for about 10-12 seconds. Has anybody any idea about what could be the problem
Sounds like the spanning tree protocol on the switch the firewall is attached to thinks it is detecting a possible bridging loop and blocking traffic in listening/learning mode, 10-12 seconds sounds about right. You'll need to set portfast on the switch ports attached to the firewall.
Also you may want to check if a failover occurred during the policy push, you can check this from the SmartView Tracker, in the All Records view filter for Control/wrench events (it is the very skinny column to the right of of the Origin column). If a failover is occurring that could explain the outage as well, you'll need to freeze the cluster state during policy install to keep that from happening. See sk32488.
But no switch over happens during policy install. i have checked the logs for control events. there has never happened any switchover when we install policy after changing some rules.
But i think as you have mentioned it could be spanning tree problem. i will put the links as port fast and will check if that helps.
thanks for the reply.
It is unlikely that spanning-tree is causing the outage if there is not a cluster transition but portfast is definitely worth a try first.
Next culprit could be an overloaded CPU on the active member during the push. When installing policy you should see "Verifying" then "Verifying & Installing" then "Installing". I assume the 10-12 second outage occurs during the "Installing" phase? Try running a top on the active gateway during a policy push and see if your outage corresponds with the CPU getting pegged at 100% in sy and/or si space. If that is the case you can try modifying the Connection Persistence setting on the cluster object from "Rematch Connections" to "Keep all connections" which will significantly reduce CPU load; push policy twice and see if the outage is reduced or abated on the second policy install. Worse yet would be seeing the "wa/wio" utilization in top spiking which could indicate a shortage of RAM memory and causing the need for paging to virtual disk memory during policy installation which will absolutely butcher performance.
If your outage is still occurring during policy load after all the above the last-ditch thing to try is powering off the standby member and installing policy with the For Gateway clusters install on all the members, if it fails do not install at all UNchecked on the policy install screen. If the outage goes away that would generally indicate some kind of bad interaction between the cluster and your network; could be ARP, could be multicast-related, could be lots of different things at Layer 2 or 3.
Last edited by ShadowPeak.com; 2014-10-26 at 10:41.
I have tried by defineing the switch port as portfast and still got the network outage. The amazing thing is that network outage happens when i get the screen for policy install successfull, means when the policy install is complete and finished then network outage happens for 10-12 seconds, it does not happen during the verification or installation of policy.
check the /var/log/messages file for something like CUL_Freeze messages that correspond to the same time. I have this exact problem with an undersized cluster
Hi,
Do you have a solution for this already?
I've got the same problem at one of our customer's cluster but only with VPN traffic.
I am awaiting a special and improved hotfix from Check Point now regarding this issue. Maybe I can help you afterwards if the issue still persists on your site.
best regards,
Werner
If your VPN tunnels are hanging upon policy push, it may be due to all IKE Phase 1 tunnels getting reset by default during the policy load process. Under Policy...Global Properties...SmartDashboard Customization...Advanced Config...Configure...VPN Advanced Propertries...VPN IKE Properties try checking "keep_IKE_SAs" which will override this behavior and see if it helps.
That was also the first idea of the Check Point engineer. This option was actually set (maybe even by default?)
But after doing a lot of debugs, Check Point has found the root cause why there are still drops. It has something to do with Link Selection when there are a lot of vpn tunnels terminating at the Gateway/Cluster. There is a fix for this mentioned in sk55244. This fix caused some problems with 3rd party vpn tunnels so be carefull when using it.
I am still interested in the status of the issue mentioned in the first post anyway..
Does the same drop happen when the standby gateway is made the active? If not, it could just be your primary gateway that has the issue.
I had this issue with a standalone Gaia gateway. I rebuilt it in the end and it was fine after that...
Bookmarks