CPUG
The Check Point User Group
A Resource For The Check Point Community. Fast. Useful. Independent.
Hi all,

We're seeing some strange behavior with OSPF enabled on our standby cluster: every time we install a new policy, there is a 10-second interruption. We observed the following:
- There is no interruption if the policy hasn't changed (install the policy twice; the second installation causes no interruption).
- If you stop the standby node before installing the policy, there is also no interruption (if you start the standby node afterwards, it fetches the policy automatically).
- The OSPF neighbor state goes to EXSTART during policy installation.

This looks like a problem between the gated daemon and the clustering or the kernel (perhaps a design problem). We have sent several debug logs to Check Point, without success so far. Has anybody had the same experience?

Regards, Manuel
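The EXSTART state mentioned above belongs to the OSPF neighbor state machine defined in RFC 2328 (sections 10.1 and 10.3). The Python sketch below models only a few of its transitions, as an illustration rather than gated's or Cisco IOS's actual code; `NbrState` and `next_state` are hypothetical names. In RFC 2328 terms, a SeqNumberMismatch or BadLSReq event in Exchange or later forces the adjacency back to ExStart, a 1-WayReceived event (a hello that no longer lists our router ID) drops a neighbor in 2-Way or later back to Init, and an expired inactivity timer takes it to Down.

```python
from enum import IntEnum


class NbrState(IntEnum):
    """OSPF neighbor states from RFC 2328 section 10.1, in protocol order."""
    DOWN = 0
    ATTEMPT = 1
    INIT = 2
    TWO_WAY = 3
    EXSTART = 4
    EXCHANGE = 5
    LOADING = 6
    FULL = 7


def next_state(state: NbrState, event: str, form_adjacency: bool = True,
               requests_pending: bool = False) -> NbrState:
    """Hypothetical, simplified subset of the RFC 2328 section 10.3 FSM.

    Only a handful of events are modelled; any other event (or an event
    that does not apply in the current state) leaves the state unchanged.
    """
    # Any state: no hello within RouterDeadInterval kills the neighbor.
    if event == "InactivityTimer":
        return NbrState.DOWN
    if event == "HelloReceived" and state <= NbrState.ATTEMPT:
        return NbrState.INIT
    if event == "2-WayReceived" and state == NbrState.INIT:
        return NbrState.EXSTART if form_adjacency else NbrState.TWO_WAY
    if event == "NegotiationDone" and state == NbrState.EXSTART:
        return NbrState.EXCHANGE
    if event == "ExchangeDone" and state == NbrState.EXCHANGE:
        return NbrState.LOADING if requests_pending else NbrState.FULL
    if event == "LoadingDone" and state == NbrState.LOADING:
        return NbrState.FULL
    # Exchange or greater: a bad DD sequence number or a bad LS request
    # aborts the database exchange and restarts it from ExStart.
    if event in ("SeqNumberMismatch", "BadLSReq") and state >= NbrState.EXCHANGE:
        return NbrState.EXSTART
    # 2-Way or greater: the neighbor's hello no longer lists our router ID.
    if event == "1-WayReceived" and state >= NbrState.TWO_WAY:
        return NbrState.INIT
    return state


if __name__ == "__main__":
    # Trace a full adjacency losing sync and re-forming.
    state = NbrState.FULL
    for ev in ("SeqNumberMismatch", "NegotiationDone", "ExchangeDone"):
        state = next_state(state, ev)
        print(ev, "->", state.name)
```

Each fall back to ExStart means the database exchange has to run again before routes are usable, which is why such a reset shows up as a traffic interruption.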
Hello,

Yes, I have almost exactly the same situation... but in our environment (migration from NG AI/IPSO to NGX HA/SPLAT) this interruption (caused by an OSPF restart) also occurs with an unchanged policy. We have a case open with CP support for this. From the Cisco OSPF router's side, it looks like the router receives an OSPF hello without its own router ID in the neighbor list of the hello packet, and drops the SPLAT neighbor to EXSTART state; or the Cisco router receives a DBD packet with a bad sequence number and drops the SPLAT neighbor to INIT state.

One more thing I find strange: the Cisco router sends its OSPF hellos, and SPLAT receives them at the correct hello interval of 10 sec (the gap between Cisco TX and SPLAT RX is about 0.002 sec in the logs). SPLAT's OSPF hellos, however, are sent at irregular intervals (e.g. 4, 12, 17 sec), and the Cisco router receives them 1-7 sec later. Where are the SPLAT OSPF hellos during that time? To me it looks like a ClusterXL sync-related problem (ClusterXL running with only one gateway does not have this problem).

Regards
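One way to quantify irregular hello timing like this is to take the hello timestamps for one neighbor from a packet capture and look at the gaps between them. The Python sketch below is a minimal illustration, assuming the common defaults of a 10 s HelloInterval and a 40 s RouterDeadInterval (4 x HelloInterval); `analyse_hellos` and `HelloReport` are hypothetical names, and the example timestamps are constructed from the 4/12/17 s spacing quoted above, not taken from real logs.

```python
from dataclasses import dataclass
from typing import List, Sequence

HELLO_INTERVAL = 10.0  # seconds; assumed OSPF default on broadcast links
DEAD_INTERVAL = 40.0   # seconds; assumed default (4 x HelloInterval)


@dataclass
class HelloReport:
    gaps: List[float]    # seconds between consecutive hellos
    late: List[float]    # gaps longer than the hello interval
    max_gap: float       # largest gap seen
    would_expire: bool   # True if some gap reached the dead interval


def analyse_hellos(timestamps: Sequence[float],
                   hello: float = HELLO_INTERVAL,
                   dead: float = DEAD_INTERVAL) -> HelloReport:
    """Summarise the inter-arrival gaps of hellos from one OSPF neighbor."""
    ts = sorted(timestamps)
    # Round to milliseconds to hide floating-point noise from capture times.
    gaps = [round(b - a, 3) for a, b in zip(ts, ts[1:])]
    late = [g for g in gaps if g > hello]
    max_gap = max(gaps, default=0.0)
    # The neighbor is declared down once no hello arrives for `dead` seconds.
    return HelloReport(gaps, late, max_gap, max_gap >= dead)


if __name__ == "__main__":
    report = analyse_hellos([0.0, 4.0, 16.0, 33.0])
    print(report)
```

Assuming the default dead interval, none of the 4/12/17 s gaps reaches 40 s, so hello jitter of that size would not by itself expire the adjacency through the inactivity timer.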
Hello Dominik,

CP is about to release a patch to fix this behavior. They said we'll get it right after their tests (which should be this week). I'll let you know whether it really fixes the problem.

Regards, Manuel
__________________
To know recursion, you must first know recursion-1
UPDATE! The policy problem really was fixed by this specially created patch. Well, "fixed" is relative: policy installation works without losing the adjacency IF the first cluster member is the active member. Otherwise a failover occurs, the adjacency is lost, and we get an interruption of about 2 minutes.
We have now really reached the point where we should consider other firewall vendors. Support is absolutely the worst part of Check Point!

Best wishes, Manuel
__________________
To know recursion, you must first know recursion-1
Yep, support is so-so. I also had a couple of issues with lost debug data, being asked again for files I had attached the first time, etc. A shame, especially for those paying big money for support.
Have you talked to your Check Point reseller, or to your SE if you know him? The SEs can sometimes get stuck tickets moving again.
This seems to work. Our reseller emailed us that CP is about to assemble a team of engineers to investigate further and decide whether there is a solution for our problem. All things considered, I don't think they will fix it. But at least they have now confirmed that there is a bug and that another customer has a similar problem. That's a small victory, because CP stated several times that the problem was related to our equipment/configuration/moon phase or something like that, and never accepted that there was a bug. Yeah... so we're not crazy. ;-)

Best wishes, Manuel
__________________
To know recursion, you must first know recursion-1
We vs. Interrupts, Round 27: Over the past weeks we had to send various data to CP, which they actually read and responded to (!). I think communicating directly with an account manager at Check Point sped up the whole process.
Our monthly maintenance window is on Saturday, and I think we'll try this stuff then. And then on to round 28... (maybe... probably...)

Regards, Manuel
__________________
To know recursion, you must first know recursion-1
Dear Mary, Dear Audience,

We discovered multiple issues:

Side note: We started with one problem, found three more, and have now solved two of the four (in a year!!). (I really don't want to think about extrapolating this to a three-year lifecycle...) We should start selling our troubleshooting/bug-tracking work to CP. This product simply does not work as specified. (Have you ever bought a car whose engine died for a few seconds every time you shifted gears? Policy installation is essential for a firewall, and so is failover behavior for a cluster. How the heck does CP test their products??)

Best wishes, Manuel

fwha_freeze_state_machine_timeout

State synchronization during policy installation may, in certain cases, cause a cluster member to initiate a failover. To prevent this, you can modify the security gateway global parameter fwha_freeze_state_machine_timeout. This parameter sets the number of seconds, during policy installation, in which no state synchronization will be performed. You should set it to the shortest period needed to eliminate the issue; the recommended value is 30 seconds.

This parameter is not related to the synchronization mechanism in any way. It is related to what Check Point calls the "state machine", which is responsible for determining the state of each machine, i.e. whether the machine is active, standby or down. When the state of the machine changes, a failover results. During policy installation there are cases in which the state changes, and consequently an unwanted failover may occur. Correctly setting fwha_freeze_state_machine_timeout should prevent the unwanted failover.

Correctly setting fwha_freeze_state_machine_timeout should also prevent unwanted failovers in 3rd-party environments, especially where the 3rd-party environment may bring the cluster down during policy installation. In 3rd-party environments, the state of the cluster member is determined by the 3rd-party environment, whereas in ClusterXL it is determined by the ClusterXL state machine code, which may cause unwanted failovers during policy installation.
__________________
To know recursion, you must first know recursion-1
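The idea behind fwha_freeze_state_machine_timeout, as the quoted text describes it, is that cluster state changes are ignored for a fixed number of seconds once policy installation starts. The Python sketch below models only that idea: the `ClusterMember` class, its methods and the three-state model are hypothetical illustrations, not Check Point's implementation, and the real parameter is configured as a security gateway global parameter, not through anything shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model for illustration only; this is not Check Point code.
# FREEZE_TIMEOUT stands in for fwha_freeze_state_machine_timeout, using the
# 30 s value recommended in the quoted text.
FREEZE_TIMEOUT = 30.0


@dataclass
class ClusterMember:
    state: str = "active"                 # "active", "standby" or "down"
    frozen_until: Optional[float] = None  # end of the freeze window, if any
    failovers: List[float] = field(default_factory=list)

    def start_policy_install(self, now: float,
                             timeout: float = FREEZE_TIMEOUT) -> None:
        """Open a window in which reported state changes are ignored."""
        if timeout > 0:
            self.frozen_until = now + timeout

    def report_state(self, new_state: str, now: float) -> bool:
        """Apply a reported state change unless the machine is frozen.

        Returns True only if the member's state actually changed.
        """
        if self.frozen_until is not None and now < self.frozen_until:
            return False  # frozen: transient changes during install are ignored
        if new_state == self.state:
            return False
        if self.state == "active":
            self.failovers.append(now)  # losing the active role = failover
        self.state = new_state
        return True


if __name__ == "__main__":
    member = ClusterMember()
    member.start_policy_install(now=0.0)
    print(member.report_state("down", now=5.0))   # inside the window
    print(member.report_state("down", now=35.0))  # after the window
    print(member.state, member.failovers)
```

The sketch also shows the trade-off behind "set this parameter to the shortest period needed": while the window is open, a genuine failure is ignored just like a transient one, so a longer timeout delays real failovers as well.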
Dear Audience,

- Failover causes a loss of connectivity of between 5 and 60 seconds [solved]
- GateD does not send all routes to all OSPF neighbors [solved]

One or two weeks ago CP was able to reproduce and locate all of our unresolved problems, and fixed them with yet another new version of the GateD daemon. This fixed version still causes an interruption of 5-6 seconds about 40 s after a failover occurs. This behavior is very consistent, whether we deactivate an interface or use the command "clusterXL_admin up/down" to trigger the failover. All OSPF neighbors now have all routes, and the firewall is stable in this respect too (today, 3 days after the fix, all routes are still on all routers).

The main question now is: when will CP integrate all these fixes into an official HFA? Until they do, we're forced to stay on R60. I think we will close the support case, so this will be the last post about these issues. Feel free to ask questions; I'll visit the forum from time to time.

May the might be with you.
Manuel
__________________
To know recursion, you must first know recursion-1
It depends on how many people it affects and when in the HFA process the fix is developed. When a fix is rolled up, the QA cycle is a lot longer than for the point fix alone, but normally it will show up in one of the next two HFAs. Showing up in other builds (say, R65) can take a while unless people are asking for the fix.
Since I may be using a SPLAT Pro cluster with OSPF in the very near future, I would like to know the answer to that question myself. How many hotfixes did you have to apply to correct your problems? Was there any indication that the patch would be made publicly available?