CPUG

The Check Point User Group

A Resource For The Check Point Community.  Fast.  Useful.  Independent.

#1 | 2006-04-24 | baboo
SPLAT PRO, OSPF and Cluster => Interrupt while Policy inst.

Hi all

We're seeing some strange behavior with OSPF enabled on our active/standby cluster.
Every time we install a new policy, there is an interruption of 10 seconds.

We observed the following:
- there's no interruption if the policy hasn't changed (install the policy twice: the second installation causes no interruption)
- if you stop the standby node before installing the policy, there's also no interruption (when you start the standby node again afterwards, it fetches the policy automatically)
- the OSPF neighbor state goes to EXSTART during policy installation

This seems to be a problem in the interaction between the gated daemon and ClusterXL or the kernel (perhaps a design problem).

We have sent several debug logs to Check Point, without success so far.

Has anybody had the same experience?

Regards, Manuel

#2 | 2006-05-19 | dominik.latusek

Hello,

Yes, I have almost exactly the same situation, but in our environment (a migration from NG AI on IPSO to NGX HA on SPLAT) the interruption (caused by an OSPF restart) also occurs when the policy is unchanged.
We have a case open with CP support for this.

From the Cisco OSPF router's side, it looks like either the router receives an OSPF hello whose neighbor list does not contain its own router ID, so it drops the SPLAT neighbor to INIT, or it receives a DBD packet with a bad sequence number, so it drops the SPLAT neighbor to EXSTART.
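For reference, RFC 2328 (section 10.3) defines which event sends an established neighbor back to which state: a Hello that no longer lists the receiving router's ID raises 1-WayReceived (back to Init), and a Database Description sequence-number problem raises SeqNumberMismatch (back to ExStart). The Python sketch below models only those two transitions. It is a simplified illustration, not code from gated or Cisco IOS; the function names and router IDs are made up for this example.

```python
from enum import IntEnum


class NbrState(IntEnum):
    """OSPF neighbor states, in the order RFC 2328 (section 10.1) lists them."""
    DOWN = 0
    ATTEMPT = 1
    INIT = 2
    TWO_WAY = 3
    EXSTART = 4
    EXCHANGE = 5
    LOADING = 6
    FULL = 7


def on_hello(state: NbrState, my_router_id: str, listed_neighbors: list[str],
             form_adjacency: bool = True) -> NbrState:
    """State after a Hello arrives from this neighbor (simplified)."""
    if state in (NbrState.DOWN, NbrState.ATTEMPT):
        state = NbrState.INIT  # HelloReceived
    if my_router_id not in listed_neighbors:
        # 1-WayReceived: the neighbor no longer lists our router ID, so
        # bidirectional communication is lost and we fall back to Init.
        return NbrState.INIT
    if state == NbrState.INIT:
        # 2-WayReceived: begin adjacency forming (or stay at 2-Way).
        return NbrState.EXSTART if form_adjacency else NbrState.TWO_WAY
    return state  # already 2-Way or higher: no change


def on_dbd(state: NbrState, expected_seq: int, received_seq: int) -> NbrState:
    """State after a Database Description packet arrives (simplified check)."""
    if state >= NbrState.EXCHANGE and received_seq != expected_seq:
        # SeqNumberMismatch: abandon the database exchange and restart it.
        return NbrState.EXSTART
    return state


# Hypothetical Cisco router ID 10.0.0.1, currently FULL with the SPLAT member.
print(on_hello(NbrState.FULL, "10.0.0.1", ["10.0.0.9"]).name)           # INIT
print(on_dbd(NbrState.FULL, expected_seq=1000, received_seq=42).name)  # EXSTART
```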

One more thing I find strange: the Cisco sends its OSPF hellos at the correct hello interval (10 s), and SPLAT receives them promptly (the logs show about 0.002 s between Cisco TX and SPLAT RX). SPLAT, however, sends its hellos at irregular intervals (e.g. 4, 12, 17 s), and the Cisco receives them 1-7 s later.
Where are the SPLAT hellos during that time?
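One way to look at the hello timing described in this post is to compute the gaps between consecutive hello arrivals and compare them with the hello and dead intervals. The Python sketch below assumes 10 s / 40 s, the usual broadcast-network defaults (the sample values in RFC 2328, Appendix C); the real values should be taken from the SPLAT and Cisco configurations. The helper names and the 1 s tolerance are hypothetical.

```python
# Assumed timers (not read from any device): broadcast-network defaults.
HELLO_INTERVAL = 10.0   # seconds a sender should leave between hellos
DEAD_INTERVAL = 40.0    # silence after which a neighbor is declared down


def hello_gaps(arrivals: list[float]) -> list[float]:
    """Gaps in seconds between consecutive hello arrival timestamps."""
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


def check_hellos(arrivals: list[float], hello: float = HELLO_INTERVAL,
                 dead: float = DEAD_INTERVAL, tolerance: float = 1.0) -> dict:
    """Flag gaps that stray from the hello interval or exceed the dead interval."""
    gaps = hello_gaps(arrivals)
    return {
        "gaps": gaps,
        # Gaps more than `tolerance` seconds away from the hello interval.
        "irregular": [g for g in gaps if abs(g - hello) > tolerance],
        # Gaps long enough for the receiver's inactivity timer to expire.
        "dead_exceeded": [g for g in gaps if g > dead],
    }


# Hellos arriving 4, 12 and 17 s apart, the pattern reported for SPLAT.
report = check_hellos([0.0, 4.0, 16.0, 33.0])
print(report["irregular"])      # [4.0, 12.0, 17.0]
print(report["dead_exceeded"])  # []
```

With these sample gaps the hellos are clearly irregular, but none reaches 40 s, so under the assumed timers the irregular spacing by itself would not make the Cisco declare the neighbor dead.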

To me it looks like a ClusterXL sync-related problem (ClusterXL running with only one gateway doesn't have these problems).

Regards
__________________
--

Dominik Latusek
PGP Keyserver: http://pgp.mit.edu
KeyID = 0xDD2F0F58

#3 | 2006-06-19 | baboo

Hello Dominik

CP is preparing a patch to fix this behavior. They said we'll get it right after their tests (which should be this week).

I'll let you know whether it really fixes the problem.

Regards, Manuel
__________________
To know recursion, you must first know recursion-1

#4 | 2006-12-14 | baboo

UPDATE!

The policy problem was indeed fixed by this specially created patch.
Well, "fixed" is relative. Policy installation now works without losing the adjacency IF the first cluster member is the active member. Otherwise a failover occurs, the adjacency is lost, and we get an interruption of about 2 minutes.
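Whether the first member is the active one at policy-installation time depends on how the cluster behaves when a higher-priority member comes back after a failover; the thread later mentions ClusterXL's "Maintain current Active Gateway" option for this. The toy Python model below contrasts "keep the current active member" with "switch back to the higher-priority member". Everything in it (the Member class, the priority numbering with 1 = first member, choose_active()) is hypothetical and only illustrates the concept, not how ClusterXL actually elects its active member.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Member:
    name: str
    priority: int      # hypothetical: 1 = first member in the cluster list
    up: bool = True


def choose_active(members: list[Member], current: Optional[Member],
                  maintain_current: bool) -> Optional[Member]:
    """Pick the active member after a member goes down or recovers."""
    healthy = [m for m in members if m.up]
    if not healthy:
        return None
    if maintain_current and current is not None and current.up:
        return current  # keep whoever is active now, even if lower priority
    return min(healthy, key=lambda m: m.priority)  # highest-priority healthy member


fw1, fw2 = Member("fw1", 1), Member("fw2", 2)
cluster = [fw1, fw2]

fw1.up = False                                   # first member fails
active = choose_active(cluster, fw1, maintain_current=True)
print(active.name)                               # fw2

fw1.up = True                                    # first member recovers
print(choose_active(cluster, active, maintain_current=True).name)   # fw2
print(choose_active(cluster, active, maintain_current=False).name)  # fw1
```

In the "maintain" case the second member stays active after the first one recovers, which is exactly the situation in which, according to the post, policy installation still triggered a failover.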
  • The fix mentioned above still isn't included in any publicly available HFA or release.
    --> so we cannot upgrade our firewall to any newer release, which really pi**es me off!

  • Our case about the failover has been stuck at Check Point for more than half a year!!
    We had to give them a lot of debug data (they lost some of it, so we had to send it a second time...), but they still keep us in the dark about their investigation.

We have now really reached the point where we should consider other firewall vendors. Support is absolutely the worst part of Check Point!

Best wishes,
Manuel

#5 | 2006-12-14 | abusharif

Quote (originally posted by baboo):
    We had to give them a lot of debug data (they lost some of it, so we had to send it a second time...), but they still keep us in the dark about their investigation.

    Support is absolutely the worst part of Check Point!

Yep, support is so-so. I've also had a couple of issues with lost debug data, being asked again for files I had already attached the first time, etc. A shame, especially for those paying big money for support.

#6 | 2006-12-15 | chillyjim

Quote (originally posted by baboo):
    We have now really reached the point where we should consider other firewall vendors. Support is absolutely the worst part of Check Point!

Have you talked to your Check Point reseller, or to your SE if you know him? SEs can sometimes get stuck tickets moving again.

#7 | 2007-01-31 | baboo

This seems to have worked.

Our reseller emailed us that CP is assembling a team of engineers who will investigate further and decide whether there is a solution for our problem.

Still, after all this, I don't think they will fix it.
But at least they have now confirmed that there is a bug and that another customer has a similar problem.

That is a small victory, because CP stated several times that the problem was related to our equipment/configuration/moon phase or something like that, and never accepted that there was a bug...

Yeah... so we're not crazy ;-)

Best wishes,
Manuel

#8 | 2007-02-22 | baboo

We vs. Interrupts, Round 27:

Over the past weeks we had to send various data to CP, which they actually read and responded to (!). I think communicating directly with an account manager at Check Point sped up the whole process.
  • CP built a new gated package. (We don't really know what exactly changed.)
  • They think that changing the parameter "fwha_freeze_state_machine_timeout" should prevent the failover.
    CP:"When the state of the machine is changed, failover results. During install policy, there are cases, in which, the state is changed, and consequently an unwanted failover may occur. Correctly setting fwha_freeze_state_machine_timeout should prevent the unwanted failover. "

We have our monthly maintenance window on Saturday; I think we'll try this stuff then. And then on to round 28... (maybe... probably...)

Regards, Manuel

#9 | 2007-03-09 | GiulianaJane

I have exactly the same problem with a cluster of R60 SPLAT boxes. Did the gated fix work?

Thanks!
Mary

#10 | 2007-03-26 | baboo

Dear Mary,
Dear Audience,

We discovered multiple issues:
  • Interrupt while installing the Policy [solved]:
    Check Point created a hotfix (Hotfix 603) that solved this problem. CP told us they would integrate it into HFA05 at the earliest. HFA05 was released on 22 February, but according to the release notes the fix wasn't included. We're waiting for HFA06 and hope they will integrate the fix into a future version.
  • Failover when the Standby-Node is active [solved]:
    This may rather be a ClusterXL-specific problem. We're using the "Maintain current Active Gateway" option to prevent the firewall from switching back when a failover has occurred and the node with the higher priority becomes available again (see ClusterXL User Guide p. 65). In the end we set the parameter "fwha_freeze_state_machine_timeout" (read more below) to 60 seconds, which solved this problem.
  • Failover causes a loss of connectivity of between 5 and 60 seconds [unsolved]:
    First, there are "2 types" of failover. Previously we triggered a failover by shutting down an interface on one node; with this method, connectivity was lost immediately for 50-60 seconds. CP then told us to use the command "clusterXL_admin up/down" to trigger the failover. With this method the connection stays up for 40 seconds, and then 5 pings are lost (voodoo?).
    You can decide for yourself which method better reflects a real failover. (We run VoIP through this firewall, so 5 seconds is really the upper limit.)
    How can "High Availability" be interpreted in seconds??
  • GateD does not send all routes to all OSPF neighbours [unsolved]:
    Our GateD update is another tragic episode. First, CP told us to update the CPadvr-R60**.rpm using rpm -Uhv, but part of the post-installation script crashed with a segfault. Then CP told us that we had to erase the old package first and THEN install the new rpm (-ihv). That didn't work either (the same post-installation script segfaulted). So we extracted all files in the rpm to a temporary directory and compared the md5sums of all files. Only the binary of the GateD daemon itself had changed, so we updated just that file.
    cpstart started the new gated without problems, and our tests showed that all OSPF neighbours now had all routes. A day later (my headache from the one-more-CP-problem-is-solved champagne was almost gone), I discovered that the routes were gone again. I'm now investigating why they disappeared.
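The file-by-file md5 comparison described in the GateD item can be sketched in Python. This assumes the old and new packages have already been unpacked into two directories (for example with `rpm2cpio package.rpm | cpio -idm`, where package.rpm is a placeholder); the function names md5_of() and changed_files() are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 hex digest of a file, read in chunks so large binaries are fine."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(old_root, new_root):
    """Compare two unpacked trees; return (changed, only_in_old, only_in_new)."""
    old_root, new_root = Path(old_root), Path(new_root)

    def index(root: Path) -> dict:
        # Relative path -> checksum for every regular file under root.
        return {p.relative_to(root): md5_of(p) for p in root.rglob("*") if p.is_file()}

    old, new = index(old_root), index(new_root)
    changed = sorted(str(p) for p in old.keys() & new.keys() if old[p] != new[p])
    only_old = sorted(str(p) for p in old.keys() - new.keys())
    only_new = sorted(str(p) for p in new.keys() - old.keys())
    return changed, only_old, only_new
```

Copying only the changed binary by hand bypasses whatever the package's install scriptlets would have done; `rpm -qp --scripts package.rpm` lists those scriptlets so they can be reviewed.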

Sidenote:
We started with one problem, found 3 more, and have now solved 2 of the 4 (in a year!!). (I really don't want to extrapolate this to a 3-year lifecycle...)
We should start selling our troubleshooting/bug-tracking work to CP.
This product simply does not work as specified. (Have you ever bought a car whose engine died for a few seconds every time you shifted gears? Policy installation is essential for a firewall, and so is failover behavior for a cluster. How the heck does CP test their products??)


Best wishes,
Manuel

fwha_freeze_state_machine_timeout
State synchronization during policy installation may, in certain cases, cause a cluster member to initiate a failover. To prevent this situation, you can modify the security gateway global parameter fwha_freeze_state_machine_timeout. This parameter sets the number of seconds, during policy installation, in which no state synchronization will be performed. You should set this parameter to the shortest period needed to eliminate the issue; the recommended value is 30 seconds.

This parameter is not related to the synchronization mechanism in any way. It is related to what Check Point calls the "state machine". The "state machine" is responsible for determining the state of each machine, i.e. if the machine is active/standby/down. When the state of the machine is changed, failover results. During install policy, there are cases, in which, the state is changed, and consequently an unwanted failover may occur. Correctly setting fwha_freeze_state_machine_timeout should prevent the unwanted failover.

Correctly setting fwha_freeze_state_machine_timeout should also prevent unwanted failovers in 3rd party environments, especially in cases in which the 3rd party environment may bring the cluster down, during policy installation. In 3rd party environments, the state of the cluster member is determined by the 3rd party environment. Whereas, in ClusterXL, the state of the cluster member is determined by the ClusterXL state machine code, which may cause unwanted failovers during policy installation.
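The idea behind fwha_freeze_state_machine_timeout, as described above, is that for a fixed number of seconds after policy installation starts, state changes do not trigger a failover. The toy Python model below illustrates only that idea. It is not Check Point's implementation; the class, its methods and the timings are hypothetical.

```python
class ToyClusterMember:
    """Hypothetical model of a member whose state machine can be frozen."""

    def __init__(self, freeze_timeout: float) -> None:
        self.freeze_timeout = freeze_timeout  # seconds, e.g. 30 or 60 as in the thread
        self.state = "active"
        self.frozen_until = float("-inf")     # no freeze window yet
        self.state_changes = 0                # each change would mean a failover

    def start_policy_install(self, now: float) -> None:
        # Open the freeze window when policy installation begins.
        self.frozen_until = now + self.freeze_timeout

    def report_state(self, new_state: str, now: float) -> None:
        if now < self.frozen_until:
            return  # inside the window: ignore the (possibly transient) change
        if new_state != self.state:
            self.state = new_state
            self.state_changes += 1


frozen = ToyClusterMember(freeze_timeout=30)
frozen.start_policy_install(now=0)
frozen.report_state("down", now=5)    # transient glitch during install: ignored
print(frozen.state)                   # active

unfrozen = ToyClusterMember(freeze_timeout=0)
unfrozen.start_policy_install(now=0)
unfrozen.report_state("down", now=5)  # same glitch without a freeze window
print(unfrozen.state)                 # down
```

The model also shows the trade-off: a genuine failure inside the window is ignored too, which fits the advice to use the shortest period that eliminates the issue.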

#11 | 2007-04-24 | baboo

Dear Audience,

Failover causes a loss of connectivity between 5 and 60 seconds [solved]
GateD does not send all routes to all OSPF neighbours [solved]

One or two weeks ago CP was able to reproduce and locate all of our unresolved problems, and fixed them by creating yet another version of the GateD daemon.
With this fixed version, a failover now causes an interruption of 5-6 seconds, starting about 40 s after the failover. This behavior is very consistent, whether we deactivate an interface or use the command "clusterXL_admin up/down" to trigger the failover.
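A timing pattern like "no loss for about 40 s, then 5-6 s of loss" is easiest to pin down from a probe log taken during the failover test, for example one ping per second. The Python sketch below assumes such a log is available as (timestamp, reply_received) pairs; outage_windows() is a made-up helper. As an unverified observation only: 40 s is also the common default OSPF RouterDeadInterval (4 x 10 s hellos), which may be worth checking, but nothing in the thread establishes a link.

```python
def outage_windows(probes: list[tuple[float, bool]]) -> list[tuple[float, float]]:
    """Return (first_lost, last_lost) timestamps for each run of lost probes."""
    windows = []
    first_lost = last_lost = None
    for timestamp, replied in probes:
        if not replied:
            if first_lost is None:
                first_lost = timestamp   # a new outage starts here
            last_lost = timestamp
        elif first_lost is not None:
            windows.append((first_lost, last_lost))  # outage ended
            first_lost = None
    if first_lost is not None:           # log ended during an outage
        windows.append((first_lost, last_lost))
    return windows


# One ping per second, failover at t=0, pings 40-44 lost (5 lost pings).
probes = [(float(t), not 40 <= t <= 44) for t in range(60)]
print(outage_windows(probes))  # [(40.0, 44.0)]
```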

All OSPF neighbours now have all routes, and the firewall is stable in this respect too. (Today, 3 days after the fix, all routes are still present on all routers.)

The main question now is "When will CP integrate all the fixes into an official HFA?" Until they do, we're forced to stay on R60.

I think we will close the support case, so this will be my last post about these issues. Feel free to ask questions; I will check the forum from time to time.

May the might be with you.
Manuel

#12 | 2007-04-24 | chillyjim

Quote (originally posted by baboo):
    The main question now is "When will CP integrate all the fixes into an official HFA?" Until they do, we're forced to stay on R60.

It depends on how many people it affects and where in the HFA process the fix is developed. When a fix is rolled up into an HFA, the QA cycle is a lot longer than for the point fix alone, but normally it shows up in one of the next two HFAs. Showing up in other builds (say, R65) can take a while unless people are asking for the fix.

#13 | 2007-04-24 | melipla

Quote (originally posted by baboo):
    The main question now is "When will CP integrate all the fixes into an official HFA?" Until they do, we're forced to stay on R60.

Since I may be using a SPLAT Pro cluster with OSPF in the very near future, I would like to know the answer to that question myself. How many hotfixes did you have to apply to correct your problems? Was there any indication that the patches would be made publicly available?