CPUG

The Check Point User Group

A Resource For The Check Point Community.  Fast.  Useful.  Independent.

#1 cciesec2006, 2008-08-02
SecurePlatform 2.4 kernel in Active/Active mode

I have a pair of SecurePlatform NGx R65 gateways with HFA_02 and hf_49
running in Active/Active Unicast mode. The policy on the gateways is
"Any Any Accept log".

This is what I am seeing from gateway #1:

[Expert@NGx-gw1]# cphaprob state

Cluster Mode: Load Sharing (Unicast)

Number     Unique Address  Assigned Load  State

1 (local)  192.2.0.1       30%            Active (pivot)
2          192.2.0.2       70%            Active

[Expert@NGx-gw1]#

This is what I am seeing from gateway #2:

[Expert@NGx-gw2]# cphaprob state

Cluster Mode: Load Sharing (Unicast)

Number     Unique Address  Assigned Load  State

1          192.2.0.1       0%             Active (pivot)
2 (local)  192.2.0.2       100%           Active

[Expert@NGx-gw2]#


Gateway #1 reports the correct load distribution, but gateway #2 does not.
Why?
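
For anyone who wants to compare the two members' views mechanically rather than by eye, here is a minimal Python sketch. It assumes the `cphaprob state` member lines look exactly like the ones pasted above (number, optional "(local)", address, load percentage, state); the function names are made up for illustration and are not part of any Check Point tool.

```python
import re

# Member lines as pasted in this thread, for example:
#   "1 (local) 192.2.0.1 30% Active (pivot)"
#   "2 192.2.0.2 70% Active"
# ASSUMPTION: other versions may format this output differently.
MEMBER_LINE = re.compile(
    r"^\s*(\d+)\s+(?:\(local\)\s+)?(\d+\.\d+\.\d+\.\d+)\s+(\d+)%\s+(.+?)\s*$"
)


def parse_assigned_load(text):
    """Return {member_number: assigned_load_percent} from pasted output."""
    loads = {}
    for line in text.splitlines():
        match = MEMBER_LINE.match(line)
        if match:
            loads[int(match.group(1))] = int(match.group(3))
    return loads


def load_disagreements(view_a, view_b):
    """Return {member_number: (load_in_a, load_in_b)} where the views differ."""
    return {
        member: (view_a.get(member), view_b.get(member))
        for member in sorted(set(view_a) | set(view_b))
        if view_a.get(member) != view_b.get(member)
    }


GW1_OUTPUT = """
Cluster Mode: Load Sharing (Unicast)

Number Unique Address Assigned Load State

1 (local) 192.2.0.1 30% Active (pivot)
2 192.2.0.2 70% Active
"""

GW2_OUTPUT = """
Cluster Mode: Load Sharing (Unicast)

Number Unique Address Assigned Load State

1 192.2.0.1 0% Active (pivot)
2 (local) 192.2.0.2 100% Active
"""

if __name__ == "__main__":
    gw1 = parse_assigned_load(GW1_OUTPUT)
    gw2 = parse_assigned_load(GW2_OUTPUT)
    # Any entry printed here is a member whose load the two gateways
    # report differently.
    print(load_disagreements(gw1, gw2))
```

With the two outputs above, both members show up as disagreements, which is the symptom being asked about.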

If I reboot gateway #2, it comes back with the correct 30%/70%
load. However, if I reboot gateway #1, gateway #2 goes back to
reporting 0%/100%, with gateway #2 taking 100% of the load.

Anyone seen this before? Thanks.

#2 Thorpuse, 2008-08-02
Re: SecurePlatform 2.4 kernel in Active/Active mode

Sounds like one of the gateways isn't getting the right CCP or sync information. Check the interface statuses (cphaprob -a if and cphaprob -i list) and look for any errors reported there. Check that both boxes are SPLAT and not SPLAT Pro (or vice versa). Also, change your Cluster Control Protocol from multicast to broadcast and see if that helps (particularly if you're using Cisco or cheap switches).

Finally, are there any other NGX clusters on the same segment as this one? In rare cases, the cluster IDs for the broadcast traffic can interfere with each other.

#3 cciesec2006, 2008-08-02
Re: SecurePlatform 2.4 kernel in Active/Active mode

Quote:
Originally Posted by Thorpuse
Sounds like one of the gateways isn't getting the right CCP or sync information. Check the interface statuses (cphaprob -a if and cphaprob -i list) and look for any errors reported there. Check that both boxes are SPLAT and not SPLAT Pro (or vice versa). Also, change your Cluster Control Protocol from multicast to broadcast and see if that helps (particularly if you're using Cisco or cheap switches).

Finally, are there any other NGX clusters on the same segment as this one? In rare cases, the cluster IDs for the broadcast traffic can interfere with each other.

- There are NO other clusters in this segment, just this one,

- Both gateways are running SecurePlatform, NOT SPLAT Pro,

- The gateways are connected to Cisco Catalyst 6513 switches with
MFC-720, so I don't think these are cheap switches,

- I enabled "spanning-tree portfast" on access ports and "spanning-tree
portfast trunk" on trunk ports,

- CCP is already set to broadcast on both gateways when I have this issue:

[Expert@NGx-gw1]# cphaprob -a if

Required interfaces: 3
Required secured interfaces: 1

eth0 UP non sync(non secured), broadcast
eth2 UP sync(secured), broadcast
eth1 UP non sync(non secured), broadcast (eth1.140 )

Virtual cluster interfaces: 3

eth0 192.168.15.193
eth1.140 192.168.192.1
eth1.150 192.168.193.1

[Expert@NGx-gw1]#
[Expert@NGx-gw2]# cphaprob -a if

Required interfaces: 3
Required secured interfaces: 1

eth0 UP non sync(non secured), broadcast
eth2 UP sync(secured), broadcast
eth1 UP non sync(non secured), broadcast (eth1.140 )

Virtual cluster interfaces: 3

eth0 192.168.15.193
eth1.140 192.168.192.1
eth1.150 192.168.193.1

[Expert@NGx-gw2]#

One other thing I noticed:

1- Reboot gw1: gw2 gets all the traffic. That's normal.
2- After gw1 comes back online, gw2 shows 100% load for itself and 0% for gw1, while gw1 shows 30% for gw1 and 70% for gw2.
But if I ssh to a host behind the firewall, gw1 gets all the traffic and nothing goes across gw2.
3- After rebooting gw2, everything goes back to normal. If I ssh to a host behind the firewall, I see traffic on both
gw1 and gw2, as confirmed with tcpdump.

Weird.

Last edited by cciesec2006; 2008-08-02 at 19:16.

#4 Thorpuse, 2008-08-03
Re: SecurePlatform 2.4 kernel in Active/Active mode

Cisco 6513... I think that's where the issue is. I recall some strange things about the way 6500 switches and ClusterXL work together. Check the ARP caches, and I'd also start looking at the CCP broadcast packets themselves. I have a vague memory of seeing something like this, where the 6513 either got confused by the ARP entries or blocked the CCP packets on the VLANs.

I guess the other thing to check is that the sync traffic is actually getting between the devices. Test a transparent failover with something that uses a data connection (ftp is good for this...). Test a failover and a failback. Use the fw ctl pstat and fw tab -t connections -s commands to see if connections are actually getting synced. Finally, is your sync interface a crossover cable or another switch/VLAN? I'd recommend a test where you plug the sync network into a hub/switch rather than a crossover, to see whether the cluster is getting confused about sync interface failures.

Good luck. From experience I know that CXL can be a real pain to troubleshoot. My final question would be whether you really need an Active/Active setup. I always worry about this when there are only 2 nodes, because while it buys you performance, it potentially leaves you with a non-redundant solution. Assume both boxes get to 60% utilisation and one dies. The survivor is then asked to carry 120% of its capacity, which doesn't go well, and you have a single point of failure again when the box trying to handle 120% falls over...
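
To put numbers on the 60% + 60% example, here is a rough Python sketch of the arithmetic. It assumes identical members and that a failed member's traffic spreads evenly over the survivors, which is a simplification; the function names are invented for illustration.

```python
def load_after_failure(per_member_utilisation, members):
    """Utilisation (%) of each surviving member after one member fails.

    ASSUMPTION: identical members, traffic redistributes evenly.
    """
    if members < 2:
        raise ValueError("need at least two members to survive a failure")
    total = per_member_utilisation * members
    return total / (members - 1)


def max_safe_utilisation(members, ceiling=100.0):
    """Highest per-member utilisation (%) that still fits after losing one member."""
    if members < 2:
        raise ValueError("need at least two members to survive a failure")
    return ceiling * (members - 1) / members


if __name__ == "__main__":
    # Two members at 60% each: the survivor is asked for 120%.
    print(load_after_failure(60, 2))
    # For a two-member cluster, each member has to stay at or below 50%.
    print(max_safe_utilisation(2))
```

By the same arithmetic, two members that each sit at 20% leave the survivor at 40% after a failure, so how much this matters depends entirely on how busy the boxes really are.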

#5 cciesec2006, 2008-08-03
Re: SecurePlatform 2.4 kernel in Active/Active mode

Thank you Thorpuse. Just so you know, I have an identical setup with
R55 gateways connected to another 6513 switch, and I have no issue
whatsoever. But I took your advice and connected the NGx R65 gateways
to a Catalyst 2960. It did not resolve the issue.

In most of the implementations I've worked with over the past
eight years, we have always used a crossover cable for the sync interface.
If you connect the sync interface to a dedicated hub/VLAN/switch, you
introduce another point of failure, don't you think?

Running Active/Active is not my idea. It was imposed on me by my boss.
For the past five years, the firewall has never exceeded 20% CPU utilization
and the maximum number of connections has never exceeded 11k. But my boss
thinks, "Hey, we already purchased the ClusterXL load-sharing license, let's
use it."

While I don't like Active/Active myself, I do see a benefit in it, in
the sense that you know both firewalls are working, unlike Active/Standby.
When the active firewall goes down, you do not know how the standby
firewall will react, because it has been on standby. There could be a hidden
problem with the standby that we just don't know about. This is especially
true in a high-throughput, high-connection environment. I've seen it happen
on the Nokia IP1220 myself: the active Nokia crashed, it failed over to the
standby Nokia via VRRP, and the standby firewall crashed as well.

#6 Thorpuse, 2008-08-03
Re: SecurePlatform 2.4 kernel in Active/Active mode

Certainly sounds like the sync interface connection is the issue then, not the switch... I'd be interested to see whether all the interfaces can see each other properly in the 100/0 state. It seems to me that a NIC isn't coming up correctly, or isn't being detected at the right stage of the firewall start process. If you can create the fail condition again, I'd suggest checking connectivity on all interfaces using the cluster members' own IPs. I suspect you'll find that one of them can't ping/ARP correctly for the other's IP, or even that the interface isn't coming up correctly during the failback.

One other thought: are the cluster members' IP addresses on the same subnets as the cluster IPs, or on different ones? If different, I wonder whether the routing needed for the ARP publications is missing.
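
A quick way to answer the same-subnet question is a few lines of Python with the standard ipaddress module. The cluster IPs below are the ones from the cphaprob -a if output earlier in the thread; the member IPs and the /24 prefix length are purely hypothetical placeholders, since neither was posted, so substitute the real values.

```python
import ipaddress


def same_subnet(member_ip, cluster_ip, prefix_len):
    """True if member_ip is inside the subnet implied by cluster_ip/prefix_len."""
    network = ipaddress.ip_network(f"{cluster_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(member_ip) in network


# Cluster (virtual) IPs as shown by `cphaprob -a if` earlier in the thread.
CLUSTER_IPS = {
    "eth0": "192.168.15.193",
    "eth1.140": "192.168.192.1",
    "eth1.150": "192.168.193.1",
}

# HYPOTHETICAL values: the thread gives neither member IPs nor netmasks.
ASSUMED_PREFIX_LEN = 24
ASSUMED_MEMBER_IPS = {
    "eth0": "192.168.15.194",
    "eth1.140": "192.168.192.2",
    "eth1.150": "10.10.150.2",  # deliberately off-subnet to show a mismatch
}

if __name__ == "__main__":
    for ifname, vip in CLUSTER_IPS.items():
        member = ASSUMED_MEMBER_IPS[ifname]
        ok = same_subnet(member, vip, ASSUMED_PREFIX_LEN)
        verdict = "same subnet" if ok else "DIFFERENT subnet"
        print(f"{ifname}: member {member} vs cluster {vip}: {verdict}")
```

Any interface reported as being on a different subnet is where I'd look at the routing and ARP side first.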

HTH, this problem sounds really familiar to me, but I can't recall exactly where I've seen it before....