CPUG

The Check Point User Group

A Resource For The Check Point Community.  Fast.  Useful.  Independent.

  #1 (permalink)  
Old 2008-03-21
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default X80 performance 4 times less than C25!?

We are seeing almost 100% CPU load at peak hour on the APM with very little traffic (see summary below). Has anybody else come across such truly weak X80 results? Crossbeam has so far been unable to resolve the issue.

To summarise:

Traffic profile:

- total throughput 150 Mbps
- new connections per second < 1000
- accepted packet rate < 27000 packets/s
- concurrent connections < 60000
- packet size distribution: 2/3 < 199 bytes; 1/3 > 1400 bytes
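
As a rough sanity check on these figures, the mean packet size implied by the throughput and the packet rate can be worked out. This Python sketch assumes the throughput figure means megabits per second (later posts in the thread quote it as 150Mbps) and that the packet rate is per second; the function name is only illustrative.

```python
def mean_packet_size_bytes(throughput_bps: float, packets_per_second: float) -> float:
    """Average bytes per packet implied by a bit rate and a packet rate."""
    return throughput_bps / 8 / packets_per_second

# Figures from the traffic profile, read as 150 Mbit/s at 27000 packets/s.
as_megabits = mean_packet_size_bytes(150e6, 27000)
# The same number read as 150 MByte/s instead (8x more bits).
as_megabytes = mean_packet_size_bytes(150e6 * 8, 27000)

print(f"read as Mbit/s:  {as_megabits:.0f} bytes per packet")
print(f"read as MByte/s: {as_megabytes:.0f} bytes per packet")
```

Only the megabit reading gives a mean that fits inside a standard 1500-byte Ethernet frame and is of the same order as the stated mix of small and large packets.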

According to the X80 APM specifications, it should be able to handle much more than that (at least double). The same traffic profile causes only 14% load on C25 hardware.

Actions taken so far:
- connections table increased from 100'000 to 200'000
- SmartDefense deactivated (made no noticeable difference)
- Logging turned off (made no noticeable difference)
- XOS upgraded to latest patch 7.2.1.5

Observations:
- the majority of CPU time is spent on "softirq" (> 70%)
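
On a standard Linux system, the softirq share that top reports is derived from the CPU counters in /proc/stat. A minimal Python sketch of measuring it from two samples follows. It assumes the usual field order on the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal); whether XOS exposes /proc/stat in this form is an assumption, and the function names and sample values are illustrative only.

```python
def parse_cpu_line(line: str) -> list:
    """Return the numeric counters from an aggregate 'cpu ...' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]

def softirq_percent(before: str, after: str) -> float:
    """Share of CPU time spent in softirq between two /proc/stat samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Sum only the first eight fields (user..steal); guest time is already part of user.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[6] / total  # index 6 is the softirq counter

# Two made-up samples (not taken from the thread), a few seconds apart.
sample_1 = "cpu 1000 0 500 8000 100 50 350 0 0 0"
sample_2 = "cpu 1100 0 600 8100 100 50 1050 0 0 0"
print(f"softirq: {softirq_percent(sample_1, sample_2):.1f}%")
```

On a live box the two strings would be the first line of /proc/stat read at two points in time.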

Attached is a screenshot of the X80 and C25 in SmartView Monitor.
Attached Thumbnails
x80-performance-4-times-less-than-c25-nn-gi-peak-hour-example-1.jpg  
  #2 (permalink)  
Old 2008-04-07
Senior Member
 
Join Date: 2007-09-17
Location: Singapore
Posts: 161
Rep Power: 2
chuachongchee has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

What's your hardware configuration for the X80? Check Point app version and HFA?

Did you turn on SecureXL?

What happens if you run "fw ctl zdebug drop"? What's being dropped?
  #3 (permalink)  
Old 2008-04-07
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

What's your hardware configuration for the X80? Check Point app version and HFA?

>>> 2xNPM + APM; NGX R61 HFA03

Did you turn on SecureXL?

>>> Not on the X80, but see the note below about the C25

What happens if you run "fw ctl zdebug drop"? What's being dropped?

>>> about 95% are rulebase drops + anti-spoofing, but roughly once per second you also get:
fw_log_drop: Packet proto=-1 ?:0 -> ?:0 dropped by fwchain_frag Reason: wait for more fragments
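
When the drop debug output scrolls by quickly, tallying the lines by dropping function and reason makes the mix easier to see. A rough Python sketch follows. The regular expression is modelled only on the single sample line quoted above, so the exact format of other drop lines is an assumption, and `tally_drops` is an illustrative name.

```python
import re
from collections import Counter

# Pattern modelled on the one sample line in this post; other lines may differ.
DROP_RE = re.compile(r"dropped by (?P<func>\S+) Reason: (?P<reason>.+)$")

def tally_drops(lines):
    """Count drop lines per (function, reason) pair, ignoring lines that do not match."""
    counts = Counter()
    for line in lines:
        match = DROP_RE.search(line.strip())
        if match:
            counts[(match.group("func"), match.group("reason"))] += 1
    return counts

sample = [
    "fw_log_drop: Packet proto=-1 ?:0 -> ?:0 dropped by fwchain_frag Reason: wait for more fragments",
    "fw_log_drop: Packet proto=-1 ?:0 -> ?:0 dropped by fwchain_frag Reason: wait for more fragments",
]
for (func, reason), n in tally_drops(sample).most_common():
    print(f"{n:6d}  {func}: {reason}")
```

In practice the lines would come from saved output of the debug command rather than a hard-coded list.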

It turns out that the C25s are no better. Because the C25 has 4 cores, the CPU utilisation figure is an average across all 4. top shows quite a different picture: at peak hour, 2 CPUs run at 100% softirq, the same as the X80. SecureXL made no difference.
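
The effect described here, where an averaged figure hides saturated cores, is easy to reproduce. The per-core numbers in this Python sketch are hypothetical, chosen only to show two busy and two nearly idle cores.

```python
def summarise(per_core_percent):
    """Return (average, busiest) utilisation for a list of per-core percentages."""
    average = sum(per_core_percent) / len(per_core_percent)
    return average, max(per_core_percent)

# Hypothetical 4-core box: two cores saturated by softirq, two nearly idle.
cores = [100.0, 100.0, 4.0, 4.0]
average, busiest = summarise(cores)
print(f"average: {average:.0f}%  busiest core: {busiest:.0f}%")
```

A monitor that reports only the average shows a box at about half load, while the two cores that actually handle the packets have no headroom left.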

So far, working with Check Point and Crossbeam, we have tried the following without any success:
  • turned off sync on http (the most used service, ~95%)
  • implemented SK25921 (tweaking Intel PRO/1000 NIC parameters)
  • tried a single "any-any-accept" rule (automatic NATs on objects were left on)
  • manually increased allocated memory (cluster capacity management)
  • changed global properties as described in the Check Point Software FireWall-1 Performance Tuning Guide:
    nat_limit (125000)
    nat_hashsize (131072)
    http_buffers_size (16384)
  #4 (permalink)  
Old 2008-04-07
Senior Member
 
Join Date: 2007-09-17
Location: Singapore
Posts: 161
Rep Power: 2
chuachongchee has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Hi,

You are talking about a few different things in one go.

First of all, the C25 has a rated throughput of 6 Gbps, an X80-AC2 has a rated throughput of 8 Gbps, and an X80-AC3 can do 40 Gbps.

The C25 has dual single-core CPUs, if I remember correctly; with the new APM8600s for the X80, you can have dual dual-core CPUs.

Moreover, the C-series and X-series are two entirely different architectures. State sync on the C-series relies on a crossover cable between 2 units; on an X80, it runs over the backplane.

Next, on the X80 it is not just the state sync you have to be clear about; there's the control network as well, and since all of these are constantly working, they take up more CPU than "normal". If you top it up with more blades in active~active, CPU will spike even more.

On your HTTP traffic, did you try turning off connection mirroring (can't remember the exact term, something like that) or sync for HTTP?

For the C25, I presume you are running COS? For Check Point, if you are running Crossbeam OSes, the CPU% you see in SmartView Monitor is a measure of softirq; this is an indication of disk I/O, not the actual CPU in itself.

Next, I would suggest using Check Point R60 or R65, if not R62. Generally these 3 versions are the more stable ones. Yes, I have seen stable R61s, but from Check Point's perspective it is the 3 mentioned versions that are more stable.
  #5 (permalink)  
Old 2008-04-13
Junior Member
 
Join Date: 2007-01-03
Location: CA
Posts: 22
Rep Power: 0
woody has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

What XOS version are you on currently? Are the APMs 8400s or 8600s?
When you log into that vap-group and run 'top', does anything stand out in the list of processes?
  #6 (permalink)  
Old 2008-04-17
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

chuachongchee,

It does not make much difference (8Gbps or 40Gbps) if it cannot even run 150Mbps...

Our C25 has 4 cores (3.2GHz) and the APM is a single-core 8400. We have a cluster of 2 C25s and another cluster of 2 X80s, so the sync architecture is similar (chassis to chassis).

Anyway, we have made some progress on the case.

Both Check Point and Crossbeam came to the conclusion that the HW is at capacity!

So if you're planning to deploy Crossbeams in an ISP-like environment, be very careful with dimensioning. To give you rough figures:
  • 95% of all traffic is HTTP with NAT (hide behind IP)
  • packet rate 42'000
  • new connections rate 1'500
  • concurrent connections 90'000
  • throughput 190Mbps

With this profile:
C25 will run at 60-70% CPU utilisation with SecureXL switched ON. If it is turned off, CPU goes to 100% (softirq).
X80 with two APM blades (8400, single CPU, SecureXL off): each peaks at 60%. There is potential to improve the X80 if we had a dual-CPU APM: we could turn SecureXL on, which could reduce load to 30-40%. According to Check Point, we cannot run SecureXL on a single CPU as it will only increase CPU utilisation. A further option is going to the 8600.
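
For rough dimensioning, the figures above can be turned into a per-blade estimate. This Python sketch assumes the load is split evenly across blades and that CPU scales linearly with throughput, which is a simplification (the thread itself shows that NAT, connection rate and packet rate matter as much as raw throughput). The function name is illustrative.

```python
def per_blade_ceiling_mbps(total_mbps: float, blades: int, peak_cpu_percent: float) -> float:
    """Throughput per blade at 100% CPU, extrapolated linearly from a measured peak."""
    per_blade_now = total_mbps / blades
    return per_blade_now * 100.0 / peak_cpu_percent

# From the post: 190 Mbps over two APM8400 blades, each peaking at 60% CPU.
ceiling = per_blade_ceiling_mbps(190, 2, 60)
print(f"estimated ceiling per blade: {ceiling:.0f} Mbps")
```

The result is an upper bound under those assumptions; a production design would normally leave headroom well below it.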

REMEMBER!

We had to modify the default HTTP protocol settings (see attached screenshot). Changing the protocol type from "HTTP" to "none" dropped CPU utilisation from 80% to 50% on the C25s. Removing sync in the cluster reduced it by another 2-5%.
Attached Thumbnails
x80-performance-4-times-less-than-c25-http_adv.jpg  

Last edited by Dragon; 2008-04-17 at 19:35. Reason: Accidental submit
  #7 (permalink)  
Old 2008-04-17
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

woody

as per original message:
XOS - 7.2.1.5
APM - single CPU 8400
top - softirq causes 100% load at peak hours
  #8 (permalink)  
Old 2008-04-18
Junior Member
 
Join Date: 2007-05-04
Posts: 4
Rep Power: 0
wa1di has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

First of all, turn on PPK (SecureXL). It will increase performance (if you are not using features that disable it). And one more hint: if you had dual-CPU APM blades, the softirq load caused by traffic would be redistributed between 4 cores. Additionally, with PPK enabled it would work perfectly.
I wonder what exactly causes this high softirq on the blades. I had the same problem before and I never reached any APM/NPM limit.
  #9 (permalink)  
Old 2008-04-20
Senior Member
 
Join Date: 2007-09-17
Location: Singapore
Posts: 161
Rep Power: 2
chuachongchee has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Depending on how old your chassis is, you might not even be able to move onto 8600-series blades.

CPS and max new CPS aside, a few things I can recommend:
- Upgrade XOS to 8.0.1.1
- Upgrade to R65 w/ HFA02

Both XOS and R65 have some major enhancements and performance increases.

Next, I'm not sure how many blades you are running in A~A; have you tried adding blades or adding RAM? What's the max connections you set in the Check Point cluster object?

One thing is turning off connection sync for stateless and/or short-lived protocols like http, https, smtp, pop3 etc. This helps to reduce load. Also, have you tried setting "NAT-Reclassify" to ON?

I have deployed the X80 in a telco environment before, and done a POC at another telco, and I have not seen performance issues so far. The installed telco is running XOS 7.3.0.3 w/ R62, SBHA, 5 APM8400s, 4GB RAM in A~A. The other telco, where I did the POC, is running 2 blades in A~S, APM8200s, one with dual CPU, both with 1GB RAM.
  #10 (permalink)  
Old 2008-04-21
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Asked the Check Point gurus in Israel and they said R65 won't bring any performance gain.

Sync is off, as per my previous comment.

XOS 8 might be an option, but Crossbeam could not confirm whether it will improve our performance.

Just installed a second APM8400 blade; both run at approx. 60% at peak hour now.

Max connections is set to 200'000.

NAT-reclassify is ON.

Next plan is to upgrade to 8600s, which are dual core and will allow enabling SecureXL (plus they should be 50% faster).
  #11 (permalink)  
Old 2008-04-21
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Cannot run SecureXL on a single-core CPU APM blade: it only makes it worse (at a quiet hour, CPU utilisation jumped from 40% to 80% after turning SecureXL on).
  #12 (permalink)  
Old 2008-04-22
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

That was Check Point's advice; now they want to retract it. So it looks like we have a problem with X80 + SecureXL. Will update when we resolve it.
  #13 (permalink)  
Old 2008-04-22
Senior Member
 
Join Date: 2007-09-17
Location: Singapore
Posts: 161
Rep Power: 2
chuachongchee has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

I understand that SecureXL was an issue in NG AI R55: when SecureXL is turned on, connections break. This, as I understand it, was due to a bug in R55.

There are currently no issues with turning on SecureXL in NGX on the X80. As far as Check Point and Crossbeam support are concerned, SecureXL SHOULD reduce CPU load.

So unless you can provide details from a reliable source, or have grounds to prove otherwise, I believe SecureXL will help in most situations.
  #14 (permalink)  
Old 2008-04-30
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

X80 SecureXL resolved: we had to hide the VLAN tag when forwarding traffic from the NPM to the APM:

circuit gprs_wap circuit-id 1028
device-name gprs.wap
proxy-arp
vap-group aunn00fw02gi
hide-vlan-header
default-egress-vlan-tag 214
After doing this, enabling SecureXL dropped CPU load by 50%.
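
Conceptually, hiding the VLAN header means the application on the APM sees an untagged Ethernet frame, as the option name suggests. The Python sketch below shows what removing an 802.1Q tag from a raw frame looks like. It only illustrates the frame format and is not how XOS implements hide-vlan-header. The MAC addresses and payload are made up; VLAN ID 214 is taken from the configuration above.

```python
from typing import Optional, Tuple

TPID_8021Q = b"\x81\x00"  # EtherType value that marks an 802.1Q tag

def strip_vlan_tag(frame: bytes) -> Tuple[bytes, Optional[int]]:
    """Remove a single 802.1Q tag, returning (untagged frame, VLAN ID or None)."""
    # Layout: 6 bytes dst MAC, 6 bytes src MAC, then either EtherType or the 4-byte tag.
    if len(frame) >= 18 and frame[12:14] == TPID_8021Q:
        tci = int.from_bytes(frame[14:16], "big")
        vlan_id = tci & 0x0FFF  # low 12 bits of the tag control field
        return frame[:12] + frame[16:], vlan_id
    return frame, None

# Made-up frame: dst MAC, src MAC, 802.1Q tag for VLAN 214, IPv4 EtherType, payload.
dst = bytes.fromhex("020000000001")
src = bytes.fromhex("020000000002")
tagged = dst + src + TPID_8021Q + (214).to_bytes(2, "big") + b"\x08\x00" + b"payload"

untagged, vlan_id = strip_vlan_tag(tagged)
print(vlan_id, len(tagged) - len(untagged))
```

The untagged frame is 4 bytes shorter and its EtherType field sits directly after the source MAC, which is the layout software that does not expect VLAN tags would see.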

Last edited by Dragon; 2008-04-30 at 00:59.
  #15 (permalink)  
Old 2008-04-30
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Update on C25s:

1. Using the Check Point PPK (SecureXL) sim affinity command, we hardcoded the interface allocation to two different cores, which dropped load by 50% compared with running in auto mode. Just search for the Check Point Performance Pack documentation; it explains the usage well.

2. We noticed quite a few packet drops on the busy interface (peaking at approx. 160Mbps). Crossbeam asked us to hardcode the switches to use flow control as well as the speed and duplex settings:

interface GigabitEthernet1/7
description eths2p2 WAP
switchport access vlan 100
speed 1000
duplex full
flowcontrol receive on
flowcontrol send on


It reduced the number of drops dramatically, but they are still present! The case is still open with Crossbeam.
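
The idea behind pinning interfaces to different cores (item 1 above) can be illustrated with the generic Linux mechanism, where each interrupt has a CPU bitmask in /proc/irq/<n>/smp_affinity. This is not the sim affinity command itself, whose syntax is covered in the Performance Pack documentation, and treating the two as analogous is an assumption. The Python sketch only shows how such hexadecimal masks are formed; the interface names and the interface-to-core plan are hypothetical.

```python
def affinity_mask(cores) -> str:
    """Hex CPU bitmask, in the form used by Linux smp_affinity files, for a set of core numbers."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "x")

# Hypothetical plan: two busy interfaces pinned to two different cores.
plan = {"busy-interface-a": [0], "busy-interface-b": [1]}
for name, cores in plan.items():
    print(f"{name}: cores {cores} -> mask {affinity_mask(cores)}")
```

Giving each busy interface its own core keeps one core from absorbing all of the interrupt load while the others sit idle.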

Last edited by Dragon; 2008-04-30 at 00:57.
  #16 (permalink)  
Old 2008-04-30
Senior Member
 
Join Date: 2007-09-17
Location: Singapore
Posts: 161
Rep Power: 2
chuachongchee has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Quote:
Originally Posted by Dragon View Post
X80 SecureXL resolved: we had to hide the VLAN tag when forwarding traffic from the NPM to the APM:

circuit gprs_wap circuit-id 1028
device-name gprs.wap
proxy-arp
vap-group aunn00fw02gi
hide-vlan-header
default-egress-vlan-tag 214
After doing this, enabling SecureXL dropped CPU load by 50%.
Glad to hear that...

I see a Gi interface... are you a telco? ;D

Last edited by chuachongchee; 2008-04-30 at 09:20.
  #17 (permalink)  
Old 2008-04-30
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Thought it was obvious from all the other inputs ;)
  #18 (permalink)  
Old 2008-04-30
Senior Member
 
Join Date: 2007-09-17
Location: Singapore
Posts: 161
Rep Power: 2
chuachongchee has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Well, not really; such issues are not just limited to telcos. I have had larger enterprises do performance tuning as well... It's only when I saw the VAP name that I knew it was probably a telco...

Well, since only Crossbeam has such high throughput for firewalls/security apps, probably only telcos have such requirements, but then again, large enterprises make sense too...

Cheers. ;D
  #19 (permalink)  
Old 2008-05-05
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

All I want to say, guys, is be very careful when assessing Crossbeam performance! Do not look only at the throughput figure (40Gbps): it can be misleading, in our case big time! Check the TCP connection rate and the packets-per-second rate (small UDP packets) and compare to other vendors before making your selection, especially if you have a large number of NATed connections.

Our experience shows that you won't be able to run more than 100Mbps on a single APM8400 (NAT used in > 95% of cases) before CPU approaches 50% utilisation.
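
Why a headline throughput figure can mislead: if a device is limited by packets per second rather than bits per second, the throughput it delivers depends heavily on packet size. A small Python sketch follows, using the 42'000 packets per second from the profile earlier in the thread purely as an example rate. Treating that rate as a hard ceiling is an assumption, and the function name is illustrative.

```python
def throughput_mbps(packets_per_second: float, packet_bytes: int) -> float:
    """Bit rate in Mbit/s delivered at a given packet rate and packet size."""
    return packets_per_second * packet_bytes * 8 / 1e6

RATE_PPS = 42_000  # example rate taken from the traffic profile in this thread

for size in (64, 199, 1400):
    print(f"{size:5d}-byte packets: {throughput_mbps(RATE_PPS, size):7.1f} Mbps")
```

At the same packet rate, large packets yield more than twenty times the throughput of minimum-size ones, which is why a figure measured with large packets says little about a mix dominated by small ones.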
  #20 (permalink)  
Old 2008-05-27
Junior Member
 
Join Date: 2008-03-19
Posts: 16
Rep Power: 0
Dragon has an average reputation (10+)
Default Re: X80 performance 4 times less than C25!?

Update on packet drops on C25 from Crossbeam:


Our engineering team has a new development. We executed the same test after upgrading to HFA02 on R65 and the RX drop problem is no longer seen