CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



Thread: Checkpoint 5400 100% CPU usage

  1. #1
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Checkpoint 5400 100% CPU usage

    Hi,

    At my place of work we have two Check Point 5400 appliances running in a clustered configuration. We're struggling badly with them at the minute, as CPU usage seems to be maxed out most of the time.

    We have all of the acceleration templates, drop templates, etc. enabled. I have tried to enable hyperthreading, but it looks like either the 5400 doesn't support it, or it's disabled in the BIOS (one for the support contractors to resolve).

    When running cpview I have noticed that one connection stands out when CPU usage is extremely high (90% - 100%): a TCP connection carrying iSCSI. I'm pretty sure we shouldn't have iSCSI traffic running through the firewall, and that's something I'll look into resolving when I'm back in the office. However, this traffic is only 7,500 pps and a measly 65 Mbps or so of throughput; everything else is barely hitting three figures in pps and doesn't even register in the Mbps column.

    According to this document

    https://www.checkpoint.com/downloads...ison-chart.pdf

    the 5400 should be capable of 15,000 pps. What gives? Why is our appliance struggling so badly? Is it because it's iSCSI traffic, or is there something else that I've missed? (Highly likely, as I'm very new to Check Point.)

    Any and all advice is very much appreciated.

  2. #2
    Join Date
    2007-03-30
    Location
    DFW, TX
    Posts
    265
    Rep Power
    12

    Default Re: Checkpoint 5400 100% CPU usage

    The comparison chart you linked actually says these boxes should be able to handle 150,000 new connections per second (under ideal testing conditions, of course). Setting up new connections is computationally expensive. They should be able to handle far more than that in terms of packets per second on existing connections.



    Where are you seeing CPU usage maxed out? Different tools report different levels of usage as "100%". Some report an average of all cores (so 100% on one core would be reported as 25%, and 100% of four cores would be reported as 100%), while others add the cores together (so 100% on one core would be reported as 100%, but 100% of four cores would be reported as 400%).
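To make the two reporting conventions concrete, here is a minimal Python sketch. The function name `cpu_usage_views` is made up for illustration and is not part of any monitoring tool; it simply turns one set of per-core figures into both the averaged and the summed view.

```python
def cpu_usage_views(per_core_percent):
    """Return (averaged, summed) views of the same per-core usage figures."""
    total = sum(per_core_percent)
    return total / len(per_core_percent), total


# One of four cores saturated: averaged view vs. summed view.
one_busy = cpu_usage_views([100, 0, 0, 0])
# All four cores saturated.
all_busy = cpu_usage_views([100, 100, 100, 100])

print(one_busy)  # (25.0, 100)
print(all_busy)  # (100.0, 400)
```

So a single pegged core can look like 25% in one tool and 100% in another, which is why it matters where the figure comes from.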

    What cluster mode are you running? You can check this with 'cphaprob state'.

    Do you have a separate SmartCenter, or are these firewalls also management servers? To check this, run 'fwm ver'.

    On the active member, what does your RAM usage look like? Check this with the 'free -m' command.



    Depending on what features you have enabled, the boxes may be running low on RAM, which causes them to swap data out to the disk. Swapping data out to disk, then swapping other data back into RAM is a synchronous operation. The time spent doing that gets booked as consumed processor time, even though it isn't really the processor doing any work.
    Zimmie

  3. #3
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,229
    Rep Power
    13

    Default Re: Checkpoint 5400 100% CPU usage

    The 5400 does not support SMT/Hyperthreading; support for SMT starts with the 5800 model and higher.

    Please provide the output of the following commands for further diagnosis, ideally run when the system is exhibiting its worst performance:

    free -m

    netstat -ni

    enabled_blades

    fwaccel stat

    fwaccel stats -s

    fw ctl multik stat

    fw ctl affinity -l -r

    fw ctl multik get_mode (R77.30) or fw ctl multik dynamic_dispatching get_mode (R80.10+)

    cpstat os -f multi_cpu -o 1

    cpconfig (the menu displayed by this command)
    --
    Second Edition of my "Max Power" Firewall Book
    Now Available at http://www.maxpowerfirewalls.com

  4. #4
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by Bob_Zimmerman View Post
    Where are you seeing CPU usage maxed out? Different tools report different levels of usage as "100%". Some report an average of all cores (so 100% on one core would be reported as 25%, and 100% of four cores would be reported as 100%), while others add the cores together (so 100% on one core would be reported as 100%, but 100% of four cores would be reported as 400%).
    We use SolarWinds to monitor all our kit, and I also SSH to each node in the cluster and run cpview; both report similar numbers.

    Quote Originally Posted by Bob_Zimmerman View Post
    What cluster mode are you running? You can check this with 'cphaprob state'.
    Cluster Mode:   High Availability (Active Up) with IGMP Membership

    Number     Unique Address   Assigned Load   State

    1 (local)  XXX.XXX.XXX.XXX  100%            Active
    2                           0%              Standby

    Quote Originally Posted by Bob_Zimmerman View Post
    Do you have a separate SmartCenter, or are these firewalls also management servers? To check this, run 'fwm ver'.
    We have a separate management appliance. running that command gives the following output:

    This is not a Security Management Server station


    Quote Originally Posted by Bob_Zimmerman View Post
    On the active member, what does your RAM usage look like? Check this with the 'free -m' command.
                 total       used       free     shared    buffers     cached
    Mem:         15812       8476       7336          0        358       4065
    -/+ buffers/cache:       4051      11761
    Swap:        17390          0      17390

    Looks like there's plenty free to me?
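As a rough aid for reading this output, the sketch below pulls out the figures that matter for the swapping question: memory used excluding buffers and page cache, and swap in use. It assumes the older `free -m` layout shown in this thread (the one with a `-/+ buffers/cache` line); `parse_free` is a hypothetical helper name, and the sample text is the output posted above.

```python
FREE_M = """\
             total       used       free     shared    buffers     cached
Mem:         15812       8476       7336          0        358       4065
-/+ buffers/cache:       4051      11761
Swap:        17390          0      17390
"""


def parse_free(text):
    """Extract memory-pressure figures (in MB) from old-style `free -m` output."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            result["mem_total"] = int(fields[1])
        elif fields[0] == "-/+":
            # "-/+ buffers/cache: <used> <free>" excludes buffers and page cache.
            result["app_used"] = int(fields[2])
            result["app_free"] = int(fields[3])
        elif fields[0] == "Swap:":
            result["swap_used"] = int(fields[2])
    return result


stats = parse_free(FREE_M)
print(stats)
print("swap in use" if stats["swap_used"] > 0 else "no swap in use")
```

A nonzero "swap used" figure would only be a hint that the box has swapped at some point, not proof that it is swapping right now; here it is zero, which fits the "plenty free" reading.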


    Quote Originally Posted by Bob_Zimmerman View Post
    Depending on what features you have enabled, the boxes may be running low on RAM, which causes them to swap data out to the disk. Swapping data out to disk, then swapping other data back into RAM is a synchronous operation. The time spent doing that gets booked as consumed processor time, even though it isn't really the processor doing any work.
    To be honest, in an effort to reduce the CPU usage we've taken to basically turning 90% of the features off. Only the IPS is really running at the minute.

  5. #5
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by ShadowPeak.com View Post
    The 5400 does not support SMT/Hyperthreading; support for SMT starts with the 5800 model and higher.
    that's annoying, Checkpoint support suggested enabling it as a potential solution!

    Quote Originally Posted by ShadowPeak.com View Post
    free -m
                 total       used       free     shared    buffers     cached
    Mem:         15812       8474       7338          0        358       4066
    -/+ buffers/cache:       4049      11763
    Swap:        17390          0      17390

    Quote Originally Posted by ShadowPeak.com View Post
    netstat -ni
    Iface        MTU Met       RX-OK RX-ERR RX-DRP RX-OVR       TX-OK TX-ERR TX-DRP TX-OVR Flg
    Mgmt        1500   0    55782830      0     14     14   665957109      0      0      0 BMRU
    Sync        1500   0    64227211      0      0      0   258529161      0      0      0 BMRU
    bond0       1500   0 15387506853      0 393325 393325 12723291410      0      0      0 BMmRU
    bond1       1500   0 54812944762      0  18592  18592 56658898954      0      0      0 BMmRU
    bond1.103   1500   0 22599292224      0      0      0 20259908774      0      0      0 BMmRU
    bond1.104   1500   0  4714192578      0      0      0  5145028503      0      0      0 BMmRU
    bond1.301   1500   0        4713      0      0      0       61354      0      0      0 BMmRU
    bond1.302   1500   0       61196      0      0      0     1525687      0      0      0 BMmRU
    bond1.401   1500   0    12121689      0      0      0           0      0      0      0 BMmRU
    bond1.410   1500   0 27466905116      0      0      0 31249907646      0      0      0 BMmRU
    bond1.411   1500   0    15834510      0      0      0    15271286      0      0      0 BMmRU
    bond1.990   1500   0        1977      0      0      0        2571      0      0      0 BMmRU
    eth1        1500   0  6393994266      0      0      0  6267717651      0      0      0 BMsRU
    eth2        1500   0  8993512627      0 393325 393325  6455573799      0      0      0 BMsRU
    eth3        1500   0 26659358408      0  10289  10289 30857876234      0      0      0 BMsRU
    eth4        1500   0 28153586408      0   8303   8303 25801022770      0      0      0 BMsRU
    lo         16436   0    13644985      0      0      0    13644985      0      0      0 LRU

    Quote Originally Posted by ShadowPeak.com View Post
    enabled_blades
    fw appi ips identityServer

    Quote Originally Posted by ShadowPeak.com View Post
    fwaccel stat
    Accelerator Status : on
    Accept Templates   : enabled
    Drop Templates     : disabled
    NAT Templates      : disabled by user

    Accelerator Features : Accounting, NAT, Cryptography, Routing,
                           HasClock, Templates, Synchronous, IdleDetection,
                           Sequencing, TcpStateDetect, AutoExpire,
                           DelayedNotif, TcpStateDetectV2, CPLS, McastRouting,
                           WireMode, DropTemplates, NatTemplates,
                           Streaming, MultiFW, AntiSpoofing, Nac,
                           ViolationStats, AsychronicNotif, ERDOS,
                           NAT64, GTPAcceleration, SCTPAcceleration,
                           McastRoutingV2
    Cryptography Features : Tunnel, UDPEncapsulation, MD5, SHA1, NULL,
                            3DES, DES, CAST, CAST-40, AES-128, AES-256,
                            ESP, LinkSelection, DynamicVPN, NatTraversal,
                            EncRouting, AES-XCBC, SHA256

    Quote Originally Posted by ShadowPeak.com View Post
    fwaccel stats -s
    Accelerated conns/Total conns : 1073/6949 (15%)
    Delayed conns/(Accelerated conns + PXL conns) : 151/6537 (2%)
    Accelerated pkts/Total pkts : 1272408892/1600746449 (79%)
    F2Fed pkts/Total pkts : 15375115/1600746449 (0%)
    PXL pkts/Total pkts : 312962442/1600746449 (19%)
    QXL pkts/Total pkts : 0/1600746449 (0%)

    Quote Originally Posted by ShadowPeak.com View Post
    fw ctl multik stat
    ID | Active | CPU | Connections | Peak
    ----------------------------------------------
    0 | Yes | 1 | 3852 | 31734
    1 | Yes | 0 | 3513 | 25076

    Quote Originally Posted by ShadowPeak.com View Post
    fw ctl affinity -l -r
    CPU 0: eth1 eth2 Sync
    fw_1
    CPU 1: eth3 eth4 Mgmt
    fw_0
    All: rad vpnd fwd pdpd pepd lpd rtmd mpdaemon cpd cprid

    Quote Originally Posted by ShadowPeak.com View Post
    fw ctl multik get_mode (R77.30) or fw ctl multik dynamic_dispatching get_mode (R80.10+)
    Current mode is Off. I've actually turned this on (another Check Point suggested solution); I just haven't rebooted the firewalls yet. That's this evening's job.

    Quote Originally Posted by ShadowPeak.com View Post
    cpstat os -f multi_cpu -o 1
    Processors load
    ---------------------------------------------------------------------------------
    |CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
    ---------------------------------------------------------------------------------
    |   1|           0|             7|          93|       7|        ?|          7286|
    |   2|           0|             6|          93|       7|        ?|          7286|
    ---------------------------------------------------------------------------------

    Quote Originally Posted by ShadowPeak.com View Post
    cpconfig (the menu displayed by this command)
    Configuration Options:
    ----------------------
    (1) Licenses and contracts
    (2) SNMP Extension
    (3) PKCS#11 Token
    (4) Random Pool
    (5) Secure Internal Communication
    (6) Disable cluster membership for this gateway
    (7) Enable Check Point Per Virtual System State
    (8) Enable Check Point ClusterXL for Bridge Active/Standby
    (9) Disable Check Point SecureXL
    (10) Check Point CoreXL
    (11) Automatic start of Check Point Products

    (12) Exit


    Typically it's now currently only consuming between 9 - 40% CPU, I'll grab the relevant outputs again when it's maxed out.
    Last edited by RichardPriest; 2018-03-31 at 10:28.

  6. #6
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,229
    Rep Power
    13

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by RichardPriest View Post
    that's annoying, Checkpoint support suggested enabling it as a potential solution!
    The underlying 5400 processor does not support it at all; SMT is not deliberately disabled by Check Point:


    https://ark.intel.com/products/77775...Cache-3_20-GHz

    Memory and network interfaces look good; however, you have application control enabled but not URL filtering? That's a bit odd but not related to your performance problem.



    All that looks very good; about 80% of your traffic is accelerated, which is great!


    Current mode is Off. I've actually turned this on (another Check Point suggested solution); I just haven't rebooted the firewalls yet. That's this evening's job.
    Turning on the Dynamic Dispatcher (DD) may help a little, but it won't make a huge difference since so much of your traffic is accelerated. The DD only helps balance traffic that is in the PXL/F2F paths.

    Typically it's now currently only consuming between 9 - 40% CPU, I'll grab the relevant outputs again when it's maxed out.
    Distributed configuration (good). The CPU was obviously not too busy when these commands were run.

    High CPU *might* be caused by an overloaded sync network between cluster members, in which case you will need to consider selective synchronization of services. To determine that, please provide the output of the following as well:

    fw ctl pstat

    Edit: Using cpview -t go back in time to a known period of high CPU utilization and please report the type of numbers being displayed for Bits/sec, Packets/sec, Connections/sec, & Concurrent connections on the Overview screen.

    Since you suspect iSCSI traffic may be the culprit, make sure that traffic is not getting dragged into the PXL/F2F path by appi (ensure you are not using Any as a destination in APCL/URLF policy) or IPS (can be immediately disabled for new connections with the ips off command for testing). fwaccel conns can be used to verify which path the iSCSI traffic is getting processed in.
    Last edited by ShadowPeak.com; 2018-03-31 at 13:04.

  7. #7
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by ShadowPeak.com View Post
    The underlying 5400 processor does not support it at all; SMT is not deliberately disabled by Check Point:


    https://ark.intel.com/products/77775...Cache-3_20-GHz
    Sorry, what I meant was that our support contractors passed this issue on to Check Point, who suggested turning Hyperthreading on! This issue has been going on far too long. I've been trying to resolve it myself and research as much as I can, which has led me to this forum.


    Quote Originally Posted by ShadowPeak.com View Post
    fw ctl pstat

    Edit: Using cpview -t go back in time to a known period of high CPU utilization and please report the type of numbers being displayed for Bits/sec, Packets/sec, Connections/sec, & Concurrent connections on the Overview screen.

    Since you suspect iSCSI traffic may be the culprit, make sure that traffic is not getting dragged into the PXL/F2F path by appi (ensure you are not using Any as a destination in APCL/URLF policy) or IPS (can be immediately disabled for new connections with the ips off command for testing). fwaccel conns can be used to verify which path the iSCSI traffic is getting processed in.
    OK, CPU usage is now hovering at around 98% on a Saturday night!

    result of fw ctl pstat is:

    System Capacity Summary:
    Memory used: 8% (1036 MB out of 11763 MB) - below watermark
    Concurrent Connections: 6314 (Unlimited)
    Aggressive Aging is in detect mode

    Hash kernel memory (hmem) statistics:
    Total memory allocated: 1233125376 bytes in 301056 (4096 bytes) blocks using 1 pool
    Total memory bytes used: 146833184 unused: 1086292192 (88.09%) peak: 482980276
    Total memory blocks used: 54460 unused: 246596 (81%) peak: 123144
    Allocations: 233482188 alloc, 0 failed alloc, 232013950 free

    System kernel memory (smem) statistics:
    Total memory bytes used: 1926550188 peak: 1975698744
    Total memory bytes wasted: 3592549
    Blocking memory bytes used: 4424456 peak: 10920712
    Non-Blocking memory bytes used: 1922125732 peak: 1964778032
    Allocations: 7834977 alloc, 0 failed alloc, 7832813 free, 0 failed free
    vmalloc bytes used: 1918308932 expensive: no

    Kernel memory (kmem) statistics:
    Total memory bytes used: 834664984 peak: 1130048568
    Allocations: 241314314 alloc, 0 failed alloc
    239844621 free, 0 failed free
    External Allocations: 1221120 for packets, 86497657 for SXL

    Cookies:
    1362187951 total, 0 alloc, 0 free,
    668 dup, 2940940326 get, 270591025 put,
    1907131572 len, 3108612 cached len, 0 chain alloc,
    0 chain free

    Connections:
    287972655 total, 117303352 TCP, 161463590 UDP, 9188578 ICMP,
    17135 other, 101776 anticipated, 95337 recovered, 6314 concurrent,
    55008 peak concurrent

    Fragments:
    3179698 fragments, 1588725 packets, 139 expired, 0 short,
    0 large, 0 duplicates, 6 failures

    NAT:
    94304039/0 forw, 125589496/0 bckw, 123910519 tcpudp,
    13239464 icmp, 25916962-13992922 alloc

    Sync:
    Version: new
    Status: Able to Send/Receive sync packets
    Sync packets sent:
    total : 241317139, retransmitted : 20019, retrans reqs : 472, acks : 8759
    Sync packets received:
    total : 11324288, were queued : 15220, dropped by net : 1389
    retrans reqs : 15448, received 19126 acks
    retrans reqs for illegal seq : 0
    dropped updates as a result of sync overload: 0
    Callback statistics: handled 1727 cb, average delay : 1, max delay : 16


    Result of free -m

                 total       used       free     shared    buffers     cached
    Mem:         15812       8479       7333          0        358       4073
    -/+ buffers/cache:       4047      11765
    Swap:        17390          0      17390


    result of cpstat os -f multi_cpu -o 1

    Processors load
    ---------------------------------------------------------------------------------
    |CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
    ---------------------------------------------------------------------------------
    |   1|           1|            22|          78|      22|        ?|           524|
    |   2|           0|            96|           4|      96|        ?|          1048|
    ---------------------------------------------------------------------------------

    This image is a snip of the cpview Overview screen. I couldn't copy and paste that screen, and I don't think it would have come out in a nice format anyway:
    [Attachment: cpview1.PNG]

    This is the network tab:
    [Attachment: cpview-network.PNG]

    I've run the fwaccel conns command as you suggested, but I'm not really sure how to decipher the output. I get an awful lot of output, more than SecureCRT can handle in its view buffer anyway! Can you explain what the following means: "PXL/F2F path by appi"? Apologies if this is a very simple question; I am very new to Check Point firewalls!
    Last edited by RichardPriest; 2018-03-31 at 16:04.

  8. #8
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,229
    Rep Power
    13

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by RichardPriest View Post
    Sorry what I meant by that was our support contractors have passed this issue onto Checkpoint and they suggested turning Hyperthreading on! this issue has been going on far too long, I've been trying to resolve the issue myself / research as much as I can which has led me to this forum.




    OK, CPU usage is now hovering at around 98% on a Saturday night!


    Sync network & memory look fine.



    CPU 2 is slammed to 100% mostly in kernel/system space while CPU 1 is 78% idle; so technically the overall firewall CPU load is 59%. Enabling the Dynamic Dispatcher is likely to help with this situation as it will more evenly balance the traffic load between the two available cores. Enabling the DD is definitely your first course of action to take and may solve your problem, mostly.
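The 59% figure is just the mean of the two per-core Usage(%) values from the cpstat output. A small Python sketch of that arithmetic follows. It assumes the pipe-delimited table layout that `cpstat os -f multi_cpu -o 1` printed in this thread; `core_usage` is a hypothetical helper name, and the sample rows are the ones posted while the firewall was busy.

```python
CPSTAT = """\
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
|   1|           1|            22|          78|      22|        ?|           524|
|   2|           0|            96|           4|      96|        ?|          1048|
"""


def core_usage(text):
    """Map CPU number -> Usage(%) from a cpstat multi_cpu style table."""
    usage = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows have 7 cells and a numeric CPU number; the header row does not.
        if len(cells) == 7 and cells[0].isdigit():
            usage[int(cells[0])] = int(cells[4])
    return usage


usage = core_usage(CPSTAT)
average = sum(usage.values()) / len(usage)  # the "overall" load
busiest = max(usage, key=usage.get)         # the core that is actually the bottleneck

print(usage, average, busiest)
```

The averaged number hides the imbalance: one core is nearly saturated while the other is mostly idle, which is the situation the Dynamic Dispatcher is meant to improve.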

    Beyond that, there are three different paths that packets can take through the firewall, in order of increasing CPU overhead: Accelerated/SecureXL Path (fastest - minimal CPU), Medium Path (PXL - slower - more CPU) and the Firewall Path (F2F - slowest - most CPU). You can see the percentages for each path with fwaccel stats -s and you provided statistics earlier while the firewall was not slammed showing 79% of traffic in the Accelerated/SecureXL Path (fastest). However now that the firewall is under heavy load almost all of your traffic is in the Medium Path (PXL) based on the second screenshot which is not that unusual, but I would strongly suspect that high-speed LAN traffic between two internal networks (or an internal network and a DMZ) is being pulled into the Medium Path where there is much more CPU overhead. The only blades you have currently enabled that can cause this effect are IPS and APCL. To figure out which one, try disabling IPS by unchecking the box on the firewall object and reinstalling policy. If you are still seeing CPU spikes, also disable application control and install policy. Once you determine which blade is causing the heavy PXL usage at LAN speeds issue we can troubleshoot further.
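To see how the packet counters split across the three paths, the percentages can be recomputed from the raw numbers. The Python sketch below assumes the line format of the `fwaccel stats -s` output posted earlier in this thread (taken when the firewall was not busy); `path_shares` and the regular expression are illustrative and not part of any Check Point tooling.

```python
import re

FWACCEL_STATS = """\
Accelerated conns/Total conns : 1073/6949 (15%)
Delayed conns/(Accelerated conns + PXL conns) : 151/6537 (2%)
Accelerated pkts/Total pkts : 1272408892/1600746449 (79%)
F2Fed pkts/Total pkts : 15375115/1600746449 (0%)
PXL pkts/Total pkts : 312962442/1600746449 (19%)
QXL pkts/Total pkts : 0/1600746449 (0%)
"""

# Only the three per-path packet counters are of interest here.
LINE = re.compile(r"^(Accelerated|F2Fed|PXL) pkts/Total pkts\s*:\s*(\d+)/(\d+)")


def path_shares(text):
    """Return the percentage of packets handled in each processing path."""
    shares = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            count, total = int(match.group(2)), int(match.group(3))
            shares[match.group(1)] = 100.0 * count / total
    return shares


shares = path_shares(FWACCEL_STATS)
for name, percent in shares.items():
    print(f"{name:12s} {percent:6.2f}%")
```

In this sample the three counters add up to the total packet count, and the F2F share that is displayed as 0% works out to a little under 1%. Running the same calculation on output captured during a CPU spike would show how far the balance shifts toward PXL.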

    I've run the fwaccel conns command as you suggested, but I'm not really sure how to decipher the output? I get an awful lot in the output, more than securecrt can handle in it's view buffer anyway! Can you explain to me what the following means? "PXL/F2F path by appi" apologies if this is a very simple question, I am very new to checkpoint firewalls!
    We need to figure out which blade is the culprit before dealing with this command's output.

  9. #9
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    I thought this was odd, why is the cpu usage so high, but no interrupts?

    [Attachment: pcview-CPU.PNG]

  10. #10
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,229
    Rep Power
    13

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by RichardPriest View Post
    I thought this was odd, why is the cpu usage so high, but no interrupts?

    Interrupts in this context mostly refer to the emptying of the NIC ring buffers via the SoftIRQ process. When a SND/IRQ core becomes much more heavily utilized than the others, SecureXL automatic interface affinity shifts the SoftIRQ processing away from the slammed CPU to the more lightly loaded SND/IRQ core(s). This helps ensure timely emptying of the interface ring buffers and avoids RX-DRPs of packets (visible with netstat -ni).
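One way to put the RX-DRP counters from netstat -ni into perspective is to express them as a percentage of received packets. The Python sketch below does that for the physical interfaces from the output posted earlier in the thread. `rx_drop_percent` is a hypothetical helper, it measures drops relative to RX-OK, and what counts as an acceptable rate is a judgment call that this sketch does not try to encode.

```python
NETSTAT = """\
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1 1500 0 6393994266 0 0 0 6267717651 0 0 0 BMsRU
eth2 1500 0 8993512627 0 393325 393325 6455573799 0 0 0 BMsRU
eth3 1500 0 26659358408 0 10289 10289 30857876234 0 0 0 BMsRU
eth4 1500 0 28153586408 0 8303 8303 25801022770 0 0 0 BMsRU
"""


def rx_drop_percent(text):
    """Return {interface: RX-DRP as a percentage of RX-OK} from `netstat -ni` output."""
    lines = text.splitlines()
    header = lines[0].split()
    ok_col, drp_col = header.index("RX-OK"), header.index("RX-DRP")
    rates = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        rx_ok, rx_drp = int(fields[ok_col]), int(fields[drp_col])
        if rx_ok:
            rates[fields[0]] = 100.0 * rx_drp / rx_ok
    return rates


rates = rx_drop_percent(NETSTAT)
for iface, percent in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{iface}: {percent:.5f}% of received packets dropped")
```

Even on eth2, the interface with the most drops, the rate is a small fraction of one percent, which is consistent with the earlier assessment that the network interfaces look good.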

  11. #11
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by ShadowPeak.com View Post
    Sync network & memory look fine.



    CPU 2 is slammed to 100% mostly in kernel/system space while CPU 1 is 78% idle; so technically the overall firewall CPU load is 59%. Enabling the Dynamic Dispatcher is likely to help with this situation as it will more evenly balance the traffic load between the two available cores. Enabling the DD is definitely your first course of action to take and may solve your problem, mostly.
    Many thanks for that, I've actually just reloaded the x2 firewalls so Dynamic dispatcher should now be active on both units.

    Quote Originally Posted by ShadowPeak.com View Post
    Beyond that, there are three different paths that packets can take through the firewall, in order of increasing CPU overhead: Accelerated/SecureXL Path (fastest - minimal CPU), Medium Path (PXL - slower - more CPU) and the Firewall Path (F2F - slowest - most CPU). You can see the percentages for each path with fwaccel stats -s and you provided statistics earlier while the firewall was not slammed showing 79% of traffic in the Accelerated/SecureXL Path (fastest). However now that the firewall is under heavy load almost all of your traffic is in the Medium Path (PXL) based on the second screenshot which is not that unusual, but I would strongly suspect that high-speed LAN traffic between two internal networks (or an internal network and a DMZ) is being pulled into the Medium Path where there is much more CPU overhead. The only blades you have currently enabled that can cause this effect are IPS and APCL. To figure out which one, try disabling IPS by unchecking the box on the firewall object and reinstalling policy. If you are still seeing CPU spikes, also disable application control and install policy. Once you determine which blade is causing the heavy PXL usage at LAN speeds issue we can troubleshoot further.



    We need to figure out which blade is the culprit before dealing with this command's output.
    Fantastic, I'll try disabling the IPS when the load is particularly heavy and see if it improves matters then report back.

    Really appreciate the help, many thanks

  12. #12
    Join Date
    2006-09-26
    Posts
    3,164
    Rep Power
    16

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by RichardPriest View Post
    Many thanks for that, I've actually just reloaded the x2 firewalls so Dynamic dispatcher should now be active on both units.

    Fantastic, I'll try disabling the IPS when the load is particularly heavy and see if it improves matters then report back.

    Really appreciate the help, many thanks
    A question and a few comments:

    1- How do you know that DD is enabled on the firewalls? Can you provide the output of the command "fw ctl multik get_mode"?

    - Enabling DD might make the issue worse in other ways. I had an issue where, with DD enabled, traffic was processed by one CPU core on the inbound side and by another CPU core on the outbound side, and so the traffic got dropped. It is a KNOWN issue with DD. I did have a TAC case opened with Checkpoint.

    - Is it possible that the traffic you see as iSCSI is actually Microsoft DFS or Oracle RMAN traffic? Are you running Microsoft DFS or Oracle applications in your environment? Might want to investigate that. It is also a known issue with Checkpoint. Had a TAC case open with Checkpoint for that too.

    - Just because you disable IPS does not mean that IPS is actually disabled. IPS is so integrated with the Checkpoint FW that you can't simply uncheck the box and expect IPS to be completely OFF. It does not work that way. I learned a painful lesson on that as well.

  13. #13
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,632
    Rep Power
    9

    Default Re: Checkpoint 5400 100% CPU usage

    Is there any chance the iSCSI traffic is fragmenting? That might explain the high CPU usage, as frags basically suck. You would need a packet capture to tell, since the firewall is going to reassemble the frags before allowing or denying the traffic.
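
    If you do take a capture, one way to spot fragments is to look at the flags/fragment-offset field of each IPv4 header. Below is a minimal Python sketch of that check, assuming you already have the raw IPv4 header bytes from the capture; build_header is a hypothetical helper that only exists to make test headers, and none of this is a Check Point tool.

```python
import struct

MF_FLAG = 0x2000      # "More Fragments" bit in the flags/offset field
OFFSET_MASK = 0x1FFF  # low 13 bits: fragment offset in 8-byte units


def is_ipv4_fragment(ip_header: bytes) -> bool:
    """Return True if this IPv4 header belongs to a fragment."""
    if len(ip_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    if ip_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Bytes 6-7 hold 3 flag bits followed by the 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack_from("!H", ip_header, 6)
    more_fragments = bool(flags_and_offset & MF_FLAG)
    offset = flags_and_offset & OFFSET_MASK
    # First/middle fragments have MF set; the last one has a non-zero offset.
    return more_fragments or offset != 0


def build_header(flags_and_offset: int) -> bytes:
    """Hypothetical helper: build a minimal 20-byte IPv4 header for testing."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                  # version 4, header length 5 words
        0,                     # DSCP/ECN
        40,                    # total length
        1,                     # identification
        flags_and_offset,      # flags + fragment offset
        64,                    # TTL
        6,                     # protocol: TCP
        0,                     # checksum (not validated here)
        bytes([10, 0, 0, 1]),  # source address (made up)
        bytes([10, 0, 0, 2]),  # destination address (made up)
    )
```

    A packet with only the Don't Fragment bit set (0x4000) is not a fragment; anything with MF set or a non-zero offset is.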

    Also have you checked if the iscsi traffic is really ipx encapsulated over a layer 2 gre tunnel that is streaming Highlander 2?

  14. #14
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by ShadowPeak.com View Post
    Beyond that, there are three different paths that packets can take through the firewall, in order of increasing CPU overhead: Accelerated/SecureXL Path (fastest - minimal CPU), Medium Path (PXL - slower - more CPU) and the Firewall Path (F2F - slowest - most CPU). You can see the percentages for each path with fwaccel stats -s and you provided statistics earlier while the firewall was not slammed showing 79% of traffic in the Accelerated/SecureXL Path (fastest). However now that the firewall is under heavy load almost all of your traffic is in the Medium Path (PXL) based on the second screenshot which is not that unusual, but I would strongly suspect that high-speed LAN traffic between two internal networks (or an internal network and a DMZ) is being pulled into the Medium Path where there is much more CPU overhead. The only blades you have currently enabled that can cause this effect are IPS and APCL. To figure out which one, try disabling IPS by unchecking the box on the firewall object and reinstalling policy. If you are still seeing CPU spikes, also disable application control and install policy. Once you determine which blade is causing the heavy PXL usage at LAN speeds issue we can troubleshoot further.



    We need to figure out which blade is the culprit before dealing with this command's output.
    OK, this is the result of fwaccel stats -s after the IPS blade was disabled on the cluster in SmartDashboard. Now that Dynamic Dispatcher is enabled the CPU usage has never gone as high, but when the iSCSI traffic is present everything has a definite slowness to it (RDP sessions occasionally time out, lots of egg timers when using applications, etc.).

    Accelerated conns/Total conns : 1432/7251 (19%)
    Delayed conns/(Accelerated conns + PXL conns) : 121/6818 (1%)
    Accelerated pkts/Total pkts : 2215894079/2938246762 (75%)
    F2Fed pkts/Total pkts : 23422291/2938246762 (0%)
    PXL pkts/Total pkts : 698930392/2938246762 (23%)
    QXL pkts/Total pkts : 0/2938246762 (0%)

    Does that output mean that none of the packets are now going through the medium path?

  15. #15
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,229
    Rep Power
    13

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by RichardPriest View Post
    OK, this is the result of fwaccel stats -s after the IPS blade was disabled on the cluster in SmartDashboard. Now that Dynamic Dispatcher is enabled the CPU usage has never gone as high, but when the iSCSI traffic is present everything has a definite slowness to it (RDP sessions occasionally time out, lots of egg timers when using applications, etc.).

    Accelerated conns/Total conns : 1432/7251 (19%)
    Delayed conns/(Accelerated conns + PXL conns) : 121/6818 (1%)
    Accelerated pkts/Total pkts : 2215894079/2938246762 (75%)
    F2Fed pkts/Total pkts : 23422291/2938246762 (0%)
    PXL pkts/Total pkts : 698930392/2938246762 (23%)
    QXL pkts/Total pkts : 0/2938246762 (0%)

    Does that output mean that none of the packets are now going through the medium path?
    That looks pretty good as 75% of traffic is now accelerated even when passing iSCSI traffic and 23% is Medium Path, surprised things still feel slow for you with those kind of statistics. Try disabling APCL now as well and see if the "slowness" improves, once APCL is disabled the amount of fully accelerated traffic should be >90%. Your APCL policy may need some tuning.
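
    For anyone who wants to check those percentages by hand, here is a rough Python sketch that parses the fwaccel stats -s lines quoted above and recomputes each ratio. The regular expression is an assumption based only on the line layout shown in this thread, not on any documented output format. Note that the printed percentages look truncated rather than rounded (23.8% shows as 23%).

```python
import re

# Output copied from the post above.
SAMPLE = """\
Accelerated conns/Total conns : 1432/7251 (19%)
Delayed conns/(Accelerated conns + PXL conns) : 121/6818 (1%)
Accelerated pkts/Total pkts : 2215894079/2938246762 (75%)
F2Fed pkts/Total pkts : 23422291/2938246762 (0%)
PXL pkts/Total pkts : 698930392/2938246762 (23%)
QXL pkts/Total pkts : 0/2938246762 (0%)
"""

# Assumed layout: "<label> : <numerator>/<denominator> (<percent>%)"
LINE_RE = re.compile(
    r"^\s*(?P<label>.+?)\s*:\s*(?P<num>\d+)/(?P<den>\d+)\s*\(\d+%\)\s*$"
)


def parse_fwaccel_stats(text):
    """Map each label to its (numerator, denominator) counters."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stats[match["label"]] = (int(match["num"]), int(match["den"]))
    return stats


def pct(num, den):
    """Percentage as a float; 0.0 when the denominator is zero."""
    return 100.0 * num / den if den else 0.0


if __name__ == "__main__":
    for label, (num, den) in parse_fwaccel_stats(SAMPLE).items():
        print(f"{label}: {pct(num, den):.1f}%")
```

    With these counters the Accelerated, PXL and F2F packet counts add up exactly to the total, so the three path percentages account for all of the traffic.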

    Unlikely the iSCSI traffic is fragmented as another poster mentioned earlier as the fragmented traffic would go F2F (not PXL) and fw ctl pstat did not show any excessive fragmentation statistics.

    To respond to a different poster, if the Dynamic Dispatcher is causing problems with certain types of traffic it can be disabled on the fly for certain port numbers. This procedure is not documented, but did get a mention in my book. Should not be needed for this iSCSI issue:

    Enabling the Dynamic Dispatcher on R77.30 after loading the latest GA Jumbo HFA
    is about as close to a no-brainer as it gets, and I have not personally witnessed any
    situation where enabling the Dynamic Dispatcher caused problems with the firewall or
    the applications traversing it. But interestingly enough, there appears to be a real-time
    mechanism to partially disable the Dynamic Dispatcher on a per-port basis with these
    kernel variables that can be set or queried via the fw ctl set/get commands:

    • dynamic_dispatcher_bypass_add_port
    • dynamic_dispatcher_bypass_ports_number
    • dynamic_dispatcher_bypass_remove_port
    • dynamic_dispatcher_bypass_show_ports


    These kernel variables did not exist in the initial R77.30 code release but seem to
    have been added in one of the R77.30 Jumbo HFAs; they also exist in the R80.10
    firewall code. Be warned however that these variables are undocumented and tampering
    with them is most definitely not supported. But if certain applications are proven to be
    incompatible with the Dynamic Dispatcher for some reason, it is worth a call to the
    Check Point TAC to inquire about this hidden feature rather than disabling the Dynamic
    Dispatcher completely.
    --
    Second Edition of my "Max Power" Firewall Book
    Now Available at http://www.maxpowerfirewalls.com

  16. #16
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by ShadowPeak.com View Post
    That looks pretty good as 75% of traffic is now accelerated even when passing iSCSI traffic and 23% is Medium Path, surprised things still feel slow for you with those kind of statistics. Try disabling APCL now as well and see if the "slowness" improves, once APCL is disabled the amount of fully accelerated traffic should be >90%. Your APCL policy may need some tuning.
    Good to know, thanks!
    I've just gone to re-enable IPS and disable APCL via SmartDashboard. It looks like APCL is already disabled!

  17. #17
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by ShadowPeak.com View Post
    That looks pretty good as 75% of traffic is now accelerated even when passing iSCSI traffic and 23% is Medium Path, surprised things still feel slow for you with those kind of statistics. Try disabling APCL now as well and see if the "slowness" improves, once APCL is disabled the amount of fully accelerated traffic should be >90%. Your APCL policy may need some tuning.

    Unlikely the iSCSI traffic is fragmented as another poster mentioned earlier as the fragmented traffic would go F2F (not PXL) and fw ctl pstat did not show any excessive fragmentation statistics.

    To respond to a different poster, if the Dynamic Dispatcher is causing problems with certain types of traffic it can be disabled on the fly for certain port numbers. This procedure is not documented, but did get a mention in my book. Should not be needed for this iSCSI issue:
    IPS is now re-enabled and everyone is back at work, so the traffic flowing through the firewall is much greater now.

    This is the result of fwaccel stats -s with IPS on:

    Delayed conns/(Accelerated conns + PXL conns) : 305/15988 (1%)
    Accelerated pkts/Total pkts : 682863864/855004816 (79%)
    F2Fed pkts/Total pkts : 8384835/855004816 (0%)
    PXL pkts/Total pkts : 163756117/855004816 (19%)
    QXL pkts/Total pkts : 0/855004816 (0%)

    With Dynamic Dispatcher enabled, both CPU cores are currently hovering around 40 - 45% each, so in real terms the same load would have put one core at 90%+ before it was enabled. The iSCSI traffic has now been removed too, so that's helping an awful lot.

    Is there anything I can do to reduce the CPU usage further?

    EDIT:

    Actually, looking in cpview (see screenshot attached), there are considerably more PXL connections than there are SecureXL connections. Is there a way I can find out why these are hitting the medium path and not the accelerated path?

    [Attached screenshot: pxl conenctions.PNG]

    EDIT again:

    this is 30 minutes after turning IPS off via CLI

    [Attached screenshot: pxl connections - IPS off.PNG]
    Last edited by RichardPriest; 2018-04-03 at 04:34.

  18. #18
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,229
    Rep Power
    13

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by RichardPriest View Post
    IPS is now re-enabled and everyone is back at work, so the traffic flowing through the firewall is much greater now.

    This is the result of fwaccel stats -s with IPS on:

    Delayed conns/(Accelerated conns + PXL conns) : 305/15988 (1%)
    Accelerated pkts/Total pkts : 682863864/855004816 (79%)
    F2Fed pkts/Total pkts : 8384835/855004816 (0%)
    PXL pkts/Total pkts : 163756117/855004816 (19%)
    QXL pkts/Total pkts : 0/855004816 (0%)

    With Dynamic Dispatcher enabled, both CPU cores are currently hovering around 40 - 45% each, so in real terms the same load would have put one core at 90%+ before it was enabled. The iSCSI traffic has now been removed too, so that's helping an awful lot.

    Is there anything I can do to reduce the CPU usage further?
    In my book the stated goal is to have about 50% average utilization on the CPUs during the firewall's busiest period, thus allowing enough "headroom" for the firewall to potentially burst at double that speed. This is a realistic goal in most environments and it sounds like you are there!
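
    The "burst at double that speed" arithmetic can be sketched as below. This is a back-of-the-envelope Python example that assumes CPU load scales linearly with traffic, which real firewalls only approximate; the function name is made up for illustration.

```python
def burst_headroom(avg_utilization_pct: float) -> float:
    """How many times the current load fits into 100% CPU.

    Assumes CPU usage grows linearly with traffic (a simplification).
    """
    if not 0 < avg_utilization_pct <= 100:
        raise ValueError("utilization must be in the range (0, 100]")
    return 100.0 / avg_utilization_pct


if __name__ == "__main__":
    for load in (40, 45, 50, 90):
        print(f"{load}% busy -> can burst to about {burst_headroom(load):.1f}x")
```

    At the 50% target the firewall can absorb roughly a 2x burst; at 90% there is almost nothing left in reserve.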


    EDIT:

    Actually, looking in cpview (see screenshot attached), there are considerably more PXL connections than there are SecureXL connections. Is there a way I can find out why these are hitting the medium path and not the accelerated path?

    [Attached screenshot: pxl conenctions.PNG]

    EDIT again:

    this is 30 minutes after turning IPS off via CLI

    [Attached screenshot: pxl connections - IPS off.PNG]
    Having most traffic in PXL is normal on most firewalls; F2F is what you want to avoid. If the iSCSI traffic was still present and/or you hadn't reached the 50% utilization goal, the next steps taking into consideration which blades you have enabled would be:

    1) Disable any IPS signatures in the current IPS profile with a "Performance Impact" of Critical or High: helps more traffic potentially get processed in the SXL/accelerated path
    2) Tune the APCL/URLF policy: get rid of the "Any Any Any Recognized Accept" cleanup rule at the bottom and make sure that "Any" is not used in the source or destination of any APCL/URLF rule. This exempts high-speed LAN-LAN or LAN-DMZ traffic from processing by APCL/URLF in PXL and potentially makes more traffic eligible for the SXL/accelerated path.

    All this is covered step by step in my book.
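
    To illustrate the check in step 2, here is a small Python sketch that flags rules using "Any" in the source or destination. The Rule class and the sample rule base are entirely hypothetical; this is not an export format or API of SmartDashboard, just a way to show the logic.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical, simplified stand-in for an APCL/URLF rule."""
    name: str
    source: str
    destination: str
    application: str
    action: str


def rules_with_any(rules):
    """Return the names of rules whose source or destination is 'Any'."""
    return [
        rule.name
        for rule in rules
        if "any" in (rule.source.strip().lower(), rule.destination.strip().lower())
    ]


# Made-up example rule base.
EXAMPLE_RULES = [
    Rule("Block social media", "Internal_Nets", "Internet", "Social Networking", "Drop"),
    Rule("Allow web", "Internal_Nets", "Any", "Web Browsing", "Accept"),
    Rule("Cleanup", "Any", "Any", "Any Recognized", "Accept"),
]

if __name__ == "__main__":
    for name in rules_with_any(EXAMPLE_RULES):
        print(f"Rule '{name}' uses Any in source or destination")
```

    In this made-up rule base the "Allow web" and "Cleanup" rules would be flagged for tightening.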

  19. #19
    Join Date
    2006-09-26
    Posts
    3,164
    Rep Power
    16

    Default Re: Checkpoint 5400 100% CPU usage

    Quote Originally Posted by ShadowPeak.com View Post
    In my book the stated goal is to have about 50% average utilization on the CPUs during the firewall's busiest period, thus allowing enough "headroom" for the firewall to potentially burst at double that speed. This is a realistic goal in most environments and it sounds like you are there!



    Having most traffic in PXL is normal on most firewalls; F2F is what you want to avoid. If the iSCSI traffic was still present and/or you hadn't reached the 50% utilization goal, the next steps taking into consideration which blades you have enabled would be:

    1) Disable any IPS signatures in the current IPS profile with a "Performance Impact" of Critical or High: helps more traffic potentially get processed in the SXL/accelerated path
    2) Tune the APCL/URLF policy: get rid of the "Any Any Any Recognized Accept" cleanup rule at the bottom and make sure that "Any" is not used in the source or destination of any APCL/URLF rule. This exempts high-speed LAN-LAN or LAN-DMZ traffic from processing by APCL/URLF in PXL and potentially makes more traffic eligible for the SXL/accelerated path.

    All this is covered step by step in my book.
    Let's say that step #1 and step #2 are done as you suggested and the firewall still has high CPU. What is the next step?

  20. #20
    Join Date
    2018-02-26
    Posts
    12
    Rep Power
    0

    Default Re: Checkpoint 5400 100% CPU usage

    Many thanks for all that, I have asked my manager to authorise a purchase of your book :)
