CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.


Page 1 of 2 (posts 1 to 20 of 39)

Thread: CPU, CPU, CPU, the mystery of the CPU

  1. #1
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Hi guys,

    I have a Nokia Check Point IP390 cluster running IPSO R71. It has been running for many years with no problems at all. But for the last 2 months, CPU usage on BOTH nodes has been constantly above 70%, reaching 100% from time to time. Several engineers have looked into it, and none of them could find an explanation.

    We have studied the traffic over this year, and there has been no increase at all in recent months; the traffic patterns have stayed the same. We have not added any new feature to the cluster, and this cluster works only as a firewall (no VPN tunnels, no IPS, no App Control, etc.).

    The first weird thing is that BOTH nodes, the active and the passive one, show high CPU utilisation. Can anyone explain this?

    This is what the CPU troubleshooting commands show:

    TOP:

    last pid: 39378; load averages: 0.34, 0.52, 0.55 up 21+02:14:36 11:45:09
    49 processes: 1 running, 48 sleeping
    CPU states: 21.3% user, 0.0% nice, 27.7% system, 2.0% interrupt, 49.0% idle
    Mem: 339M Active, 70M Inact, 384M Wired, 18M Cache, 99M Buf, 184M Free
    Swap: 8192M Total, 25M Used, 8167M Free

      PID USERNAME THR PRI NICE   SIZE    RES STATE   TIME   WCPU COMMAND
    39248 root       6  44    0   117M 73188K select  7:27 50.98% fw
      891 root       1  44    0  7920K  2228K select 23:18  0.00% snmpd
      894 root       1   4    0  3272K   996K sbwait  7:25  0.00% perfmond
      649 root       1  44    0 10248K  1304K select  4:05  0.00% ipsrd
    24941 root       1  44    0  8296K   792K select  0:21  0.00% httpd
      892 root       1  44    0  3888K   336K select  0:09  0.00% monitord
    35659 root       1  44    0  3624K  1452K select  0:08  0.00% ntpd
    39240 root       1   8    0 76988K 70840K nanslp  0:08  0.00% cphamcset
    38974 root       5 100    0 84148K 60768K ucond   0:07  0.00% cpd
    39309 root       2  98    0 90884K 80560K ucond   0:06  0.00% vpn
      893 root       1   8    0  1492K   356K nanslp  0:04  0.00% cron
    39306 root       2  97    0 66148K 50376K ucond   0:03  0.00% fwssd
    39305 root       2  97    0 66148K 50352K ucond   0:03  0.00% fwssd
      257 root       1  44    0  1468K   388K select  0:03  0.00% syslogd
    39316 root       2  98    0 66148K 50248K ucond   0:03  0.00% fwssd
      653 root       1  44    0  2012K   520K select  0:03  0.00% ifm
    38992 root       2  44    0 32136K 13528K ucond   0:02  0.00% cpsnmpd
      889 root       1  44    0  9816K   188K select  0:02  0.00% clishd
    39315 root       1  44    0 12696K  7684K select  0:01  0.00% dtls
      651 root       1  44    0  7016K  1200K select  0:01  0.00% xpand
      560 root       1  44    0  7516K  1668K select  0:00  0.00% cprid
    24942 www        1  20    0  9384K  3240K lockf   0:00  0.00% httpd
    38686 root       1  44    0  6256K  1792K select  0:00  0.00% sshd-x
    28518 www        1  20    0  9400K  2848K lockf   0:00  0.00% httpd
    38960 root       1  44    0  8784K  4108K select  0:00  0.00% cpwd
    24943 www        1  20    0  9380K  3104K lockf   0:00  0.00% httpd
    28527 www        1  44    0  9364K  3228K select  0:00  0.00% httpd
    38690 root       1  20    0  4172K  2580K pause   0:00  0.00% csh
      248 root       1  44    0  1216K   176K select  0:00  0.00% pm
      895 root       1  44    0  3372K   328K select  0:00  0.00% sshd-x
      888 root       1  44    0  3368K   268K select  0:00  0.00% ifwd
      887 root       1  44    0  1524K     0K select  0:00  0.00% <inetd>
      290 root       1  20    0  2848K     0K pause   0:00  0.00% <csh>
    39378 root       1  44    0  2784K  1408K RUN     0:00  0.00% top
      940 root       1   5    0  1436K    28K ttyin   0:00  0.00% getty


    cpstat -f cpu os

    CPU User Time (%): 31
    CPU System Time (%): 42
    CPU Idle Time (%): 26
    CPU Usage (%): 73
    CPU Queue Length: 2
    CPU Interrupts/Sec: -
    CPUs Number: 1
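A quick way to read the top and cpstat outputs above side by side is to parse them. The sketch below is a hypothetical helper (not a Check Point or IPSO tool); it assumes the `CPU states:` line keeps the format top prints here and that cpstat prints `Name: value` lines, and it treats "busy" as 100% minus idle.

```python
import re

def parse_top_cpu_states(line: str) -> dict:
    """Turn a BSD top 'CPU states:' line into {state_name: percent}."""
    # Entries look like "21.3% user": capture the number and the label after it.
    return {label: float(pct) for pct, label in re.findall(r"([\d.]+)% (\w+)", line)}

def parse_cpstat(text: str) -> dict:
    """Collect the integer 'Name: value' lines from cpstat output."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips non-numeric values such as "-"
            values[name.strip()] = int(value)
    return values

top_line = "CPU states: 21.3% user, 0.0% nice, 27.7% system, 2.0% interrupt, 49.0% idle"
cpstat_out = """CPU User Time (%): 31
CPU System Time (%): 42
CPU Idle Time (%): 26
CPU Usage (%): 73
CPU Queue Length: 2
CPU Interrupts/Sec: -
CPUs Number: 1"""

states = parse_top_cpu_states(top_line)
top_busy = round(100.0 - states["idle"], 1)  # busy = everything that is not idle
cp = parse_cpstat(cpstat_out)

print(f"top busy: {top_busy}% (user {states['user']}%, system {states['system']}%)")
print(f"cpstat usage: {cp['CPU Usage (%)']}% "
      f"(user {cp['CPU User Time (%)']}% + system {cp['CPU System Time (%)']}%)")
```

With these samples, top shows about 51% busy while cpstat reports 73% (31% user plus 42% system); the two commands were presumably run at different moments, so their snapshots need not agree. In both, system time is the larger share.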


    Could anyone help me find out what's going on here?

    Thanks in advance

  2. #2
    Join Date: 2006-03-08 | Location: Lausanne | Posts: 1,030 | Rep Power: 15

    It seems the main issue is that the fwd process is using a lot of CPU in user mode. The process is responsible for two different things:

    1. regular logging
    2. full cluster sync after boot.

    I would advise you to check the first one and then also consider the second. For point 1, check whether the amount of logging has increased lately. Also check that logs are arriving as expected and that there are no local logging issues. I would bet on logging, but one never knows before a detailed investigation.

    Also, as you probably know, R71 has not been supported for many years now, and end of engineering support for the IP390 is at the end of this year.
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  3. #3
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Quote, originally posted by varera:
    > It seems the main issue is that the fwd process is using a lot of CPU in user mode. The process is responsible for two different things:
    >
    > 1. regular logging
    > 2. full cluster sync after boot.
    >
    > I would advise you to check the first one and then also consider the second. For point 1, check whether the amount of logging has increased lately. Also check that logs are arriving as expected and that there are no local logging issues. I would bet on logging, but one never knows before a detailed investigation.
    >
    > Also, as you probably know, R71 has not been supported for many years now, and end of engineering support for the IP390 is at the end of this year.

    Hi Varera,

    The size of the daily log files hasn't increased at all; they are actually smaller now, because one of the first things we did was remove logging from many rules to reduce the CPU load.
    We are aware that R71 is no longer supported, and we are going to replace this cluster; however, this is impacting our operations now and we can't find any explanation.

    Thanks

  4. #4
    Join Date: 2006-03-08 | Location: Lausanne | Posts: 1,030 | Rep Power: 15

    Check whether you have any sync issues. Start with the fw pstat command and see if there are any sync errors there.

    fw using this much user-mode CPU is not normal in your case. Try debugging the process with fw debug, but be careful: debugging requires even more CPU.

  5. #5
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Hi Valeri, that command is not available:

    [admin]# fw pstat
    Usage:
    fw ver [-h] ... # Display version
    fw kill [-sig_no] procname # Send signal to a daemon
    fw putkey ... # Client server keys
    fw sam ... # Control sam server
    fw sam_policy ... # SAM policy editor
    fw fetch targets # Fetch last policy
    fw tab [-h] ... # Kernel tables content
    fw monitor [-h] ... # Monitor VPN-1/FW-1 traffic
    fw ctl [args] # Control kernel
    fw lichosts # Display protected hosts
    fw log [-h] ... # Display logs
    fw logswitch [-h target] [+|-][oldlog] # Create a new log file;
    # the old log is moved
    fw repairlog ... # Log index recreation
    fw mergefiles ... # log files merger
    fw lslogs ... # Remote machine log file list
    fw fetchlogs ... # Fetch logs from a remote host

  6. #6
    Join Date: 2006-04-27 | Location: Twillight zone | Posts: 1,009 | Rep Power: 15

    Quote, originally posted by malaga1980:
    > Hi Valeri, that command is not available:

    He probably meant "fw ctl pstat" or "fw stat".

  7. #7
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Quote, originally posted by abusharif:
    > He probably meant "fw ctl pstat" or "fw stat".

    [admin]# fw ctl pstat

    Machine Capacity Summary:
    Memory used: 14% (73MB out of 499MB) - below low watermark
    Concurrent Connections: 2% (5027 out of 199900) - below low watermark
    Aggressive Aging is disabled

    Hash kernel memory (hmem) statistics:
    Total memory allocated: 156237824 bytes in 38142 4KB blocks using 2 pools
    Initial memory allocated: 20971520 bytes (Hash memory extended by 135266304 bytes)
    Memory allocation limit: 314572800 bytes using 10 pools
    Total memory bytes used: 16020048 unused: 140217776 (89.75%) peak: 26549712
    Total memory blocks used: 4800 unused: 33342 (87%) peak: 7106
    Allocations: 2517106293 alloc, 0 failed alloc, 2516969319 free

    System kernel memory (smem) statistics:
    Total memory bytes used: 198992428 peak: 226167900
    Blocking memory bytes used: 1565340 peak: 1605384
    Non-Blocking memory bytes used: 197427088 peak: 224562516
    Allocations: 13479330 alloc, 0 failed alloc, 13478555 free, 0 failed free

    Kernel memory (kmem) statistics:
    Total memory bytes used: 58638892 peak: 94556808
    Allocations: 2530585621 alloc, 0 failed alloc, 2530447874 free, 0 failed free
    External Allocations: 0 for packets, 0 for SXL

    Kernel stacks:
    0 bytes total, 0 bytes stack size, 0 stacks,
    0 peak used, 0 max stack bytes used, 0 min stack bytes used,
    0 failed stack calls

    INSPECT:
    0 packets, 0 operations, 0 lookups,
    0 record, 0 extract

    Cookies:
    1920576575 total, 0 alloc, 0 free,
    13 dup, 313571826 get, 2193242 put,
    1924606804 len, 70 cached len, 0 chain alloc,
    0 chain free

    Connections:
    62391334 total, 57832389 TCP, 3906048 UDP, 652887 ICMP,
    10 other, 0 anticipated, 12808 recovered, 5027 concurrent,
    12214 peak concurrent

    Fragments:
    56 fragments, 27 packets, 0 expired, 0 short,
    0 large, 0 duplicates, 0 failures

    NAT:
    3839533/0 forw, 63021/0 bckw, 9415 tcpudp,
    12329 icmp, 4189511-4778823 alloc

    Sync:
    Live connections update: on
    Version: new
    Status: Able to Send/Receive sync packets
    Sync packets sent:
    total : 43733148, retransmitted : 66, retrans reqs : 7, acks : 38
    Sync packets received:
    total : 19594185, were queued : 1332628, dropped by net : 7
    retrans reqs : 56, received 76 acks
    retrans reqs for illegal seq : 0
    dropped updates as a result of sync overload: 0
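To keep an eye on the Sync section above, the counters can be pulled out programmatically. This is a sketch that assumes the `name : number` layout of this R71 output (text without a colon, such as `received 76 acks`, is skipped). Which counters count as "worth watching" (here, everything except `total` and `acks`) is an assumption made for the sketch, not a Check Point definition.

```python
import re

SYNC_TEXT = """\
Sync packets sent:
total : 43733148, retransmitted : 66, retrans reqs : 7, acks : 38
Sync packets received:
total : 19594185, were queued : 1332628, dropped by net : 7
retrans reqs : 56, received 76 acks
retrans reqs for illegal seq : 0
dropped updates as a result of sync overload: 0
"""

# "name : number" pairs; text without a colon (e.g. "received 76 acks") is ignored.
PAIR = re.compile(r"([a-z][a-z ]*?)\s*:\s*(\d+)")

# Assumption for this sketch: every counter except these is worth watching.
NOT_PROBLEMS = {"total", "acks"}

def parse_sync(text: str) -> dict:
    """Split the Sync section of fw ctl pstat into 'sent' and 'received' counters."""
    sections = {"sent": {}, "received": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Sync packets sent"):
            current = "sent"
        elif line.startswith("Sync packets received"):
            current = "received"
        elif current is not None:
            for name, value in PAIR.findall(line):
                sections[current][name.strip()] = int(value)
    return sections

def nonzero_counters(section: dict) -> dict:
    """Counters (other than totals and acks) that are above zero."""
    return {k: v for k, v in section.items() if k not in NOT_PROBLEMS and v > 0}

stats = parse_sync(SYNC_TEXT)
rx = stats["received"]
queued_pct = 100 * rx["were queued"] / rx["total"]
print(f"queued: {queued_pct:.2f}% of received sync packets")
for direction, counters in stats.items():
    print(direction, nonzero_counters(counters))
```

On this output, about 6.8% of all received sync packets were queued, which is the figure the following replies focus on.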

  8. #8
    Join Date: 2006-03-08 | Location: Lausanne | Posts: 1,030 | Rep Power: 15

    Correct, I meant fw ctl pstat

    You need to see and compare both modules. From what I see, there is a huge number of queued packets. Check VRRP to make sure only one member is master.

    But the clustering is not okay.

  9. #9
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Yep, only one master:

    NokiaIP390:102> show vrrp

    VRRP State
    VRRP Router State: Up
    Flags: On,MonitorFirewall
    Interface enabled: 6
    Virtual routers configured: 6
    In Init state 0
    In Backup state 0
    In Master state 6

    NokiaIP390:102> show vrrp

    VRRP State
    VRRP Router State: Up
    Flags: On,MonitorFirewall
    Interface enabled: 6
    Virtual routers configured: 6
    In Init state 0
    In Backup state 6
    In Master state 0
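The "only one master" check can also be scripted. The sketch below parses the summary counters from the two `show vrrp` outputs above and checks that both nodes agree on the number of virtual routers, that none is stuck in Init, and that masters and backups across the two nodes each add up to the configured count. It works only on summary counts, so it cannot prove which node is master for which individual virtual router; the helper names are hypothetical.

```python
import re

NODE_A = """VRRP State
VRRP Router State: Up
Flags: On,MonitorFirewall
Interface enabled: 6
Virtual routers configured: 6
In Init state 0
In Backup state 0
In Master state 6"""

NODE_B = """VRRP State
VRRP Router State: Up
Flags: On,MonitorFirewall
Interface enabled: 6
Virtual routers configured: 6
In Init state 0
In Backup state 6
In Master state 0"""

FIELDS = ("Virtual routers configured", "In Init state", "In Backup state", "In Master state")

def vrrp_counts(text: str) -> dict:
    """Pull the summary counters out of 'show vrrp' output."""
    counts = {}
    for field in FIELDS:
        # Some lines use "name: N", others "name N".
        match = re.search(re.escape(field) + r":?\s+(\d+)", text)
        counts[field] = int(match.group(1)) if match else 0
    return counts

def one_master_per_vr(a: dict, b: dict) -> bool:
    """True if the two nodes together hold exactly one master per configured VR."""
    configured = a["Virtual routers configured"]
    return (
        configured == b["Virtual routers configured"]
        and a["In Init state"] == 0 and b["In Init state"] == 0
        and a["In Master state"] + b["In Master state"] == configured
        and a["In Backup state"] + b["In Backup state"] == configured
    )

a, b = vrrp_counts(NODE_A), vrrp_counts(NODE_B)
print(a["In Master state"], b["In Master state"], one_master_per_vr(a, b))
```

A split-brain pair (both nodes reporting 6 masters) would fail the masters-sum test.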


    When you say "You need to see and compare both modules", do you mean both nodes?

    One of the tests I did was to stop the CP services on the passive node (cpstop) for a few hours, just to see whether the CPU of the active node decreased, but it didn't drop at all.

    About the queued packets you mention: are the queued packets the symptom and the high CPU the cause, or the opposite, i.e. is the CPU so busy that it can't process everything, so the packets get queued?

    I'm lost and don't know what else to try :(
    Last edited by malaga1980; 2016-11-15 at 09:34.

  10. #10
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Update: I just realized that the queued packets value is dramatically different between the two nodes:

    Currently Active node: Sync packets received: total : 19607658, were queued : 1332628, dropped by net : 7
    Currently passive node: Sync packets received: total : 10646089, were queued : 7, dropped by net : 4

    Is this somehow meaningful?
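Because the two nodes report different totals, comparing the queued counter as a fraction of received sync packets is more telling than comparing raw numbers. A minimal sketch using the figures from this post (the helper name is hypothetical):

```python
def queued_fraction(total: int, queued: int) -> float:
    """Share of received sync packets that had to be queued."""
    return queued / total if total else 0.0

# Figures quoted in this post for each node.
active = queued_fraction(total=19_607_658, queued=1_332_628)
passive = queued_fraction(total=10_646_089, queued=7)

print(f"active node:  {active:.4%} queued")
print(f"passive node: {passive:.6%} queued")
print(f"active/passive: about {active / passive:,.0f} times higher")
```

With these figures, roughly 6.8% of the sync packets received by the active node were queued, against less than one in a million on the passive node. The counters presumably accumulate since the last reboot or counter reset, which is another reason to compare fractions rather than raw values.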

  11. #11
    Join Date: 2011-08-02 | Location: http://spikefishsolutions.com | Posts: 1,651 | Rep Power: 10

    Quote, originally posted by malaga1980:
    > Update: I just realized that the queued packets value is dramatically different between the two nodes:
    >
    > Currently Active node: Sync packets received: total : 19607658, were queued : 1332628, dropped by net : 7
    > Currently passive node: Sync packets received: total : 10646089, were queued : 7, dropped by net : 4
    >
    > Is this somehow meaningful?

    39248 root 6 44 0 117M 73188K select 7:27 50.98% fw <- this is fwd, not a firewall worker.

    fwd is doing two things:

    sending logs from the gateway to the management server;
    syncing connections between the two gateways.

    Well, it does more than that, but that's mostly what it does. Those are the things you should be looking at to lower CPU usage: logging less and syncing less data.

    DNS and HTTP are good services to stop syncing.

  12. #12
    Join Date: 2006-03-08 | Location: Lausanne | Posts: 1,030 | Rep Power: 15

    Quote, originally posted by jflemingeds:
    > 39248 root 6 44 0 117M 73188K select 7:27 50.98% fw <- this is fwd, not a firewall worker.
    >
    > fwd is doing two things:
    >
    > sending logs from the gateway to the management server;
    > syncing connections between the two gateways.
    >
    > Well, it does more than that, but that's mostly what it does. Those are the things you should be looking at to lower CPU usage: logging less and syncing less data.
    >
    > DNS and HTTP are good services to stop syncing.

    There is an issue with sync: too many queued packets in "fw ctl pstat".

    However, this does not explain fw staying alive after cpstop. That makes no sense, unless this fw is a zombie. A reboot should fix that.

  13. #13
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Quote, originally posted by varera:
    > There is an issue with sync: too many queued packets in "fw ctl pstat".
    >
    > However, this does not explain fw staying alive after cpstop. That makes no sense, unless this fw is a zombie. A reboot should fix that.

    We rebooted both nodes on 20 October, without success.

  14. #14
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Quote, originally posted by jflemingeds:
    > Those are the things you should be looking at to lower CPU usage: logging less and syncing less data.
    >
    > DNS and HTTP are good services to stop syncing.

    How can I choose which protocols to sync, or exclude from syncing?

  15. #15
    Join Date: 2006-03-08 | Location: Lausanne | Posts: 1,030 | Rep Power: 15

    Quote, originally posted by malaga1980:
    > We rebooted both nodes on 20 October, without success.

    I am sorry, but this is impossible. If you stop FW services with the cpstop command, the fw process goes down. Something is wrong here. Some of the information you are giving is incorrect.

    To tune synchronization for a service, open the service object itself in SmartDashboard, go to the Advanced tab and check/uncheck the "Synchronize" checkbox. Make sure you install policy afterwards.

  16. #16
    Join Date: 2007-06-04 | Posts: 3,304 | Rep Power: 17

    Quote, originally posted by malaga1980:
    > We rebooted both nodes on 20 October, without success.

    Find the process ID of the fw process with

    ps -auxwww | grep fw

    and then run

    kill -9 <PID of the fw process>

    then reboot the two boxes.
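Note that `grep fw` also matches other processes whose names contain "fw" (in the top listing earlier in the thread, fwssd and ifwd, plus the grep command itself), so it is worth confirming the exact PID before using kill -9. The sketch below filters `ps aux`-style lines by exact command name. The BSD column layout (10 columns before COMMAND) and the sample lines are hypothetical, built from the PIDs in that top output; IPSO's ps may print slightly differently.

```python
# Hypothetical `ps -auxwww` lines (BSD column layout assumed); PIDs taken from the top output.
PS_OUTPUT = """\
USER    PID %CPU %MEM    VSZ   RSS  TT  STAT STARTED      TIME COMMAND
root  39248 51.0 14.2 119808 73188  ??  S    11:20AM   7:27.00 fw
root  39306  0.0  9.9  66148 50376  ??  S    11:20AM   0:03.00 fwssd
root    888  0.0  0.1   3368   268  ??  Ss   20Oct16   0:00.00 ifwd
admin 39400  0.0  0.2   1492   900  p0  R+   11:46AM   0:00.00 grep fw
"""

def pids_for_command(ps_text: str, name: str) -> list:
    """Return PIDs whose command (argv[0], without any path) is exactly `name`."""
    pids = []
    for line in ps_text.splitlines():
        fields = line.split(None, 10)  # 10 columns before COMMAND; COMMAND keeps its spaces
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header or malformed line
        argv0 = fields[10].split()[0]
        if argv0.rsplit("/", 1)[-1] == name:
            pids.append(int(fields[1]))
    return pids

naive = [line for line in PS_OUTPUT.splitlines() if "fw" in line]  # what `grep fw` would show
exact = pids_for_command(PS_OUTPUT, "fw")
print(len(naive), exact)
```

On this sample, the substring match returns four lines, while the exact match returns only PID 39248.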

  17. #17
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Quote, originally posted by varera:
    > I am sorry, but this is impossible. If you stop FW services with the cpstop command, the fw process goes down. Something is wrong here. Some of the information you are giving is incorrect.
    >
    > To tune synchronization for a service, open the service object itself in SmartDashboard, go to the Advanced tab and check/uncheck the "Synchronize" checkbox. Make sure you install policy afterwards.

    We rebooted the passive node, then forced a failover, and then rebooted the other node as well.

    Also, last week I issued "cpstop" on the passive node just to see whether the CPU on the active node would drop (thinking of a possible sync-related issue), but nothing changed: the CPU on the active node stayed the same while the passive one was stopped.

  18. #18
    Join Date: 2016-03-10 | Posts: 28 | Rep Power: 0

    Question: could this be caused by a routing loop? If so, how can I detect and troubleshoot it?

  19. #19
    Join Date: 2014-01-23 | Posts: 28 | Rep Power: 0

    What is the output of cpwd_admin list? Do you see processes restarting?
    If dynamic routing is enabled but not in use, disable it.
    When you shut down one member completely, what happens to the active member? Does its CPU decrease?
    Was any change recently introduced in the environment? New rules, or maybe something like active logging turned on?

    My thought process is to understand whether the issue is caused by the clustering mechanism or is traffic-induced.

    Does disabling sync or unplugging the sync cable do anything?

    As already mentioned by others, excessive logging can cause issues. High I/O wait: look into that. Disable sync on HTTPS and DNS. How are the interface stats? Is an interface being hammered? In a perfect world, you could unplug one interface at a time and see if the CPU drops. Another option would be to sniff the interfaces and analyze the top talkers. How are the switch ports these firewalls are plugged into? Any collisions or anything like that?
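For the "sniff the interfaces and analyze the top talkers" idea: once a capture has been summarised into flow records, the aggregation itself is simple. This sketch assumes a hypothetical list of (source, destination, destination port, bytes) tuples with made-up addresses; turning a real capture into such records is left out.

```python
from collections import Counter

# Hypothetical flow summaries (src, dst, dst_port, bytes) from a capture on one interface.
FLOWS = [
    ("10.1.1.20", "192.0.2.53", 53, 300),
    ("10.1.1.21", "198.51.100.7", 443, 5000),
    ("10.1.1.20", "192.0.2.53", 53, 300),
    ("10.1.1.22", "198.51.100.8", 80, 1200),
    ("10.1.1.21", "198.51.100.7", 443, 2500),
    ("10.1.1.20", "192.0.2.53", 53, 300),
]

def top_sources_by_bytes(flows, n=3):
    """Sum bytes per source address and return the n largest."""
    totals = Counter()
    for src, _dst, _port, nbytes in flows:
        totals[src] += nbytes
    return totals.most_common(n)

def top_ports_by_flows(flows, n=3):
    """Count flows per destination port (useful to spot, e.g., heavy DNS)."""
    return Counter(port for _src, _dst, port, _bytes in flows).most_common(n)

print(top_sources_by_bytes(FLOWS, n=2))
print(top_ports_by_flows(FLOWS, n=1))
```

Ranking by bytes and by flow count answer different questions: a source sending many small DNS flows can cost more sync and connection-table work than one bulk transfer.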

  20. #20
    Join Date: 2006-03-08 | Location: Lausanne | Posts: 1,030 | Rep Power: 15

    Quote, originally posted by malaga1980:
    > We rebooted the passive node, then forced a failover, and then rebooted the other node as well.
    >
    > Also, last week I issued "cpstop" on the passive node just to see whether the CPU on the active node would drop (thinking of a possible sync-related issue), but nothing changed: the CPU on the active node stayed the same while the passive one was stopped.

    Ah, okay. From what you wrote, I assumed that after cpstop on one of the cluster members, its fw kept running and taking CPU time.

    You do have issues with synchronization, as mentioned before. Check the sync section of fw ctl pstat on both members to see whether any errors are growing. Actually, there should not be any queueing at all. I think your CPU is just too weak to cope with sync fast enough, hence the queue, hence even more CPU time spent.
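Checking whether sync counters are growing means taking fw ctl pstat at two points in time and comparing the values. A minimal sketch, assuming both snapshots come from the same member and have already been parsed into dictionaries (the helper name is hypothetical):

```python
def growing_counters(before: dict, after: dict) -> dict:
    """Counters whose value went up between two snapshots of the same member."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }

# "Sync packets received" figures quoted in posts #7 and #10 of this thread.
snapshot_1 = {"total": 19_594_185, "were queued": 1_332_628, "dropped by net": 7}
snapshot_2 = {"total": 19_607_658, "were queued": 1_332_628, "dropped by net": 7}

print(growing_counters(snapshot_1, snapshot_2))
```

Fed with those two readings (assuming both come from the active node, as the identical queued and dropped values suggest), only `total` moved, by 13,473 packets; `were queued` stayed at 1,332,628 between them. Repeating the comparison over a longer interval would show whether the queue is still building up.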
