CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



 

Results 1 to 19 of 19

Thread: high cpu on the fw process of the standby firewall

  1. #1
    Join Date
    2006-09-26
    Posts
    3,055
    Rep Power
    15

    Default high cpu on the fw process of the standby firewall

    Firewall is R75.47 ClusterXL with Active/Standby. Hardware is high end 13500 appliances.

    I am seeing high cpu on the fw process of the standby firewall:

    top - 10:35:56 up 388 days, 20:07, 1 user, load average: 2.87, 2.79, 2.67
    Tasks: 291 total, 4 running, 278 sleeping, 0 stopped, 9 zombie
    Cpu0 : 4.9%us, 7.3%sy, 0.0%ni, 87.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu1 : 2.5%us, 50.0%sy, 0.0%ni, 47.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu2 : 0.0%us, 7.3%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu8 : 2.4%us, 36.6%sy, 0.0%ni, 61.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu9 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu10 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu11 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu12 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu13 : 2.4%us, 12.2%sy, 0.0%ni, 85.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu14 : 2.4%us, 42.9%sy, 0.0%ni, 52.4%id, 0.0%wa, 0.0%hi, 2.4%si, 0.0%st
    Cpu15 : 2.4%us, 43.9%sy, 0.0%ni, 53.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 65961224k total, 14264088k used, 51697136k free, 547476k buffers
    Swap: 69223992k total, 0k used, 69223992k free, 1022240k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    9505 admin 25 0 680m 261m 19m R 91 0.4 235367:41 fw
    6912 admin 15 0 0 0 0 S 2 0.0 266:08.44 fw_worker_1

    Thoughts?

  2. #2
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,480
    Rep Power
    8

    Default Re: high cpu on the fw process of the standby firewall

    Have you thought about getting support from a third party?

  3. #3
    Join Date
    2006-09-26
    Posts
    3,055
    Rep Power
    15

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by jflemingeds View Post
    Have you thought about getting support from a third party?
    you mean third party support for this?

  4. #4
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,480
    Rep Power
    8

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by cciesec2006 View Post
    you mean third party support for this?
    no, for all technical issues.

  5. #5
    Join Date
    2006-09-26
    Posts
    3,055
    Rep Power
    15

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by jflemingeds View Post
    no, for all technical issues.
    What can they do better than Checkpoint TAC? Eventually, it has to stop at checkpoint TAC right?

    Isn't it like another barrier to deal with? Does the third party have the environment to replicate my production issues?

  6. #6
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    4

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by cciesec2006 View Post
    What can they do better than Checkpoint TAC?
    A - Depending on who your partner would be, possibly lots more... They could/would likely have more knowledge about Cisco and other 3rd party devices/services, and have actionable knowledge on those devices, and the ability to troubleshoot them. Granted, maybe not the most applicable here to this problem specifically, but more general and overall...

    Quote Originally Posted by cciesec2006 View Post
    Eventually, it has to stop at checkpoint TAC right?
    A - Nope, not at all. There are some really awesome partners out there who rarely contact CP TAC, and only do so when there is true need for just a hotfix, and that is all CP does for the issue as a whole, just provide the requested hotfix. For things like (mis)configurations, documentation requests, general troubleshooting, etc... it is very common for the partner to handle all this withOUT involving Checkpoint. Though that also depends on their skill(s) (or lack thereof).

    Quote Originally Posted by cciesec2006 View Post
    Isn't it like another barrier to deal with?
    A - That depends on the skills and knowledge of the partner. Sometimes the issue is fully resolved by the partner alone. Other times, the partner is just a go-between for the end customer and TAC.

    Quote Originally Posted by cciesec2006 View Post
    Does the third party have the environment to replicate my production issues?
    A - Again, depending on the partner. For the most part, yes, they should. There is also the added benefit that Partners (typically) don't support JUST Checkpoint. So, unlike the TAC labs that have Checkpoint (and Checkpoint ONLY) equipment, Partner labs typically have at least some of that 3rd party equipment. There is much from a 3rd party perspective that CP TAC simply CANNOT reproduce, if for no other reason than lacking the 3rd party equipment.

  7. #7
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,480
    Rep Power
    8

    Default Re: high cpu on the fw process of the standby firewall

    What can they do better than Checkpoint TAC?
    Provide you with someone who knows your environment. This means you don't have to explain your setup every time you call support, which cuts down on the time needed to move things forward. Also, you know the technical abilities of the partner and are talking to a smaller team. It's basically like having Diamond light.

    Eventually, it has to stop at checkpoint TAC right?
    If it requires patching, yes, but getting to that point is difficult, time consuming and frustrating, is it not? Many times the real issue slowing down support is figuring out the root cause and how to properly debug the issue. Once a root cause is found and a targeted debug is run showing exactly where the error is, the support process speeds up considerably.

    Isn't it like another barrier to deal with?
    I'm surprised that you've said this. You always seem incredibly unhappy with the technical abilities of checkpoint's support (deserved or not). I would have thought removing you from having to interface with tech support would relieve you of a lot of frustration.

    Does the third party have the environment to replicate my production issues?
    I can't speak for anyone else, but it depends on what level of service you want. For sure, if you wanted a full replication, that's completely possible. It would require some agreements to be in place and some way to transfer data securely. If it's only during a break/fix issue, then it would just be a matter of providing a cpinfo and a migrate export as a starting point. It's also very possible to take a packet capture and replay it in a lab. This can't always be done. For example, a proxy-based service or maybe a hide NAT would make this difficult, but not impossible.

  8. #8
    Join Date
    2005-08-14
    Location
    Gig Harbor, WA, USA
    Posts
    2,388
    Rep Power
    15

    Default Re: high cpu on the fw process of the standby firewall

    Some partners provide excellent support, as jflemingeds says.
    You can also investigate Diamond Support from Check Point, which gives you an assigned engineer.
    http://phoneboy.org
    Unless otherwise noted, views expressed are my own

  9. #9
    Join Date
    2015-12-23
    Posts
    47
    Rep Power
    0

    Default Re: high cpu on the fw process of the standby firewall

    One of our standby gateways had its CPU pegged at 100% this morning. It was totally unresponsive and we had to cold boot it. The primary unit was flooded with these messages, and there wasn't anything useful in the standby's log.


    May 12 04:36:23 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze][CUL - Cluster] CUL should be OFF (short timeout of 10 seconds expired) but at least one member reported high CPU usage 0 seconds ago
    May 12 04:36:24 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze_on_remote][CUL - Cluster] CUL state is ON for 177 seconds, remote Member 1 reporting high kernel CPU usage (100%), threshold=80%, local kernel CPU usage is 0%
    May 12 04:36:24 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze][CUL - Cluster] CUL should be OFF (short timeout of 10 seconds expired) but at least one member reported high CPU usage 0 seconds ago
    May 12 04:36:25 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze_on_remote][CUL - Cluster] CUL state is ON for 178 seconds, remote Member 1 reporting high kernel CPU usage (100%), threshold=80%, local kernel CPU usage is 1%
    May 12 04:36:25 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze][CUL - Cluster] CUL should be OFF (short timeout of 10 seconds expired) but at least one member reported high CPU usage 0 seconds ago
    May 12 04:36:26 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze_on_remote][CUL - Cluster] CUL state is ON for 179 seconds, remote Member 1 reporting high kernel CPU usage (100%), threshold=80%, local kernel CPU usage is 4%
    May 12 04:36:26 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze][CUL - Cluster] CUL should be OFF (short timeout of 10 seconds expired) but at least one member reported high CPU usage 0 seconds ago
    May 12 04:36:27 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze][CUL - Cluster] Changing CUL state to OFF, cluster is still underload but CUL timeout expired (180 seconds)
    May 12 04:36:27 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze_on_remote][CUL - Cluster] CUL state is ON for 0 seconds, remote Member 1 reporting high kernel CPU usage (100%), threshold=80%, local kernel CPU usage is 5%
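
Editor's sketch: when the log is flooded like this, it can help to reduce the `cul_load_freeze_on_remote` lines to a few numbers (which member is reporting load, how high, and what the local member sees). The regular expression below is inferred only from the sample lines in this post, not from any Check Point documentation, and the function name `parse_remote_cul` is hypothetical. It assumes the messages file has been copied to a machine with Python 3.

```python
import re

# Pattern inferred from the sample log lines in this thread only (assumption).
REMOTE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: .*"
    r"CUL state is ON for (?P<on_secs>\d+) seconds, "
    r"remote Member (?P<member>\d+) reporting high kernel CPU usage "
    r"\((?P<remote_cpu>\d+)%\), threshold=(?P<threshold>\d+)%, "
    r"local kernel CPU usage is (?P<local_cpu>\d+)%"
)

INT_FIELDS = ("on_secs", "member", "remote_cpu", "threshold", "local_cpu")


def parse_remote_cul(lines):
    """Return one dict per 'cul_load_freeze_on_remote' line; skip everything else."""
    records = []
    for line in lines:
        match = REMOTE_RE.search(line.strip())
        if match is None:
            continue
        record = match.groupdict()
        for field in INT_FIELDS:
            record[field] = int(record[field])
        records.append(record)
    return records


# Three lines copied from the post above; the first is a different message type.
SAMPLE = [
    "May 12 04:36:23 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze][CUL - Cluster] CUL should be OFF (short timeout of 10 seconds expired) but at least one member reported high CPU usage 0 seconds ago",
    "May 12 04:36:24 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze_on_remote][CUL - Cluster] CUL state is ON for 177 seconds, remote Member 1 reporting high kernel CPU usage (100%), threshold=80%, local kernel CPU usage is 0%",
    "May 12 04:36:26 MPFW01 kernel: [fw4_1];FW-1: [cul_load_freeze_on_remote][CUL - Cluster] CUL state is ON for 179 seconds, remote Member 1 reporting high kernel CPU usage (100%), threshold=80%, local kernel CPU usage is 4%",
]

if __name__ == "__main__":
    for rec in parse_remote_cul(SAMPLE):
        print(rec["ts"], "member", rec["member"], "remote", rec["remote_cpu"], "local", rec["local_cpu"])
```

In these samples the remote member (Member 1, the standby) sits at 100% while the local member stays in single digits, which matches the description of the standby being the unit in trouble.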
    Attached thumbnail: mpfw2.png (628.9 KB)

  10. #10
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,480
    Rep Power
    8

    Default Re: high cpu on the fw process of the standby firewall

    You didn't grab dmesg output by chance before rebooting, did you? Kernel-level messages get logged there and should reveal something useful.

    Events looks like a kernel-level process, so if it goes crazy with the CPU, everything happening in the kernel is now fighting for CPU: NIC drivers, the kernel itself, all filesystem access, etc.

    Basically, that's bad. Not sure if anything useful showed up in /var/log/messages* or not, but it's worth a look if you haven't already.
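
Editor's sketch: the kernel ring buffer is lost on reboot, so the advice above amounts to "save dmesg and the messages files somewhere first". A minimal way to do that, assuming Python 3.7+ is available on the box (or that the same steps are done by hand), is shown below. The function name `snapshot_kernel_logs` and its parameters are hypothetical; only `dmesg` and `/var/log/messages*` come from the post.

```python
import shutil
import subprocess
from pathlib import Path


def snapshot_kernel_logs(dest_dir, log_glob="/var/log/messages*", dmesg_cmd=("dmesg",)):
    """Save dmesg output and copy matching log files into dest_dir.

    Returns the list of files written, dmesg first.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Capture the kernel ring buffer before a reboot wipes it.
    result = subprocess.run(list(dmesg_cmd), capture_output=True, text=True, check=False)
    dmesg_file = dest / "dmesg.txt"
    dmesg_file.write_text(result.stdout)
    saved = [dmesg_file]

    # Copy the current and rotated messages files, preserving timestamps.
    pattern = Path(log_glob)
    for src in sorted(pattern.parent.glob(pattern.name)):
        if src.is_file():
            saved.append(Path(shutil.copy2(src, dest / src.name)))
    return saved
```

Calling `snapshot_kernel_logs("/var/tmp/pre_reboot")` (a hypothetical destination) before the cold boot would leave a copy to review afterwards. On a box that is already unresponsive this may not be possible at all.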

  11. #11
    Join Date
    2006-09-26
    Posts
    3,055
    Rep Power
    15

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by jflemingeds View Post
    You didn't grab dmesg output by chance before rebooting, did you? Kernel-level messages get logged there and should reveal something useful.

    Events looks like a kernel-level process, so if it goes crazy with the CPU, everything happening in the kernel is now fighting for CPU: NIC drivers, the kernel itself, all filesystem access, etc.

    Basically, that's bad. Not sure if anything useful showed up in /var/log/messages* or not, but it's worth a look if you haven't already.
    I am basically seeing the same thing on my Active firewalls every day:

    May 11 14:40:46 napa1 kernel: [fw_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (82%) on local Member 0, threshold = 80%
    May 11 15:04:04 napa1 kernel: [fw_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (82%) on local Member 0, threshold = 80%
    May 11 17:06:34 napa1 kernel: [fw_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (88%) on local Member 0, threshold = 80%

    /var/log/messages essentially shows the same message as above. The same goes for dmesg.

    I actually asked Checkpoint to troubleshoot this issue. They asked for the usual cpinfo but did absolutely nothing in the end.
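
Editor's sketch: since these FREEZE_ON messages show up every day, a per-day count and peak CPU value makes it easier to see whether the load is getting worse. The pattern below is inferred only from the three sample lines in this post (note the spaces in `threshold = 80%`, unlike the remote variant), and `summarize_freeze_on` is a hypothetical name. It assumes Python 3 and a copy of the messages file.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample lines in this post only (assumption).
FREEZE_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d+) (?P<time>\d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: .*"
    r"Setting CUL FREEZE_ON, high kernel CPU usage \((?P<cpu>\d+)%\) "
    r"on local Member (?P<member>\d+), threshold = (?P<threshold>\d+)%"
)


def summarize_freeze_on(lines):
    """Group FREEZE_ON events by (host, day) and track count and peak CPU."""
    summary = defaultdict(lambda: {"count": 0, "peak_cpu": 0})
    for line in lines:
        match = FREEZE_RE.search(line.strip())
        if match is None:
            continue
        entry = summary[(match.group("host"), match.group("day"))]
        entry["count"] += 1
        entry["peak_cpu"] = max(entry["peak_cpu"], int(match.group("cpu")))
    return dict(summary)


# The three lines quoted in the post above.
SAMPLE = [
    "May 11 14:40:46 napa1 kernel: [fw_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (82%) on local Member 0, threshold = 80%",
    "May 11 15:04:04 napa1 kernel: [fw_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (82%) on local Member 0, threshold = 80%",
    "May 11 17:06:34 napa1 kernel: [fw_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (88%) on local Member 0, threshold = 80%",
]

if __name__ == "__main__":
    for (host, day), stats in summarize_freeze_on(SAMPLE).items():
        print(host, day, stats)
```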

  12. #12
    Join Date
    2015-12-23
    Posts
    47
    Rep Power
    0

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by jflemingeds View Post
    You didn't grab dmesg output by chance before rebooting, did you? Kernel-level messages get logged there and should reveal something useful.

    Events looks like a kernel-level process, so if it goes crazy with the CPU, everything happening in the kernel is now fighting for CPU: NIC drivers, the kernel itself, all filesystem access, etc.

    Basically, that's bad. Not sure if anything useful showed up in /var/log/messages* or not, but it's worth a look if you haven't already.
    No, I wish I had. I will remember to grab it next time.

    This was a standby gateway. I've reviewed the messages files. Nothing unusual around the time it happened, just some system crond jobs. These jobs seem to run several times daily. The problem started at 04:09:25. It seems like a rare problem, as I could not find any Google hits.



    May 12 02:00:01 MPFW02 crond[8379]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:10:01 MPFW02 crond[32213]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:20:01 MPFW02 crond[23745]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:30:01 MPFW02 crond[15167]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:40:01 MPFW02 crond[6500]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:50:01 MPFW02 crond[30296]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:00:01 MPFW02 crond[21786]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:10:01 MPFW02 crond[13333]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:20:01 MPFW02 crond[4858]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:30:01 MPFW02 crond[27637]: (root) CMD (/bin/pam_nonuse_daily -c pwcontrol:nonuse )
    May 12 03:30:01 MPFW02 crond[27639]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:40:01 MPFW02 crond[19011]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:50:01 MPFW02 crond[10384]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 04:00:01 MPFW02 crond[1839]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 04:09:25 MPFW02 kernel: [fw4_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (86%) on local Member 1, threshold = 80%
    May 12 04:09:26 MPFW02 routed[14090]: ifa_unnumbered_find_proxy: no proxy interface found
    May 12 04:09:26 MPFW02 kernel: eth1-02.62: dev_set_promiscuity(master, 1)
    May 12 04:09:26 MPFW02 kernel: device eth1-02 entered promiscuous mode
    May 12 04:09:26 MPFW02 kernel: device eth1-02.62 entered promiscuous mode
    May 12 04:09:28 MPFW02 kernel: Passive ARP hook successfully installed!
    May 12 04:09:28 MPFW02 kernel: parpdrv ioctl: cmd 2011
    May 12 04:09:28 MPFW02 routed[14090]: vrrp_vr_master: interface eth1-02.62, VRID 105: state=MASTER
    May 12 04:09:28 MPFW02 routed[14090]: ifa_unnumbered_find_proxy: no proxy interface found
    May 12 04:09:28 MPFW02 kernel: eth1-02.55: dev_set_promiscuity(master, 1)

  13. #13
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,480
    Rep Power
    8

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by wayne0206 View Post
    No, I wish I had. I will remember to grab it next time.

    This was a standby gateway. I've reviewed the messages files. Nothing unusual around the time it happened, just some system crond jobs. These jobs seem to run several times daily. The problem started at 04:09:25. It seems like a rare problem, as I could not find any Google hits.



    May 12 02:00:01 MPFW02 crond[8379]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:10:01 MPFW02 crond[32213]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:20:01 MPFW02 crond[23745]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:30:01 MPFW02 crond[15167]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:40:01 MPFW02 crond[6500]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 02:50:01 MPFW02 crond[30296]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:00:01 MPFW02 crond[21786]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:10:01 MPFW02 crond[13333]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:20:01 MPFW02 crond[4858]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:30:01 MPFW02 crond[27637]: (root) CMD (/bin/pam_nonuse_daily -c pwcontrol:nonuse )
    May 12 03:30:01 MPFW02 crond[27639]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:40:01 MPFW02 crond[19011]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 03:50:01 MPFW02 crond[10384]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 04:00:01 MPFW02 crond[1839]: (root) CMD (/usr/lib/sa/sa1 1 1)
    May 12 04:09:25 MPFW02 kernel: [fw4_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (86%) on local Member 1, threshold = 80%
    May 12 04:09:26 MPFW02 routed[14090]: ifa_unnumbered_find_proxy: no proxy interface found
    May 12 04:09:26 MPFW02 kernel: eth1-02.62: dev_set_promiscuity(master, 1)
    May 12 04:09:26 MPFW02 kernel: device eth1-02 entered promiscuous mode
    May 12 04:09:26 MPFW02 kernel: device eth1-02.62 entered promiscuous mode
    May 12 04:09:28 MPFW02 kernel: Passive ARP hook successfully installed!
    May 12 04:09:28 MPFW02 kernel: parpdrv ioctl: cmd 2011
    May 12 04:09:28 MPFW02 routed[14090]: vrrp_vr_master: interface eth1-02.62, VRID 105: state=MASTER
    May 12 04:09:28 MPFW02 routed[14090]: ifa_unnumbered_find_proxy: no proxy interface found
    May 12 04:09:28 MPFW02 kernel: eth1-02.55: dev_set_promiscuity(master, 1)
    This alone might be enough of a hint. BTW, anything on the other firewall in this time frame?

    Take a look at sk106266. It seems valid all the way to R77.30. The error message isn't the same, but the symptoms are.

    What version is this again?
    Why are you using VRRP as well? On IPSO VRRP is rock solid, but on Gaia/SPLAT ClusterXL is the way to go.
    BTW, one other thing to look for: have you checked /var/log/dump/usermode to see if there is a dump file (check both firewalls)? I'm guessing a routed dump, if anything.
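
Editor's sketch: checking the dump directory mentioned above comes down to listing its files with timestamps and comparing them to the 04:09:25 start of the problem. A minimal version, assuming Python 3 is available where the directory (or a copy of it) can be read, might look like this. The function name `list_dumps` is hypothetical; the default path is the one given in the post.

```python
from datetime import datetime
from pathlib import Path


def list_dumps(dump_dir="/var/log/dump/usermode"):
    """Return (name, modified_time, size_bytes) for each file, newest first.

    Returns an empty list if the directory does not exist.
    """
    root = Path(dump_dir)
    if not root.is_dir():
        return []
    entries = []
    for path in root.iterdir():
        if path.is_file():
            info = path.stat()
            entries.append((path.name, datetime.fromtimestamp(info.st_mtime), info.st_size))
    # Newest files first, so a dump written around the incident is at the top.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries


if __name__ == "__main__":
    for name, modified, size in list_dumps():
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {size:>12}  {name}")
```

Run it on both firewalls, as suggested, and look for anything written close to the time the standby went to 100%.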

  14. #14
    Join Date
    2016-10-19
    Posts
    24
    Rep Power
    0

    Default Re: high cpu on the fw process of the standby firewall

    Just curious, sk106266 does not exist. Am I looking at the wrong one?

    Thanks

  15. #15
    Join Date
    2006-09-26
    Posts
    3,055
    Rep Power
    15

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by venkata View Post
    Just curious, sk106266 does not exist. Am I looking at the wrong one?

    Thanks
    That SK was available last year; I remember reading it. My guess is that the SK went horribly wrong and got pulled.

  16. #16
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,480
    Rep Power
    8

    Default Re: high cpu on the fw process of the standby firewall

    The screen shot shows a kernel process called events. That was what I was keying on. May not be related to your issue. Start a new thread to be sure.

  17. #17
    Join Date
    2006-09-26
    Posts
    3,055
    Rep Power
    15

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by jflemingeds View Post
    The screen shot shows a kernel process called events. That was what I was keying on. May not be related to your issue. Start a new thread to be sure.
    Just checked Support Center and I am not seeing sk106266 either :-(

  18. #18
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,480
    Rep Power
    8

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by venkata View Post
    Just curious, sk106266 does not exist. Am I looking at the wrong one?

    Thanks
    Yeah, so no idea why the SK went MIA. Call support and ask. It might be internal-only now.

    I think the end result pointed to a driver issue or maybe something with VRRP.

  19. #19
    Join Date
    2005-08-14
    Location
    Gig Harbor, WA, USA
    Posts
    2,388
    Rep Power
    15

    Default Re: high cpu on the fw process of the standby firewall

    Quote Originally Posted by jflemingeds View Post
    Yeah, so no idea why the SK went MIA. Call support and ask. It might be internal-only now.
    No trace of the SK internally either.
    http://phoneboy.org
    Unless otherwise noted, views expressed are my own
