CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



Thread: CUL - Cluster

  1. #1
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default CUL - Cluster

    May 23 20:38:32 nappa-fw1 kernel: [fw_1];FW-1: [cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (82%) on local Member 0, threshold = 80%
    May 23 20:38:43 nappa-fw1 kernel: [fw_1];FW-1: [cul_load_freeze][CUL - Cluster] CUL should be OFF (short timeout of 10 seconds expired) but at least one member reported high CPU usage 9 seconds ago

    /var/log/messages is flooded with these messages, and my firewalls fail over at least twice a week. I am running R75.47 with take 67.

    I am trying to find out which traffic/protocols are causing this. I suspect Microsoft DFS, but I am not sure, because the DFS traffic at that time is less than 40Mbps and the firewall usually does not panic with Microsoft DFS until about 200Mbps. Check Point TAC has not been helpful so far. The case went to a junior engineer, and all he is doing is reading from an SK, which does not help me. I am not sure what to do next. This is a production firewall that needs to be up 24x7x365, so I can't do much with it.

    Thoughts?
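
    To put numbers on how often this happens, the FREEZE_ON lines can be counted per hour and lined up against traffic graphs. The following is only a rough sketch, not a Check Point tool: the pattern is derived from the two log lines quoted above, the function name `summarize_cul` is made up for illustration, and other CUL message variants would need their own patterns.

```python
import re
from collections import Counter

# Pattern derived from the FREEZE_ON line quoted in this post (assumption:
# the message format is stable across entries in /var/log/messages).
FREEZE_ON = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:.*"
    r"Setting CUL FREEZE_ON, high kernel CPU usage \((?P<cpu>\d+)%\) "
    r"on local Member (?P<member>\d+), threshold = (?P<threshold>\d+)%"
)


def summarize_cul(lines):
    """Return (events, per_hour) for FREEZE_ON entries in an iterable of log lines."""
    events = []
    per_hour = Counter()
    for line in lines:
        m = FREEZE_ON.search(line)
        if not m:
            continue  # not a FREEZE_ON line (e.g. the "CUL should be OFF" message)
        events.append((m.group("stamp"), m.group("host"), int(m.group("cpu"))))
        # "May 23 20:38:32" -> "May 23 20": bucket events by hour
        per_hour[m.group("stamp")[:9]] += 1
    return events, per_hour


if __name__ == "__main__":
    with open("/var/log/messages") as fh:
        events, per_hour = summarize_cul(fh)
    for hour, count in sorted(per_hour.items()):
        print(hour, count)
```

    Hours with many events are the ones worth comparing against whatever per-protocol traffic statistics are available for the same window.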

  2. #2
    Join Date
    2005-08-29
    Location
    Upstate NY
    Posts
    2,720
    Rep Power
    17

    Default Re: CUL - Cluster

    I don't know of any current issues with DCE traffic but it has been a problem in the past. I would suggest getting to R77.30+JHF as the R77 kernel is noticeably better than the R75. Add to that R75.X is end of support:

    Check Point R75.40 | Released: April 2012 | Versions: R75.40, R75.45, R75.46, R75.47 | End of support: April 2016


    That said, request escalation right away. If you are not having progress with L1 after a few hours it is perfectly reasonable to escalate.

  3. #3
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: CUL - Cluster

    Quote Originally Posted by chillyjim View Post
    I don't know of any current issues with DCE traffic but it has been a problem in the past. I would suggest getting to R77.30+JHF as the R77 kernel is noticeably better than the R75. Add to that R75.X is end of support:

    Check Point R75.40 | Released: April 2012 | Versions: R75.40, R75.45, R75.46, R75.47 | End of support: April 2016


    That said, request escalation right away. If you are not having progress with L1 after a few hours it is perfectly reasonable to escalate.
    LOL. You don't know of any current issues with DCE traffic because you don't work with it as much as I have :-)

    I have another case with Diamond support and it has been over two weeks with no progress. I was about to open another, but I don't have much hope for it. The Diamond guy ignored me for almost a week.

    R77.30 with JHF does not seem to resolve my issue either, at least in my test environment. By the way, how come the latest JHF is not available for download? You have to open a TAC case with Check Point to request it. Why?
    Last edited by cciesec2006; 2016-05-24 at 08:38.

  4. #4
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    6

    Default Re: CUL - Cluster

    Quote Originally Posted by cciesec2006 View Post
    I have another case with Diamond support and it has been over two weeks with no progress. I was about to open another but I am not having much hope on this. The diamond guy ignored me for almost a week.
    I strongly urge you (and anyone else so dissatisfied with their service for that matter) to call in and talk to Diamond Management and discuss your concerns with them, and if you don't feel they're taking you seriously, talk to THEIR Management. The Diamond Team (as a whole) DOES perform some excellent work, and I believe that you should experience this at least once before your Diamond contract expires. I take great pride in the work Diamond performs, and it pains me to hear of someone who is so unsatisfied with their service.

    Quote Originally Posted by cciesec2006 View Post
    By the way, how come the latest JHF is not available for download? You have to open a TAC case with checkpoint to request for it. Why?
    That would depend on how you're trying to install it, I suppose. The LATEST take 145 should be available through the Gaia Online Updater, CPUSE. If you are trying to download it for the legacy CLI method, the file for that is not generally available at all, for any version. I suppose this is likely to encourage people to use CPUSE (which has seen HUGE improvements since it was first released with Gaia R75.40) rather than the older method, but that's just my 2c as opposed to the "official word".

  5. #5
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    6

    Default Re: CUL - Cluster

    Quote Originally Posted by cciesec2006 View Post
    I am trying to find out which traffic/protocols are causing this. I suspect Microsoft DFS, but I am not sure, because the DFS traffic at that time is less than 40Mbps and the firewall usually does not panic with Microsoft DFS until about 200Mbps. Check Point TAC has not been helpful so far. The case went to a junior engineer, and all he is doing is reading from an SK, which does not help me. I am not sure what to do next. This is a production firewall that needs to be up 24x7x365, so I can't do much with it.

    Thoughts?
    Are you able to do anything with the gateway when it is in that state? Could you, for example, log in via SSH and get a connections table dump? Or run some other diagnostic (non-debugging/intrusive/disruptive) commands?

    I understand that R77.30 doesn't solve THIS issue at hand, but is it an upgrade that can be done? I ask this because in R77 and above, there's a new resource monitoring tool called CPView. This has the ability to log various usage stats to a database, that can be exported for offline analysis. Sadly, due to the nature of how tightly integrated this tool is with the FW Kernel code, it cannot be backported to previous versions. Though R77.30 may not fully reSOLVE this issue at hand, something like this may help shed some additional light on the problem and root cause.

    From my perspective, I would want to try to replicate this in a lab or something, and do those intensive troubleshooting and debugs there. Thanks to my Google-Fu, I now know that DFS is some kind of Distributed File System. Sadly, my Microsoft skills are lacking (I'm a Linux guy myself), and have no idea what would be needed to set up some DFS traffic in a lab. And even then, how to generate 40-200Mbps of it... Anyone (cciesec2006, I'm talking to YOU ;) ) have any thoughts on this?

  6. #6
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: CUL - Cluster

    Quote Originally Posted by jdmoore0883 View Post
    Are you able to do anything with the gateway when it is in that state? Could you, for example, log in via SSH and get a connections table dump? Or run some other diagnostic (non-debugging/intrusive/disruptive) commands?

    I understand that R77.30 doesn't solve THIS issue at hand, but is it an upgrade that can be done? I ask this because in R77 and above, there's a new resource monitoring tool called CPView. This has the ability to log various usage stats to a database, that can be exported for offline analysis. Sadly, due to the nature of how tightly integrated this tool is with the FW Kernel code, it cannot be backported to previous versions. Though R77.30 may not fully reSOLVE this issue at hand, something like this may help shed some additional light on the problem and root cause.

    From my perspective, I would want to try to replicate this in a lab or something, and do those intensive troubleshooting and debugs there. Thanks to my Google-Fu, I now know that DFS is some kind of Distributed File System. Sadly, my Microsoft skills are lacking (I'm a Linux guy myself), and have no idea what would be needed to set up some DFS traffic in a lab. And even then, how to generate 40-200Mbps of it... Anyone (cciesec2006, I'm talking to YOU ;) ) have any thoughts on this?
    LOL... Join the club. I am also a Linux guy, rarely touching windows but now I have no choice :-(

    Setting up Microsoft DFS is very easy. Here is what you need:

    1- host A, Windows 2008R2x64 acting as AD, sitting inside the firewall
    2- host B, Windows 2008R2x64 acting as client sitting outside the firewall,
    3- join host B to AD,
    4- Install DFS service,

    The rest is quite easy. It takes less than 10 minutes to set up. Mine is all VMs, so it is quite simple. Once that is done, it is just a matter of "drag and drop".

    If you PM me, I can walk through the process with you.

  7. #7
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    6

    Default Re: CUL - Cluster

    Quote Originally Posted by cciesec2006 View Post
    LOL... Join the club. I am also a Linux guy, rarely touching windows but now I have no choice :-(

    Setting up Microsoft DFS is very easy. Here is what you need:

    1- host A, Windows 2008R2x64 acting as AD, sitting inside the firewall
    2- host B, Windows 2008R2x64 acting as client sitting outside the firewall,
    3- join host B to AD,
    4- Install DFS service,

    The rest is quite easy. It takes less than 10 minutes to set up. Mine is all VMs, so it is quite simple. Once that is done, it is just a matter of "drag and drop".
    So... Would I be correct in assuming that you have replicated the environment in a lab setup? Does this issue exist there? If yes, then I would suggest debugging something or other in the lab; if no, then I would tend to think that the issue isn't necessarily with DFS alone, and likely involves other traffic.

  8. #8
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: CUL - Cluster

    Quote Originally Posted by jdmoore0883 View Post
    So... Would I be correct in assuming that you have replicated the environment in a lab setup? Does this issue exist there? If yes, then I would suggest debugging something or other in the lab; if no, then I would tend to think that the issue isn't necessarily with DFS alone, and likely involves other traffic.
    I am reasonably confident that it is either Microsoft DFS or Oracle RMAN but I've fixed the Oracle RMAN issue.

  9. #9
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    6

    Default Re: CUL - Cluster

    Quote Originally Posted by cciesec2006 View Post
    I am reasonably confident that it is either Microsoft DFS or Oracle RMAN but I've fixed the Oracle RMAN issue.
    Let me tell you a story about confidence vs. confirmed knowledge...

    Problem: This 1 AD Group wasn't being detected by Identity Awareness, and the rules for this group just weren't being used.

    The customer themselves troubleshot this for 2-3 weeks before FINALLY giving up and contacting their VAR/Partner. The Partner then followed up with yet another 2-3 weeks of intensive troubleshooting, debugging, and replicating. Everything showed that everything was working properly, all replications worked as expected, and the problem could not be reproduced. The Partner, as well, finally gave up and called in to TAC. They worked with the Tier 2 agent for a week or so before it got escalated to me (I was a Tier 3 at the time). After a week of working with me, I FINALLY convinced the VAR to show me, on a remote session, the AD Group on the AD Server. We copied/pasted this into Notepad. Next, we looked at the IA Group used in the rule in question. We copied/pasted this into the same Notepad window. Only after a very careful examination of this were we able to finally discern the root of the problem:
    Code:
    OU=Departmental
    vs
    Code:
    OU=Departamental
    It turns out that one of the Hispanic team members put this in. I mention his ethnicity not to be racist, but to point out that EVERYONE was ABSOLUTELY 100% CONFIDENT that the OU was properly spelled (the misspelling is the correct spelling in Spanish ;) ). Even I bought into their absolute confidence for a week (so a total of 6-8 weeks) before deciding to CONFIRM.
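
    The Notepad eyeball test can also be done mechanically, which would probably have saved a few of those weeks. A minimal sketch using Python's standard `difflib`; the helper name `describe_difference` is made up for illustration, and the two strings are the ones from the story above:

```python
import difflib


def describe_difference(first, second):
    """List the edits that turn `first` into `second`, ignoring equal runs.

    Each entry is (operation, text_in_first, text_in_second, offset_in_first).
    """
    matcher = difflib.SequenceMatcher(None, first, second)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, first[i1:i2], second[j1:j2], i1))
    return changes


if __name__ == "__main__":
    for change in describe_difference("OU=Departmental", "OU=Departamental"):
        print(change)
```

    An empty list means the two values really are identical; anything else points at the exact offset to look at.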

    Now, I am sure you are confident in your findings. I do not question this. But my point here is that if you have replicated this in your lab, then we can debug the shit out of your lab, and not worry about "consequences". Otherwise, if the issue does not exist in your lab, then it is likely that another factor, AS WELL AS the DFS, is at play in your production environment, and if we can figure this out, we are either 1 step closer to a "true" solution, or we have another set of traffic patterns to work into the lab so we can debug the shit out of it at that time.

  10. #10
    Join Date
    2015-12-23
    Posts
    47
    Rep Power
    0

    Default Re: CUL - Cluster

    I've been searching for the answer to this problem too. 10 days ago we upgraded a pair of 12200s to 15600s with the High Performance Pack. The old firewalls ran R77.10; the new firewalls run R77.30. I enabled CoreXL Dynamic Dispatcher on the new firewalls, thinking my nightmares were over. No. I am still seeing these messages, just less often. I know 2 things trigger these events in my environment consistently: 1) a massive amount of data transmitted through the firewall, and 2) install policy.

    When I see a large amount of data transmitted through the firewall, CPUx spikes to ~100%. If another transfer is sent, CPUy spikes to ~100%. A third transfer causes CPUz to spike to ~100%. Based on my observation, it seems load is distributed across CPUs per session rather than how it's advertised in CoreXL Dynamic Dispatcher sk105261. Or perhaps I don't fully understand CDD multi-tasking. If the CPUs cannot distribute processing, I think we will continue to see these messages.
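
    The CPUx/CPUy/CPUz pattern is what you would expect from any scheme that pins a whole connection to one core. The toy model below illustrates that, and only that: it is NOT Check Point's dispatcher algorithm, the function name and the load figures are invented, and it simply assumes each new connection goes to the least-loaded core and stays there.

```python
def assign_connections(conn_loads, n_cores):
    """Pin each connection to the least-loaded core at arrival time (toy model).

    conn_loads: per-connection cost, where 1.0 means "one full core".
    Returns (per_core_load, placement), placement[i] being the core of connection i.
    """
    cores = [0.0] * n_cores
    placement = []
    for load in conn_loads:
        idx = min(range(n_cores), key=lambda i: cores[i])  # ties go to the lowest index
        cores[idx] += load
        placement.append(idx)  # the connection never moves afterwards
    return cores, placement


if __name__ == "__main__":
    # Three bulk transfers, each heavy enough to fill a core, on an 8-core box.
    cores, placement = assign_connections([1.0, 1.0, 1.0], 8)
    print(placement)
    print(cores)
```

    In this model, adding cores lets more big transfers run side by side, but it never makes a single big transfer cheaper for the core it landed on.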

    see attachment for outputs
    Attached Files
    Last edited by wayne0206; 2016-05-24 at 20:32.

  11. #11
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: CUL - Cluster

    Quote Originally Posted by jdmoore0883 View Post
    Now, I am sure you are confident in your findings. I do not question this. But my point here is that if you have replicated this in your lab, then we can debug the shit out of your lab, and not worry about "consequences". Otherwise, if the issue does not exist in your lab, then it is likely that another factor, AS WELL AS the DFS, is at play in your production environment, and if we can figure this out, we are either 1 step closer to a "true" solution, or we have another set of traffic patterns to work into the lab so we can debug the shit out of it at that time.
    I have replicated this issue in the lab; however, I can only push about 100Mbps of DFS traffic using VMs, because my lab infrastructure can only support 100Mbps. The other issue is that my lab gateways are a pair of Power-1 11065s. I can see that the traffic is NOT being accelerated, but not to the point where I see these ClusterXL messages.

    I am in the process of building another 1G environment so that I can test these things better but that will take time and time is something I don't have :-(

  12. #12
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,658
    Rep Power
    10

    Default Re: CUL - Cluster

    Quote Originally Posted by wayne0206 View Post
    I've been searching for the answer to this problem too. 10 days ago we upgraded a pair of 12200s to 15600s with the High Performance Pack. The old firewalls ran R77.10; the new firewalls run R77.30. I enabled CoreXL Dynamic Dispatcher on the new firewalls, thinking my nightmares were over. No. I am still seeing these messages, just less often. I know 2 things trigger these events in my environment consistently: 1) a massive amount of data transmitted through the firewall, and 2) install policy.

    When I see a large amount of data transmitted through the firewall, CPUx spikes to ~100%. If another transfer is sent, CPUy spikes to ~100%. A third transfer causes CPUz to spike to ~100%. Based on my observation, it seems load is distributed across CPUs per session rather than how it's advertised in CoreXL Dynamic Dispatcher sk105261. Or perhaps I don't fully understand CDD multi-tasking. If the CPUs cannot distribute processing, I think we will continue to see these messages.

    see attachment for outputs
    Can you show fwaccel stat (not stats -p)

  13. #13
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,658
    Rep Power
    10

    Default Re: CUL - Cluster

    Quote Originally Posted by jflemingeds View Post
    Can you show fwaccel stat (not stats -p)
    BTW the automatic updates of IPS and app control may fire off a policy install as well. I can't remember if it does or not.

  14. #14
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: CUL - Cluster

    Quote Originally Posted by jflemingeds View Post
    BTW the automatic updates of IPS and app control may fire off a policy install as well. I can't remember if it does or not.
    Here is what I have:

    [Expert@fw-2]# fwaccel stats -s
    Accelerated conns/Total conns : 9/20 (45%)
    Accelerated pkts/Total pkts : 13322/5116343 (0%)
    F2Fed pkts/Total pkts : 5099142/5116343 (99%)
    PXL pkts/Total pkts : 3879/5116343 (0%)
    [Expert@fw-2]#

    I can confirm that Microsoft DFS accounts for 99% of the F2Fed (slow path) packets: "fwaccel conns | grep x.x.x.x" shows F on those connections.

    The firewall is running R77.30 take 145. In my case, I do NOT have IPS, as confirmed by the "enabled_blades" output.
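
    The percentages in that output are whole numbers, so the two "(0%)" lines hide small but non-zero values. A short sketch that recomputes exact ratios from the raw counters; the parser name `parse_fwaccel_summary` is made up, and it assumes the "label : N/M (P%)" layout shown in the paste above:

```python
import re

# Matches lines such as "F2Fed pkts/Total pkts : 5099142/5116343 (99%)".
LINE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<num>\d+)/(?P<den>\d+)")


def parse_fwaccel_summary(text):
    """Map each counter label to (numerator, denominator, exact percent)."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # prompt lines and anything without an N/M pair are skipped
        num, den = int(m.group("num")), int(m.group("den"))
        pct = 100.0 * num / den if den else 0.0
        result[m.group("label")] = (num, den, pct)
    return result


if __name__ == "__main__":
    pasted = """\
Accelerated conns/Total conns : 9/20 (45%)
Accelerated pkts/Total pkts : 13322/5116343 (0%)
F2Fed pkts/Total pkts : 5099142/5116343 (99%)
PXL pkts/Total pkts : 3879/5116343 (0%)
"""
    for label, (num, den, pct) in parse_fwaccel_summary(pasted).items():
        print(f"{label}: {num}/{den} = {pct:.2f}%")
```

    For the numbers above, the accelerated, F2F and PXL packet counts add up to the total (13322 + 5099142 + 3879 = 5116343), so about 99.66% of packets took the slow path even though 45% of connections were marked accelerated.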

  15. #15
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    6

    Default Re: CUL - Cluster

    From MS's Site:
    https://technet.microsoft.com/en-ca/...=ws.10%29.aspx
    DFS Protocols

    DFS uses the Common Internet File System (CIFS) for communication between DFS clients, root servers, and domain controllers. CIFS is an extension of the Server Message Block (SMB) file sharing protocol. Examining network captures of CIFS communications between a DFS client and server is helpful for understanding and troubleshooting DFS processes. The following sections illustrate two network captures created when a client receives different types of referrals.
    sk32578: SecureXL Mechanism:
    CIFS is not accelerated.

  16. #16
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: CUL - Cluster

    Quote Originally Posted by jdmoore0883 View Post
    sk32578: SecureXL Mechanism:
    CIFS is not accelerated.
    It is very interesting that the SK was last modified on April 25, 2016. I think I opened my ticket with Check Point a few weeks before that. Based on the SK, this is an issue even with R77.30, and maybe R80.

    Now that we know CIFS is NOT accelerated by SecureXL, how can one justify paying for an expensive Check Point firewall that will choke on less than 300Mbps of CIFS traffic?

  17. #17
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    6

    Default Re: CUL - Cluster

    Quote Originally Posted by cciesec2006 View Post
    Now that we know CIFS is NOT accelerated by SecureXL, how can one justify paying for an expensive Check Point firewall that will choke on less than 300Mbps of CIFS traffic?
    A - You can request that CIFS not be inspected. There is an internal hotfix available that can accomplish this. It does not appear to be as thoroughly tested as most other hotfixes, and as such isn't as readily available, but it is there, and this can be done.

  18. #18
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: CUL - Cluster

    Quote Originally Posted by jdmoore0883 View Post
    A - You can request that CIFS not be inspected. There is an internal hotfix available that can accomplish this. It does not appear to be as thoroughly tested as most other hotfixes, and as such isn't as readily available, but it is there, and this can be done.
    The Diamond guy gave me the fix, but it does not work :-(. The problem with most of these hotfixes, as you said (not thoroughly tested by Check Point), is that they could take the situation from bad to worse. It might break other stuff as well :-(

    Perhaps Check Point needs to put a disclaimer on the product and advise their SEs not to make outrageous claims :-)

  19. #19
    Join Date
    2016-03-08
    Posts
    8
    Rep Power
    0

    Default Re: CUL - Cluster

    Quote Originally Posted by cciesec2006 View Post
    I suspect Microsoft DFS, but I am not sure, because the DFS traffic at that time is less than 40Mbps and the firewall usually does not panic with Microsoft DFS until about 200Mbps.
    Am I missing something? If you don't normally see issues until around 200mbps, what leads you to believe this traffic is still problematic?

    Quote Originally Posted by cciesec2006 View Post
    I have replicated this issue in the lab; however, I can only push about 100Mbps of DFS traffic using VMs, because my lab infrastructure can only support 100Mbps. The other issue is that my lab gateways are a pair of Power-1 11065s. I can see that the traffic is NOT being accelerated, but not to the point where I see these ClusterXL messages.
    So you partially replicated the traffic flow, but not the CUL messages? In other words the issue wasn't replicated?

    Instead of jumping to conclusions, let's look at the basics. What does top show when you see these messages? Any processes spiking?

  20. #20
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    6

    Default Re: CUL - Cluster

    Quote Originally Posted by cciesec2006 View Post
    Perhaps Check Point needs to put a disclaimer on the product and advise their SEs not to make outrageous claims :-)
    This could probably be said for all sales for all vendors. It is my primary gripe with ALL sales, as a whole. In the end, the sales guys go with the sales info they're provided, and though some may have technical skills and backgrounds, they aren't in a technical role, and won't (can't, really) know ALL the in-depth, rare, and one-off limitations of all aspects of all the products they're trying to sell.

    On the note of the Hotfix though... This is a very specific hotfix whose sole purpose is to have CIFS not be inspected like this. If this truly doesn't work, then notify your engineer to get the hotfix fixed.
