CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



 


Thread: Strange connection disruption 30+ minutes after policy install

  1. #1
    Join Date
    2017-11-01
    Posts
    37
    Rep Power
    0

    Strange connection disruption 30+ minutes after policy install

    Hello,

    I have a strange issue on the firewall in our UAT e-commerce environment.

    Since it's UAT, it's not critical, but it can of course cause some grumpy faces among the developers.

    By the way, this never happened before R80.10.

    I push a policy and it installs fine. Half an hour later, servers cannot talk to each other. They live on separate subnets behind different interfaces. Doing another policy push, without changing anything, has resolved the issue 100% of the time.

    That I can't get my head around. I changed the connection persistence setting to keep old connections instead of rematching them, but to no avail.

    fw ctl debugs show nothing abnormal. CPU and memory usage are fine during that time, and nothing shows up in dmesg.

    The weird bit for me is that it is always 30+ minutes after the installation. It isn't exactly 30 minutes, by the way; it varies between 30 and 45 minutes, but it's always a significant amount of time afterwards that stuff stops.

    Interestingly, the logs show no traffic at all. After a certain time, stuff just stops. There's no drop, no accept, nothing.

    tcpdump shows a series of ARPs from the servers behind the firewall, ARPing for their end destinations.

    Is there an issue with the CP not responding to ARP after a policy push? Would that mean traffic is never received on the CP interface, never processed, and therefore doesn't show up?

    What would cause the firewall to stop responding and not do anything?

    It's the fact that it happens so long afterwards that throws me. I can't figure out what's happening.

    Any suggestions to give me food for thought on Monday morning would be great. Thank you!
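    One rough way to quantify the ARP observation is to save the tcpdump output to a text file and count, per target IP, the ARP requests that never got a reply. This is a Python sketch under assumptions: it expects request lines to contain "who-has <ip>" and reply lines to contain "reply <ip> is-at", which is how tcpdump usually renders ARP, but check it against your own capture. The sample addresses are made up. Unanswered requests for hosts on the far side of the firewall would point towards host-side addressing; an unanswered request for the firewall's own interface address would point towards the firewall's ARP handling.

```python
import re
from collections import Counter

# Assumed tcpdump ARP wording (verify against your capture), e.g.
#   ... ARP, Request who-has 10.0.2.20 tell 10.0.1.10, length 28
#   ... ARP, Reply 10.0.1.1 is-at 00:1c:7f:00:00:01, length 28
REQUEST_RE = re.compile(r"who-has (\S+)", re.IGNORECASE)
REPLY_RE = re.compile(r"reply (\S+) is-at", re.IGNORECASE)


def unanswered_arp_targets(capture_text: str) -> Counter:
    """Count ARP requests per target IP that never received a reply in the capture."""
    requests: Counter = Counter()
    answered = set()
    for line in capture_text.splitlines():
        req = REQUEST_RE.search(line)
        if req:
            requests[req.group(1)] += 1
            continue
        rep = REPLY_RE.search(line)
        if rep:
            answered.add(rep.group(1))
    return Counter({ip: n for ip, n in requests.items() if ip not in answered})


if __name__ == "__main__":
    sample = (
        "10:00:01.000000 ARP, Request who-has 10.0.2.20 tell 10.0.1.10, length 28\n"
        "10:00:02.000000 ARP, Request who-has 10.0.2.20 tell 10.0.1.10, length 28\n"
        "10:00:02.500000 ARP, Request who-has 10.0.1.1 tell 10.0.1.10, length 28\n"
        "10:00:02.500100 ARP, Reply 10.0.1.1 is-at 00:1c:7f:00:00:01, length 28\n"
    )
    print(unanswered_arp_targets(sample))  # Counter({'10.0.2.20': 2})
```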

  2. #2
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,204
    Rep Power
    13

    Re: Strange connection disruption 30+ minutes after policy install

    Your first order of business is trying to determine whether the stoppage is a Gaia issue (ARP, routing, NIC, etc.) or a Check Point issue (SecureXL, INSPECT, NAT, ClusterXL, etc.). In other words, which side of the house is "eating" the traffic, which, ironically, I just talked about in my speech at CPX360. A few things:

    1) When it happens again, try immediately restarting SecureXL with fwaccel off;fwaccel on and see if things suddenly start working.

    2) Also save the output of fw ctl arp when things are fine and compare it with the output of the same command when things are not fine.

    3) In cpview, baseline the Network section of the Overview screen (throughput, concurrent connections, new connections/sec, etc.) while things are working, then run cpview in historical mode (-t) and look at those same numbers (and perhaps other cpview screens) during a known problem period. That should give you an idea of which side of the house is causing the issue.
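    Steps 2 and 3 both come down to comparing a known-good snapshot with a problem-period snapshot. Below is a minimal Python sketch of that comparison. It assumes you have saved the fw ctl arp output as text and transcribed a few cpview Network figures by hand. The ARP diff treats each non-empty line as one opaque entry, so no particular output format is assumed. The metric names and the 50% threshold are hypothetical choices, not anything cpview defines.

```python
from __future__ import annotations


def diff_snapshots(good: str, bad: str) -> tuple[set[str], set[str]]:
    """Compare two saved command outputs line by line.

    Returns (lines only present when things were fine,
             lines only present during the problem)."""
    good_lines = {line.strip() for line in good.splitlines() if line.strip()}
    bad_lines = {line.strip() for line in bad.splitlines() if line.strip()}
    return good_lines - bad_lines, bad_lines - good_lines


def flag_metric_changes(
    baseline: dict[str, float],
    problem: dict[str, float],
    threshold: float = 0.5,
) -> dict[str, float]:
    """Return metrics whose relative change from baseline is at least `threshold`
    (0.5 means 50%). Metrics missing from `problem` or with a zero baseline are skipped."""
    flagged: dict[str, float] = {}
    for name, base in baseline.items():
        if name not in problem or base == 0:
            continue
        change = (problem[name] - base) / base
        if abs(change) >= threshold:
            flagged[name] = round(change, 2)
    return flagged


if __name__ == "__main__":
    # Hypothetical figures transcribed by hand from cpview (Overview, Network section).
    baseline = {"throughput_mbps": 120.0, "concurrent_conns": 8000.0, "new_conns_per_sec": 150.0}
    problem = {"throughput_mbps": 2.0, "concurrent_conns": 7900.0, "new_conns_per_sec": 0.0}
    print(flag_metric_changes(baseline, problem))
    # -> {'throughput_mbps': -0.98, 'new_conns_per_sec': -1.0}
```

    For the ARP check, save the fw ctl arp output from both periods to files and pass their contents to diff_snapshots. Entries that appear in only one of the two sets are the ones worth looking at.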
    Last edited by ShadowPeak.com; 2018-02-09 at 19:27.
    --
    Second Edition of my "Max Power" Firewall Book
    Now Available at http://www.maxpowerfirewalls.com

  3. #3
    Join Date
    2006-09-26
    Posts
    3,140
    Rep Power
    15

    Re: Strange connection disruption 30+ minutes after policy install

    Please confirm the following:

    1- Everything was working properly PRIOR to R80.10
    2- Are you running the latest JHFA on R80.10, like Take 56?

    Since you have an established baseline for when it WILL happen, open a TAC case with Check Point and request someone with expertise in this area (do NOT settle for a junior engineer, because it will be a waste of time), and schedule it so that they can look into the issue WHILE it is happening.

    Thank god I don't have to deal with R80.10. We're still on R77.30 at the moment. In the next cycle, those R77.30 gateways will be replaced by Palo Alto.

  4. #4
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,621
    Rep Power
    9

    Re: Strange connection disruption 30+ minutes after policy install

    Quote Originally Posted by JPYDX View Post
    tcpdump shows a series of ARPs from the servers behind the firewall, ARPing for their end destinations.
    This is very odd and smells like an incorrect subnet mask. There is no reason a device should ARP for a remote server, unless maybe it's really talking to a NAT address on the local network.

    Is this possibly a bridge-mode firewall?
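    The subnet-mask theory can be checked on paper: a host ARPs for a destination directly only if that destination falls inside the subnet it believes it is attached to; otherwise it ARPs for its gateway. Here is a small Python sketch of that rule using the standard ipaddress module. The function name and addresses are hypothetical, and the sketch ignores any extra static routes configured on the host.

```python
import ipaddress


def arps_directly_for(host_interface: str, destination: str) -> bool:
    """Return True if a host configured with `host_interface` (address/prefix)
    considers `destination` on-link, i.e. it will ARP for the destination itself
    rather than for its default gateway.

    Only the connected subnet is considered; extra static routes on the host
    are ignored in this sketch."""
    iface = ipaddress.ip_interface(host_interface)
    return ipaddress.ip_address(destination) in iface.network


if __name__ == "__main__":
    # Hypothetical: server 10.0.1.10, peer 10.0.2.20 behind another firewall interface.
    print(arps_directly_for("10.0.1.10/24", "10.0.2.20"))  # False: ARPs for the gateway
    print(arps_directly_for("10.0.1.10/16", "10.0.2.20"))  # True: mask too wide, ARPs for the peer
```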

  5. #5
    Join Date
    2017-11-01
    Posts
    37
    Rep Power
    0

    Re: Strange connection disruption 30+ minutes after policy install

    Hi all,

    Thanks all for your replies.

    Firstly, no, it's not a bridge!

    Tim, I was at your speech at CPX, and I was trying to use the notes I took about 'what ate it', but typically I can't understand my own notes and I think I am doing it wrong.

    Am I right in thinking that zdebug -T drop will show traffic dropped by SecureXL (if it has been)?

    It's the last bit that throws me, regarding Gaia eating the packet and the capital I and lower-case i. Could you clarify, if you don't mind?

    I will try the steps you mentioned, though. Thanks!

  6. #6
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,621
    Rep Power
    9

    Re: Strange connection disruption 30+ minutes after policy install

    Can you show an example ARP request you see when the outage hits? ARP should only be used to resolve addresses on the local network.

    By the way, Linux does have a limit on the number of ARP entries it can store. When you reach the limit, you will see "neighbor table overflow" messages in the output of the dmesg command.
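    To check for the ARP table limit described above, here is a small Python sketch. It counts neighbor-table-overflow lines in dmesg output and reads the kernel's hard limit for IPv4 neighbor entries (net.ipv4.neigh.default.gc_thresh3, a standard Linux sysctl). The message wording varies between kernel versions, and both "neighbor" and "neighbour" spellings appear, so the pattern accepts either. Treat the sysctl value as informational. The function names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Kernel wording varies by version; accept both "neighbor" and "neighbour".
OVERFLOW_RE = re.compile(r"neighbou?r table overflow", re.IGNORECASE)

# Standard Linux sysctl: hard upper bound on IPv4 neighbor (ARP) cache entries.
GC_THRESH3 = Path("/proc/sys/net/ipv4/neigh/default/gc_thresh3")


def count_neighbor_overflows(dmesg_text: str) -> int:
    """Count dmesg lines reporting that the neighbor (ARP) table is full."""
    return sum(1 for line in dmesg_text.splitlines() if OVERFLOW_RE.search(line))


def arp_table_limit(path: Path = GC_THRESH3) -> int | None:
    """Return the hard ARP cache limit, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    import subprocess

    # dmesg may need root; if it is refused, stdout is empty and the count is 0.
    out = subprocess.run(["dmesg"], capture_output=True, text=True, check=False).stdout
    print("overflow messages:", count_neighbor_overflows(out))
    print("gc_thresh3:", arp_table_limit())
```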

  7. #7
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,204
    Rep Power
    13

    Re: Strange connection disruption 30+ minutes after policy install

    Quote Originally Posted by JPYDX View Post
    It's the last bit that throws me, regarding Gaia eating the packet and the capital I and lower-case i. Could you clarify, if you don't mind?

    Please PM me and I'll send you the presentation. After CPX Bangkok it will be posted publicly.

  8. #8
    Join Date
    2017-11-01
    Posts
    37
    Rep Power
    0

    Re: Strange connection disruption 30+ minutes after policy install

    Update: the "30 minutes after" statement is incorrect. I have feedback from other business areas that manage the services behind the firewall, and they saw the issue occur directly after the policy installation.

    So, going back to it: a policy installation causes the issue, and another installation restores service.

    I can provide examples the next time I deliberately cause the problem.

    Could this be anything to do with SecureXL tables clearing after a policy installation?

  9. #9
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,204
    Rep Power
    13

    Re: Strange connection disruption 30+ minutes after policy install

    Quote Originally Posted by JPYDX View Post
    Could this be anything to do with SecureXL tables clearing after a policy installation?
    Could be, as a recalculation of most tables held by SecureXL is performed at that time. I'd try the fwaccel off trick immediately after policy install to help isolate the issue.

  10. #10
    Join Date
    2017-11-01
    Posts
    37
    Rep Power
    0

    Re: Strange connection disruption 30+ minutes after policy install

    Quote Originally Posted by ShadowPeak.com View Post
    Could be, as a recalculation of most tables held by SecureXL is performed at that time. I'd try the fwaccel off trick immediately after policy install to help isolate the issue.
    How about doing it before? Would that cause any problems?

    My only question is: why does another policy push solve the issue?

  11. #11
    Join Date
    2014-12-21
    Posts
    2
    Rep Power
    0

    Re: Strange connection disruption 30+ minutes after policy install

    Any chance that the connections are NATed?

  12. #12
    Join Date
    2009-04-30
    Location
    Colorado, USA
    Posts
    2,204
    Rep Power
    13

    Re: Strange connection disruption 30+ minutes after policy install

    Quote Originally Posted by JPYDX View Post
    How about doing it before? Would that cause any problems?

    Only question is, why does another policy push solve the issue?
    You can do it beforehand, but disabling SecureXL on a firewall with 8 or more cores without a good reason is a bit risky, as it may cause a noticeable performance impact. I think it would be better to push policy, let the issue occur, then quickly run fwaccel off; if BOOM, everything immediately starts flowing, that makes it quite clear where the issue lies.

    Every time the firewall policy is installed, there is a recalculation or "sync" between the INSPECT driver and SecureXL, which maintains its own set of state tables. I've seen situations where this procedure gets hung up, and simply installing the policy again breaks the loop it is stuck in. Symptoms of this would be error messages such as "waiting for policy load" or "too many errors" shown by fwaccel stat. This is not very common, but it's worth checking based on the behavior you are describing.
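    To look for the stuck-sync symptom right after a policy install, here is a Python sketch that runs fwaccel stat on the gateway and searches its output for the two messages quoted above. It matches plain substrings only and makes no assumption about the rest of the fwaccel stat layout. The function name is hypothetical. Having Python 3 on the gateway is also an assumption; if it isn't available, save the output elsewhere and pass it in as text.

```python
from __future__ import annotations

import shutil
import subprocess

# Error strings quoted in the post above; this list is not exhaustive.
SUSPECT_MESSAGES = ("waiting for policy load", "too many errors")


def policy_sync_warnings(fwaccel_stat_output: str) -> list[str]:
    """Return the suspect messages found (case-insensitively) in fwaccel stat output."""
    text = fwaccel_stat_output.lower()
    return [msg for msg in SUSPECT_MESSAGES if msg in text]


if __name__ == "__main__":
    if shutil.which("fwaccel") is None:
        print("fwaccel not found; run this on the gateway itself")
    else:
        result = subprocess.run(["fwaccel", "stat"], capture_output=True, text=True, check=False)
        hits = policy_sync_warnings(result.stdout + result.stderr)
        print("possible stuck policy sync:", hits or "nothing matched")
```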
