CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



 


Thread: ClusterXL Issue with Failover

  1. #21
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    1,030
    Rep Power
    15

    Default Re: ClusterXL Issue with Failover

    Quote Originally Posted by cciesec2006 View Post
    can you tell me what those issues might be? I've ruled out the switchport, cable, typo and NIC interface. As mentioned before, the setup for the bond interface 802.3AD is a very simple one that I posted earlier.

    Btw, I use the same switch ports, same cable to connect to my Cisco ASR routers for 802.3AD and it works fine.

    Therefore, I am interested to know what you think might be the issue here.

    Thanks,
    Not as of now, not enough info. What I can help you with is how to dig into it to the core. Start with the mentioned commands and add cphaconf show_bond to them.

    That should give you enough info to see what's really going on. In some cases LACP does not form on two interfaces because of a slow/fast rate mismatch, a misconfiguration, etc. But at the moment that is only a guess.

    As already said, LACP-enabled sync is working just fine for my customers, tested and validated.
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  2. #22
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: ClusterXL Issue with Failover

    Quote Originally Posted by varera View Post
    As already said, LACP-enabled sync is working just fine for my customers, tested and validated.
    Was that tested and validated on R77.30 with JHFA 205?

    I did the following, ON THE SAME HARDWARE:

    scenario #1:
    1- on both gw1 and gw2, use snapshot to revert back to R77.30 with JHFA_159 on both gateways,
    2- configure both gateways with 802.3AD active/active for the SYNC interface; reboot, and push policy to the gateways, gw1 is Active, gw2 is standby (eth4 and eth8 are bonded interface bond1)
    3- from the switch, shut down eth4 on gw2: still active/standby. Bring up eth4: still active/standby. Shut down eth8: still active/standby. Bring up eth8: still active/standby.

    scenario #2:
    1- on both gw1 and gw2, use snapshot to revert back to R77.30 with JHFA_205 on both gateways,
    2- configure both gateways with 802.3AD active/active for the SYNC interface; reboot, and push policy to the gateways, gw1 is Active, gw2 is standby (eth4 and eth8 are bonded interface bond1)
    3- from the switch, shut down eth4 on gw2: active/down. Bring up eth4: active/standby. Shut down eth8: active/down. Bring up eth8: active/standby.


    @jdmoore0883: I am very surprised by your revelation and comments about diamond engineers. I thought that with diamond support one would get the best of Checkpoint's engineers and the ability to replicate issues in a diamond lab environment. Most customers are multi-vendor, and it is very hard to replicate an issue without the equipment involved, especially non-Checkpoint equipment. The company I work for is not big, but we have Palo Alto, Juniper, Cisco (routers, switches, VPN, firewalls) and Checkpoint equipment, so we can replicate all kinds of production issues. Granted, this is not top-of-the-line equipment, but it is enough to build out and test scenarios when we have issues in production. Now I know why it always takes Checkpoint TAC a long time to resolve issues :-(

  3. #23
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    1,030
    Rep Power
    15

    Default Re: ClusterXL Issue with Failover

    Quote Originally Posted by cciesec2006 View Post
    Was that tested and validated on R77.30 with JHFA 205?

    [Struck out by the author:] Hi, Jumbo 205 was withdrawn, 206 is replacing it. It seems they had indeed some QA issues with that HFA. You may want to install 206, I guess
    Corrected. HFA 206 is still the valid one
    Last edited by varera; 2017-02-02 at 06:32.
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  4. #24
    Join Date
    2014-11-14
    Location
    Ottawa Canada
    Posts
    364
    Rep Power
    6

    Default Re: ClusterXL Issue with Failover

    Quote Originally Posted by cciesec2006 View Post
    @jdmoore0883, did your customer test it in PRODUCTION and verify that it works as expected? Did they actually shutdown one of the interfaces as part of the trunk and verify that the cluster is still Active/Standby?
    A - Yes, among many other tests as well. They also completely removed one of the switches to verify no single point of failure, and the Sync and Cluster continued to operate properly/as expected.

    Clearly, from the quantity, quality, and content of your posts, you have a greater desire to shit all over Checkpoint and CPUG's Users rather than actually work with someone (anyone) to truly resolve a problem at hand; suggestions are offered, and questions are asked, yet you continue to berate and argue rather than cooperate... What is the purpose of your post? To demonstrate failings or to resolve a problem?

  5. #25
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: ClusterXL Issue with Failover

    Quote Originally Posted by varera View Post
    Hi, Jumbo 205 was withdrawn, 206 is replacing it. It seems they had indeed some QA issues with that HFA. You may want to install 206, I guess
    Interesting... I just logged into Checkpoint web site and Jumbo 205 is still there for general download. I do see Jumbo 207 is available on Jan 8th 2017 but not available for general download.

    thoughts?

  6. #26
    Join Date
    2010-09-20
    Posts
    73
    Rep Power
    10

    Default Re: ClusterXL Issue with Failover

    cciesec2006, can you please run this on both CHKP's and share the output:

    nice -n 15 ifconfig|grep bond | awk {'print $1'} | awk -F "." {'print $1'} | uniq | while read bond ; do echo $bond && cat /proc/net/bonding/$bond ;done
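
    A more readable equivalent, as a rough sketch only: instead of deriving bond names from ifconfig, it reads the files the Linux bonding driver exposes under /proc/net/bonding directly. The function name dump_bonds and its optional directory argument are made up for illustration (the argument just lets you point it at saved copies of those files).

```shell
#!/bin/sh
# dump_bonds: print each bond name followed by its driver status file.
# The optional directory argument is an illustrative addition; by default
# the live /proc/net/bonding directory is read.
dump_bonds() {
    dir="${1:-/proc/net/bonding}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # nothing to do if no bonds are configured
        echo "${f##*/}"           # bond name, e.g. bond1
        cat "$f"
    done
}
```

    Run it as "dump_bonds" in Expert mode on each member. Unlike the one-liner, it lists every bond the driver knows about, whether or not ifconfig shows it.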

  7. #27
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: ClusterXL Issue with Failover

    Quote Originally Posted by indeni View Post
    cciesec2006, can you please run this on both CHKP's and share the output:

    nice -n 15 ifconfig|grep bond | awk {'print $1'} | awk -F "." {'print $1'} | uniq | while read bond ; do echo $bond && cat /proc/net/bonding/$bond ;done
    @varera: The commands you mentioned, "cphaprob stat -a if" and "cphaprob -a list", do not seem to work. Are you referring to "cphaprob -l list"?



    @indeni: here is the information you requested. Thank you.

    [Expert@Power-1-P:0]# installed_jumbo_take
    R77.30 Jumbo Hotfix Accumulator take_205 is installed, see sk106162.
    [Expert@Power-1-P:0]#
    [Expert@Power-1-P:0]# cphaprob state

    Cluster Mode: High Availability (Active Up) with IGMP Membership

    Number Unique Address Assigned Load State

    1 (local) 192.0.2.5 100% Active
    2 192.0.2.6 0% Standby

    [Expert@Power-1-P:0]# nice -n 15 ifconfig|grep bond | awk {'print $1'} | awk -F "." {'print $1'} | uniq | while read bond ; do echo $bond && cat /proc/net/bonding/$bond ;done
    bond1
    Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

    Bonding Mode: IEEE 802.3ad Dynamic link aggregation
    Transmit Hash Policy: layer3+4 (1)
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 200
    Down Delay (ms): 200

    802.3ad info
    LACP rate: slow
    Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 2
    Actor Key: 17
    Partner Key: 11
    Partner Mac Address: 00:24:51:ea:54:80

    Slave Interface: Lan1
    MII Status: up
    Link Failure Count: 1
    Permanent HW addr: 00:90:fb:31:76:88
    Aggregator ID: 1

    Slave Interface: Lan2
    MII Status: up
    Link Failure Count: 1
    Permanent HW addr: 00:90:fb:31:76:89
    Aggregator ID: 1
    [Expert@Power-1-P:0]#

    [Expert@Power-1-S:0]# installed_jumbo_take
    R77.30 Jumbo Hotfix Accumulator take_205 is installed, see sk106162.
    [Expert@Power-1-S:0]#
    [Expert@Power-1-S:0]# cphaprob state

    Cluster Mode: High Availability (Active Up) with IGMP Membership

    Number Unique Address Assigned Load State

    1 192.0.2.5 100% Active
    2 (local) 192.0.2.6 0% Standby

    [Expert@Power-1-S:0]# nice -n 15 ifconfig|grep bond | awk {'print $1'} | awk -F "." {'print $1'} | uniq | while read bond ; do echo $bond && cat /proc/net/bonding/$bond ;done
    bond1
    Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

    Bonding Mode: IEEE 802.3ad Dynamic link aggregation
    Transmit Hash Policy: layer3+4 (1)
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 200
    Down Delay (ms): 200

    802.3ad info
    LACP rate: slow
    Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 2
    Actor Key: 17
    Partner Key: 12
    Partner Mac Address: 00:24:51:ea:54:80

    Slave Interface: Lan1
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:90:fb:31:75:70
    Aggregator ID: 1

    Slave Interface: Lan2
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:90:fb:31:75:71
    Aggregator ID: 1
    [Expert@Power-1-S:0]#
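
    Output like the above can be condensed to one line per slave. The following is only a sketch: the function name bond_summary and its output format are made up for illustration, and it assumes the field labels shown in the output above (LACP rate, Aggregator ID, Slave Interface, MII Status, Link Failure Count). It flags any slave whose Aggregator ID differs from the active aggregator, which usually means LACP did not form on that link.

```shell
#!/bin/sh
# bond_summary: condense /proc/net/bonding/<bond> style text (files or stdin)
# into the LACP rate plus one status line per slave interface.
bond_summary() {
    awk -F': *' '
        { sub(/^[ \t]+/, "", $1) }                        # tolerate indented lines
        $1 == "Slave Interface"               { slave = $2; next }
        $1 == "LACP rate" && slave == ""      { print "lacp_rate=" $2; next }
        $1 == "Aggregator ID" && slave == ""  { active = $2; next }   # active aggregator
        $1 == "MII Status" && slave != ""     { mii[slave] = $2; next }
        $1 == "Link Failure Count"            { fails[slave] = $2; next }
        $1 == "Aggregator ID" && slave != ""  {
            verdict = (($2 == active) ? "ok" : "NOT-IN-ACTIVE-AGGREGATOR")
            printf "%s mii=%s failures=%s aggregator=%s %s\n", slave, mii[slave], fails[slave], $2, verdict
        }
    ' "$@"
}
```

    For example, "bond_summary /proc/net/bonding/bond1" on a gateway. Fed the Power-1-P output above, it would print lacp_rate=slow followed by one "ok" line each for Lan1 and Lan2, both with failures=1.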

  8. #28
    Join Date
    2005-11-25
    Location
    United States, Southeast
    Posts
    857
    Rep Power
    14

    Default Re: ClusterXL Issue with Failover

    So much bad advice being given..

    varera is the only one I see asking reasonable questions..

    When troubleshooting ClusterXL, you must always start with the output from the following commands:

    cphaprob -a if
    cphaprob -i list

    These will tell you if a monitored process or an interface issue is downing the cluster member..
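
    To capture that starting output from each member in one go, something like the following could be used. This is a sketch: the function name collect_cluster_state and the default file path under /var/tmp are hypothetical, and the only Check Point commands it runs are the cphaprob invocations already mentioned in this thread.

```shell
#!/bin/sh
# collect_cluster_state: save the basic ClusterXL status commands to one file.
# The output path argument and its default location are illustrative choices.
collect_cluster_state() {
    out="${1:-/var/tmp/cphaprob_$(hostname)_$(date +%Y%m%d_%H%M%S).txt}"
    {
        for args in "state" "-a if" "-i list"; do
            echo "=== cphaprob $args ==="
            cphaprob $args 2>&1   # unquoted on purpose so "-a if" splits into two arguments
        done
    } > "$out"
    echo "$out"                   # print the file name so it can be attached to a post
}
```

    Run it on both members right after reproducing the failure, then compare the two files side by side.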

    I saw that you had a problem with eth5.. 99.9% of the time, ClusterXL issues are actually switch configuration problems. So let's get this output and see what the real problem is..

    and seriously.. don't bond.. bonding is foolish from both a KISS standpoint and a basic systems engineering PoV (limits the number of interrupts/CPUs handling inbound/outbound packets, among other reasons.)

  9. #29
    Join Date
    2011-08-02
    Location
    http://spikefishsolutions.com
    Posts
    1,652
    Rep Power
    10

    Default Re: ClusterXL Issue with Failover

    I think you missed some key details from post 1.

    eth5 is the sync interface, connected via crossover, and fw2 locked up. As in nada on the console, which is why it's strange that fw1 was in a down standby state. I kind of doubt this is a switch issue, but it is true we haven't seen the output of all the cphaprob commands, so it is worth seeing.

  10. #30
    Join Date
    2005-11-25
    Location
    United States, Southeast
    Posts
    857
    Rep Power
    14

    Default Re: ClusterXL Issue with Failover

    Nope.. didn't miss anything..

    ClusterXL troubleshooting starts with those two commands..

    cphaprob -a if
    cphaprob -i list

    Much clue dust will come bursting forth..

  11. #31
    Join Date
    2006-09-26
    Posts
    3,194
    Rep Power
    17

    Default Re: ClusterXL Issue with Failover

    Quote Originally Posted by varera View Post
    Hi, Jumbo 205 was withdrawn, 206 is replacing it. It seems they had indeed some QA issues with that HFA. You may want to install 206, I guess
    Hi varera,

    Can you confirm that Jumbo 205 for R77.30 was withdrawn by checkpoint and is replaced by 206? Also, can you share where you get that information from?

    I can not find any confirmation on this. JHFA 205 is still available for download from Checkpoint website and I am not seeing any JHFA 206.

    Thanks,

  12. #32
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    1,030
    Rep Power
    15

    Default Re: ClusterXL Issue with Failover

    Quote Originally Posted by cciesec2006 View Post
    Hi varera,

    Can you confirm that Jumbo 205 for R77.30 was withdrawn by checkpoint and is replaced by 206? Also, can you share where you get that information from?

    I can not find any confirmation on this. JHFA 205 is still available for download from Checkpoint website and I am not seeing any JHFA 206.

    Thanks,
    That sounds extremely weird, but you are right. I must have dreamed it :-( My mistake, thanks for correcting
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/
