CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



Thread: VSLS Failback

  1. #1
    Join Date
    2005-08-29
    Location
    Upstate NY
    Posts
    2,720
    Rep Power
    17

    VSLS Failback

    OK, brain death must be setting in, but for the life of me I can't figure out what I'm doing wrong.

    I have 2 VSX gateways in VSLS mode with 3 VSs connected to a Cisco 6509.

    Failing one box is fine: traffic pauses for a few seconds, but after that life is good. When the "failed" box comes back online to take its VS back, all traffic stops for about 30 seconds and none of the connections resume (they can be restarted).

    Nothing strange about this environment that I can see (one trunk in, one trunk out). I KNOW this works (I've done it before), but for the life of me I can't find what I'm missing.

    Thanks.

  2. #2
    Join Date
    2007-07-16
    Location
    a land down under!
    Posts
    2,015
    Rep Power
    15

    Re: VSLS Failback

    Spanning tree? ARP caching on the switch? I recall some config changes that are required specifically on a 6509 for clustering to work; check the switch config for that.
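    To see why a stale ARP entry on the switch (or on the next-hop router) would black-hole traffic after failback, here is a toy Python model. It assumes the VS VIP is answered with the active member's own MAC rather than a virtual MAC, and that the cache only changes when an ARP reply or gratuitous ARP is seen. The class, addresses and timings are all hypothetical; 14400 s is used because 4 hours is the usual Cisco IOS `arp timeout` default.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ArpCache:
    """Toy model of a switch/router ARP cache (hypothetical, for illustration)."""
    timeout: float = 14400.0  # seconds; 4 hours is the usual Cisco IOS default
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def learn(self, ip: str, mac: str, now: float) -> None:
        # An ARP reply or gratuitous ARP overwrites whatever was cached for this IP.
        self.entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        # Return the cached MAC, or None once the entry has aged out.
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at > self.timeout:
            del self.entries[ip]
            return None
        return mac


VIP = "192.0.2.10"            # hypothetical VS cluster VIP
MAC_A = "00:00:5e:00:53:0a"   # hypothetical MAC of member A
MAC_B = "00:00:5e:00:53:0b"   # hypothetical MAC of member B


def next_hop_after_failback(failback_garp_seen: bool) -> Optional[str]:
    cache = ArpCache()
    cache.learn(VIP, MAC_A, now=0)        # A owns the VS
    cache.learn(VIP, MAC_B, now=100)      # A fails; B announces the VIP
    if failback_garp_seen:
        cache.learn(VIP, MAC_A, now=200)  # A comes back and announces the VIP
    return cache.lookup(VIP, now=230)     # where traffic goes 30 s after failback
```

    If the failback announcement is lost, the cache keeps pointing at member B until the entry ages out, so running `clear arp-cache` on the switch during the outage would be a quick way to test this theory.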

  3. #3
    Join Date
    2008-07-31
    Location
    Netherlands, Europe
    Posts
    1,146
    Rep Power
    13

    Re: VSLS Failback

    Are you using dynamic routing? OSPF could be taking this time to relearn all the routes.
    Regards, Maarten.
    Triple MDS on R77.30, MDS on R80.10, VSX, GAIA.

  4. #4
    Join Date
    2005-08-29
    Location
    Upstate NY
    Posts
    2,720
    Rep Power
    17

    Re: VSLS Failback

    Spanning tree was the first thing I thought of. Changing the ports to PortFast made it worse.

    Yes, OSPF is involved, but the "router" IP and ID never change, so OSPF shouldn't see a difference AFAIK.

    I need to check the ARP thing, though. That might be part of it.

  5. #5
    Join Date
    2007-03-07
    Location
    Detroit, Michigan
    Posts
    154
    Rep Power
    13

    Re: VSLS Failback

    In an active/standby deployment, OSPF takes about 40 seconds for routes to load on the new active cluster member. At least, this is what I see in my environment with about 1800 routes. However, in an active/active deployment this shouldn't be an issue, unless gated was shut down (i.e. a reboot or restart), which is what this sounds like.

  6. #6
    Join Date
    2009-03-21
    Posts
    190
    Rep Power
    11

    Re: VSLS Failback

    In reply to chillyjim (post #1):
    Jim,
    You're right, this /does/ work. I've done it before in production and in the lab with NGX R65 Management and VSX NGX R65 w/Scalability Pack on Splat.

    OK, I assume the VSs are layer 3 and not layer 2(?). What would I do?

    First of all, check how the VIP is resolved at layer 2. I /think/ the VIP is mapped to a unicast MAC address, and when a failure takes place, the 2nd member sends a gratuitous ARP to remap the VIP to its own MAC address. (To /really/ check how this mechanism works, ping the VIP from your laptop and check your ARP cache. Then fail over, ping it again and RE-check your ARP cache.)
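    For reference when reading a capture, this Python sketch builds the kind of gratuitous ARP described above: a broadcast ARP whose sender and target IP are both the VIP. Whether ClusterXL in VSLS mode sends a request or a reply, and how many, is not confirmed here; the VIP and MAC are hypothetical documentation addresses.

```python
import ipaddress
import struct


def gratuitous_arp(vip: str, mac: str) -> bytes:
    """Build an Ethernet II frame carrying a gratuitous ARP for `vip`."""
    src_mac = bytes.fromhex(mac.replace(":", ""))
    vip_bytes = ipaddress.IPv4Address(vip).packed
    # Broadcast destination, sender's MAC as source, EtherType 0x0806 (ARP).
    eth_header = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                       # hardware type: Ethernet
        0x0800,                  # protocol type: IPv4
        6, 4,                    # MAC / IPv4 address lengths
        1,                       # opcode 1 = request (some stacks send opcode 2 instead)
        src_mac, vip_bytes,      # sender: the member now owning the VIP
        b"\x00" * 6, vip_bytes,  # target IP == sender IP is what makes it "gratuitous"
    )
    # 42 bytes here; it is padded to the 60-byte Ethernet minimum before transmission.
    return eth_header + arp


frame = gratuitous_arp("192.0.2.10", "00:00:5e:00:53:0b")  # hypothetical VIP and member MAC
print(frame.hex())
```

    In a capture taken during failback, you would expect to see frames like this sourced from the returning member's MAC; if they never show up on the switch side, the switch keeps forwarding to the other member.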

    So failing over from one member to the other works, and failing back works too, but with a 40-second delay. Check the Cisco Catalyst config and see if there are any ARP-related protections configured. You might want to check:

    - storm control (which can block broadcast traffic)
    - port security
    - etc. (read the IOS config line by line; check the interface section for your ports AND the global section)
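    As a rough illustration of how storm control could eat failback announcements, this Python sketch counts broadcasts in fixed 1-second windows and drops anything over a packets-per-second threshold. Real Catalyst storm control (configured with `storm-control broadcast level ...` on the interface) measures its level in a platform-specific way, often as a percentage of bandwidth, so treat the function and the numbers below as hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple


def storm_control(arrivals: List[float], pps_threshold: int) -> Tuple[int, int]:
    """Toy broadcast suppression: broadcasts are counted in fixed 1-second
    windows and anything over `pps_threshold` in a window is dropped.
    Returns (forwarded, dropped)."""
    seen_in_window = defaultdict(int)
    forwarded = dropped = 0
    for t in sorted(arrivals):
        window = int(t)  # which 1-second interval this broadcast falls in
        seen_in_window[window] += 1
        if seen_in_window[window] <= pps_threshold:
            forwarded += 1
        else:
            dropped += 1
    return forwarded, dropped


# Hypothetical numbers: 90 background broadcasts in the first 0.9 s, then a
# burst of 30 gratuitous ARPs at failback.
background = [i / 100 for i in range(90)]
garp_burst = [0.95] * 30
print(storm_control(background + garp_burst, pps_threshold=100))  # (100, 20)
```

    In this made-up case, 20 of the 30 gratuitous ARPs are dropped even though the background traffic got through, which is why the interface's storm-control counters are worth a look.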

    I'd also be inclined to show up at the customer's site with my own Catalyst and a simple config to rule out their switch.

    You could also grab some tcpdumps from both members during failover (look for ARPs and CCP traffic). It's probably not needed, but you could also debug the ClusterXL load-balancing filter (I've done this before, but when I just looked for the syntax again I couldn't find it easily).
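    If you do grab captures, something like the following Python sketch can summarize them. It reads a classic libpcap file (what `tcpdump -w` writes), skips an 802.1Q tag if present, and counts ARPs, gratuitous ARPs (sender IP equal to target IP) and CCP packets. It assumes CCP runs over UDP port 8116 and an Ethernet link type; pcapng files, as written by newer Wireshark versions, are not handled.

```python
import struct
from collections import Counter

CCP_PORT = 8116  # assumption: ClusterXL CCP runs over UDP port 8116


def scan_pcap(data: bytes) -> Counter:
    """Count ARP, gratuitous ARP and CCP frames in a classic libpcap Ethernet capture."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):    # little-endian (usec / nsec)
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):  # big-endian (usec / nsec)
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError(f"expected Ethernet (linktype 1), got {linktype}")

    counts = Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, incl_len, orig_len; only incl_len is needed.
        (incl_len,) = struct.unpack(endian + "I", data[offset + 8:offset + 12])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14:
            continue
        ethertype = struct.unpack("!H", frame[12:14])[0]
        payload = frame[14:]
        if ethertype == 0x8100 and len(frame) >= 18:  # 802.1Q tag, common on trunks
            ethertype = struct.unpack("!H", frame[16:18])[0]
            payload = frame[18:]
        if ethertype == 0x0806 and len(payload) >= 28:
            counts["arp"] += 1
            if payload[14:18] == payload[24:28]:  # sender IP == target IP
                counts["gratuitous_arp"] += 1
        elif ethertype == 0x0800 and len(payload) >= 20 and payload[9] == 17:  # IPv4 / UDP
            ihl = (payload[0] & 0x0F) * 4
            if len(payload) >= ihl + 4:
                src_port, dst_port = struct.unpack("!HH", payload[ihl:ihl + 4])
                if CCP_PORT in (src_port, dst_port):
                    counts["ccp"] += 1
    return counts
```

    For example, capture on each member during a failback with `tcpdump -nni <interface> -w failback.pcap 'arp or udp port 8116'`, then run `print(scan_pcap(open("failback.pcap", "rb").read()))`. Comparing the counts from both members shows whether the returning member actually announced the VIP.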

    You should also check the integrity of sync with fw ctl pstat (or the suitable VSX command; I think pstat works in context 0).

    So, no answer from me :( but breadcrumbs to follow. Maybe someone who's had this issue can help with an answer rather than just advice.
    Last edited by MrSnakey; 2009-04-04 at 14:53. Reason: because the shitty forum stripped my carriage returns!
    --
    Mr Snakey
    Remember: Speculation does no-one any good.
    Visit http://www.snakeoilresearch.com
