Posted 2009-07-02 by ddurocher (Toronto, Canada)
Timeouts on Nokia cluster failover

I have an issue with a redundant firewall configuration. I am using two Nokia IP380 appliances, both running IPSO 4.0 and Check Point R60.

The basic design is shown below.

SVR1--SW1--FW1--SW4-->to other networks
    \/ |    |    |
    /\ |    |    |
SVR2--SW2--FW2--SW5-->to other networks

where,
SVR = Server
SW = Switch
FW = Firewall

Servers 1 and 2 use NIC teaming to connect to both Switch 1 and Switch 2.
There is a trunk between Switches 1 and 2, and another between Switches 4 and 5.
The MAC address table timeout is left at the default of 300 seconds on all switches.
The connected networks are all private, and no NAT is performed.

The problem is this: if FW1 is the master node, for example, and it fails (is powered off), Server 2 takes up to 5 minutes to recover, while Server 1 recovers immediately. I assume the delay comes from the 300-second MAC timeout, because if I clear the MAC tables on the switches, the connections that were timing out recover immediately. My guess is that SW2 keeps sending traffic for the cluster IP across the trunk to SW1, since that is what its MAC table still says. The mirror case also happens: if FW2 is the master, Server 1 is the one that has trouble reconnecting in a timely fashion.
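To illustrate the MAC-aging theory, here is a minimal Python sketch of a learning switch's address table. It is a simplified model, not how any particular switch is implemented; the port names, the cluster MAC and the timestamps are hypothetical. The key point it shows is that an entry is only refreshed by frames *sourced* from that MAC, so once the old master stops sending, the entry on SW2 pointing at the trunk survives until it ages out, unless the new master sends a frame that makes SW2 relearn it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

AGING_TIME = 300  # seconds; the switch default mentioned in the post


@dataclass
class LearningSwitch:
    aging_time: float = AGING_TIME
    # MAC address -> (port, time the entry was last refreshed)
    table: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def learn(self, src_mac: str, in_port: str, now: float) -> None:
        """Record or refresh the port on which a source MAC was last seen."""
        self.table[src_mac] = (in_port, now)

    def lookup(self, dst_mac: str, now: float) -> Optional[str]:
        """Port to forward to, or None (flood) if the MAC is unknown or aged out."""
        entry = self.table.get(dst_mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen >= self.aging_time:
            # Expired lazily on lookup here; real switches age entries on a timer.
            del self.table[dst_mac]
            return None
        return port


CLUSTER_MAC = "02:00:00:00:00:01"  # hypothetical cluster MAC

# Scenario A: FW1 was master; its last frame reached SW2 over the trunk at t=10,
# then FW1 powered off. Nothing refreshes the entry, so it stays stale.
sw2 = LearningSwitch()
sw2.learn(CLUSTER_MAC, "trunk-to-SW1", now=10)
print(sw2.lookup(CLUSTER_MAC, now=11))   # prints trunk-to-SW1 (stale entry)
print(sw2.lookup(CLUSTER_MAC, now=309))  # prints trunk-to-SW1 (still stale)
print(sw2.lookup(CLUSTER_MAC, now=310))  # prints None: aged out, frame is flooded

# Scenario B: same start, but the new master (FW2) sends a frame sourced from
# the cluster MAC at t=12 (assumed, e.g. a gratuitous ARP); SW2 relearns at once.
sw2 = LearningSwitch()
sw2.learn(CLUSTER_MAC, "trunk-to-SW1", now=10)
sw2.learn(CLUSTER_MAC, "port-to-FW2", now=12)
print(sw2.lookup(CLUSTER_MAC, now=13))   # prints port-to-FW2
```

In this model the worst case outage is roughly the aging time, which lines up with the "up to 5 minutes" I'm seeing, and clearing the table by hand has the same effect as scenario B.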

From my research, I understand that the Nokia box that takes control of the cluster should send out a gratuitous ARP to notify devices on the network of the change, so perhaps the ARP isn't being passed over the trunk ports. I'm also wondering whether our physical design is wrong and we should add cross links such as SW1 <-> FW2 and SW2 <-> FW1.
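For reference, a gratuitous ARP is an ordinary ARP packet in which the sender and target IP addresses are both the announcing host's own address, sent to the Ethernet broadcast address. The Python sketch below builds one as raw bytes. The MAC and IP are hypothetical placeholders, and whether IPSO uses the request form (opcode 1) or the reply form (opcode 2) is an assumption I haven't verified. Because the destination is broadcast, a switch floods it out every port in that VLAN, trunk ports included as long as the trunk carries the VLAN, and every switch it passes through relearns the source MAC on the port it arrived on.

```python
import socket
import struct

BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(octet, 16) for octet in mac.split(":"))


def gratuitous_arp(src_mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    mac = mac_bytes(src_mac)
    ip_b = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC as source (switches learn this)
    eth = BROADCAST + mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request (assumed; some stacks send a reply, opcode 2)
        mac,          # sender hardware address
        ip_b,         # sender protocol address
        b"\x00" * 6,  # target hardware address (ignored in a request)
        ip_b,         # target protocol address == sender: what makes it gratuitous
    )
    return eth + arp


# Hypothetical cluster MAC and cluster IP, for illustration only
frame = gratuitous_arp("02:00:00:00:00:01", "10.0.0.1")
print(len(frame))       # prints 42: 14-byte Ethernet header + 28-byte ARP payload
print(frame[:6].hex())  # prints ffffffffffff: broadcast, so it is flooded per VLAN
```

On the wire the NIC pads this to the 60-byte Ethernet minimum before adding the FCS. Capturing on a server port and on both ends of the trunk during a failover should show whether such a frame actually crosses the trunk.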

Has anyone ever seen an issue like this in a redundant network design?