Posted 2008-04-18 by jdickson
HA Cluster problem - cluster members can't be active at same time

Hello,

I've got a strange problem with one of our Splat clusters, and I can't fathom why it's acting so weirdly.

The situation is this: we've got 2 Splat R62 servers running in HA in Broadcast New mode, and when both cluster members are up, only one of them is active at any one time while the other reports as down.

When I do a cphaprob list on the cluster member that's down, it reads:

Built-in Devices:

Device Name: Interface Active Check
Current state: problem

The active one reports OK for this device, and all other entries report OK on both members. When I do a cphaprob -a if on the one that's down, I get:

Required interfaces: 3
Required secured interfaces: 1

eth2 UP sync(secured), broadcast
eth1 UP non sync(non secured), broadcast (eth1.11)

On the one that's active:

eth0 UP non sync(non secured), broadcast
eth2 UP sync(secured), broadcast
eth1 UP non sync(non secured), broadcast (eth1.11)

Where eth0 is the external interface, eth2 the cross-over sync and eth1 the internal trunk interface.
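
To make the difference between the two listings explicit, here is a minimal Python sketch that parses the pasted text and compares the members. It assumes the output looks exactly like the lines quoted above; the function names (parse_cphaprob_if, missing_interfaces) are made up for illustration and are not part of any Check Point tool.

```python
import re

# Output pasted from this post (assumed to be representative of the real format).
DOWN_MEMBER = """
Required interfaces: 3
Required secured interfaces: 1

eth2 UP sync(secured), broadcast
eth1 UP non sync(non secured), broadcast (eth1.11)
"""

ACTIVE_MEMBER = """
eth0 UP non sync(non secured), broadcast
eth2 UP sync(secured), broadcast
eth1 UP non sync(non secured), broadcast (eth1.11)
"""


def parse_cphaprob_if(text):
    """Return (required_count_or_None, {interface_name: state}) from pasted output."""
    required = None
    interfaces = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Matches "Required interfaces: 3" but not "Required secured interfaces: 1".
        m = re.match(r"Required interfaces:\s*(\d+)", line)
        if m:
            required = int(m.group(1))
            continue
        # Matches lines such as "eth2 UP sync(secured), broadcast".
        m = re.match(r"(eth\d+)\s+(\S+)", line)
        if m:
            interfaces[m.group(1)] = m.group(2)
    return required, interfaces


def missing_interfaces(down_text, active_text):
    """List interfaces the active member reports that the down member does not."""
    _, down = parse_cphaprob_if(down_text)
    _, active = parse_cphaprob_if(active_text)
    return sorted(set(active) - set(down))


if __name__ == "__main__":
    required, listed = parse_cphaprob_if(DOWN_MEMBER)
    print("required:", required, "listed:", len(listed))
    print("missing on down member:", missing_interfaces(DOWN_MEMBER, ACTIVE_MEMBER))
```

With the listings from this post, the down member requires 3 interfaces but lists only 2, and eth0 is the one it does not report.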

If I reboot the one that's down, the situation reverses: the member that was up reverts to the state the other one was in (down, with the same cphaprob list entries), and the newly rebooted one becomes the active member.

I've checked the fwd.elg and message logs, and neither indicates any issues or explains why the status is changing from down to up.

I've checked the interfaces to make sure that there are no addresses configured the same, that they're all in the correct subnets, and that they correspond with the topology in the firewall cluster object (a few times, to make sure I've not done something stupid!).

The obvious indication is that there's a problem with the eth0 interface. The only other thing that could potentially be an issue is that there's another cluster on the same subnet as eth0, also running in broadcast mode, which appears to have no problems.

Previously this was all working correctly, until a recent upgrade from R60 to R62 during which there were problems with one of the cluster members and it had to be rebuilt. Ever since then this has been a problem.

Any help would be appreciated.

Regards,

John