Hello,
Yesterday my HA pair of Check Point 5800s experienced an unexpected failover. I was able to retrieve the local message logs and have included them below. If I am reading them correctly, the sync interface went down. Today everything is back to normal, and I was able to do an admin failover to the other cluster member without any problems.
I did notice that the sync interfaces on both gateways are set to auto-negotiate and are currently running at 100 Mbps full duplex. Should I hard-code these sync interfaces to 1000 Mbps full duplex, or is there a reason they are at 100?
Do you think the low speed setting could have caused the failover? If not, do you have any other ideas?
Thank you.
CLUSTER MEMBER A
Nov 14 18:24:25 PROBLEM DR Enabled; Master To Slave [Problem]
Nov 14 18:11:00 2018 msgcu-intfw1 kernel: [fw4_0];fwh323_cpas_decide_mon_only: failed
Nov 14 18:11:32 2018 msgcu-intfw1 last message repeated 54 times
Nov 14 18:12:43 2018 msgcu-intfw1 last message repeated 63 times
Nov 14 18:14:11 2018 msgcu-intfw1 last message repeated 61 times
Nov 14 18:15:32 2018 msgcu-intfw1 last message repeated 11 times
Nov 14 18:16:37 2018 msgcu-intfw1 last message repeated 25 times
Nov 14 18:17:46 2018 msgcu-intfw1 last message repeated 17 times
Nov 14 18:18:47 2018 msgcu-intfw1 last message repeated 22 times
Nov 14 18:19:57 2018 msgcu-intfw1 last message repeated 2 times
Nov 14 18:22:02 2018 msgcu-intfw1 last message repeated 11 times
Nov 14 18:23:09 2018 msgcu-intfw1 kernel: [fw4_0];fwh323_cpas_decide_mon_only: failed
Nov 14 18:23:25 2018 msgcu-intfw1 last message repeated 9 times
Nov 14 18:24:08 2018 msgcu-intfw1 kernel: igb: Sync NIC Link is Down
Nov 14 18:24:10 2018 msgcu-intfw1 kernel: [fw4_1];FW-1: fwha_process_state_msg: Update state of member id 1 to FAILURE due to the member report message
Nov 14 18:24:10 2018 msgcu-intfw1 kernel: [fw4_1];FW-1: fwha_update_state: ID 1 (state STANDBY -> FAILURE) (time 91367.8)
CLUSTER MEMBER B
Nov 14 18:24:08 2018 msgcu-intfw2 kernel: [fw4_1];fwha_report_id_problem_status: Try to update state to FAILURE due to pnote Interface Active Check (desc Sync interface is down, 7 interfaces required, only 6 up)
Nov 14 18:24:08 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_set_new_local_state: Setting state of fwha_local_id(1) to FAILURE
Nov 14 18:24:08 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_update_local_state: Local machine state changed to FAILURE
Nov 14 18:24:08 2018 msgcu-intfw2 kernel: [fw4_1];fwha_state_change_implied: Try to update state to ACTIVE because member is down (the change may not be allowed).
Nov 14 18:24:08 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_update_local_state: Local machine state changed to FAILURE
Nov 14 18:24:10 2018 msgcu-intfw2 kernel: [fw4_1];fwha_state_change_implied: Try to update state to ACTIVE because member is down (the change may not be allowed).
Nov 14 18:24:10 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_update_local_state: Local machine state changed to FAILURE
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: igb: Sync NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_0];FW-1: State synchronization is in risk. Please examine your synchronization network to avoid further problems !
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_0];FW-1: Please refer to documentation for details on this issue. Any change must be applied to ALL cluster members
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwldbcast_recv: delta sync connection with member 0 was lost and regained.2748 updates were lost.
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwldbcast_recv: received sequence 0x9e9ab6 (fragm 0, index 1), last processed seq 0x9e8ff9
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];fwha_report_id_problem_status: Try to update state to ACTIVE due to pnote Interface Active Check (desc <NULL>)
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_set_new_local_state: Setting state of fwha_local_id(1) to STANDBY
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_update_local_state: Local machine state changed to STANDBY
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: recv(header) returns 0
Nov 14 18:24:25 2018 msgcu-intfw2 last message repeated 6 times
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_process_state_msg: Update state of member id 0 to FAILURE due to the member report message
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];fwha_set_backup_mode: Try to update local state to ACTIVE because of ID 0 is not ACTIVE or READY. (This attempt may be blocked by other machines)
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_set_new_local_state: Setting state of fwha_local_id(1) to READY
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_update_local_state: Local machine state changed to READY
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_update_state: ID 0 (state ACTIVE -> FAILURE) (time 28187.9)
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];member 1 (172.25.2.1) is down
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_state_change_implied: Try to update local state from READY to ACTIVE because all other machines confirmed my READY state
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_set_new_local_state: Setting state of fwha_local_id(1) to ACTIVE
Nov 14 18:24:25 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_update_local_state: Local machine state changed to ACTIVE
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: recv(header) returns 0
Nov 14 18:24:25 2018 msgcu-intfw2 last message repeated 27 times
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11627]: recv(header) returns 0
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: entering cpcl_vrf_master_init()
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: leaving cpcl_master_init()
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: cpcl_vrf_master_listen_accept(6294): entering cpcl_vrf_master_listen_accept
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: cpcl_vrf_master_listen_accept(6383): leaving cpcl_vrf_master_listen_accept
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: cpcl_vrf_recv_from_instance_manager(6109): instance 0 entering cpcl_vrf_recv_from_instance_manager
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: cpcl_vrf_recv_from_instance_manager(6166): instance 0 received fd 26
Nov 14 18:24:25 2018 msgcu-intfw2 routed[11633]: cpcl_vrf_recv_from_instance_manager(6267): instance 0 leaving cpcl_vrf_recv_from_instance_manager
Nov 14 18:24:32 2018 msgcu-intfw2 kernel: [fw4_0];fwh323_cpas_decide_mon_only: failed
Nov 14 18:24:42 2018 msgcu-intfw2 last message repeated 14 times
Nov 14 18:24:42 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_process_state_msg: Update state of member id 0 to STANDBY due to the member report message
Nov 14 18:24:42 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: fwha_update_state: ID 0 (state FAILURE -> STANDBY) (time 28205.1)
Nov 14 18:24:44 2018 msgcu-intfw2 kernel: [fw4_0];fwh323_cpas_decide_mon_only: failed
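In case it helps with reading the excerpts above, here is a rough Python sketch that pulls only the link and local state changes out of lines in this format. It is not a Check Point tool. The function name `extract_events` and the regular expressions are my own assumptions based on the line layout shown above, and the sample lines are copied from the logs in this post.

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpts above, e.g.
# "Nov 14 18:24:08 2018 msgcu-intfw1 kernel: igb: Sync NIC Link is Down"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) (?P<host>\S+) (?P<msg>.*)$"
)
LINK_RE = re.compile(r"NIC Link is (Up|Down)")
STATE_RE = re.compile(r"Local machine state changed to (\w+)")


def extract_events(lines):
    """Return (timestamp, host, event) tuples for link and local state changes."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headings and lines without a year field
        ts = datetime.strptime(m.group("ts"), "%b %d %H:%M:%S %Y")
        msg = m.group("msg")
        link = LINK_RE.search(msg)
        state = STATE_RE.search(msg)
        if link:
            events.append((ts, m.group("host"), "link " + link.group(1)))
        elif state:
            events.append((ts, m.group("host"), "state " + state.group(1)))
    return events


# Three lines copied from the logs in this post.
sample = [
    "Nov 14 18:24:08 2018 msgcu-intfw1 kernel: igb: Sync NIC Link is Down",
    "Nov 14 18:24:08 2018 msgcu-intfw2 kernel: [fw4_1];FW-1: "
    "fwha_update_local_state: Local machine state changed to FAILURE",
    "Nov 14 18:24:25 2018 msgcu-intfw2 kernel: igb: Sync NIC Link is Up "
    "100 Mbps Full Duplex, Flow Control: RX/TX",
]

events = extract_events(sample)
for ts, host, event in events:
    print(ts.time(), host, event)

# Time between the first and last extracted event in the sample.
gap = (events[-1][0] - events[0][0]).total_seconds()
print("seconds between link Down and link Up:", gap)
```

Note that the Down message comes from member A and the Up message from member B, so the gap is only an approximation of how long the sync link was actually down.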