| CPUG | |
| The Check Point User Group | |
| A Resource For The Check Point Community. Fast. Useful. Independent. | |
| Yesterday I had a total failure of a two-node cluster running SPLAT. It appears that after the first member crashed, the standby followed within one minute. The cluster is VSX R60 HFA01 and is hot-standby only, with a total of 8 customer VSs serving about 4000 users. The cluster required a power cycle to recover; the console showed a blank screen and did not respond to the keyboard. These are Dell 2850s with an Intel Pro1000 quad card, using the on-board Intel NICs for sync and management. Layer 3 only, no bridging. The only thing I've found so far is a log message stating the cluster is under high load and will stop logging to the CLM. It looks like the largest customer was pushing a steady 15k connections, and throughput was less than 300 Mbps. Check Point hasn't said much yet. Has anyone seen anything like this using SPLAT and VSX? |
| |||
| I am really having some issues with my R60 VSX clusters. Since Feb 2008 I've had a total of 7 cluster crashes. One cluster seems to crash every 7 days. This particular cluster doesn't have a high load at all; it processes about 9 million connections per day. Nothing ever shows up in the messages file before the crash, and the first attempt at getting a stack trace failed: the failed cluster member would not respond to a laptop on the console. The only thing I've found is a lot of high-load messages in the log files of each VS, about stopping log forwarding due to high load. Neither top nor cpstat os -f cpu shows a high load, and the MDS and MLM are not loaded at all. Check Point has not seen this before and needs a stack trace to debug the issue. The hardware is Dell 2850s running R60 VSX. Any insight on this would really be helpful. |
| |||
| I thought it would be good to add a follow-up on this. I manage four VSX clusters, all on identical Dell 2850s and built with R60 VSX and a few HFAs. They all crashed multiple times with no clear pattern to chart. The crashes were so bad that the cluster members required a power cycle to restart the cluster. Toward the end (meaning just before the upgrade to R65) one cluster was crashing every 5 to 6 days. Check Point was clueless, and because we could not get a stack trace they were not helpful beyond saying we had to upgrade to R65. So I upgraded to R65 without any issues, and the clusters have not crashed since. |