All Edge firewalls rebooted 10/30/2010 8:58 p.m.



greyfeld
2010-10-31, 01:19
All of our Edge firewalls (30+) started to lose connectivity and rebooted on their own on October 30th around 8:58 p.m. CDT. There are reports that this is a worldwide issue. Anyone else having this problem? Can't find a thing about it on Check Point's site. The bigwigs want answers about the downtime. Does anyone know what the cause is, or whether it is going to happen again? Here are a couple of recent blog posts about it.

Hurricane Labs Engineering Notes: Edge Box Reboots (http://blog.hurricanelabs.com/2010/10/edge-box-reboots.html)
CheckPoint/Sofaware FlashForward Jack of All I.T. (http://jackofallit.wordpress.com/2010/10/30/checkpointsofaware-flashforward/)

Sorry, posted in the wrong forum the first time. Been up too long trying to figure out what is going on.

marklar
2010-10-31, 05:21
Just got this from SofaWare:


Dear marklar,
Hello
I would like to update you about an issue that has been raised by the field.
Last night we started to get reports that Edge and Safe@Office units were unreachable, whether or not they were managed.

The issue caused the Edge/Safe@Office units to go to 100% CPU and reboot a few times.
The issue seems to be behind us now, as it occurred only while the clock on the unit changed from Oct 30 to Oct 31.
We are currently working on understanding what caused the Edge/Safe@Office units to act that way.
I will send another update later today.


I ended up doing a factory reset and waiting a few hours; it seems OK for now...

m.

Trevor Rowley
2010-10-31, 05:57
Hi

We run around 300 Edge devices of which roughly half are the new N series. A quick check this morning shows they all rebooted at 2am UK time which coincides exactly with the clocks going backwards in the UK.

We have two boxes that have not come up after the reboot (one X and one NW), but they may work once staff are onsite to power cycle them. Most of our offices are closed on Sundays.

Trevor

Update: Our logs clearly show that these boxes all started to power cycle every few minutes at 1am. The clocks went back at 2am, and the boxes carried on power cycling for another hour. Once we had passed 2am for the second time, they all stopped. All our boxes are on 8.x firmware.
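Trevor's local timeline is easier to compare with reports from other time zones once it is converted to UTC. A minimal sketch, assuming "started at 1am" means 01:00 BST (before the change) and "passed 2am for the second time" means 02:00 GMT (after it); it uses fixed offsets (BST = UTC+1, GMT = UTC+0) instead of a time-zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed UK offsets around the 2010-10-31 change (assumption: no tz database needed).
BST = timezone(timedelta(hours=1), "BST")  # British Summer Time, before clocks went back
GMT = timezone(timedelta(hours=0), "GMT")  # Greenwich Mean Time, after clocks went back

# Power cycling started at 01:00 local time, still on BST.
start_local = datetime(2010, 10, 31, 1, 0, tzinfo=BST)
# It stopped once 02:00 local had passed for the second time, now on GMT.
end_local = datetime(2010, 10, 31, 2, 0, tzinfo=GMT)

start_utc = start_local.astimezone(timezone.utc)
end_utc = end_local.astimezone(timezone.utc)

print("start:", start_utc.isoformat())    # 2010-10-31T00:00:00+00:00
print("end:  ", end_utc.isoformat())      # 2010-10-31T02:00:00+00:00
print("duration:", end_utc - start_utc)   # 2:00:00
```

The wall clock only advanced from 01:00 to 02:00, but two real hours elapsed because the 01:00 to 02:00 hour was repeated.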

apachepro
2010-10-31, 08:11
I work for an ISP/MSS and also heard from our clients that all their Edges rebooted during the night. All boxes returned to normal after the restart. Funny :) Thanks, Check Point, for providing me with a catchy headline for my blog :) The D-day for CheckPoint UTM-1 Edge Appliances happened today – reboots are reported all over the world (http://yurisk.info/2010/10/31/the-d-day-for-checkpoint-utm-1-edge-appliances-happened-today-reboots-are-reported-all-over-the-world/)

ShadowPeak.com
2010-10-31, 13:53
The United States will "fall back" an hour next weekend. Hopefully this isn't a precursor to what will happen to our Edge units as well...

PhoneBoy
2010-11-01, 00:09
Over the weekend, Check Point received a number of reports about this issue with EDGE appliances. While the issue is still being analyzed, the initial findings suggest it is time and date specific and is not expected to happen again. More details will be provided once they are known.

abusharif
2010-11-01, 04:30
*sigh* Can confirm this as well :(
Of around 180 units, 60-something didn't come up on their own.

Why isn't there any official warning or information on the Check Point support site at the time of writing?
The only thing received so far is an e-mail from SofaWare.
Or maybe I missed it?

aritz
2010-11-01, 09:57
The Edge devices that rebooted during the weekend did so because of a time- and date-specific issue, and it will not happen again. Make sure the clock on your Edge device is set properly: if it is set to an earlier date, you will face this issue when its clock reaches Oct 30.

abusharif
2010-11-01, 15:10
So, almost two days and still no official information, entry, or acknowledgment about this on the User Center/support pages.
I guess Edge as a product is either not that interesting from CP's point of view, or the bug is that embarrassing to post about ;-)

lammbo
2010-11-01, 15:23
bug is that embarrassing to post about ;-)

I guess this....

greyfeld
2010-11-01, 17:27
I did get a call from Check Point this afternoon confirming that it was a time/date error in their code, but not much else. I was told that they should be releasing a KB article on the episode later today. I'm surprised that the major security news outlets have not reported on this as it had to have affected many companies/users worldwide.

We did some testing in our lab with spare Edges and were able to duplicate the issue by setting the clock back to 10/30/2010, 7:00 p.m. CDT. They started flaking out shortly after we reset the time, with a final reboot just before 9:00 p.m. CDT. We have also set one ahead to the Saturday before next Sunday's Daylight Saving Time change in the US. So far that one has not crashed, but it hasn't reached the time change just yet. I'll let you know if we encounter any issues with our testing.
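Converting the CDT times from the lab test and from the original production reboots to UTC makes them comparable with the European reports. A minimal sketch, assuming CDT = UTC-5 (US clocks did not change until November 7, 2010) and treating "just before 9:00 p.m." as an upper bound of 21:00.

```python
from datetime import datetime, timedelta, timezone

# US Central Daylight Time: UTC-5, still in effect on the evening of 2010-10-30.
CDT = timezone(timedelta(hours=-5), "CDT")

# Local times taken from the posts in this thread.
events_cdt = {
    "lab clock set to": datetime(2010, 10, 30, 19, 0, tzinfo=CDT),
    "production reboots": datetime(2010, 10, 30, 20, 58, tzinfo=CDT),
    "lab final reboot by": datetime(2010, 10, 30, 21, 0, tzinfo=CDT),
}

# Convert every event to UTC so it can be compared with reports from other zones.
events_utc = {label: t.astimezone(timezone.utc) for label, t in events_cdt.items()}

for label, t in events_utc.items():
    print(f"{label:<20} {events_cdt[label]:%Y-%m-%d %H:%M %Z} -> {t:%Y-%m-%d %H:%M} UTC")
```

All three land between 00:00 and 02:00 UTC on October 31: the lab clock was set to exactly midnight UTC, which fits SofaWare's note that the problem occurred while the unit's clock changed from Oct 30 to Oct 31. Trevor's UK window (1am BST to 2am GMT) also maps to 00:00 to 02:00 UTC. This suggests, but does not prove, that the trigger is tied to a UTC instant rather than to local time.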

greyfeld
2010-11-01, 17:37
I just looked again and the following KB article on the episode has been posted.

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk56641&js_peid=P-114a7ba5fd7-10001

Basically: yes, they rebooted, it was a bug, and it won't happen again (at least not before the units reach their expected end of support in 2015; that part is my addition).

PhoneBoy
2010-11-01, 20:47
The reason nothing was posted on User Center before now was we wanted to ensure we had the facts correct.

For everyone in the Americas who is concerned that this problem will affect them next weekend as well: the issue has nothing to do with Daylight Saving Time. The fact that this bug occurred around the Daylight Saving Time switchover in Europe was coincidental.
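PhoneBoy's point can be sanity-checked from the calendar. The sketch below computes the 2010 fall-back dates from the published rules (EU: last Sunday in October; US, since 2007: first Sunday in November) using only the standard library. The helper names `last_sunday` and `first_sunday` are made up for this example.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday(): Monday == 0 ... Sunday == 6


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    # Find the last day of the month, then step back to the nearest Sunday.
    if month == 12:
        last_day = date(year, 12, 31)
    else:
        last_day = date(year, month + 1, 1) - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() - SUNDAY) % 7)


def first_sunday(year: int, month: int) -> date:
    """Return the first Sunday of the given month."""
    first_day = date(year, month, 1)
    return first_day + timedelta(days=(SUNDAY - first_day.weekday()) % 7)


eu_fall_back = last_sunday(2010, 10)   # EU rule: last Sunday in October
us_fall_back = first_sunday(2010, 11)  # US rule (since 2007): first Sunday in November

print("EU clocks went back:", eu_fall_back)                  # 2010-10-31
print("US clocks go back:  ", us_fall_back)                  # 2010-11-07
print("gap:", (us_fall_back - eu_fall_back).days, "days")    # 7 days
```

The two dates are a week apart, yet US units running on CDT failed on the night of October 30, the same night as the European ones. If the DST switch itself were the trigger, US units would have been expected to fail on November 7 instead.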

aritz
2010-11-02, 02:28
There is sk56641, which explains the issue in more detail.

boldin
2010-11-02, 12:52
Checkpoint UTM-1 edge VPN boxes worldwide did an unscheduled reboot (http://isc.sans.edu/diary.html?storyid=9862)

hotice_
2010-11-02, 14:16
I knew I wasn't crazy when our monitoring system went Defcon 5 on our dozens of monitored Edges on Saturday night...

greyfeld
2010-11-02, 16:24
We were able to easily duplicate the problem in the lab by setting a couple of Edges back to Oct. 30th. We also tested setting the time ahead to this weekend's Daylight Saving Time adjustment in the US, and nothing happened as far as we could tell.

Barry J. Stiefel
2010-11-02, 17:27
It looks like this thread is becoming a bit of an Internet star. It's already picked up 25 linkbacks and we smashed our old traffic record. We've had 6,333 unique visitors in the past 24 hours.

I'm surprised we haven't seen more about this in the mainstream media, especially given that every single Check Point Edge appliance in the world shut down simultaneously. Isn't this a pretty catastrophic failure? Maybe between the U.S. baseball World Series (the San Francisco Giants won last night, thank you very much) and today's U.S. election, this story is getting crowded out. I haven't had any interview requests yet...

Maybe some very large, very powerful customers will get pissed off enough so that Check Point will get motivated to spend some resources on QA.

ShadowPeak.com
2010-11-02, 21:34
Check Point's competitors certainly have picked up plenty of anti-Check Point fodder for their sales presentations due to this incident and the Zeus Trojan scareware popup in ZoneAlarm about a month and a half ago. If something like this latest incident occurred on their high-end firewalls that run SecurePlatform/Linux or Nokia IPSO it would be truly disastrous.

serlud
2010-11-03, 04:19
Sometimes it is better to invest some amount of money in QA and TAC than to lose them (money, customers...)...

boldin
2010-11-03, 07:50
Sometimes it is better to invest some amount of money in QA and TAC than to lose them (money, customers...)...

Checkpoint Systems falls 14% on results, forecast - MarketWatch (http://www.marketwatch.com/story/checkpoint-systems-falls-14-on-results-forecast-2010-11-02?siteid=rss)

Good for us value investors. CP isn't going anywhere anytime soon; pick some up at 14% off today!

serlud
2010-11-03, 08:07
Good for us value investors. CP isn't going anywhere anytime soon; pick some up at 14% off today!

Don't worry, CP QA has already implemented several 32-bit counters in CP SW & HW, just for value investors like you..

.mark
2010-11-03, 08:36
Sometimes it is better to invest some amount of money in QA and TAC than to lose them (money, customers...)...

Checkpoint Systems falls 14% on results, forecast - MarketWatch (http://www.marketwatch.com/story/checkpoint-systems-falls-14-on-results-forecast-2010-11-02?siteid=rss)

Checkpoint Systems (CKP) and Check Point Software (CHKP) are different companies.

I think the reason this has been widely ignored is that the systems all reset on their own. VPNs went down, and came back up within an hour or so, near midnight (US) on a weekend. If it had been noon on a Tuesday, though....

abusharif
2010-11-03, 08:44
........ is that the systems all reset on their own.....
Not true, some needed to be power cycled.

serlud
2010-11-03, 10:20
Not true, some needed to be power cycled.

It seems that CP cannot replicate this behavior in the lab..
We also have several VPN-1 Edges that were not reachable after their *self*-reboot.

boldin
2010-11-04, 08:09
Actually, they could market this as a "feature" so that all systems that have been up for a long period of time will auto-magically reboot themselves every 13.6 years as a "scheduled maintenance."
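boldin's 13.6-year figure (and serlud's jab about 32-bit counters) matches what you get from a 32-bit counter that ticks ten times per second: 2^32 ticks × 0.1 s ≈ 429.5 million seconds ≈ 13.6 years. The sketch below does that arithmetic and, purely as an illustration, assumes a hypothetical counter of tenths of a second starting at the Unix epoch (1970-01-01 UTC) to list when it would wrap. Nothing in this thread confirms that the Edge firmware keeps time this way; the tick rate, width, and epoch are all assumptions.

```python
from datetime import datetime, timedelta, timezone

TICKS_PER_SECOND = 10                  # hypothetical: counter advances every 0.1 s
COUNTER_BITS = 32                      # hypothetical: stored as a 32-bit unsigned integer
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, fine for an estimate

wrap_seconds = 2**COUNTER_BITS / TICKS_PER_SECOND
print(f"counter wraps every {wrap_seconds:,.1f} s "
      f"= {wrap_seconds / SECONDS_PER_YEAR:.2f} years")

# Hypothetical epoch: the counter reads 0 at the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def wrap_instants(count: int) -> list[datetime]:
    """UTC instants at which the hypothetical counter rolls over from 2**32 - 1 to 0."""
    # Integer arithmetic in milliseconds avoids floating-point drift.
    return [
        EPOCH + timedelta(milliseconds=n * 2**COUNTER_BITS * 1000 // TICKS_PER_SECOND)
        for n in range(1, count + 1)
    ]


for n, when in enumerate(wrap_instants(3), start=1):
    print(f"wrap #{n}: {when:%Y-%m-%d %H:%M:%S} UTC")
```

Under those assumptions the third wrap falls at 01:56:28.8 UTC on October 31, 2010 (8:56 p.m. CDT on October 30), a couple of minutes before the reboots reported at the top of the thread. That is a coincidence worth noting, not a cause confirmed by Check Point.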