View Full Version : High SI load which doesn't go down



iku899
2012-12-04, 11:27
Hello,
We have a 4600 appliance running R75.30.
Several weeks ago we started having a problem where the load on the gateway suddenly goes up and never returns to its usual level. When I look at the output of the "top" command, I see a very large value in the SI column.

The only way to recover (without restarting the firewall) is to disconnect the network connection for 10 seconds and then reconnect it. After that the firewall load returns to its usual level (about 20%). I have attached the output of the top command and the firewall load as reported via SNMP.

Thank you
Ivan

bmolnar
2012-12-04, 16:14
How busy is your firewall? What are the typical numbers for packet rate, connections, throughput, etc.? Do any of those increase when SI is high? Is there a running process that shows high CPU during this time? Is the firewall also the management server? When running 'top', hit the "1" key to expand the top portion into per-core lines, then post the output again along with the top few processes. If you want to see which CPU core those processes are assigned to, hit "f", then "j", then Enter.

iku899
2012-12-07, 03:28
Hello,
How busy is your firewall: I attached a picture (firewall_pu_cnn.JPG) where you can see that there is almost no traffic, yet SI keeps going up.
Running processes: see firewall_top1.JPG and firewall_top2.JPG; no process shows high CPU.
Byte throughput: see firewall_ByteThrough.JPG.

How I got back to "ordinary" numbers: for 10 seconds I disconnected the switch port that the firewall's internal interface is attached to. You can see the gap in firewall-gap.JPG.

Best Regards
Ivan

ShadowPeak.com
2012-12-07, 11:20
hi (hardware interrupts), si (software interrupts) and sy (system) all indicate CPU activity within the kernel, which is why you don't see a particular process eating the CPU. hi is the processing of hardware interrupts from things like your NICs, and sy is short-term execution of kernel code on behalf of processes, such as system calls. si is a bit complicated to explain, but basically it is deferred kernel work: a hardware interrupt handler does the bare minimum and schedules the rest (most network packet handling, for example, including driver code such as the INSPECT driver and IP) to run later as a software interrupt. It is not desirable for a piece of kernel code to monopolize a CPU for too long even if it has lots of work to do, because other code needs CPU access; the si mechanism ensures this, and it is also where a busy firewall kernel spends most of its time.
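
If you want to log these numbers over time rather than watching top, here is a minimal Python sketch that computes the same sy/hi/si percentages per core from two samples of /proc/stat. It assumes a Linux 2.6 or later kernel (where each cpuN line carries at least the user, nice, system, idle, iowait, irq and softirq counters) and a box where Python is available, which may not be the case on the appliance itself. The function names are mine and purely illustrative, not part of any Check Point tooling.

```python
import time

# Order of the first seven counters on each "cpuN" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_proc_stat(text):
    """Return {cpu_name: {field: ticks}} for the per-core lines of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines start with "cpu0", "cpu1", ...; the bare "cpu" line is the total.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cpus[parts[0]] = dict(zip(FIELDS, values))
    return cpus


def breakdown(before, after):
    """Percent of elapsed ticks each core spent in sy, hi and si between two samples.

    Later counters such as steal are ignored, so this is an approximation
    of what top displays.
    """
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        delta = {f: new[f] - old[f] for f in FIELDS}
        total = sum(delta.values())
        if total <= 0:
            continue
        result[cpu] = {
            "sy": 100.0 * delta["system"] / total,
            "hi": 100.0 * delta["irq"] / total,
            "si": 100.0 * delta["softirq"] / total,
        }
    return result


def sample(interval=5, path="/proc/stat"):
    """Take two samples `interval` seconds apart and print one line per core."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    pcts = breakdown(first, second)
    for cpu in sorted(pcts, key=lambda name: int(name[3:])):
        p = pcts[cpu]
        print("%s sy=%.1f%% hi=%.1f%% si=%.1f%%" % (cpu, p["sy"], p["hi"], p["si"]))
```

Calling sample() prints one line per core. A core whose si value stays high while sy and hi are low, with almost no traffic, would match the symptom described in this thread.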

The high si you're seeing is almost certainly something going on with Check Point's code in the kernel. If it is something inside the INSPECT driver itself, there is really no way to determine what it is doing without some serious kernel debugging coordinated with Check Point support. However, there is another Check Point kernel driver that I've seen cause issues like this: the rtm driver, which is for SmartView Monitor. Do you have the SmartView Monitor checkbox set on your firewall/cluster object? Try unchecking it and reinstalling policy. Another driver that could possibly be the culprit is qos. Try disabling that one as well if you have it set, but bear in mind that doing so will halt QoS enforcement. It may be some other kernel-implemented feature on your firewall causing it, but these two are the usual suspects.

Even so, the CPU load on your system is not excessive and there is plenty of headroom for normal operations or a burst of traffic. If your CPU were consistently north of 75% doing this, I'd be a little more worried.

iku899
2013-03-21, 07:51
Finally this problem is solved. The problem was in the "Application Control" blade. When I switched it off completely and restarted the whole machine (not only the Check Point products), the problem disappeared. Check Point then produced a patch: "fw1_wrapper_HOTFIX_FOXX_HF_019_003 build 983003004_1". Even the build number is important; previous builds did not fix the problem.

I have to mention Check Point technical support: it was by far the worst experience I have had with technical support for any product. We spent two months arguing over whether a 4600 appliance should be able to cope with 600 users. Without the help of the local Check Point representatives I would not have been able to persuade the support engineer. The whole time I was being pressed to buy a more powerful machine.

After patching our load is around 15%.

Best Regards
Ivan

alienbaby
2013-03-21, 11:47
Are you still running R75.30 or did you upgrade to a later version prior to applying this patch?

iku899
2013-03-21, 12:17
We are still on version R75.30.

Best Regards
Ivan