
View Full Version : Per VS SNMP and CPU load



netw0rker
2014-06-02, 02:04
Hi,
Has anybody tried to enable per-VS SNMP on a VSX gateway with 41 virtual switches and 89 virtual systems?

I have set up a test system with R77.10 on an Open Server with 8 CPU cores, and after enabling per-VS SNMP the box gets busy:


top - 05:57:57 up 14:19, 2 users, load average: 10.50, 6.85, 3.52
Tasks: 1167 total, 8 running, 1158 sleeping, 0 stopped, 1 zombie
Cpu0 : 39.5%us, 23.0%sy, 0.0%ni, 36.5%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu1 : 41.1%us, 34.5%sy, 0.0%ni, 24.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 39.6%us, 4.0%sy, 0.0%ni, 56.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 47.0%us, 27.8%sy, 0.0%ni, 24.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 38.0%us, 6.9%sy, 0.0%ni, 54.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu5 : 43.1%us, 5.6%sy, 0.0%ni, 50.7%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu6 : 43.3%us, 5.9%sy, 0.0%ni, 49.8%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu7 : 38.7%us, 5.6%sy, 0.0%ni, 55.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 49313096k total, 15680368k used, 33632728k free, 407084k buffers
Swap: 33551744k total, 0k used, 33551744k free, 3756428k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10717 admin 25 0 11236 3024 2328 R 79 0.0 7:17.03 snmpd
5486 admin 15 0 1640 644 548 S 28 0.0 24:36.46 syslogd
11046 admin 16 0 31156 11m 8444 S 12 0.0 0:15.32 snmpd_10
11123 admin 16 0 31156 11m 8444 S 12 0.0 0:11.81 snmpd_36
22256 admin 16 0 31156 11m 8444 S 12 0.0 0:03.88 snmpd_103
20893 admin 16 0 31152 11m 8444 S 12 0.0 0:06.48 snmpd_77
22308 admin 16 0 31156 11m 8444 S 12 0.0 0:02.93 snmpd_111
11118 admin 15 0 31156 11m 8444 S 11 0.0 0:12.64 snmpd_34
14132 admin 16 0 31148 11m 8444 S 11 0.0 0:05.56 snmpd_59
22214 admin 15 0 31152 11m 8444 S 11 0.0 0:04.78 snmpd_94
19565 admin 15 0 31156 11m 8444 S 10 0.0 0:08.21 snmpd_74
11091 admin 16 0 31156 11m 8444 R 8 0.0 0:13.44 snmpd_21

There is no traffic passing the gateway and no SNMP requests either. The load is generated solely by a bunch of snmpd processes.
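If anyone wants to compare numbers on their own box, a quick way to total the CPU eaten by the per-VS daemons is to sum the snmpd entries from a ps listing. A rough sketch (the here-document carries sample values mirroring the top output above; on a live gateway, feed it from `ps -eo pcpu,comm` instead):

```shell
# Sum the %CPU of the parent snmpd plus all per-VS snmpd_<vsid> children.
# Sample data mirrors the top output above; live: ps -eo pcpu,comm | awk ...
awk '$2 ~ /^snmpd(_[0-9]+)?$/ { total += $1; n++ }
     END { printf "%d snmpd processes, %.0f%% CPU total\n", n, total }' <<'EOF'
79 snmpd
28 syslogd
12 snmpd_10
12 snmpd_36
11 snmpd_34
EOF
```

With 89 virtual systems there is one snmpd per VS, so even a few percent each adds up across the box.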

Rgds
netw0rker

jflemingeds
2014-06-02, 10:20

I've never been lucky enough to support VSX in a production environment, but I would also point out that syslog is going bonkers as well. I'm guessing snmpd is generating a huge number of events for some reason. Maybe there is something in the logs that explains what's going on? If not, maybe see if you can increase the syslog level.

netw0rker
2014-06-02, 13:42
Syslog runs normally again after I restart the service.

I tried starting the snmpd for VS 95 from the CLI with debug options:


[Expert@scpnek1:0]# /etc/snmp/vsx-proxy/CTX/95/snmpd_95 -D -Lo -f -C -c /etc/snmp/vsx-proxy/CTX/95/snmpd.user.conf,/etc/snmp/vsx-proxy/CTX/95/snmpd.local.conf /tmp/snmpd95_uds localhost | grep wrp
access:ipaddress:container: addr fe800000000000000212c1fffeed0068, index 405, pfx 64, scope 32, flags 0x80, name wrp3840
verbose:access:interface:ioctl: ioctl 35123 for wrp3840
access:ipaddress:container: addr fe800000000000000212c1fffeed004a, index 351, pfx 64, scope 32, flags 0x80, name wrp583
verbose:access:interface:ioctl: ioctl 35123 for wrp583
access:ipaddress:container: addr fe800000000000000212c1fffeed005b, index 286, pfx 64, scope 32, flags 0x80, name wrpj585
verbose:access:interface:ioctl: ioctl 35123 for wrpj585
access:ipaddress:container: addr fe800000000000000212c1fffeed010f, index 294, pfx 64, scope 32, flags 0x80, name wrpj8768
verbose:access:interface:ioctl: ioctl 35123 for wrpj8768
access:ipaddress:container: addr fe800000000000000212c1fffeed0086, index 313, pfx 64, scope 32, flags 0x80, name wrp4544
verbose:access:interface:ioctl: ioctl 35123 for wrp4544
access:ipaddress:container: addr fe800000000000000212c1fffeed000e, index 311, pfx 64, scope 32, flags 0x80, name wrp768
verbose:access:interface:ioctl: ioctl 35123 for wrp768
access:ipaddress:container: addr fe800000000000000212c1fffeed0097, index 166, pfx 64, scope 32, flags 0x80, name wrpj4992
verbose:access:interface:ioctl: ioctl 35123 for wrpj4992
access:ipaddress:container: addr fe800000000000000212c1fffeed0079, index 386, pfx 64, scope 32, flags 0x80, name wrpj4160
verbose:access:interface:ioctl: ioctl 35123 for wrpj4160
access:ipaddress:container: addr fe800000000000000212c1fffeed00d3, index 224, pfx 64, scope 32, flags 0x80, name wrpj6848
verbose:access:interface:ioctl: ioctl 35123 for wrpj6848
access:ipaddress:container: addr fe800000000000000212c1fffeed00f1, index 172, pfx 64, scope 32, flags 0x80, name wrpj7808
verbose:access:interface:ioctl: ioctl 35123 for wrpj7808
access:ipaddress:container: addr fe800000000000000212c1fffeed00a4, index 249, pfx 64, scope 32, flags 0x80, name wrp5440
verbose:access:interface:ioctl: ioctl 35123 for wrp5440
access:ipaddress:container: addr fe800000000000000212c1fffeed00c2, index 241, pfx 64, scope 32, flags 0x80, name wrp596
verbose:access:interface:ioctl: ioctl 35123 for wrp596

It seems that each snmpd enumerates ALL the interfaces of the cluster in a loop; the debug output produces some 400,000 events per second. And none of these interfaces belong to VS 95.
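An easy way to back that claim up is to count the distinct wrp/wrpj names that show up in the debug output: VS 95 only owns a handful of warp links, so a count in the hundreds means the daemon is walking the whole cluster's interface list. A sketch (the here-document holds abbreviated sample lines; in practice, pipe the snmpd_95 debug run in instead; the pattern assumes GNU grep):

```shell
# Count distinct interface names ("name wrpNNN" / "name wrpjNNN") in
# snmpd debug lines. Sample lines are abbreviated from the output above.
grep -o 'name wrpj\?[0-9]\+' <<'EOF' | sort -u | wc -l
access:ipaddress:container: addr fe80..., index 405, name wrp3840
verbose:access:interface:ioctl: ioctl 35123 for wrp3840
access:ipaddress:container: addr fe80..., index 351, name wrp583
access:ipaddress:container: addr fe80..., index 286, name wrpj585
EOF
```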

jflemingeds
2014-06-02, 14:23
Well.. that sounds terrible. Is that debug with only that one snmpd running, or is there still a parent snmpd running that doesn't have an SNMP VS attached to it? It's the highest CPU consumer in your top output.

Sounds like a bug either way. I'm guessing you're opening a ticket at this point?

netw0rker
2014-06-02, 14:32
All the other snmpd processes are still running; I killed only one and started it from the CLI to debug. But all the snmpd processes behave in the same way.

Ticket is already open...

jflemingeds
2014-06-02, 15:32

I'm thinking you should debug the parent snmpd process. My guess is Check Point is just going to do that anyway if it's not a known issue.

What version are you running, btw? Hopefully it will be an easy fix. Keep everyone in the loop so the other brave souls who support VSX know what to do.

netw0rker
2014-06-03, 11:36
Debug output from the parent SNMPD looks similar. The gateway is running R77.10 build 243.

netw0rker
2014-06-05, 13:56
Check Point wasn't able to fix the issue: with a lot of virtual systems and a lot of interfaces, you have to expect high CPU load (even on an idle gateway).

Maybe it will be fixed in a future version… It seems that with R77.10 this feature is not usable on large deployments. Case closed.
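For anyone hitting the same thing and just needing the load gone: per-VS SNMP is toggled from clish, so switching back to the default mode should stop the extra daemons. Commands from memory, so verify the exact syntax against the Gaia admin guide / the SK on SNMP in VS mode for your build:

```shell
# On the VSX gateway, from expert mode (syntax from memory, verify first):
clish -c "set snmp mode default"   # revert to a single snmpd
clish -c "save config"
# "set snmp mode vs" is what enables the per-VS daemons in the first place
```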

Is really nobody using per-VS SNMP on a large VSX gateway?

jflemingeds
2014-06-05, 14:04

Holy cow...

Is this a new install or are you evaluating firewalls?

Maybe Phoneboy can help you out?

netw0rker
2014-06-05, 14:59
It's a test to prepare the upgrade of an R67 cluster.

I will escalate the issue via a local SE anyway.