Hi Folks,
I have a ClusterXL setup with 2 nodes, both Sun Fire V210s running Solaris 10 with R65. Both nodes are running with no functional problems; however, they both show constantly high CPU usage, approximately 70%-80% at all times (in SmartView Monitor). I ran prstat on the nodes and got the following results:
Code:
# prstat
   PID USERNAME  SIZE   RSS STATE PRI NICE      TIME  CPU PROCESS/NLWP
   622 root       63M   47M run    59    0   3:11:35 3.1% fw/3
   574 root       98M   84M sleep  59    0   1:35:48 1.3% cpd/4
 23830 root       19M 8656K run    52    0   0:00:00 0.6% fgate/1
 23587 root     8456K 4632K sleep  59    0   0:00:00 0.4% sshd/1
   684 root       15M 9168K sleep  59    0   0:21:29 0.3% dtls/1
   275 root     2288K  832K sleep  59    0   0:13:36 0.2% in.routed/1
 23869 root     3240K 2800K cpu0   59    0   0:00:00 0.1% prstat/1
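To check that I am not misreading the list, here is a rough Python sketch that adds up the CPU column of the rows pasted above. The helper name total_cpu_percent is my own, not part of prstat or any Check Point tool. It only covers the seven rows shown here, not every process on the box.

```python
# Rows copied from the prstat output above (header line left out).
PRSTAT_ROWS = """\
622 root 63M 47M run 59 0 3:11:35 3.1% fw/3
574 root 98M 84M sleep 59 0 1:35:48 1.3% cpd/4
23830 root 19M 8656K run 52 0 0:00:00 0.6% fgate/1
23587 root 8456K 4632K sleep 59 0 0:00:00 0.4% sshd/1
684 root 15M 9168K sleep 59 0 0:21:29 0.3% dtls/1
275 root 2288K 832K sleep 59 0 0:13:36 0.2% in.routed/1
23869 root 3240K 2800K cpu0 59 0 0:00:00 0.1% prstat/1
"""


def total_cpu_percent(text):
    """Sum the CPU% column (9th field) over prstat process rows."""
    total = 0.0
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a process row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        total += float(fields[8].rstrip("%"))
    return total


print("listed processes total: %.1f%%" % total_cpu_percent(PRSTAT_ROWS))
```

The listed processes add up to about 6%, so whatever is driving the 70%-80% figure is not showing up as ordinary per-process usage in this view.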
As far as I can see, no single process is using much CPU; however, my load averages do seem high:
Code:
load average: 1.88, 1.87, 1.89
Also, the output of vmstat 3 looks like this:
Code:
# vmstat 3
kthr  memory         page                       disk        faults         cpu
r b w    swap   free   re   mf pi  po  fr de sr s0 -- -- --   in    sy  cs us sy id
1 0 0 2562480 548144 1184 4596  1 295 295  0  0 22  0  0  0 1439 14086 616 35 52 13
2 0 0 2554920 535784  955 5467  0 508 508  0  0 31  0  0  0  526 17325 674 43 40 17
0 0 0 2556176 537040  948 5470  0 505 505  0  0 30  0  0  0  519 17097 663 45 40 15
1 0 0 2555048 534608  945 5362  0 252 252  0  0 26  0  0  0  482 16041 628 45 39 16
Nothing here looks out of the ordinary to me, but I can't seem to trace what is using the CPU (and the V210s are fairly meaty machines).
That being said, ps -A gives:
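To put numbers on the vmstat samples above, here is a small Python sketch that averages the last three columns (us, sy, id). The function name average_cpu and the skip_first option are my own additions. On Solaris the first vmstat line is a since-boot summary, so it can optionally be left out.

```python
# The four sample rows from the vmstat output above (headers left out).
VMSTAT_ROWS = """\
1 0 0 2562480 548144 1184 4596 1 295 295 0 0 22 0 0 0 1439 14086 616 35 52 13
2 0 0 2554920 535784 955 5467 0 508 508 0 0 31 0 0 0 526 17325 674 43 40 17
0 0 0 2556176 537040 948 5470 0 505 505 0 0 30 0 0 0 519 17097 663 45 40 15
1 0 0 2555048 534608 945 5362 0 252 252 0 0 26 0 0 0 482 16041 628 45 39 16
"""


def average_cpu(text, skip_first=False):
    """Return the mean (us, sy, id) percentages over vmstat data rows."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if skip_first:
        # Drop the first row, which is the since-boot summary on Solaris.
        rows = rows[1:]
    count = len(rows)
    user = sum(int(r[-3]) for r in rows) / count
    system = sum(int(r[-2]) for r in rows) / count
    idle = sum(int(r[-1]) for r in rows) / count
    return user, system, idle


user, system, idle = average_cpu(VMSTAT_ROWS)
print("us=%.2f sy=%.2f id=%.2f" % (user, system, idle))
```

Over the four rows this gives us=42.00, sy=42.75, id=15.25, so the machine is about 85% busy, with roughly half of that accounted as system (kernel) time rather than user time.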
Code:
PID TTY TIME CMD
0 ? 0:10 sched
1 ? 0:01 init
2 ? 0:00 pageout
3 ? 9:25 fsflush
7 ? 0:11 svc.star
9 ? 0:16 svc.conf
376 ? 0:02 syslogd
312 ? 0:00 cron
31 ? 0:00 fwboot
275 ? 13:37 in.route
316 ? 0:00 rpcbind
186 ? 0:59 nscd
329 console 0:00 sh
166 ? 0:00 sysevent
320 ? 0:00 sac
181 ? 0:00 kcfd
368 ? 0:00 sshd
323 ? 0:00 ttymon
341 ? 0:00 automoun
326 ? 0:09 inetd
23232 pts/1 0:00 ps
179 ? 1:57 picld
342 ? 0:01 automoun
394 ? 0:00 fwboot
338 ? 0:01 utmpd
371 ? 0:09 fmd
5217 ? 0:00 in.dhcpd
386 ? 0:01 sendmail
23185 ? 0:00 fgate
392 ? 0:18 sendmail
13429 ? 0:01 fwssd
542 ? 0:00 cprid
515 ? 0:00 cprid_wd
13430 ? 0:01 fwssd
23587 ? 0:00 sshd
770 ? 0:55 rtm
622 ? 191:53 fw
562 ? 1:08 cpwd
574 ? 95:56 cpd
620 ? 1:26 cphamcse
23230 ? 0:00 sh
678 ? 6:54 vpn
23231 ? 0:00 arp
733 ? 4:54 fgate
683 ? 2:19 dtps
684 ? 21:31 dtls
23784 pts/1 0:00 sh
23782 ? 0:00 sshd
There do seem to be some high CPU times there, but prstat says things like fw is only using 3% of the CPU. What am I missing?
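As I understand it, the TIME column in ps is cumulative CPU time since each process started, while the CPU column in prstat is recent usage, so the two are not directly comparable. To line them up, here is a minimal Python sketch that converts TIME values to seconds and ranks a few of the larger entries from the listing above. The helper cpu_time_seconds is my own name and only handles the mm:ss and h:mm:ss forms seen in this post.

```python
def cpu_time_seconds(value):
    """Convert a TIME value such as '191:53' or '3:11:35' to seconds."""
    seconds = 0
    for part in value.split(":"):
        # Each colon-separated field is worth 60 of the next one.
        seconds = seconds * 60 + int(part)
    return seconds


# A few of the larger TIME values from the ps -A listing above.
PS_TIMES = {
    "fw": "191:53",
    "cpd": "95:56",
    "dtls": "21:31",
    "in.route": "13:37",
    "fsflush": "9:25",
}

ranked = sorted(PS_TIMES.items(), key=lambda item: cpu_time_seconds(item[1]), reverse=True)
for name, value in ranked:
    print("%-10s %6d s" % (name, cpu_time_seconds(value)))
```

For fw, 191:53 from ps is 11513 seconds and 3:11:35 from prstat is 11495 seconds, so both tools report the same accumulated total. It is a running total over the life of the process, not a measure of how busy fw is right now.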
Any help in sorting out this rogue CPU usage would be appreciated. I'm the first to admit I'm not a Solaris expert!
Also, for reference, we have an identical V210 with Sol10/R62 in a non-clustered setup at another site that has 1% CPU usage, which is why I find these 2 at 80% odd. It's also why I have posted here in the clustering section, as I feel the clustering may have something to do with it!