2007-09-24
GordonCopestake
Solaris10/R65 ClusterXL setup with high CPU usage?

Hi Folks,
I have a ClusterXL setup with 2 nodes, both Sun Fire V210s running Solaris 10 with R65. Both nodes are running with no functional problems; however, they both show constantly high CPU usage, approximately 70%-80% at all times (in SmartView Monitor). I ran prstat on the nodes and got the following results:

Code:
# prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   622 root       63M   47M run     59    0   3:11:35 3.1% fw/3
   574 root       98M   84M sleep   59    0   1:35:48 1.3% cpd/4
 23830 root       19M 8656K run     52    0   0:00:00 0.6% fgate/1
 23587 root     8456K 4632K sleep   59    0   0:00:00 0.4% sshd/1
   684 root       15M 9168K sleep   59    0   0:21:29 0.3% dtls/1
   275 root     2288K  832K sleep   59    0   0:13:36 0.2% in.routed/1
 23869 root     3240K 2800K cpu0    59    0   0:00:00 0.1% prstat/1
As far as I can see, no single process is using an excessive amount of CPU; however, my load averages do seem high:

Code:
load average: 1.88, 1.87, 1.89
Also, vmstat 3 looks like this:

Code:
# vmstat 3
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 -- -- --   in   sy   cs us sy id
 1 0 0 2562480 548144 1184 4596 1 295 295 0 0 22 0 0 0 1439 14086 616 35 52 13
 2 0 0 2554920 535784 955 5467 0 508 508 0 0 31 0 0  0  526 17325 674 43 40 17
 0 0 0 2556176 537040 948 5470 0 505 505 0 0 30 0 0  0  519 17097 663 45 40 15
 1 0 0 2555048 534608 945 5362 0 252 252 0 0 26 0 0  0  482 16041 628 45 39 16
Nothing here looks out of the ordinary, but I can't seem to trace what is using the CPU (and the V210s are fairly meaty machines).
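To put a number on the vmstat samples above, here is a minimal Python sketch that averages the last three columns (us, sy, id). It only re-reads the figures already posted; the function name `average_cpu` is made up for illustration and nothing here was run on the firewalls themselves.

```python
# vmstat samples copied from the output above (header lines omitted).
VMSTAT_SAMPLES = """\
 1 0 0 2562480 548144 1184 4596 1 295 295 0 0 22 0 0 0 1439 14086 616 35 52 13
 2 0 0 2554920 535784 955 5467 0 508 508 0 0 31 0 0  0  526 17325 674 43 40 17
 0 0 0 2556176 537040 948 5470 0 505 505 0 0 30 0 0  0  519 17097 663 45 40 15
 1 0 0 2555048 534608 945 5362 0 252 252 0 0 26 0 0  0  482 16041 628 45 39 16
"""


def average_cpu(samples: str) -> dict:
    """Average the trailing us/sy/id columns over all sample rows."""
    rows = [line.split() for line in samples.splitlines() if line.strip()]
    # The CPU breakdown is the last three whitespace-separated fields.
    columns = {"us": -3, "sy": -2, "id": -1}
    return {
        name: sum(int(row[index]) for row in rows) / len(rows)
        for name, index in columns.items()
    }


if __name__ == "__main__":
    for name, value in average_cpu(VMSTAT_SAMPLES).items():
        print(f"{name}: {value:.2f}%")
```

Over these four samples the split is roughly even between user and system time, with about 15% idle, which is in the same range as the 70%-80% that SmartView Monitor reports.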

That being said, ps -A gives:

Code:
   PID TTY         TIME CMD
     0 ?           0:10 sched
     1 ?           0:01 init
     2 ?           0:00 pageout
     3 ?           9:25 fsflush
     7 ?           0:11 svc.star
     9 ?           0:16 svc.conf
   376 ?           0:02 syslogd
   312 ?           0:00 cron
    31 ?           0:00 fwboot
   275 ?          13:37 in.route
   316 ?           0:00 rpcbind
   186 ?           0:59 nscd
   329 console     0:00 sh
   166 ?           0:00 sysevent
   320 ?           0:00 sac
   181 ?           0:00 kcfd
   368 ?           0:00 sshd
   323 ?           0:00 ttymon
   341 ?           0:00 automoun
   326 ?           0:09 inetd
 23232 pts/1       0:00 ps
   179 ?           1:57 picld
   342 ?           0:01 automoun
   394 ?           0:00 fwboot
   338 ?           0:01 utmpd
   371 ?           0:09 fmd
  5217 ?           0:00 in.dhcpd
   386 ?           0:01 sendmail
 23185 ?           0:00 fgate
   392 ?           0:18 sendmail
 13429 ?           0:01 fwssd
   542 ?           0:00 cprid
   515 ?           0:00 cprid_wd
 13430 ?           0:01 fwssd
 23587 ?           0:00 sshd
   770 ?           0:55 rtm
   622 ?         191:53 fw
   562 ?           1:08 cpwd
   574 ?          95:56 cpd
   620 ?           1:26 cphamcse
 23230 ?           0:00 sh
   678 ?           6:54 vpn
 23231 ?           0:00 arp
   733 ?           4:54 fgate
   683 ?           2:19 dtps
   684 ?          21:31 dtls
 23784 pts/1       0:00 sh
 23782 ?           0:00 sshd
There do seem to be some high CPU times there, but prstat reports that processes such as fw are using only about 3% of the CPU. What am I missing?
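Part of the apparent mismatch may simply be that the TIME column is cumulative CPU time since the process started, while the prstat CPU column is a recent percentage. A small Python sketch, using only the figures posted above (the helper name `cpu_time_seconds` is hypothetical), converts the TIME values to seconds and adds up the per-process percentages:

```python
def cpu_time_seconds(value: str) -> int:
    """Convert a TIME field such as '191:53' or '3:11:35' to seconds."""
    seconds = 0
    for part in value.split(":"):
        # Each further field is a finer unit: hours -> minutes -> seconds.
        seconds = seconds * 60 + int(part)
    return seconds


# CPU percentages copied from the prstat output above.
PRSTAT_CPU = {
    "fw": 3.1, "cpd": 1.3, "fgate": 0.6, "sshd": 0.4,
    "dtls": 0.3, "in.routed": 0.2, "prstat": 0.1,
}

if __name__ == "__main__":
    # fw as reported by ps -A and by prstat, both cumulative since start.
    print("fw (ps):    ", cpu_time_seconds("191:53"), "s")
    print("fw (prstat):", cpu_time_seconds("3:11:35"), "s")
    print("sum of prstat CPU: %.1f%%" % sum(PRSTAT_CPU.values()))
```

The two fw figures differ by only 18 seconds, so ps and prstat agree on the accumulated time. The listed processes add up to about 6%, far below the roughly 85% non-idle time in vmstat, so most of the load is not being attributed to any of the processes shown. Where that time actually goes is not something these numbers can show.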

Any help in sorting out this rogue CPU usage would be appreciated. I'm the first to admit I'm not a Solaris expert!

Also, for reference, we have an identical V210 with Sol10/R62 in a non-clustered setup at another site that has 1% CPU usage, which is why I find these 2 at 80% odd. It's also why I have posted here in the clustering section, as I suspect the clustering may have something to do with it!

Last edited by GordonCopestake; 2007-09-24 at 01:37. Reason: Minor typos corrected