2006-01-03
jobroco
Splat R60 gateway stops responding

Hi everyone. I am replacing my existing Nokia IP530 gateway running NG FP3 with a fresh install of NGX R60 on Splat, with the most current hotfixes, on a new HP DL320. SIC is fine and I have successfully pushed my existing policy to it. However, when I put it into our production environment, it runs fine for a period of time and then just stops responding. The following is what I have and what I have tried. Any suggestions would be appreciated.

Hardware environment:
HP DL320 with a 72 GB HD, 2.8 GHz CPU, 2 GB RAM, two onboard NICs, and an Intel PRO/1000 MT quad NIC also installed. We're using the two onboard NICs (one internal, one external) and only one of the quad ports (for the mgmt interface). NICs are configured for 100 Mb/s full duplex.

Software/OS:
Splat NGX R60 HFA_01 gateway managed by a SmartCenter running the same version. These were not upgraded systems but fresh installs (accepting all defaults) with hotfixes on new servers, using the NGX import tool for the rulebase/users.

Issue:
After putting the gateway in place in an MPLS environment (on our external side), it runs great for a while (usually 3-6 hours) before it stops responding. Sometimes the system health indicator on the DL320 comes on when this happens, but not always. The box also doesn't always lock up. CPU utilization runs consistently around 0.08% until it reaches a point at which it jumps to 8% and the gateway stops responding. After a fresh boot, virtual memory usage and active real memory usage increase at exactly the same rate. My other gateway (R55) never touches virtual memory, has only 1 GB RAM, and has much more traffic and many more rules to process. The Nokia IP530 that I'm replacing runs NG FP3 with 256 MB RAM and never has a problem. All communications with the SmartCenter server work fine.
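
One way to put numbers on the memory growth described above is to sample /proc/meminfo at intervals and compare RAM and swap usage over time. The following is only a minimal Python sketch, assuming a Python interpreter is available on the gateway (it may not be on a stock Splat install) or that saved copies of /proc/meminfo are analyzed elsewhere. The function names are illustrative and not part of any Check Point tooling.

```python
def parse_meminfo(text):
    """Return {field: value in kB} from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> ..." lines; skip headers and blanks.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def used_kb(info):
    """Return (RAM used, swap used) in kB from a parsed meminfo dict."""
    ram_used = info["MemTotal"] - info["MemFree"]
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return ram_used, swap_used


def growth_per_hour(first_kb, last_kb, seconds):
    """Average growth in kB per hour between two samples."""
    return (last_kb - first_kb) * 3600.0 / seconds
```

Taking a sample shortly after boot and another an hour or two later, then passing both values to growth_per_hour, would show whether RAM and swap usage really climb at the same rate and roughly when memory would run out.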

Here are some things that I have noticed and tried in order to fix it:
- Updated to the latest hotfixes.
- Replaced and increased the RAM.
- Replaced the quad NIC.
- Increased /proc/sys/net/ipv4/neigh/default/gc_thresh3, since I noticed a "neighbor table overflow" message in the /var/log/messages logs. The default ARP table size on Splat is 1024; I increased it to 10000. Since doing so I have no longer seen those errors.
- Decreased /proc/sys/net/ipv4/neigh/default/gc_stale_time to 30 from the default of 60.
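
To check how close the ARP table is to the gc_thresh3 limit mentioned above, the entries listed in /proc/net/arp can be counted and compared with the configured threshold. This is a rough Python sketch under a few assumptions: Python is available on the box, /proc/net/arp approximates the size of the kernel neighbor table, and the 90% warning margin is an arbitrary choice. The function names are hypothetical.

```python
NEIGH_DIR = "/proc/sys/net/ipv4/neigh/default/"


def count_neighbor_entries(arp_text):
    """Count entries in /proc/net/arp-style text (first line is a header)."""
    lines = [ln for ln in arp_text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def overflow_risk(entries, gc_thresh3, margin=0.9):
    """True when the entry count is within `margin` of the hard limit."""
    return entries >= gc_thresh3 * margin


def read_int(path):
    """Read a single integer from a /proc file."""
    with open(path) as fh:
        return int(fh.read().strip())


def check_neighbor_table(arp_path="/proc/net/arp", neigh_dir=NEIGH_DIR):
    """Return (entries, gc_thresh3, at_risk) for the running system."""
    with open(arp_path) as fh:
        entries = count_neighbor_entries(fh.read())
    thresh3 = read_int(neigh_dir + "gc_thresh3")
    return entries, thresh3, overflow_risk(entries, thresh3)
```

Calling check_neighbor_table() periodically while the gateway is in production would show whether the entry count keeps climbing toward the new limit of 10000 or levels off well below it.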

I have read some posts saying that people had issues with the Intel PRO/1000 MT drivers for Splat on R60. Has anyone been able to verify this? And was that fixed in HFA_01? I have also read that R61 will be out 1-06-06. Maybe I should wait until then?

Last edited by jobroco; 2006-01-03 at 10:21.