
VPN troubles after upgrade to R65



deimoss
2009-11-08, 06:15
I previously upgraded an R62 Performance Pack cluster to VSX NGX R65 and found that some VPN tunnels failed after the upgrade.

Other tunnels came up OK.

According to vpn tu and SmartView Tracker, the problematic tunnels establish OK.
Tracing with fw monitor, I can see both the forward and the return traffic entering and leaving each interface.
It seems that the return traffic is simply not being routed back down the tunnel.
Support found no issue in the debugs I captured during the short time the system was running R65.

Since the tunnels are critical, I had to roll back to the previous version.

When I replicated the system in the lab, I got odd errors that I haven't seen in production: "encryption failure: Different community ID, possible NAT problem".

I put this down to the lab hardware, which is not supported or certified.


Support told me that "NGX R60, R61 and R62 are a little bit tolerant for some configurations . But, R65 is more sensitive, any simple mistake it will not tolerate them."
There don't seem to be any relevant fixes in HFA-10 either, but I can't know for sure without testing that too.

I'm trying to get better lab gear. Until, or indeed if, I do, I seem to be stuck.
Does anyone have experience of VPN issues that arose after this upgrade?

Thorpuse
2009-11-08, 08:24
IPSO or SPLAT?

northlandboy
2009-11-08, 16:18
We're successfully using multiple VPN tunnels on our VSX systems, and these days they're running fine.

For a while we did have a problem where, shortly after a policy push, VPN/NAT combinations got badly confused and dropped traffic that apparently went from the NAT address to itself. The problem persisted until we later made some changes to our routing tables and it went away, although nothing in those routing changes made it obvious why it should have.

deimoss
2009-12-08, 20:09
Thorpuse, these boxes are SPLAT.

On Sunday I finally got this cluster upgraded to R65 (with HFA_10).

Several changes were needed for various peerings:

For some peers (Netscreen), I had to switch to 'one tunnel per host' to get the tunnels up. This must be the result of some change in R65.

I staged the upgrade in the lab and realised that after upgrading to R65, the IP address used as the IKE ID was the gateway object's primary address (an internal, non-routable IP), so the remote end rejected it. Previously it seems to have defaulted to the 'external' IP.

I fixed this on the 'link selection' tab, which changed in R65, by setting it to use the external, routable IP.

It seems simple, but with so many customers' tunnels it is hard to spot, and AFAIK the documentation does not address these types of changes at all.
HFA_10 played no part; I applied it simply to save the need for patching later.