No matter what I did I could not get an IPsec site-to-site tunnel going between an offsite test network and our Microsoft Azure virtual network. Our VPN gateway is a Cisco ASA 5506.
The issue was that the Cisco ASA would try to bring up the tunnel but the IKEv1 Main Mode negotiation would fail partway through. Debug messages on the Cisco ASA would show something like this:
Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, IKE SA Proposal # 1, Transform # 2 acceptable Matches global IKE entry # 5
Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing ISAKMP SA payload
Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing NAT-Traversal VID ver RFC payload
Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing Fragmentation VID + extended capabilities payload
Apr 19 09:21:41 [IKEv1]IP = 13.94.202.38, IKE_DECODE SENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
Apr 19 09:21:42 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
Apr 19 09:21:43 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
Apr 19 09:21:46 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
Apr 19 09:21:49 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
Apr 19 09:21:57 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
Apr 19 09:22:05 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, IKE MM Responder FSM error history (struct &0x00002aaac1c34cf0) , : MM_DONE, EV_ERROR-->MM_WAIT_MSG3, EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent-->MM_SND_MSG2, EV_SND_MSG-->MM_SND_MSG2, EV_START_TMR-->MM_SND_MSG2, EV_RESEND_MSG-->MM_WAIT_MSG3, EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent
Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, IKE SA MM:4af079c0 terminating: flags 0x01000002, refcnt 0, tuncnt 0
Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, sending delete/delete with reason message
Apr 19 09:23:06 [IKEv1]IP = 13.94.202.38, IKE_DECODE RECEIVED Message (msgid=6a9f34a4) with payloads : HDR + HASH (8) + DELETE (12) + NONE (0) total length : 68
Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, processing hash payload
Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, processing delete
Apr 19 09:23:06 [IKEv1]Group = 13.94.202.38, IP = 13.94.202.38, Connection terminated for peer 13.94.202.38. Reason: Peer Terminate Remote Proxy 10.100.152.0, Local Proxy 10.50.0.0
Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, Active unit receives a delete event for remote peer 13.94.202.38.
A couple of key points in the above debug messages:
- “MM_WAIT_MSG3, EV_TIMEOUT” indicates that the Cisco ASA timed out waiting for the next message from the Azure VPN gateway.
- “Duplicate first packet detected. Ignoring packet” indicates that the Azure VPN gateway keeps resending its first message, which suggests it is not accepting the reply that the Cisco ASA sends. Increasing the debug level (not shown above) reveals a cookie mismatch, and this is apparently what upsets the Azure Virtual Network Gateway.
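As a rough aid, the two symptoms above can be picked out of a saved debug capture with a few lines of PowerShell. This is only a sketch: the sample lines are copied from the debug output above, and the variable names ($log, $duplicates, $timedOut) are my own choices for illustration.

```powershell
# Sample lines copied from the ASA debug output above. In practice you would
# load your own saved capture instead (for example with Get-Content).
$log = @'
Apr 19 09:21:42 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
Apr 19 09:21:43 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
Apr 19 09:21:46 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, IKE MM Responder FSM error history (struct &0x00002aaac1c34cf0) , : MM_DONE, EV_ERROR-->MM_WAIT_MSG3, EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent-->MM_SND_MSG2, EV_SND_MSG-->MM_SND_MSG2, EV_START_TMR-->MM_SND_MSG2, EV_RESEND_MSG-->MM_WAIT_MSG3, EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent
'@ -split "`r?`n"

# Count how many times the peer resent its first packet.
$duplicates = @($log | Select-String -SimpleMatch 'Duplicate first packet detected').Count

# Check whether the Main Mode state machine timed out waiting for message 3.
$timedOut = [bool]($log | Select-String -SimpleMatch 'MM_WAIT_MSG3, EV_TIMEOUT')

"Duplicate first packets: $duplicates"
"Timed out in MM_WAIT_MSG3: $timedOut"
```

Seeing both symptoms together for the same peer matches the failure described here, although on its own it does not prove that a cookie mismatch is the cause.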
The cookie mismatch shows up in the Cisco ASA debug messages at a higher debug level:
Azure InitiatorCookie: 03 83 AD 7C 10 26 CB D6
ResponderCookie: 14 42 19 27 F6 F2 DF 53
RECV PACKET from 13.91.5.150
ISAKMP Header
Initiator COOKIE: 03 83 ad 7c 10 26 cb d6
Responder COOKIE: 00 00 00 00 00 00 00 00
These are debug messages produced on the Microsoft Azure side:
2016-03-02 10:31:37 ERROR user NULL 0000000FE1E59D80 0000000FE1E64320 f74513382e60832f cac68571e57c06d5 Invalid cookies. Try resetting SAs on-prem. IkeProcessPacketDispatch failed with HRESULT 0x80073616(ERROR_IPSEC_IKE_INVALID_COOKIE)
(Note the “ERROR_IPSEC_IKE_INVALID_COOKIE” error code.)
After spending some time troubleshooting the Cisco ASA side, we could not find anything wrong with the Cisco ASA configuration.
In the end, in my desperation, I decided to reset the Azure Virtual
Network Gateway and that seems to have fixed the issue for good.
The process to reset an Azure Virtual Network Gateway is a bit tricky because there is no way to do that using the Azure Portal; it needs to be done using PowerShell instead.
This is what I did to reset the Azure Virtual Network Gateway using
PowerShell:
1. Install Azure PowerShell. I used the instructions here:
https://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/
In particular, I went with the leaner and perhaps more complicated
installation from the PowerShell Gallery (instead of installing from
WebPI).
2. After Azure PowerShell was installed, I opened a PowerShell command window and ran the following commands:
Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionName "<your subscription name>"
$vg = Get-AzureRmVirtualNetworkGateway -ResourceGroupName RG
Reset-AzureRmVirtualNetworkGateway -VirtualNetworkGateway $vg
Apparently, Azure Virtual Network Gateways are redundant Virtual Machines so resetting one will cause the other to take over.
The other one could be reset by invoking “Reset-AzureRmVirtualNetworkGateway” again a few minutes after the first gateway has been reset, but in my case the site-to-site VPN tunnel came up after resetting only one of the gateways.
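If the tunnel does not come up after the first reset, the two-step reset could be scripted along these lines. This is a sketch, not something I needed to run: the gateway and resource group names are placeholders, the five-minute wait is just my reading of "a few minutes", and I added -Name so that exactly one gateway object is passed to the reset cmdlet even if the resource group contains several. It assumes you have already logged in and selected the subscription as shown earlier.

```powershell
# Placeholder names (assumptions); substitute your own values.
$gatewayName   = "<your gateway name>"
$resourceGroup = "<your resource group>"

# Fetch a single gateway object by name.
$vg = Get-AzureRmVirtualNetworkGateway -Name $gatewayName -ResourceGroupName $resourceGroup

# First reset: one gateway instance restarts and the other should take over.
Reset-AzureRmVirtualNetworkGateway -VirtualNetworkGateway $vg

# Wait a few minutes (300 seconds is an arbitrary choice), then reset again
# so that the second instance is restarted as well.
Start-Sleep -Seconds 300
Reset-AzureRmVirtualNetworkGateway -VirtualNetworkGateway $vg
```

It is worth checking the tunnel state on the Cisco ASA after the first reset before running the second one, since the second reset may not be needed.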
Note that the above XXXXX-AzureRmXXXXX PowerShell cmdlets use the new
Azure Resource Manager deployment model. Similar commands would have to
be used if the classic deployment model is used instead.
This article:
https://azure.microsoft.com/en-us/documentation/articles/vpn-gateway-resetgw-classic/
is a good reference for how to reset Azure Virtual Network Gateways that have been deployed using the classic deployment model. Note that it says that the same cannot be done for the Resource Manager deployment model but I think the capability is there now (I used it) and it is just that the article has not been updated yet.
On a related note, I should mention that another way of dealing with this problem is to deploy a Cisco ASAv virtual appliance and use that to terminate the site-to-site IPsec tunnel instead of terminating it on the Microsoft-provided Azure Virtual Network Gateway. This of course would be more expensive, given that licenses for the Cisco ASAv would have to be purchased, plus it is another Virtual Machine that would have to be deployed (and paid for).