Site-to-site VPN Between AWS And OpenWrt With strongSwan

We recently had to configure a site-to-site IPsec-based VPN connection between AWS and a small router running OpenWrt 19.07. This post goes over the details so we remember what we did next time we have to do this…

OpenWrt

On the OpenWrt side we used strongSwan. We tried to follow OpenWrt's own configuration conventions as much as possible, on the theory that configuration done the OpenWrt way is less likely to get lost in future upgrades.

To start, on OpenWrt we installed the following packages using opkg:

  1. kmod-ip-vti
  2. strongswan-minimal
  3. vti
  4. vti4

We then added the following to /etc/ipsec.conf (AWS makes it easy by providing this as a template, which you can download from AWS VPC service -> VPN -> Site-to-Site VPN Connections):

conn AWS-Tunnel1
    auto=start
    left=%defaultroute
    leftid=65.x.x.x
    right=18.x.x.x
    type=tunnel
    leftauth=psk
    rightauth=psk
    keyexchange=ikev2
    ike=aes128-sha1-modp1024
    ikelifetime=8h
    esp=aes128-sha1-modp1024
    lifetime=1h
    keyingtries=%forever
    leftsubnet=0.0.0.0/0
    rightsubnet=0.0.0.0/0
    dpddelay=10s
    dpdtimeout=30s
    dpdaction=restart
    ## Please note the following line assumes you only have two tunnels
    ## in your Strongswan configuration file. This "mark" value must be
    ## unique and may need to be changed based on other entries in your
    ## configuration file.
    mark=100

conn AWS-Tunnel2
    auto=start
    left=%defaultroute
    leftid=65.x.x.x
    right=19.x.x.x
    type=tunnel
    leftauth=psk
    rightauth=psk
    keyexchange=ikev2
    ike=aes128-sha1-modp1024
    ikelifetime=8h
    esp=aes128-sha1-modp1024
    lifetime=1h
    keyingtries=%forever
    leftsubnet=0.0.0.0/0
    rightsubnet=0.0.0.0/0
    dpddelay=10s
    dpdtimeout=30s
    dpdaction=restart
    ## Please note the following line assumes you only have two tunnels
    ## in your Strongswan configuration file. This "mark" value must be
    ## unique and may need to be changed based on other entries in your
    ## configuration file.
    mark=200
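
The comment in the AWS template says each "mark" value must be unique. As a quick sanity check, here is a minimal Python sketch (not part of the original setup) that reads ipsec.conf-style text and reports any mark shared by more than one conn. The function names and the abbreviated sample text are made up for illustration; on a real router one would read /etc/ipsec.conf instead.

```python
from collections import defaultdict

# Abbreviated sample; hypothetical stand-in for the contents of /etc/ipsec.conf.
SAMPLE = """\
conn AWS-Tunnel1
    right=18.x.x.x
    mark=100

conn AWS-Tunnel2
    right=19.x.x.x
    mark=200
"""

def marks_by_conn(text):
    """Map each conn name to its mark value (None if the conn sets no mark)."""
    marks = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("conn "):
            current = line.split(None, 1)[1]
            marks[current] = None
        elif line.startswith(("config ", "ca ")):
            current = None  # not a conn section
        elif current is not None:
            key, _, value = line.partition("=")
            if key.strip() == "mark":
                marks[current] = value.strip()
    return marks

def duplicate_marks(marks):
    """Return {mark: [conn, ...]} for every mark used by more than one conn."""
    seen = defaultdict(list)
    for conn, mark in marks.items():
        if mark is not None:
            seen[mark].append(conn)
    return {mark: conns for mark, conns in seen.items() if len(conns) > 1}

marks = marks_by_conn(SAMPLE)
print(marks)
print("duplicates:", duplicate_marks(marks))
```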

The following was added to /etc/ipsec.secrets:

65.x.x.x 18.x.x.x : PSK "PSK1 goes here"
65.x.x.x 19.x.x.x : PSK "PSK2 goes here"

The following was added to /etc/config/network (this creates the VTI interfaces on Linux):

config interface 'vti1'
    option proto 'vti'
    option mtu '1500'
    option tunlink 'wan'
    option peeraddr '18.x.x.x'
    option zone 'vpn'
    option ikey '100'
    option okey '100'

config interface 'vti1_static'
    option proto 'static'
    option ifname '@vti1'
    list ipaddr '169.254.132.86/30'

config interface 'vti2'
    option proto 'vti'
    option mtu '1500'
    option tunlink 'wan'
    option peeraddr '19.x.x.x'
    option zone 'vpn'
    option ikey '200'
    option okey '200'

config interface 'vti2_static'
    option proto 'static'
    option ifname '@vti2'
    list ipaddr '169.254.133.142/30'

config route
    option target '10.0.0.0/8'
    option interface 'vti1_static'

config route
    option target '10.0.0.0/8'
    option metric '10'
    option interface 'vti2_static'

(The IP addresses under the vtiX_static interfaces are for the point-to-point link. AWS provides those.)
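
When testing a tunnel it helps to know the address of the far end of each point-to-point link, for example to ping it. The following is a minimal Python sketch that derives it from the local /30 address, assuming the AWS end is simply the other usable address in the same /30 (check this against the configuration file downloaded from AWS). The function name tunnel_peer is made up for illustration.

```python
import ipaddress

def tunnel_peer(local_cidr):
    """Return the other usable address of a two-host point-to-point subnet."""
    iface = ipaddress.ip_interface(local_cidr)
    hosts = list(iface.network.hosts())
    if len(hosts) != 2:
        raise ValueError(f"{local_cidr} does not have exactly two usable addresses")
    return next(h for h in hosts if h != iface.ip)

# Local addresses taken from the vtiX_static interfaces above.
for name, cidr in (("vti1", "169.254.132.86/30"), ("vti2", "169.254.133.142/30")):
    print(f"{name}: local {cidr}, peer {tunnel_peer(cidr)}")
```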

AWS

Step 1: Log into your AWS account and go to Services -> VPC

Step 2: Under “Virtual Private Network (VPN)”, go to “Customer Gateways” and create a new customer gateway. We chose “static” routing for this example.

One can attach a site-to-site connection to a Virtual Private Gateway, or to a Transit Gateway. Because our AWS infrastructure had a Transit Gateway we chose to attach the new site-to-site VPN connection to it, so we did not have to create a Virtual Private Gateway.

Step 3: Go to “Virtual Private Network (VPN)” -> “Site-to-Site VPN Connections” and create your site-to-site VPN connection.

Note local and remote IPv4 network CIDRs — they are empty, which means 0.0.0.0/0. That is what we want and it means that we will send through the VPN tunnel whatever needs to be sent based on routing tables. This type of VPN is called “route-based” VPN, and contrasts with “policy-based” VPN.

Note also that “Virtual Private Network (VPN)” -> “Site-to-Site VPN Connections” is where you can download IPsec configuration templates for VPN gateways from different vendors. In our case, because our VPN gateway is a router running strongSwan on OpenWrt, we chose “Strongswan”, as shown in the following screenshot:

Step 4: Every subnet in your AWS VPC that needs to reach your remote site must have in its route table a route to your remote subnet that points to your Transit Gateway. To (in our case, statically) configure this, go to “Virtual Private Cloud” -> “Subnets”, select the subnet that needs to reach your remote network, edit, and add a static route. For example:

Step 5: Create a Transit Gateway Attachment in “Transit Gateways” -> “Transit Gateway Attachments”. This links your Transit Gateway and your Customer Gateway.

Step 6: Finally, create a route table for your Transit Gateway in “Transit Gateways” -> “Transit Gateway Route Tables”. Only the Transit Gateway needs to be specified:

Edit the Transit Gateway route table that you just created and create an association to the attachment you created in step #5:

Create a new static route in the Transit Gateway route table that you just created and that uses your VPN attachment:

That should be it — after all this your VPN tunnels should come up.

Monitoring

In AWS, you will see the status of your tunnels in VPC -> Virtual Private Network (VPN) -> Site-to-Site VPN Connections -> select your VPN site-to-site connection -> “Tunnel Details” tab. For example:

In OpenWrt you can use the command “ipsec statusall” to get details about the tunnels. For example:

root@Shangri-La:~# ipsec statusall
Status of IKE charon daemon (strongSwan 5.8.2, Linux 4.14.221, mips):
uptime: 7 days, since Mar 22 14:57:27 2021
worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 6
loaded plugins: charon aes sha1 random nonce x509 pubkey gmp xcbc hmac kernel-netlink socket-default stroke updown
Listening IP addresses:
10.10.10.1
65.x.x.x
[...]
Connections:
AWS-Tunnel1: %any...18.x.x.x IKEv2, dpddelay=10s
AWS-Tunnel1: local: [65.x.x.x] uses pre-shared key authentication
AWS-Tunnel1: remote: [18.x.x.x] uses pre-shared key authentication
AWS-Tunnel1: child: 0.0.0.0/0 === 0.0.0.0/0 TUNNEL, dpdaction=restart
AWS-Tunnel2: %any...19.x.x.x IKEv2, dpddelay=10s
AWS-Tunnel2: local: [65.x.x.x] uses pre-shared key authentication
AWS-Tunnel2: remote: [19.x.x.x] uses pre-shared key authentication
AWS-Tunnel2: child: 0.0.0.0/0 === 0.0.0.0/0 TUNNEL, dpdaction=restart
Security Associations (2 up, 0 connecting):
AWS-Tunnel1[47]: ESTABLISHED 65 minutes ago, 65.x.x.x[65.x.x.x]...18.x.x.x[18.x.x.x]
AWS-Tunnel1[47]: IKEv2 SPIs: b0a6f8cf2d2a7bf3_i* 8477683795cbfc06_r, pre-shared key reauthentication in 6 hours
AWS-Tunnel1[47]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
AWS-Tunnel1{495}: INSTALLED, TUNNEL, reqid 47, ESP in UDP SPIs: ce15ae7e_i c004506d_o
AWS-Tunnel1{495}: AES_CBC_128/HMAC_SHA1_96/MODP_1024, 0 bytes_i, 0 bytes_o, rekeying in 24 minutes
AWS-Tunnel1{495}: 0.0.0.0/0 === 0.0.0.0/0
AWS-Tunnel2[46]: ESTABLISHED 4 hours ago, 65.x.x.x[65.x.x.x]...19.x.x.x[19.x.x.x]
AWS-Tunnel2[46]: IKEv2 SPIs: 3d8bdfddb170a01e_i* ebd2e1b67befb4e2_r, pre-shared key reauthentication in 3 hours
AWS-Tunnel2[46]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
AWS-Tunnel2{494}: INSTALLED, TUNNEL, reqid 46, ESP in UDP SPIs: c9df3c20_i c58ec754_o
AWS-Tunnel2{494}: AES_CBC_128/HMAC_SHA1_96/MODP_1024, 549898 bytes_i (4342 pkts, 3s ago), 425444 bytes_o (4300 pkts, 3s ago), rekeying in 16 minutes
AWS-Tunnel2{494}: 0.0.0.0/0 === 0.0.0.0/0
root@Shangri-La:~#
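
For unattended monitoring, output like the above can be scanned for the ESTABLISHED (IKE SA) and INSTALLED (CHILD SA) lines. Here is a minimal Python sketch of that idea. It assumes the text format shown above, which is not a stable interface and may differ between strongSwan versions; the function names are made up, and the sample text is a few lines copied from the output above.

```python
import re

# A few lines copied from the "ipsec statusall" output above.
SAMPLE = """\
Security Associations (2 up, 0 connecting):
AWS-Tunnel1[47]: ESTABLISHED 65 minutes ago, 65.x.x.x[65.x.x.x]...18.x.x.x[18.x.x.x]
AWS-Tunnel1{495}: INSTALLED, TUNNEL, reqid 47, ESP in UDP SPIs: ce15ae7e_i c004506d_o
AWS-Tunnel2[46]: ESTABLISHED 4 hours ago, 65.x.x.x[65.x.x.x]...19.x.x.x[19.x.x.x]
AWS-Tunnel2{494}: INSTALLED, TUNNEL, reqid 46, ESP in UDP SPIs: c9df3c20_i c58ec754_o
"""

IKE_RE = re.compile(r"^\s*(\S+)\[\d+\]:\s+ESTABLISHED\b")
CHILD_RE = re.compile(r"^\s*(\S+)\{\d+\}:\s+INSTALLED\b")

def tunnel_state(status_text):
    """Return {conn: {"ike": bool, "child": bool}} for conns seen in the text."""
    state = {}
    for line in status_text.splitlines():
        for key, regex in (("ike", IKE_RE), ("child", CHILD_RE)):
            match = regex.match(line)
            if match:
                entry = state.setdefault(match.group(1), {"ike": False, "child": False})
                entry[key] = True
    return state

def tunnels_down(status_text, expected):
    """List the expected conns that lack an IKE SA or a CHILD SA."""
    state = tunnel_state(status_text)
    down = []
    for name in expected:
        entry = state.get(name)
        if entry is None or not (entry["ike"] and entry["child"]):
            down.append(name)
    return down

print(tunnels_down(SAMPLE, ["AWS-Tunnel1", "AWS-Tunnel2"]))
```

On the router itself the text would come from running “ipsec statusall” rather than from a sample string.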

References

IPsec Site-to-Site: https://openwrt.org/docs/guide-user/services/vpn/strongswan/site2site

strongSwan IPsec Configuration via UCI: https://openwrt.org/docs/guide-user/services/vpn/strongswan/configuration

Tunneling interface protocols: https://openwrt.org/docs/guide-user/network/tunneling_interface_protocols

“How To Establish IPsec Site To Site VPN Tunnel Via VTI. | Linux | OpenWrt” by Geeky Sagar: https://www.youtube.com/watch?v=HDqAl_PozCU

Cisco FXOS Configuration Export to Cygwin OpenSSH Server Using scp Pulling My Hair Out

The situation: I was trying to export (backup) a Cisco FXOS configuration to an SSH server using Secure Copy (scp), which is one of the methods supported by Cisco FXOS’s configuration export feature.

The SSH server is OpenSSH running in a Cygwin environment on Windows.

The issue is that the configuration export fails and the FXOS GUI just generates a generic and vague “End point timed out. Check for IP, port, password, disk space or network access related issues” error message.

What sheds some light is what the sshd process sends to the Windows Event Log:

sshd: PID 9676: fatal: seteuid 187611: Operation not permitted

Running sshd with -d (for debug), one can see that sshd does not handle that failure gracefully, and instead, terminates immediately. The client (FXOS, which is trying to use SSH to perform a secure copy), sees this as an authentication failure. This can be seen if one gets a fprm tech-support — in some file in the tech-support bundle one will see how an scp is spawned with the right arguments to perform the file copy but after the command runs one sees “Authentication failure” in the log.

After comparing good (from outside FXOS) and bad (from FXOS) scp transfers I realized that the difference is that FXOS is attempting to perform public key authentication. I have no idea where the key it is proposing comes from because I did not configure any public keys, but the fact of the matter is that it proposes a key and tries to authenticate using pubkey.

Normally if pubkey authentication is proposed and there is no matching key on the server, the client moves on to the next authentication method. However, because the SSH server is terminating abnormally because of the seteuid() error, the client cannot proceed with the next authentication method and everything dies there.

So, the main issue is the Cygwin SSH daemon’s handling of the seteuid() error, although one could argue that the real problem is that seteuid() fails. This could be the result of misconfiguration on the Cygwin SSH daemon, but whatever — on a Unix server this does not happen, and it is happening on Windows because of how complicated it is to handle POSIX accounts, permissions, and security — just read the following to get an idea of how complicated this is:

https://cygwin.com/cygwin-ug-net/ntsec.html#ntsec-setuid-overview

Now to the workaround. Because pubkey authentication is essentially not working at all, and is even preventing SSH clients that propose pubkey authentication from moving on to the next preferred authentication method (e.g. “password”), the workaround is to just disable pubkey authentication. On the Cygwin server where I ran into this problem this was accomplished by editing /etc/sshd_config, changing this line:

#PubkeyAuthentication yes

to:

PubkeyAuthentication no

and then restarting the sshd service.
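
Note that the commented-out line in the stock file only documents the default, so pubkey authentication stays enabled until an active “PubkeyAuthentication no” line exists. The following minimal Python sketch illustrates how the effective value can be read from the file's text, assuming the usual OpenSSH rules (comment lines are ignored, the first active occurrence of a keyword wins, and the default is “yes”). It only looks at the global section before any Match block, and the function name is made up.

```python
import re

def pubkey_auth_enabled(config_text):
    """Return False only if the first active PubkeyAuthentication line says 'no'."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        if keyword == "pubkeyauthentication" and len(parts) == 2:
            return parts[1].split()[0].lower() != "no"
    return True  # OpenSSH default when the keyword is not set

print(pubkey_auth_enabled("#PubkeyAuthentication yes\n"))
print(pubkey_auth_enabled("PubkeyAuthentication no\n"))
```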

So, if you run into some strange scp issue trying to back up (export) the FXOS configuration, try disabling pubkey authentication on the Cygwin SSH server; you might get lucky and you might get things to work.

Some other good references:

Cygwin FAQ: http://cygwin.com/faq.html#faq.using.sshd-in-domain

Somebody else running into a similar problem: http://cygwin.1069669.n5.nabble.com/seteuid-1019-Operation-not-permitted-td102924.html

Good blog post on configuring Cygwin’s SSHD: https://techtorials.me/cygwin/sshd-configuration/

Cisco Stealthwatch Enterprise 7.0.2 Certificate Nightmare

I manage a small Cisco Stealthwatch Enterprise 7.0.2 deployment that consists of a Stealthwatch Management Console (SMC), one FlowCollector for NetFlow (FCNF), and one FlowSensor, all virtual. (This deployment started at version 6.9.x, then got migrated to 6.10.x, and then to 7.0.2.)

The deployment had been running well for a long time, but a couple of weeks ago the identity certificates of the SMC and the FCNF appliances expired.

Stealthwatch 7.x uses a centralized appliance management model where all appliances (except the Endpoint Concentrator, which should be added in a future release) are managed from the SMC.

When the certificates of the SMC and the FCNF appliances expired, the configuration/management tunnels, which are SSL connections, stopped working. I still could search flows, receive alarms, etc., but I could not manage the appliances anymore. Basically, the appliance status in the Central Management page of the SMC GUI would say “Management channel down” and have a red dot right next to the status:

For the SMC I could still use the “Edit Appliance Configuration” menu option that one gets by clicking in the “Actions” column, but for the FCNF the option was not even available.

Going to the FCNF appliance GUI to try to configure anything did not work either because, while one can log in, the GUI says that the appliance is managed from Central Management and no options are presented.

So there is basically no way in Stealthwatch 7.x to manage a centrally managed appliance whose certificate has expired. It is kind of a Catch-22 situation: Central Management cannot talk to the appliance because the certificate is expired (management channel down), and the appliance GUI says that the appliance is centrally managed and does not allow one to do anything.

After trying lots of things, what worked was the following procedure:

  1. Remove the appliance from Central Management. To do this click on the three dots in the “Actions” menu and select the option “Remove This Appliance”. After this the appliance will not show up in the Central Management page anymore.
  2. For the SMC one is ready to fix the certificate — just go to the appliance management GUI (https://a.b.c.d/smc/index.html) and upload a new certificate via the Configuration -> SSL Certificate screen.
  3. For the other appliances there is an extra step — the appliance still thinks it is centrally managed, so one has to fix that first. The way to do that is to SSH into the appliance (username to use to log in is sysadmin), go to the Advanced menu, and select the option “RemoveAppliance”. After doing this one will be able to browse to the appliance IP address (https://a.b.c.d), log in, and fix the certificate via the Configuration -> SSL Certificate screen.
  4. Now that the certificates are good, the last step is to re-add the appliances to Central Management. That is done using the Appliance Setup Tool (AST) GUI. For non-SMC appliances the link is https://a.b.c.d/swa/loadAst and for an SMC appliance the link is https://a.b.c.d/lc-ast/. The AST will ask for basic configuration parameters, like IP address of the appliance, DNS and NTP servers, etc. It is a wizard-style tool with multiple screens. The AST on the SMC does not ask for the IP address of the SMC (I am guessing it would ask if the SMC is the secondary in a high availability SMC pair) so after running the AST on the SMC one is done and the SMC will now show up in the Central Management page. For the other appliances, however, the last screen of the AST asks for the address of the SMC. Once one provides that IP address and the name of the Stealthwatch domain that the appliance is part of, the appliance will be added to Central Management and show up in the Central Management page. After a few minutes the status column should show “Up” for the appliance that just got re-added to Central Management (see the above screenshot).

The Central Management feature of Stealthwatch 7.x and later is very convenient but it relies heavily on certificates: if certificates are not valid (for example, trust is not properly configured or certificates have expired) then something is going to break, and I was not able to find an easy way to fix it.

Note: when adding new identity certificates to the appliances via the appliance management GUI one also needs to set things up so that the appliance can trust the other appliances it will talk to. This is done by making sure that the right certificate is added in the Configuration -> Certificate Authority Certificates screen.

I believe there is room for improvement in how certificates are managed under this central management scheme in Stealthwatch 7.x and later. Hopefully a future Stealthwatch release will make it easier to recover from certificate issues like expired certificates.

In the meantime, do not let your Stealthwatch 7.x and later certificates expire, and if you catch a certificate before it expires and plan to replace it then cross your fingers that the process to renew the certificates through the Central Management GUI works without issues (I did not have a chance to fix my problem that way so I don’t know how well it works).
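
One way to avoid being surprised by an expiry is to check the certificate end dates periodically. Here is a minimal Python sketch that turns an end date into days remaining. It assumes the date has already been obtained in the “notAfter=...” form printed by “openssl x509 -noout -enddate”; the function name is made up, and the sample date is the one from the Python ssl module documentation rather than from a real appliance.

```python
import ssl
import time

def days_until_expiry(enddate, now=None):
    """Days left before a certificate end date such as
    'notAfter=Jan  5 09:34:43 2018 GMT' (negative if already expired)."""
    stamp = enddate.strip()
    if stamp.startswith("notAfter="):
        stamp = stamp[len("notAfter="):]
    expires = ssl.cert_time_to_seconds(stamp)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0

remaining = days_until_expiry("notAfter=Jan  5 09:34:43 2018 GMT")
if remaining < 30:
    print(f"certificate needs attention: {remaining:.0f} days left")
```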

NetworkManager strongSwan encryption algorithm ‘DES-CBC’ not supported

Recently we ran into an issue involving NetworkManager and strongSwan. The error in the systemd journal was a cryptic “encryption algorithm ‘DES-CBC’ not supported”, as shown in the following log excerpt:

Jul 19 19:14:00 el-valle NetworkManager[733]:  [1532042040.2613] audit: op="connection-activate" uuid="26f20e51-92ba-4a78-a1>
[...]
Jul 19 19:14:00 el-valle NetworkManager[733]:  [1532042040.2764] vpn-connection[0x56050004c1f0,26f20e51-92ba-4a78-a17e-1709b>
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[ASN] encryption algorithm 'DES-CBC' not supported
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[LIB] building CRED_PRIVATE_KEY - RSA failed, tried 8 builders
Jul 19 19:14:00 el-valle NetworkManager[733]:  [1532042040.2862] vpn-connection[0x56050004c1f0,26f20e51-92ba-4a78-a17e-1709b>
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[CFG] received initiate for NetworkManager connection Acme strongSwan
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[CFG] using CA certificate, gateway identity 'vpn.acme.com'
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[ASN] encryption algorithm 'DES-CBC' not supported
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[LIB] building CRED_PRIVATE_KEY - ANY failed, tried 7 builders

In the end we tracked this down to strongSwan being unable to read a private key that had been encrypted with DES. The solution was to re-encrypt the private key using AES-256:

shell$ sudo openssl rsa -in client_key.pem -aes256 -out newkey.pem
Enter pass phrase for client_key.pem:
writing RSA key
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
shell$ sudo mv newkey.pem client_key.pem
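
To see in advance which cipher protects a key, one can look at the PEM headers. The following is a minimal Python sketch, assuming the key is in the traditional OpenSSL PEM format, where an encrypted key carries a “DEK-Info” header naming the cipher (PKCS#8 keys, which start with “BEGIN ENCRYPTED PRIVATE KEY”, do not have this header). The function name and the sample key text are made up for illustration.

```python
# Made-up sample in the traditional OpenSSL PEM layout; the key data is omitted.
SAMPLE_KEY = """\
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-CBC,0123456789ABCDEF

(base64 key data omitted)
-----END RSA PRIVATE KEY-----
"""

def pem_key_cipher(pem_text):
    """Return the cipher named in the DEK-Info header, or None if there is none."""
    for line in pem_text.splitlines():
        if line.startswith("DEK-Info:"):
            # Header layout: "DEK-Info: <cipher>,<hex IV>"
            return line.split(":", 1)[1].split(",", 1)[0].strip()
    return None

print(pem_key_cipher(SAMPLE_KEY))
```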

The following post was helpful to figure out what was happening:

https://lists.strongswan.org/pipermail/users/2017-June/011088.html

Site-to-site VPN Between Cisco ASA/FTD and strongSwan

I recently wasted about two days bringing up a simple site-to-site IPsec VPN tunnel between a Cisco ASA (and a Cisco FTD) and a Linux machine running strongSwan, using digital certificates to authenticate the peers. The configuration was simple but due to a little “detail” and to a lack of good debugging information on the Cisco ASA/FTD, what should have been a five-minute job ended up taking a couple of days of troubleshooting, looking at the strongSwan source code, and making configuration changes to try to make it work. In the end I was able to bring up the tunnel and I got to the bottom of what the Cisco ASA/FTD did not like in what strongSwan was sending. I will document here the configurations, and finally, at the end, will show what the Cisco ASA/FTD was choking on.

Cisco ASA Configuration

The basic VPN configuration on the Cisco ASA side looks like this:

access-list traffic-to-encrypt extended permit ip 10.123.0.0 255.255.255.0 10.123.1.0 255.255.255.0 
!
crypto ipsec ikev2 ipsec-proposal IPSEC-PROPOSAL
 protocol esp encryption aes-256
 protocol esp integrity sha-256 sha-1
!
crypto map MYMAP 10 match address traffic-to-encrypt
crypto map MYMAP 10 set peer 10.118.57.149 
crypto map MYMAP 10 set ikev2 ipsec-proposal IPSEC-PROPOSAL
crypto map MYMAP 10 set trustpoint TRUSTPOINT chain
!
crypto map MYMAP interface outside
!
crypto ca trustpoint TRUSTPOINT
 revocation-check crl
 keypair TRUSTPOINT
 crl configure
 policy static
 url 1 http://www.chapus.net/ChapulandiaCA.crl
!
crypto ikev2 policy 10
 encryption aes-256
 integrity sha256
 group 14
 prf sha
 lifetime seconds 86400
!
crypto ikev2 enable outside
!
tunnel-group 10.118.57.149 type ipsec-l2l
tunnel-group 10.118.57.149 ipsec-attributes
 ikev2 remote-authentication certificate
 ikev2 local-authentication certificate TRUSTPOINT

Note that the FTD configuration is very similar, but it has to be performed via the Firepower Management Center (FMC) GUI. In fact, after doing the configuration via FMC one can log into the FTD CLI using SSH and run the command “show running-config” and see the same configuration shown above for the ASA.

strongSwan Configuration (ipsec.conf)

The ipsec.conf configuration file (typically located at /etc/ipsec.conf) is the old way of configuring the strongSwan IPsec subsystem. The following ipsec.conf file contents allowed the tunnel to come up with no problems:

config setup
	strictcrlpolicy=yes
	cachecrls = yes

ca MyCA
	crluri = http://www.example.com/MyCA.crl
	cacert = ca.pem
	auto = add

conn %default
	ikelifetime=60m
	keylife=20m
	rekeymargin=3m
	keyingtries=1
	keyexchange=ikev2
	mobike=no

conn net-net
	leftcert=rpi.pem
	leftsubnet=10.123.1.0/24
	leftfirewall=yes
	right=10.122.109.113
	rightid="C=US, ST=CA, L=SF, O=Acme, OU=CSS, CN=asa, E=admin@example.com"
	rightsubnet=10.123.0.0/24
	auto=add

In addition to the ipsec.conf file, the ipsec.secrets file (typically /etc/ipsec.secrets) also has to be edited, in this case to indicate the name of the private RSA key. Our ipsec.secrets file looks like this:

# ipsec.secrets - strongSwan IPsec secrets file

: RSA mykey.pem

Finally, certain certificates and the RSA key must be placed (all in PEM format) in certain directories under /etc/ipsec.d:

The Linux machine’s identity certificate goes into /etc/ipsec.d/certs/. strongSwan automatically loads that certificate upon startup.
The Certification Authority (CA) root certificate goes into /etc/ipsec.d/cacerts/
The private key must be placed in /etc/ipsec.d/private/

strongSwan Configuration (swanctl.conf)

swanctl.conf is a new configuration file that is used by the swanctl(8) tool to load configurations and credentials into the strongSwan IKE daemon. This is the “new” way to configure the strongSwan IPsec subsystem. The configuration file syntax is very different, though the parameters that need to be set to be able to bring up the IPsec tunnel are the same as in the case of the ipsec.conf-based configuration.

A swanctl.conf-based configuration is more modular. Configuration files typically exist under /etc/swanctl/. For our specific connection, we put the configuration in the file /etc/swanctl/conf.d/example.conf, which gets included from /etc/swanctl/swanctl.conf. Our /etc/swanctl/conf.d/example.conf file contains the following:

connections {

    # Section for an IKE connection named <conn>.
    my-connection {
        # IKE major version to use for connection.
        version = 2

        # Remote address(es) to use for IKE communication, comma separated.
        # remote_addrs = %any
        remote_addrs = 10.122.109.113

        # Section for a local authentication round.
        local-1 {
            # Comma separated list of certificate candidates to use for
            # authentication.
            certs = rpi.pem
        }

        children {

            # CHILD_SA configuration sub-section.
            my-connection {
                # Local traffic selectors to include in CHILD_SA.
                # local_ts = dynamic
                local_ts = 10.123.1.0/24

                # Remote selectors to include in CHILD_SA.
                # remote_ts = dynamic
                remote_ts = 10.123.0.0/24
            }
        }
    }
}

# Section defining secrets for IKE/EAP/XAuth authentication and private key
# decryption.
secrets {
    # Private key decryption passphrase for a key in the private folder.
    private-rpikey {
        # File name in the private folder for which this passphrase should be
        # used.
        file = rpi.pem

        # Value of decryption passphrase for private key.
        # secret =
    }
}

# Section defining attributes of certification authorities.
authorities {
    # Section defining a certification authority with a unique name.
    MyA {
        # CA certificate belonging to the certification authority.
        cacert = myca.pem

        # Comma-separated list of CRL distribution points.
        crl_uris = http://www.chapus.net/ChapulandiaCA.crl
    }
}

Bringing Up the Tunnel on Interesting Traffic

To bring up the tunnel when “interesting” traffic is received it is necessary to use the “start_action” configuration parameter. Otherwise the IPsec tunnel has to be brought up manually using the swanctl --initiate xxxxx command.

Here’s an example configuration that uses “start_action”:

connections {

    # Section for an IKE connection named <conn>.
    lab-vpn {
        version = 2

        remote_addrs = 10.1.10.114

        local-1 {
            certs = rpi.pem
        }

        children {
            # CHILD_SA configuration sub-section.
            lab-vpn {
                local_ts = 10.123.1.0/24, 10.10.0.0/16
                remote_ts = 10.123.0.0/24

                start_action = trap
            }
        }
    }

}

Automatically Starting Charon

The charon-systemd daemon implements the IKE daemon very similar to charon, but is specifically designed for use with systemd. It uses the systemd libraries for a native integration and comes with a simple systemd service file.

In versions of strongSwan prior to 5.8.0 one needed to enable the systemd service “strongswan-swanctl”. In versions 5.8.0 and later the service is named “strongswan”.

To start the charon-systemd daemon when the system boots just use systemctl to enable the service:

systemctl enable strongswan

Reference: https://wiki.strongswan.org/projects/strongswan/wiki/Charon-systemd

Issues

There were three serious issues that I ran into when trying to bring up the site to site tunnel. All of them appear to be bugs.

Cisco ASA/FTD Unable to Process Downloaded CRL When Cisco WSA in the Middle

In this issue the Cisco ASA/FTD is apparently unable to parse a downloaded CRL when a Cisco WSA proxy server is transparently in the middle. The Cisco WSA is returning the file to the Cisco ASA/FTD but the ASA apparently does not like something in the HTTP headers (the “Via” header? I don’t know). There is nothing wrong with the CRL itself — I performed a packet capture on the ASA itself, extracted the CRL file from the packet capture, and it is not corrupted or anything. In fact, I have seen the revocation check work sometimes; I believe the problem occurs when the CRL is present in the WSA’s cache, which would explain why it works sometimes. I configured the web server hosting the CRL to prevent caching of the file but the problem still persists.

Workaround for this problem: Configure the ASA to fall back to no revocation check, i.e.

crypto ca trustpoint X
 revocation-check crl none

PRF Algorithms Other Than SHA1 Do Not Work

No idea if the problem here is on the Cisco ASA/FTD side or on the strongSwan side. All I know is that strongSwan fails to authenticate the peer. I see these messages in the strongSwan logs:

[ENC] parsed IKE_AUTH response 1 [ V IDr CERT AUTH SA TSi TSr N(ESP_TFC_PAD_N) N(NON_FIRST_FRAG) N(MOBIKE_SUP) ]
[IKE] received end entity cert ""
[CFG]   using certificate ""
[CFG]   using trusted ca certificate ""
[CFG] checking certificate status of ""
[CFG]   using trusted certificate ""
[CFG]   crl correctly signed by ""
[CFG]   crl is valid: until Jan 06 02:12:01 2018
[CFG]   using cached crl
[CFG] certificate status is good
[CFG]   reached self-signed root ca with a path length of 0
[IKE] signature validation failed, looking for another key
[CFG]   using certificate ""
[CFG]   using trusted ca certificate ""
[CFG] checking certificate status of ""
[CFG]   using trusted certificate ""
[CFG]   crl correctly signed by ""
[CFG]   crl is valid: until Jan 06 02:12:01 2018
[CFG]   using cached crl
[CFG] certificate status is good
[CFG]   reached self-signed root ca with a path length of 0
[IKE] signature validation failed, looking for another key
[ENC] generating INFORMATIONAL request 2 [ N(AUTH_FAILED) ]
[NET] sending packet: from 10.118.57.151[4500] to 10.122.109.113[4500] (80 bytes)
initiate failed: establishing CHILD_SA 'css-lab' failed

Workaround for this problem: Use SHA-1 as the PRF. For example, on the ASA, one could use:

crypto ikev2 policy 10
 encryption aes-256
 integrity sha256
 group 19
 prf sha

Certificates Using ASN.1 “PRINTABLESTRING” Don’t Work on Cisco ASA/FTD

This one was very difficult to troubleshoot. It might be a bug on the strongSwan side but I am not sure. The issue is that, depending on configuration, strongSwan will use as IKEv2 identity to send to the Cisco ASA/FTD a Distinguished Name (DN) in binary ASN.1 encoding, but when it creates this binary ASN.1 encoding it will use the type “PRINTABLESTRING” instead of “UTF8STRING” to represent fields like Country, stateOrProvince, localityName, organizationName, commonName, etc. The IKEv2 identity is otherwise identical to the identity that strongSwan would obtain directly from the certificate.

On the ASA/FTD side, when the ASA/FTD receives an identity that uses fields of type “PRINTABLESTRING” it seems to consider the identity bad, and it chokes. This is made difficult to troubleshoot by the fact that there apparently are no good debug messages to see what is going on. In the bad case one sees these messages:

%ASA-7-711001: IKEv2-PLAT-3: RECV PKT [IKE_AUTH] [10.118.57.149]:500->[10.122.109.113]:500 InitSPI=0x596a08fccb72412a RespSPI=0x5d757649514ab5e8 MID=00000001
%ASA-7-711001: (34):  
%ASA-7-711001: IKEv2-PROTO-2: (34): Received Packet [From 10.118.57.149:500/To 10.122.109.113:500/VRF i0:f0] 
[...]
%ASA-7-711001:  IDr%ASA-7-711001:   Next payload: CERT, reserved: 0x0, length: 128
%ASA-7-711001:     Id type: DER ASN1 DN, Reserved: 0x0 0x0
%ASA-7-711001: 
%ASA-7-711001:      30 76 31 0b 30 09 06 03 55 04 06 13 02 55 53 31
%ASA-7-711001:      0b 30 09 06 03 55 04 08 13 02 4e 43 31 0c 30 0a
%ASA-7-711001:      06 03 55 04 07 13 03 52 54 50 31 0e 30 0c 06 03
%ASA-7-711001:      55 04 0a 13 05 43 69 73 63 6f 31 0c 30 0a 06 03
%ASA-7-711001:      55 04 0b 13 03 43 53 53 31 0c 30 0a 06 03 55 04
%ASA-7-711001:      03 13 03 72 70 69 31 20 30 1e 06 09 2a 86 48 86
%ASA-7-711001:      f7 0d 01 09 01 16 11 65 6c 70 61 72 69 73 40 63
%ASA-7-711001:      69 73 63 6f 2e 63 6f 6d
[...]
%ASA-7-711001: IKEv2-PROTO-5: (34): SM Trace-> SA: I_SPI=596A08FCCB72412A R_SPI=5D757649514AB5E8 (I) MsgID = 00000001 CurState: I_WAIT_AUTH Event: EV_RECV_AUTH
%ASA-7-711001: IKEv2-PROTO-5: (34): Action: Action_Null
%ASA-7-711001: IKEv2-PROTO-5: (34): SM Trace-> SA: I_SPI=596A08FCCB72412A R_SPI=5D757649514AB5E8 (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_CHK4_NOTIFY
%ASA-7-711001: IKEv2-PROTO-2: (34): Process auth response notify
%ASA-7-711001: IKEv2-PROTO-5: (34): SM Trace-> SA: I_SPI=596A08FCCB72412A R_SPI=5D757649514AB5E8 (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_PROC_MSG
%ASA-7-711001: IKEv2-PLAT-2: (34): peer auth method set to: 1
%ASA-7-711001: IKEv2-PROTO-5: (34): SM Trace-> SA: I_SPI=596A08FCCB72412A R_SPI=5D757649514AB5E8 (I) MsgID = 00000001 CurState: I_WAIT_AUTH Event: EV_RE_XMT
%ASA-7-711001: IKEv2-PROTO-2: (34): Retransmitting packet
%ASA-7-711001: (34):  
%ASA-7-711001: IKEv2-PROTO-2: (34): Sending Packet [To 10.118.57.149:500/From 10.122.109.113:500/VRF i0:f0] 

As can be seen, the state machine goes from I_WAIT_AUTH (wait for authentication payload) to I_PROC_AUTH (process authentication payload), receives an “EV_PROC_MSG” (process message event), and then goes back to the I_WAIT_AUTH state with a retransmit (EV_RE_XMT) event. There is no explanation or message that indicates why processing the IKEv2 identity failed.

In the good case, one sees messages like:

%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_WAIT_AUTH Event: EV_RECV_AUTH
%ASA-7-711001: IKEv2-PROTO-5: (35): Action: Action_Null
%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_CHK4_NOTIFY
%ASA-7-711001: IKEv2-PROTO-2: (35): Process auth response notify
%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_PROC_MSG
%ASA-7-711001: IKEv2-PLAT-2: (35): peer auth method set to: 1
%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_CHK_IF_PEER_CERT_NEEDS_TO_BE_FETCHED_FOR_PROF_SEL
%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_GET_POLICY_BY_PEERID

Notice that after the EV_PROC_MSG event there is no re-transmit event — in the logs I could see that eventually (after checking revocation of the certificate, etc.) the state machine leaves the I_PROC_AUTH state and the connection finally establishes.
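
Since the ASA does not print an explicit error, one way to spot the bad case in a long debug capture is to follow the state machine trace mechanically. The following is a minimal sketch, assuming the log lines have exactly the format shown above; the function name and the regular expression are my own and are not part of any Cisco tool.

```python
import re

# Pulls the session number, CurState and Event out of an "SM Trace->" line.
TRACE_RE = re.compile(r"\((\d+)\): SM Trace->.*CurState: (\S+) Event: (\S+)")

def find_auth_retransmits(lines):
    """Return session numbers that fall back from I_PROC_AUTH to I_WAIT_AUTH
    with an EV_RE_XMT event, which is the failure pattern described above."""
    last_state = {}  # session number -> CurState seen on the previous trace line
    flagged = []
    for line in lines:
        match = TRACE_RE.search(line)
        if not match:
            continue
        session, state, event = match.groups()
        fell_back = (
            last_state.get(session) == "I_PROC_AUTH"
            and state == "I_WAIT_AUTH"
            and event == "EV_RE_XMT"
        )
        if fell_back and session not in flagged:
            flagged.append(session)
        last_state[session] = state
    return flagged
```

Feeding it the lines of the failing session (34) should flag that session, while the lines of the working session (35) should produce an empty list.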

The strongSwan configuration that caused the above problem was specifying “leftid” in /etc/ipsec.conf, i.e.

conn net-net
	leftcert=rpi.pem
	leftsubnet=10.123.1.0/24
	leftid="C=US, ST=CA, L=SF, O=Acme, OU=CSS, CN=myid, E=admin@example.com"
	leftfirewall=yes
	right=10.122.109.113
	rightid="C=US, ST=CA, L=SF, O=Acme, OU=CSS, CN=asa, E=admin@example.com"

If leftid is removed and strongSwan is left to detect automatically the identity to send to the Cisco ASA/FTD, the problem does not occur. I think this is because strongSwan then does not build the ID from the configured string but instead extracts it, already encoded, from the identity certificate.

Here’s a diff of the output from “openssl asn1parse” for the case of the IKEv2 ID using an ASN.1 binary encoding that has “UTF8STRING” fields, and for the case where “PRINTABLESTRING” fields are used:

    15:d=1  hl=2 l=  11 cons: SET               
    17:d=2  hl=2 l=   9 cons: SEQUENCE          
    19:d=3  hl=2 l=   3 prim: OBJECT            :stateOrProvinceName
-   24:d=3  hl=2 l=   2 prim: UTF8STRING        :CA
+   24:d=3  hl=2 l=   2 prim: PRINTABLESTRING   :CA
    28:d=1  hl=2 l=  12 cons: SET               
    30:d=2  hl=2 l=  10 cons: SEQUENCE          
    32:d=3  hl=2 l=   3 prim: OBJECT            :localityName
-   37:d=3  hl=2 l=   3 prim: UTF8STRING        :SF
+   37:d=3  hl=2 l=   3 prim: PRINTABLESTRING   :SF
    42:d=1  hl=2 l=  14 cons: SET               
    44:d=2  hl=2 l=  12 cons: SEQUENCE          
    46:d=3  hl=2 l=   3 prim: OBJECT            :organizationName
-   51:d=3  hl=2 l=   5 prim: UTF8STRING        :Acme
+   51:d=3  hl=2 l=   5 prim: PRINTABLESTRING   :Acme
    58:d=1  hl=2 l=  12 cons: SET               
    60:d=2  hl=2 l=  10 cons: SEQUENCE          
    62:d=3  hl=2 l=   3 prim: OBJECT            :organizationalUnitName
-   67:d=3  hl=2 l=   3 prim: UTF8STRING        :CSS
+   67:d=3  hl=2 l=   3 prim: PRINTABLESTRING   :CSS
    72:d=1  hl=2 l=  12 cons: SET               
    74:d=2  hl=2 l=  10 cons: SEQUENCE          
    76:d=3  hl=2 l=   3 prim: OBJECT            :commonName
-   81:d=3  hl=2 l=   3 prim: UTF8STRING        :rpi
+   81:d=3  hl=2 l=   3 prim: PRINTABLESTRING   :rpi
    86:d=1  hl=2 l=  32 cons: SET               
    88:d=2  hl=2 l=  30 cons: SEQUENCE          
    90:d=3  hl=2 l=   9 prim: OBJECT            :emailAddress

Workaround for this issue: Do not use leftid and let strongSwan figure out the IKEv2 ID that it needs to present to the Cisco ASA/FTD.
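
To see why two IDs that print as the same text can still differ on the wire, it helps to encode one field by hand. This is a minimal sketch of DER string encoding, not strongSwan or ASA code; the helper name is hypothetical. It assumes, as the behavior above suggests but the debugs do not confirm, that the peer compares the encoded bytes rather than the printed text.

```python
UTF8STRING = 0x0C       # ASN.1 universal tag 12
PRINTABLESTRING = 0x13  # ASN.1 universal tag 19

def der_string(tag: int, text: str) -> bytes:
    """Encode a short ASN.1 string as tag byte, length byte, value bytes."""
    value = text.encode("utf-8")
    if len(value) > 127:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([tag, len(value)]) + value

as_utf8 = der_string(UTF8STRING, "CA")
as_printable = der_string(PRINTABLESTRING, "CA")

print(as_utf8.hex(" "))         # 0c 02 43 41
print(as_printable.hex(" "))    # 13 02 43 41
print(as_utf8 == as_printable)  # False
```

The value bytes are identical; only the tag byte differs, which is enough to make a byte-for-byte comparison of the two distinguished names fail.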

Multiple Traffic Selectors Under Same Child SA

If the strongSwan configuration specifies multiple networks in one traffic selector, like in this configuration:

children {
    # CHILD_SA configuration sub-section.
    lab-vpn {
        # Local traffic selectors to include in CHILD_SA.
        # local_ts = dynamic
        local_ts = 10.123.1.0/24, 10.10.0.0/16

        # Remote selectors to include in CHILD_SA.
        # remote_ts = dynamic
        remote_ts = 10.123.0.0/24
    }
}

then the Cisco device will receive TSi and TSr payloads in an IKEv2 message that look like this:

 TSi Next payload: TSr, reserved: 0x0, length: 56
 Num of TSs: 3, reserved 0x0, reserved 0x0
 TS type: TS_IPV4_ADDR_RANGE, proto id: 1, length: 16
 start port: 2048, end port: 2048
 start addr: 10.123.1.2, end addr: 10.123.1.2
 TS type: TS_IPV4_ADDR_RANGE, proto id: 0, length: 16
 start port: 0, end port: 65535
 start addr: 10.123.1.0, end addr: 10.123.1.255
 TS type: TS_IPV4_ADDR_RANGE, proto id: 0, length: 16
 start port: 0, end port: 65535
 start addr: 10.10.0.0, end addr: 10.10.255.255
 TSr Next payload: NOTIFY, reserved: 0x0, length: 40
 Num of TSs: 2, reserved 0x0, reserved 0x0
 TS type: TS_IPV4_ADDR_RANGE, proto id: 1, length: 16
 start port: 2048, end port: 2048
 start addr: 10.123.0.5, end addr: 10.123.0.5
 TS type: TS_IPV4_ADDR_RANGE, proto id: 0, length: 16
 start port: 0, end port: 65535
 start addr: 10.123.0.0, end addr: 10.123.0.255

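As can be seen, the TSi payload contains multiple Traffic Selectors, including one for 10.123.1.0/24 and another one for 10.10.0.0/16. These come from the strongSwan configuration “local_ts = 10.123.1.0/24, 10.10.0.0/16”. The first entry in each payload (proto id 1, a single address) appears to describe the specific packet, an ICMP echo request, that triggered the negotiation.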
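
The payload lists address ranges rather than CIDR prefixes, so when comparing a debug capture against the configuration it can help to convert one into the other. A minimal sketch using Python's standard ipaddress module (the function name is my own):

```python
import ipaddress

def ts_range(cidr: str):
    """Return the (start addr, end addr) pair that covers a CIDR prefix."""
    network = ipaddress.ip_network(cidr)
    return str(network.network_address), str(network.broadcast_address)

for prefix in ("10.123.1.0/24", "10.10.0.0/16", "10.123.0.0/24"):
    start, end = ts_range(prefix)
    print(f"{prefix}: start addr: {start}, end addr: {end}")
```

For the three prefixes in the configuration above this prints the same start and end addresses that appear in the TSi and TSr payloads.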

The idea is that the IPsec gateway that strongSwan is talking to should create IPsec Security Associations (SAs) for 10.123.1.0/24 <-> 10.123.0.0/24 and for 10.10.0.0/16 <-> 10.123.0.0/24.

Unfortunately, Cisco devices do not support this and instead only create SAs for the first traffic selector in the IKE message. There is a Cisco bug for this issue on Cisco ASA, but it does not appear that it will be fixed any time soon (as of May 2018):

CSCue42170 (“IKEv2: Support Multi Selector under the same child SA”)

strongSwan users have reported the problem:

https://wiki.strongswan.org/issues/758

A workaround has been proposed: create multiple connections, one for each protected network, instead of one connection with multiple protected networks.
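
With more than a couple of networks, listing every combination by hand gets error-prone. The sketch below only enumerates the local/remote pairs that each need their own entry; the naming scheme (“lab-vpn-1”, “lab-vpn-2”, ...) is an assumption of mine, and how each pair is written into the strongSwan configuration is left to the reader.

```python
from itertools import product

def split_selectors(local_subnets, remote_subnets, prefix="lab-vpn"):
    """Yield (name, local_ts, remote_ts) with exactly one subnet on each side."""
    pairs = product(local_subnets, remote_subnets)
    for index, (local, remote) in enumerate(pairs, start=1):
        yield f"{prefix}-{index}", local, remote

for name, local, remote in split_selectors(
        ["10.123.1.0/24", "10.10.0.0/16"], ["10.123.0.0/24"]):
    print(f"{name}: local_ts = {local}, remote_ts = {remote}")
```

For the configuration shown earlier this yields two entries, one per local network, each with a single local and a single remote traffic selector.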

Static CRL Revocation Check No Longer Works

If you had a configuration like this before ASA 9.13.1:

crypto ca trustpoint <trustpoint>
 crl configure
  policy static
  url 1 http://x

then that will no longer work as the “url” command has been removed in ASA Software version 9.13.1 and later. The new way to configure the same thing is:

crypto ca certificate map <map> 10
 issuer-name attr cn co <cn> issuing
 issuer-name attr dc eq <dc>

crypto ca trustpoint <trustpoint>
 match certificate <map> override cdp 1 url http://x
 crl configure
  policy static

but it requires ASA versions 9.13.1.12 or later, or 9.14.1.12 or later.

This is tracked by Cisco bug ID CSCvu05216 (“cert map to specify CRL CDP Override does not allow backup entries”).

The removal of the “url” command is documented in the ASA Software 9.13 release notes under “Important Notes”:

https://www.cisco.com/c/en/us/td/docs/security/asa/asa913/release/notes/asarn913.html#reference_yw3_ngz_vhb

“Removal of CRL Distribution Point commands—The static CDP URL configuration commands, namely crypto-ca-trustpoint crl and crl url were removed with other related logic.

Note: The CDP URL configuration option was restored later (refer CSCvu05216).”

Conclusion

A site-to-site IPsec-based VPN tunnel between Cisco ASA/FTD and strongSwan running on Linux and using certificates for authentication comes up just fine but I ran into the three issues described above. All issues have reasonable workarounds. They are probably bugs that I’ll try to report to the respective parties.

SSL Certificates Made Easy (and Cheap!)

Running an SSL-enabled website is a best practice but is often made difficult by the fact that one needs a Public Key Infrastructure (PKI) to obtain the SSL certificates needed for SSL operation.

There are two options for using a PKI: 1. Deploy your own PKI, and 2. Use a public PKI. The former is cheap (free) but has a steeper learning curve because one needs to know how to set up the Certification Authority (CA) server software and how to manage the PKI (generate Certificate Signing Requests [CSRs], sign certificates, revoke certificates, deploy the root CA certificate to end users’ devices, etc.). The latter can be non-free but is easier as the PKI is already established and one only needs to request a certificate, sometimes for a price.

The Let’s Encrypt project is “[…] a free, automated, and open certificate authority (CA), run for the public’s benefit. It is a service provided by the Internet Security Research Group (ISRG).” See https://letsencrypt.org/about/ for additional details about the Let’s Encrypt project. Two important details about certificates issued by the Let’s Encrypt project are that: 1. They are free, and 2. Browsers trust the CA that issues them, so there is no need to distribute CA root certificates to end users’ devices.

We run an Apache web server that serves a few domains via virtual hosts and it was easy to set them up to use certificates issued by the Let’s Encrypt project. Here are the details:

We run Apache on Ubuntu so the first thing we had to do was to install an ACME client (ACME is a protocol used to fetch certificates). The ACME client recommended by the Let’s Encrypt project is called Certbot. According to Certbot’s website, “Certbot is an easy-to-use automatic client that fetches and deploys SSL/TLS certificates for your webserver. Certbot was developed by EFF and others as a client for Let’s Encrypt and was previously known as “the official Let’s Encrypt client” or “the Let’s Encrypt Python client.” Certbot will also work with any other CAs that support the ACME protocol”.

The Certbot website has clear instructions on how to do this. For us, it was just:

shell$ sudo add-apt-repository ppa:certbot/certbot
shell$ sudo apt-get update
shell$ sudo apt-get install certbot

The next step was to request the certificates. There are Certbot “plugins” that automate the process, but we chose a more manual approach that gives us a little more control:

shell$ sudo certbot certonly --webroot -w /srv/www/www.domain1.net/ -d domain1.net -d www.domain1.net -w /usr/share/wordpress -d www.domain2.com -d domain2.com
 Saving debug log to /var/log/letsencrypt/letsencrypt.log
 Starting new HTTPS connection (1): acme-v01.api.letsencrypt.org

-------------------------------------------------------------------------------
 You have an existing certificate that contains a portion of the domains you
 requested (ref: /etc/letsencrypt/renewal/www.domain1.net.conf)

It contains these names: www.domain1.net, domain1.net

You requested these names for the new certificate: domain1.net, www.domain1.net,
 www.domain2.com, domain2.com.

Do you want to expand and replace this existing certificate with the new
 certificate?
 -------------------------------------------------------------------------------
 (E)xpand/(C)ancel: e
 Renewing an existing certificate
 Performing the following challenges:
 http-01 challenge for domain1.net
 http-01 challenge for www.domain1.net
 http-01 challenge for www.domain2.com
 http-01 challenge for domain2.com
 Using the webroot path /usr/share/wordpress for all unmatched domains.
 Waiting for verification...
 Cleaning up challenges
 Unable to clean up challenge directory /srv/www/www.domain1.net/.well-known/acme-challenge
 Generating key (2048 bits): /etc/letsencrypt/keys/0001_key-certbot.pem
 Creating CSR: /etc/letsencrypt/csr/0001_csr-certbot.pem

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at
 /etc/letsencrypt/live/www.domain1.net/fullchain.pem. Your cert will
 expire on 2017-06-26. To obtain a new or tweaked version of this
 certificate in the future, simply run certbot again. To
 non-interactively renew *all* of your certificates, run "certbot
 renew"
 - If you like Certbot, please consider supporting our work by:

Donating to ISRG / Let's Encrypt: https://letsencrypt.org/donate
 Donating to EFF: https://eff.org/donate-le

Note that I had previously requested a certificate for www.domain1.net, and when I ran Certbot I requested a new domain to be listed in the certificate (www.domain2.com). Certbot noticed that I had previously requested a certificate for www.domain1.net and asked me if I wanted to expand the certificate to include the new domain.
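
The command line above mixes two webroots and four domains, and the pairing is easy to get wrong. As I understand Certbot's webroot mode, each -w applies to the -d options that follow it until the next -w. The sketch below is a simplified model of that rule, not Certbot's own parser, and the function name is hypothetical.

```python
def webroot_map(args):
    """Map each -d domain to the most recent -w webroot that precedes it."""
    mapping = {}
    current_webroot = None
    remaining = iter(args)
    for arg in remaining:
        if arg == "-w":
            current_webroot = next(remaining)
        elif arg == "-d":
            mapping[next(remaining)] = current_webroot
    return mapping

args = [
    "-w", "/srv/www/www.domain1.net/", "-d", "domain1.net", "-d", "www.domain1.net",
    "-w", "/usr/share/wordpress", "-d", "www.domain2.com", "-d", "domain2.com",
]
for domain, webroot in webroot_map(args).items():
    print(f"{domain} -> {webroot}")
```

With the arguments from the command above, the two domain1.net names map to /srv/www/www.domain1.net/ and the two domain2.com names map to /usr/share/wordpress.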

As mentioned in the Certbot output, the certificates (the identity certificate for the website as well as the CA certificate) are left in the /etc/letsencrypt/live/www.domain1.net directory. At this point one just has to configure Apache to use these certificates.
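
For the Apache side, the following sketch prints the mod_ssl directives that would point a virtual host at the files Certbot wrote. It is only an illustration under a few assumptions: that the private key sits next to fullchain.pem as privkey.pem (Certbot's usual layout, not shown in the output above), and that the Apache version in use accepts the full chain in SSLCertificateFile. The function name is my own.

```python
from pathlib import PurePosixPath

def apache_ssl_lines(cert_name, live_dir="/etc/letsencrypt/live"):
    """Return mod_ssl directives that reference Certbot's files for cert_name."""
    base = PurePosixPath(live_dir) / cert_name
    return [
        "SSLEngine on",
        f"SSLCertificateFile {base / 'fullchain.pem'}",
        # Assumption: Certbot stores the key as privkey.pem in the same directory.
        f"SSLCertificateKeyFile {base / 'privkey.pem'}",
    ]

for line in apache_ssl_lines("www.domain1.net"):
    print(line)
```

The printed lines would go inside the port 443 virtual host for each site; everything else in the virtual host stays as it was.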

Printing From Windows 7 to Remote CUPS Printer

It took me a little while to figure out how to get a machine running Windows 7 to print to a remote CUPS printer so I thought I’d document what I did in case it helps others (as well as myself as I am sure I will forget this if I don’t document it)…

The first step is to go to the “Devices and Printers” control panel. There one has to click on the “Add Printer” link at the top (also available in the right-click menu).

In the dialog window that follows, select “Add a network, wireless, or Bluetooth printer”. Windows will then try to automatically find an available network printer. At that point I stop the search by clicking on the “Stop” button, and then click on “The printer that I want isn’t listed”.

In the next dialog, “Find a printer by name or TCP/IP address”, select the option “Select a shared printer by name”, and enter a URL like the following:

http://<IP address or hostname of the CUPS server>:631/printers/<Name of the CUPS printer queue>

Then click “Next”.
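
If the queue name contains spaces or other special characters, the URL is easy to mistype. A small sketch that assembles it from its parts; the server and queue names used here are just examples taken from elsewhere in these notes, and the function name is hypothetical.

```python
from urllib.parse import quote

def cups_printer_url(server, queue, port=631):
    """Build the HTTP URL of a CUPS print queue, percent-encoding the queue name."""
    return f"http://{server}:{port}/printers/{quote(queue, safe='')}"

print(cups_printer_url("printer.example.com", "HP_Officejet_2620_series"))
# http://printer.example.com:631/printers/HP_Officejet_2620_series
```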

The next step is important: it’s where a printer driver must be selected. Normally, the CUPS server knows what printer it has connected. In that case one needs to send print jobs in a format that the CUPS server can understand, like PostScript or PDF, and the CUPS server will convert them to the language understood by the printer. However, it might be the case that the CUPS server has a raw queue, in which case the CUPS client must send the print job in the format that the printer can understand.

So, when selecting a driver in the Windows “Add Printer Wizard”, one can do the following:

  1. If not using a raw queue on the CUPS server, select the “Generic” manufacturer, and then the “MS Publisher Color Printer”. This will cause the print job to be of type “application/postscript”, which CUPS can then convert to the right printer language.
  2. If using a raw queue on the CUPS server then select the appropriate printer driver so the Windows client sends the job in the format that the printer can understand.

References

“[…] install the native printer drivers for your printer on the Windows computer. If the CUPS server is set up to use its own printer drivers, then you can just select a generic postscript printer for the Windows client(e.g. ‘HP Color LaserJet 8500 PS’ or ‘Xerox DocuTech 135 PS2’).”

Note that I didn’t have luck using the “HP Color LaserJet 8500 PS” printer driver — it would generate a printer job in the “PJL encapsulated PostScript document text” format, which CUPS would have problems handling. But the “MS Publisher Color Printer” worked fine.

  • This page contains good information on how to create a CUPS raw printer queue:

http://opennomad.com/content/raw-cups-configuration-challenge

 

CUPS strikes again, this time with “Bad Request”

The machine running our CUPS print server at home recently had to be replaced by another one. Today we had to set up a new CUPS client to print to the CUPS server running on the new machine.

It turns out it was not an easy experience — the CUPS client kept saying “Bad Request” and there was nothing in the CUPS log files on the server side. We then realized that we were logging at a level that could be hiding important messages, based on this parameter in /etc/cups/cupsd.conf:

LogLevel warn

Changing this parameter to:

LogLevel debug

and trying again to print or modify the printer produced the following messages in /var/log/cups/error_log:

D [04/Dec/2016:16:30:58 -0500] [Client 1] GET /printers/HP_Officejet_2620_series.ppd HTTP/1.1
D [04/Dec/2016:16:30:58 -0500] cupsdSetBusyState: newbusy="Active clients and dirty files", busy="Dirty files"
D [04/Dec/2016:16:30:58 -0500] [Client 1] Read: status=200
D [04/Dec/2016:16:30:58 -0500] [Client 1] No authentication data provided.
E [04/Dec/2016:16:30:58 -0500] [Client 1] Request from "[v1.2003:1480:eca5:1120:7e7a:91ff:febf:d3b2]" using invalid Host: field "printer.example.com:631".
D [04/Dec/2016:16:30:58 -0500] [Client 1] cupsdSendHeader: code=400, type="text/html", auth_type=0
D [04/Dec/2016:16:30:58 -0500] [Client 1] Closing because Keep-Alive is disabled.
D [04/Dec/2016:16:30:58 -0500] [Client 1] Closing connection.

Aha! ‘Request from “[v1.2003:1480:eca5:1120:7e7a:91ff:febf:d3b2]” using invalid Host: field’!

Now with an actual error message to search for I was able to find Debian bug #530027, which has as a “workaround” to set “ServerAlias *” in /etc/cups/cupsd.conf.

After making that change and reloading CUPS things started to work. Note that the root cause of the problem is that the hostname of the print server changed, and the CUPS client was using the hostname of the old server via a DNS CNAME.
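
Once the log level is high enough, the rejected requests can be pulled out of the log automatically. A minimal sketch, assuming the error-level line format shown above; the regular expression and function name are my own.

```python
import re

# Matches error-level ("E") lines that report a rejected Host: header.
HOST_RE = re.compile(
    r'^E \[[^\]]+\] .*Request from "([^"]+)" using invalid Host: field "([^"]+)"'
)

def invalid_host_requests(lines):
    """Return (client address, Host header value) for each rejected request."""
    results = []
    for line in lines:
        match = HOST_RE.search(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results
```

Running it over the lines of the error log lists which clients are still using a hostname that the server does not recognize, which is a hint about which names would need a ServerAlias entry.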

Beats me why such an error message is not produced unless the CUPS logging level is set to “debug”.

CUPS Remote Printing Filter Failed

I keep getting bitten by this issue every time I set up a new printer on a new machine to print to a remote CUPS server that has the physical printer connected to it (via USB, for example), and every time it takes me a little while to remember what the problem is. So, I have decided to document the problem (and the solution) for the next time I run into it.

The printer is correctly configured on the server, and by “correctly configured” I mean that the printer is using the correct driver, and that printing is working, both locally on the machine with the attached printer, and remotely on other clients.

On the client side, the URI for the printer is correct; for example ipp://servername.example.com:631/printers/printer_name. Everything seems fine until a document is sent to the printer from the client. When this happens, the server’s cupsd (not the client’s) runs into a problem and nothing gets printed. The server’s cupsd error log (typically /var/log/cups/error_log) contains something like this:

D [16/Oct/2016:21:25:45 -0400] [Job 2] Queued on "HP_Officejet_2620_series" by "username".
 D [16/Oct/2016:21:25:45 -0400] [Job 2] File of type application/vnd.cups-raster queued by "username".
 D [16/Oct/2016:21:25:45 -0400] [Job 2] Adding end banner page "none".
 D [16/Oct/2016:21:25:45 -0400] [Job 2] time-at-processing=1476667545
 D [16/Oct/2016:21:25:45 -0400] [Job 2] 1 filters for job:
 D [16/Oct/2016:21:25:45 -0400] [Job 2] hpcups (application/vnd.cups-raster to printer/HP_Officejet_2620_series, cost 0)
 D [16/Oct/2016:21:25:45 -0400] [Job 2] job-sheets=none,none
 D [16/Oct/2016:21:25:45 -0400] [Job 2] argv[0]="HP_Officejet_2620_series"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] argv[1]="2"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] argv[2]="username"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] argv[3]="Untitled Document 1"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] argv[4]="1"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] argv[5]="job-uuid=urn:uuid:09c01fb4-c9b4-39de-6e4c-3d33a0710d25 job-originating-host-name=[v1.2002:4170:e35:1:81ae:ffff:ffff:83a7] date-time-at-creation= date-time-at-processing= time-at-creation=1476667545 time-at-processing=1476667545"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] argv[6]="/var/spool/cups/d00002-001"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[0]="CUPS_CACHEDIR=/var/cache/cups"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[1]="CUPS_DATADIR=/usr/share/cups"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[2]="CUPS_DOCROOT=/usr/share/cups/doc"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[3]="CUPS_FONTPATH=/usr/share/cups/fonts"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[4]="CUPS_REQUESTROOT=/var/spool/cups"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[5]="CUPS_SERVERBIN=/usr/lib/cups"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[6]="CUPS_SERVERROOT=/etc/cups"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[7]="CUPS_STATEDIR=/run/cups"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[8]="HOME=/var/spool/cups/tmp"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[9]="PATH=/usr/lib/cups/filter:/usr/bin:/usr/bin:/bin:/usr/bin"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[10]="SERVER_ADMIN=root@server.example.com"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[11]="SOFTWARE=CUPS/2.2.1"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[12]="TMPDIR=/var/spool/cups/tmp"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[13]="USER=root"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[14]="CUPS_MAX_MESSAGE=2047"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[15]="CUPS_SERVER=/run/cups/cups.sock"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[16]="CUPS_ENCRYPTION=IfRequested"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[17]="IPP_PORT=631"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[18]="CHARSET=utf-8"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[19]="LANG=en_US.UTF-8"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[20]="PPD=/etc/cups/ppd/HP_Officejet_2620_series.ppd"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[21]="RIP_MAX_CACHE=128m"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[22]="CONTENT_TYPE=application/vnd.cups-raster"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[23]="DEVICE_URI=hp:/usb/Officejet_2620_series?serial=CN4654G2BG0600"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[24]="PRINTER_INFO=HP Officejet 2620 series"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[25]="PRINTER_LOCATION=My desk"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[26]="PRINTER=HP_Officejet_2620_series"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[27]="PRINTER_STATE_REASONS=none"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[28]="CUPS_FILETYPE=document"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[29]="FINAL_CONTENT_TYPE=printer/HP_Officejet_2620_series"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] envp[30]="AUTH_I****"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] Started filter /usr/lib/cups/filter/hpcups (PID 5259)
 D [16/Oct/2016:21:25:45 -0400] [Job 2] Started backend /usr/lib/cups/backend/hp (PID 5260)
 D [16/Oct/2016:21:25:45 -0400] [Job 2] PID 5259 (/usr/lib/cups/filter/hpcups) stopped with status 1.
 D [16/Oct/2016:21:25:45 -0400] [Job 2] Hint: Try setting the LogLevel to "debug" to find out more.
 D [16/Oct/2016:21:25:45 -0400] [Job 2] PID 5260 (/usr/lib/cups/backend/hp) exited with no errors.
 D [16/Oct/2016:21:25:45 -0400] [Job 2] prnt/hpcups/HPCupsFilter.cpp 565: cupsRasterOpen failed, fd = 6
 D [16/Oct/2016:21:25:45 -0400] [Job 2] prnt/backend/hp.c 919: ERROR: null print job total=0
 D [16/Oct/2016:21:25:45 -0400] [Job 2] End of messages
 D [16/Oct/2016:21:25:45 -0400] [Job 2] printer-state=3(idle)
 D [16/Oct/2016:21:25:45 -0400] [Job 2] printer-state-message="Filter failed"
 D [16/Oct/2016:21:25:45 -0400] [Job 2] printer-state-reasons=none

The problem is the “Filter failed” message. My understanding of the root cause is that the printer is configured on the client with the make and model of the printer that is physically connected to the server. So if the printer is, for example, a Hewlett-Packard printer, the client renders the print job itself and sends the rendered job to the server (the log above shows the job arriving as application/vnd.cups-raster). The server expects the job in some other format (PostScript? PDF? It’s not important), and when it receives a job that has already been rendered for the exact printer make and model, the filter on the server fails to process it.

The solution is to configure the printer on the client as a “raw” printer, i.e. a printer where the printer driver is not specified. This way the client sends the job “unrendered” and lets the server do the rendering according to the correct printer driver that is installed (on the server).
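
On the command line, a raw queue on the client can be created with lpadmin. The sketch below only builds the command so it can be reviewed before running it; the queue and server names are placeholders. It assumes a CUPS version that still accepts “-m raw” (newer CUPS releases mark raw queues as deprecated, so check the lpadmin man page on your system).

```python
import shlex

def raw_queue_command(queue, server, remote_queue, port=631):
    """Build an lpadmin command that adds a raw queue pointing at a remote CUPS queue."""
    uri = f"ipp://{server}:{port}/printers/{remote_queue}"
    # -p: local queue name, -E: enable it, -v: device URI, -m raw: no local driver
    return ["lpadmin", "-p", queue, "-E", "-v", uri, "-m", "raw"]

command = raw_queue_command("officejet", "servername.example.com", "printer_name")
print(shlex.join(command))
```

The printed command can then be run as root on the client, or passed to subprocess.run() from the same script.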

I remember that when I first ran into this problem it was not easy to figure out what was wrong. I enabled all the debugging knobs that I could find and nothing helped. It probably was some post to some random blog or Internet forum that gave me a clue, but it was not easy to find.

And to add insult to injury, using the Printer control panel on Ubuntu to modify the default settings of a printer (on a CUPS client) would change the local printer configuration from “raw” to a specific make and model, which would then trigger the problem explained above. This made me scratch my head and waste hours trying to get a printer that had suddenly stopped working back into a working state.

Ggggrrrrr.

References

This RedHat bug report has good information on the issue and how clients and servers should be configured:

https://bugzilla.redhat.com/show_bug.cgi?id=1010580

This ArchLinux forum discussion is very relevant to the problem:

https://bbs.archlinux.org/viewtopic.php?pid=1589908#p1589908

Finally, this ArchLinux wiki page:

https://wiki.archlinux.org/index.php/CUPS#Network_2

contains the following note, which describes precisely what the issue is:

“Warning: Avoid configuring both the server and the client with a printer filter – either the print queue on the client or the server should be ‘raw’. This avoids sending a print job through the filters for a printer twice, which can cause problems (for instance, [3]). See #Usage for an example of setting a print queue to ‘raw’.”

Reset Azure Virtual Network Gateway

No matter what I did I could not get an IPsec site-to-site tunnel going between an offsite test network and our Microsoft Azure virtual network. Our VPN gateway is a Cisco ASA 5506.

The issue was that the Cisco ASA would try to bring up the tunnel but some part of the negotiation would go wrong at some point. Debug messages on the Cisco ASA would show something like this:

Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, IKE SA Proposal # 1, Transform # 2 acceptable Matches global IKE entry # 5
 Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing ISAKMP SA payload
 Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing NAT-Traversal VID ver RFC payload
 Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing Fragmentation VID + extended capabilities payload
 Apr 19 09:21:41 [IKEv1]IP = 13.94.202.38, IKE_DECODE SENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:21:42 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:43 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:46 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:49 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:21:57 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:22:05 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, IKE MM Responder FSM error history (struct &0x00002aaac1c34cf0) , : MM_DONE, EV_ERROR-->MM_WAIT_MSG3,
 EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent-->MM_SND_MSG2, EV_SND_MSG-->MM_SND_MSG2, EV_START_TMR-->MM_SND_MSG2, EV_RESEND_MSG-->MM_WAIT_MSG3, EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent
 Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, IKE SA MM:4af079c0 terminating: flags 0x01000002, refcnt 0, tuncnt 0
 Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, sending delete/delete with reason message
 Apr 19 09:23:06 [IKEv1]IP = 13.94.202.38, IKE_DECODE RECEIVED Message (msgid=6a9f34a4) with payloads : HDR + HASH (8) + DELETE (12) + NONE (0) total length : 68
 Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, processing hash payload
 Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, processing delete
 Apr 19 09:23:06 [IKEv1]Group = 13.94.202.38, IP = 13.94.202.38, Connection terminated for peer 13.94.202.38. Reason: Peer Terminate Remote Proxy 10.100.152.0, Local Proxy 10.50.0.0
 Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, Active unit receives a delete event for remote peer 13.94.202.38.

A couple of key points in the above debug messages:

  1. “MM_WAIT_MSG3, EV_TIMEOUT” indicates that the Cisco ASA timed out waiting for the Azure VPN gateway.
  2. “Duplicate first packet detected. Ignoring packet” indicates that the Azure VPN gateway does not accept the previous message that the Cisco ASA sent. Increasing the debug level (not shown above) reveals a cookie mismatch, and this is apparently what upsets the Azure Virtual Network Gateway.

This is shown on Cisco ASA debug messages at a higher debug level:

Azure
 InitiatorCookie: 03 83 AD 7C 10 26 CB D6
 ResponderCookie: 14 42 19 27 F6 F2 DF 53

RECV PACKET from 13.91.5.150
 ISAKMP Header
 Initiator COOKIE: 03 83 ad 7c 10 26 cb d6
 Responder COOKIE: 00 00 00 00 00 00 00 00
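
To make the mismatch concrete: in an ISAKMP (IKEv1) header the first 8 bytes are the initiator cookie and the next 8 bytes are the responder cookie, and a responder cookie of all zeros marks the very first packet of an exchange. The sketch below is my own illustration of that check, using the values from the debug output above; it is not ASA or Azure code.

```python
def parse_cookies(header: bytes):
    """Split the first 16 bytes of an ISAKMP header into the two cookies."""
    if len(header) < 16:
        raise ValueError("an ISAKMP header starts with two 8-byte cookies")
    return header[:8], header[8:16]

def is_first_packet(header: bytes) -> bool:
    """A responder cookie of all zeros means the exchange has not been answered yet."""
    _, responder = parse_cookies(header)
    return responder == bytes(8)

# Cookies of the received packet, and the responder cookie recorded for the SA.
received = bytes.fromhex("03 83 ad 7c 10 26 cb d6 00 00 00 00 00 00 00 00")
recorded_responder = bytes.fromhex("14 42 19 27 f6 f2 df 53")

initiator, responder = parse_cookies(received)
print(is_first_packet(received))        # True
print(responder == recorded_responder)  # False
```

The received packet still carries an empty responder cookie even though a responder cookie has already been recorded for the same initiator cookie, which would be consistent with the “Duplicate first packet detected” messages.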

These are debug messages produced on the Microsoft Azure side:

2016-03-02 10:31:37 ERROR user NULL 0000000FE1E59D80 0000000FE1E64320 f74513382e60832f cac68571e57c06d5 Invalid cookies. Try resetting SAs on-prem. IkeProcessPacketDispatch failed with HRESULT 0x80073616(ERROR_IPSEC_IKE_INVALID_COOKIE)

(Note the “ERROR_IPSEC_IKE_INVALID_COOKIE” error code.)

After spending some time troubleshooting the Cisco ASA side, we could not find anything wrong with the Cisco ASA configuration.

In the end, in my desperation, I decided to reset the Azure Virtual
Network Gateway and that seems to have fixed the issue for good.

The process to reset an Azure Virtual Network Gateway is a bit tricky because there is no way to do that using the Azure Portal; it needs to be done using PowerShell instead.

This is what I did to reset the Azure Virtual Network Gateway using
PowerShell:

1. Install Azure PowerShell. I used the instructions here:

https://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/

In particular, I went with the leaner and perhaps more complicated
installation from the PowerShell Gallery (instead of installing from
WebPI).

2. After Azure PowerShell was installed, I opened a PowerShell command window and ran the following commands:

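# Sign in and select the subscription that contains the gateway.
Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionName "<your subscription name>"
# Look up the gateway in resource group "RG" (this assumes it is the only gateway there).
$vg = Get-AzureRmVirtualNetworkGateway -ResourceGroupName RG
# Reset the gateway.
Reset-AzureRmVirtualNetworkGateway -VirtualNetworkGateway $vg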

Apparently, Azure Virtual Network Gateways are redundant Virtual Machines so resetting one will cause the other to take over.

The other one could be reset by invoking “Reset-AzureRmVirtualNetworkGateway” a few minutes after the first gateway has been reset but in my case the site-to-site VPN tunnel came up after resetting only one of the gateways.

Note that the above XXXXX-AzureRmXXXXX PowerShell cmdlets use the new
Azure Resource Manager deployment model. Similar commands would have to
be used if the classic deployment model is used instead.

This article:

https://azure.microsoft.com/en-us/documentation/articles/vpn-gateway-resetgw-classic/

is a good reference for how to reset Azure Virtual Network Gateways that have been deployed using the classic deployment model. Note that it says that the same cannot be done for the Resource Manager deployment model but I think the capability is there now (I used it) and it is just that the article has not been updated yet.

On a related note, I should mention that another way of dealing with this problem is by deploying a Cisco ASAv virtual appliance and using that to terminate the site-to-site IPsec tunnel instead of terminating it on the Microsoft-provided Azure Virtual Network Gateway. This of course would be more expensive given that licenses for the Cisco ASAv would have to be purchased, plus it is another Virtual Machine that would have to be deployed (and paid for).