NetworkManager strongSwan encryption algorithm ‘DES-CBC’ not supported

Recently we ran into an issue involving NetworkManager and strongSwan. The error in the systemd journal was a cryptic “encryption algorithm ‘DES-CBC’ not supported”, as shown in the following log excerpt:

Jul 19 19:14:00 el-valle NetworkManager[733]:  [1532042040.2613] audit: op="connection-activate" uuid="26f20e51-92ba-4a78-a1>
Jul 19 19:14:00 el-valle NetworkManager[733]:  [1532042040.2764] vpn-connection[0x56050004c1f0,26f20e51-92ba-4a78-a17e-1709b>
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[ASN] encryption algorithm 'DES-CBC' not supported
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[LIB] building CRED_PRIVATE_KEY - RSA failed, tried 8 builders
Jul 19 19:14:00 el-valle NetworkManager[733]:  [1532042040.2862] vpn-connection[0x56050004c1f0,26f20e51-92ba-4a78-a17e-1709b>
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[CFG] received initiate for NetworkManager connection Acme strongSwan
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[CFG] using CA certificate, gateway identity ''
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[ASN] encryption algorithm 'DES-CBC' not supported
Jul 19 19:14:00 el-valle charon-nm[17026]: 05[LIB] building CRED_PRIVATE_KEY - ANY failed, tried 7 builders

In the end we tracked this down to strongSwan being unable to read a private key that had been encrypted with DES. The solution was to re-encrypt the private key using AES-256:

shell$ sudo openssl rsa -in client_key.pem -aes256 -out newkey.pem
Enter pass phrase for client_key.pem:
writing RSA key
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
shell$ sudo mv newkey.pem client_key.pem

The following post was helpful to figure out what was happening:

Site-to-site VPN Between Cisco ASA/FTD and strongSwan

I recently wasted about two days to bring up a simple site-to-site IPsec VPN tunnel between a Cisco ASA and Cisco FTD and a Linux machine running strongSwan and using digital certificates to authenticate the peers. The configuration was simple but due to a little “detail” and to a lack of good debugging information on the Cisco ASA/FTD, what should have been a five-minute job ended up taking a couple of days of troubleshooting, looking at the strongSwan source code, and making configuration changes to try to make it work. In the end I was able to bring up the tunnel and I got to the bottom of what the Cisco ASA/FTD was not liking from what strongSwan was sending. I will document here the configurations, and finally, at the end, will show what the Cisco ASA/FTD was choking on.

Cisco ASA Configuration

The basic VPN configuration on the Cisco ASA side looks like this:

access-list traffic-to-encrypt extended permit ip 
crypto ipsec ikev2 ipsec-proposal IPSEC-PROPOSAL
 protocol esp encryption aes-256
 protocol esp integrity sha-256 sha-1
crypto map MYMAP 10 match address traffic-to-encrypt
crypto map MYMAP 10 set peer 
crypto map MYMAP 10 set ikev2 ipsec-proposal IPSEC-PROPOSAL
crypto map MYMAP 10 set trustpoint TRUSTPOINT chain
crypto map MYMAP interface outside
crypto ca trustpoint TRUSTPOINT
 revocation-check crl
 crl configure
 policy static
 url 1
crypto ikev2 policy 10
 encryption aes-256
 integrity sha256
 group 14
 prf sha
 lifetime seconds 86400
crypto ikev2 enable outside
tunnel-group type ipsec-l2l
tunnel-group ipsec-attributes
 ikev2 remote-authentication certificate
 ikev2 local-authentication certificate TRUSTPOINT

Note that the FTD configuration is very similar, but it has to be performed via the Firepower Management Center (FMC) GUI. In fact, after doing the configuration via FMC one can log into the FTD CLI using SSH and run the command “show running-config” and see the same configuration shown above for the ASA.

strongSwan Configuration (ipsec.conf)

The ipsec.conf configuration file (typically located at /etc/ipsec.conf) is the old way of configuring the strongSwan IPsec subsystem. The following ipsec.conf file contents allowed the tunnel to come up with no problems:

config setup
	cachecrls = yes

ca MyCA
	crluri =
	cacert = ca.pem
	auto = add

conn %default

conn net-net
	rightid="C=US, ST=CA, L=SF, O=Acme, OU=CSS, CN=asa,"

In addition to the ipsec.conf file, the ipsec.secrets (typically /etc/ipsec.secrets) also has to be edited, in this case to indicate the name of the private RSA key. Our ipsec.secrets file looks like this:

# ipsec.secrets - strongSwan IPsec secrets file

: RSA mykey.pem

Finally, certain certificates and the RSA key must be placed (all in PEM format) in certain directories under /etc/ipsec.d:

The Linux machine’s identity certificate goes into /etc/ipsec.d/cert/. strongSwan automatically loads that certificate upon startup
The Certification Authority (CA) root certificate goes into /etc/ipsec.d/cacerts/
The private key must be placed in /etc/ipsec.d/private/

strongSwan Configuration (swanctl.conf)

swanctl.conf is a new configuration file that is used by the swanctl(8) tool to load configurations and credentials into the strongSwan IKE daemon. This is the “new” way to configure the strongSwan IPsec subsystem. The configuration file syntax is very different, though the parameters that need to be set to be able to bring up the IPsec tunnel are the same as in the case of the ipsec.conf-based configuration.

A swanctl.conf-based configuration is more modular. Configuration files typically exist under /etc/swanctl/. For our specific connection, we put the configuration in the file /etc/swanctl/conf.d/example.conf, which gets included from /etc/swanctl/swanctl.conf. Our /etc/swanctl/example.conf file contains the following:

connections {

    # Section for an IKE connection named .
    my-connection {
        # IKE major version to use for connection.
        version = 2

        # Remote address(es) to use for IKE communication, comma separated.
        # remote_addrs = %any
	remote_addrs =

        # Section for a local authentication round.
        local-1 {
            # Comma separated list of certificate candidates to use for
            # authentication.
            certs = rpi.pem

        children {

            # CHILD_SA configuration sub-section.
            my-connection {
                # Local traffic selectors to include in CHILD_SA.
                # local_ts = dynamic
                local_ts =

                # Remote selectors to include in CHILD_SA.
                # remote_ts = dynamic
		remote_ts =

# Section defining secrets for IKE/EAP/XAuth authentication and private key
# decryption.
secrets {
    # Private key decryption passphrase for a key in the private folder.
    private-rpikey {
        # File name in the private folder for which this passphrase should be
        # used.
        file = rpi.pem

        # Value of decryption passphrase for private key.
        # secret =

# Section defining attributes of certification authorities.
authorities {
    # Section defining a certification authority with a unique name.
    MyA {
        # CA certificate belonging to the certification authority.
        cacert = myca.pem

        # Comma-separated list of CRL distribution points.
        crl_uris =

Bringing Up the Tunnel on Interesting Traffic

To bring up the tunnel when “interesting” traffic is received it is necessary to use the “start_action” configuration parameter. Otherwise the IPsec tunnel has to be brought up manually using the swanctl –initiate xxxxx command.

Here’s an example configuration that uses “start_action”:

connections {

# Section for an IKE connection named <conn>.
 lab-vpn {
 version = 2

 remote_addrs =

 local-1 {
 certs = rpi.pem

children {
 # CHILD_SA configuration sub-section.
 lab-vpn {
 local_ts =,
 remote_ts =

start_action = trap




There were three serious issues that I ran into when trying to bring up the site to site tunnel. All of them appear to be bugs.

Cisco ASA/FTD Unable to Process Downloaded CRL When Cisco WSA in the Middle

In this issue the Cisco ASA/FTD is apparently unable to parse a downloaded CRL when a Cisco WSA proxy server is transparently in the middle. The Cisco WSA is returning the file to the Cisco ASA/FTD but the ASA apparently does not like something in the HTTP headers (the “Via” header? I don’t know). There is nothing wrong with the CRL itself — I performed a packet capture on the ASA itself, extracted the CRL file from the packet capture, and it is not corrupted or anything. In fact, I have seen the revocation check work sometimes; I believe the problem occurs when the CRL is present in the WSA’s cache, which would explain why it works sometimes. I configured the web server hosting the CRL to prevent caching of the file but the problem still persists.

Workaround for this problem: Configure the ASA to fallback to no revocation check, i.e.

crypto ca truspoint X
 revocation-check crl none

PRF Algorithms Other Than SHA1 Do Not Work

No idea if the problem here is on the Cisco ASA/FTD side or on the strongSwan side. All I know is that strongSwan fails to authenticate the peer. I see these messages in the strongSwan logs:

[IKE] received end entity cert ""
[CFG]   using certificate ""
[CFG]   using trusted ca certificate ""
[CFG] checking certificate status of ""
[CFG]   using trusted certificate ""
[CFG]   crl correctly signed by ""
[CFG]   crl is valid: until Jan 06 02:12:01 2018
[CFG]   using cached crl
[CFG] certificate status is good
[CFG]   reached self-signed root ca with a path length of 0
[IKE] signature validation failed, looking for another key
[CFG]   using certificate ""
[CFG]   using trusted ca certificate ""
[CFG] checking certificate status of ""
[CFG]   using trusted certificate ""
[CFG]   crl correctly signed by ""
[CFG]   crl is valid: until Jan 06 02:12:01 2018
[CFG]   using cached crl
[CFG] certificate status is good
[CFG]   reached self-signed root ca with a path length of 0
[IKE] signature validation failed, looking for another key
[ENC] generating INFORMATIONAL request 2 [ N(AUTH_FAILED) ]
[NET] sending packet: from[4500] to[4500] (80 bytes)
initiate failed: establishing CHILD_SA 'css-lab' failed

Workaround for this problem: Use SHA-1 as the PRF. For example, on the ASA, one could use:

crypto ikev2 policy 10
 encryption aes-256
 integrity sha256
 group 19
 prf sha

Certificates Using ASN.1 “PRINTABLESTRING” Don’t Work on Cisco ASA/FTD

This one was very difficult to troubleshoot. It might be a bug on the strongSwan side but I am not sure. The issue is that, depending on configuration, strongSwan will use as IKEv2 identity to send to the Cisco ASA/FTD a Distinguished Name (DN) in binary ASN.1 encoding, but when it creates this binary ASN.1 encoding it will use the type “PRINTABLESTRING” instead of “UTF8STRING” to represent fields like Country, stateOrProvince, localityName, organizationName, commonName, etc. The IKEv2 identity is otherwise identical to the identity that strongSwan would obtain directly from the certificate.

On the ASA/FTD side, when the ASA/FTD receives an identity that uses fields of type “PRINTABLESTRING” it seems to consider the identity bad, and it chokes. This is made difficult to troubleshoot by the fact that there apparently are no good debug messages to see what is going on. On a bad case one sees these messages:

%ASA-7-711001: IKEv2-PLAT-3: RECV PKT [IKE_AUTH] []:500->[]:500 InitSPI=0x596a08fccb72412a RespSPI=0x5d757649514ab5e8 MID=00000001
%ASA-7-711001: (34):  
%ASA-7-711001: IKEv2-PROTO-2: (34): Received Packet [From i0:f0] 
%ASA-7-711001:  IDr%ASA-7-711001:   Next payload: CERT, reserved: 0x0, length: 128
%ASA-7-711001:     Id type: DER ASN1 DN, Reserved: 0x0 0x0
%ASA-7-711001:      30 76 31 0b 30 09 06 03 55 04 06 13 02 55 53 31
%ASA-7-711001:      0b 30 09 06 03 55 04 08 13 02 4e 43 31 0c 30 0a
%ASA-7-711001:      06 03 55 04 07 13 03 52 54 50 31 0e 30 0c 06 03
%ASA-7-711001:      55 04 0a 13 05 43 69 73 63 6f 31 0c 30 0a 06 03
%ASA-7-711001:      55 04 0b 13 03 43 53 53 31 0c 30 0a 06 03 55 04
%ASA-7-711001:      03 13 03 72 70 69 31 20 30 1e 06 09 2a 86 48 86
%ASA-7-711001:      f7 0d 01 09 01 16 11 65 6c 70 61 72 69 73 40 63
%ASA-7-711001:      69 73 63 6f 2e 63 6f 6d
%ASA-7-711001: IKEv2-PROTO-5: (34): SM Trace-> SA: I_SPI=596A08FCCB72412A R_SPI=5D757649514AB5E8 (I) MsgID = 00000001 CurState: I_WAIT_AUTH Event: EV_RECV_AUTH
%ASA-7-711001: IKEv2-PROTO-5: (34): Action: Action_Null
%ASA-7-711001: IKEv2-PROTO-5: (34): SM Trace-> SA: I_SPI=596A08FCCB72412A R_SPI=5D757649514AB5E8 (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_CHK4_NOTIFY
%ASA-7-711001: IKEv2-PROTO-2: (34): Process auth response notify
%ASA-7-711001: IKEv2-PROTO-5: (34): SM Trace-> SA: I_SPI=596A08FCCB72412A R_SPI=5D757649514AB5E8 (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_PROC_MSG
%ASA-7-711001: IKEv2-PLAT-2: (34): peer auth method set to: 1
%ASA-7-711001: IKEv2-PROTO-5: (34): SM Trace-> SA: I_SPI=596A08FCCB72412A R_SPI=5D757649514AB5E8 (I) MsgID = 00000001 CurState: I_WAIT_AUTH Event: EV_RE_XMT
%ASA-7-711001: IKEv2-PROTO-2: (34): Retransmitting packet
%ASA-7-711001: (34):  
%ASA-7-711001: IKEv2-PROTO-2: (34): Sending Packet [To i0:f0] 

As can be seen, the state machine goes from I_WAIT_AUTH (wait for authentication payload) to I_PROC_AUTH (process authentication payload), receives an “EV_PROC_MSG” (process message event), and then goes back to the I_WAIT_AUTH state with a retransmit (EV_RE_XMT) event. There is not explanation or message that indicates why process the IKEv2 identity failed.

In the good case, when see messages like:

%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_WAIT_AUTH Event: EV_RECV_AUTH
%ASA-7-711001: IKEv2-PROTO-5: (35): Action: Action_Null
%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_CHK4_NOTIFY
%ASA-7-711001: IKEv2-PROTO-2: (35): Process auth response notify
%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_PROC_MSG
%ASA-7-711001: IKEv2-PLAT-2: (35): peer auth method set to: 1
%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_CHK_IF_PEER_CERT_NEEDS_TO_BE_FETCHED_FOR_PROF_SEL
%ASA-7-711001: IKEv2-PROTO-5: (35): SM Trace-> SA: I_SPI=03542332E12C42F4 R_SPI=3E95373B6C8C25AF (I) MsgID = 00000001 CurState: I_PROC_AUTH Event: EV_GET_POLICY_BY_PEERID

Notice that after the EV_PROC_MSG event there is no re-transmit event — in the logs I could see that eventually (after checking revocation of the certificate, etc.) the state machine leaves the I_PROC_AUTH state and the connection finally establishes.

The strongSwan configuration that caused the above problem was specifying “leftid” in /etc/ipsec.conf, i.e.

conn net-net
	leftid="C=US, ST=CA, L=SF, O=Acme, OU=CSS, CN=myid,"
	rightid="C=US, ST=CA, L=SF, O=Acme, OU=CSS, CN=asa,"

If leftid is removed, and strongSwan is left to automatically detect the identity to send to the Cisco ASA/FTD then the problem does not occur. I think it is because it does not create the ID from scratch but instead extracts it from the identity certificate.

Here’s a diff of the output from “openssl asn1” for the case of the IKEv2 ID using an ASN.1 binary encoding that has “UTF8STRING” fields, and for the case where “PRINTABLESTRING” are used:

    15:d=1  hl=2 l=  11 cons: SET               
    17:d=2  hl=2 l=   9 cons: SEQUENCE          
    19:d=3  hl=2 l=   3 prim: OBJECT            :stateOrProvinceName
-   24:d=3  hl=2 l=   2 prim: UTF8STRING        :CA
+   24:d=3  hl=2 l=   2 prim: PRINTABLESTRING   :CA
    28:d=1  hl=2 l=  12 cons: SET               
    30:d=2  hl=2 l=  10 cons: SEQUENCE          
    32:d=3  hl=2 l=   3 prim: OBJECT            :localityName
-   37:d=3  hl=2 l=   3 prim: UTF8STRING        :SF
+   37:d=3  hl=2 l=   3 prim: PRINTABLESTRING   :SF
    42:d=1  hl=2 l=  14 cons: SET               
    44:d=2  hl=2 l=  12 cons: SEQUENCE          
    46:d=3  hl=2 l=   3 prim: OBJECT            :organizationName
-   51:d=3  hl=2 l=   5 prim: UTF8STRING        :Acme
+   51:d=3  hl=2 l=   5 prim: PRINTABLESTRING   :Acme
    58:d=1  hl=2 l=  12 cons: SET               
    60:d=2  hl=2 l=  10 cons: SEQUENCE          
    62:d=3  hl=2 l=   3 prim: OBJECT            :organizationalUnitName
-   67:d=3  hl=2 l=   3 prim: UTF8STRING        :CSS
+   67:d=3  hl=2 l=   3 prim: PRINTABLESTRING   :CSS
    72:d=1  hl=2 l=  12 cons: SET               
    74:d=2  hl=2 l=  10 cons: SEQUENCE          
    76:d=3  hl=2 l=   3 prim: OBJECT            :commonName
-   81:d=3  hl=2 l=   3 prim: UTF8STRING        :rpi
+   81:d=3  hl=2 l=   3 prim: PRINTABLESTRING   :rpi
    86:d=1  hl=2 l=  32 cons: SET               
    88:d=2  hl=2 l=  30 cons: SEQUENCE          
    90:d=3  hl=2 l=   9 prim: OBJECT            :emailAddress

Workaround for this issue: Do not use leftid and let strongSwan figure out the IKEv2 ID that it needs to present to the Cisco ASA/FTD.

Multiple Traffic Selectors Under Same Child SA

If the strongSwan configuration specifies multiple networks in one traffic selector, like in this configuration:

children {
 # CHILD_SA configuration sub-section.
 lab-vpn {
 # Local traffic selectors to include in CHILD_SA.
 # local_ts = dynamic
 local_ts =,

# Remote selectors to include in CHILD_SA.
 # remote_ts = dynamic
 remote_ts =

then the Cisco device will receive a TSi and TSr payloads in an IKEv2 message that look like these:

 TSi Next payload: TSr, reserved: 0x0, length: 56
 Num of TSs: 3, reserved 0x0, reserved 0x0
 TS type: TS_IPV4_ADDR_RANGE, proto id: 1, length: 16
 start port: 2048, end port: 2048
 start addr:, end addr:
 TS type: TS_IPV4_ADDR_RANGE, proto id: 0, length: 16
 start port: 0, end port: 65535
 start addr:, end addr:
 TS type: TS_IPV4_ADDR_RANGE, proto id: 0, length: 16
 start port: 0, end port: 65535
 start addr:, end addr:
 TSr Next payload: NOTIFY, reserved: 0x0, length: 40
 Num of TSs: 2, reserved 0x0, reserved 0x0
 TS type: TS_IPV4_ADDR_RANGE, proto id: 1, length: 16
 start port: 2048, end port: 2048
 start addr:, end addr:
 TS type: TS_IPV4_ADDR_RANGE, proto id: 0, length: 16
 start port: 0, end port: 65535
 start addr:, end addr:

As can be seen, the TSi payload contains multiple Traffic Selectors (one for and another one for This is based on the strongSwan configuration “local_ts =,”.

The idea is that the IPsec gateway that strongSwan is talking to should create IPsec Security Associations (SAs) for <-> and for <->

Unfortunately, Cisco devices do not support this and instead only create SAs for the first traffic selector in the IKE message. There is a Cisco bug for this issue on Cisco ASA, but it does not appear that it will be fixed any time soon (as of May 2018):

CSCue42170 (“IKEv2: Support Multi Selector under the same child SA”)

strongSwan users have reported the problem:

A workaround has been proposed here. The workaround consists of creating multiple connections, one for each protect netowrk, instead of one connection with multiple protected networks.


A site-to-site IPsec-based VPN tunnel between Cisco ASA/FTD and strongSwan running on Linux and using certificates for authentication comes up just fine but I ran into the three issues described above. All issues have reasonable workarounds. They are probably bugs that I’ll try to report to the respective parties.

SSL Certificates Made Easy (and Cheap!)

Running an SSL-enabled website is a best practice but often made difficult by the fact that one needs a Private Key Infrastructure (PKI) to obtain the SSL certificates needed for SSL operation.

There are two options for using a PKI: 1. Deploy your own PKI, and 2. Use a public PKI. The former is cheap (free) but has a steeper learning curve because one needs to know how to set up the Certification Authority (CA) server software and how to manage the PKI (generate Certificate Signing Requests [CSRs], sign certificates, revoke certificates, deploy the root CA certificate to endusers’ devices, etc.). The latter can be non-free but is easier as the PKI is already established and one only needs to request a certificate, sometimes for a price.

The Let’s Encrypt project is “[…] a free, automated, and open certificate authority (CA), run for the public’s benefit. It is a service provided by the Internet Security Research Group (ISRG).” See for additional details about the Let’s Encrypt project. Two important details about certificates issued by the Let’s Encrypt project is that: 1. They are free, and 2. Browsers trust the CA that issues them, so there is no need to distribute CA root certificates to endusers’ devices.

We run an Apache web server that serves a few domains via virtual hosts and it was easy to set them up to use certificates issued by the Let’s Encrypt project. Here are the details:

We run Apache on Ubuntu so the first thing we had to do was to install an ACME client (ACME is a protocol used to fetch certificates). The ACME client recommended by the Let’s Encrypt project is called Certbot. According to the Certbot’s website, “Certbot is an easy-to-use automatic client that fetches and deploys SSL/TLS certificates for your webserver. Certbot was developed by EFF and others as a client for Let’s Encrypt and was previously known as “the official Let’s Encrypt client” or “the Let’s Encrypt Python client.” Certbot will also work with any other CAs that support the ACME protocol”.

The Certbot website has clear instructions on how to do this. For us, it was just:

shell$ sudo add-apt-repository ppa:certbot/certbot
shell$ sudo apt-get update
shell$ sudo apt-get install certbot

The next step was to request the certificates. There are Certbot “plugins” that automate the process but we chose a very manual process that gives us a little bit more control over the entire process:

shell$ sudo certbot certonly --webroot -w /srv/www/ -d -d -w /usr/share/wordpress -d -d
 Saving debug log to /var/log/letsencrypt/letsencrypt.log
 Starting new HTTPS connection (1):

 You have an existing certificate that contains a portion of the domains you
 requested (ref: /etc/letsencrypt/renewal/

It contains these names:,

You requested these names for the new certificate:,,,

Do you want to expand and replace this existing certificate with the new
 (E)xpand/(C)ancel: e
 Renewing an existing certificate
 Performing the following challenges:
 http-01 challenge for
 http-01 challenge for
 http-01 challenge for
 http-01 challenge for
 Using the webroot path /usr/share/wordpress for all unmatched domains.
 Waiting for verification...
 Cleaning up challenges
 Unable to clean up challenge directory /srv/www/
 Generating key (2048 bits): /etc/letsencrypt/keys/0001_key-certbot.pem
 Creating CSR: /etc/letsencrypt/csr/0001_csr-certbot.pem

 - Congratulations! Your certificate and chain have been saved at
 /etc/letsencrypt/live/ Your cert will
 expire on 2017-06-26. To obtain a new or tweaked version of this
 certificate in the future, simply run certbot again. To
 non-interactively renew *all* of your certificates, run "certbot
 - If you like Certbot, please consider supporting our work by:

Donating to ISRG / Let's Encrypt:
 Donating to EFF:

Note that I had previously requested a certificate for, and when I ran Certbot I requested a new domain to be listed in the certificate ( Certbot noticed that I had previously requested a certificate for and asked me if I wanted to expand the certificate to include the new domain.

As mentioned in the output from the certbot, the certificates (identity certificate for the website as well as the CA certificate) are left in the /etc/letsencrypt/live/ directory. At this point one just has to configure Apache to use these certificates.

Reset Azure Virtual Netwok Gateway

No matter what I did I could not get an IPsec site-to-site tunnel going between an offsite test network and our Microsoft Azure virtual network. Our VPN gateway is a Cisco ASA 5506.

The issue was that the Cisco ASA would try to bring up the tunnel but some part of the negotiation would go wrong at some point. Debug messages on the Cisco ASA would show something like this:

Apr 19 09:21:41 [IKEv1 DEBUG]IP =, IKE SA Proposal # 1, Transform # 2 acceptable Matches global IKE entry # 5
 Apr 19 09:21:41 [IKEv1 DEBUG]IP =, constructing ISAKMP SA payload
 Apr 19 09:21:41 [IKEv1 DEBUG]IP =, constructing NAT-Traversal VID ver RFC payload
 Apr 19 09:21:41 [IKEv1 DEBUG]IP =, constructing Fragmentation VID + extended capabilities payload
 Apr 19 09:21:41 [IKEv1]IP =, IKE_DECODE SENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:21:42 [IKEv1]IP =, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:43 [IKEv1]IP =, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:46 [IKEv1]IP =, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:49 [IKEv1]IP =, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:21:57 [IKEv1]IP =, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:22:05 [IKEv1]IP =, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:22:13 [IKEv1 DEBUG]IP =, IKE MM Responder FSM error history (struct &0x00002aaac1c34cf0) , : MM_DONE, EV_ERROR-->MM_WAIT_MSG3,
 Apr 19 09:22:13 [IKEv1 DEBUG]IP =, IKE SA MM:4af079c0 terminating: flags 0x01000002, refcnt 0, tuncnt 0
 Apr 19 09:22:13 [IKEv1 DEBUG]IP =, sending delete/delete with reason message
 Apr 19 09:23:06 [IKEv1]IP =, IKE_DECODE RECEIVED Message (msgid=6a9f34a4) with payloads : HDR + HASH (8) + DELETE (12) + NONE (0) total length : 68
 Apr 19 09:23:06 [IKEv1 DEBUG]Group =, IP =, processing hash payload
 Apr 19 09:23:06 [IKEv1 DEBUG]Group =, IP =, processing delete
 Apr 19 09:23:06 [IKEv1]Group =, IP =, Connection terminated for peer Reason: Peer Terminate Remote Proxy, Local Proxy
 Apr 19 09:23:06 [IKEv1 DEBUG]Group =, IP =, Active unit receives a delete event for remote peer

A couple of key points in the above debug messages:

  1. “MM_WAIT_MSG3, EV_TIMEOUT” indicates that the Cisco ASA timeout waiting for the Azure VPN gateway.
  2. “Duplicate first packet detected. Ignoring packet” indicates that the Azure VPN gateway is not liking the previous message that the Cisco ASA sends. Increasing the debug level (not shown above) indicates a mismatch in terms of cookies, and this is apparently what upsets the Azure Virtual Network Gateway.

This is shown on Cisco ASA debug messages at a higher debug level:

 InitiatorCookie: 03 83 AD 7C 10 26 CB D6
 ResponderCookie: 14 42 19 27 F6 F2 DF 53

 ISAKMP Header
 Initiator COOKIE: 03 83 ad 7c 10 26 cb d6
 Responder COOKIE: 00 00 00 00 00 00 00 00

These are debug messages produced on the Microsoft Azure side:

2016?-?03?-?02 10:31:37 ERROR user NULL 0000000FE1E59D80 0000000FE1E64320 f74513382e60832f cac68571e57c06d5 Invalid cookies. Try resetting SAs on-prem. IkeProcessPacketDispatch failed with HRESULT 0x80073616(ERROR_IPSEC_IKE_INVALID_COOKIE)

(Note the “ERROR_IPSEC_IKE_INVALID_COOKIE” error code.)

After spending sometime troubleshooting the Cisco ASA side we could not find anything wrong with the Cisco ASA configuration.

In the end, in my desperation, I decided to reset the Azure Virtual
Network Gateway and that seems to have fixed the issue for good.

The process to reset an Azure Virtual Network Gateway is a bit tricky because there is no way to do that using the Azure Portal; it needs to be done using PowerShell instead.

This is what I did to reset the Azure Virtual Network Gateway using

1. Install Azure PowerShell. I used the instructions here:

In particular, I went with the leaner and perhaps more complicated
installation from the PowerShell Gallery (instead of installing from

2. After Azure PowerShell was installed, I opene a PowerShell command window and ran the following commands:

Select-AzureRmSubscription -SubscriptionName "<your subscription name>"
$vg = Get-AzureRmVirtualNetworkGateway -ResourceGroupName RG
Reset-AzureRmVirtualNetworkGateway -VirtualNetworkGateway $vg

Apparently, Azure Virtual Network Gateways are redundant Virtual Machines so resetting one will cause the other to take over.

The other one could be reset by invoking “Reset-AzureRmVirtualNetworkGateway” a few minutes after the first gateway has been reset but in my case the site-to-site VPN tunnel came up after resetting only one of the gateways.

Note that the above XXXXX-AzureRmXXXXX PowerShell cmdlets use the new
Azure Resource Manager deployment model. Similar commands would have to
be used if the classic deployment model is used instead.

This article:

is a good reference for how to reset Azure Virtual Network Gateways that have been deployed using the classic deployment model. Note that it says that the same cannot be done for the Resource Manager deployment model but I think the capability is there now (I used it) and it is just that the article has not been updated yet.

On a related note, I should mention that another way of dealing with this problem is by deploying a Cisco ASAv virtual appliance and using that to terminate the site-to-site IPsec tunnel instead of terminating it on the Microsoft-provided Azure Virtual Network Gateway. This of course would be more expensive given that licenses for the Cisco ASAv would have to be purchased, plus it is another Virtual Machine that would have to deployed (and pay for).

SSL Traffic Decryption

We recently had a need to inspect the contents of an HTTPS (SSL/TLS) connection. As we had never had the opportunity to set things up to facilitate decryption of SSL/TLS connection we had to do a little bit of research.

The way we approached this was by running the software that establishes the HTTPS connection we need to decrypt on a VirtualBox Virtual Machine (VM), and then running a Man-in-the-Middle (MitM) proxy on the VM host, which runs Ubuntu 15.04. The MitM proxy that we used was mitmproxy. No reason in particular for choosing mitmproxy other than it was the first solution that we tried, it was very well documented, and it worked on first try. We are very impressed with this little piece of software — its design is well thought, and the text-based user interface is very powerful.

This post documents the steps involved in setting things up for decryption of SSL sessions.


There are several possible network topologies to use. The one that we chose was one where the client machine and the proxy machine are on the same physical network. Because we are using VirtualBox, where the Virtual Machine is the client machine and the Virtual Machine host is a physical machine, we configured the network settings of the (client) Virtual Machine to use bridged networking. This is equivalent to having two different machines on the same physical network segment.

Note: Our proxy machine (not a Virtual Machine) only had a wi-fi network interface so the Virtual Machine, through bridged networking, was using this wi-fi network interface to reach the network.

Set Up Of Proxy Machine


Nothing much to do here, really, as there is an Ubuntu binary package for mitmproxy, so installation boils down to a simple “apt-get install mitmproxy”.

VirtualBox Settings

The network interface of the virtual machine must be configured in bridged mode. The VM host machine only needs one interface (for example, the wireless NIC “wlan0”). That interface will be used for both the VM host and the actual VM to have network connectivity. Make sure the VM NIC is configured to use the VM host NIC as the bridge interface.

Also, VirtualBox must be configured to allow promiscuous mode on the bridge interface. This is configured in the “Advanced” section of the network adapter properties (where the interface mode [bridged, NAT, etc.] is configured). “Allow VMs” for the “Promiscuous Mode” setting is appropriate.


After installing the mitmproxy software, the following things must be done:

  • Enable IP forwarding, which is normally disabled by default:
shell$ sudo sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
  • Disable ICMP redirects:
shell$ sudo sh -c 'echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects'
shell$ sudo sh -c 'echo 0 > /proc/sys/net/ipv4/conf/wlan0/send_redirects'
  • Add iptables rules to redirect traffic going to destination TCP port 443 to port 8080, which is where mitmproxy is listening on:
shell$ sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 80 -j REDIRECT --to-port 8080
shell$ sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 443 -j REDIRECT --to-port 8080
  • Run mitmproxy:
shell$ mitmproxy -T --host

Setup Of Client Machine

The following things need to be configured on the client machine:

  • Configure the machine running mitmproxy (the proxy machine) as the default gateway. This will cause all SSL/TLS traffic going towards the server to be sent through the proxy machine, assuming that the server is on a different (remote) subnet.
  • Install the Certificate Authority certificate that the proxy machine will present to the client when the client establishes SSL/TLS sessions. mitmproxy really shines in this area, making the certificate installation a very seamless process. We will not repeat here the excellent documentation on how to do this. Instead, we will point readers to the documentation:
  • The default gateway of the client machine must obviously be the proxy machine. The easiest way to accomplish this is by configuring manually the TCP/IP settings of the client machine. If DHCP is used for IP configuration then the default gateway will be whatever the DHCP sends, which might be different from the IP address of the proxy machine. In that case the client machine can be forced to use the proxy machine as its default gateway by adding a new default route using a lower metric, for example: ip route add default via <IP address of proxy machine> metric 50.

Decrypting The SSL/TLS Session

Once the machine running the mitmproxy software (the “proxy” machine) and the machine running the SSL/TLS client (the “client” machine) are configured, we are ready to establish the SSL/TLS sessions that we want to decrypt — just open your browser and go to the https:// URL you are interested in examining, launch your SSL VPN client, etc.

The proxy machine will intercept the connection and do what it does well, i.e. pretend to the server to be the client, and pretend to the client to be the server, while decrypting traffic going in both directions.

mitmproxy provides a fantastic text-based user interface that allows the user to easily navigate through each SSL/TLS request and response going through the proxy.

The following screenshot (click on image for larger version) shows the main mitmproxy window, which lists all the captured flows:


The following screenshot (click on image for larger version) shows a particular flow, especifically the request part of the flow:


And finally, this screenshot (click on image for larger version) shows the server’s response to the previous request:


And that is it; there really isn’t anything to it. It took longer to read the mitmproxy documentation than to set things up and run the SSL/TLS session.

Final Thoughts

From the main mitmproxy window, all flows can be saved to a file for later analysis by pressing the ‘w’ (write) key, which will prompt if all flows must be saved or just the one at the cursor, and the name of the file to save the flows to.

Flows can be loaded later by running mitmproxy with the -r (read) switch.


Be aware of mitmproxy bug 659 ( This bug causes HTTP HEAD requests to return a Content-Length equal to zero instead of the correct value. This will cause some applications to fail as they will think there is nothing to download. This trip me up pretty good until I found the previously mentioned bug and I applied the fix that was committed to resolve the bug.


More GNOME/Ubuntu Unity System-wide Defaults

In this post we discussed how to set up system-wide defaults using GSettings schema overrides. This works great but we recently ran into a situation where this was not possible because the schema we wanted to modify was “relocatable”. Trying to modify such a schema without specifying a DConf path results in the following error:

$ gsettings set org.compiz.opengl sync-to-vblank true
Schema 'org.compiz.opengl' is relocatable (path must be specified)

The correct way to change a relocatable schema is by appending the path, as the error message above states. For example:

$ gsettings get org.compiz.opengl:/apps/compiz-1/plugins/opengl/screen0/options/sync_to_vblank/ sync-to-vblank

(Note the “:/apps/compiz-1/plugins/opengl/screen0/options/sync_to_vblank/ after the schema name; this is the path to the preference.)

The problem with this approach is that either by design or because it is a bug, it is not possible to write a schema override file that includes a DConf path. This Ubuntu bug seems to imply that this is a bug.

Another way to accomplish a system-wide default is by going to a lower level than GSettings and configuring DConf directly. It is probably better to use GSettings but in this particular case we had no option.

Here’s what we did:

First, we created the file /etc/dconf/profile/user with the following contents:


Next, we created the directory /etc/dconf/db/system-wide.d and the file /etc/dconf/db/system-wide.d/00_compiz_site_settings with the following contents:


We then ran the command “dconf update” (as root) which created the DConf database /etc/dconf/db/system-wide (a binary file).

This causes the “opengl” Compiz plugin preference “enable-x11-sync” to be set to “false” for all users in the system.


This blog post from Ross Burton has a good discussion on how to set system-wide settings using GSettings: It would have been a good reference to provide in my previous post but we missed it when we wrote that post.

The dconf System Administrator Guide is a fantastic reference to understand how to set system-wide defaults using DConf. One thing that was not clear to use after reading this document was that of DConf profile selection — the explanation above uses the file /etc/dconf/profiles/user because if no other DConf profile is selected (via the DCONF_PROFILE environment variable) then the profile called “user” is the one that is opened.

This post by Matt Fischer was extremely useful to understand how things work with DConf. Based on this post is that we realized we needed to use a profile called “user”.

Finally, this question has very good insight into the differences between DConf and GSettings.

IPv6 Automatic Configuration

The Information Technology folks at the place I work enabled IPv6 a few months ago. Things worked great for a while but I recently noticed that I was not able to reach the IPv6 Internet. A quick investigation showed that IT disabled IPv6 Stateless Address Autoconfiguration (SLAAC) and enabled DHCPv6:

router>sh ipv6 interface vlan 320 
Vlan320 is up, line protocol is up
  IPv6 is enabled, link-local address is FE80::208:E3FF:FEFF:FD90 
  No Virtual link-local address(es):
  Description: data320
  Global unicast address(es):
    20xx:xxx:xxx:xxx::1, subnet is 20xx:xxx:xxx:xxx::/64 
  Joined group address(es):
  MTU is 1500 bytes
  ICMP error messages limited to one every 100 milliseconds
  ICMP redirects are disabled
  ICMP unreachables are disabled
  Input features: Verify Unicast Reverse-Path
  Output features: MFIB Adjacency HW Shortcut Installation
  Post_Encap features: HW shortcut
 IPv6 verify source reachable-via any
   0 verification drop(s) (process), 0 (CEF)
   9 suppressed verification drop(s) (process), 9 (CEF)
  ND DAD is enabled, number of DAD attempts: 1
  ND reachable time is 30000 milliseconds (using 30000)
  ND advertised reachable time is 0 (unspecified)
  ND advertised retransmit interval is 0 (unspecified)
  ND router advertisements are sent every 200 seconds
  ND router advertisements live for 1800 seconds
  ND advertised default router preference is Medium
  Hosts use DHCP to obtain routable addresses.

Note the “Hosts use DHCP to obtain routable addresses” message — it used to be “Hosts use stateless autoconfig for addresses”.

I am using NetworkManager on Ubuntu 14.10 to manage my network configuration. The version of NetworkManager on Ubuntu 14.10 is The IPv6 configuration methods available for NetworkManager can be seen in the following screenshot:

When SLAAC was enabled, I had my network interface configured using the “Automatic” IPv6 configuration method. After IT switched to DHCPv6 this setting prevented my computer from getting an IPv6 address.

After switching to the “Automatic, DHCP only” method I was able to obtain an IPv6 address.

Unfortunately, it seems like the version of NetworkManager in Ubuntu 14.10 has a bug that prevents the installation of a default route (which is not obtained via DHCPv6 but via Neighbor Discovery Router Advertisement messages). The root cause of the bug seems to be that NetworkManager instructs the kernel to ignore Router Advertisement messages. It looks like this bug is fixed in NetworkManager versions and later, but I decided to just live in an IPv4 at work instead of trying to backport the fix to NetworkManager 0.9.8, or trying to build NetworkManager or later for Ubuntu 14.10.

Note: This blog post was helpful for me to understand what was happening:

ZoneMinder Hash Logins

ZoneMinder is a fantastic Linux video camera security and surveillance solution dosierung cialis. I have a ZoneMinder installation at home that I use to monitor a few IP-based video cameras.

My ZoneMinder installation uses the built-in authentication system, which means that only authenticated users can access the system.

One problem with using the built-in authentication system, however, is that it makes it hard to access video from the cameras from outside of ZoneMinder. For example, if I wanted to take a snapshot from a shell script of what one of the ZoneMinder cameras is currently recording, that would not be very easy to accomplish because the shell script would have to somehow log in first, establish a fake web browsing session (session cookie, etc.), and then finally request the snapshot.

Fortunately, ZoneMinder offers a way to accomplish this without too much hassle via a  feature called “hash logins”, which is enabled by setting the option ZM_AUTH_HASH_LOGINS (Options->System->ZM_AUTH_HASH_LOGINS).

The way to use this is by appending a ‘&auth=<login hash>’ parameter to the ZoneMinder URL one wants to access. For example, running the following command would retrieve a snapshot (in JPEG format) from the camera with monitor ID 8:

shell$ curl 'http://www.example.comt/cgi-bin/nph-zms?mode=single&monitor=8&auth=d8b45b3cf3b24407d09cbc16123f3549' -o /tmp/snapshot.jpg
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
 Dload  Upload   Total   Spent    Left  Speed
 100 22813  100 22813    0     0   9488      0  0:00:02  0:00:02 --:--:--  9485

The only complicated part here is knowing how to generate the hash used in the “auth” parameter. This does not seem to be documented anywhere, nor could I find any examples, but a quick read of the ZoneMinder code provided the necessary clues: The hash is calculated by the function getAuthUser() in the file includes/functions.php like this:

$time = localtime( $now );
$authKey = ZM_AUTH_HASH_SECRET.$user['Username'].$user['Password'].$remoteAddr.$time[2].$time[3].$time[4].$time[5];
$authHash = md5( $authKey );

ZM_AUTH_HASH_SECRET is a string chosen by the user via ZoneMinder options. $user[‘Username’] and $user[‘Password’] come from the ‘Users’ table in the ZoneMinder database — getAuthUser() will iterate over all existing users trying to find a match. $time contains the local time and $time[2], $time[3], $time[4], and $time[5] contain the current hour, day of the month, month and year, respectively. The code tries to find a match in the last two hours (in other words a login hash will found to be valid for up to two hours).

A simple test program to generate the hash looks like this:

 // Better not to use ZM_AUTH_HASH_IPS as this will break authentication
 // when client is behind NAT because the IP address of the client is used
 // to calculate the hash if ZM_AUTH_HASH_IPS is set.
 $time = localtime();
 $authKey = 'mykey' . 'myuser' . '*0945FE11CAC14C0A4A72A01234DD00388DE250EC' . $time[2] . $time[3] . $time[4] . $time[5];
 echo "\$authKey = $authKey\n";
 $authHash = md5($authKey);
 echo "\$authHash = $authHash\n";

Note that the hashed password for the user needs to be provided to the script (the user password is passed through the MySQL PASSWORD() function to finally obtain the password hash that is stored in the ZoneMinder database).

How to make practical use of all this? A script could generate the login hash and then access some part of ZoneMinder via an HTTP request. One could use this, for example, in a script run via cron every few minutes to take snapshots to produce a time-lapse video. This is left as an exercise for the reader.

Slow SSH logins/spun-down disk woken during SSH logins

For a while I have been troubleshooting why SSH logins into my Ubuntu server running 14.04 LTS are seemingly slow. I enabled SSH debugs (LogLevel set to DEBUG in /etc/ssh/{sshd_config,ssh_config}) on both the client and the server and did not find anything that pointed to an issue with the SSH negotiation/login itself. Recently I discovered that the 10-second delay in logging in had to do with SSH logins causing a hard disk that I keep spun down to be spun up. That takes about 10 seconds, which explains the delay.

Turns out that by default, console and SSH logins cause the scripts in /etc/update-motd.d/ to be executed. The script /etc/update-motd.d/98-fsck-at-reboot in particular is what causes disks to be spun up.

The scripts in /etc/update-motd-d/ are responsible for generating the file /run/motd.dynamic, which looks like this:

Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-37-generic x86_64)

 * Documentation:

  System information as of Thu Oct  9 10:05:21 EDT 2014

  System load:  0.37                Processes:           164
  Usage of /:   18.1% of 915.51GB   Users logged in:     1
  Memory usage: 55%                 IP address for eth0:
  Swap usage:   12%

  Graph this data and manage this system at:

That is pretty and provides some good information but for me I rather have a shell prompt immediately after I hit the <Enter> key after typing “ssh <my server>”. I then can manually execute some useful commands like “w” to see who is logged in, the uptime, etc.

I have disabled execution of the /etc/update-motd.d/ scripts by tweaking the files /etc/pam.d/{login,sshd} in the following way:

# Prints the message of the day upon succesful login.
# (Replaces the `MOTD_FILE' option in login.defs)
# This includes a dynamically generated part from /run/motd.dynamic
# and a static (admin-editable) part from /etc/motd.
#session    optional  motd=/run/motd.dynamic noupdate
#session    optional

# Disable display of /run/motd.dynamic and add "noupdate" so
# scripts in /etc/update-motd.d/* are not called.
# 20141009
session    optional noupdate

The first two commented out “session” lines are what were there by default. I commented them out and then added a new “session” line that has the “noupdate” keyword, which is actually what causes the module to not execute the scripts in /etc/update-motd.d/. This uncommented “session” line is what will display the standard /etc/motd file (if it exists).

Now a SSH login is fast, does not provide a lot of information upon log in (which I like), and, most important, spun down disks are not spun up upon log in:

$ ssh altamira
You have new mail.
Last login: Thu Oct  9 10:23:44 2014 from 2003:450:e6a5:100:a412:6z7a:897d:1387
altamira$ # That was a very fast login!

The following question is what pointed me in the right direction:

This is an issue that seems to have been introduced when I upgraded this server from Ubuntu 12.04 LTS to 14.04LTS. In particular, my Ubuntu 12.04LTS installation did not have the file /etc/update-motd.d/98-fsck-at-reboot.

Buffer Overflow in Embedded Microcontroller. Ouch.

We have been chasing down a mysterious problem in a embedded application we are developing. It is a sprinkler timer to water our yard. The microcontroller is an ATmega AVR 328p, the same microcontroller that some of the Arduino platforms use.

The application has an RFM12B wireless transceiver from Hope RF, and we are using this RFM12 library:

The issue is that the application works fine for days, sending and receiving data through the RFM12B wireless transceiver, but all of the sudden, it would stop working. Resetting the device restores operation.

The device is outside, in our yard, right next to a garden bib. It is not easy to troubleshoot this problem because all connectivity that we have to the device is via the RFM12B transceiver, so if it the wireless subsystem of the application is hosed, then there really is nothing we can do to troubleshoot. In addition, the wireless command set in in the firmware is currently very limited so we do not have good debugging tools even if the wireless subsystem is up. The firmware does send debugging information through a serial port on the microcontroller, though, so we have to use that in absence of other debugging tools.

The first step we took to debug the issue was to provide remote access to the microcontroller’s serial port. For this, we used a Raspberry Pi with an USB-to-serial adapter and with a WiFi USB adapter. We then left minicom (a serial communication program for Linux) running and came back every once in while, via remote SSH session, to check. Today we found an occurrence of the issue and investigated. This is what we found:

First, we noticed that the wireless subsystem was not working when we saw the following messages on the device’s console:

[2014-08-06 17:26:53] rfm12_tx() error 3
[2014-08-06 17:26:58] rfm12_tx() error 3
[2014-08-06 17:27:03] rfm12_tx() error 3
[2014-08-06 17:27:08] rfm12_tx() error 3

We then ran a diagnostic command on the console:

# show rfm12
[2014-08-06 17:29:11] My RFM12 node ID: 111.
[2014-08-06 17:29:11] RFM12 status register: 0x023f.
[2014-08-06 17:29:11] RFM12 state = 0
[2014-08-06 17:29:11] RFM12 tx state = 162
[2014-08-06 17:29:11] Number of bytes to transmit or receive = 219
[2014-08-06 17:29:11] Current byte count = 1
[2014-08-06 17:29:11] Total tx bytes = 40622
[2014-08-06 17:29:11] Total errors = 29983

A few things caught our attention:

  1. The RFM12 node ID should be 12 but the output says that it is 111.
  2. There is no such thing as RFM12 tx state 162 in the RFM12 library; it goes up to 3 or 4.
  3. The command prompt should be “frontlawn# “, but it is just “# “.

Obviously, some bad memory corruption has taken place here. Let’s see what we find…

The other piece of information we found in the minicom history buffer was this:

[2014-08-06 14:36:29] frontlawn# RFM12 rx packet: flags AD., hdr = 0x6c, len = 13.                                                
[2014-08-06 14:37:46] 00000000  3445 7365 7420 7661 6c20 6f20 33        4Eset val o 3                                             
[2014-08-06 14:37:46]                                                                                                             
[2014-08-06 14:38:27] RFM12 rx packet: flags AD., hdr = 0x6c, len = 14.                                                           
[2014-08-06 14:38:27] 00000000  3446 7365 7420 7661 6c20 636c 2033      4Fset val cl 3                                            
[2014-08-06 14:38:27]                                                                                                             
[2014-08-06 15:07:10] RFM12 rx packet: flags ..., hdr = 0x0f, len = 254.                                                          
[2014-08-06 15:07:11] 00000000  3445 7365 7420 7661 6c20 6f20 336e 2033 4Eset val o 3n 3                                          
[2014-08-06 15:07:11] 00000010  ffff ffff 3337 fd7f fc0f fd1c 201e 07ff ....37...... ...                                          
[2014-08-06 15:07:11] 00000020  07fb 03df 407c 741f 2dd4 040c f721 a302 ....@|t.-....!..                                          
[2014-08-06 15:07:11] 00000030  3529 0b05 080e 0200 0000 0000 0000 0000 5)..............                                          
[2014-08-06 15:07:11] 00000040  0000 0000 0000 0000 0000 0000 0000 0000 ................                                          
[2014-08-06 15:07:11] 00000050  0000 0000 0000 0000 0000 0100 0000 008a ................                                          
[2014-08-06 15:07:11] 00000060  e955 ae3c b9e0 c7f0 6f10 179c bd65 625f .U.<....o....eb_                                          
[2014-08-06 15:07:11] 00000070  3aea 998e 925c 1f43 152b 6b49 5e0b cca2 :....\.C.+kI^...                                          
[2014-08-06 15:07:11] 00000080  cdd5 313e ae9e 1f75 9a3d 1eea d964 bfee ..1>...u.=...d..                                          
[2014-08-06 15:07:11] 00000090  bffe 96dc 5df9 6b2c fdb7 8dd5 daab b8aa ....].k,........                                          
[2014-08-06 15:07:11] 000000a0  8fc5 6e42 8df9 9e7b 53bf 3cf6 fd19 7737 ..nB...{S.<...w7                                          
[2014-08-06 15:07:11] 000000b0  2767 f67c 975b 8f5a a7ea 6e63 bd39 2258 'g.|.["X                                          
[2014-08-06 15:07:11] 000000c0  756e a1bf 80fd 4b56 f0e3 e7fb bb28 ef93 un....KV.....(..                                          
[2014-08-06 15:07:11] 000000d0  e353 2308 50fe 49f6 7b4f 2300 b087 f1fa .S#.P.I.{O#.....                                          
[2014-08-06 15:07:12] 000000e0  9581 ff47 aba7 75a2 c0eb 91fa 6b7a 80e3 ...G..u.. cialis 20mg                                          
[2014-08-06 15:07:12] 000000f0  52e0 93c0 da66 bdfb 279b 8a07 af12      R....f..'.....                                            
[2014-08-06 15:07:12]                                                                                                             
[2014-08-06 15:07:12] Invalid valve number.                                                                                       
[2014-08-06 15:07:12] rfm12_tx() error 3                                                                                          
[2014-08-06 15:07:17] rfm12_tx() error 3

At 14:36:29 the controller received the command “set val o 3” (short for “set valve open 3”, which instructs the controller to open valve number 3).

At 14:38:27 the controller received the command “set val cl 3” (short for “set valve close 3”, which instructs the controller to close valve number 3).

We sent these commands and everything was good until this point.

However, at 15:07:10, we receive a giant packet of 254 bytes. That is the first sign of trouble — we only have 2048 bytes of RAM in an ATmega 328p microcontroller, so we have never configured such a large receive buffer in our application. We check the library source code and see that the size of the receive buffer is configurable and we have configured it to be 40 bytes.

Is it possible that the RFM12 library is buggy and is accepting such a large packet and storing it in a 40-byte buffer? No, we checked and it has the correct validations. This points to an issue in our code…

This is our code:

        if (rfm12_rx_status() == STATUS_COMPLETE) {
            pkt_hdr.__hdr_val = rfm12_rx_type();
            pkt_len = rfm12_rx_len();
            pkt_data = rfm12_rx_buffer();
            if (pkt_len == 0)
                goto pkt_processed;

             * Action to take on received packet depends on the payload
             * type, which is the first byte of the payload.
            if (pkt_data[0] == RFM12_PAYLOAD_CLICMD) {
                memcpy(cli_cmdbuf, pkt_data + 2, pkt_len - 2);
                cli_cmdbuf[pkt_len - 2] = '\0';

The first byte of the packet (pkt_data[0]) is 0x34, according to the packet dump above. 0x34 happens to be RFM12_PAYLOAD_CLICMD, so the condition of the “if ()” statement is true and we take that code path. The first instruction in that code path is a memcpy() that will copy pkt_len – 2 bytes, i.e. 254 – 2 = 252 bytes into the array cli_cmdbuf[].

cli_cmdbuf[] is a global variable declared as:

/* A command cannot be longer that this many characters. */
#define CLI_MAX_CMDLEN 80

char cli_cmdbuf[CLI_MAX_CMDLEN];

Ouch! Yes, our buffer has been overflowed with this memcpy(). By 252 – 80 = 172 bytes! This explains the memory corruption and the failure of the wireless subsystem of the application.

A couple of observations about this whole issue:

  1. The received packet that caused the entire problem should not have been received. We have no idea who sent it, or why it is so large, but the fact of the matter is that we received it. Yes, it obviously is bogus, but we cannot allow a stray packet to bring us down like that. The lesson re-learned here is that our application needs to be able to handle exceptional conditions such as this.
  2. Of course, we were not aware of such exceptional condition before. Otherwise we would have put a check in place to prevent the disastrous memcpy(). The main problem here is that we were acting under the assumption that the low level RFM12 driver would not pass to us a packet that has a received, advertised length that is larger than the configured size of the receive buffer — the low level driver checks for the available space in the receive buffer and does not store more data if there is no more space left. However, it notifies the application that there is a packet ready to be handled, even if that packet is incomplete because it did not fit in the receive buffer. We guess this behavior is debatable because the application might want to consume that packet even if it is not complete (for example, do some error handling, print a debug message, etc.).
  3. In the world of embedded applications, the importance of debugging using a serial port cannot be overstated — we would have had a very hard time to find this issue if we had not seen the packet dump of that 254-byte packet on the serial console.

In terms of addressing the issue — we are going to try to address the problem by checking in our application that the size of the received advertised length of the packet is less than the size of the receive buffer. If the size is larger then we will ignore the packet. We will see how this plays out…