SSL Certificates Made Easy (and Cheap!)

Running an SSL-enabled website is a best practice, but it is often made difficult by the fact that one needs a Public Key Infrastructure (PKI) to obtain the certificates needed for SSL operation.

There are two options for using a PKI: 1. Deploy your own PKI, and 2. Use a public PKI. The former is cheap (free) but has a steeper learning curve because one needs to know how to set up the Certification Authority (CA) server software and how to manage the PKI (generate Certificate Signing Requests [CSRs], sign certificates, revoke certificates, deploy the root CA certificate to end users’ devices, etc.). The latter can be non-free but is easier, as the PKI is already established and one only needs to request a certificate, sometimes for a price.

The Let’s Encrypt project is “[…] a free, automated, and open certificate authority (CA), run for the public’s benefit. It is a service provided by the Internet Security Research Group (ISRG).” See the Let’s Encrypt website for additional details about the project. Two important details about certificates issued by the Let’s Encrypt project are that: 1. They are free, and 2. Browsers trust the CA that issues them, so there is no need to distribute CA root certificates to end users’ devices.

We run an Apache web server that serves a few domains via virtual hosts and it was easy to set them up to use certificates issued by the Let’s Encrypt project. Here are the details:

We run Apache on Ubuntu, so the first thing we had to do was install an ACME client (ACME, the Automatic Certificate Management Environment, is the protocol used to request and fetch certificates). The ACME client recommended by the Let’s Encrypt project is called Certbot. According to the Certbot website, “Certbot is an easy-to-use automatic client that fetches and deploys SSL/TLS certificates for your webserver. Certbot was developed by EFF and others as a client for Let’s Encrypt and was previously known as “the official Let’s Encrypt client” or “the Let’s Encrypt Python client.” Certbot will also work with any other CAs that support the ACME protocol”.

The Certbot website has clear instructions on how to do this. For us, it was just:

shell$ sudo add-apt-repository ppa:certbot/certbot
shell$ sudo apt-get update
shell$ sudo apt-get install certbot

The next step was to request the certificates. There are Certbot “plugins” that automate the whole process, but we chose the more manual “certonly” mode with the webroot plugin, which only obtains the certificates and leaves the web server configuration to us:

shell$ sudo certbot certonly --webroot -w /srv/www/ -d -d -w /usr/share/wordpress -d -d
 Saving debug log to /var/log/letsencrypt/letsencrypt.log
 Starting new HTTPS connection (1):

 You have an existing certificate that contains a portion of the domains you
 requested (ref: /etc/letsencrypt/renewal/

It contains these names:,

You requested these names for the new certificate:,,,

Do you want to expand and replace this existing certificate with the new
 (E)xpand/(C)ancel: e
 Renewing an existing certificate
 Performing the following challenges:
 http-01 challenge for
 http-01 challenge for
 http-01 challenge for
 http-01 challenge for
 Using the webroot path /usr/share/wordpress for all unmatched domains.
 Waiting for verification...
 Cleaning up challenges
 Unable to clean up challenge directory /srv/www/
 Generating key (2048 bits): /etc/letsencrypt/keys/0001_key-certbot.pem
 Creating CSR: /etc/letsencrypt/csr/0001_csr-certbot.pem

 - Congratulations! Your certificate and chain have been saved at
 /etc/letsencrypt/live/ Your cert will
 expire on 2017-06-26. To obtain a new or tweaked version of this
 certificate in the future, simply run certbot again. To
 non-interactively renew *all* of your certificates, run "certbot
 - If you like Certbot, please consider supporting our work by:

Donating to ISRG / Let's Encrypt:
 Donating to EFF:

Note that I had previously requested a certificate, and when I ran Certbot this time I asked for a new domain to be listed in the certificate. Certbot noticed the existing certificate and asked me if I wanted to expand it to include the new domain.

As mentioned in the Certbot output, the certificates (the identity certificate for the website as well as the CA certificate) are left in the /etc/letsencrypt/live/ directory. At this point one just has to configure Apache to use these certificates.
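To make the --webroot invocation above a little less opaque: with the webroot plugin, each -d domain is validated by placing a token file under the -w directory that precedes it, and the CA then fetches that file over HTTP from /.well-known/acme-challenge/. The following Python sketch mimics only that pairing logic. The domain names and the token are made up for illustration (the real ones are not shown in this post), and this is a simplified model, not Certbot's actual code.

```python
from pathlib import PurePosixPath


def pair_domains_with_webroots(args):
    """Map each -d domain to the most recent -w webroot that precedes it."""
    mapping = {}
    webroot = None
    it = iter(args)
    for arg in it:
        if arg == "-w":
            webroot = next(it)
        elif arg == "-d":
            domain = next(it)
            if webroot is None:
                # Simplification: this sketch requires a -w before the first -d.
                raise ValueError(f"no -w given before -d {domain}")
            mapping[domain] = webroot
    return mapping


def challenge_file(webroot, token):
    """Path where the http-01 token file would be written for a webroot."""
    return str(PurePosixPath(webroot) / ".well-known" / "acme-challenge" / token)


# Hypothetical domains standing in for the ones elided in the command above.
args = ["-w", "/srv/www/", "-d", "example.com", "-d", "www.example.com",
        "-w", "/usr/share/wordpress", "-d", "blog.example.com"]
for domain, webroot in pair_domains_with_webroots(args).items():
    print(domain, "->", challenge_file(webroot, "TOKEN"))
```

This is also why each webroot must be the directory that Apache actually serves for the corresponding virtual host; otherwise the CA cannot fetch the token and the http-01 challenge fails.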

Reset Azure Virtual Network Gateway

No matter what I did I could not get an IPsec site-to-site tunnel going between an offsite test network and our Microsoft Azure virtual network. Our VPN gateway is a Cisco ASA 5506.

The issue was that the Cisco ASA would try to bring up the tunnel but some part of the negotiation would go wrong at some point. Debug messages on the Cisco ASA would show something like this:

Apr 19 09:21:41 [IKEv1 DEBUG]IP =, IKE SA Proposal # 1, Transform # 2 acceptable Matches global IKE entry # 5
 Apr 19 09:21:41 [IKEv1 DEBUG]IP =, constructing ISAKMP SA payload
 Apr 19 09:21:41 [IKEv1 DEBUG]IP =, constructing NAT-Traversal VID ver RFC payload
 Apr 19 09:21:41 [IKEv1 DEBUG]IP =, constructing Fragmentation VID + extended capabilities payload
 Apr 19 09:21:41 [IKEv1]IP =, IKE_DECODE SENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:21:42 [IKEv1]IP =, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:43 [IKEv1]IP =, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:46 [IKEv1]IP =, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:49 [IKEv1]IP =, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:21:57 [IKEv1]IP =, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:22:05 [IKEv1]IP =, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:22:13 [IKEv1 DEBUG]IP =, IKE MM Responder FSM error history (struct &0x00002aaac1c34cf0) , : MM_DONE, EV_ERROR-->MM_WAIT_MSG3,
 Apr 19 09:22:13 [IKEv1 DEBUG]IP =, IKE SA MM:4af079c0 terminating: flags 0x01000002, refcnt 0, tuncnt 0
 Apr 19 09:22:13 [IKEv1 DEBUG]IP =, sending delete/delete with reason message
 Apr 19 09:23:06 [IKEv1]IP =, IKE_DECODE RECEIVED Message (msgid=6a9f34a4) with payloads : HDR + HASH (8) + DELETE (12) + NONE (0) total length : 68
 Apr 19 09:23:06 [IKEv1 DEBUG]Group =, IP =, processing hash payload
 Apr 19 09:23:06 [IKEv1 DEBUG]Group =, IP =, processing delete
 Apr 19 09:23:06 [IKEv1]Group =, IP =, Connection terminated for peer Reason: Peer Terminate Remote Proxy, Local Proxy
 Apr 19 09:23:06 [IKEv1 DEBUG]Group =, IP =, Active unit receives a delete event for remote peer

A couple of key points in the above debug messages:

  1. “MM_WAIT_MSG3, EV_TIMEOUT” indicates that the Cisco ASA timed out waiting for the Azure VPN gateway.
  2. “Duplicate first packet detected. Ignoring packet” indicates that the Azure VPN gateway keeps resending its first packet, i.e. it is not accepting the reply that the Cisco ASA sends. Increasing the debug level (not shown above) reveals a mismatch in the IKE cookies, and this is apparently what upsets the Azure Virtual Network Gateway.

This is shown on Cisco ASA debug messages at a higher debug level:

 InitiatorCookie: 03 83 AD 7C 10 26 CB D6
 ResponderCookie: 14 42 19 27 F6 F2 DF 53

 ISAKMP Header
 Initiator COOKIE: 03 83 ad 7c 10 26 cb d6
 Responder COOKIE: 00 00 00 00 00 00 00 00
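One way to read the two cookie dumps above: in IKEv1 Main Mode the initiator's first packet carries an all-zero responder cookie, and every later packet is expected to echo the cookie the responder picked. The sketch below is a small hypothetical Python helper (not ASA or Azure code) that compares the cookies stored for the SA with those in a received ISAKMP header; the function names and return strings are my own.

```python
def parse_cookie(text):
    """Turn a dump such as '03 83 AD 7C 10 26 CB D6' into 8 raw bytes."""
    cookie = bytes.fromhex(text)
    if len(cookie) != 8:
        raise ValueError("ISAKMP cookies are 8 bytes long")
    return cookie


def classify_packet(sa_initiator, sa_responder, hdr_initiator, hdr_responder):
    """Compare the cookies stored for an SA with those in a received header."""
    sa_i, sa_r = parse_cookie(sa_initiator), parse_cookie(sa_responder)
    hdr_i, hdr_r = parse_cookie(hdr_initiator), parse_cookie(hdr_responder)
    if hdr_i != sa_i:
        return "different exchange"
    if hdr_r == bytes(8):
        return "first packet again (peer has not adopted our responder cookie)"
    if hdr_r != sa_r:
        return "responder cookie mismatch"
    return "cookies match"


print(classify_packet(
    "03 83 AD 7C 10 26 CB D6", "14 42 19 27 F6 F2 DF 53",  # cookies stored for the SA
    "03 83 ad 7c 10 26 cb d6", "00 00 00 00 00 00 00 00",  # cookies in the received header
))
```

With the values from the debug output, the received header still has a zeroed responder cookie, which is consistent with the “Duplicate first packet detected” messages.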

These are debug messages produced on the Microsoft Azure side:

2016-03-02 10:31:37 ERROR user NULL 0000000FE1E59D80 0000000FE1E64320 f74513382e60832f cac68571e57c06d5 Invalid cookies. Try resetting SAs on-prem. IkeProcessPacketDispatch failed with HRESULT 0x80073616(ERROR_IPSEC_IKE_INVALID_COOKIE)

(Note the “ERROR_IPSEC_IKE_INVALID_COOKIE” error code.)

After spending some time troubleshooting the Cisco ASA side, we could not find anything wrong with the Cisco ASA configuration.

In the end, in my desperation, I decided to reset the Azure Virtual
Network Gateway and that seems to have fixed the issue for good.

The process to reset an Azure Virtual Network Gateway is a bit tricky because there is no way to do that using the Azure Portal; it needs to be done using PowerShell instead.

This is what I did to reset the Azure Virtual Network Gateway using PowerShell:

1. Install Azure PowerShell. I used the instructions here:

In particular, I went with the leaner and perhaps more complicated
installation from the PowerShell Gallery (instead of installing from

2. After Azure PowerShell was installed, I opened a PowerShell command window and ran the following commands:

Select-AzureRmSubscription -SubscriptionName "<your subscription name>"
$vg = Get-AzureRmVirtualNetworkGateway -ResourceGroupName RG
Reset-AzureRmVirtualNetworkGateway -VirtualNetworkGateway $vg

Apparently, Azure Virtual Network Gateways are redundant Virtual Machines so resetting one will cause the other to take over.

The other one could be reset by invoking “Reset-AzureRmVirtualNetworkGateway” a few minutes after the first gateway has been reset but in my case the site-to-site VPN tunnel came up after resetting only one of the gateways.

Note that the above XXXXX-AzureRmXXXXX PowerShell cmdlets use the new
Azure Resource Manager deployment model. Similar commands would have to
be used if the classic deployment model is used instead.

This article:

is a good reference for how to reset Azure Virtual Network Gateways that have been deployed using the classic deployment model. Note that it says that the same cannot be done for the Resource Manager deployment model but I think the capability is there now (I used it) and it is just that the article has not been updated yet.

On a related note, I should mention that another way of dealing with this problem is by deploying a Cisco ASAv virtual appliance and using that to terminate the site-to-site IPsec tunnel instead of terminating it on the Microsoft-provided Azure Virtual Network Gateway. This of course would be more expensive given that licenses for the Cisco ASAv would have to be purchased, plus it is another Virtual Machine that would have to be deployed (and paid for).

SSL Traffic Decryption

We recently had a need to inspect the contents of an HTTPS (SSL/TLS) connection. As we had never had the opportunity to set things up to facilitate decryption of SSL/TLS connections, we had to do a little bit of research.

The way we approached this was by running the software that establishes the HTTPS connection we need to decrypt on a VirtualBox Virtual Machine (VM), and then running a Man-in-the-Middle (MitM) proxy on the VM host, which runs Ubuntu 15.04. The MitM proxy that we used was mitmproxy. There was no reason in particular for choosing mitmproxy other than that it was the first solution we tried, it was very well documented, and it worked on the first try. We are very impressed with this little piece of software: its design is well thought out, and the text-based user interface is very powerful.

This post documents the steps involved in setting things up for decryption of SSL sessions.


There are several possible network topologies to use. The one that we chose was one where the client machine and the proxy machine are on the same physical network. Because we are using VirtualBox, where the Virtual Machine is the client machine and the Virtual Machine host is a physical machine, we configured the network settings of the (client) Virtual Machine to use bridged networking. This is equivalent to having two different machines on the same physical network segment.

Note: Our proxy machine (not a Virtual Machine) only had a wi-fi network interface so the Virtual Machine, through bridged networking, was using this wi-fi network interface to reach the network.

Set Up Of Proxy Machine


Nothing much to do here, really, as there is an Ubuntu binary package for mitmproxy, so installation boils down to a simple “apt-get install mitmproxy”.

VirtualBox Settings

The network interface of the virtual machine must be configured in bridged mode. The VM host machine only needs one interface (for example, the wireless NIC “wlan0”). That interface will be used for both the VM host and the actual VM to have network connectivity. Make sure the VM NIC is configured to use the VM host NIC as the bridge interface.

Also, VirtualBox must be configured to allow promiscuous mode on the bridge interface. This is configured in the “Advanced” section of the network adapter properties (where the interface mode [bridged, NAT, etc.] is configured). “Allow VMs” for the “Promiscuous Mode” setting is appropriate.


After installing the mitmproxy software, the following things must be done:

  • Enable IP forwarding, which is normally disabled by default:
shell$ sudo sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
  • Disable ICMP redirects:
shell$ sudo sh -c 'echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects'
shell$ sudo sh -c 'echo 0 > /proc/sys/net/ipv4/conf/wlan0/send_redirects'
  • Add iptables rules to redirect traffic going to destination TCP ports 80 and 443 to port 8080, which is where mitmproxy is listening:
shell$ sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 80 -j REDIRECT --to-port 8080
shell$ sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 443 -j REDIRECT --to-port 8080
  • Run mitmproxy:
shell$ mitmproxy -T --host
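Since the /proc settings above do not survive a reboot, a quick check before starting mitmproxy can save some head scratching. The following Python sketch simply reads back the three values set in the steps above. The function name and the proc_root parameter are my own (the latter exists only so the check can be pointed at a test directory), and “wlan0” is the interface used in this post.

```python
from pathlib import Path


def check_proxy_host(interface="wlan0", proc_root="/proc/sys/net/ipv4"):
    """Return a list of settings that differ from what the steps above set."""
    root = Path(proc_root)
    expected = {
        root / "ip_forward": "1",
        root / "conf" / "all" / "send_redirects": "0",
        root / "conf" / interface / "send_redirects": "0",
    }
    problems = []
    for path, wanted in expected.items():
        try:
            actual = path.read_text().strip()
        except OSError:
            actual = "unreadable"
        if actual != wanted:
            problems.append(f"{path}: expected {wanted}, found {actual}")
    return problems


if __name__ == "__main__":
    for problem in check_proxy_host():
        print(problem)
```

An empty result means IP forwarding is on and ICMP redirects are off; the iptables rules still have to be checked separately (for example with “iptables -t nat -L PREROUTING”).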

Setup Of Client Machine

The following things need to be configured on the client machine:

  • Configure the machine running mitmproxy (the proxy machine) as the default gateway. This will cause all SSL/TLS traffic going towards the server to be sent through the proxy machine, assuming that the server is on a different (remote) subnet.
  • Install the Certificate Authority certificate that the proxy machine will present to the client when the client establishes SSL/TLS sessions. mitmproxy really shines in this area, making the certificate installation a very seamless process. We will not repeat here the excellent documentation on how to do this. Instead, we will point readers to the documentation:
  • As mentioned, the default gateway of the client machine must be the proxy machine. The easiest way to accomplish this is by manually configuring the TCP/IP settings of the client machine. If DHCP is used for IP configuration then the default gateway will be whatever the DHCP server sends, which might be different from the IP address of the proxy machine. In that case the client machine can be forced to use the proxy machine as its default gateway by adding a new default route with a lower metric, for example: ip route add default via <IP address of proxy machine> metric 50.
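The lower-metric trick works because, when several default routes exist, the one with the lowest metric is preferred. A toy Python illustration follows; the addresses are documentation-range placeholders, not our real network, and the metric of 100 for the DHCP-provided route is an assumption.

```python
def pick_default_route(routes):
    """Return the default route with the lowest metric (the preferred one)."""
    defaults = [r for r in routes if r["dst"] == "default"]
    return min(defaults, key=lambda r: r["metric"])


routes = [
    {"dst": "default", "via": "192.0.2.1", "metric": 100},  # hypothetical DHCP-provided gateway
    {"dst": "default", "via": "192.0.2.50", "metric": 50},  # hypothetical proxy machine
]
print(pick_default_route(routes)["via"])  # prints 192.0.2.50
```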

Decrypting The SSL/TLS Session

Once the machine running the mitmproxy software (the “proxy” machine) and the machine running the SSL/TLS client (the “client” machine) are configured, we are ready to establish the SSL/TLS sessions that we want to decrypt — just open your browser and go to the https:// URL you are interested in examining, launch your SSL VPN client, etc.

The proxy machine will intercept the connection and do what it does well, i.e. pretend to the server to be the client, and pretend to the client to be the server, while decrypting traffic going in both directions.

mitmproxy provides a fantastic text-based user interface that allows the user to easily navigate through each SSL/TLS request and response going through the proxy.

The following screenshot shows the main mitmproxy window, which lists all the captured flows:


The following screenshot shows a particular flow, specifically the request part of the flow:


And finally, this screenshot shows the server’s response to the previous request:


And that is it; there really isn’t anything to it. It took longer to read the mitmproxy documentation than to set things up and run the SSL/TLS session.

Final Thoughts

From the main mitmproxy window, all flows can be saved to a file for later analysis by pressing the ‘w’ (write) key, which will prompt if all flows must be saved or just the one at the cursor, and the name of the file to save the flows to.

Flows can be loaded later by running mitmproxy with the -r (read) switch.


Be aware of mitmproxy bug 659. This bug causes HTTP HEAD requests to return a Content-Length equal to zero instead of the correct value. This will cause some applications to fail as they will think there is nothing to download. This tripped me up pretty good until I found the bug report and applied the fix that was committed to resolve it.


More GNOME/Ubuntu Unity System-wide Defaults

In a previous post we discussed how to set up system-wide defaults using GSettings schema overrides. This works great, but we recently ran into a situation where this was not possible because the schema we wanted to modify was “relocatable”. Trying to modify such a schema without specifying a DConf path results in the following error:

$ gsettings set org.compiz.opengl sync-to-vblank true
Schema 'org.compiz.opengl' is relocatable (path must be specified)

The correct way to change a relocatable schema is by appending the path, as the error message above states. For example:

$ gsettings get org.compiz.opengl:/apps/compiz-1/plugins/opengl/screen0/options/sync_to_vblank/ sync-to-vblank

(Note the “:/apps/compiz-1/plugins/opengl/screen0/options/sync_to_vblank/” after the schema name; this is the path to the preference.)

The problem with this approach is that either by design or because it is a bug, it is not possible to write a schema override file that includes a DConf path. This Ubuntu bug seems to imply that this is a bug.

Another way to accomplish a system-wide default is by going to a lower level than GSettings and configuring DConf directly. It is probably better to use GSettings but in this particular case we had no option.

Here’s what we did:

First, we created the file /etc/dconf/profile/user with the following contents:


Next, we created the directory /etc/dconf/db/system-wide.d and the file /etc/dconf/db/system-wide.d/00_compiz_site_settings with the following contents:


We then ran the command “dconf update” (as root) which created the DConf database /etc/dconf/db/system-wide (a binary file).

This causes the “opengl” Compiz plugin preference “enable-x11-sync” to be set to “false” for all users in the system.


This blog post from Ross Burton has a good discussion on how to set system-wide settings using GSettings. It would have been a good reference to provide in my previous post but we missed it when we wrote that post.

The dconf System Administrator Guide is a fantastic reference to understand how to set system-wide defaults using DConf. One thing that was not clear to us after reading this document was DConf profile selection: the explanation above uses the file /etc/dconf/profile/user because if no other DConf profile is selected (via the DCONF_PROFILE environment variable) then the profile called “user” is the one that is opened.

This post by Matt Fischer was extremely useful to understand how things work with DConf. It was based on this post that we realized we needed to use a profile called “user”.

Finally, this question has very good insight into the differences between DConf and GSettings.

IPv6 Automatic Configuration

The Information Technology folks at the place I work enabled IPv6 a few months ago. Things worked great for a while but I recently noticed that I was not able to reach the IPv6 Internet. A quick investigation showed that IT disabled IPv6 Stateless Address Autoconfiguration (SLAAC) and enabled DHCPv6:

router>sh ipv6 interface vlan 320 
Vlan320 is up, line protocol is up
  IPv6 is enabled, link-local address is FE80::208:E3FF:FEFF:FD90 
  No Virtual link-local address(es):
  Description: data320
  Global unicast address(es):
    20xx:xxx:xxx:xxx::1, subnet is 20xx:xxx:xxx:xxx::/64 
  Joined group address(es):
  MTU is 1500 bytes
  ICMP error messages limited to one every 100 milliseconds
  ICMP redirects are disabled
  ICMP unreachables are disabled
  Input features: Verify Unicast Reverse-Path
  Output features: MFIB Adjacency HW Shortcut Installation
  Post_Encap features: HW shortcut
 IPv6 verify source reachable-via any
   0 verification drop(s) (process), 0 (CEF)
   9 suppressed verification drop(s) (process), 9 (CEF)
  ND DAD is enabled, number of DAD attempts: 1
  ND reachable time is 30000 milliseconds (using 30000)
  ND advertised reachable time is 0 (unspecified)
  ND advertised retransmit interval is 0 (unspecified)
  ND router advertisements are sent every 200 seconds
  ND router advertisements live for 1800 seconds
  ND advertised default router preference is Medium
  Hosts use DHCP to obtain routable addresses.

Note the “Hosts use DHCP to obtain routable addresses” message — it used to be “Hosts use stateless autoconfig for addresses”.

I am using NetworkManager on Ubuntu 14.10 to manage my network configuration. The version of NetworkManager on Ubuntu 14.10 is from the 0.9.8 series. NetworkManager offers several IPv6 configuration methods, including the “Automatic” and “Automatic, DHCP only” methods discussed below.

When SLAAC was enabled, I had my network interface configured using the “Automatic” IPv6 configuration method. After IT switched to DHCPv6 this setting prevented my computer from getting an IPv6 address.

After switching to the “Automatic, DHCP only” method I was able to obtain an IPv6 address.

Unfortunately, it seems like the version of NetworkManager in Ubuntu 14.10 has a bug that prevents the installation of a default route (which is not obtained via DHCPv6 but via Neighbor Discovery Router Advertisement messages). The root cause of the bug seems to be that NetworkManager instructs the kernel to ignore Router Advertisement messages. It looks like this bug is fixed in later NetworkManager versions, but I decided to just live with IPv4 at work instead of trying to backport the fix to NetworkManager 0.9.8 or build a newer NetworkManager for Ubuntu 14.10.
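The interplay described above can be summarized in a few lines of Python. This is a deliberately simplified model of my own, not NetworkManager or kernel code: addresses can come from DHCPv6 and/or SLAAC (the latter only if the advertised prefix allows autonomous configuration), while the default route comes only from Router Advertisements with a non-zero router lifetime. Treating the prefix as non-autonomous is an assumption based on IT having disabled SLAAC; the 1800 seconds come from the router output above.

```python
from dataclasses import dataclass


@dataclass
class RouterAdvertisement:
    autonomous: bool      # A flag on the prefix: hosts may use SLAAC
    router_lifetime: int  # seconds; > 0 means "use me as a default router"


def host_result(ra, dhcpv6_client, kernel_accepts_ra):
    """Very simplified model of what a host ends up with."""
    addresses = []
    if dhcpv6_client:
        addresses.append("dhcpv6")
    if kernel_accepts_ra and ra.autonomous:
        addresses.append("slaac")
    default_route = kernel_accepts_ra and ra.router_lifetime > 0
    return {"addresses": addresses, "default_route": default_route}


ra = RouterAdvertisement(autonomous=False, router_lifetime=1800)
print(host_result(ra, dhcpv6_client=True, kernel_accepts_ra=False))  # what I observed
print(host_result(ra, dhcpv6_client=True, kernel_accepts_ra=True))   # what should happen
```

In the first case the host gets a DHCPv6 address but no default route, which matches the symptom: an IPv6 address on the interface and no way to reach the IPv6 Internet.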

Note: This blog post was helpful for me to understand what was happening:

ZoneMinder Hash Logins

ZoneMinder is a fantastic Linux video camera security and surveillance solution. I have a ZoneMinder installation at home that I use to monitor a few IP-based video cameras.

My ZoneMinder installation uses the built-in authentication system, which means that only authenticated users can access the system.

One problem with using the built-in authentication system, however, is that it makes it hard to access video from the cameras from outside of ZoneMinder. For example, if I wanted to take a snapshot from a shell script of what one of the ZoneMinder cameras is currently recording, that would not be very easy to accomplish because the shell script would have to somehow log in first, establish a fake web browsing session (session cookie, etc.), and then finally request the snapshot.

Fortunately, ZoneMinder offers a way to accomplish this without too much hassle via a feature called “hash logins”, which is enabled by setting the option ZM_AUTH_HASH_LOGINS (Options->System->ZM_AUTH_HASH_LOGINS).

The way to use this is by appending a ‘&auth=<login hash>’ parameter to the ZoneMinder URL one wants to access. For example, running the following command would retrieve a snapshot (in JPEG format) from the camera with monitor ID 8:

shell$ curl 'http://www.example.com/cgi-bin/nph-zms?mode=single&monitor=8&auth=d8b45b3cf3b24407d09cbc16123f3549' -o /tmp/snapshot.jpg
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
 Dload  Upload   Total   Spent    Left  Speed
 100 22813  100 22813    0     0   9488      0  0:00:02  0:00:02 --:--:--  9485

The only complicated part here is knowing how to generate the hash used in the “auth” parameter. This does not seem to be documented anywhere, nor could I find any examples, but a quick read of the ZoneMinder code provided the necessary clues: The hash is calculated by the function getAuthUser() in the file includes/functions.php like this:

$time = localtime( $now );
$authKey = ZM_AUTH_HASH_SECRET.$user['Username'].$user['Password'].$remoteAddr.$time[2].$time[3].$time[4].$time[5];
$authHash = md5( $authKey );

ZM_AUTH_HASH_SECRET is a string chosen by the user via ZoneMinder options. $user['Username'] and $user['Password'] come from the ‘Users’ table in the ZoneMinder database; getAuthUser() will iterate over all existing users trying to find a match. $time contains the local time, and $time[2], $time[3], $time[4], and $time[5] contain the current hour, day of the month, month (counted from 0) and year (counted from 1900), respectively, as returned by PHP's localtime(). The code tries to find a match in the last two hours (in other words, a login hash will be found valid for up to two hours).

A simple test program to generate the hash looks like this:

 <?php
 // Better not to use ZM_AUTH_HASH_IPS as this will break authentication
 // when client is behind NAT because the IP address of the client is used
 // to calculate the hash if ZM_AUTH_HASH_IPS is set.
 $time = localtime();
 $authKey = 'mykey' . 'myuser' . '*0945FE11CAC14C0A4A72A01234DD00388DE250EC' . $time[2] . $time[3] . $time[4] . $time[5];
 echo "\$authKey = $authKey\n";
 $authHash = md5($authKey);
 echo "\$authHash = $authHash\n";

Note that the hashed password for the user needs to be provided to the script (the user password is passed through the MySQL PASSWORD() function to finally obtain the password hash that is stored in the ZoneMinder database).
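For those who would rather not use PHP in their scripts, the same calculation can be sketched in Python. Two details are easy to get wrong: PHP's localtime() counts months from 0 and years from 1900 (Python's time.localtime() does not), and MySQL's PASSWORD() function produces “*” followed by the uppercase hex of SHA1(SHA1(password)). The function names below are my own and 'mykey', 'myuser' and 'mypassword' are placeholders; I have only checked this against the hash format described in this post, so treat it as a sketch.

```python
import hashlib
import time


def mysql_password(plaintext):
    """Equivalent of MySQL's PASSWORD(): '*' + uppercase hex of SHA1(SHA1(pw))."""
    inner = hashlib.sha1(plaintext.encode("utf-8")).digest()
    return "*" + hashlib.sha1(inner).hexdigest().upper()


def zm_auth_hash(secret, username, password_hash, remote_addr="", now=None):
    """Build the login hash the same way the PHP snippet above does."""
    t = time.localtime(now)
    # PHP's localtime() counts months from 0 and years from 1900.
    key = (f"{secret}{username}{password_hash}{remote_addr}"
           f"{t.tm_hour}{t.tm_mday}{t.tm_mon - 1}{t.tm_year - 1900}")
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Placeholder secret and credentials, as in the PHP test program.
print(zm_auth_hash("mykey", "myuser", mysql_password("mypassword")))
```

As in the PHP test program, remote_addr is left empty, which only works when ZM_AUTH_HASH_IPS is not set.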

How to make practical use of all this? A script could generate the login hash and then access some part of ZoneMinder via an HTTP request. One could use this, for example, in a script run via cron every few minutes to take snapshots to produce a time-lapse video. This is left as an exercise for the reader.
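As a starting point for that exercise, here is a minimal Python sketch that builds the snapshot URL used in the curl example and saves the image. The helper names are my own, the host name is the example one, and the hash shown is just the sample value from the curl command; a real script would compute a fresh hash each time it runs.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def snapshot_url(base_url, monitor_id, auth_hash):
    """Build the single-frame nph-zms URL shown in the curl example."""
    query = urlencode({"mode": "single", "monitor": monitor_id, "auth": auth_hash})
    return f"{base_url}/cgi-bin/nph-zms?{query}"


def save_snapshot(url, filename):
    """Fetch the JPEG and write it to disk (needs a reachable ZoneMinder)."""
    with urlopen(url, timeout=10) as response, open(filename, "wb") as out:
        out.write(response.read())


url = snapshot_url("http://www.example.com", 8, "d8b45b3cf3b24407d09cbc16123f3549")
print(url)
# save_snapshot(url, "/tmp/snapshot.jpg")  # uncomment against a real installation
```

Run from cron every few minutes with a timestamped file name, this would produce the frames for a time-lapse video.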

Slow SSH logins/spun-down disk woken during SSH logins

For a while I have been troubleshooting why SSH logins into my Ubuntu server running 14.04 LTS are seemingly slow. I enabled SSH debugs (LogLevel set to DEBUG in /etc/ssh/{sshd_config,ssh_config}) on both the client and the server and did not find anything that pointed to an issue with the SSH negotiation/login itself. Recently I discovered that the 10-second delay in logging in had to do with SSH logins causing a hard disk that I keep spun down to be spun up. That takes about 10 seconds, which explains the delay.

Turns out that by default, console and SSH logins cause the scripts in /etc/update-motd.d/ to be executed. The script /etc/update-motd.d/98-fsck-at-reboot in particular is what causes disks to be spun up.
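If it is not obvious which script is the slow one, timing them individually narrows it down quickly. The Python sketch below runs every executable file in a directory once and records how long each takes. It is a rough stand-in for what happens at login, not how Ubuntu actually invokes the scripts; the function name and the run parameter (there only to allow substituting a fake runner) are my own.

```python
import subprocess
import time
from pathlib import Path


def time_motd_scripts(directory="/etc/update-motd.d", run=subprocess.run):
    """Run each executable script once and return (name, seconds) pairs."""
    timings = []
    for script in sorted(Path(directory).iterdir()):
        if not script.is_file() or not script.stat().st_mode & 0o111:
            continue  # skip anything that is not an executable file
        start = time.monotonic()
        run([str(script)], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append((script.name, time.monotonic() - start))
    return timings


# Example use on the server itself (this will wake up spun-down disks!):
# for name, seconds in time_motd_scripts():
#     print(f"{seconds:6.2f}s  {name}")
```

On my server I would expect 98-fsck-at-reboot to stand out when the disk is spun down, since that is the script that touches it.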

The scripts in /etc/update-motd.d/ are responsible for generating the file /run/motd.dynamic, which looks like this:

Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-37-generic x86_64)

 * Documentation:

  System information as of Thu Oct  9 10:05:21 EDT 2014

  System load:  0.37                Processes:           164
  Usage of /:   18.1% of 915.51GB   Users logged in:     1
  Memory usage: 55%                 IP address for eth0:
  Swap usage:   12%

  Graph this data and manage this system at:

That is pretty and provides some good information, but I would rather have a shell prompt immediately after I hit the <Enter> key after typing “ssh <my server>”. I can then manually execute some useful commands like “w” to see who is logged in, the uptime, etc.

I have disabled execution of the /etc/update-motd.d/ scripts by tweaking the files /etc/pam.d/{login,sshd} in the following way:

# Prints the message of the day upon succesful login.
# (Replaces the `MOTD_FILE' option in login.defs)
# This includes a dynamically generated part from /run/motd.dynamic
# and a static (admin-editable) part from /etc/motd.
#session    optional  motd=/run/motd.dynamic noupdate
#session    optional

# Disable display of /run/motd.dynamic and add "noupdate" so
# scripts in /etc/update-motd.d/* are not called.
# 20141009
session    optional noupdate

The first two commented out “session” lines are what were there by default. I commented them out and then added a new “session” line that has the “noupdate” keyword, which is actually what causes the module to not execute the scripts in /etc/update-motd.d/. This uncommented “session” line is what will display the standard /etc/motd file (if it exists).

Now an SSH login is fast, does not provide a lot of information upon login (which I like), and, most importantly, spun-down disks are not spun up upon login:

$ ssh altamira
You have new mail.
Last login: Thu Oct  9 10:23:44 2014 from 2003:450:e6a5:100:a412:6z7a:897d:1387
altamira$ # That was a very fast login!

The following question is what pointed me in the right direction:

This is an issue that seems to have been introduced when I upgraded this server from Ubuntu 12.04 LTS to 14.04 LTS. In particular, my Ubuntu 12.04 LTS installation did not have the file /etc/update-motd.d/98-fsck-at-reboot.

Buffer Overflow in Embedded Microcontroller. Ouch.

We have been chasing down a mysterious problem in an embedded application we are developing. It is a sprinkler timer to water our yard. The microcontroller is an ATmega AVR 328p, the same microcontroller that some of the Arduino platforms use.

The application has an RFM12B wireless transceiver from Hope RF, and we are using this RFM12 library:

The issue is that the application works fine for days, sending and receiving data through the RFM12B wireless transceiver, but then, all of a sudden, it stops working. Resetting the device restores operation.

The device is outside, in our yard, right next to a garden bib. It is not easy to troubleshoot this problem because all the connectivity that we have to the device is via the RFM12B transceiver, so if the wireless subsystem of the application is hosed, then there really is nothing we can do to troubleshoot. In addition, the wireless command set in the firmware is currently very limited, so we do not have good debugging tools even if the wireless subsystem is up. The firmware does send debugging information through a serial port on the microcontroller, though, so we have to use that in the absence of other debugging tools.

The first step we took to debug the issue was to provide remote access to the microcontroller’s serial port. For this, we used a Raspberry Pi with a USB-to-serial adapter and a WiFi USB adapter. We then left minicom (a serial communication program for Linux) running and came back every once in a while, via a remote SSH session, to check. Today we found an occurrence of the issue and investigated. This is what we found:

First, we noticed that the wireless subsystem was not working when we saw the following messages on the device’s console:

[2014-08-06 17:26:53] rfm12_tx() error 3
[2014-08-06 17:26:58] rfm12_tx() error 3
[2014-08-06 17:27:03] rfm12_tx() error 3
[2014-08-06 17:27:08] rfm12_tx() error 3

We then ran a diagnostic command on the console:

# show rfm12
[2014-08-06 17:29:11] My RFM12 node ID: 111.
[2014-08-06 17:29:11] RFM12 status register: 0x023f.
[2014-08-06 17:29:11] RFM12 state = 0
[2014-08-06 17:29:11] RFM12 tx state = 162
[2014-08-06 17:29:11] Number of bytes to transmit or receive = 219
[2014-08-06 17:29:11] Current byte count = 1
[2014-08-06 17:29:11] Total tx bytes = 40622
[2014-08-06 17:29:11] Total errors = 29983

A few things caught our attention:

  1. The RFM12 node ID should be 12 but the output says that it is 111.
  2. There is no such thing as RFM12 tx state 162 in the RFM12 library; it goes up to 3 or 4.
  3. The command prompt should be “frontlawn# “, but it is just “# “.

Obviously, some bad memory corruption has taken place here. Let’s see what we find…

The other piece of information we found in the minicom history buffer was this:

[2014-08-06 14:36:29] frontlawn# RFM12 rx packet: flags AD., hdr = 0x6c, len = 13.                                                
[2014-08-06 14:37:46] 00000000  3445 7365 7420 7661 6c20 6f20 33        4Eset val o 3                                             
[2014-08-06 14:37:46]                                                                                                             
[2014-08-06 14:38:27] RFM12 rx packet: flags AD., hdr = 0x6c, len = 14.                                                           
[2014-08-06 14:38:27] 00000000  3446 7365 7420 7661 6c20 636c 2033      4Fset val cl 3                                            
[2014-08-06 14:38:27]                                                                                                             
[2014-08-06 15:07:10] RFM12 rx packet: flags ..., hdr = 0x0f, len = 254.                                                          
[2014-08-06 15:07:11] 00000000  3445 7365 7420 7661 6c20 6f20 336e 2033 4Eset val o 3n 3                                          
[2014-08-06 15:07:11] 00000010  ffff ffff 3337 fd7f fc0f fd1c 201e 07ff ....37...... ...                                          
[2014-08-06 15:07:11] 00000020  07fb 03df 407c 741f 2dd4 040c f721 a302 ....@|t.-....!..                                          
[2014-08-06 15:07:11] 00000030  3529 0b05 080e 0200 0000 0000 0000 0000 5)..............                                          
[2014-08-06 15:07:11] 00000040  0000 0000 0000 0000 0000 0000 0000 0000 ................                                          
[2014-08-06 15:07:11] 00000050  0000 0000 0000 0000 0000 0100 0000 008a ................                                          
[2014-08-06 15:07:11] 00000060  e955 ae3c b9e0 c7f0 6f10 179c bd65 625f .U.<....o....eb_                                          
[2014-08-06 15:07:11] 00000070  3aea 998e 925c 1f43 152b 6b49 5e0b cca2 :....\.C.+kI^...                                          
[2014-08-06 15:07:11] 00000080  cdd5 313e ae9e 1f75 9a3d 1eea d964 bfee ..1>...u.=...d..                                          
[2014-08-06 15:07:11] 00000090  bffe 96dc 5df9 6b2c fdb7 8dd5 daab b8aa ....].k,........                                          
[2014-08-06 15:07:11] 000000a0  8fc5 6e42 8df9 9e7b 53bf 3cf6 fd19 7737 ..nB...{S.<...w7                                          
[2014-08-06 15:07:11] 000000b0  2767 f67c 975b 8f5a a7ea 6e63 bd39 2258 'g.|.[.Z..nc.9"X
[2014-08-06 15:07:11] 000000c0  756e a1bf 80fd 4b56 f0e3 e7fb bb28 ef93 un....KV.....(..                                          
[2014-08-06 15:07:11] 000000d0  e353 2308 50fe 49f6 7b4f 2300 b087 f1fa .S#.P.I.{O#.....                                          
[2014-08-06 15:07:12] 000000e0  9581 ff47 aba7 75a2 c0eb 91fa 6b7a 80e3 ...G..u.....kz..
[2014-08-06 15:07:12] 000000f0  52e0 93c0 da66 bdfb 279b 8a07 af12      R....f..'.....                                            
[2014-08-06 15:07:12]                                                                                                             
[2014-08-06 15:07:12] Invalid valve number.                                                                                       
[2014-08-06 15:07:12] rfm12_tx() error 3                                                                                          
[2014-08-06 15:07:17] rfm12_tx() error 3

At 14:36:29 the controller received the command “set val o 3” (short for “set valve open 3”, which instructs the controller to open valve number 3).

At 14:38:27 the controller received the command “set val cl 3” (short for “set valve close 3”, which instructs the controller to close valve number 3).

We sent these commands and everything was good until this point.

However, at 15:07:10 we received a giant packet of 254 bytes. That is the first sign of trouble: the ATmega328P microcontroller has only 2048 bytes of RAM, so we have never configured a receive buffer that large in our application. We checked the library source code and confirmed that the size of the receive buffer is configurable, and that we have configured it to be 40 bytes.

Is it possible that the RFM12 library is buggy and is accepting such a large packet and storing it in a 40-byte buffer? No, we checked and it has the correct validations. This points to an issue in our code…

This is our code:

        if (rfm12_rx_status() == STATUS_COMPLETE) {
            pkt_hdr.__hdr_val = rfm12_rx_type();
            pkt_len = rfm12_rx_len();
            pkt_data = rfm12_rx_buffer();
            if (pkt_len == 0)
                goto pkt_processed;

            /*
             * Action to take on received packet depends on the payload
             * type, which is the first byte of the payload.
             */
            if (pkt_data[0] == RFM12_PAYLOAD_CLICMD) {
                memcpy(cli_cmdbuf, pkt_data + 2, pkt_len - 2);
                cli_cmdbuf[pkt_len - 2] = '\0';

The first byte of the packet (pkt_data[0]) is 0x34, according to the packet dump above. 0x34 happens to be RFM12_PAYLOAD_CLICMD, so the condition of the “if ()” statement is true and we take that code path. The first instruction in that code path is a memcpy() that will copy pkt_len – 2 bytes, i.e. 254 – 2 = 252 bytes into the array cli_cmdbuf[].

cli_cmdbuf[] is a global variable declared as:

/* A command cannot be longer than this many characters. */
#define CLI_MAX_CMDLEN 80

char cli_cmdbuf[CLI_MAX_CMDLEN];

Ouch! Yes, our buffer has been overflowed with this memcpy(). By 252 – 80 = 172 bytes! This explains the memory corruption and the failure of the wireless subsystem of the application.

A couple of observations about this whole issue:

  1. The received packet that caused the entire problem should not have been received. We have no idea who sent it, or why it is so large, but the fact of the matter is that we received it. Yes, it obviously is bogus, but we cannot allow a stray packet to bring us down like that. The lesson re-learned here is that our application needs to be able to handle exceptional conditions such as this.
  2. Of course, we were not aware of this exceptional condition before; otherwise we would have put a check in place to prevent the disastrous memcpy(). The main problem is that we were acting under the assumption that the low-level RFM12 driver would never pass us a packet whose advertised length is larger than the configured size of the receive buffer. The low-level driver does check the space available in the receive buffer and stops storing data once the buffer is full. However, it still notifies the application that a packet is ready to be handled, even if that packet is incomplete because it did not fit in the receive buffer. This behavior is debatable, because the application might want to consume the packet even if it is not complete (for example, to do some error handling or print a debug message).
  3. In the world of embedded applications, the importance of debugging using a serial port cannot be overstated: we would have had a very hard time finding this issue if we had not seen the dump of that 254-byte packet on the serial console.
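
For readers who want a similar packet dump in their own firmware, here is a minimal sketch of a routine that produces lines in the same layout as the dump shown earlier (offset, bytes grouped in pairs, ASCII column). It is not the code our firmware uses; the names format_dump_line(), dump_packet(), DUMP_BYTES_PER_LINE and DUMP_LINE_SIZE are made up for this example, and on a real AVR target puts() would be replaced by whatever routine writes to the serial console.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DUMP_BYTES_PER_LINE 16   /* hypothetical: bytes shown per dump line */
#define DUMP_LINE_SIZE      80   /* hypothetical: room for one formatted line */

static const char hex_digits[] = "0123456789abcdef";

/*
 * Format up to 16 bytes as one dump line:
 *   <8-digit offset>  <bytes in pairs>  <printable ASCII, '.' otherwise>
 * Short lines are padded so that the ASCII column stays aligned.
 */
static void format_dump_line(char out[DUMP_LINE_SIZE], unsigned long offset,
                             const uint8_t *data, size_t len)
{
    size_t i;
    int pos;

    if (len > DUMP_BYTES_PER_LINE)
        len = DUMP_BYTES_PER_LINE;

    pos = snprintf(out, DUMP_LINE_SIZE, "%08lx  ", offset);

    /* Hex column: two bytes per group, groups separated by one space. */
    for (i = 0; i < DUMP_BYTES_PER_LINE; i++) {
        if (i > 0 && i % 2 == 0)
            out[pos++] = ' ';
        if (i < len) {
            out[pos++] = hex_digits[data[i] >> 4];
            out[pos++] = hex_digits[data[i] & 0x0f];
        } else {
            out[pos++] = ' ';
            out[pos++] = ' ';
        }
    }
    out[pos++] = ' ';

    /* ASCII column: non-printable bytes are shown as dots. */
    for (i = 0; i < len; i++)
        out[pos++] = isprint(data[i]) ? (char)data[i] : '.';
    out[pos] = '\0';
}

/* Dump a whole packet, one line per 16 bytes. */
static void dump_packet(const uint8_t *data, size_t len)
{
    char line[DUMP_LINE_SIZE];
    size_t off;

    for (off = 0; off < len; off += DUMP_BYTES_PER_LINE) {
        format_dump_line(line, (unsigned long)off, data + off, len - off);
        puts(line);
    }
}
```

Formatting into a buffer first, instead of printing byte by byte, keeps the layout logic separate from the console output routine, which makes it easy to test on a PC before it goes into the firmware.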

To address the issue, we are going to check in our application that the advertised length of a received packet is no larger than the size of the receive buffer. If it is larger, we will ignore the packet. We will see how this plays out…
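
A minimal sketch of what such a check could look like follows. It is an illustration under assumptions, not the final firmware code: RX_BUFFER_SIZE and copy_cli_command() are hypothetical names, and the 40-byte and 80-byte sizes are the ones mentioned above. Besides the check against the receive buffer size, the sketch also checks the command length against the size of cli_cmdbuf[] and rejects packets shorter than the two-byte prefix, which goes a bit beyond the fix described in the text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUFFER_SIZE 40   /* hypothetical name: configured RFM12 receive buffer size */
#define CLI_MAX_CMDLEN 80   /* a command cannot be longer than this many characters */

static char cli_cmdbuf[CLI_MAX_CMDLEN];

/*
 * Hypothetical helper: copy the command text of a received packet into
 * cli_cmdbuf[]. As in the code above, the command starts at offset 2 of
 * the payload. Returns 0 on success, or -1 if the packet must be ignored.
 */
static int copy_cli_command(const uint8_t *pkt_data, size_t pkt_len)
{
    size_t cmd_len;

    /* The driver cannot have stored more than RX_BUFFER_SIZE bytes, so a
     * larger advertised length means the packet is bogus or truncated. */
    if (pkt_len > RX_BUFFER_SIZE)
        return -1;

    /* Without the two prefix bytes, pkt_len - 2 would wrap around. */
    if (pkt_len < 2)
        return -1;

    cmd_len = pkt_len - 2;

    /* Leave room for the terminating NUL character. */
    if (cmd_len >= sizeof(cli_cmdbuf))
        return -1;

    memcpy(cli_cmdbuf, pkt_data + 2, cmd_len);
    cli_cmdbuf[cmd_len] = '\0';
    return 0;
}
```

With this in place, the 254-byte packet from the dump would be rejected by the first check, while the 13-byte “set val o 3” packet would be copied as before.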


Designing Printed Circuit Boards

Designing Printed Circuit Boards (PCBs) is not as hard as it seems. A couple of years ago I got into embedded electronics as a hobby and since then I have designed a couple of boards for simple projects. I used CadSoft’s EAGLE PCB design software. The learning curve was (is, because I think I am still learning) steep but there are good tutorials to be found on-line, including a tutorial from CadSoft itself.

Obviously, after designing a board you will want to manufacture it, either by yourself or by paying someone to do it. In either case, catching errors in the design before the board is manufactured will prevent frustrations after you have the board in your hands and notice during testing that it does not work because of some problem like incorrect part footprints, incorrect board size, etc.

What I have done a couple of times to try to catch problems before it is too late is to make a 1:1 copy of the board, glue it to some material that lets you stick real parts to it, and then lay out the parts on top, as shown in the following picture:

Nothing earth-shattering but it definitely has been useful and saved me a couple of times.

Upload non-Arduino firmware to Arduino-compatible board

The Atmel AVR family of microcontrollers is our favorite solution for embedded electronic projects. As you know, this family is what the popular Arduino platform uses. However, for our projects we have never used the Arduino integrated development environment (IDE), or Arduino libraries. Instead, we have stuck with AVR libc mainly so we can have more control over our firmware and to try to have firmware with a smaller footprint in terms of both execution and image size.

Because we have not been using the Arduino IDE, we have always uploaded firmware to the AVR microcontroller using an in-system programmer (ISP); the one we use is the inexpensive USBtiny. This has been a requirement because our microcontrollers do not normally have a boot loader that allows them to be flashed via the serial port, unlike the Arduino, which ships with such a boot loader and is normally programmed through the serial port.

Recently, however, we bought a Moteino, a very nice Arduino-compatible board, that has a boot loader that is compatible with the Arduino IDE. This required us to come up with a way to upload firmware to this board. We originally thought about soldering an adapter to use the AVR’s Serial Peripheral Interface (SPI), which uses the three signals Serial ClocK (SCK), Master In-Slave Out (MISO) and Master Out-Slave In (MOSI), and then use our USBtiny ISP programmer.

However, it turns out that it is a lot easier than that: the AVRDUDE software can program Arduino-compatible boards directly, without using the SPI interface. We probably should have guessed this because, as we understand it, the Arduino IDE actually uses AVRDUDE to upload firmware to the target.

A sample AVRDUDE invocation looks like this:

shell$ avrdude -p atmega328p -c arduino -P /dev/ttyUSB2 -U flash:w:primus.hex

This is what the command-line arguments do:

  • -p atmega328p: specifies the type of target microcontroller.
  • -c arduino: specifies the programmer type. We understand that stk500v1 should work, and we have seen online references that seem to indicate that is what the Arduino IDE uses, but stk500v1 did not work for us. Instead, our AVRDUDE executable (from Ubuntu 13.10) does have an “arduino” programmer that worked very well.
  • -P /dev/ttyUSB2: this is the device to use to talk to the Arduino-compatible target.
  • -U flash:w:primus.hex: specifies the memory operation to perform, in this case writing (w) the firmware file primus.hex to the flash memory of the Arduino-compatible target.

When we use our ISP programmer, one argument that we like to use is -y, which increments a 16-bit counter stored in the last two bytes of the AVR EEPROM (this counter has to be initialized with a one-time AVRDUDE invocation that uses the argument -Y <initial counter value>). We found out that the boot loader used by Arduino and Arduino-compatible boards (apparently Optiboot) does not handle AVR EEPROM access, so using -y when invoking AVRDUDE to upload firmware to the Arduino-compatible board is not possible (and neither is initializing the counter with -Y, nor uploading EEPROM data using -U eeprom:w:data.hex).

But this is a small price to pay for the convenience of uploading our AVR libc firmware without having to use the Arduino IDE.