Reset Azure Virtual Network Gateway

No matter what I did, I could not get an IPsec site-to-site tunnel working between an offsite test network and our Microsoft Azure virtual network. Our VPN gateway is a Cisco ASA 5506.

The issue was that the Cisco ASA would try to bring up the tunnel, but the negotiation would fail partway through. Debug messages on the Cisco ASA would show something like this:

Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, IKE SA Proposal # 1, Transform # 2 acceptable Matches global IKE entry # 5
 Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing ISAKMP SA payload
 Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing NAT-Traversal VID ver RFC payload
 Apr 19 09:21:41 [IKEv1 DEBUG]IP = 13.94.202.38, constructing Fragmentation VID + extended capabilities payload
 Apr 19 09:21:41 [IKEv1]IP = 13.94.202.38, IKE_DECODE SENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:21:42 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:43 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:46 [IKEv1]IP = 13.94.202.38, Duplicate first packet detected. Ignoring packet.
 Apr 19 09:21:49 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:21:57 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:22:05 [IKEv1]IP = 13.94.202.38, IKE_DECODE RESENDING Message (msgid=0) with payloads : HDR + SA (1) + VENDOR (13) + VENDOR (13) + NONE (0) total length : 128
 Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, IKE MM Responder FSM error history (struct &0x00002aaac1c34cf0) , : MM_DONE, EV_ERROR-->MM_WAIT_MSG3,
 EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent-->MM_SND_MSG2, EV_SND_MSG-->MM_SND_MSG2, EV_START_TMR-->MM_SND_MSG2, EV_RESEND_MSG-->MM_WAIT_MSG3, EV_TIMEOUT-->MM_WAIT_MSG3, NullEvent
 Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, IKE SA MM:4af079c0 terminating: flags 0x01000002, refcnt 0, tuncnt 0
 Apr 19 09:22:13 [IKEv1 DEBUG]IP = 13.94.202.38, sending delete/delete with reason message
 Apr 19 09:23:06 [IKEv1]IP = 13.94.202.38, IKE_DECODE RECEIVED Message (msgid=6a9f34a4) with payloads : HDR + HASH (8) + DELETE (12) + NONE (0) total length : 68
 Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, processing hash payload
 Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, processing delete
 Apr 19 09:23:06 [IKEv1]Group = 13.94.202.38, IP = 13.94.202.38, Connection terminated for peer 13.94.202.38. Reason: Peer Terminate Remote Proxy 10.100.152.0, Local Proxy 10.50.0.0
 Apr 19 09:23:06 [IKEv1 DEBUG]Group = 13.94.202.38, IP = 13.94.202.38, Active unit receives a delete event for remote peer 13.94.202.38.

A couple of key points in the above debug messages:

  1. “MM_WAIT_MSG3, EV_TIMEOUT” indicates that the Cisco ASA timed out waiting for the Azure VPN gateway.
  2. “Duplicate first packet detected. Ignoring packet” indicates that the Azure VPN gateway did not accept the previous message sent by the Cisco ASA. Increasing the debug level (not shown above) reveals a mismatch in the ISAKMP cookies, and this is apparently what upsets the Azure Virtual Network Gateway.

The cookie mismatch shows up in the Cisco ASA debug messages at a higher debug level:

Azure
 InitiatorCookie: 03 83 AD 7C 10 26 CB D6
 ResponderCookie: 14 42 19 27 F6 F2 DF 53

RECV PACKET from 13.91.5.150
 ISAKMP Header
 Initiator COOKIE: 03 83 ad 7c 10 26 cb d6
 Responder COOKIE: 00 00 00 00 00 00 00 00

These are debug messages produced on the Microsoft Azure side:

2016-03-02 10:31:37 ERROR user NULL 0000000FE1E59D80 0000000FE1E64320 f74513382e60832f cac68571e57c06d5 Invalid cookies. Try resetting SAs on-prem. IkeProcessPacketDispatch failed with HRESULT 0x80073616(ERROR_IPSEC_IKE_INVALID_COOKIE)

(Note the “ERROR_IPSEC_IKE_INVALID_COOKIE” error code.)

After spending some time troubleshooting the Cisco ASA side, we could not find anything wrong with the Cisco ASA configuration.

In the end, in my desperation, I decided to reset the Azure Virtual
Network Gateway and that seems to have fixed the issue for good.

The process to reset an Azure Virtual Network Gateway is a bit tricky because there is no way to do that using the Azure Portal; it needs to be done using PowerShell instead.

This is what I did to reset the Azure Virtual Network Gateway using
PowerShell:

1. Install Azure PowerShell. I used the instructions here:

https://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/

In particular, I went with the leaner and perhaps more complicated
installation from the PowerShell Gallery (instead of installing from
WebPI).

2. After Azure PowerShell was installed, I opened a PowerShell command window and ran the following commands:

Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionName "<your subscription name>"
$vg = Get-AzureRmVirtualNetworkGateway -ResourceGroupName RG
Reset-AzureRmVirtualNetworkGateway -VirtualNetworkGateway $vg

Apparently, Azure Virtual Network Gateways run as a pair of redundant Virtual Machines, so resetting one will cause the other to take over.

The second instance could be reset by invoking “Reset-AzureRmVirtualNetworkGateway” again a few minutes after the first reset, but in my case the site-to-site VPN tunnel came up after resetting only one of the instances.

Note that the above *-AzureRm* PowerShell cmdlets use the new
Azure Resource Manager deployment model. Similar commands would have to
be used if the classic deployment model is in use instead.

This article:

https://azure.microsoft.com/en-us/documentation/articles/vpn-gateway-resetgw-classic/

is a good reference for how to reset Azure Virtual Network Gateways that have been deployed using the classic deployment model. Note that it says the same cannot be done for the Resource Manager deployment model, but the capability is clearly there now (I used it); the article simply has not been updated yet.

On a related note, I should mention that another way of dealing with this problem is to deploy a Cisco ASAv virtual appliance and use that to terminate the site-to-site IPsec tunnel, instead of terminating it on the Microsoft-provided Azure Virtual Network Gateway. This of course would be more expensive, given that licenses for the Cisco ASAv would have to be purchased, plus it is another Virtual Machine that would have to be deployed (and paid for).

SSL Traffic Decryption

We recently had a need to inspect the contents of an HTTPS (SSL/TLS) connection. As we had never had the opportunity to set things up to facilitate decryption of an SSL/TLS connection, we had to do a little bit of research.

The way we approached this was by running the software that establishes the HTTPS connection we need to decrypt on a VirtualBox Virtual Machine (VM), and then running a Man-in-the-Middle (MitM) proxy on the VM host, which runs Ubuntu 15.04. The MitM proxy that we used was mitmproxy. There was no particular reason for choosing mitmproxy other than it was the first solution we tried, it was very well documented, and it worked on the first try. We are very impressed with this little piece of software — its design is well thought out, and the text-based user interface is very powerful.

This post documents the steps involved in setting things up for decryption of SSL sessions.

General

There are several possible network topologies to use. We chose one where the client machine and the proxy machine are on the same physical network. Because we are using VirtualBox, where the Virtual Machine is the client machine and the Virtual Machine host is a physical machine, we configured the network settings of the (client) Virtual Machine to use bridged networking. This is equivalent to having two different machines on the same physical network segment.

Note: Our proxy machine (not a Virtual Machine) only had a wi-fi network interface so the Virtual Machine, through bridged networking, was using this wi-fi network interface to reach the network.

Set Up Of Proxy Machine

Installation

Nothing much to do here, really, as there is an Ubuntu binary package for mitmproxy, so installation boils down to a simple “apt-get install mitmproxy”.

VirtualBox Settings

The network interface of the virtual machine must be configured in bridged mode. The VM host machine only needs one interface (for example, the wireless NIC “wlan0”). That interface will be used for both the VM host and the actual VM to have network connectivity. Make sure the VM NIC is configured to use the VM host NIC as the bridge interface.

Also, VirtualBox must be configured to allow promiscuous mode on the bridge interface. This is configured in the “Advanced” section of the network adapter properties (where the interface mode [bridged, NAT, etc.] is configured). “Allow VMs” for the “Promiscuous Mode” setting is appropriate.

Configuration

After installing the mitmproxy software, the following things must be done:

  • Enable IP forwarding, which is disabled by default (see the note on making these settings persistent after this list):
shell$ sudo sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
  • Disable ICMP redirects:
shell$ sudo sh -c 'echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects'
shell$ sudo sh -c 'echo 0 > /proc/sys/net/ipv4/conf/wlan0/send_redirects'
  • Add iptables rules to redirect traffic destined for TCP ports 80 and 443 to port 8080, which is the port mitmproxy listens on:
shell$ sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 80 -j REDIRECT --to-port 8080
shell$ sudo iptables -t nat -A PREROUTING -i wlan0 -p tcp --dport 443 -j REDIRECT --to-port 8080
  • Run mitmproxy:
shell$ mitmproxy -T --host
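
Note that the /proc writes above do not survive a reboot. If the setup needs to persist, one option (a sketch; the file name below is our own choosing) is to put the settings in /etc/sysctl.d and load them from there:

shell$ sudo tee /etc/sysctl.d/60-mitmproxy.conf > /dev/null <<'EOF'
# Transparent proxying settings for mitmproxy
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.wlan0.send_redirects = 0
EOF
shell$ sudo sysctl -p /etc/sysctl.d/60-mitmproxy.conf

The iptables rules are likewise not persistent; the iptables-persistent package (or iptables-save/iptables-restore) can take care of those.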

Set Up Of Client Machine

The following things need to be configured on the client machine:

  • Configure the machine running mitmproxy (the proxy machine) as the default gateway of the client machine. This will cause all SSL/TLS traffic going towards the server to be sent through the proxy machine, assuming that the server is on a different (remote) subnet. The easiest way to accomplish this is to configure the TCP/IP settings of the client machine manually. If DHCP is used, the default gateway will be whatever the DHCP server hands out, which might be different from the IP address of the proxy machine; in that case the client machine can be forced to use the proxy machine as its default gateway by adding a new default route with a lower metric, as shown in the sketch after this list.
  • Install the Certificate Authority certificate that the proxy machine will present to the client when the client establishes SSL/TLS sessions. mitmproxy really shines in this area, making the certificate installation a very seamless process. We will not repeat here the excellent documentation on how to do this; instead, we will point readers to it: http://mitmproxy.org/doc/certinstall.html.
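
For example, on a Linux client, the default route can be forced like this (192.168.1.50 is a made-up address standing in for the proxy machine):

shell$ sudo ip route add default via 192.168.1.50 metric 50
shell$ ip route show default   # the metric-50 route should now be preferred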

Decrypting The SSL/TLS Session

Once the machine running the mitmproxy software (the “proxy” machine) and the machine running the SSL/TLS client (the “client” machine) are configured, we are ready to establish the SSL/TLS sessions that we want to decrypt — just open your browser and go to the https:// URL you are interested in examining, launch your SSL VPN client, etc.

The proxy machine will intercept the connection and do what it does well, i.e. pretend to the server to be the client, and pretend to the client to be the server, while decrypting traffic going in both directions.

mitmproxy provides a fantastic text-based user interface that allows the user to easily navigate through each SSL/TLS request and response going through the proxy.

The following screenshot shows the main mitmproxy window, which lists all the captured flows:

mitmproxy1

The following screenshot shows a particular flow, specifically the request part of the flow:

mitmproxy2

And finally, this screenshot shows the server’s response to the previous request:

mitmproxy3

And that is it; there really isn’t anything to it. It took longer to read the mitmproxy documentation than to set things up and run the SSL/TLS session.

Final Thoughts

From the main mitmproxy window, all flows can be saved to a file for later analysis by pressing the ‘w’ (write) key; mitmproxy will ask whether to save all flows or just the one under the cursor, and for the name of the file to save them to.

Flows can be loaded later by running mitmproxy with the -r (read) switch.
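
For example, assuming the flows were saved to a file named flows.out:

shell$ mitmproxy -r flows.out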

Caveats

Be aware of mitmproxy bug 659 (https://github.com/mitmproxy/mitmproxy/issues/659). This bug causes HTTP HEAD requests to return a Content-Length of zero instead of the correct value, which will cause some applications to fail because they will think there is nothing to download. This tripped me up pretty badly until I found the bug report and applied the fix that was committed to resolve it.

 

FreeRADIUS: Active Directory authentication and group check via winbind + rlm_unix, not LDAP

[This blog post is based on an email that I sent to the freeradius-users mailing list in September 2014.]

I have a pretty common requirement: authenticate wireless users against Active Directory and prevent SSID cross-connections, i.e. users in Active Directory group A can only connect to SSID A and users in Active Directory group B can only connect to SSID B.

I have seen plenty of messages in the freeradius-users mailing list archives about how to accomplish this. The authentication part is easy and has excellent documentation, e.g.

http://deployingradius.com/documents/configuration/active_directory.html

The group checking part is also well understood and documented (rlm_ldap), and whoever asks apparently gets told to use LDAP. At least I could not find anything on the alternate approach described here.

I decided to look into a different approach that does not involve LDAP — because the machine already has winbind running (for ntlm_auth), why not use the Name Service Switch and rlm_unix to check for group membership and avoid using LDAP altogether? Group membership information is already available if one has added “winbind” to the “passwd” and “group” entries in /etc/nsswitch.conf.
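
For reference, the relevant /etc/nsswitch.conf entries look something like this (the stock “compat” entries stay in place; “winbind” is appended):

passwd:         compat winbind
group:          compat winbind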

I apparently got this to work and wanted to share the solution in case someone finds it useful:

First, I configured winbind for ntlm_auth use by FreeRADIUS, as explained elsewhere. Then, I verified that the system can see Active Directory users and groups as if they were Unix users and groups:

shell$ id DOMAIN\\username
uid=10017(DOMAIN\username) gid=10002(DOMAIN\domain users) groups=10002(DOMAIN\domain users),10024(DOMAIN\computerlab-monitoring),10008(DOMAIN\domain admins),10034(DOMAIN\vdi teachers),10010(DOMAIN\teachers),10026(DOMAIN\vdi users),10011(DOMAIN\teacher assistants),10074(DOMAIN\schema admins)
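
Group membership can be checked the same way with getent (the group and member names below are hypothetical):

shell$ getent group "DOMAIN\\group A"
DOMAIN\group a:x:10035:DOMAIN\user1,DOMAIN\user2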

Then, I put this logic, which is similar to what one would normally use if LDAP and Ldap-Groups were in use, in the post-authentication section of my sites-enabled/default:

if (NAS-Port-Type == Wireless-802.11) {
  if (Called-Station-Id =~ /.*:SSID-A/i) {
    # Can't do 'if (Group != "xxxxx")' because !=
    # operator doesn't work for group checking. Careful
    # with the number of backslashes.
    if (!(Group == "DOMAIN\\\\group A") ) {
      update reply {
        Reply-Message = "User not allowed to join this wireless network"
      }
      reject
    }
  }
  elsif (Called-Station-Id =~ /.*:SSID-B/i) {
    if (!(Group == "DOMAIN\\\\group B") ) {
      update reply {
        Reply-Message = "User not allowed to join this wireless network"
      }
      reject
    }
  }
}

This works if the EAP identity is “DOMAIN\username”. However, I did not want to make things unnecessarily complicated for my users and wanted them to be able to enter just “username” when they configure their devices.

The main issue to address, however, is that if the identity is entered as “username” (not “DOMAIN\username”), the group check will fail because the Unix user ID for users that are known to the system via winbind is “DOMAIN\username”, not “username” (see output from the “id” command above).

“Not a problem”, I thought, “I’ll just manipulate User-Name before anything happens and prefix it with ‘DOMAIN\’”. It turns out that was a bad idea, because it made User-Name different from the EAP identity hidden in the EAP message, which caused the “rlm_eap: identity does not match User-Name, setting from EAP identity” message that has bitten so many people before.

The solution I came up with was to still manipulate the User-Name but towards the end, in the post-auth section, instead of at the beginning.  This way EAP uses the right identity, but the rlm_unix Group check uses the correct “DOMAIN\username” User-Name.

The final configuration looks like this:

if (NAS-Port-Type == Wireless-802.11) {
  # If User-Name doesn't contain our domain then add it.
  # It's needed for the Group check to use the correct
  # username.
  if (User-Name !~ /DOMAIN\\\\/i) {
    update request {
      User-Name := "DOMAIN\\\\\\\\%{User-Name}"
    }
  }

  if (Called-Station-Id =~ /.*:SSID-A/i) {
    # Can't do 'if (Group != "xxxxx")' because !=
    # operator doesn't work for group checking. Careful
    # with the number of backslashes.
    if (!(Group == "DOMAIN\\\\group A") ) {
      update reply {
        Reply-Message = "User not allowed to join this wireless network"
      }
      reject
    }
  }
  elsif (Called-Station-Id =~ /.*:SSID-B/i) {
    if (!(Group == "DOMAIN\\\\group B") ) {
      update reply {
        Reply-Message = "User not allowed to join this wireless network"
      }
      reject
    }
  }
}

I do not know about the performance impact but this should not require additional network traffic to check for group membership because winbind caches this information.

Another possible advantage is redundancy (I theorize; I am not sure about this): group membership comes from the domain controller, which is found using DNS lookups — if a controller goes down, another (hopefully) takes its place and winbindd will be able to find it with no configuration changes.

This approach seemed simple to me — no additional configuration other than manipulating the User-Name in the post-authentication phase.

Things can also be made to work if the user chooses to configure the supplicant with “DOMAIN\user” as the identity — in this case one needs to configure “with_ntdomain_hack = yes” in modules/mschap, create an empty “DOMAIN” realm in proxy.conf, and enable the ntdomain realm in the authorize section of sites-enabled/inner-tunnel. Doing this will allow both “DOMAIN\user” and just “user” to work.

I have this being used in production on a small network (less than 50 users) and have not encountered any show stoppers yet. If you read the freeradius-users thread referenced at the top of this post you will notice that modifying User-Name is discouraged. I was told to use Stripped-User-Name instead but I never had a chance to go back and look into it.  Caveat emptor.

Update May 22, 2016: I had to set up another Linux server for winbindd and ran into a couple of minor problems that took me some time to troubleshoot and fix. First, it seems like the Ubuntu 14.04 upstart script configures the wrong winbindd privileged directory — instead of /var/lib/samba/winbindd_privileged/ it configures /var/run/samba/winbindd_privileged/. The fix is to edit the upstart job configuration file (/etc/init/winbind.conf) and change it to use the correct directory path. Second, “wbinfo -a user%password” will not work when testing things (as suggested at the deployingradius.com page above) unless the user running wbinfo is a member of the group “winbindd_priv”. And third, running “radtest -t mschap paris mG2eudPas 127.0.0.1:18120 0 testing123”, also suggested at the deployingradius.com page above, will not work unless the user the FreeRADIUS daemon runs as is a member of the group “winbindd_priv” — this is also required for correct FreeRADIUS operation. These are minor issues, but they can be a hassle to troubleshoot.
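
For the record, the fixes boil down to something like the following sketch (“myuser” is a placeholder for whatever account runs wbinfo/radtest; on Ubuntu the FreeRADIUS daemon runs as “freerad”):

shell$ # Point the upstart job at the correct privileged directory
shell$ sudo sed -i 's|/var/run/samba/winbindd_privileged|/var/lib/samba/winbindd_privileged|' /etc/init/winbind.conf
shell$ # Allow the testing user and the FreeRADIUS user to reach the privileged pipe
shell$ sudo usermod -a -G winbindd_priv myuser
shell$ sudo usermod -a -G winbindd_priv freerad
shell$ sudo service winbind restart && sudo service freeradius restart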

Cisco ASA, “show service-policy”, and SNMP

Recently, a co-worker asked if it is possible to obtain the counters shown in the output from the command “show service-policy” on a Cisco ASA. I did not know the answer so I had to do a little bit of digging…

The list of MIBs supported by the Cisco ASA is documented here.

Based on a quick reading of that document, it seemed like CISCO-UNIFIED-FIREWALL-MIB could have provided this information, *if* it had been completely implemented. However, there is a documented caveat for CISCO-UNIFIED-FIREWALL-MIB at the above page:

“Limited support for objects under cuFwConnectionGrp and cuFwUrlFilterGrp.”

And an snmpwalk confirmed that the information is not there:

paris@bethlehem[1]:~$ snmpwalk -m CISCO-UNIFIED-FIREWALL-MIB -Os -v2c -c ****** 1.2.3.4 ciscoUnifiedFirewallMIB
cufwConnGlobalNumResDeclined.0 = Counter64: 0 Connections
cufwConnGlobalNumActive.0 = Gauge32: 168 Connections
cufwConnGlobalConnSetupRate1.0 = Gauge32: 2 Connections per second
cufwConnGlobalConnSetupRate5.0 = Gauge32: 0 Connections per second
cufwConnSetupRate1.udp = Gauge32: 1 Connections Per Second
cufwConnSetupRate1.tcp = Gauge32: 0 Connections Per Second
cufwConnSetupRate5.udp = Gauge32: 0 Connections Per Second
cufwConnSetupRate5.tcp = Gauge32: 0 Connections Per Second
cufwUrlfRequestsNumProcessed.0 = Counter64: 0 Requests
cufwUrlfRequestsProcRate1.0 = Gauge32: 0 Requests per second
cufwUrlfRequestsProcRate5.0 = Gauge32: 0 Requests per second
cufwUrlfRequestsNumAllowed.0 = Counter64: 0 Requests
cufwUrlfRequestsNumDenied.0 = Counter64: 0 Requests
cufwUrlfRequestsDeniedRate1.0 = Gauge32: 0 Requests per second
cufwUrlfRequestsDeniedRate5.0 = Gauge32: 0 Requests Per Second
cufwUrlfRequestsNumCacheAllowed.0 = Counter64: 0 Requests
cufwUrlfRequestsNumCacheDenied.0 = Counter64: 0 Requests
cufwUrlfRequestsNumResDropped.0 = Counter64: 0 Requests
cufwUrlfRequestsResDropRate1.0 = Gauge32: 0 Requests Per Second
cufwUrlfRequestsResDropRate5.0 = Gauge32: 0 Requests Per Second
cufwUrlfNumServerTimeouts.0 = Counter64: 0
cufwUrlfNumServerRetries.0 = Counter64: 0
paris@bethlehem[1]:~$

That does not mean that another MIB cannot provide the information we are looking for. However, the “sh snmp-server oidlist” command does not show any promising OIDs, so it seems we are out of luck.

Useful References

https://supportforums.cisco.com/document/7336/snmp-mibs-and-traps-asa-additional-information

Cisco 4500X, Flexible NetFlow, NFDUMP, and NfSen

We recently needed to gain extra visibility into network traffic going through a Cisco 4500X Layer 3 switch. We basically wanted to know things like top downloaders, top protocols, etc. For this task we chose to use Cisco Flexible NetFlow on the Cisco 4500X switch and, because we needed to be frugal (i.e. we had no money to spend on a commercial solution), we configured the Cisco 4500X to export flows to a machine running the open source NFDUMP suite and used NfSen (also open source) as the frontend.

Cisco 4500X Flexible NetFlow Configuration

This part is very well documented. The Catalyst 4500 Series Switch Software Configuration Guide for releases IOS XE 3.4.xSG and IOS 15.1(2)SGx has a chapter dedicated to configuration of Flexible NetFlow.

This blog post also has a great summary of the tasks involved.

Our specific Flexible NetFlow configuration is:

flow record NF-RECORD
 match ipv4 protocol
 match ipv4 source address
 match ipv4 destination address
 match transport source-port
 match transport destination-port
 collect interface input
 collect interface output
 collect counter bytes
 collect counter packets
!
flow exporter MYCOLLECTOR
 destination 172.16.123.123
!
flow monitor NF-MONITOR
 record NF-RECORD
 exporter MYCOLLECTOR
 cache timeout active 300
!
vlan configuration 199
 ip flow monitor NF-MONITOR input

A couple of points/caveats to keep in mind:

  1. The Cisco 4500X does not support NetFlow for egress traffic; only for ingress traffic.
  2. It is not possible to apply a NetFlow monitor to an SVI (VLAN interface). However, the monitor can be applied to an entire VLAN, as shown above.
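
With the configuration in place, a couple of standard Flexible NetFlow show commands are useful to confirm that flows are being collected and exported (output omitted):

switch# show flow monitor NF-MONITOR cache format table
switch# show flow exporter MYCOLLECTOR statistics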

NFDUMP Installation

The NetFlow collector is a machine running Ubuntu Linux. Because there is an NFDUMP package in the distribution, installing the NFDUMP suite is as simple as running “apt-get install nfdump” as root.

NfSen Installation

Unfortunately, there is no binary package for NfSen on Ubuntu, so it has to be installed from source. Fortunately, the installation process is very easy.

This blog post was helpful for getting an idea of the tasks involved. Editing the file /etc/nfsen.conf, which is used by the install.pl script to install NfSen, is one of the most important tasks. The differences between the sample nfsen.conf that is shipped in the NfSen tarball and the nfsen.conf that we ended up using are as follows:

--- /usr/src/nfsen-1.3.6p1/etc/nfsen-dist.conf    2012-01-14 05:13:53.000000000 -0500
+++ /etc/nfsen.conf    2015-03-04 11:17:21.347058193 -0500
@@ -18,7 +18,7 @@

 #
 # Required for default layout
-$BASEDIR = "/data/nfsen";
+$BASEDIR = "/opt/nfsen";

 #
 # Where to install the NfSen binaries
@@ -36,7 +36,7 @@
 # NfSen html pages directory:
 # All php scripts will be installed here.
 # URL: Entry point for nfsen: http://<webserver>/nfsen/nfsen.php
-$HTMLDIR    = "/var/www/nfsen/";
+$HTMLDIR    = "/srv/www/server.example.com/nfsen/";

 #
 # Where to install the docs
@@ -76,7 +76,7 @@

 #
 # nfdump tools path
-$PREFIX  = '/usr/local/bin';
+$PREFIX  = '/usr/bin';

 #
 # nfsend communication socket
@@ -88,12 +88,12 @@
 # This may be a different or the same uid than your web server.
 # Note: This user must be in group $WWWGROUP, otherwise nfcapd
 #       is not able to write data files!
-$USER    = "netflow";
+$USER    = "www-data";

 # user and group of the web server process
 # All netflow processing will be done with this user
-$WWWUSER  = "www";
-$WWWGROUP = "www";
+$WWWUSER  = "www-data";
+$WWWGROUP = "www-data";

 # Receive buffer size for nfcapd - see man page nfcapd(1)
 $BUFFLEN = 200000;
@@ -160,9 +160,9 @@
 # Ident strings must be 1 to 19 characters long only, containing characters [a-zA-Z0-9_].

 %sources = (
-    'upstream1'    => { 'port' => '9995', 'col' => '#0000ff', 'type' => 'netflow' },
-    'peer1'        => { 'port' => '9996', 'IP' => '172.16.17.18' },
-    'peer2'        => { 'port' => '9996', 'IP' => '172.16.17.19' },
+    'core_1'    => { 'port' => '9995', 'col' => '#0000ff', 'type' => 'netflow' },
+#    'peer1'        => { 'port' => '9996', 'IP' => '172.16.17.18' },
+#    'peer2'        => { 'port' => '9996', 'IP' => '172.16.17.19' },
 );

 #
@@ -233,7 +233,7 @@
 #
 # Alert module: email alerting:
 # Use this from address 
-$MAIL_FROM   = 'your@from.example.net';
+$MAIL_FROM   = 'root@server.example.com';

 # Use this SMTP server
 $SMTP_SERVER = 'localhost';

To keep things tidy, we chose to install everything under /opt/nfsen, which can be seen above in the setting of $BASEDIR. Also, we chose to use the www-data user and www-data group, as these are used by Apache on Ubuntu.

After /etc/nfsen.conf has been edited, the install.pl script is run to finally install everything. The output from the install script in our system was this:

user@server:/usr/src/nfsen-1.3.6p1$ sudo ./install.pl /etc/nfsen.conf
Check for required Perl modules: All modules found.
Setup NfSen:
Version: 1.3.6p1: $Id: install.pl 53 2012-01-23 16:36:02Z peter $

Perl to use: [/usr/bin/perl] 
Found /usr/bin/nfdump: Version: 1.6.8p1 $Date: 2012-11-10 12:40:54 +0100 (Sat, 10 Nov 2012) $
Setup php and html files.
mkdir /srv/www/holyland.cghs.crusaders/nfsen/

Copy NfSen dirs etc bin libexec plugins doc ...
Copy config file '/etc/nfsen.conf'

In directory: /opt/nfsen/bin ...
Update script: nfsen
Update script: nfsend
Update script: RebuildHierarchy.pl
Update script: testPlugin
In directory: /opt/nfsen/libexec ...
Update script: AbuseWhois.pm
Update script: Log.pm
Update script: Lookup.pm
Update script: NfAlert.pm
Update script: Nfcomm.pm
Update script: NfConf.pm
Update script: NfProfile.pm
Update script: NfSen.pm
Update script: NfSenRC.pm
Update script: NfSenRRD.pm
Update script: NfSenSim.pm
Update script: Nfsources.pm
Update script: Nfsync.pm
Update script: Notification.pm

Cleanup old files ...

Setup diretories:

Use UID/GID 33 33
Creating: mkdir /opt/nfsen/var
/opt/nfsen/var
Creating: mkdir /opt/nfsen/var/tmp
/opt/nfsen/var/tmp
Creating: mkdir /opt/nfsen/var/run
/opt/nfsen/var/run
Creating: mkdir /opt/nfsen/var/filters
/opt/nfsen/var/filters
Creating: mkdir /opt/nfsen/var/fmt
/opt/nfsen/var/fmt
Creating: mkdir /opt/nfsen/profiles-stat
/opt/nfsen/profiles-stat
Creating: mkdir /opt/nfsen/profiles-stat/live
/opt/nfsen/profiles-stat/live
Creating: mkdir /opt/nfsen/profiles-data
/opt/nfsen/profiles-data
Creating: mkdir /opt/nfsen/profiles-data/live
/opt/nfsen/profiles-data/live

Profile live: spool directories:
Creating: mkdir /opt/nfsen/profiles-data/live/core_1
core_1
Rename gif RRDfiles ... done.
Create profile info for profile 'live'

Rebuilding profile stats for './live'
Reconfig: No changes found!
Setup done.

* You may want to subscribe to the nfsen-discuss mailing list:
* http://lists.sourceforge.net/lists/listinfo/nfsen-discuss
* Please send bug reports back to me: phaag@sourceforge.net

Finally, we ran the following commands to allow NfSen to start when the machine boots:

ln -s /opt/nfsen/bin/nfsen /etc/init.d/nfsen
update-rc.d nfsen defaults 20
service nfsen start

And that was it. Now the NfSen console can be accessed via http://server.example.com/nfsen.
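
Flows can also be queried directly from the command line with nfdump. For example, this shows the top 10 IP addresses by bytes, reading from the live profile directory created during the installation above:

shell$ nfdump -R /opt/nfsen/profiles-data/live/core_1 -s ip/bytes -n 10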

More GNOME/Ubuntu Unity System-wide Defaults

In this post we discussed how to set up system-wide defaults using GSettings schema overrides. This works great but we recently ran into a situation where this was not possible because the schema we wanted to modify was “relocatable”. Trying to modify such a schema without specifying a DConf path results in the following error:

$ gsettings set org.compiz.opengl sync-to-vblank true
Schema 'org.compiz.opengl' is relocatable (path must be specified)

The correct way to change a relocatable schema is by appending the path, as the error message above states. For example:

$ gsettings get org.compiz.opengl:/apps/compiz-1/plugins/opengl/screen0/options/sync_to_vblank/ sync-to-vblank
true

(Note the “:/apps/compiz-1/plugins/opengl/screen0/options/sync_to_vblank/” after the schema name; this is the path to the preference.)

The problem with this approach is that, whether by design or because of a bug, it is not possible to write a schema override file that includes a DConf path. This Ubuntu bug report suggests it is indeed a bug.

Another way to accomplish a system-wide default is to go a level lower than GSettings and configure DConf directly. It is probably better to use GSettings, but in this particular case we had no choice.

Here’s what we did:

First, we created the file /etc/dconf/profile/user with the following contents:

user-db:user
system-db:system-wide

Next, we created the directory /etc/dconf/db/system-wide.d and the file /etc/dconf/db/system-wide.d/00_compiz_site_settings with the following contents:

[org/compiz/profiles/unity/plugins/opengl]
enable-x11-sync=false

We then ran the command “dconf update” (as root) which created the DConf database /etc/dconf/db/system-wide (a binary file).

This causes the “opengl” Compiz plugin preference “enable-x11-sync” to be set to “false” for all users in the system.
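
The new default can be verified with the dconf command-line tool:

shell$ dconf read /org/compiz/profiles/unity/plugins/opengl/enable-x11-sync
false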

References

This blog post from Ross Burton has a good discussion on how to set system-wide settings using GSettings: http://www.burtonini.com/blog/computers/gsettings-override-2011-07-04-15-45. It would have been a good reference to provide in our previous post, but we missed it at the time.

The dconf System Administrator Guide is a fantastic reference for understanding how to set system-wide defaults using DConf. One thing that was not clear to us after reading this document was DConf profile selection — the explanation above uses the file /etc/dconf/profile/user because if no other DConf profile is selected (via the DCONF_PROFILE environment variable), then the profile called “user” is the one that is opened.

This post by Matt Fischer was extremely useful for understanding how things work with DConf. It was from this post that we realized we needed to use a profile called “user”.

Finally, this askubuntu.com question has very good insight into the differences between DConf and GSettings.

IPv6 Automatic Configuration

The Information Technology folks at the place I work enabled IPv6 a few months ago. Things worked great for a while but I recently noticed that I was not able to reach the IPv6 Internet. A quick investigation showed that IT disabled IPv6 Stateless Address Autoconfiguration (SLAAC) and enabled DHCPv6:

router>sh ipv6 interface vlan 320 
Vlan320 is up, line protocol is up
  IPv6 is enabled, link-local address is FE80::208:E3FF:FEFF:FD90 
  No Virtual link-local address(es):
  Description: data320
  Global unicast address(es):
    20xx:xxx:xxx:xxx::1, subnet is 20xx:xxx:xxx:xxx::/64 
  Joined group address(es):
    FF02::1
    FF02::2
    FF02::A
    FF02::D
    FF02::16
    FF02::FB
    FF02::1:2
    FF02::1:FF00:1
    FF02::1:FFFF:FD90
  MTU is 1500 bytes
  ICMP error messages limited to one every 100 milliseconds
  ICMP redirects are disabled
  ICMP unreachables are disabled
  Input features: Verify Unicast Reverse-Path
  Output features: MFIB Adjacency HW Shortcut Installation
  Post_Encap features: HW shortcut
 IPv6 verify source reachable-via any
   0 verification drop(s) (process), 0 (CEF)
   9 suppressed verification drop(s) (process), 9 (CEF)
  ND DAD is enabled, number of DAD attempts: 1
  ND reachable time is 30000 milliseconds (using 30000)
  ND advertised reachable time is 0 (unspecified)
  ND advertised retransmit interval is 0 (unspecified)
  ND router advertisements are sent every 200 seconds
  ND router advertisements live for 1800 seconds
  ND advertised default router preference is Medium
  Hosts use DHCP to obtain routable addresses.
router>

Note the “Hosts use DHCP to obtain routable addresses” message — it used to be “Hosts use stateless autoconfig for addresses”.

I am using NetworkManager on Ubuntu 14.10 to manage my network configuration; the version of NetworkManager on Ubuntu 14.10 is 0.9.8.8-0ubuntu28. Among the IPv6 configuration methods NetworkManager offers are “Automatic” and “Automatic, DHCP only”.

When SLAAC was enabled, I had my network interface configured using the “Automatic” IPv6 configuration method. After IT switched to DHCPv6 this setting prevented my computer from getting an IPv6 address.

After switching to the “Automatic, DHCP only” method I was able to obtain an IPv6 address.

Unfortunately, it seems like the version of NetworkManager in Ubuntu 14.10 has a bug that prevents the installation of a default route (which is not obtained via DHCPv6 but via Neighbor Discovery Router Advertisement messages). The root cause seems to be that NetworkManager instructs the kernel to ignore Router Advertisement messages. The bug appears to be fixed in NetworkManager 0.9.10.0 and later, but I decided to just live with IPv4 at work instead of trying to backport the fix to NetworkManager 0.9.8, or trying to build NetworkManager 0.9.10.0 or later for Ubuntu 14.10.
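
One quick way to confirm that the kernel has been told to ignore Router Advertisements (“eth0” is just an example interface name) is to check the accept_ra sysctl:

shell$ cat /proc/sys/net/ipv6/conf/eth0/accept_ra
0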

Note: This blog post was helpful for me to understand what was happening: http://mor-pah.net/2012/11/06/cisco-ios-disabling-ipv6-stateless-autoconfig/.

Disable User Switching on Ubuntu 14.04

Recently, one of our home computers running Ubuntu 14.04 has been experiencing what seems to be a bug that is triggered when multiple users log in concurrently using the “fast user switching” feature. The symptom is a black screen after trying to switch back to an existing user session. Searching the Web seems to indicate that this is a common problem, but it is not clear to me where (in what component) the bug lies. There is an Ubuntu bug report that seems to match the problem, but it has been attributed to a bug in the Xorg Intel graphics driver.

In any case, my goal with this post is not to look into the bug itself but into a simple way to try to prevent it — because the bug is apparently triggered by switching users I decided to look into what it takes to disable the fast user switching feature:

In modern Ubuntu/GNOME (GNOME >= 3.0), fast user switching is a system-wide option that is controlled by a pair of GSettings options (they used to be GConf options before the GNOME project deprecated GConf in favor of GSettings):

shell$ gsettings list-recursively | grep user-switch
org.gnome.desktop.lockdown disable-user-switching false
org.gnome.desktop.screensaver user-switch-enabled true

These options are part of XML schema files stored under the /usr/share/glib-2.0/schemas/ directory:

shell$ grep user-switch /usr/share/glib-2.0/schemas/*xml
/usr/share/glib-2.0/schemas/org.gnome.desktop.lockdown.gschema.xml:    <key type="b" name="disable-user-switching">
/usr/share/glib-2.0/schemas/org.gnome.desktop.screensaver.gschema.xml:    <key type="b" name="user-switch-enabled">

To change the default, which is user switching enabled, the correct method is to override the GSettings defaults by creating .gschema.override files and re-compiling the schemas.

To override the above two GSettings options I created /usr/share/glib-2.0/schemas/org.gnome.desktop.lockdown.gschema.override and /usr/share/glib-2.0/schemas/org.gnome.desktop.screensaver.gschema.override with the following contents:

shell$ cat /usr/share/glib-2.0/schemas/org.gnome.desktop.lockdown.gschema.override
[org.gnome.desktop.lockdown]
disable-user-switching=true

and

shell$ cat /usr/share/glib-2.0/schemas/org.gnome.desktop.screensaver.gschema.override
[org.gnome.desktop.screensaver]
user-switch-enabled=false

Then the schemas need to be re-compiled:

shell$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas/

This refreshes the file /usr/share/glib-2.0/schemas/gschemas.compiled so the overridden GSettings that we configured are used. Logging out and restarting the lightdm service (sudo service lightdm restart) is required so the new configuration takes effect.
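
The overrides can then be verified with gsettings:

shell$ gsettings get org.gnome.desktop.lockdown disable-user-switching
true
shell$ gsettings get org.gnome.desktop.screensaver user-switch-enabled
false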

There is a wrinkle in all this, however — the session indicator (provided by the package indicator-session) in Ubuntu 14.04 (and probably some previous releases) does not take into consideration the above GSettings options so user switching is always available regardless of configuration. This is tracked in Ubuntu bug #1325353, which is fixed in Ubuntu 14.10. I did not want to upgrade my machine from 14.04 to 14.10 so I just downloaded the latest version of the indicator-session package (from what will become Ubuntu 15.04) and installed that (there were no package dependency problems).

I thought I would write about my experience disabling user switching because it requires overriding GSettings preferences via .gschema.override files and gschema recompiles, which is something I keep forgetting how to do.

As to whether disabling user switching fixes the black screen problem… only time will tell. The trade-off is that users cannot use the computer while someone else is logged in; instead, the logged-in user must log out and then the new user has to log in via the LightDM greeter.

Note: The following  askubuntu.com question was very useful to figure out how to change GSettings defaults:

http://askubuntu.com/questions/65900/how-can-i-change-default-settings-for-new-users

ZoneMinder Hash Logins

ZoneMinder is a fantastic Linux video camera security and surveillance solution. I have a ZoneMinder installation at home that I use to monitor a few IP-based video cameras.

My ZoneMinder installation uses the built-in authentication system, which means that only authenticated users can access the system.

One problem with using the built-in authentication system, however, is that it makes it hard to access video from the cameras from outside of ZoneMinder. For example, if I wanted to take a snapshot from a shell script of what one of the ZoneMinder cameras is currently recording, that would not be very easy to accomplish because the shell script would have to somehow log in first, establish a fake web browsing session (session cookie, etc.), and then finally request the snapshot.

Fortunately, ZoneMinder offers a way to accomplish this without too much hassle via a feature called “hash logins”, which is enabled by setting the option ZM_AUTH_HASH_LOGINS (Options->System->ZM_AUTH_HASH_LOGINS).

The way to use this is by appending a ‘&auth=<login hash>’ parameter to the ZoneMinder URL one wants to access. For example, running the following command would retrieve a snapshot (in JPEG format) from the camera with monitor ID 8:

shell$ curl 'http://www.example.com/cgi-bin/nph-zms?mode=single&monitor=8&auth=d8b45b3cf3b24407d09cbc16123f3549' -o /tmp/snapshot.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 22813  100 22813    0     0   9488      0  0:00:02  0:00:02 --:--:--  9485

The only complicated part here is knowing how to generate the hash used in the “auth” parameter. This does not seem to be documented anywhere, nor could I find any examples, but a quick read of the ZoneMinder code provided the necessary clues: The hash is calculated by the function getAuthUser() in the file includes/functions.php like this:

$time = localtime( $now );
$authKey = ZM_AUTH_HASH_SECRET.$user['Username'].$user['Password'].$remoteAddr.$time[2].$time[3].$time[4].$time[5];
$authHash = md5( $authKey );

ZM_AUTH_HASH_SECRET is a string chosen by the user via ZoneMinder options. $user['Username'] and $user['Password'] come from the ‘Users’ table in the ZoneMinder database — getAuthUser() will iterate over all existing users trying to find a match. $time contains the local time; $time[2], $time[3], $time[4], and $time[5] contain the current hour, day of the month, month (0-based), and year (as years since 1900), respectively, per PHP’s localtime() semantics. The code tries to find a match in the last two hours (in other words, a login hash remains valid for up to two hours).

A simple test program to generate the hash looks like this:

#!/usr/bin/php
<?php
//
// Needs ZM_OPT_USE_AUTH and ZM_AUTH_HASH_SECRET and ZM_AUTH_HASH_LOGINS.
// Better not to use ZM_AUTH_HASH_IPS as this will break authentication
// when the client is behind NAT, because the IP address of the client is
// used to calculate the hash if ZM_AUTH_HASH_IPS is set.
//
$time = localtime();
$authKey = 'mykey' . 'myuser' . '*0945FE11CAC14C0A4A72A01234DD00388DE250EC' . $time[2] . $time[3] . $time[4] . $time[5];
echo "\$authKey = $authKey\n";
$authHash = md5($authKey);
echo "\$authHash = $authHash\n";
?>

Note that the hashed password for the user needs to be provided to the script (the user password is passed through the MySQL PASSWORD() function to finally obtain the password hash that is stored in the ZoneMinder database).

How to make practical use of all this? A script could generate the login hash and then access some part of ZoneMinder via an HTTP request; one could use this, for example, in a script run via cron every few minutes to take snapshots to produce a time-lapse video. Most of this is left as an exercise for the reader, but a starting point is sketched below.
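
Here is a minimal shell sketch that replicates the hash calculation from the PHP snippet above and grabs a snapshot with curl. The secret, username, password hash, URL, and monitor ID are all placeholder values, and ZM_AUTH_HASH_IPS is assumed to be off (no client IP in the hash):

#!/bin/sh
# Placeholder values -- substitute your own.
SECRET='mykey'
USERNAME='myuser'
PASSHASH='*0945FE11CAC14C0A4A72A01234DD00388DE250EC'  # from the ZoneMinder Users table

# Replicate PHP localtime(): tm_hour, tm_mday, tm_mon (0-based), tm_year (years since 1900).
HOUR=$(date +%-H)
MDAY=$(date +%-d)
MON=$(( $(date +%-m) - 1 ))
YEAR=$(( $(date +%Y) - 1900 ))

AUTH=$(printf '%s' "${SECRET}${USERNAME}${PASSHASH}${HOUR}${MDAY}${MON}${YEAR}" | md5sum | cut -d' ' -f1)
curl "http://www.example.com/cgi-bin/nph-zms?mode=single&monitor=8&auth=${AUTH}" -o /tmp/snapshot.jpg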

Slow SSH logins/spun-down disk woken during SSH logins

For a while I had been troubleshooting why SSH logins into my Ubuntu server running 14.04 LTS were seemingly slow. I enabled SSH debugs (LogLevel set to DEBUG in /etc/ssh/{sshd_config,ssh_config}) on both the client and the server and did not find anything that pointed to an issue with the SSH negotiation/login itself. Recently I discovered that the 10-second delay had to do with SSH logins waking a hard disk that I keep spun down; spinning the disk up takes about 10 seconds, which explains the delay.

Turns out that by default, console and SSH logins cause the scripts in /etc/update-motd.d/ to be executed. The script /etc/update-motd.d/98-fsck-at-reboot in particular is what causes disks to be spun up.

The scripts in /etc/update-motd.d/ are responsible for generating the file /run/motd.dynamic, which looks like this:

Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-37-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Thu Oct  9 10:05:21 EDT 2014

  System load:  0.37                Processes:           164
  Usage of /:   18.1% of 915.51GB   Users logged in:     1
  Memory usage: 55%                 IP address for eth0: 10.10.13.10
  Swap usage:   12%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

That is pretty and provides some good information, but I would rather have a shell prompt immediately after I hit the <Enter> key after typing “ssh <my server>”. I can then manually execute some useful commands like “w” to see who is logged in, the uptime, etc.

I have disabled execution of the /etc/update-motd.d/ scripts by tweaking the files /etc/pam.d/{login,sshd} in the following way:

# Prints the message of the day upon successful login.
# (Replaces the `MOTD_FILE' option in login.defs)
# This includes a dynamically generated part from /run/motd.dynamic
# and a static (admin-editable) part from /etc/motd.
#session    optional   pam_motd.so  motd=/run/motd.dynamic noupdate
#session    optional   pam_motd.so

# Disable display of /run/motd.dynamic and add "noupdate" so
# scripts in /etc/update-motd.d/* are not called.
# peloy@chapus.net
# 20141009
session    optional   pam_motd.so noupdate

The first two commented out “session” lines are what were there by default. I commented them out and then added a new “session” line that has the “noupdate” keyword, which is actually what causes the pam_motd.so module to not execute the scripts in /etc/update-motd.d/. This uncommented “session” line is what will display the standard /etc/motd file (if it exists).

Now an SSH login is fast, does not print a lot of information upon login (which I like), and, most importantly, spun-down disks are not spun up upon login:

$ ssh altamira
You have new mail.
Last login: Thu Oct  9 10:23:44 2014 from 2003:450:e6a5:100:a412:6z7a:897d:1387
altamira$ # That was a very fast login!

The following askubuntu.com question is what pointed me in the right direction:

http://askubuntu.com/questions/282245/ssh-login-wakes-spun-down-storage-drives

This is an issue that seems to have been introduced when I upgraded this server from Ubuntu 12.04 LTS to 14.04 LTS. In particular, my Ubuntu 12.04 LTS installation did not have the file /etc/update-motd.d/98-fsck-at-reboot.