Last night I wrote an email advising a client about the Heartbleed vulnerability and its impact on operations. I’m posting that here, as it’s a little more balanced and professional (and better written) than the blog entry I wrote last night. Of course I’ve omitted names to protect my clients’ privacy. (And I asked permission before publishing this.)

 

On April 7, 2014, the OpenSSL project released a security advisory for a vulnerability called CVE-2014-0160 (aka “Heartbleed”).

This is a newly-announced vulnerability. The following is based on our reading of the security advisory and related information. Further information on this topic is becoming available rapidly, and we will update you as we get new facts and as our understanding of the facts changes.

OpenSSL is the software library that provides secure (https://) connections for a very large share of the web's servers. Our project uses OpenSSL to secure the public's connections to our web servers.

The vulnerability allows an attacker to send a specially-crafted packet to OpenSSL and receive a dump of up to 64KB of the server's memory. The attacker cannot choose which 64KB they get, but they can send as many of these packets as they like, each one leaking a different chunk, so in effect an attacker can read large portions of the server's memory (RAM) over time.

The vulnerable versions of OpenSSL (the 1.0.1 series up through 1.0.1f, plus the 1.0.2 betas) date back to March 2012, and every release between then and today is vulnerable to this attack. These versions are very widely distributed, including on Ubuntu 12.04 Server, the operating system we use for our servers.

Because this is an attack that a very large number of servers on the Internet are vulnerable to, it is likely that this attack will be quickly “weaponized” so that unskilled attackers can use it in an automated fashion. At present, only highly-skilled attackers can use it.

We have verified using http://filippo.io/Heartbleed/ that our project is vulnerable to this attack.

All of this lends a great deal of urgency to the update and remediation process:

  1. The very first step is to update the version of OpenSSL on our servers to the latest version, released today. This can be done quickly and easily, and should be done as soon as possible. It is extremely likely that there will be no service interruption resulting from this update, other than a brief one on each server as it is updated.
  2. Once the servers have been updated, they are no longer vulnerable to this attack. However, attackers could be in possession of private information that they copied off the server before OpenSSL was updated. (It appears that security researchers have been aware of this issue since December 2013, but attackers could have been using this attack any time from March 2012 to the present.)
  3. The following information should definitely be revoked and reissued:
  • SSL Certificates
  • VPN Certificates (VPN also uses SSL)

Until these certificates are revoked and reissued, we have to assume that attackers can eavesdrop on and modify our customers' HTTPS/SSL traffic to our website, as well as our VPN traffic.

Any secret that has been in RAM on the server, even briefly, should be revoked and reissued. Our developers will have to study the issue to determine what needs to be revoked. However, a partial list of things that may have been in memory and may have been compromised includes:

  • End-user usernames and passwords
  • API keys
  • Passwords and authentication certificates to internal and external services, including our databases and monitoring services.

We are not aware at this time of any Intrusion Detection System or network monitoring product that can detect this attack. The attack does not show up in OpenSSL or system logs.

 

Introduction

I’m trying not to be alarmist, but it seems we’ve got a “holy fucking shit” security bug that everyone who runs a server needs to be aware of. And that everyone who uses the Internet needs to be aware of. Note that this was written on April 7, 2014. Please check to make sure this information is still up-to-date if you’re reading this much after that.

The information in this blog post focuses on what to do if you're running Ubuntu 12.04 LTS (Precise). (Because that's what the servers I administer run. I haven't had time to research and write a general-purpose blog post.)

A good writeup is at heartbleed.com. A (very overwhelmed and slow) testing site is at filippo.io/Heartbleed.

TL;DR Me

If you’re running Ubuntu 12.04 servers, run this command:

openssl version -a

If you see a "built on" date of "Mon Apr 7 20:33:29 UTC 2014" or later, you have a fixed version. If you don't, you still have the unfixed version.
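
A quick way to pull out just the relevant line, if you don't want to read the whole output (this is plain grep over the standard openssl output, nothing clever):

openssl version -a | grep -i 'built on'
# On a patched box you should see: built on: Mon Apr  7 20:33:29 UTC 2014 (or later)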

If you don’t see the patched version, you need to upgrade immediately.

Whether or not you had the updated version, you next need to revoke and reissue your SSL keys and certificates. And then you need to revoke and reissue any secret data that could have been in RAM on the server, because it could have been compromised.

Also, change your passwords on every sensitive on-line service you use.

 

What Does This Vulnerability Let Attackers Do?

They can read anything from the RAM on the attacked system. Like private SSL keys. Or anything else that might be in RAM.

Yeah. Holy Fucking Shit.

Here’s the writeup from heartbleed.com:

What is being leaked?

Encryption is used to protect secrets that may harm your privacy or security if they leak. In order to coordinate recovery from this bug we have classified the compromised secrets to four categories: 1) primary key material, 2) secondary key material and 3) protected content and 4) collateral.

What is leaked primary key material and how to recover?

These are the crown jewels, the encryption keys themselves. Leaked secret keys allows the attacker to decrypt any past and future traffic to the protected services and to impersonate the service at will. Any protection given by the encryption and the signatures in the X.509 certificates can be bypassed. Recovery from this leak requires patching the vulnerability, revocation of the compromised keys and reissuing and redistributing new keys. Even doing all this will still leave any traffic intercepted by the attacker in the past still vulnerable to decryption. All this has to be done by the owners of the services.

What is leaked secondary key material and how to recover?

These are for example the user credentials (user names and passwords) used in the vulnerable services. Recovery from this leaks requires owners of the service first to restore trust to the service according to steps described above. After this users can start changing their passwords and possible encryption keys according to the instructions from the owners of the services that have been compromised. All session keys and session cookies should be invalidated and considered compromised.

What is leaked protected content and how to recover?

This is the actual content handled by the vulnerable services. It may be personal or financial details, private communication such as emails or instant messages, documents or anything seen worth protecting by encryption. Only owners of the services will be able to estimate the likelihood what has been leaked and they should notify their users accordingly. Most important thing is to restore trust to the primary and secondary key material as described above. Only this enables safe use of the compromised services in the future.

How to Upgrade & Verify You Have Upgraded

To verify whether or not you have the problem, do:

openssl version -a

If you see a "built on" date of "Mon Apr 7 20:33:29 UTC 2014" or later, then you have the upgraded version. If not, you need to do the following steps:

apt-get update
apt-get install -y openssl libssl1.0.0
openssl version -a 
# (You should now see the April 7 date.)
lsof -n | grep ssl | grep DEL
# If you see results here, you have processes still using the old (now-deleted) OpenSSL; those processes need to be restarted.
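
Exactly which processes need restarting depends on what the box runs; here's a rough sketch, and the service names below are just examples, not a canonical list:

# Restart anything lsof flagged as still holding the deleted library.
# nginx and openvpn are example names; substitute whatever lsof showed you.
sudo service nginx restart
sudo service openvpn restart
# If in doubt, a reboot guarantees nothing is still linked against the old OpenSSL.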

What Else Do You Have to Do Besides Update?

Once you've patched your servers, you should definitely generate new SSL keys and CSRs, and then put new SSL certificates in place. This must be done AFTER you patch.
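
As a rough sketch of that re-keying step (the filenames and subject below are placeholders, not anything from a real setup):

# Generate a brand-new private key and CSR; never reuse the old key.
openssl req -new -newkey rsa:2048 -nodes \
    -keyout example.com.key -out example.com.csr \
    -subj "/C=US/ST=California/O=Example Inc/CN=example.com"
# Send example.com.csr to your CA for a new certificate, install it,
# and revoke the certificate that was issued for the old (compromised) key.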

After that, you’ve got to go through the soul-searching and unpleasant process of figuring out what other passwords / certificates / secret information / whatever could have been in RAM on your server, and reissue/revoke those. Can’t help you here, as it’s going to be different depending on what you’re doing with your server.

For instance, it will be fairly common for services to need to revoke all user passwords and have the users create new passwords.

For End Users (Not Server Admins)

For you personally, do you use the same password on more than one site? Change it immediately so it’s different on each site. Password storage utilities such as 1Password or LastPass will be helpful here.

Once services have patched their OpenSSL implementation, you’ll want to change your password on EVERY SERVICE. Yeah, I know. That sucks. But that’s how big a problem this is.

You can check whether or not your bank (for instance) has fixed this by using the filippo.io/Heartbleed tool I mentioned above.

I’m doing some work on demo projects for AT&T’s M2X.* Prior to this there was example code, but there wasn’t a quick and easy way of getting a working example going.

I've now created two new demos, with two hosting options: one that runs locally under Vagrant, and one that runs in the OpenShift cloud.

The Vagrant demo, of course, is free to host, as it runs on your personal computer. The OpenShift demo is free to use as long as you have fewer than three OpenShift applications already running. (There’s a limit of three free applications.)

Both demo repos contain loadreport.rb and stockreport.py. Loadreport reports the system load to AT&T’s M2X system every minute. Stockreport reports the current value of AT&T’s stock to M2X every minute. Both create their M2X Blueprints automatically.

The script stockreport.py uses the M2X Python package. The loadreport.rb script uses the Ruby M2X gem. This may be the first working Ruby code for creating M2X blueprints that’s been published.
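
If you're curious what the load-gathering half of loadreport looks like conceptually, here's a rough shell sketch of the idea (the actual demo does its reporting through the M2X Ruby gem, which isn't shown here):

# Read the 1-minute load average once a minute; the demo pushes each value to M2X.
while true; do
    LOAD=$(cut -d ' ' -f 1 /proc/loadavg)   # Linux; the Vagrant demo runs Ubuntu
    echo "$(date -u) load=$LOAD"
    sleep 60
done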

Because M2X is currently free for developers, it made sense to provide free hosting options for experimenting with it as well. With these two solutions, you can have a free M2X testing environment, either on your own computer or in the OpenShift cloud.

There are more demo languages and more demo environments coming, and I’ll be announcing those here as they’re published.

* I don’t speak for AT&T or Citrusbyte.

If you're using Vagrant and you've upgraded to the latest version of VirtualBox (4.3.10 as of March 25, 2014), and you've got vagrant-vbguest installed (as you should), then you've undoubtedly run into a nasty bug:

default: /vagrant => /YOUR_VAGRANT_DIRECTORY
Failed to mount folders in Linux guest. This is usually because
the "vboxsf" file system is not available. Please verify that
the guest additions are properly installed in the guest and
can work properly. The command attempted was:
mount -t vboxsf -o uid=`id -u vagrant`,gid=`getent group vagrant | cut -d: -f3` /vagrant /vagrant
mount -t vboxsf -o uid=`id -u vagrant`,gid=`id -g vagrant` /vagrant /vagrant

This turns out to be a bug (ticket 12879) in the latest version of the VirtualBox Guest Additions.

The workaround, per that bug ticket, is to issue this command after SSHing into your virtual machine:

sudo ln -s /opt/VBoxGuestAdditions-4.3.10/lib/VBoxGuestAdditions \
/usr/lib/VBoxGuestAdditions

That does work. However, I wanted to make this a little more automated, as at the moment I’m creating Virtual Machines as examples, and I want it to be as seamless as possible. I made my Vagrantfile look like this:

# -*- mode: ruby -*-
# vi: set ft=ruby :
 
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
 
# Provide fix for Bug 12879 in VirtualBox: https://www.virtualbox.org/ticket/12879
$fix12879 = <<SCRIPT
if [ -e /opt/VBoxGuestAdditions-4.3.10/lib/VBoxGuestAdditions -a ! -h /usr/lib/VBoxGuestAdditions ]; then
    # If we're on version 4.3.10 of Guest Additions AND we haven't created the symlink:
    ln -s /opt/VBoxGuestAdditions-4.3.10/lib/VBoxGuestAdditions /usr/lib/VBoxGuestAdditions
    echo "Working around bug 12879 in VirtualBox. Next do a vagrant reload --provision'"
    exit 1
fi
SCRIPT
 
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "precise64"
  config.vm.box_url = "http://files.vagrantup.com/precise64.box"
  config.vm.provision "shell", inline: $fix12879
  config.vm.provision "shell", path: "bootstrap.bash"
 
end

(I’ve had formatting issues with the above code that I haven’t been able to fix. If you’re going to cut-and-paste, better to do it from this Gist.)

The steps to take, then, are:

vagrant up
# See the error about vboxsf
vagrant provision
# See the message about running "vagrant reload --provision"
vagrant reload --provision

I’d love to get this even more automated — as one or two steps instead of three. If you have any ideas for how to do that, please get in touch with me.

I do a lot of work with virtual machines (mainly Vagrant / VirtualBox instances for my DevOps work) and speed is a huge issue for me.

When I started at my new job I requested the fastest computer I could find — a new iMac with an SSD drive — and it’s still not fast enough when you’re iterating on virtual machines. So I went old-school, and created a ramdisk.

I changed my VirtualBox settings to create new virtual machines on the ramdisk, and my BASH init scripts are set up to create the ramdisk when they run.

The idea is that my virtual machine files will be stored in RAM on the ramdisk, and will be much faster to access and change. Of course, the downside is that when you reboot, you lose the ramdisk, and all its contents. Because I’m on an iMac and don’t reboot too often, this nets out to be a win for me.

Setting up VirtualBox was trivial: in VirtualBox's preferences, I just pointed the default machine folder at the ramdisk.
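
That same preference can also be set from the command line; a one-liner sketch, assuming VBoxManage is on your PATH:

# Point VirtualBox's default machine folder at the ramdisk.
VBoxManage setproperty machinefolder /Volumes/RAMDISK

Setting up the script to create the ramdisk was a little more complicated.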

First of all, here’s the command:

        diskutil erasevolume HFS+ "RAMDISK" `hdiutil attach -nomount ram://$SIZE_IN_SECTORS`

That will create a volume called RAMDISK and mount it. I called this from my ~/.bashrc file. However, I ran into several minor problems, and my example might save you some time:

First, this runs on creation of every terminal window, and you only want one ramdisk to exist, so I added logic to test for the existence of the ramdisk before creating it.

Then I ran into another problem — when you close the Terminal with multiple windows open, it will open multiple windows when it starts again — and because things are so fast, each window will check for the existence of the ramdisk, see that it’s not there, and create it. You end up with a dozen ramdisks.

So now my code checks for the existence of the ramdisk. If it's not there, it sleeps a random amount of time, then checks again, and creates the ramdisk if needed. I can open two dozen terminal windows at once with no ramdisk in place, and it will only create one. (It's possible that a collision could make it generate two or more ramdisks, but I re-open Terminal with no ramdisk mounted so seldom that it has never happened to me, and hasn't been worth engineering against.)

Here’s the ramdisk stanza I ended up with:

# Set up RAM disk for VirtualBox images
SIZE_IN_GB=14
SIZE_IN_SECTORS=`echo "$SIZE_IN_GB*1024^3/512" | bc`
if [ -d /Volumes/RAMDISK ]; then
    echo "RAMDISK for VirtualBox images already exists. Doing nothing."
else
    echo "Ramdisk does not exist."
    SLEEPTIME=$(($RANDOM % 30 + 1))
    # Sleep before attempting to create the ramdisk, to avoid creating multiple
    # ramdisks when opening multiple windows simultaneously.
    echo "Sleeping $SLEEPTIME"
    sleep $SLEEPTIME
    if [ -d /Volumes/RAMDISK ]; then
        echo "Another window created the ramdisk. Doing nothing..."
    else
        echo "Creating ramdisk."
        diskutil erasevolume HFS+ "RAMDISK" `hdiutil attach -nomount ram://$SIZE_IN_SECTORS`
    fi
fi

 

Here it is on GitHub, easily clonable.

The final issue is what happens when you reboot and your virtual machines disappear. VirtualBox will report that the machines are “inaccessible.” They certainly are! Here’s the fix:

VBoxManage list vms

VBoxManage unregistervm {774a761b-9cf3-45cc-a514-aeec198ea3d0}

Of course, use the guid displayed by “list vms” on your system, not the guid I used.
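
If you end up with several of these, the cleanup can be scripted. A rough sketch, assuming inaccessible VMs show up in the "list vms" output under the literal name "<inaccessible>":

# Unregister every VM that VirtualBox lists as "<inaccessible>".
VBoxManage list vms | grep '"<inaccessible>"' | grep -oE '\{[0-9a-f-]+\}' |
while read -r uuid; do
    VBoxManage unregistervm "$uuid"
done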

So far (about three weeks in) this has been working well for me and really speeding things up.


Update 2014-03-24:

I created a quick Python script to remove my inaccessible VMs. You can find it at https://github.com/johnmarkschofield/nuke-virtualbox-inaccessible

Working at Citrusbyte

October 19, 2013

I’m incredibly excited to be working at Citrusbyte. I’m not sure how much I can say about how they work, but I know I can say this: They do their best to work according to the video and blog posts on this “How Github Works” page.

I remember reading those posts about how “work” works at Github — how they don’t measure by hours worked, and they optimize for programmer happiness. And I remember thinking that I wouldn’t get that good a working environment anywhere I worked.

I was wrong. Citrusbyte isn’t perfect (they have to have some compromises to fit how their clients work) but in general, they fit that Github model pretty closely.

I’m feeling incredibly excited and lucky to be there.

I had my favorite kind of consulting gig recently — a friend who runs a consulting company called, and said he had a problem his guys couldn’t figure out. I love being tech support for other smart geeks!

It’s pretty hard to find a solution to this on Google, so I figured I’d write it down for anyone else in this situation.

A client of my friend’s had an OS X file server with very complicated ACLs (Access Control Lists) on each file and directory.

They had created ACL entries for many individual users, and then had removed those users from the system as they left the company, got transferred, etc. You can list ACLs with “ls -le”. Here’s a sample of a broken one:

$ ls -le
-rw-r-----+ 1 schof staff 0 Aug 27 14:27 testfile
0: ADFB6DAF-96D1-4F78-8A16-4CBF465EC283 allow read

You can see the user name has been replaced with a GUID.

Normally to remove an ACL entry you would use “chmod -a”. That doesn’t work when the username has been replaced with a GUID:

$ chmod -a "ADFB6DAF-96D1-4F78-8A16-4CBF465EC283 allow read" testfile
chmod: Unable to translate 'ADFB6DAF-96D1-4F78-8A16-4CBF465EC283' to a UID/GID

My friend was stuck. I figured out that you could use the =a# option to chmod to change a particular ACL line. You can use this to change the GUID line to a valid ACL line, and then use “chmod -a” to remove the fixed line:

$ ls -le
total 0
-rw-r-----+ 1 schof  staff  0 Aug 27 14:27 testfile
0: ADFB6DAF-96D1-4F78-8A16-4CBF465EC283 allow read

$ chmod =a# 0 "schof allow read" testfile

$ ls -le
total 0
-rw-r-----+ 1 schof  staff  0 Aug 27 14:27 testfile
0: user:schof allow read

$ chmod -a "schof allow read" testfile

$ ls -le
total 0
-rw-r-----  1 schof  staff  0 Aug 27 14:27 testfile

That takes care of the problem nicely.

Of course, my friend’s client had tens of thousands of files that had this problem, so fixing it manually was not an option. I wrote a small script that checked for GUID ACL entries, and then removed them using this method. Worked like a charm!
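
Here's a sketch of what such a script can look like in shell (not the exact script I used; the path below is a placeholder, and it borrows the built-in "nobody" account as a throwaway principal so each GUID entry can be rewritten and then removed exactly as above):

#!/usr/bin/env bash
# Sketch: strip ACL entries whose principal is an orphaned GUID.
GUID_RE='[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}'

strip_orphaned_acls() {
    local f="$1"
    # Loop, because ACL indexes shift each time an entry is removed.
    while entry=$(ls -led "$f" | grep -E -m1 "^ *[0-9]+: $GUID_RE "); do
        idx=$(echo "$entry" | sed -E 's/^ *([0-9]+):.*/\1/')
        rest=$(echo "$entry" | sed -E "s/^ *[0-9]+: $GUID_RE //")   # e.g. "allow read"
        chmod =a# "$idx" "nobody $rest" "$f" || break   # rewrite the GUID entry to a real user...
        chmod -a "nobody $rest" "$f"                    # ...then delete it the normal way
    done
}

# Placeholder path; run with enough privileges to change ACLs on these files.
find /path/to/fileserver -print0 | while IFS= read -r -d '' f; do
    strip_orphaned_acls "$f"
done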