Tuesday 8 September 2015

Password Security

A lot of fuss has been made recently about the hackers who released all of the private information held by the company Ashley Madison, which provides a supposed intermediary service for would-be adulterers.  Private information pertaining to millions upon millions of users has been released into the public domain.

What does that mean, though?  If you are one of Ashley Madison’s customers you may be desperately wondering who knows what your name is, where you live, your credit card information - even, perhaps, intimate details of your sexual preferences - what are the prospects for that information also making its way into the public eye?  I thought I would talk a little bit about encryption, how it works … and how to crack it.

Password-based encryption systems work broadly on the basis of either encryptions or hashes.  The difference between the two is that encryption systems are two way functions, whereas hash systems are one-way functions.  With an encryption-based system for every output value there is only one possible input value that could give rise to it, whereas with a hash system every output value has an unmanageably large number of possible input values that could give rise to it.  A trivial, but quite inadequate, example of a hash system might be a function which returns 1 if the input is odd and 0 if it is even.  Clearly, given an output value of 1 or 0 there is no way to determine what the input value was, beyond narrowing it down to an odd or even number.

With an encryption system the designer is basically asserting that there is no known way to reverse the encryption.  The risk is that if such a reversal algorithm is ever discovered it instantly unlocks every password ever encrypted using that system.  By contrast, with a hash system the designer is asserting that reversal is fundamentally not possible.  But in this case the risk is that you can unlock the password by stumbling upon one of a colossal number of possible input values (of which the true password is just one).  Both methods have their advantages, and are widely used.  But note that, in most common parlance, the output of a password security algorithm - whether an encryption or a hashing function - is usually referred to as a ‘hash’.

Typically, in a security application, the input value is a password.  What we have in principle is a system where I can publish both the encryption/hash algorithm and the hash value itself, and nobody can use that information to work out what the password was which created that hash value.  Anybody can make a guess at the password and pass it through the hash algorithm, but unless the guess is correct the result will not match the hash value.  The key advantage is that the open-text password itself is never stored anywhere.  Just about every password-based authentication system today is based upon these principles.

So how do hackers set about cracking those passwords?  The reality is that there is only one way of doing it.  What you do is make a guess at the password, pass it through the hash algorithm, and compare the result to the open-text hash value.  If you guessed right, you’ve cracked it.  If not, you try again with a different guess at the password.  Simple, really.  How long do you think it would take before they guessed right?

What serious ‘professional’ hackers do is to compile lists of ‘known’ passwords.  These are combinations of known names, known places, common numbers (such as dates) and other common candidates.  Also, every time a hacker cracks or otherwise obtains a password, they will add it to a list of known passwords, where it can be used to crack similar passwords the owner may be using elsewhere.  Such lists are maintained in various dark corners of the internet.  The list might, for example, contain the word
FRED’.  People accessing these lists will use that to automatically derive variants such as fred, “fReD”, “FR3D”, “fred69”, and “DERF”.  Hackers using specially-configured computers will burn through these lists, passing each and every possible password through the hash algorithm until they find a match, at which point all of your personal information protected by that password is immediately at their fingertips.  However, such lists usually contain over 10 million passwords.

While this analysis may or may not give you pause for thought, it is based on a simple inconvenient truth.  How quickly it takes the attack to crack your password depends on how ‘secure’ your password is.  An ‘unsecure’ password will appear near the top of the list as one of the best guesses available.  It will be cracked pretty quickly.  A less secure password won’t get tried until later in the process and will take a lot longer to crack.  But a highly secure password is one that won’t actually get tried at all for the simple reason that it won’t even be on the list.  It won’t be cracked at all.

The fact is, the vast majority of password choices used by ordinary people today are not secure enough to avoid appearing on one of those nefarious lists.  If your password doesn’t appear on a list, or cannot be derived from a root that appears on a list, then it is never going to be cracked no matter whose computer is doing the cracking.  Bear this in mind next time you enter a password somewhere!  The problem is that people don’t like properly secure passwords because they are almost impossible to remember.  But there’s no simple way around that.  Here is an example of a highly secure 48-character password (the degree of security increases exponentially with the number and variety of characters used):


2ys5Ts46\Edtwe(Ddpbk8/!6taa3Zs<UdEAZmPtMe"nkrrRF

If this is what you had to type in to access your Ashley Madison account, you can console yourself that it was never going to be cracked.  Mind you, it might take you all day, and several frustrating attempts, just to enter it successfully each time you log in!  Such is the price to be paid for a comfortable level of security.

One of the most commonly used hash algorithms is SHA-2, designed by the NSA.  Less secure is MD5 which is often used to verify the accuracy of a CD rip (which is not a security application).  More secure would be something seriously potent like bcrypt, scrypt, or PBKDF2.  As a reasonable approximation, what makes an algorithm more ‘secure’ is largely down to the amount of computer time taken to calculate the hash value, as this also increases the time taken by a hacker to churn through a list of passwords.  Additionally, algorithms like scrypt work by requiring the computer to have a very substantial amount of RAM, making the required hardware much more expensive, while at the same time compromising a hacker’s ability to configure an attacking computer to perform multiple operations in parallel.  Despite this, the scrypt algorithm, for example, is deceptively simple, amounting to only a few tens of lines of code.

Many experts in computer and internet security are suspicious of NSA-designed algorithms such as SHA-1 and SHA-2, due to concerns that NSA may potentially only approve encryption systems that they are reasonably confident they can crack.  Such considerations are highly speculative.  However, a more recent development, SHA-3, was approved following a competition among non-NSA designers.

Even though what I have described sounds virtually uncrackable you would be surprised at how clever people are when it comes to mathematically analyzing encryption systems and devising increasingly more efficient ways of attacking them.  One such attack is known as a “rainbow table”.  It uses the formidable data storage capacity of modern computer and network technology to store vast lists of what you might term ‘intermediate results’.  By referring to those rainbow tables, hackers can seriously short-circuit the time taken to crack a password.  To get around that, most encryption and hash algorithms using a technique called “salting”, which adds various random characters to the user’s password before encrypting it.  These random characters, called the “salt”, can be stored in plaintext alongside the hash result without compromising the integrity of the encryption.  Salting has the effect of disrupting a rainbow table and rendering it useless.

Back to Ashley Madison.  It turns out that their password encryption system uses a 10-round bcrypt, which is an encryption-based algorithm.  A computer security expert recently analyzed the released Ashley Madison password hashes using a computer optimized for cracking.  He had a list of 14.3 million clear-text passwords.  He loaded the first 6 million Ashley Madison password hashes into his program to attack them.  From his first 6 days of intense cracking he estimated that it would take him 19,000 years to crack all 6 million passwords, and 117,000 years to crack the whole database.  On that basis you would rate Ashley Madison’s password encryption system as very good indeed - relatively few web sites implement anything as powerful as a 10-round bcrypt.  As a sobering thought, the researcher estimated that if Ashley Madison had been using plain unsalted MD5, the passwords of their entire user list could be cracked in a mere 3 seconds!  Note that this is basically an individual with a highly-sophisticated but still relatively ordinary commercial computer.  A government, with a room full of super-computers dedicated to the task, would probably make mincemeat of those numbers.

But the problem with the Ashley Madison episode is not that people’s passwords were cracked.  They weren’t.  The hackers actually hacked directly into the company’s database, which gave them direct access to the user data without needing to know the password in the first place.  And in this case, it would appear that at least part of the database itself was not encrypted, something that is a requirement of the US government’s PCI and HIPAA compliance standards (for financial and medical data respectively).  There may be a case to make that this was an ill-considered and avoidable oversight given the thought that went into their implementation of a password encryption system.

It is important to understand that the Ashley Madison hack does not mean that the clients’ passwords were compromised.  They were in the sense that the leaked password hash values gives hackers everywhere useful information which they could use to attempt to discover what the passwords were.  But as we have seen, the task of churning that data into passwords is a formidable one.  However, for anyone who cares to try, the first passwords to be cracked will be the least secure ones.  The computer security expert I mentioned earlier reported cracking his first 4,096 passwords in only 6 days, and you would imagine that few (if any) of those fell outside of the ‘unsecure’ category.  The trouble is you would imagine that the sort of person who uses a low-security password is also the sort of person that re-uses that same password - either identically or with minor variations - for most of their other secure accounts.  Given that the Ashley Madison database exposes the users’ e-mail addresses in open-text, this gives hackers a significant opportunity to target those individuals and go after their banking and other financial data.  This, ultimately, is the probably biggest threat posed by the Ashley Madison hack.

So, in conclusion, it is probably a good time to ask yourself an important question.  Given what you’ve just read, just how secure do you think your own passwords are?