When you put your credit card into a website, what happens to it? The goal of this article is to explore some of the possible answers to that question.
With all the changes that are happening in the payment card industry these days, I’ve been thinking about the security around it. EMV/Chip and PIN is coming, and there are weird things happening around NFC/ApplePay/Google Wallet/Tap to Pay. There have also been a lot of breaches in the last year that are really helping expose the weaknesses in how this data is stored and transmitted. This post is really more of a thought experiment about how you store hashed information “securely”.
I’m new here, what is PCI-DSS?
PCI-DSS is basically a standard for how payment card information is supposed to be secured. Unlike HIPAA, which has the force of law behind it, or things like ISO 27001, which are openly discussed standards that aim to be best practice but are optional, PCI-DSS is an agreement among the parties that handle card data and is enforced under contract law. The standard tells you what you should do, sometimes explicitly and sometimes in technology-agnostic, general terms. There are fines for non-compliance, but there is also a bunch of risk management stuff that lets a card processor, for example, assume the risk until a retailer can come into compliance. I’m not a PCI-DSS expert, so take what I say with a grain of salt. I’d love to hear from actual financial industry nerds about what it is like in the trenches and about what actually tends to happen.
What Does PCI-DSS Say About the Storage of Credit Cards?
I looked online for what the PCI Security Standards Council and others had to say about how this works, and I found a document from 2008, written by the PCI SSC, that seemed to have some pretty decent information in it. It doesn’t get into specific technology recommendations; I guess technology moves too quickly. It does talk about do’s and don’ts for how you should treat cardholder data (e.g. name, card number, expiration date).
At a minimum, PCI DSS requires PAN to be rendered unreadable anywhere it is stored – including portable digital media, backup media, and in logs. Software solutions for this requirement may include one of the following:
• One-way hash functions based on strong cryptography – also called hashed index, which displays only index data that point to records in the database where sensitive data actually reside.
• Truncation – removing a data segment, such as showing only the last four digits.
• Index tokens and securely stored pads – encryption algorithm that combines sensitive plain text data with a random key or “pad” that works only once.
• Strong cryptography – with associated key management processes and procedures. Refer to the PCI DSS and PA-DSS Glossary of Terms, Abbreviations and Acronyms for the definition of “strong cryptography.”
By the way, PAN is defined by PCI as “Primary Account Number is the payment card number (credit or debit) that identifies the issuer and the particular cardholder account. Also called Account Number”. So this talks a little bit about encryption, which helps secure your information, but while it may be seen as a silver bullet by some, the data needs to be decrypted at some point to be useful. For example, simply encrypting the hard drives on which the credit card numbers are stored can help with physical security, but in no way helps against hackers who get into the running system. The whole encryption aspect of PCI-DSS is even more complicated, and is outside the scope of this article. Truncating the number seems like a simple solution to the secure storage problem as well, but I can imagine that it is easy for business types to come up with objections to that. The token idea is awesome, and it is how some of the most modern implementations work, but it is really hard to implement, both technically and practically, because all parties have to deal with the tokens properly.
The main focus of this article is that section on hash functions. What exactly is a “one-way hash function based on strong cryptography”? If you asked the average security person that question, even if you mentioned that it was in the context of PCI compliance, I am betting they would say something like “SHA-1 or SHA-256 are decent one-way hash functions based on strong cryptography”. The goal of this article is to question that notion.
One Way Hashing Functions You Say?
So if your credit card can’t be stored in a truncated fashion, I’m going to guess it is probably going to be stored as a hash. A hash function is a deterministic, predictable one-way function: easy to go one way, really hard to go the other. Think of it like burning a piece of paper. The atoms in the paper haven’t been destroyed, but the process is not ever going to be reversed. Some example uses of hashes are:
- Your password on your computer is stored as a hash.
- When you download a file, it can be hashed to verify that it arrived all in one piece.
- In secure messaging, hashes can be used to ensure integrity and/or as part of digital signatures.
- Hashes are used as part of proof of work in cryptocurrencies.
There are a bunch of different types of hashing algorithms. Message Digest 5 (MD5) was popular for a long time. There are several iterations of the Secure Hash Algorithm in use today (SHA-1, and the SHA-2 family: SHA-256, SHA-384, SHA-512). There are fundamental differences between the way these all work, but the usual rule of thumb is that the longer the hash, the harder it is to figure out what the original data was.
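To make “length of the hash” concrete, here is a small Python sketch that prints the output size of each algorithm mentioned above. It assumes the standard library `hashlib` module and that your Python build has all five algorithms enabled (some locked-down builds disable MD5).

```python
import hashlib

# The algorithms named in the paragraph above, by their hashlib names.
ALGORITHMS = ["md5", "sha1", "sha256", "sha384", "sha512"]


def digest_bits(name: str) -> int:
    """Return the output length of a hashlib algorithm, in bits."""
    return hashlib.new(name).digest_size * 8


for name in ALGORITHMS:
    print(f"{name:>7}: {digest_bits(name)} bits")
```

Note that the output length only describes the hash itself. It says nothing about how many different inputs an attacker actually has to try.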
So… Credit Cards.
So most credit card numbers are 16 digits long. Let’s take a look at the number 0123 4567 8901 2345 and how it hashes.
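If you want to reproduce the hashes yourself, a minimal Python sketch could look like this. It assumes the card number is hashed as a plain ASCII string with the spaces removed; a real system might encode it differently, which would change the digests.

```python
import hashlib

# The example number from above, with spaces removed (an assumption about
# how the number would be encoded before hashing).
card_number = "0123456789012345"


def hash_card(pan: str, algorithm: str) -> str:
    """Hash a card number string and return the hex digest."""
    return hashlib.new(algorithm, pan.encode("ascii")).hexdigest()


for algorithm in ("md5", "sha1", "sha256"):
    print(f"{algorithm:>6}: {hash_card(card_number, algorithm)}")
```

Run it twice and you get exactly the same output, which is the “deterministic” part of a hash function.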
Man, I sure feel secure. It is probably pretty difficult to guess that any of those hashes came from 0123 4567 8901 2345, right?
So let’s say you buy stuff from a company, ACME Corp., and they store your credit card as a hash. How hard is it for a bad guy, if he breaks in and steals the database, to determine your credit card number?
So thanks to the “magic internet money”, Bitcoin, a lot of research has gone into how to generate large numbers of hashes very efficiently with commodity hardware. It turns out that graphics cards are pretty well suited for making hashes, as they are parallel processing powerhouses. According to these estimates (this is in the same ballpark), an AMD R9 290X graphics card (currently retailing for around $330) can generate around 2,900,000,000 SHA-1 hashes/second. The number of possible SHA-1 outputs is 2^160, or about 1.46 x 10^48. Which is a lot. But we actually don’t need to search through all of those hashes to find every hashed credit card. Because all credit cards are constrained to 16-digit numbers, our search space is constrained to hashes of numbers from “1000 0000 0000 0000” to “9999 9999 9999 9999”. That is 9,000,000,000,000,000 card numbers, which divided by 2,900,000,000 hashes/second gives you about 3,103,448 seconds, or roughly 36 days, to hash all possible credit card numbers.
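The back-of-the-envelope math can be checked with a few lines of Python. This is only a sketch: the hash rate is the single-GPU figure quoted above, and the function names are made up for illustration.

```python
# SHA-1 rate quoted above for one AMD R9 290X (taken from the article).
HASHES_PER_SECOND = 2_900_000_000
SECONDS_PER_DAY = 86_400


def seconds_to_exhaust(candidates: int, rate: int = HASHES_PER_SECOND) -> float:
    """Seconds needed to hash every candidate once at the given rate."""
    return candidates / rate


def days_to_exhaust(candidates: int, rate: int = HASHES_PER_SECOND) -> float:
    """Same as seconds_to_exhaust, expressed in days."""
    return seconds_to_exhaust(candidates, rate) / SECONDS_PER_DAY


sha1_outputs = 2 ** 160               # every possible SHA-1 digest
all_16_digit_numbers = 9 * 10 ** 15   # 1000 0000 0000 0000 .. 9999 9999 9999 9999

print(f"possible SHA-1 outputs : {sha1_outputs:.2e}")
print(f"16-digit card numbers  : {all_16_digit_numbers:.2e}")
print(f"seconds to hash them   : {seconds_to_exhaust(all_16_digit_numbers):,.0f}")
print(f"days to hash them      : {days_to_exhaust(all_16_digit_numbers):.1f}")
```

The point is the gap between the two numbers: the attacker only has to cover the set of possible inputs, not the set of possible outputs.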
But credit card numbers are actually more predictable than that. For example, all Visa cards begin with a ‘4’. If only Visas are hashed, that leaves 1,000,000,000,000,000 candidates, which takes about 4 days. Each issuer’s cards start with the same 4-6 digits; for example, most Wells Fargo, USA credit card numbers start with 473702. With those 6 digits known, only 10 digits are left to guess, so there are 10,000,000,000 possible Wells Fargo credit card numbers. 10,000,000,000 card numbers divided by 2,900,000,000 hashes/second can be done in about 3.4 seconds. I have also seen a lot of information online where people say something along the lines of: “well, for customer service reasons we store the institution name and/or the last 4 digits of the card number”. With the issuer prefix and the last 4 digits both known, only 6 digits (1,000,000 candidates) remain, which makes it trivially fast to crack these hashes.
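Here is a sketch of what that attack looks like in Python, for the case where the issuer prefix and the last four digits are stored next to the hash. The card number is made up, the function names are hypothetical, and the code assumes the number was hashed with plain SHA-1 as an ASCII string without spaces.

```python
import hashlib
from typing import Optional


def sha1_hex(pan: str) -> str:
    """SHA-1 of a card number string, as a hex digest."""
    return hashlib.sha1(pan.encode("ascii")).hexdigest()


def crack(target_hash: str, prefix: str, last_four: str,
          length: int = 16) -> Optional[str]:
    """Try every value for the unknown middle digits until the hash matches."""
    unknown = length - len(prefix) - len(last_four)
    if unknown < 1:
        raise ValueError("no unknown digits left to guess")
    for middle in range(10 ** unknown):
        # Zero-pad the guess so it always fills the unknown positions.
        candidate = f"{prefix}{middle:0{unknown}d}{last_four}"
        if sha1_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical stolen record: the hash plus the "harmless" extra fields.
stolen_hash = sha1_hex("4737020123452345")   # made-up card number
print(crack(stolen_hash, prefix="473702", last_four="2345"))
```

Even in plain Python on a CPU, walking all 1,000,000 candidates takes on the order of a second. A GPU is not required for this case.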
So I hear you saying “well, I salt my hashes”. The problem is that with a search space this tiny, and the salt stored in plain text next to the hash, it is potentially trivial to break each hash in close to real time. A salt stops an attacker from precomputing one table for every record, but it does not make any single guess slower. What about using a longer hash than SHA-1? The difference in performance between SHA-1 and SHA-512 is only about an order of magnitude, which is not enough.
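To illustrate why the salt doesn’t save you here, this sketch cracks a salted SHA-256 hash the same way. It assumes the salt is simply prepended to the card number and stored in the same stolen record; the record layout and function names are invented for the example.

```python
import hashlib
from typing import Optional


def salted_hash(salt: bytes, pan: str) -> str:
    """SHA-256 over salt + card number (one common, assumed construction)."""
    return hashlib.sha256(salt + pan.encode("ascii")).hexdigest()


def crack_salted(salt: bytes, target_hash: str, prefix: str,
                 last_four: str, length: int = 16) -> Optional[str]:
    """Brute-force the unknown middle digits, reusing the stolen salt."""
    unknown = length - len(prefix) - len(last_four)
    if unknown < 1:
        raise ValueError("no unknown digits left to guess")
    for middle in range(10 ** unknown):
        candidate = f"{prefix}{middle:0{unknown}d}{last_four}"
        if salted_hash(salt, candidate) == target_hash:
            return candidate
    return None


# Hypothetical stolen record: salt and hash sit side by side in the database.
record_salt = bytes.fromhex("a1b2c3d4e5f60718")
record_hash = salted_hash(record_salt, "4737020123452345")  # made-up number

print(crack_salted(record_salt, record_hash, prefix="473702", last_four="2345"))
```

The loop is the same as without a salt. The attacker just feeds the stolen salt into every guess, so the cost per record is unchanged.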
Alternatives and Conclusion
So the biggest problem is the speed at which these hashing algorithms can compute a hash. If you slow that down dramatically, then guessing every hashed credit card number becomes too difficult to be practical. Thankfully we have solutions like bcrypt. Basically, bcrypt is an intentionally slow hashing function that is built on the expensive key setup of the Blowfish cipher. The nice thing about using something like bcrypt is that even though it takes a little bit longer to generate the hash when you actually know the card number and are a good guy, it takes a whole heck of a lot longer to guess when you are a bad guy. The other amazing thing about bcrypt is its adjustable work factor, which lets you increase the cost of all future hashes as hardware gets faster. There are also somewhat similar alternatives like scrypt and PBKDF2. All three options seem to incite religious fanaticism about which one you should use, but in my mind any of them is better than using hashes built for performance.
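As a rough illustration of the “intentionally slow” idea, here is a sketch using PBKDF2 from Python’s standard library (bcrypt would need a third-party package, so PBKDF2 stands in for it here). The iteration count and helper names are assumptions picked for the example, not recommendations.

```python
import hashlib
import hmac
import os
import time

# Work factor: more iterations means every guess costs more.
# This value is an assumption for the demo, not a tuned recommendation.
ITERATIONS = 200_000


def slow_hash(pan: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a deliberately expensive hash of the card number with PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", pan.encode("ascii"), salt, iterations)


def verify(pan: str, salt: bytes, expected: bytes,
           iterations: int = ITERATIONS) -> bool:
    """Check a candidate card number against a stored slow hash."""
    return hmac.compare_digest(slow_hash(pan, salt, iterations), expected)


def time_one(fn) -> float:
    """Wall-clock seconds for a single call of fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


salt = os.urandom(16)
pan = "0123456789012345"

fast = time_one(lambda: hashlib.sha1(pan.encode("ascii")).digest())
slow = time_one(lambda: slow_hash(pan, salt))
print(f"one SHA-1 hash      : {fast:.6f} s")
print(f"one PBKDF2 hash     : {slow:.6f} s")
print(f"verify correct card : {verify(pan, salt, slow_hash(pan, salt))}")
```

The good guy pays the slow cost once per card. The bad guy pays it once per guess, for every record, and raising `ITERATIONS` later raises that cost for all future hashes.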
I hope you enjoyed this little thought experiment, and I’d love your feedback. Hopefully you have at least learned a bit about how hashing works, or better yet, how it behaves when working with a constrained set of possible inputs.
Feature Photo Credit: Kris Krüg