Doing the Math on Hashing Credit Card Numbers

When you put your credit card into a website, what happens to it?  The goal of this article is to explore some of the possible answers to that question.

With all the changes that are happening in the payment card industry these days, I’ve been thinking about the security around it.  EMV/Chip and PIN is coming, and there are weird things happening around NFC/Apple Pay/Google Wallet/Tap to Pay.  There have also been a lot of breaches in the last year that are really helping expose the weaknesses in how this data is stored and transmitted.  This post is really more of a thought experiment about how you store hashed information “securely”.

I’m new here, what is PCI-DSS?

PCI-DSS is basically a set of rules/guidelines/a standard around how payment card information is supposed to be secured.  Unlike HIPAA, which has the force of law behind it, or things like ISO 27001, which are openly discussed standards that aim to be best practice but are optional, PCI-DSS is an agreement among all the third parties and is handled under contract law.  The standard says what you should do, and it either says it explicitly or it says it in technology-agnostic/general terms.  There are fines for non-compliance, but there is also a bunch of risk management stuff that lets a card processor, for example, assume the risk until a retailer can come into compliance.  I’m not a PCI-DSS expert, so take what I say with a grain of salt.  I’d love to hear from actual financial industry nerds about what it is like in the trenches and about what actually tends to happen.

What Does PCI-DSS Say About the Storage of Credit Cards?

I looked online for what the PCI Security Standards Council and others had to say about how this works, and I found a document from 2008, written by the PCI SSC, that seemed to have some pretty decent information in it.  It doesn’t get into specific technology recommendations; I guess technology moves too quickly.  It does talk about do’s and don’ts: how you should treat cardholder data (e.g. name, card number, exp date).

At a minimum, PCI DSS requires PAN to be rendered unreadable anywhere it is stored – including portable digital media, backup media, and in logs. Software solutions for this requirement may include one of the following:

• One-way hash functions based on strong cryptography – also called hashed index, which displays only index data that point to records in the database where sensitive data actually reside.

• Truncation – removing a data segment, such as showing only the last four digits.

• Index tokens and securely stored pads – encryption algorithm that combines sensitive plain text data with a random key or “pad” that works only once.

• Strong cryptography – with associated key management processes and procedures. Refer to the PCI DSS and PA-DSS Glossary of Terms, Abbreviations and Acronyms for the definition of “strong cryptography.”

By the way, PAN is defined by PCI as “Primary Account Number is the payment card number (credit or debit) that identifies the issuer and the particular cardholder account. Also called Account Number”.  So this talks a little bit about encryption, which helps secure your information, but while it may be seen as a silver bullet by some, the data needs to be unencrypted at some point to be useful.  For example, simply encrypting the hard drives on which the credit card numbers are stored can help with physical security, but in no way helps against hackers who get into the running system.  The whole encryption aspect of PCI-DSS is even more complicated, and is outside of the scope of this article.  Truncating the number seems like a simple solution to the secure storage problem as well, but I can imagine that it is easy for business types to come up with objections to that.  The token idea is awesome, and it is how some of the most modern implementations are happening, but it is really hard to implement, both technically and practically, since all parties have to deal with the tokens properly.

The main focus of this article is that section on hash functions.  What exactly are “One-way hash functions based on strong cryptography”?  I think if you asked that question of the average security person, even if you mentioned that it was in the context of PCI compliance, they would say something like “SHA-1 or SHA-256 are decent one-way hash functions based on strong cryptography”.  It is the goal of this article to perhaps question that notion.

One Way Hashing Functions You Say?

2015-01-06

So if your credit card can’t be stored in a truncated fashion, I’m going to guess it is probably going to be a hash.  A hash function is a deterministic, predictable one-way function.  Easy to go one way, really hard to go the other.  Think of it like burning a piece of paper.  The atoms in the paper haven’t been destroyed, but the process is not ever going to be reversed.  Some example uses of hashes are:

  1. Your password on your computer is stored as a hash.
  2. When you download a file, it can be hashed to verify that it arrived all in one piece.
  3. Secure messaging, hashes can be used to ensure integrity and/or as part of digital signatures.
  4. Used as part of proof of work in crypto-currencies.

There are a bunch of different types of hashing algorithms.  Message Digest 5 (MD5) was popular for a long time.  There are several iterations of the Secure Hash Algorithm that are in use today (SHA-1 and the SHA-2 family: SHA-256, SHA-384, SHA-512).  There are fundamental differences between the way these all work, but it usually holds that the longer the hash, the harder it is to figure out what the original data was.
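To make the "length of the hash" idea concrete, here is a minimal sketch using Python's standard `hashlib` module that prints the output size of each algorithm mentioned above and checks that hashing is deterministic.  The helper names `digest_bits` and `is_deterministic` are made up for this illustration.

```python
import hashlib

# Algorithms named in the article, using hashlib's spelling of each.
ALGORITHMS = ["md5", "sha1", "sha256", "sha384", "sha512"]


def digest_bits(name: str) -> int:
    """Return the output length of a hashlib algorithm in bits."""
    return hashlib.new(name).digest_size * 8


def is_deterministic(name: str, data: bytes) -> bool:
    """Hash the same input twice and confirm the digests are identical."""
    first = hashlib.new(name, data).hexdigest()
    second = hashlib.new(name, data).hexdigest()
    return first == second


if __name__ == "__main__":
    for name in ALGORITHMS:
        print(f"{name:>6}: {digest_bits(name)} bits")
```

Keep in mind that the output length only describes how many different hashes exist.  It says nothing about how many different inputs somebody has to try, which is what the rest of this article is about.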

So… Credit Cards.

So most credit card numbers are 16 digits long.  Let’s take a look at the number 0123 4567 8901 2345 and how it hashes.

MD5: d927ad81199aa7dcadfdb4e47b6dc694

SHA1: 92c983f5bc8e4014b029985ced72f4b18bb85250

SHA-256: 184aa46d813411727da0dc9e64186bb9907289b5aab4b320d26fff5ea45d8e3d

SHA-384: 346ff9b0b427894a698536bdea7e5ba0862f7ab5beba3a1f007ca8934c4695f1b8c8214fc30ba27a9babec460eecd512

SHA-512: 4c52a43e8558a65e81f86281113ae422eca96540ffdc4dd7ce8813b3caf704434d2de6d0c6d2b0e50003b99a10dc9c0ab64f81bf4ed7f3ace84d7bf82fd8247a

Man, I sure feel secure.  It is probably pretty difficult to guess that any of those hashes came from 0123 4567 8901 2345, right?
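If you want to produce digests like these yourself, a rough sketch with Python's `hashlib` follows.  The article does not say whether the number was hashed with or without the spaces, so the `normalize_pan` helper (a hypothetical name) strips everything except digits first, and the output may not match the values listed above.

```python
import hashlib


def normalize_pan(pan: str) -> str:
    """Keep only the digits, so '0123 4567 ...' and '01234567...' hash alike."""
    return "".join(ch for ch in pan if ch.isdigit())


def hash_pan(pan: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of the normalized card number."""
    data = normalize_pan(pan).encode("ascii")
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    for algorithm in ("md5", "sha1", "sha256", "sha384", "sha512"):
        print(algorithm, hash_pan("0123 4567 8901 2345", algorithm))
```

The normalization step matters more than it looks: a single space in the input gives a completely different digest, so whoever stores the hash and whoever looks it up have to agree on the exact format.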

The Math

So let’s say you buy stuff from a company, ACME Corp., and they store your credit card as a hash.  How hard is it for a bad guy, if he breaks in and steals the database, to determine your credit card number?

Thanks to the “magic internet money”, Bitcoin, a lot of research has gone into how to generate large numbers of hashes very efficiently with commodity hardware.  It turns out that graphics cards are pretty well suited for making hashes, as they are parallel processing powerhouses.  According to these estimates (this is in the same ballpark), an AMD R9 290X graphics card (currently retailing for around $330) can generate around 2,900,000,000 SHA-1 hashes/second.  SHA-1 has 2^160 ≈ 1.46 x 10^48 possible hash values.  Which is a lot.  But we actually don’t need to search through all of those hashes to find every hashed credit card.  Because all of these credit cards are constrained to 16 digit numbers, our search space is constrained to hashes of numbers from “1000 0000 0000 0000” to “9999 9999 9999 9999”.  So that is 8,999,999,999,999,999 card numbers divided by 2,900,000,000 hashes/second, which gives you 3,103,448 seconds, or roughly 36 days, to hash all possible credit card numbers.
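The arithmetic above is easy to check.  This small sketch redoes it in Python, taking the 2,900,000,000 hashes/second figure from the estimate quoted above as given; the function names are made up for the example.

```python
# SHA-1 rate quoted in the article for a single graphics card.
HASHES_PER_SECOND = 2_900_000_000
SECONDS_PER_DAY = 86_400


def seconds_to_exhaust(candidates: int, rate: int = HASHES_PER_SECOND) -> float:
    """Time in seconds to hash every candidate once at the given rate."""
    return candidates / rate


def to_days(seconds: float) -> float:
    """Convert seconds to days."""
    return seconds / SECONDS_PER_DAY


# Card numbers from 1000 0000 0000 0000 to 9999 9999 9999 9999.
FULL_SPACE = 9_999_999_999_999_999 - 1_000_000_000_000_000

if __name__ == "__main__":
    seconds = seconds_to_exhaust(FULL_SPACE)
    print(f"{seconds:,.0f} seconds, about {to_days(seconds):.1f} days")
    print(f"SHA-1 output space: 2^160 = {2 ** 160:.3e}")
```

The point of the comparison is the gap between the two numbers: the size of the hash output is astronomically large, but the attacker only ever has to cover the much smaller set of valid inputs.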

But credit card numbers are actually more predictable than that.  For example, all Visa cards begin with a ‘4’.  If only Visas are hashed, that leaves 15 unknown digits, or 10^15 candidates, which takes about 4 days.  Each specific issuer starts with the same 4-6 digits; for example, most Wells Fargo, USA credit card numbers start with 473702.  With those six digits fixed, only ten digits are unknown, so there are 10,000,000,000 possible Wells Fargo credit card numbers.  10,000,000,000 card numbers divided by 2,900,000,000 hashes/second can be done in about 3.4 seconds.  I have also seen a lot of information online where people say something along the lines of: “well, for customer service reasons we store the institution name and/or the last 4 digits of the card number”.  With the issuer prefix and the last 4 digits both known, only six digits (1,000,000 candidates) remain, which makes it trivially fast to crack these hashes.
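To show how small the problem gets once the issuer prefix and the last four digits are known, here is a toy brute-force sketch in plain Python.  It assumes the stored value is an unsalted SHA-1 of the 16 digits with no spaces, which is an assumption for the example and not something any particular merchant is known to do.  It also skips candidates that fail the Luhn check digit test that readers bring up in the comments.  The names `luhn_valid` and `crack_pan` are hypothetical.

```python
import hashlib
from typing import Optional


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, ch in enumerate(reversed(number)):
        digit = int(ch)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def crack_pan(target_sha1_hex: str, bin_prefix: str, last_four: str) -> Optional[str]:
    """Try every value of the unknown middle digits of a 16 digit number."""
    unknown = 16 - len(bin_prefix) - len(last_four)
    if unknown < 1:
        raise ValueError("nothing left to guess")
    for middle in range(10 ** unknown):
        candidate = f"{bin_prefix}{middle:0{unknown}d}{last_four}"
        if not luhn_valid(candidate):
            continue  # not a possible card number, no need to hash it
        digest = hashlib.sha1(candidate.encode("ascii")).hexdigest()
        if digest == target_sha1_hex:
            return candidate
    return None
```

With a six digit prefix and the last four digits known there are only 1,000,000 middle values to try, and the Luhn filter throws out nine out of ten of those before any hashing happens.  Even unoptimized Python on a laptop gets through that in seconds, no graphics card required.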

So I hear you saying “well, I salt my hashes”.  The problem is that with a search space so tiny, and the salt stored in plain text alongside the hash, it is potentially trivial to break the hash in close to real time.  What about using a longer hash than SHA-1?  The difference in performance between SHA-1 and SHA-512 is only about an order of magnitude, which is not enough.
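Here is a small sketch of why a salt stored next to the hash does not rescue you.  It assumes a scheme of SHA-1 over salt plus card number with a random 16 byte salt per record, which is just one common way to do it and is an assumption for this example.  The salt does stop an attacker from reusing one precomputed table for every record, but the attacker can simply read the salt and run the same small search per record.  All function names here are made up.

```python
import hashlib
import os
from typing import Iterable, Optional, Tuple

Record = Tuple[bytes, str]  # (salt stored in the clear, hex digest)


def salted_hash(pan: str, salt: bytes) -> str:
    """SHA-1 over salt followed by the card number digits."""
    return hashlib.sha1(salt + pan.encode("ascii")).hexdigest()


def store(pan: str) -> Record:
    """What ends up in the database: a fresh random salt and the digest."""
    salt = os.urandom(16)
    return salt, salted_hash(pan, salt)


def crack_record(record: Record, candidates: Iterable[str]) -> Optional[str]:
    """Brute-force one stolen record using the salt that came with it."""
    salt, digest = record
    for pan in candidates:
        if salted_hash(pan, salt) == digest:
            return pan
    return None
```

Usage would look like `crack_record(store(card), candidates)`, where `candidates` is whatever constrained list of card numbers the attacker can generate.  The cost is one pass over the candidates per record instead of one pass for the whole database, which only helps if a single pass is expensive.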

Who needs rainbow tables?   source

Alternatives and Conclusion

So the biggest problem is the speed at which these hashing algorithms can compute a hash.  If you slow that down dramatically, then guessing every hashed credit card number becomes too difficult to be practical.  Thankfully we have solutions like bcrypt.  Basically, bcrypt uses an intentionally slow hashing function that is based on the expensive key setup of the Blowfish cipher.  The nice thing about using something like bcrypt is that even though it takes a little bit longer to generate the hash when you actually know the card number and are a good guy, it takes a whole heck of a lot longer to guess when you are a bad guy.  The other amazing thing about bcrypt is its ability to become more secure over time, allowing you to increase the work factor, and so the time, for all future hashes.  There are also somewhat similar alternatives like scrypt and PBKDF2.  All three options seem to incite religious fanaticism about which one you should use, but in my mind any of them are better than using hashes built for performance.

I hope you enjoyed this little thought experiment, and I’d love your feedback.  Hopefully you have at least learned a bit about how hashing works, or better yet, how hashing works when you are dealing with a constrained set of possible inputs.
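As a rough illustration of the "slow on purpose" idea, here is a sketch using PBKDF2 rather than bcrypt, only because PBKDF2 ships in Python's standard library as `hashlib.pbkdf2_hmac` and bcrypt needs a third-party package.  The iteration count of 600,000 is an assumed value for the example, not a recommendation from the article, and the function names are made up.

```python
import hashlib
import hmac
import os


def slow_hash_pan(pan: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a digest with PBKDF2-HMAC-SHA256; more iterations means slower."""
    return hashlib.pbkdf2_hmac("sha256", pan.encode("ascii"), salt, iterations)


def verify_pan(pan: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(slow_hash_pan(pan, salt, iterations), expected)


if __name__ == "__main__":
    salt = os.urandom(16)
    stored = slow_hash_pan("0123456789012345", salt)
    print(verify_pan("0123456789012345", salt, 600_000, stored))
```

The iteration count plays the same role as the bcrypt work factor: store it alongside the salt, and raise it for new records as hardware gets faster.  It is worth doing the math here too, since the cost per guess has to be weighed against how many guesses are left once the prefix, last four digits and check digit are accounted for.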

Feature Photo Credit:  Kris Krüg

9 thoughts on “Doing the Math on Hashing Credit Card Numbers”

  1. In addition, a credit card number has as the last digit a checksum digit. So of all fifteen-digit card number prefixes ABCD-EFGH-IJKL-MNOx, there is only one permitted value of x for the sixteenth digit, the checksum value, not all ten values of 0-9.

    This limits the hash search space further by a factor of 10 in all cases.

  2. There is another method to reduce the possible credit card numbers, credit card numbers use a luhn modulus to verify integrity of the given number. You can shorten your attack space even further, by only generating valid initial values to hash.

    Full disclosure, I’ve worked in the payment processing side and had written a report on this exact subject to attempt to convince the higher ups to not just meet PCI, but to go far further in what I believe are rather flimsy guidelines / rules.

  3. Unsure if my phone has double posted, but
    there is a way to shorten the initial values even further, credit cards use a luhn modulus for integrity verification. Because of this, you can only generate valid initial values for hashing, thus reducing the count even further.

    Full disclosure, I previously worked at a payment processor, and had written a report regarding this exact problem in an attempt to get the higher ups to strive not just for compliance, but security.
    I cannot say I’m an expert in PCI, but I have had a number of years dealing with it from both directions. (Sys admin, and in earlier job, pen tester).
    In my opinion, PCI DSS is weak and sometimes very borderline a joke for anyone who actually wants security; on the technical side it is outdated and sadly the more recent PCI DSS 3.0 didn’t fix anything really, honestly in some cases it makes it easier to side step compliance by changing definitions to be more loosely worded.

  4. OK, let’s straighten you up on a few points. Firstly, how is PCI DSS policed? Anyone that stores or processes cardholder data must comply with the standard, and any service providers they use must also comply. The sanctions for breaking the rules are fines and loss of liability transfer if there is a breach. The ultimate sanction is being excluded from the card schemes (i.e. cannot process card payments). For a large retailer this is the nuclear threat that would easily put them out of business.

    Secondly hashing is an acceptable form of encrypting data, on the same basis as encryption. You are presuming that the compromise is that the hashes are exposed, and therefore subject to an off-line attack. In the case that a database of hashes are exposed, then you are right these can be attacked in the same way that password hashes are attacked. if you have the salt then yes even a salted hash could be cracked, but if you don’t have the salt then forget it.

    If the data is encrypted, then unless you have the key decrypting the data will take longer than the lifetime of the universe. PCI DSS has a whole lot of rules about key management that are just as important as the small paragraph that you mention.

    The other point is that the security of transactions doesn’t just rely on the PAN, there are other data elements used to verify transactions, and lots of procedures that merchants must follow in order to prevent a fraudulent transaction being charged back to them.

    Even the US is now moving to adopt Chip & PIN transactions, and the PCI rules for managing PINs are much stricter, essentially they must never be stored.

    I hope this brief overview helps.

    1. Thanks for clarifying on how PCI compliance works. Hashing is not encryption, and pretty much every implementation I have seen of salting prepends the salt to the hash in clear text.

      The encryption thing is tricky though, as what does encrypted actually mean. I mention full disk encryption in the article, there is also encryption during transport, which only helps while it is in transport. Target encrypted their card numbers in the terminals, and then decrypted them within the POS computers to do the authorization. Then the card numbers were re-encrypted and sent along their way. If your systems are compromised then your ability to encrypt and decrypt in a secure way is compromised.

      Yeah Chip and PIN is awesome and the rollout where I live was flawless, but it still doesn’t help with those Card Not Present transactions. I think a well standardized and secure way of doing tokens is the answer there.

      1. Really not trying to do a sales promotion here but the company that I work for does have some legitimate solutions for card not present transactions. However, PCI council has decided to not list them as a validated P2Pe solution due to the lack of HSM. Encourage you to, if time permits, to check shift4 out.

    2. Coming late to the party here, but Stuart’s reply illustrates the problem with the payment industry in that it focuses on what PCI requirements are instead of “actual security.” He also has no idea how chip and pin works, because the EMV/chip standards don’t add any additional encryption whatsoever to protect the PAN or PIN. A copy of the card’s PAN and magstripe are stored on the chip and transmitted to the processor using the same 3DES/DUKPT encryption we’ve been using since the 90’s, if there’s any additional encryption at all. All a chip does is prove there was a chip. EMV supports terminal authenticated pins called offline pins, which are actually quite secure, but they’re not commonly used. Most transactions use online pins, which means the pin is hashed into a pin block, encrypted using 3DES/DUKPT in most cases and transmitted to the processor. But this method of encrypting pin blocks was not introduced with EMV or chip technology. It’s been around forever.

      The purpose of hashing a card is usually to determine when the same card is used for data analytics or to drive duplicate transaction detection without actually storing a decrypt-able form of the card number. I thought Jim’s overview of how vulnerable this kind of hashing is to brute force attacks was insightful, and just thought Stuart coming down off the mountain and presenting himself as an expert on credit card security was a little irritating, when it’s clear he only has a surface level understanding of it.
