Friday, November 15, 2013

Parsing text for a Credit Card Number

A while back I wrote some code that parses text, looks for embedded credit card numbers, and optionally masks all or part of each number it finds. The code is open source and available on GitHub: Credit Card Parsing

The main use case for this code was removing credit card numbers that customers emailed to customer support while they were trying to buy something. The Payment Card Industry (PCI) standards require credit card data to be stored under more secure conditions than most default storage configurations provide.

If customers' support tickets are stored in a regular unencrypted database and customers are sending you their credit card numbers, then you are not PCI compliant. You have two choices: make the support ticket database PCI compliant, or remove the credit card numbers from the text that you're storing in this database.

The code in this library provides a mechanism to find and then optionally mask the credit card numbers in any block of text.
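To make the idea concrete, here is a minimal Python sketch of finding and masking. It is not the library's actual code or API: the names `CANDIDATE`, `luhn_valid`, and `mask_cards`, and the `keep_last` and `mask_char` parameters, are all made up for illustration. It assumes a card number is 13 to 19 digits, optionally separated by single spaces or hyphens, and only treats a candidate as a card if it passes the Luhn checksum.

```python
import re

# Assumption: a candidate is 13-19 digits, optionally separated by single
# spaces or hyphens, and not directly adjacent to other digits.
CANDIDATE = re.compile(r"(?<![0-9])[0-9](?:[ -]?[0-9]){12,18}(?![0-9])")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text: str, keep_last: int = 4, mask_char: str = "X") -> str:
    """Mask Luhn-valid candidates, leaving the last `keep_last` digits visible."""

    def replace(match: re.Match) -> str:
        raw = match.group(0)
        digits = re.sub(r"[^0-9]", "", raw)
        if not luhn_valid(digits):
            return raw  # looks like a card number but fails the checksum
        to_mask = len(digits) - keep_last
        out = []
        for ch in raw:
            if ch.isdigit() and to_mask > 0:
                out.append(mask_char)
                to_mask -= 1
            else:
                out.append(ch)  # keep separators and the trailing digits
        return "".join(out)

    return CANDIDATE.sub(replace, text)


print(mask_cards("My card is 4111 1111 1111 1111, thanks"))
```

With `keep_last=0` the whole number is masked. A pattern this simple will both miss cards and flag things that are not cards, which is exactly the kind of problem described in the rest of this post.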

While I was developing this code I learned something interesting: 0.6% of all UUIDs/GUIDs have a valid credit card number in them. By valid I mean a sequence of digits that passes the Luhn algorithm. I discovered this because our support emails had a lot of UUIDs that were part of URLs and other keys that the customers needed to communicate to us.

The first time I saw this false positive I thought that it was a one in a million chance. Not wanting to rely on that (bad) luck, I ran a Monte Carlo simulation to estimate the probability of this happening, and it came out at 0.6%. I had to adjust the credit card finding algorithm to ensure that what it thought was a credit card was not part of a UUID.
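A simulation along these lines can be sketched in a few lines of Python. This is a reconstruction under my own assumptions, not the original experiment: it generates random version 4 UUIDs, treats hyphens as separators inside a number, and counts a hit whenever any 13 to 16 digit window passes the Luhn check. The function names and the length range are hypothetical, and the estimate depends heavily on those matching rules, so it will not necessarily come out at 0.6%.

```python
import random
import re
import uuid


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_luhn_number(text: str, min_len: int = 13, max_len: int = 16) -> bool:
    """True if any digit window of min_len..max_len digits passes Luhn.

    Assumption: hyphens between digits are ignored, as a lenient card
    finder would do for numbers written like 1234-5678-9012-3456.
    """
    for run in re.findall(r"[0-9](?:[0-9-]*[0-9])?", text):
        digits = run.replace("-", "")
        for length in range(min_len, max_len + 1):
            for start in range(len(digits) - length + 1):
                if luhn_valid(digits[start:start + length]):
                    return True
    return False


def estimate_false_positive_rate(trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of random UUIDs that trigger a hit."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    hits = 0
    for _ in range(trials):
        candidate = uuid.UUID(int=rng.getrandbits(128), version=4)
        if contains_luhn_number(str(candidate)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rate = estimate_false_positive_rate(100_000)
    print(f"Estimated false positive rate: {rate:.2%}")
```

Tightening or loosening the matching rules (allowed lengths, separators, whether overlapping windows count) moves the number considerably, which is why it is worth measuring against your own finder rather than trusting intuition.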

Development of this library was an unusual experience in that unit tests were immediately useful and protective. Each time a defect came back from production (i.e., a missed credit card or a false positive) I added that block of text* to a unit test and confirmed that it failed. I then fixed the code so that the unit test passed, and every time I did that the fix broke one of the other unit tests. Because I strictly unit tested everything I wrote in this project, those breakages were caught by the tests and the library never had any regression bugs.

* If it was a missed credit card then that credit card would be replaced by a fake one before it was inserted into the unit test. I used Graham King's Credit Card Number Generator to create these fake credit cards.