What is Tokenization?
A deep dive on tokenization, its benefits and drawbacks, and how it compares to encryption.
Security has traditionally been treated as a featureless and burdensome expense paid by organizations. Tokenization, however, has reframed the conversation, leading to an explosion in its usage over the last few years.
To understand how tokenization levels the playing field, it’s helpful to understand:
- What tokenization is and why it’s important.
- What a tokenization platform is and how it helps developers satisfy their compliance and security requirements without the distractions, costs, and liabilities of securing it themselves.
- How tokenization differs from encryption (spoiler: they’re complementary).
A brief introduction to tokens and tokenization
A token is a non-exploitable identifier that references sensitive data. Tokens can take any shape, are safe to expose, and are easy to integrate. Tokenization refers to the process of storing data and creating a token. The process is completed by a tokenization platform and looks something like this:
- You enter sensitive data into a tokenization platform.
- The tokenization platform securely stores the sensitive data.
- The system provides a token to use in place of your sensitive data.
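The three steps above can be sketched with a toy in-memory vault. This is an illustration only: the TokenVault class, its method names, and the tok_ prefix are hypothetical and do not reflect any particular platform's API, and a real platform would encrypt the store and put it behind authentication.

```python
import secrets


class TokenVault:
    """Toy in-memory stand-in for a tokenization platform (illustration only)."""

    def __init__(self) -> None:
        # Maps token -> sensitive value. A real platform would encrypt this
        # store and guard it with authentication and access policies.
        self._store: dict[str, str] = {}

    def tokenize(self, sensitive_value: str) -> str:
        # Steps 1 and 2: accept the sensitive value and store it.
        # The token is random, so it is not derived from the value.
        token = "tok_" + secrets.token_urlsafe(16)
        self._store[token] = sensitive_value
        # Step 3: hand back the token to use in place of the value.
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can map a token back to the original value.
        return self._store[token]


vault = TokenVault()
token = vault.tokenize("4929 1457 2823 9313")
original = vault.detokenize(token)
```

The application would keep only `token`; the card number lives solely inside the vault.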
What is a third-party tokenization platform and what does it do?
Tokenization platforms can be split into two functions: a token vault and its services.
The token vault offers a secure and compliant location to store the original data (e.g., card numbers). The services give organizations the ability to collect, abstract, permission, and use the tokens or variations of the underlying data. Combined, these two components give organizations a way to collect, store, and use sensitive data without assuming the ongoing costs, delays, and liabilities of securing it themselves.
How are tokens used?
Instead of handling the sensitive data directly, developers and their applications use previously generated tokens to carry out the operations that would otherwise require that data, like performing analyses, generating documents, or verifying customers.
To see how this works, let’s take a look at two examples:
A company needs PII to generate and send tax documents for its employees. The company doesn’t want to go through the trouble of securing its employees’ data within its own systems, so it uses a tokenization platform to store sensitive employee data. During onboarding, employees provide their PII via a form hosted on the company’s website.
Although the company hosts the website, the form uses an iframe that captures the PII and sends it to the tokenization platform. Tokens are generated to represent the PII and given back to the company. The company then uses the tokenization platform to generate the tax documents, complete with the necessary sensitive information, all without storing the PII in its own systems.
Another company needs PCI information from its customers to process payment for its e-commerce website. Similar to the company mentioned above, they don’t want to build a compliant system that doesn’t give them a competitive advantage. Additionally, they don’t want to be locked into a specific payment processor. They opt for a tokenization platform that can process payments with many payment processors.
When customers reach the point on the e-commerce site where they enter payment information, an iframe sends the PCI information to the tokenization platform. Tokens representing the customer’s PCI information are sent back to the company. As soon as the customer takes action to purchase, the company makes a request with that customer’s tokens through a proxy, which calls a payment processor to charge the customer’s payment method. All of this happens without the company needing to comply with stringent PCI compliance policies.
The benefits of third-party tokenization platforms
The benefits of tokenization aren’t as pronounced if you manage your own tokenization platform, which not only requires immense effort, but also leaves you responsible for securing the data and maintaining the compliance obligations that come with having it in your systems. Accordingly, the vast majority of companies that tokenize their sensitive data use third-party platforms. In other words, the big benefits of and reasons for tokenization rest with the platforms offering these services.
Since tokens are simply a reference to data, they can look like anything. Tokens can be a set of random characters, or they can look like the sensitive data they reference. This “format preserving” capability makes tokens easy to validate, identify, and store, especially with existing systems. For example, when storing a card number that looks like 4929 1457 2823 9313 in a tokenization platform, you may get back a token of 4264 2159 7881 9313, which preserves the last 4 digits of the card number. This is called aliasing. Tokens like these may also be easier to store because they’re formatted like the data that they hide.
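As a rough sketch of aliasing, the function below produces a token shaped like a card number while keeping its last four digits. The function name and the keep_last parameter are hypothetical, and this shows only the token generation; a platform would also store the mapping from the token back to the real number.

```python
import secrets


def alias_card_number(card_number: str, keep_last: int = 4) -> str:
    """Return a random token shaped like card_number, keeping the last digits."""
    if keep_last < 0:
        raise ValueError("keep_last must not be negative")
    digit_positions = [i for i, ch in enumerate(card_number) if ch.isdigit()]
    # Every digit except the trailing keep_last digits is replaced.
    if keep_last:
        to_replace = set(digit_positions[:-keep_last])
    else:
        to_replace = set(digit_positions)
    # Spaces and other separators are copied through so the format survives.
    return "".join(
        str(secrets.randbelow(10)) if i in to_replace else ch
        for i, ch in enumerate(card_number)
    )


example_token = alias_card_number("4929 1457 2823 9313")
```

The replaced digits are drawn at random rather than computed from the original, so the token carries no information beyond the digits that were deliberately preserved.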
Easy to integrate with other systems
Depending on the tokenization platform, tokenized data is easy to integrate with other systems. Let’s say you use a tokenization platform that offers an outbound proxy — a server that makes requests on your behalf — that trades a token for the sensitive data before going to its final destination. This would make it straightforward to send sensitive data to another system without taking on any risk yourself.
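A minimal sketch of the outbound proxy idea, with no real network calls: the VAULT dictionary, the tok_card_123 token, and the fake_processor function are all hypothetical stand-ins. In practice the proxy would run inside the tokenization platform, not in your own code.

```python
from typing import Callable

# Hypothetical vault contents held by the platform: token -> sensitive value.
VAULT = {"tok_card_123": "4929 1457 2823 9313"}


def outbound_proxy(
    payload: dict[str, str],
    send: Callable[[dict[str, str]], str],
) -> str:
    """Swap any known tokens in the payload for real values, then forward it."""
    detokenized = {key: VAULT.get(value, value) for key, value in payload.items()}
    return send(detokenized)


def fake_processor(request: dict[str, str]) -> str:
    # Stand-in for the third party; only here does the real card number appear.
    return "charged card ending " + request["card_number"][-4:]


# The caller's own system only ever handles the token.
result = outbound_proxy(
    {"card_number": "tok_card_123", "amount": "19.99"},
    fake_processor,
)
```

Values that are not known tokens, such as the amount, pass through unchanged.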
Safe to expose
Tokens are safe to expose because they are not derived from the data they reference. Tokens don’t need to have anything in common with the data they reference. Because of this, tokens are un-exploitable on their own, meaning they can’t be reversed without access to the tokenization platform. You can be at ease passing your tokens around because the tokens themselves reveal nothing about the underlying data.
Minimize or eliminate compliance requirements
While necessary, compliance requirements are a significant burden for organizations to bear. This non-value-generating work consumes resources, timelines, and funds.
The alphabet soup of private and public regulations, like PCI, GDPR, HIPAA, and more, endorses the use of tokenization and specialized tokenization platform providers to help organizations descope their compliance requirements and secure their data.
The drawbacks of third-party tokenization platforms
There’s no such thing as a perfect solution when it comes to security, but acknowledging challenges helps us, at Basis Theory, hedge these challenges for customers.
By most standards, storing sensitive data and returning an identifier isn’t enough to properly restrict access to the stored data. At the very least, tokenization platforms should use an authentication mechanism, like OAuth, to restrict who has access to this data. As an additional layer of protection, these systems may encrypt the sensitive data so that it’s safe against unauthorized access. An extra step must be taken in order to ensure that the stored sensitive data is protected. Fortunately, most tokenization platforms offer authentication, encryption, and permissioning features to make sure your data is safe.
In order for tokenized data to be usable, a request must be made to the tokenization platform to retrieve the underlying data. This means that retrieving any sensitive information adds latency. While this increase in latency is negligible in most cases, tokenization may not be ideal if your system requires an immediate response. Fortunately, latency can be reduced through geo-replication, horizontal and vertical scaling of resources, concurrency, and caching.
Service Provider Downtime
Like any popular cloud service, a tokenization platform must be available to serve legitimate requests. Any service used for tokenization must scale in order to meet the demand of its clients. For seamless workflows, these services should have safeguards in place to protect against spikes in traffic and outages. Anyone considering tokenization must consider this extra dependency. Prudent systems address downtime through redundancies, self-healing operations, heartbeats and pings, synthetic tests, and 24/7 support.
Alternatives to tokenization
Encryption is one of the most popular methods of securing data. It works by converting plaintext data into an unrecognizable string of numbers, letters, and symbols—also known as ciphertext. The ciphertext is a computed value based on a key and plaintext data. To retrieve the original sensitive data within the ciphertext, you’ll need to use a key capable of converting the ciphertext data back into plaintext data.
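To make the key, plaintext, and ciphertext relationship concrete, here is a toy one-time-pad sketch using only the Python standard library. It is an assumption chosen for brevity, not a recommendation: production systems would use a vetted library and a standard algorithm such as AES.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of equal length (one-time pad)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the data length")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"4929145728239313"

# The key is random and must be kept secret.
key = secrets.token_bytes(len(plaintext))

# The ciphertext is computed from both the key and the plaintext.
ciphertext = xor_bytes(plaintext, key)

# Applying the same key again converts the ciphertext back to plaintext.
recovered = xor_bytes(ciphertext, key)
```

Unlike a token, the ciphertext here is derived from the sensitive value, so anyone holding the key can recover it without contacting any other system.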
Encryption vs. Tokenization
Encryption and tokenization are more different than they are similar. We found that contrasting the two would be more valuable in describing each method and in deciding which to use in what situations.
Format. As discussed earlier, tokens can take any shape, making them more readable and shareable to both machines and humans. They can be validated and identified without risk of exploitation. This is significantly different from encryption, where the resulting ciphertext is something that’s generally not within your control. This means organizations introducing AES-256 will need to ensure their systems, applications, and databases account for ciphertext lengths that may exceed, for example, the original 16-digit card number.
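The length growth can be estimated with simple arithmetic. The sketch below assumes AES in CBC mode with PKCS#7 padding, a 16-byte IV stored alongside the ciphertext, and Base64 text encoding; none of these choices come from the article, and other modes or encodings give different numbers.

```python
import math

AES_BLOCK_BYTES = 16  # AES uses 128-bit blocks regardless of key length


def cbc_stored_length(plaintext_bytes: int, iv_bytes: int = 16) -> int:
    """Bytes needed to store the IV plus the CBC ciphertext (PKCS#7 padded)."""
    # PKCS#7 always adds between 1 and 16 bytes, so an input that already
    # fills whole blocks still grows by one full block.
    padded = (plaintext_bytes // AES_BLOCK_BYTES + 1) * AES_BLOCK_BYTES
    return iv_bytes + padded


def base64_length(raw_bytes: int) -> int:
    """Characters needed to store raw bytes as padded Base64 text."""
    return 4 * math.ceil(raw_bytes / 3)


raw = cbc_stored_length(16)   # a 16-digit card number stored as 16 bytes
encoded = base64_length(raw)
print(raw, encoded)  # prints: 48 64
```

Under these assumptions, a column sized for 16 characters would need room for 64, which is the kind of schema change a format-preserving token avoids.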
Vulnerability. Encrypted data is created by performing operations on the sensitive data. In some cases, if an attacker is given enough data about the encryption algorithm and the environment in which it was run, the ciphertext can be reversed to plaintext data without a key, although newer encryption algorithms are usually sophisticated enough to make this sort of attack very difficult and time consuming. Tokens, on the other hand, do not depend on the sensitive data to be created, which means tokens may not have anything in common with the data they conceal. In order to exploit a token, the tokenization platform that created it must be exploited.
Authentication and Authorization. With encryption, the key used to decrypt the data is how you authorize and retrieve the data. And in most cases, you only have one key to use with the ciphertext. Consequently, to revoke access to ciphertext you must use a new key to create a new ciphertext. This can be inconvenient and unwieldy. With tokenization, there may be a multitude of ways to authenticate and authorize depending on the tokenization platform. API keys, PINs, passphrases, certificates, username and password: the list of ways to verify your identity goes on. The great thing about this is that your permissions may be entirely separate and different from those of someone you’ve allowed to access your tokens. You’re able to manage your access independent of someone else’s.
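The difference in revocation can be sketched as a per-actor grant table. The AccessPolicy class and the API key strings below are hypothetical and only illustrate the idea that access is tracked separately from the token itself.

```python
class AccessPolicy:
    """Toy per-token access list keyed by API key (illustration only)."""

    def __init__(self) -> None:
        # token -> set of API keys allowed to read the underlying value
        self._grants: dict[str, set[str]] = {}

    def grant(self, token: str, api_key: str) -> None:
        self._grants.setdefault(token, set()).add(api_key)

    def revoke(self, token: str, api_key: str) -> None:
        # Revoking one actor leaves the token and other actors untouched.
        self._grants.get(token, set()).discard(api_key)

    def can_read(self, token: str, api_key: str) -> bool:
        return api_key in self._grants.get(token, set())


policy = AccessPolicy()
policy.grant("tok_card_123", "key_billing")
policy.grant("tok_card_123", "key_partner")

policy.revoke("tok_card_123", "key_partner")

billing_ok = policy.can_read("tok_card_123", "key_billing")   # True
partner_ok = policy.can_read("tok_card_123", "key_partner")   # False
```

With a single shared decryption key, removing the partner's access would instead mean re-encrypting the data under a new key and redistributing that key to everyone else.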
Integration. Integrating with encryption works best when a widely accepted protocol is in place to exchange the keys necessary to encrypt and decrypt data. One of the most common protocols for this is HTTPS. Without a common protocol, securely sending over the necessary keys may be a significant challenge. Tokenization platforms offer a wide range of components and mechanisms for integrating with third parties, from forward and reverse proxies to direct integrations. Tokenization has the clear advantage here in that it can use a variety of different methods to send your data securely to a trusted third party.
Prevalence. Encryption is ubiquitous. You can find an implementation of encryption algorithms in just about every platform you look at. Encryption has been around for a long time, has many standards, and has solidified itself as the de facto tool to secure your data. Tokenization has a long way to go to catch up in this area. We’ve seen tokenization rise in popularity as of late, especially in the payments industry, and we’re seeing more platforms and integrations with different technologies every day. Soon we may see standards for tokenization in everyday use.
Independence. Encryption scrambles a raw value while tokenization generates a net new value. While both can be stored within a system, encrypted values carry the raw data with them wherever they go, whereas tokens reference the encrypted values held elsewhere. By keeping an encrypted value secured in a specialized database and using a token in your systems instead, you essentially decouple the raw data's risk from its utility. This independence is one of the big reasons developers and CISOs alike prefer tokens.
When to encrypt vs. when to tokenize
Choosing between encryption and tokenization often comes down to a question of how often the data needs to be accessed.
Encryption is best used when there is a smaller number of systems that need access to the concealed data. In order to make ciphertext meaningful to other systems, decryption keys have to be retrievable by those systems. If you're not already using an established protocol, safely making those keys available to other systems can be difficult. And leaked keys are often culprits in data breach attempts, as Samsung discovered when criminals used keys to leak company source code, or as NordVPN experienced when attackers gained root access to its VPN servers via its encryption keys. Encryption is an excellent option for when trusted actors need access to sensitive data.
Tokenization, on the other hand, is best when a store of sensitive data is utilized by many actors. These systems can offer forward and reverse proxies, and direct integrations. This makes sending sensitive data seamless and secure. Additionally, tokenization platforms let administrators easily retain control by making it straightforward to revoke or grant access to the underlying data. Tokenization is also ideal when you need to share sensitive data, and for simpler workflows across various systems.
At Basis Theory, we encrypt and tokenize data, as well as offer APIs to administer the back-end processes (like key management and access policies) needed to govern both.
Evolution of tokens
So far we've stayed pretty vanilla about tokens, but the next evolution of tokens goes beyond serving as a simple reference to the raw data. For example, Basis Theory's Tokens allow users to dynamically configure multiple properties with a single token, letting developers tailor permissions and masks while preserving format and length. Today's tokens can be searched, fingerprinted, and tagged, and with services like Proxy, they can be shared with any third-party endpoint.
Tokenization is the process by which tokens are created. A token is a reference to sensitive data that’s stored within a tokenization platform. Tokens are highly flexible, safe to expose, and, with the right tokenization platform, easy to integrate. Encryption, on the other hand, secures your data by using a key to obfuscate your sensitive data into ciphertext, making it a valuable method for protecting data at rest and for cases where only a few trusted actors need access to it.