Data Tokenization: 10 of Our Most Frequently Asked Questions
There are many flavors of tokens and tokenization. This FAQ focuses on data tokenization: the act of converting sensitive data into non-sensitive values called tokens.
Data Tokenization FAQs
What is tokenization?
Tokenization refers to the process of generating a digital identifier, called a token, to reference an original value. For example, tokenizing an original value like a credit card number might produce a token that looks like bb8b7ed0-fee5-11ec-9686-ff66557783a8.
What are tokens?
A token is the new value generated through the process of tokenization. It replaces sensitive data wherever that data would otherwise exist within an application or database. In our previous example, the token is bb8b7ed0-fee5-11ec-9686-ff66557783a8. Tokens are unique to the system that creates them, meaning that while my card may be bb8b7ed0-fee5-11ec-9686-ff66557783a8 in one system, it could be 4343-4343-4343-4343 in another.
Tokens allow developers to safely use sensitive data within their applications, databases, or devices without exposing the system to the risks and requirements of holding it.
How do you use a token?
To create a token, an application sends the original value, like a credit card number, to a specialized environment called a token vault or tokenization system. Here, two things happen: 1) the original value is encrypted and stored for safekeeping, and 2) a token is generated and returned to the application, or stored in a database, for future use.
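To make those two steps concrete, here is a minimal Python sketch of a token vault. It assumes a hypothetical in-memory `TokenVault` class and uses random UUIDs as tokens, which is one common choice and not necessarily what any particular platform does. For brevity it keeps values in a plain dictionary; a real vault would encrypt them and hold them in an isolated, compliant environment.

```python
import uuid


class TokenVault:
    """Hypothetical in-memory token vault, for illustration only."""

    def __init__(self) -> None:
        # token -> original value. A real vault would encrypt each value and
        # keep it in an isolated, compliant environment; encryption is omitted.
        self._store: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Generate a random token that carries no information about the value,
        # store the original value under it, and hand the token back.
        token = str(uuid.uuid4())
        self._store[token] = value
        return token


vault = TokenVault()
card = "4111111111111111"  # a widely used test card number, not a real card

token = vault.tokenize(card)

# The application keeps only the token, e.g. in its orders table.
order = {"order_id": 1001, "payment_card": token}

# A second, independent vault issues its own, unrelated token for the same card.
other_token = TokenVault().tokenize(card)
```

Because each vault draws its own random tokens, the same card ends up with unrelated tokens in different systems, which is why a token only has meaning inside the system that issued it.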
Detokenization is the reverse: exchanging a token for access to the original value. To do this, an actor (e.g., a system or a person) sends the token to the tokenization system. The tokenization system then authenticates the actor and checks that it has permission to use the underlying data. Only after these checks pass is the token detokenized, so the original value can be used according to the actor’s instructions.
Let’s imagine you are an eCommerce company. To offer repeat customers a faster checkout experience (and reduce your security and compliance risks), you’ve tokenized my credit card. However, if your backend sent the token as-is to a card processor, my payment would fail, because the processor needs the real card number.
Instead, your checkout application sends the token and payment instructions to the tokenization system. Once the tokenization system verifies that the checkout application has the necessary access and permissions, it detokenizes my previously tokenized payment data. From there, the system may forward my card and payment information to a card processor.
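The checkout flow above can be sketched in Python as follows. Everything here is hypothetical: the `TokenizationSystem` class, the `card:use` permission name, and `forward_to_processor`, which stands in for a real card processor integration. The sketch assumes the caller has already been authenticated, so `actor` is a verified identity, and it focuses on the authorization check that gates detokenization.

```python
import uuid


def forward_to_processor(card: str, amount_cents: int) -> dict:
    # Stand-in for a real card processor integration (hypothetical).
    return {"status": "approved", "last4": card[-4:], "amount_cents": amount_cents}


class TokenizationSystem:
    """Hypothetical tokenization system with per-actor permissions."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}             # token -> card number
        self._permissions: dict[str, set[str]] = {}  # actor -> permissions

    def grant(self, actor: str, permission: str) -> None:
        self._permissions.setdefault(actor, set()).add(permission)

    def tokenize(self, card: str) -> str:
        token = str(uuid.uuid4())
        self._store[token] = card
        return token

    def charge(self, actor: str, token: str, amount_cents: int) -> dict:
        # `actor` is assumed to be already authenticated. Authorization:
        # only actors holding "card:use" may have card tokens detokenized.
        if "card:use" not in self._permissions.get(actor, set()):
            raise PermissionError(f"{actor} is not allowed to use card tokens")
        card = self._store[token]  # detokenize inside the system
        # Forward the real card to the processor; the caller only sees the result.
        return forward_to_processor(card, amount_cents)


system = TokenizationSystem()
token = system.tokenize("4111111111111111")
system.grant("checkout-app", "card:use")

# The checkout application sends the token plus payment instructions.
receipt = system.charge("checkout-app", token, amount_cents=2599)

# An actor without the permission is refused.
try:
    system.charge("analytics-job", token, amount_cents=2599)
    denied = False
except PermissionError:
    denied = True
```

In this sketch the checkout application never receives the card number: detokenization and forwarding both happen inside the tokenization system, and the caller only gets back the processor's result.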
Why would someone want to use tokens?
Tokens can take any shape and, because they hold no sensitive data, are safe to expose. This lets them slot into existing systems in place of sensitive data. You can reduce your systems’ risk and compliance burden while unlocking new revenue-driving or cost-saving opportunities. Some of the core benefits include:
- Reduce risks: Applications, devices, and databases collect, store, and use sensitive data to complete day-to-day operations, making them targets for adversaries. In addition, the more places sensitive data is spread across, the larger the attack surface from which criminals can exfiltrate it.
Tokens are meaningless and unusable to anyone without the access and permissions to detokenize them. As a result, adversaries who exfiltrate tokens from an application, database, or device cannot see or use the underlying data. Meanwhile, my real credit card information stays encrypted and stored in a firewalled environment.
- Reduce or eliminate compliance requirements: Companies, industry groups, and governments have created rules to govern the use, storage, and management of sensitive data, like personally identifiable information (PII), primary account numbers (PANs), and bank account data. The protections these rules mandate, such as access controls, firewalls, and audits, aim to prevent the theft and abuse of the data organizations use.
Many applications, databases, and devices rely on seeing or holding sensitive plaintext data to complete day-to-day operations. Doing so can bring these systems “into scope” for compliance, creating significant complexity, overhead, and costs for their owners. The more places this plaintext data exists, the more effort it takes to ensure proper compliance, which can slow an organization’s response to shifting markets or customer demands. To complicate matters, the rules and requirements governing this data often change based on factors like geography, data type, and usage.
By replacing sensitive data with tokens, you reduce the number of applications, databases, and devices that interact with the plaintext value (e.g., credit card numbers). In doing so, tokens shrink the scope of compliance requirements and their impact on an organization. To give you a sense of this effect, customers using Basis Theory’s compliant environment to store their encrypted sensitive values can reduce their PCI Level 1 reporting requirements by up to 90%.
With a smaller compliance footprint comes the ability to respond to shifting regulations more quickly. Once sensitive data is centralized, encrypted, and stored in a compliant location, organizations can more easily adapt to new data residency laws, data protection requirements, and industry standards. And because their applications use tokens instead of the plaintext values, they can accommodate these new requirements without disrupting day-to-day operations.
- Safely use sensitive data: Historically, the risks and compliance requirements governing sensitive data have made it difficult to move beyond its primary use case (e.g., only using SSNs for credit checks). By quarantining and locking down this data, organizations lose opportunities to make better risk decisions, create new partnerships, and design more unified customer experiences.
By abstracting sensitive data, replacing it with tokens, and gating access to the original values, developers can unlock new partnerships, products, and insights that drive revenue or save costs.
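One way to picture this scope reduction is field-level tokenization applied before a record ever reaches an application database. The Python sketch below assumes hypothetical field names (`card_number`, `ssn`) and a hypothetical `tokenize_record` helper, and uses a plain dictionary as a stand-in for the vault; only that stand-in ever holds the plaintext values.

```python
import uuid

# Hypothetical names of the fields that must never reach the app database.
SENSITIVE_FIELDS = {"card_number", "ssn"}


def tokenize_record(record: dict, vault: dict) -> dict:
    """Return a copy of `record` with sensitive fields replaced by tokens."""
    safe = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            token = str(uuid.uuid4())
            vault[token] = value  # plaintext lives only in the (stand-in) vault
            safe[field] = token
        else:
            safe[field] = value
    return safe


vault: dict = {}
customer = {
    "id": 42,
    "plan": "premium",
    "card_number": "4111111111111111",
    "ssn": "123-45-6789",
}

# Only `stored` is written to the application database, which therefore
# never holds a card number or SSN.
stored = tokenize_record(customer, vault)
```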
What are tokenization platforms, and why are they valuable?
Tokenization refers to the process of creating and using a token. A tokenization platform is the package of compliant functions, support, and infrastructure needed to unlock all the benefits tokens offer. Tokenization platforms can be built in-house or purchased, but they typically contain some variation of the following:
- A compliant infrastructure: Auditors and regulators require that companies secure and manage sensitive data in a compliant location. These requirements can vary by country or region, data type, etc. Tokenization platforms may offer managed hosted environments that comply with these specialized requirements or accommodate a customer's existing infrastructure.
- Access controls and permissions: Sensitive data should only be accessible to, and editable by, those with the proper access and permissions. Tokenization platforms allow developers to quickly assign NIST impact levels and classifications to their applications and data, and they offer, or connect to, existing tools that make it easier to manage these controls at scale.
- Developer services, tools, and documentation: Being able to tokenize and store sensitive data is only part of the equation. Developers require tools, like APIs, services, and compute capabilities, to embed, build, and support tokenization within their own systems or a third party’s.
Tokenization platforms provide the foundation and supporting documentation developers need to build new products, partnerships, and insights. For example, with Basis Theory’s platform, you can search sensitive data without detokenizing or decrypting the original value, collect user data seamlessly from within your application, or route requests and responses to and from third parties.
- Token capabilities: Tokenization platforms give their tokens built-in flexibility, allowing developers to interact with their sensitive data much like plaintext. Here are some of the properties developers may receive on Day One:
  - Aliasing: Tokens can be formatted or lengthened to preserve the look and feel of the underlying data.
  - Masking: Tokens can reveal all or part of the original value.
  - Tagging: Tokens can carry metadata, allowing you to, for example, reference all of the data owned by a single customer.
  - Fingerprinting: Tokens that reference identical data can be correlated, allowing developers to create irreversible relationships between multiple tokens without revealing the underlying values.
  - Impact levels and classifications: Tokens can ensure only authenticated and authorized actors have access and permissions to the underlying data.
  - Searching: Tokens can be indexed, allowing developers to search the underlying data set without decrypting the original values.
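Several of these capabilities can be approximated with the Python standard library. The sketch below is illustrative and not any platform's implementation: masking keeps the last four characters, aliasing generates random digits that preserve the length and last four, and fingerprinting uses a keyed hash (HMAC-SHA256) with a hypothetical secret key. The same fingerprints also support a simple search index, sometimes called a blind index. The token IDs (`tok_1` and so on) are placeholders.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret used only for fingerprints; a platform would manage it.
FINGERPRINT_KEY = secrets.token_bytes(32)


def mask(value: str, visible: int = 4) -> str:
    # Masking: reveal only the last `visible` characters.
    return "*" * (len(value) - visible) + value[-visible:]


def alias(card: str) -> str:
    # Aliasing: random digits with the same length and the same last four,
    # so the token still looks like a card number.
    prefix = "".join(secrets.choice("0123456789") for _ in range(len(card) - 4))
    return prefix + card[-4:]


def fingerprint(value: str) -> str:
    # Fingerprinting: a keyed one-way hash. Identical values always produce
    # identical fingerprints, but the value cannot be recovered from it.
    return hmac.new(FINGERPRINT_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


card = "4111111111111111"
masked = mask(card)    # "************1111"
aliased = alias(card)  # e.g. 16 random digits ending in 1111

# Searching: when values are tokenized, the platform records each token under
# the value's fingerprint. Later, a query is fingerprinted and looked up, so
# no stored value has to be decrypted to find matches.
tokenized = [("tok_1", card), ("tok_2", "5555555555554444"), ("tok_3", card)]
index: dict[str, list[str]] = {}
for token_id, value in tokenized:
    index.setdefault(fingerprint(value), []).append(token_id)

matches = index.get(fingerprint("4111111111111111"), [])  # ["tok_1", "tok_3"]
```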
Who cares about Data Tokenization?
Today, demand tends to come from industries subject to financial, data security, or regulatory compliance standards: think insurance, payments, healthcare, and lending platforms. Tomorrow, however, is a different story.
Governments and regulators in places like India and the European Union are introducing new data security and residency requirements, and clarifying existing ones, at a dizzying pace. These well-meaning laws require companies to accommodate varying levels of access, localization, and rules across multiple countries, depending on where the underlying data is stored and how it is used. This trend has added significant complexity and compliance risk for organizations.
Tokenization platforms provide an abstraction layer that allows organizations to lock down raw values in compliant, hosted, and localized environments while their applications maintain daily operations with harmless tokens. It’s one of a few reasons many industry and government regulators explicitly endorse tokenization as a viable solution for compliance.
Where can tokens be stored or shared?
Because tokens do not contain the original plaintext values, they can be stored or shared anywhere.
What kind of data can be tokenized?
Practically any data, including structured records, files, and images. We like to say, “If it can be serialized, it can be tokenized.”
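“Serializable” means the data can be turned into bytes, which is what a vault ultimately stores. Here is a minimal Python sketch, assuming a hypothetical in-memory vault and a hypothetical `tokenize_bytes` helper: a structured record is serialized to JSON first, while a file's contents (here just the 8-byte PNG file signature, standing in for an image) are already bytes.

```python
import json
import uuid

# Hypothetical in-memory vault: token -> stored bytes.
vault: dict[str, bytes] = {}


def tokenize_bytes(data: bytes) -> str:
    # The vault stores raw bytes, so anything serializable to bytes fits.
    token = str(uuid.uuid4())
    vault[token] = data
    return token


# A structured record is serialized to JSON bytes before tokenizing.
profile = {"name": "Jane Doe", "ssn": "123-45-6789", "dob": "1990-01-01"}
profile_token = tokenize_bytes(json.dumps(profile).encode("utf-8"))

# A file is already bytes; the 8-byte PNG signature stands in for an image here.
image_token = tokenize_bytes(b"\x89PNG\r\n\x1a\n")

# Detokenizing and deserializing recovers the original record.
restored = json.loads(vault[profile_token].decode("utf-8"))
```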
What’s the difference between tokenization and encryption?
Fundamentally, tokenization creates an entirely new value. By decoupling the sensitivity from the value, tokens allow an organization to lock down, say, a Social Security number in a protected environment, while its token counterpart lives out its life in any number of applications, devices, or databases.
Encryption, on the other hand, works by scrambling the original value into an unrecognizable state, called ciphertext, and unscrambling it back into its original plaintext version. While the ciphertext may look like a brand-new value, it is derived from the SSN and still contains it in scrambled form: anyone who obtains the key can recover the original.
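The difference can be shown in a few lines of Python. The standard library has no symmetric cipher, so this sketch uses a toy stream cipher (a SHA-256 keystream XORed with the plaintext) purely for illustration; it is unauthenticated and no substitute for a vetted cipher such as AES-GCM. The point is structural: the ciphertext can be reversed with the key alone, while the token can only be resolved by the vault that issued it.

```python
import hashlib
import secrets
import uuid

NONCE_SIZE = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter) blocks, concatenated.
    blocks = []
    counter = 0
    while len(blocks) * 32 < length:
        blocks.append(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # NOT real encryption: unauthenticated, for illustration only.
    nonce = secrets.token_bytes(NONCE_SIZE)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_SIZE], ciphertext[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


ssn = b"123-45-6789"

# Encryption: the ciphertext is computed from the SSN, so the key alone
# (plus the ciphertext) is enough to get the SSN back.
key = secrets.token_bytes(32)
ciphertext = toy_encrypt(key, ssn)
recovered = toy_decrypt(key, ciphertext)

# Tokenization: the token is random and unrelated to the SSN. The only way
# back is a lookup in the vault that issued it.
vault = {}
token = str(uuid.uuid4())
vault[token] = ssn
```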
What are some of the challenges to tokenization?
Tokens rely on a request to the tokenization system to retrieve the underlying data, which adds latency; however, this can be mitigated through geo-replication, horizontal and vertical scaling of resources, concurrency, and caching.
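Caching is the simplest of these mitigations to sketch. The `TTLCache` class below is hypothetical: it wraps a detokenization call (here a stub named `slow_detokenize`) and reuses results for a fixed time-to-live, with an injectable clock so expiry can be demonstrated without waiting.

```python
import time
from typing import Callable


class TTLCache:
    """Minimal time-to-live cache in front of a detokenization call."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # the slow round trip to the tokenization system
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # token -> (expiry, value)

    def get(self, token: str) -> str:
        now = self._clock()
        entry = self._entries.get(token)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no request to the vault
        value = self._fetch(token)  # miss or expired: ask the vault again
        self._entries[token] = (now + self._ttl, value)
        return value


calls: list[str] = []


def slow_detokenize(token: str) -> str:
    # Hypothetical stub for a remote detokenization request.
    calls.append(token)
    return "4111111111111111"


fake_now = [0.0]
cache = TTLCache(slow_detokenize, ttl_seconds=30, clock=lambda: fake_now[0])

cache.get("tok_abc")  # miss: one request to the vault
cache.get("tok_abc")  # hit: served from memory
fake_now[0] = 31.0
cache.get("tok_abc")  # expired: a second request
```

A cache like this holds plaintext in the application's memory, so it gives back some of the exposure that tokenization removes; a short TTL keeps that window small.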
Simply generating a token won’t restrict access to the stored data or address the other compliance, security, and risk challenges of housing and using it. To take advantage of all that tokenization can offer, developers would need to build and maintain many other compliance requirements and best practices. These may include, but are not limited to, managing users, enabling permissions, managing encryption keys properly, standing up compliant environments, and much more.
As noted, tokenization platforms like Basis Theory offer these services out of the box.
If encryption keys can be stolen, so can the API keys used to broker access and permissions to the underlying data. These API keys must be protected to the fullest extent possible. Fortunately, tokenization systems support more authentication methods than encryption keys alone and are far easier to maintain as your system scales. Combined, these two factors reduce a significant amount of the risk that exists regardless of the tool used to secure and share sensitive data today.