What’s the best way to search through my encrypted data?
We’ve had customers ask us this question several times before. So now we’re proud to announce that Crypteron natively supports searchable encryption! In this post, we’ll cover exact searches, wildcard searches and fuzzy searches, all using industry-standard, battle-tested encryption algorithms. All this is available today and follows the same uber-simple programming model you’ve come to expect from us. Hooray 🎉!
But first – the natural tension
One of the fundamental objectives of strong encryption is to eradicate all patterns from encrypted data. Everything should look like noise or garbage. And all garbage should look the same. However, searching depends on patterns to traverse the search space. So there is a natural tension between strong encryption and efficient searching. These fundamentally opposing forces are why it’s very difficult to combine encryption with searching.
The broken ways
Before we dive into how Crypteron does it, let’s cover some of the broken ways other platforms have taken to achieve searchable encryption. They perform “encryption” in a way that utterly breaks the promises of modern, strong cryptography. Such crypto-sins include using AES in ECB mode on data larger than a single 128-bit block, or using a zero or constant initialization vector (IV). So even if you are technically “encrypting” your data, it’s not secure. In fact, some systems make this bad situation even worse by then encrypting each word individually! This utterly destroys AES security. Even the simplest frequency analysis attacks can effortlessly recover the “encrypted” data, in real time, on low-end mobile processors.
The images below show the original unencrypted source (“plain text”), an “encrypted” version using the kludges/hacks mentioned above, and finally a version using modern encryption the right way.
[Images: Original | Broken cryptography | Cryptography done correctly]
[Images, another example: Original | Broken cryptography | Cryptography done correctly]
You can visually see the leakage of information above. For those concerned with compliance, none of the broken approaches above would pass NIST or NSA criteria.
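The same leakage can be reproduced in a few lines. The following is a minimal Java sketch, using only the standard `javax.crypto` API, that shows why ECB mode is unsuitable: identical plaintext blocks always produce identical ciphertext blocks, so patterns in the data survive "encryption". The class name `EcbLeakSketch`, its helper methods and the all-zero demo key are illustrative assumptions, not part of any Crypteron API.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Demonstration of a BROKEN mode. Never use ECB for real data.
class EcbLeakSketch {

    // Encrypts data with AES in ECB mode (no IV, each block encrypted independently).
    static byte[] encryptEcb(byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-ECB unavailable", e);
        }
    }

    // Returns true if 16-byte ciphertext blocks i and j are byte-for-byte identical.
    static boolean sameBlock(byte[] ciphertext, int i, int j) {
        return Arrays.equals(
            Arrays.copyOfRange(ciphertext, 16 * i, 16 * i + 16),
            Arrays.copyOfRange(ciphertext, 16 * j, 16 * j + 16));
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];          // all-zero demo key, for illustration only
        byte[] data = new byte[32];         // two identical 16-byte plaintext blocks
        Arrays.fill(data, (byte) 'A');

        byte[] ciphertext = encryptEcb(key, data);

        // The repeated plaintext block shows up as a repeated ciphertext block.
        System.out.println("Blocks 0 and 1 identical: " + sameBlock(ciphertext, 0, 1));
    }
}
```

An attacker who sees repeated ciphertext blocks learns that the underlying plaintext repeats, which is exactly the foothold frequency analysis needs.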
The experimental ways
Exotic and experimental cryptography, such as homomorphic encryption or order-preserving encryption, has attracted a lot of academic interest. The ultimate goal is to permit certain operations (like searching) over encrypted data without loss of privacy or integrity. However, order-preserving encryption has been shown to leak data. Homomorphic encryption hasn’t proven very strong either. Plus, depending on which expert you talk to, it’s about a million to a billion times slower than today’s encryption systems. Commercial feasibility, if ever, is projected to be about 20-30 years away!
The point is that there is no need to risk your valuable data on unproven, experimental encryption algorithms. You get a false sense of security, end up wasting your security budget and get distracted from real solutions.
The solution
Short version
If you’re in a hurry, just know that Crypteron users only have to put [Secure(Opt.Search)] (in C#) or @Secure(opts = Opt.SEARCH) (in Java) in front of their search fields. Crypteron takes care of everything behind the scenes. Here are actual examples showing it in action.
C# Example
// Attributes on data class
public class Patient
{
    public int Id { get; set; }
    [Secure]
    public string FullName { get; set; }
    [Secure(Opt.Search)]
    public string SocialSecurityNumber { get; set; }
}
// To search for SSN 123-45-6789,
// generate a search prefix
var searchPrefix =
    SecureSearch.GetPrefix("123-45-6789");
// Use the search prefix in a query
var foundPatients = secDb.Patients.Where(p =>
    p.SocialSecurityNumber.StartsWith(searchPrefix)
);
Java Example
// Annotations on data class
public class Patient
{
    private int id;
    @Secure
    private String fullName;
    @Secure(opts = Opt.SEARCH)
    private String socialSecurityNumber;
}
// To search for SSN 123-45-6789,
// generate a search prefix
final String searchPrefix =
    SecureSearch.getPrefix("123-45-6789");
// Use the search prefix in a query:
final TypedQuery<Patient> query =
    entityManager.createQuery(
        "SELECT p FROM Patient p "
        + "WHERE p.socialSecurityNumber LIKE :searchPrefix", Patient.class);
query.setParameter("searchPrefix", searchPrefix + "%");
final Patient foundPatient = query.getSingleResult();
Long version – behind the scenes orchestration
Behind the scenes, Crypteron is generating an in-place, cryptographically secure, distributed search index. This happens as each piece of data is added, and the index is constructed on-the-fly on a per searchable column/field basis. The distributed search index uses an HMAC-SHA256 primitive, and the HMAC cryptographic keys are entirely separate from the data encryption keys. This encrypted search index is distributed across all searchable fields, and its storage adds about 33 bytes to each searchable field. The run-time performance impact is negligible, almost the same as non-searchable fields. Of course, you are shielded from all the complexities: the platform orchestrates it auto-magically behind the scenes. When searching the database, Crypteron’s SDK provides an API that returns a search token (the search prefix in the examples above). You pass this search token to the database to perform a native query, all without decrypting any data! So if you’re searching for a “Maria” in your database, you immediately get it back at native lookup speeds.
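To make the idea concrete, here is a minimal Java sketch of a keyed-hash search index of the general kind described above. It is not Crypteron's actual implementation or storage format: the class `BlindIndexSketch`, its methods, the Base64 encoding and the "prefix followed by ciphertext" layout are all assumptions chosen for illustration. The only point is that a deterministic HMAC-SHA256 tag, computed with a key that is separate from the data encryption key, can sit in front of the ciphertext so that the database can match it with an ordinary prefix query.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

// Illustrative sketch only. Names and layout are hypothetical.
class BlindIndexSketch {

    // Computes a deterministic, keyed search prefix for a plaintext value.
    // indexKey is an HMAC key kept separate from the data encryption key.
    static String searchPrefix(byte[] indexKey, String plaintext) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(indexKey, "HmacSHA256"));
            byte[] tag = mac.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            // 32-byte tag, Base64-encoded so it can live in a text column.
            return Base64.getEncoder().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Value stored in the column: search prefix followed by the (already encrypted) data.
    static String storedValue(byte[] indexKey, String plaintext, String ciphertextBase64) {
        return searchPrefix(indexKey, plaintext) + ciphertextBase64;
    }

    public static void main(String[] args) {
        byte[] indexKey = "demo-index-key-not-for-production".getBytes(StandardCharsets.UTF_8);

        // "<ciphertext>" stands in for the real AES-GCM output of the data key.
        String stored = storedValue(indexKey, "Maria", "<ciphertext>");

        // At query time, only the prefix is recomputed; nothing is decrypted.
        String queryPrefix = searchPrefix(indexKey, "Maria");
        System.out.println("Match: " + stored.startsWith(queryPrefix));
    }
}
```

Because the tag is keyed, someone who sees the column but does not hold the index key cannot compute tags for guessed values. Equal plaintexts do produce equal tags, though, which is what makes the exact-match lookup possible.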
All under warranty
What’s great is that all other Crypteron features continue to work just fine. This means your actual data is encrypted with AES in GCM mode (super strong) and uses unique, cryptorandom IVs. You also get both self-integrity and tamper protection. Note that tamper protection is subtly distinct from self-integrity. Integrity means that an attacker cannot modify encrypted data (e.g. an intern’s salary) without an alarm going off. Tamper protection ensures that one cannot replace one perfectly fine encrypted value with another (e.g. replace the intern’s encrypted salary with the CEO’s encrypted salary) without an alarm going off. Crypteron effortlessly gives you both.
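The difference between the two properties is easier to see in code. Below is a minimal Java sketch of one common way to get both with AES-GCM: the authentication tag provides integrity, and binding each ciphertext to its location through GCM's associated data makes a swapped value fail authentication. Whether Crypteron does it this way is not stated in this post. The class `GcmContextSketch`, the context strings such as `employee:7:salary` and the IV-then-ciphertext layout are assumptions for illustration.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustrative sketch only. Names and layout are hypothetical.
class GcmContextSketch {
    private static final SecureRandom RNG = new SecureRandom();

    // Encrypts with a fresh random 12-byte IV. The context (e.g. row and field)
    // is authenticated as associated data but not stored in the output.
    static byte[] encrypt(byte[] key, String context, String plaintext) {
        try {
            byte[] iv = new byte[12];
            RNG.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(128, iv));
            cipher.updateAAD(context.getBytes(StandardCharsets.UTF_8));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = Arrays.copyOf(iv, iv.length + ct.length); // IV || ciphertext+tag
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM encryption failed", e);
        }
    }

    // Returns the plaintext, or null if the data was modified or moved to another context.
    static String decrypt(byte[] key, String context, byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(128, blob, 0, 12));
            cipher.updateAAD(context.getBytes(StandardCharsets.UTF_8));
            byte[] pt = cipher.doFinal(blob, 12, blob.length - 12);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (AEADBadTagException e) {
            return null; // the "alarm": authentication failed
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32];
        RNG.nextBytes(key);
        byte[] ceoSalary = encrypt(key, "employee:1:salary", "900000");

        // Swapping the CEO's encrypted salary into the intern's row is detected.
        System.out.println("Swap accepted: "
            + (decrypt(key, "employee:7:salary", ceoSalary) != null));
    }
}
```

Flipping any byte of a stored value trips the same check, which is the self-integrity half of the story.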
Advanced searches
Wildcards (e.g. “Mar*”)
The above is great when handling exact matches such as “Maria”. But what about other search patterns, for example “Mar*”? The general idea is to first list your search requirements, then build a specific search index for each as an optimization. This may sound complex, but it’s really simple. Let’s illustrate with an example. We’ll use C# syntax; Java is similar.
Business requirement: Must be able to search by the first three letters of a customer’s first name (Example: “Mar*”)
Steps:
- Create another field, say, FirstNameFilter. This will only contain the first three letters, in lower case, of the customer’s first name. So while FirstName may contain “Maria”, FirstNameFilter will contain “mar”. As you’ll see, the lower-case trick increases the versatility of this approach.
- Mark the FirstNameFilter field as Secure-Searchable, i.e. [Secure(Opt.Search)]. Note that FirstName itself could be marked as [Secure] or [Secure(Opt.Search)]. The latter adds an exact search use case if you have one.
- Pass the first 3 letters you receive, lower-cased the same way, to GetPrefix() in the Crypteron SDK/agent library to get the search token.
- Issue the query as usual to the database using that search token.
This way you’ll get all customers like Mary, mary, Martha, maRTHa, Margaret, mariA, marie, Marilyn etc at native database search speeds.
Extend this pattern if you have similar search requirements on other fields. For example: Search via first 3 characters of last name or last 4 of social security number.
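The filter values themselves are plain string manipulation done before the field is handed to the encryption layer. Here is a minimal Java sketch of the two filters mentioned above. The class `SearchFilterSketch` and its method names are hypothetical helpers, not part of the Crypteron SDK. Whatever normalization is applied when storing must also be applied to the search term before requesting a search token.

```java
import java.util.Locale;

// Illustrative helpers for deriving filter-field values. Names are hypothetical.
class SearchFilterSketch {

    // First n characters of the value, trimmed and lower-cased,
    // e.g. the value to store in a FirstNameFilter field.
    static String prefixFilter(String value, int n) {
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        return normalized.substring(0, Math.min(n, normalized.length()));
    }

    // Last n characters of the value, e.g. the last 4 of a social security number.
    static String suffixFilter(String value, int n) {
        String trimmed = value.trim();
        return trimmed.substring(Math.max(0, trimmed.length() - n));
    }

    public static void main(String[] args) {
        // "Maria", "maRTHa" and "MARY" all map to the same filter value.
        System.out.println(prefixFilter("Maria", 3));
        System.out.println(prefixFilter("maRTHa", 3));
        System.out.println(prefixFilter("MARY", 3));
        System.out.println(suffixFilter("123-45-6789", 4));
    }
}
```

Lower-casing with `Locale.ROOT` keeps the result independent of the server's default locale, so the stored filter and the query-time filter always agree.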
Fuzzy searches (e.g. “1-2-3”, “12 3”, “123”)
What about fuzzy searches? For example, a US formatted phone number where (123) 456-7890, 1234567890 and 123 456 7890 all really mean the same thing.
You guessed it: create a special search index. Except now you pre-process the string to strip the non-digit characters before storing it in the searchable field. The same approach works for dates like 12/31/2012, 12-31-2012 or 12.31.2012 and so on.
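The pre-processing step can be as small as one regular expression. The following minimal Java sketch assumes a hypothetical helper class `FuzzyNormalizeSketch`. It is not a Crypteron API and only prepares the value that would go into the searchable field and, later, into the search token request.

```java
// Illustrative normalization for "fuzzy" matching. Names are hypothetical.
class FuzzyNormalizeSketch {

    // Removes every character that is not an ASCII digit.
    static String digitsOnly(String input) {
        return input.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        // All three spellings of the phone number collapse to the same string.
        System.out.println(digitsOnly("(123) 456-7890"));
        System.out.println(digitsOnly("1234567890"));
        System.out.println(digitsOnly("123 456 7890"));
    }
}
```

For dates, stripping separators only unifies inputs that already use the same field order and zero padding, so 1/2/2012 and 01/02/2012 would still need to be brought to one canonical form first.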
Full wildcard searches (e.g. “*4567*”)
What if you want to run fully generalized wildcard searches over strongly encrypted data? We first suggest that architects step back and reconsider, because oftentimes the business case is more pragmatically solved by the simpler, safer approaches above. But if you absolutely need full wildcard searches, the flexibility of Crypteron’s “run-anywhere” agents makes this possible. On SQL Server, this means you can run the C# version of the Crypteron agent inside SQL Server as a SQL User Defined Function and perform wildcard searches quickly over encrypted data! Please see github.com/crypteron/crypteron-sql-clr-demo for details.
Conclusion
There you have it, searchable encryption over your encrypted data at native search speeds. The above scenarios should cover the vast majority of business requirements. All without compromising the security of your data via broken or experimental cryptography.
If you have any questions, comments or concerns, please do not hesitate to drop us a line at support@crypteron.com. We’d love to solve your data security challenges.