The Sequence Chat: Gavin Uhma – CTO, Cape Privacy on Security Guardrails for LLMs
A chat about security and privacy in LLM interactions.

Quick bio
I’m Gavin Uhma, co-founder and CTO of Cape Privacy. I started Cape Privacy in 2018 based on the belief that data privacy would be one of the biggest barriers to the adoption of AI. Since then, we’ve worked with a number of privacy and security technologies and various forms of ML.

🛠 ML Work
Cape keeps your sensitive data private while you use LLMs: we redact sensitive personal, financial, and health data from your prompts. The explosion of interest in ChatGPT has brought privacy to the forefront. There have been many stories of employees entering financial data or proprietary code into ChatGPT, and credential breaches have exposed users’ ChatGPT history. It’s unfortunate when companies ban ChatGPT, because it’s such a great productivity tool. To address this, we’ve built CapeChat, a privacy-focused app on top of the ChatGPT API. Developers, meanwhile, have struggled with how to protect their end-users’ data while building LLM-based features. CapeChat is built on our Cape API, which makes it easy for developers to protect the sensitive data in their prompts before sending them to an LLM provider like OpenAI or Anthropic.
A secure enclave enables “confidential compute” which means it keeps data confidential while it is being computed, even from system admins and cloud providers. The Cape API runs entirely within a secure enclave, so no human can see what’s being processed (including the humans at Cape and AWS). We use secure enclaves for inference, embeddings, vector search, de-identification and even custom user-defined models and python functions.
The best thing you can do for privacy is to bring the model local to the data. Since GPT-4 is proprietary and behind an API, that hasn’t been an option. In addition, many developers appreciate the ease of use and pay-per-use pricing of an API. At Cape Privacy we thread the needle by giving you a way to protect sensitive data while using the best LLMs from companies like OpenAI and Anthropic.
When we de-identify data and send it to GPT-4, the LLM has no ability to see sensitive entities like PII, PCI, and PHI. For example, imagine a prompt: “Hi, my name is [FIRST_NAME] [LAST_NAME] and I live in [CITY]. Where do I live?” The LLM cannot rely on trained knowledge about “[CITY]”, but it can answer “You live in [CITY]”. When you re-identify, you convert the placeholder back to the original sensitive value: “You live in Halifax”. So, interestingly, an LLM still works even when the data is redacted. But what are some of the less obvious implications of redacting data for an LLM?
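That round trip can be sketched in a few lines of Python. This is a hypothetical illustration, not the Cape API: the entity patterns are hard-coded for the example prompt, whereas a real de-identification system would detect entities with a trained NER model.

```python
import re

# Hypothetical entity patterns for illustration only; a production
# de-identification system would use a trained NER model instead.
PATTERNS = {
    "NAME": re.compile(r"Gavin Uhma"),
    "CITY": re.compile(r"Halifax"),
}

def deidentify(prompt):
    """Replace sensitive entities with placeholders and return the redacted
    prompt plus the mapping needed to re-identify later."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        for value in set(pattern.findall(prompt)):
            placeholder = f"[{label}]"
            mapping[placeholder] = value
            prompt = prompt.replace(value, placeholder)
    return prompt, mapping

def reidentify(text, mapping):
    """Substitute the original values back into the LLM's response."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text

redacted, mapping = deidentify(
    "Hi, my name is Gavin Uhma and I live in Halifax. Where do I live?"
)
# The LLM sees only "[NAME]" and "[CITY]". Suppose it answers:
llm_response = "You live in [CITY]."
print(reidentify(llm_response, mapping))  # -> You live in Halifax.
```

The key design point is that the placeholder-to-value mapping never leaves your side of the API boundary: only the redacted prompt travels to the LLM provider.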
If you ask the LLM something like “Who is [NAME]?” it will say it doesn’t know, which for many people is a feature rather than a limitation. I believe many people have moved away from using LLMs for facts, and would rather rely on an LLM to repurpose the facts that they provide. The majority of the use cases we see at Cape are developers using their own databases or documents to provide the LLM with context.
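That pattern — retrieving your own documents to ground the prompt — can be sketched with a toy similarity search. This is an illustration only: the “embedding” here is a bag-of-words count, where a real system (such as Cape’s, running inside a secure enclave) would use an actual embeddings model and vector store.

```python
import math
import re

def embed(text, vocab):
    """Toy bag-of-words 'embedding'; a real system calls an embeddings model."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

documents = [
    "Cape Privacy was founded in 2018.",
    "Secure enclaves keep data confidential while it is being computed.",
]
vocab = sorted({w for d in documents for w in re.findall(r"[a-z0-9]+", d.lower())})

def retrieve(question):
    """Return the stored document most similar to the question."""
    q = embed(question, vocab)
    return max(documents, key=lambda d: cosine(embed(d, vocab), q))

# Build a grounded prompt from the best-matching document.
context = retrieve("What do secure enclaves do?")
prompt = (
    "Answer using only this context.\n"
    f"Context: {context}\n"
    "Question: What do secure enclaves do?"
)
```

The LLM then repurposes the retrieved context rather than relying on its trained knowledge, which is exactly the usage pattern described above.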
We run the entire vector store and embeddings model in a secure enclave, so developers can create embeddings and perform similarity searches entirely confidentially. This enables you to construct prompts from large datasets while keeping the data private. We can’t see your data, and we prevent third-party providers from seeing it.

💥 Miscellaneous – a set of rapid-fire questions
Tough question! I find quantization interesting, or in general, ways of making models more efficient.
Security and privacy should be a feature of every LLM platform; however, they are complex enough to justify standalone companies. As a simple example, companies want a single platform to manage data privacy if they are working with multiple LLM providers.
SMPC enables incredible opportunities such as co-training an LLM across companies. The resulting model can have a cryptographically shared ownership, while keeping the underlying datasets and model weights private. The downside of SMPC is the massive operational complexity and performance overhead, which most use-cases do not justify. But SMPC is legitimate, and it is deployed in the real-world today.
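As a flavor of how SMPC keeps values private, here is a minimal additive secret-sharing sketch. It is a toy (real SMPC frameworks add secure channels, authenticated shares, and multiplication protocols, which is where the operational complexity mentioned above comes from), but it shows the core trick: parties compute on shares without any single party seeing a plaintext value.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret, n=3):
    """Split a secret into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Two parties secret-share a value (e.g., a gradient component during
# co-training). Adding the shares pointwise yields shares of the sum,
# yet no single share-holder ever saw either plaintext input.
a_shares = share(42)
b_shares = share(100)
sum_shares = [(x + y) % PRIME for x, y in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # -> 142
```

Each individual share is uniformly random and reveals nothing about the secret; only the full set of shares reconstructs it, which is what makes cryptographically shared ownership of a co-trained model possible.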
I generally find jailbreaking really interesting.