A deep dive on AI model distillation attacks

Presented by

James Wilson

Enterprise Technology Editor

In this solo episode of Risky Business Features, James Wilson explores how distillation techniques are both a legitimate way to train smaller models and a way to steal model capabilities. It’s not just a problem for frontier labs! Any LLM-based product could have its competitive advantage stolen through these attacks.

James covers:

  • High-level concept of distillation
  • Why it matters, including the distinction between closed, open-weight, and open-source models
  • Types of distillation and the prompts used
  • The distillation pipeline end to end (a minimal sketch follows this list)
  • Distillation at scale and mitigation techniques
  • Hardware resource constraints for distillation
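
For context on the pipeline bullet above: the data-collection stage of a distillation attack typically amounts to querying the teacher model with many prompts and saving the (prompt, response) pairs as supervised fine-tuning data for a smaller student, in the spirit of Self-Instruct and Alpaca (both linked in the show notes below). The Python below is a minimal sketch of that stage only; `query_teacher`, the seed prompts, and the output filename are illustrative assumptions, not anything specified in the episode.

```python
# Minimal sketch of the query-and-collect stage of a distillation
# pipeline. All names here (query_teacher, the example prompts, the
# output path) are illustrative assumptions.
import json
import time


def query_teacher(prompt: str) -> str:
    # Hypothetical stand-in for a call to the teacher model's API.
    # Replace with your provider's client; returns a canned answer
    # here so the sketch runs end to end.
    return f"[teacher response to: {prompt}]"


# Seed instructions; Self-Instruct-style pipelines would also ask the
# teacher to generate new instructions from seeds like these.
seed_prompts = [
    "Explain the difference between open-weight and open-source models.",
    "Summarize the risks of model distillation for a product team.",
]

records, seen = [], set()
for prompt in seed_prompts:
    if prompt in seen:  # skip duplicate prompts
        continue
    seen.add(prompt)
    response = query_teacher(prompt)
    # Each (instruction, output) pair becomes one supervised
    # fine-tuning example for the student model.
    records.append({"instruction": prompt, "output": response})
    time.sleep(1.0)  # crude pacing; real attackers spread queries to evade rate limits

with open("distill_sft.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

Real pipelines add teacher-driven instruction generation, response filtering, and vastly larger query volumes; the pacing line above hints at why query-volume monitoring shows up among the mitigation techniques discussed in the episode.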

Show notes

Self-Instruct: Aligning Language Models with Self-Generated Instructions

Alpaca: A Strong, Replicable Instruction-Following Model

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Zephyr: Direct Distillation of LM Alignment

Stealing Part of a Production Language Model

Microsoft probes if DeepSeek-linked group improperly obtained OpenAI data, Bloomberg News reports

Detecting and preventing distillation attacks