Introduction
Large language models (LLMs) inevitably acquire sensitive information during training, such as data that exposes personal privacy, is subject to commercial licenses, or violates legal compliance requirements. LLMs must learn to withhold such sensitive information before they can be deployed at scale. This is the research field of machine unlearning.
Our work investigates the family of scalable unlearning algorithms (i.e., those efficient enough to run on models trained on billions of tokens) and shows that they cannot guarantee forgetting: an unlearned model cannot interact with users as if it had never seen the sensitive data, as long as we do not know how the model acquired that information in the first place. This is because unlearning is path-dependent by nature: the order in which a model receives new information affects how it forgets. An unlearning algorithm that ignores this is shooting in the dark.
Desiderata
Consider a model \( \theta \) trained on a dataset \( D = D_f \cup D_r \), partitioned into a forget set \( D_f \) and a retain set \( D_r \). The goal of an unlearning algorithm \( \mathcal{U} \) is to remove the influence of the forget set from the model's predictions. The following desiderata drive research in unlearning.
Retrain Equivalence
The unlearned model \( \mathcal{U}(\theta, D_f) \) should be behaviorally indistinguishable from a model retrained from scratch on the retain set \( D_r \) alone.
Local Unlearning
The unlearning update should be computable from the current model \( \theta \) and the forget set \( D_f \) alone, without access to the retain data or to the order in which the model was trained.
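In symbols (our notation for this page; \( \theta_0 \) denotes the initialization and \( \mathrm{Train} \) the full training procedure), Retrain Equivalence asks that
\[ \mathcal{U}(\theta, D_f) \;\approx\; \mathrm{Train}(\theta_0, D_r), \]
where \( \approx \) is read as behavioral indistinguishability on user-facing prompts, while Local Unlearning restricts \( \mathcal{U} \) to depend only on \( (\theta, D_f) \).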
Why is Retrain Equivalence Impossible?
Today's LLMs are trained in distinct stages: instruction tuning, alignment tuning, RL for reasoning capabilities, and so on. In production, these stages may further be interleaved. This staged structure is the source of the impossibility: as long as we do not know how the training stages were ordered, local unlearning algorithms are doomed to fail.
We argue impossibility by showing that unlearning is path-dependent. The relative order between the forget set and other training stages impacts what is unlearned and how fast unlearning occurs.
If we feed two models trained on the same datasets, but in different orders, to the same unlearning algorithm, their behaviors diverge in a path-dependent way; therefore they cannot both be close to the retrained baseline.
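As a toy illustration of this argument (ours, not the paper's experiment), the sketch below finetunes the same linear model on the same three datasets in two different orders, then applies the same gradient-ascent unlearning step to both copies; their predictions end up measurably different, so at most one of the two can be close to a retrained baseline.

# Toy illustration of path dependence (ours, not the paper's experiment):
# same datasets, different training orders, same gradient-ascent unlearning,
# yet the two resulting models make different predictions.
import torch

torch.manual_seed(0)

def make_task(n=64, d=8):
    X = torch.randn(n, d)
    w = torch.randn(d, 1)
    return X, X @ w + 0.1 * torch.randn(n, 1)

D_a = make_task()       # stands in for one retained training stage
D_b = make_task()       # stands in for another retained stage
D_f = make_task(n=32)   # the forget set

def sgd_stage(model, data, lr=1e-2, steps=200, sign=+1.0):
    """One finetuning stage; sign=-1.0 turns it into gradient ascent (unlearning)."""
    X, y = data
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (sign * torch.nn.functional.mse_loss(model(X), y)).backward()
        opt.step()

def run(order):
    torch.manual_seed(1)                        # identical initialization on both paths
    model = torch.nn.Linear(8, 1)
    for stage in order:                         # learn the stages in the given order
        sgd_stage(model, stage)
    sgd_stage(model, D_f, steps=50, sign=-1.0)  # local unlearning: sees only D_f
    return model

m1 = run([D_a, D_f, D_b])   # forget set learned in the middle
m2 = run([D_a, D_b, D_f])   # forget set learned last
probe = torch.randn(16, 8)
print((m1(probe) - m2(probe)).abs().mean())     # clearly nonzero: the outcome is path-dependent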
Real Life Example

Experiment: LLM Post-Training Pipeline
We empirically demonstrate the history-dependent nature of local unlearning algorithms with a four-stage finetuning pipeline that simulates today's LLM post-training workflow. The same base model is finetuned on identical datasets but in different orders, and the resulting models then undergo the same unlearning procedure. The outcomes diverge in a path-dependent manner: both the speed of forgetting and the spill-over onto other capabilities depend on the training history.
Training stages
- \( S_{\text{inst}} \) (Instruction Tuning): INSTRUCT-SKILLMIX • 4k pairs • 10 epochs
- \( S_{\text{tofu}} \) (Fictitious Knowledge): TOFU • 4k Q-A pairs • 4 epochs
- \( S_{\text{math}} \) (Math Reasoning): GSM8K rewrites • 8k items • 2 epochs
- \( S_{U} \) (Safety/Refusal, the unlearn set): synthetic refusals (“Sorry, I cannot assist…”) • 4.5k • 2 epochs
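A minimal sketch of how such a permutation pipeline can be driven (the finetune and unlearn helpers below are hypothetical stubs, and the specific subset of stage orders used in the paper is not reproduced here):

# Sketch of a permutation-based pipeline driver (illustrative, not the paper's code).
# `finetune` and `unlearn` are hypothetical stubs standing in for real training jobs,
# and the paper may use only a subset of the 24 possible stage orders.
from itertools import permutations

STAGES = {
    "S_inst": dict(dataset="instruct-skillmix",  epochs=10),
    "S_tofu": dict(dataset="tofu",               epochs=4),
    "S_math": dict(dataset="gsm8k-rewrites",     epochs=2),
    "S_U":    dict(dataset="synthetic-refusals", epochs=2),   # the forget set
}

def finetune(model_id, stage_name, cfg):
    return f"{model_id}+{stage_name}({cfg['epochs']}ep)"       # placeholder for an SFT job

def unlearn(model_id, cfg, algo="GA"):
    return f"{model_id}->unlearn[{algo}]"                      # placeholder for GA / NPO / SimNPO

for order in permutations(STAGES):          # every ordering of the four stages
    model = "base"
    for name in order:
        model = finetune(model, name, STAGES[name])
    model = unlearn(model, STAGES["S_U"])   # the same unlearning step is applied to every path
    print(" -> ".join(order), "|", model)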
Takeaways
- Different unlearning algorithms exhibit similar path-dependence. We experimented with three local unlearning algorithms: gradient ascent (GA), Negative Preference Optimization (NPO), and SimNPO. All three exhibit similar path-dependence, even in the presence of reference-based regularization terms (loss sketches follow this list).
- Recency effect: unlearning is hardest when the information is fresh. When no retained data is learned between the learning and the unlearning of the forget set, driving down the log-likelihood of the forget-set targets is consistently slower, often accompanied by a smaller spill-over effect on held-out capabilities. The mechanism behind this recency effect deserves further investigation, especially from the RL community.
- Stage recency slows forgetting. Later-stage information tends to persist longer, making recent stages harder to forget under local updates.
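For concreteness, here is a sketch of the three per-sequence losses as we read them from the respective papers (β and γ are hyperparameters; logp is the summed log-likelihood of a forget-set target under the current model, logp_ref the same quantity under a frozen reference model; treat the exact forms as our reading, not the paper's code):

# Per-sequence unlearning losses (sketch; follows our reading of the
# GA / NPO / SimNPO objectives -- check the original papers for exact forms).
import torch
import torch.nn.functional as F

def ga_loss(logp: torch.Tensor) -> torch.Tensor:
    """Gradient ascent: maximize NLL on the forget targets, i.e. minimize +log p."""
    return logp.mean()

def npo_loss(logp: torch.Tensor, logp_ref: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """NPO: penalize likelihood of the forget targets relative to a frozen reference model."""
    return (-2.0 / beta) * F.logsigmoid(-beta * (logp - logp_ref)).mean()

def simnpo_loss(logp: torch.Tensor, seq_len: torch.Tensor,
                beta: float = 2.5, gamma: float = 0.0) -> torch.Tensor:
    """SimNPO: reference-free variant built on length-normalized log-likelihood."""
    return (-2.0 / beta) * F.logsigmoid(-(beta / seq_len) * logp - gamma).mean()

Here logp and logp_ref are per-example sums of target-token log-probabilities (one value per forget-set example), computed as in the log-probability sketch further down the page.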

Shallow vs. Deep Forgetting
Definitions
- Shallow (superficial) forgetting: suppresses one phrasing of an undesired response, but paraphrases remain likely.
- Deep forgetting: reduces probability across paraphrases/semantically equivalent responses, not just the target string.
Design
Case study on Llama-3.2-3B. We curate 40 unsafe prompts; each has two compliance phrasings (C and U) and one refusal (R). We train under six learning permutations (θ₁…θ₆), then unlearn U via gradient ascent (GA) and track the log-probabilities of R, C, and U across unlearning epochs.
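The tracked quantity is simply the summed log-probability of each response conditioned on its prompt. A minimal sketch with Hugging Face transformers (the checkpoint id below is illustrative; substitute the model being unlearned):

# Minimal sketch: summed log-probability of a response under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-3B"   # assumption: adjust to the checkpoint you evaluate
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def response_logprob(prompt: str, response: str) -> float:
    """Sum of log p(response token | prompt and previous response tokens)."""
    # Assumes the prompt tokens form a prefix of the prompt+response tokenization.
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + response, return_tensors="pt").input_ids
    logits = model(full_ids).logits[:, :-1, :]                 # position t predicts token t+1
    logprobs = torch.log_softmax(logits, dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, n_prompt - 1:].sum().item()             # keep only response positions

# Evaluate R, C, and U under each unlearning checkpoint to trace the curves, e.g.:
# response_logprob(prompt, refusal_R), response_logprob(prompt, compliance_C), ...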
Outcome
Paths θ₁, θ₂, θ₅, θ₆ → shallow (only U drops). Paths θ₃, θ₄ → deep (both C and U drop below R). In short: the depth of forgetting is path-dependent.
Figure placeholder: log probability evolution during unlearning.
Artifacts
Grab the paper and code, and see how to reproduce key plots.
- Repro notes and ethics guidance in docs/ of the repo.
- Configs in configs/; run script under src/experiments/.
Citation
@article{yu2025impossibility,
title={On the Impossibility of Retrain Equivalence in Machine Unlearning},
author={Yu, Jiatong and He, Yinghui and Goyal, Anirudh and Arora, Sanjeev},
journal={Preprint},
year={2025},
note={Code and materials: REPO_URL}
}
Acknowledgments & contact
Questions or feedback? Open an issue in the repository or reach out to the authors.