
Heretic: Fully automatic removal of censorship from language models

◷ 4 min read 5/4/2026


In the world of local LLMs, censorship and safety alignment are often a major constraint. Many models refuse to respond to “sensitive” requests, even when those requests are legitimate.

Heretic is an open-source tool by developer p-e-w that automatically removes censorship from transformer models without expensive post-training. It became one of the most popular solutions in the LocalLLaMA community in 2025-2026.

How does Heretic work?

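Heretic uses a technique called directional ablation (also known as abliteration), based on research by Arditi et al. (2024) and subsequent work. The idea is to find a "refusal direction" in the model's internal activations and then modify the weights so the model can no longer write along that direction.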
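The core idea of directional ablation can be sketched in a few lines. The sketch below is an illustration, not Heretic's actual code: it uses plain Python lists to stay dependency-free, toy activation vectors, and a hypothetical `strength` parameter. It estimates the refusal direction as the normalized difference of mean activations on harmful and harmless prompts, then removes that direction from the output of a weight matrix.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts),
                                  mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(weight, direction, strength=1.0):
    """Return W - strength * r (r^T W).

    `weight` is a list of rows with shape (d_out, d_in) and `direction`
    has length d_out. `strength` is a hypothetical knob for this sketch.
    """
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    proj = [sum(direction[i] * weight[i][j] for i in range(d_out))
            for j in range(d_in)]
    return [[weight[i][j] - strength * direction[i] * proj[j]
             for j in range(d_in)]
            for i in range(d_out)]

# Toy activations (assumed values): harmful prompts differ along axis 0 only.
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
r = refusal_direction(harmful, harmless)

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_ablated = ablate(W, r)
```

With `strength=1.0` the ablated matrix can no longer produce any output along `r`. Real implementations work on tensors and apply this to the matrices that write into the residual stream.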

The key innovation is automatic parameter optimization using TPE (Tree-structured Parzen Estimator) through the Optuna library. The tool simultaneously minimizes:

  • Number of refusals on harmful prompts
  • KL divergence from the original model (to preserve as many original abilities as possible)

This makes the process completely automatic – no manual tuning or deep understanding of transformer architecture is required.
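The two objectives named above can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the refusal markers, the sample responses, the toy token distributions, and the direction of the KL divergence. Heretic's own scoring may differ in all of these.

```python
import math

# Hypothetical refusal markers; a real list would be longer.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "sorry")

def count_refusals(responses):
    """Count responses containing any refusal marker (case-insensitive)."""
    return sum(any(marker in response.lower() for marker in REFUSAL_MARKERS)
               for response in responses)

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions.

    Assumes both lists cover the same tokens and q has no zero entries.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy data (assumed): three responses and one next-token distribution
# from the original and the modified model.
responses = [
    "Sorry, I can't help with that.",
    "Sure, here is an overview.",
    "I cannot assist.",
]
original = [0.7, 0.2, 0.1]
modified = [0.6, 0.3, 0.1]

objectives = (count_refusals(responses), kl_divergence(original, modified))
```

An optimizer would treat the pair in `objectives` as two values to minimize at once: fewer refusals, and a distribution that stays close to the original model's.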

Benefits of Heretic

  • Automation - run one command and get the result.
  • High quality - often preserves more of the model's capability than manual abliteration at the same level of censorship removal.
  • Support for a wide range of models - most dense models, many MoE models, multimodal models, and even hybrid architectures (e.g. Qwen3.5).
  • Efficiency - support for 4-bit quantization (bitsandbytes) to work on graphics cards with limited VRAM.
  • Research capabilities - visualization of residual vectors, PaCMAP projections, geometric analysis.

Example comparison (Gemma-3-12B-IT):

| Model | Refusals on harmful prompts | KL divergence (lower = better) |
| --- | --- | --- |
| Original | 97/100 | 0 |
| Manual abliterations (best) | 3/100 | 0.45–1.04 |
| Heretic | 3/100 | 0.16 |

Heretic reaches the same refusal rate as the best manual abliterations, but with a markedly lower KL divergence, which means less damage to the model’s abilities.
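One way to read the comparison above is as a Pareto-dominance check: one result dominates another if it is no worse on both metrics and strictly better on at least one. The sketch below applies that check to the figures from the table, taking the lower bound of the manual range (0.45) as the manual entry. The function and dictionary names are illustrative only.

```python
def dominates(a, b):
    """True if result a is no worse than b on both objectives and strictly
    better on at least one. Each result is (refusals, kl_divergence);
    lower is better for both."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

# Figures taken from the comparison table (refusals out of 100, KL divergence).
results = {
    "original": (97, 0.0),
    "manual (best)": (3, 0.45),
    "heretic": (3, 0.16),
}

# Keep only results that no other result dominates.
pareto = [name for name, score in results.items()
          if not any(dominates(other, score) for other in results.values())]
print(pareto)
```

The original model stays on the Pareto front because its KL divergence is zero by definition. Among the de-censored variants, only the Heretic result remains.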

Installation and use

```bash
pip install -U heretic-llm
heretic Qwen/Qwen3-4B-Instruct-2507
```

Or for any other model:

```bash
heretic meta-llama/Llama-3.1-8B-Instruct
```

Once the process is complete, you can:

  • Save the model locally
  • Upload it to Hugging Face
  • Chat with it to test the result
  • Run benchmarks

Running time: about 45 minutes for Llama-3.1-8B on an RTX 3090. With quantization, the run uses less memory and may finish faster.
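To see why quantization matters on consumer cards, here is a rough estimate of the memory needed for the weights alone. It is a back-of-the-envelope sketch: the parameter count is a nominal 8 billion (an assumption), and it ignores activations, the KV cache, and quantization overhead.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 8e9  # assumption: nominal size of an "8B" model

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.0f} GB, 4-bit weights: ~{int4_gb:.0f} GB")
```

Under these assumptions, 16-bit weights take most of the 24 GB on an RTX 3090, while 4-bit weights leave far more headroom or fit on smaller cards.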

Additional features

  • Configuration files (config.default.toml, config.noslop.toml)
  • Research extras (installed as heretic-llm[research])
  • Plots of residual vectors, including animated GIFs
  • Detailed per-layer geometric analysis
  • Built-in quality assessment

Who is Heretic for?

  • Local LLM enthusiasts who want maximum freedom from censorship
  • Model interpretability researchers
  • Developers who need unfiltered models for specific tasks
  • Anyone who is tired of constant refusals from ChatGPT-like interfaces

Conclusion

Heretic is one of the most elegant and effective tools in the open-source LLM ecosystem today. It takes abliteration, a technically demanding technique, and makes it accessible to users with minimal technical skills.

Through automatic optimization, Heretic produces models that often beat manual variants on the trade-off between censorship removal and retained capability. More than 3,000 models created by the community using Heretic are now available on Hugging Face.

Repository: https://github.com/p-e-w/heretic
