Heretic: Fully automatic removal of censorship from language models

◷ 4 min read 5/4/2026 by: Alexey, VibeCode

In the world of local LLMs, censorship and safety alignment are often a major constraint. Many models refuse to respond to “sensitive” requests, even when those requests are legitimate.

Heretic is a powerful open-source tool from the developer p-e-w that automatically removes censorship from transformer models without expensive training. It became one of the most popular solutions in the LocalLLaMA community in 2025-2026.

How does Heretic work?

Heretic uses an advanced technique called directional ablation (also known as abliteration), based on research by Arditi et al. (2024) and subsequent work.
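
To make the idea concrete, here is a minimal sketch of directional ablation on toy data. It assumes the commonly described recipe: estimate a "refusal direction" as the difference between mean activations on harmful and harmless prompts, then remove the component along that direction. The function names, the `weight` parameter, and the numbers are illustrative assumptions, not Heretic's actual code; real implementations typically apply the same projection to the model's weight matrices rather than to a single vector.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of the two mean activations."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(vector, direction, weight=1.0):
    """Subtract `weight` times the projection of `vector` onto the unit `direction`.

    `weight` is a hypothetical knob: 1.0 removes the component entirely.
    """
    dot = sum(v * d for v, d in zip(vector, direction))
    return [v - weight * dot * d for v, d in zip(vector, direction)]

# Toy 3-dimensional "residual stream" activations (made-up numbers).
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(harmful, harmless)
ablated = ablate([5.0, 2.0, 1.0], direction)
print(direction, ablated)
```

After ablation the vector has no component left along the estimated direction, while its other components are unchanged.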

The key innovation is automatic parameter optimization using TPE (Tree-structured Parzen Estimator) through the Optuna library. The tool simultaneously minimizes:

Number of refusals on harmful prompts
KL divergence from the original model (to preserve as many original abilities as possible)
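
The two objectives can be sketched in a few lines of plain Python. This is an illustration only: the refusal marker list, the example responses, and the probability values are made up, and the direction of the KL divergence (original relative to modified) is an assumption rather than a statement about how Heretic computes it.

```python
import math

# Hypothetical marker list; real refusal detection may use different phrases.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")

def count_refusals(responses):
    """Count responses that contain any refusal marker (case-insensitive)."""
    return sum(any(m in r.lower() for m in REFUSAL_MARKERS) for r in responses)

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up model outputs for two prompts.
responses = ["I'm sorry, but I can't help with that.", "Sure, here is an overview."]

# Made-up next-token probabilities from the original and the modified model.
original = [0.7, 0.2, 0.1]
modified = [0.6, 0.3, 0.1]

refusals = count_refusals(responses)
drift = kl_divergence(original, modified)
print(refusals, drift)
```

A lower refusal count means less censorship, and a KL divergence near zero means the modified model still behaves like the original. An optimizer has to trade one against the other.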

This makes the process completely automatic: no manual tuning or deep understanding of transformer architecture is required.

Benefits of Heretic

Automation - run a single command and get the result.
High quality - often preserves more of the model's capabilities than manual abliteration at the same level of censorship removal.
Support for a wide range of models - most dense models, many MoE models, multimodal models, and even hybrid architectures (e.g. Qwen3.5).
Efficiency - support for 4-bit quantization (bitsandbytes) to run on graphics cards with limited VRAM.
Research capabilities - visualization of residual vectors, PaCMAP projections, geometric analysis.

Example comparison (Gemma-3-12B-IT):

Model	Refusals on harmful prompts	KL divergence (lower is better)
Original	97/100	0
Manual abliterations (best)	3/100	0.45–1.04
Heretic	3/100	0.16

Heretic achieves the same refusal rate as the best manual abliterations, but with significantly less damage to the model’s abilities, as measured by the lower KL divergence.
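
One way to read the comparison is as a two-objective problem: a result is worth keeping only if no other result is at least as good on both refusals and KL divergence and strictly better on one. The sketch below applies that check to the table's numbers. The row labels and the choice to represent the manual range 0.45–1.04 by its two endpoints are assumptions made for illustration.

```python
# (label, refusals out of 100, KL divergence) taken from the table above.
# The manual abliteration range is represented by its two endpoints.
results = [
    ("Original", 97, 0.0),
    ("Manual abliteration (lower end)", 3, 0.45),
    ("Manual abliteration (upper end)", 3, 1.04),
    ("Heretic", 3, 0.16),
]

def dominates(a, b):
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

def pareto_front(rows):
    """Keep only the rows that no other row dominates."""
    return [r for r in rows if not any(dominates(other, r) for other in rows)]

front = [label for label, _, _ in pareto_front(results)]
print(front)
```

The original model stays on the front because nothing beats its KL divergence of 0, while both manual results are dominated by the Heretic row.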

Installation and use

bash

pip install -U heretic-llm
heretic Qwen/Qwen3-4B-Instruct-2507

Or for any other model:

bash

heretic meta-llama/Llama-3.1-8B-Instruct

Once the process is complete, you can:

Save the model locally
Upload it to Hugging Face
Chat with it to test the result
Run benchmarks

Running time: about 45 minutes for Llama-3.1-8B on an RTX 3090 (with quantization, the run is faster and uses less memory).

Additional features

Configuration files (config.default.toml, config.noslop.toml)
Research mode (the heretic-llm[research] extra)
Plots of residual vectors and animated GIFs
Detailed per-layer geometric analysis
Built-in quality assessment

Who is Heretic for?

Local LLM enthusiasts who want maximum freedom from censorship
Model interpretability researchers
Developers who need uncensored models for specific tasks
Anyone who is tired of constant refusals from ChatGPT-like interfaces

Conclusion

Heretic is one of the most elegant and effective tools in the open-source LLM ecosystem today. It democratizes a complex abliteration technique, making it accessible to anyone with minimal technical skills.

Through automatic optimization, Heretic produces models that often outperform manual variants in the trade-off between censorship removal and retained intelligence. More than 3,000 models created by the community using Heretic are now available on Hugging Face.

Repository: https://github.com/p-e-w/heretic