SuperGemma4-31B-Abliterated: The most powerful compact local LLM of 2026 – uncensored, lightweight and truly smart
Main chat
A chat for vibe coders: news, guides, live cases, marketplace, and finding executors.
XX
In April 2026, with the local model market crowded with abliterized versions of Gemma, Llama and Qwen, one release instantly blew the community apart. SuperGemma4-31B-Abliterated by Jiunsong (known as @songjunkr) is not just another uncensored version of Google Gemma-4-31B-it. It is a deeply redesigned model that takes the basic Gemma 4 31B, completely removes censorship, eliminates inherent weaknesses and optimizes calculations to the point that the gain in benchmarks reaches +8.5% in the overall standings and +119% on BenchTok.
The model was released in recent days (the announcement in Telegram and X coincides with the user’s post), and is already positioned as “what we all wanted from a local LLM”: completely uncensored, easy to launch, incredibly smart and at the same time compact by the standards of the 31B class. Available in two convenient formats – MLX 4-bit (natively for Apple Silicon) and GGUF 4-bit (for everyone else).
What is SuperGemma4-31B-Abliterated and what is it built on
The database is the official google/gemma-4-31B-it (dense model of 31 billion parameters released by Google in early 2026). Jiunsong took it, applied advanced abliteration, and then did aggressive post-optimization:
- The inherent weaknesses of the model are eliminated.
- Removed inefficient stages of calculations.
- Removed unnecessary duplicate data in the weights.
The result is a model that feels completely uncensored, significantly more useful and fun in everyday use. The author specifically emphasizes: this is not a crude ablite, but a “heavily upgraded” version, sharpened under real scenarios of local launch.
Due to the limitations of iron, the author was unable to make a dense bf16 version, so optimized 4-bit quanta were immediately released. In practice, 4-bit gives a great balance of quality and speed.
Benchmark: Real growth, not marketing
Here is the comparison, which is given in the official announcement (31b-it-base vs SuperGemma4-31B):
| Метрика | 31b-it-base | SuperGemma4 31B | Прирост |
|---|---|---|---|
| Total | 68.50 | 74.30 | +8.5% |
| Mini | 74.00 | 80.80 | +9.2% |
| Target | 60.60 | 65.10 | +7.4% |
| Reg | 50.00 | 66.70 | +33.4% |
| BenchTok | 12.18 | 26.70 | +119.2% |
Tested on MMLU, GPQA, IFEval and other key benchmarks. The growth in Reg (+33.4%) and BenchTok (+119%) is particularly impressive, which means that the model is noticeably better at regular tasks and token generation (speed/quality of output).
According to early testers, SuperGemma4 feels “sharp” than the basic Gemma 4 in coding, tool-use, long context and creative tasks. At the same time, there are almost no failures - "0/100 refusals" in similar 26B versions of the line.
Key advantages over other local 31B models
** Complete freedom (uncensored)** Abliteration is carried out qualitatively: the model responds to any requests without moralizing and refusals, but retains consistency and usefulness.
*Compact and speed * 4-bit quanta allow you to run 31B models even on relatively modest hardware (16-24 GB of VRAM for GGUF Q4). On Apple Silicon (M2/M3/M4/Max), the MLX version flies natively.
** Living behavior** The author specifically optimized for everyday use: better tool-calling, less tokenizer jank, more natural and useful answers.
Multimodality of the base Gemma 4 is saved (vision, image-text, etc. in the original; in SuperGemma4 the emphasis on text, but the basis of multimodal).
** Openness* * MIT-like Gemma license, full openness on Hugging Face.
How to run SuperGemma4-31B-Abliterated
Option 1: Apple Silicon (MLX 4-bit) is the easiest
# Install mlx-lm if not already
pip install mlx-lm
# Launch
python -m mlx lm.generate --model Jiunsong/SuperGemma4-31b-abliterated-mlx-4bit --prompt "Your request is here."
Or through LM Studio/Ollama-like wrappers with MLX support.
Option 2: GGUF 4-bit (Windows, Linux, macOS, even weak iron) ** Download the sibling repository: Jiunsong/SuperGemma4-31b-abliterated-GGUF (Q4 K M and other quanta)
Launch via llama.cpp:
./llama-cli -m SuperGemma4-31b-abliterated-Q4 K M.gguf -p "Your Prompt" -n 512 --temp 0.7
Or through Ollama / LM Studio / SillyTavern / KoboldCpp – the model is already appearing in the community.
Iron recommendations:
- 24GB VRAM → Comfortable Q4/Q5.
- 16GB → Q4 K M is on edge, but it works.
- Apple M4 Max → MLX version shows the best speeds.
Nuances, Limitations and Fair Cons
Plus:
Real quality improvement over base.
A great choice for those who are tired of censorship in Claude/GPT and want a local flagship.
Jiunsong already has a strong SuperGemma 26B lineup.
Minuses and nuances:
Small bugs and instability are possible (the author directly writes: “Maybe a little unstable”).
There is no dense bf16, only quants (but for 99% of users it is a plus).
Quality depends heavily on prompting: like all Gemma, it likes clear instructions.
There are fewer community-fine-tunes than the Llama-3/4.
Why is it important right now
2026 is a watershed year for local AI. Cloud models are becoming more expensive and censored, and hardware (especially Apple Silicon and RTX 50 Series) is already easily pulling 30+ billion parameters. SuperGemma4-31B-Abliterated is a prime example of how the open-source community is taking Google’s top model and turning it into what it was supposed to be: a powerful, free and affordable tool.
Whether you’re designing, researching, writing code, creating content, or just want a personal AI without subscriptions or restrictions, this is one of the best models of spring 2026.
References for download and discussion:
- MLX 4-bit: huggingface.co/Jiunsong/SuperGemma4-31b-abliterated-mlx-4bit
- GGUF: huggingface.co/Jiunsong/SuperGemma4-31b-abliterated-GGUF
- Author's announcement: Follow @songjunkr in X.