SuperGemma4-31B-Abliterated: The most powerful compact local LLM of 2026 – uncensored, lightweight and truly smart

◷ 6 min read 4/15/2026 by: Alexey, VibeCode

Main chat

A chat for vibe coders: news, guides, live cases, marketplace, and finding executors.

SuperGemma4-31B-Abliterated: The most powerful compact local LLM of 2026 – uncensored, lightweight and truly smart - обложка

SuperGemma4-31B-Abliterated XX

In April 2026, with the local model market crowded with abliterized versions of Gemma, Llama and Qwen, one release instantly blew the community apart. SuperGemma4-31B-Abliterated by Jiunsong (known as @songjunkr) is not just another uncensored version of Google Gemma-4-31B-it. It is a deeply redesigned model that takes the basic Gemma 4 31B, completely removes censorship, eliminates inherent weaknesses and optimizes calculations to the point that the gain in benchmarks reaches +8.5% in the overall standings and +119% on BenchTok.

The model was released in recent days (the announcement in Telegram and X coincides with the user’s post), and is already positioned as “what we all wanted from a local LLM”: completely uncensored, easy to launch, incredibly smart and at the same time compact by the standards of the 31B class. Available in two convenient formats – MLX 4-bit (natively for Apple Silicon) and GGUF 4-bit (for everyone else).

What is SuperGemma4-31B-Abliterated and what is it built on

The database is the official google/gemma-4-31B-it (dense model of 31 billion parameters released by Google in early 2026). Jiunsong took it, applied advanced abliteration, and then did aggressive post-optimization:

The inherent weaknesses of the model are eliminated.
Removed inefficient stages of calculations.
Removed unnecessary duplicate data in the weights.

The result is a model that feels completely uncensored, significantly more useful and fun in everyday use. The author specifically emphasizes: this is not a crude ablite, but a “heavily upgraded” version, sharpened under real scenarios of local launch.

Due to the limitations of iron, the author was unable to make a dense bf16 version, so optimized 4-bit quanta were immediately released. In practice, 4-bit gives a great balance of quality and speed.

Benchmark: Real growth, not marketing

Here is the comparison, which is given in the official announcement (31b-it-base vs SuperGemma4-31B):

Метрика	31b-it-base	SuperGemma4 31B	Прирост
Total	68.50	74.30	+8.5%
Mini	74.00	80.80	+9.2%
Target	60.60	65.10	+7.4%
Reg	50.00	66.70	+33.4%
BenchTok	12.18	26.70	+119.2%

Tested on MMLU, GPQA, IFEval and other key benchmarks. The growth in Reg (+33.4%) and BenchTok (+119%) is particularly impressive, which means that the model is noticeably better at regular tasks and token generation (speed/quality of output).

According to early testers, SuperGemma4 feels “sharp” than the basic Gemma 4 in coding, tool-use, long context and creative tasks. At the same time, there are almost no failures - "0/100 refusals" in similar 26B versions of the line.

Key advantages over other local 31B models

** Complete freedom (uncensored)** Abliteration is carried out qualitatively: the model responds to any requests without moralizing and refusals, but retains consistency and usefulness.
*Compact and speed * 4-bit quanta allow you to run 31B models even on relatively modest hardware (16-24 GB of VRAM for GGUF Q4). On Apple Silicon (M2/M3/M4/Max), the MLX version flies natively.
** Living behavior** The author specifically optimized for everyday use: better tool-calling, less tokenizer jank, more natural and useful answers.
Multimodality of the base Gemma 4 is saved (vision, image-text, etc. in the original; in SuperGemma4 the emphasis on text, but the basis of multimodal).
** Openness* * MIT-like Gemma license, full openness on Hugging Face.

How to run SuperGemma4-31B-Abliterated

Option 1: Apple Silicon (MLX 4-bit) is the easiest

bash

# Install mlx-lm if not already
pip install mlx-lm

# Launch
python -m mlx lm.generate --model Jiunsong/SuperGemma4-31b-abliterated-mlx-4bit --prompt "Your request is here."

Or through LM Studio/Ollama-like wrappers with MLX support.

Option 2: GGUF 4-bit (Windows, Linux, macOS, even weak iron) ** Download the sibling repository: Jiunsong/SuperGemma4-31b-abliterated-GGUF (Q4 K M and other quanta)

Launch via llama.cpp:

bash

./llama-cli -m SuperGemma4-31b-abliterated-Q4 K M.gguf -p "Your Prompt" -n 512 --temp 0.7

Or through Ollama / LM Studio / SillyTavern / KoboldCpp – the model is already appearing in the community.

Iron recommendations:

24GB VRAM → Comfortable Q4/Q5.
16GB → Q4 K M is on edge, but it works.
Apple M4 Max → MLX version shows the best speeds.

Nuances, Limitations and Fair Cons

Plus:

Real quality improvement over base.
A great choice for those who are tired of censorship in Claude/GPT and want a local flagship.
Jiunsong already has a strong SuperGemma 26B lineup.
Minuses and nuances:
Small bugs and instability are possible (the author directly writes: “Maybe a little unstable”).
There is no dense bf16, only quants (but for 99% of users it is a plus).
Quality depends heavily on prompting: like all Gemma, it likes clear instructions.
There are fewer community-fine-tunes than the Llama-3/4.

Why is it important right now

2026 is a watershed year for local AI. Cloud models are becoming more expensive and censored, and hardware (especially Apple Silicon and RTX 50 Series) is already easily pulling 30+ billion parameters. SuperGemma4-31B-Abliterated is a prime example of how the open-source community is taking Google’s top model and turning it into what it was supposed to be: a powerful, free and affordable tool.

Whether you’re designing, researching, writing code, creating content, or just want a personal AI without subscriptions or restrictions, this is one of the best models of spring 2026.

References for download and discussion: