~/wiki / novosti / supergemma4-31b-abliterated-uncensored-local-llm

SuperGemma4-31B-Abliterated: The most powerful compact local LLM of 2026 – uncensored, lightweight and truly smart

◷ 6 min read 4/15/2026

Main chat

A chat for vibe coders: news, guides, live cases, marketplace, and finding executors.

$ cd section/ $ join vibe dev

SuperGemma4-31B-Abliterated XX

In April 2026, with the local model market crowded with abliterized versions of Gemma, Llama and Qwen, one release instantly blew the community apart. SuperGemma4-31B-Abliterated by Jiunsong (known as @songjunkr) is not just another uncensored version of Google Gemma-4-31B-it. It is a deeply redesigned model that takes the basic Gemma 4 31B, completely removes censorship, eliminates inherent weaknesses and optimizes calculations to the point that the gain in benchmarks reaches +8.5% in the overall standings and +119% on BenchTok.

The model was released in recent days (the announcement in Telegram and X coincides with the user’s post), and is already positioned as “what we all wanted from a local LLM”: completely uncensored, easy to launch, incredibly smart and at the same time compact by the standards of the 31B class. Available in two convenient formats – MLX 4-bit (natively for Apple Silicon) and GGUF 4-bit (for everyone else).

What is SuperGemma4-31B-Abliterated and what is it built on

The database is the official google/gemma-4-31B-it (dense model of 31 billion parameters released by Google in early 2026). Jiunsong took it, applied advanced abliteration, and then did aggressive post-optimization:

  • The inherent weaknesses of the model are eliminated.
  • Removed inefficient stages of calculations.
  • Removed unnecessary duplicate data in the weights.

The result is a model that feels completely uncensored, significantly more useful and fun in everyday use. The author specifically emphasizes: this is not a crude ablite, but a “heavily upgraded” version, sharpened under real scenarios of local launch.

Due to the limitations of iron, the author was unable to make a dense bf16 version, so optimized 4-bit quanta were immediately released. In practice, 4-bit gives a great balance of quality and speed.

Benchmark: Real growth, not marketing

Here is the comparison, which is given in the official announcement (31b-it-base vs SuperGemma4-31B):

Метрика 31b-it-base SuperGemma4 31B Прирост
Total 68.50 74.30 +8.5%
Mini 74.00 80.80 +9.2%
Target 60.60 65.10 +7.4%
Reg 50.00 66.70 +33.4%
BenchTok 12.18 26.70 +119.2%

Tested on MMLU, GPQA, IFEval and other key benchmarks. The growth in Reg (+33.4%) and BenchTok (+119%) is particularly impressive, which means that the model is noticeably better at regular tasks and token generation (speed/quality of output).

According to early testers, SuperGemma4 feels “sharp” than the basic Gemma 4 in coding, tool-use, long context and creative tasks. At the same time, there are almost no failures - "0/100 refusals" in similar 26B versions of the line.

Key advantages over other local 31B models

  1. ** Complete freedom (uncensored)** Abliteration is carried out qualitatively: the model responds to any requests without moralizing and refusals, but retains consistency and usefulness.

  2. *Compact and speed * 4-bit quanta allow you to run 31B models even on relatively modest hardware (16-24 GB of VRAM for GGUF Q4). On Apple Silicon (M2/M3/M4/Max), the MLX version flies natively.

  3. ** Living behavior** The author specifically optimized for everyday use: better tool-calling, less tokenizer jank, more natural and useful answers.

  4. Multimodality of the base Gemma 4 is saved (vision, image-text, etc. in the original; in SuperGemma4 the emphasis on text, but the basis of multimodal).

  5. ** Openness* * MIT-like Gemma license, full openness on Hugging Face.

How to run SuperGemma4-31B-Abliterated

Option 1: Apple Silicon (MLX 4-bit) is the easiest

bash
# Install mlx-lm if not already
pip install mlx-lm

# Launch
python -m mlx lm.generate --model Jiunsong/SuperGemma4-31b-abliterated-mlx-4bit --prompt "Your request is here."

Or through LM Studio/Ollama-like wrappers with MLX support.

Option 2: GGUF 4-bit (Windows, Linux, macOS, even weak iron) ** Download the sibling repository: Jiunsong/SuperGemma4-31b-abliterated-GGUF (Q4 K M and other quanta)

Launch via llama.cpp:

bash
./llama-cli -m SuperGemma4-31b-abliterated-Q4 K M.gguf -p "Your Prompt" -n 512 --temp 0.7

Or through Ollama / LM Studio / SillyTavern / KoboldCpp – the model is already appearing in the community.

Iron recommendations:

  • 24GB VRAM → Comfortable Q4/Q5.
  • 16GB → Q4 K M is on edge, but it works.
  • Apple M4 Max → MLX version shows the best speeds.

Nuances, Limitations and Fair Cons

Plus:

  • Real quality improvement over base.

  • A great choice for those who are tired of censorship in Claude/GPT and want a local flagship.

  • Jiunsong already has a strong SuperGemma 26B lineup.

  • Minuses and nuances:

  • Small bugs and instability are possible (the author directly writes: “Maybe a little unstable”).

  • There is no dense bf16, only quants (but for 99% of users it is a plus).

  • Quality depends heavily on prompting: like all Gemma, it likes clear instructions.

  • There are fewer community-fine-tunes than the Llama-3/4.

Why is it important right now

2026 is a watershed year for local AI. Cloud models are becoming more expensive and censored, and hardware (especially Apple Silicon and RTX 50 Series) is already easily pulling 30+ billion parameters. SuperGemma4-31B-Abliterated is a prime example of how the open-source community is taking Google’s top model and turning it into what it was supposed to be: a powerful, free and affordable tool.

Whether you’re designing, researching, writing code, creating content, or just want a personal AI without subscriptions or restrictions, this is one of the best models of spring 2026.

References for download and discussion:

$ cd ../ ← back to News