What sizes does Gemma 4 come in?

Four sizes: E2B and E4B for mobile/edge, 26B Mixture of Experts, and 31B Dense. The E-series models are designed for phones and IoT devices. The 26B and 31B models target workstations and GPUs.

How does Gemma 4 compare to other open models?

The 31B Dense model ranks #3 on the Arena AI text leaderboard. The 26B MoE ranks #6. Both outperform models up to 20x their size.

Can Gemma 4 run on a consumer GPU?

Yes. The unquantized 31B model fits on a single 80GB H100. Quantized versions run on consumer GPUs. The E2B and E4B models run on phones and Raspberry Pi with zero internet required.

Is Gemma 4 truly open source?

Yes. Unlike previous Gemma releases with more restrictive terms, Gemma 4 uses the Apache 2.0 license. No usage restrictions, full commercial rights, complete control over data and infrastructure.

What modalities does Gemma 4 support?

All four models handle text, images, and video. The E2B and E4B edge models also support native audio input for speech recognition and understanding.

Gemma 4 Runs on Your Hardware. That's the Point

Google released Gemma 4 under Apache 2.0. The 31B model ranks #3 globally, the 26B MoE activates just 3.8B params, and the smallest variants run on phones.

TL;DR: Google released Gemma 4 under Apache 2.0 — four models from phone-sized to frontier-class. The 31B Dense ranks #3 among all open models globally. The 26B MoE activates only 3.8B parameters per token, outcompeting models 20x its size. The smallest variants run offline on phones and Raspberry Pi with native audio, video, and image understanding. 256K context. 140+ languages. Built from the same tech as Gemini 3, then given away.

What is Gemma 4?

Gemma 4 is Google DeepMind's latest open model family, released April 2, 2026. ^[1] Built from the same research that powers Gemini 3 (Google's proprietary model), it ships in four sizes under Apache 2.0.

The headline: the 31B Dense model ranks #3 on the Arena AI text leaderboard. The 26B MoE sits at #6. Both outperform models up to 20x larger. ^[1]

This isn't a research preview. The Gemma family already has 400 million+ downloads and 100,000+ community fine-tuned variants. ^[1] Gemma 4 is the next generation of that ecosystem.

Gemma 4 models plotted on Arena AI's Elo Score vs Total Model Size chart — both the 31B and 26B sit in the top-left corner, matching or beating models 20x their size

What are the four models?

31B Dense — the largest and most capable. Maximum quality, best foundation for fine-tuning. Unquantized bfloat16 weights fit on a single 80GB NVIDIA H100. Quantized versions run on consumer GPUs. ^[1]

26B Mixture of Experts (MoE) — optimized for speed. It has 26 billion total parameters but activates only 3.8 billion during inference. That's the efficiency trick: frontier-level reasoning at a fraction of the compute. ^[1]

E4B and E2B — engineered for mobile, IoT, and edge devices. These activate an effective 4 billion and 2 billion parameter footprint during inference. They run completely offline on phones, Raspberry Pi, and NVIDIA Jetson Orin Nano with near-zero latency. ^[1]

All four models support 256K context (larger models) or 128K context (edge models), native video and image processing, and are trained on 140+ languages. The edge models also handle native audio input. ^[1]

Why does the Apache 2.0 license matter?

Previous Gemma releases used Google's custom license with usage restrictions. Gemma 4 switches to Apache 2.0 — the same license used by projects like Kubernetes and Apache Kafka. ^[1]

That means: full commercial rights, no usage restrictions, complete control over your data, infrastructure, and model weights. Fine-tune it. Deploy it anywhere. No phone calls to Google's legal team.

Clément Delangue, CEO of Hugging Face, called it "a huge milestone." ^[1] It's hard to disagree. Google went from "open weights with asterisks" to actual open source.

What's built for agents?

Gemma 4 isn't positioned as a chatbot. The design is explicitly agentic: ^[1]

Function calling — models can invoke tools and APIs natively
Structured JSON output — reliable machine-readable responses
Native system instructions — persistent behavior configuration
Code generation — designed to power local-first AI coding assistants

This is the pattern the industry is converging on. Models that can plan, call tools, check results, and iterate — not just generate text. Gemma 4 bakes that into the architecture rather than bolting it on.

Where can you run it?

The range is deliberately wide: ^[1]

Google AI Studio — instant access to 31B and 26B
Google AI Edge Gallery — E4B and E2B for mobile development
Hugging Face, Ollama, vLLM, llama.cpp, MLX, LM Studio — day-one support
NVIDIA NIM and NeMo — enterprise deployment
Google Cloud (Vertex AI, Cloud Run, GKE) — production scale
Android Studio Agent Mode — direct integration for app developers

Quantized versions mean the 31B model runs on hardware most developers already own. The edge models run on hardware most people carry in their pockets.

What does this mean?

The open model landscape just got more competitive. Google is shipping Gemini 3-class technology as a free, Apache 2.0 download that runs on a consumer GPU.

For developers building AI products, the practical implication is clear: frontier-level reasoning is no longer locked behind API calls. You can fine-tune Gemma 4 on your data, run it on your infrastructure, and ship it in your product — with no usage restrictions and no per-token costs.

The constraint now isn't access. It's imagination.

Key takeaways

31B Dense = #3 open model globally, 26B MoE = #6. Both outcompete models 20x their size.
26B MoE activates just 3.8B parameters per token — frontier intelligence at a fraction of the compute.
E2B and E4B run on phones and Raspberry Pi — offline, with audio, video, and image understanding.
Apache 2.0 license — full open source, no restrictions. A first for Google's Gemma family.
Agentic by design — function calling, structured JSON, system instructions baked in.
256K context on larger models, 128K on edge. 140+ languages.

I break down things like this on LinkedIn, X, and Instagram — usually shorter, sometimes as carousels. If this was useful, you'd probably like those too.

Gemma 4: Byte for byte, the most capable open models — Google Blog [↩] [↩ ^[2]] [↩ ^[3]] [↩ ^[4]] [↩ ^[5]] [↩ ^[6]] [↩ ^[7]] [↩ ^[8]] [↩ ^[9]] [↩ ^[10]] [↩ ^[11]]

Gemma 4: Google's Most Capable Open Model Runs on Your Hardware

What is Gemma 4?

What are the four models?

Why does the Apache 2.0 license matter?

What's built for agents?

Where can you run it?

What does this mean?

Key takeaways

Related

Gemma 4: Google's Most Capable Open Model Runs on Your Hardware

What is Gemma 4?

What are the four models?

Why does the Apache 2.0 license matter?

What's built for agents?

Where can you run it?

What does this mean?

Key takeaways

Footnotes

Related