Google Dropped Two Small Models Last Night—4x Faster Than GPT-5

Published on: 2026-05-20


Google I/O 2026 opened with two bombshells: Gemini 3.5 Flash, with output speed 4x faster than GPT-5, and Gemini Omni, which takes any input and produces any output, including direct video generation. In one night, Google made its case: in the model race, parameter count matters less than raw speed.



Gemini 3.5 Flash: How Does a "Small Model" Beat GPT-5 by 4x?

You might think "Flash" means a stripped-down version. But Google took the opposite approach: rather than cutting features for speed, it used a new architecture to make a small model perform at the level of a large one.

Key specs of Gemini 3.5 Flash:

| Metric | Gemini 3.5 Flash | What it means |
| --- | --- | --- |
| Output speed | 4x faster than GPT-5/Claude | Zero-latency real-time conversation |
| Reasoning | Matches full-size Gemini | Small size ≠ low intelligence |
| Context window | 1 million tokens | Fits 10 books |
| Multimodal input | Text/Image/Audio/Video | Full format support |

In three words: fast and powerful.
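To make the speed and context figures above concrete, here is a back-of-the-envelope sketch in Python. The 4x multiplier, the 1 million token window, and the "10 books" figure come from the spec list; the baseline throughput of 50 tokens per second and the 500-token reply length are purely illustrative assumptions, not measured or published numbers.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the spec list.
# ASSUMPTION: the baseline throughput and reply length are hypothetical
# placeholders chosen only for illustration; they are not published values.

CONTEXT_WINDOW_TOKENS = 1_000_000  # "1 million tokens" (from the article)
BOOKS_IN_WINDOW = 10               # "fits 10 books" (from the article)
SPEEDUP = 4.0                      # "4x faster" (from the article)

HYPOTHETICAL_BASELINE_TPS = 50.0   # assumed tokens/second for a slower model
HYPOTHETICAL_REPLY_TOKENS = 500    # assumed length of a single reply


def tokens_per_book(window_tokens: int, books: int) -> float:
    """Average token budget per book implied by the window size."""
    return window_tokens / books


def generation_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a reply at a constant output rate."""
    return reply_tokens / tokens_per_second


if __name__ == "__main__":
    per_book = tokens_per_book(CONTEXT_WINDOW_TOKENS, BOOKS_IN_WINDOW)
    slow = generation_seconds(HYPOTHETICAL_REPLY_TOKENS, HYPOTHETICAL_BASELINE_TPS)
    fast = generation_seconds(
        HYPOTHETICAL_REPLY_TOKENS, HYPOTHETICAL_BASELINE_TPS * SPEEDUP
    )
    print(f"Implied budget per book: {per_book:,.0f} tokens")
    print(f"Reply time at baseline rate: {slow:.1f} s")
    print(f"Reply time at 4x the rate:   {fast:.1f} s")
```

Under these assumed numbers, a reply that takes 10 seconds at the baseline rate would take 2.5 seconds at four times the rate. The ratio holds for any baseline; only the absolute times depend on the assumptions.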


Gemini Omni: AI That Outputs Video Is Here

If 3.5 Flash is "fast," Omni is "all-powerful."

Omni's definition: Any input → Any output. Give it text → it returns voice. Give it an image → it returns video. Give it a PDF → it returns a filled spreadsheet.

The most striking capability: video output.

Given a prompt such as "Generate a 15-second tutorial video teaching users how to translate menus with Google Lens," Omni directly produces a complete video with subtitles and voiceover.

This means AI creation has moved from the "text and image era" into the "video era." Short-video creators now have a new competitor, one that doesn't need to sleep.


900 Million MAU: A Signal That AI Usage Is Crossing a Critical Point

Google also disclosed Gemini user metrics:

  • Monthly active users exceeded 900 million—comparable to WeChat's overseas user base
  • Paid subscriptions up 300%
  • Enterprise customer onboarding doubled
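Growth figures like these are easy to misread. Read literally, "up 300%" means the new value is four times the old one (not three times), while "doubled" corresponds to a 100% increase. The small Python sketch below shows the conversion; the function names are illustrative, and no absolute subscriber or customer counts are assumed, since the article gives none.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a new/old ratio.

    An increase of 300% means new = old + 3 * old = 4 * old.
    """
    return 1.0 + percent_increase / 100.0


def percent_increase(multiplier: float) -> float:
    """Convert a new/old ratio back into a percentage increase."""
    return (multiplier - 1.0) * 100.0


if __name__ == "__main__":
    # "Paid subscriptions up 300%", read literally
    print(growth_multiplier(300.0))
    # "Enterprise customer onboarding doubled"
    print(percent_increase(2.0))
```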

This data point is no less significant than the two model releases. It signals that AI has evolved from a "novelty tool" into "daily infrastructure." A product with 900 million monthly active users is no longer a niche experiment; it is a universal standard.


Why Is This Google's Counter-Offensive?

For the past six months, the buzz in AI circles has centered on OpenAI (GPT-5.5) and Anthropic (Claude Code). Despite having Gemini, Google has been overshadowed.

Google I/O 2026's signal is clear:

I'm not competing with you on benchmarks. I'm competing on: who's faster, who's more versatile, who has more users.

These three dimensions (speed, modality coverage, and user scale) are precisely Google's natural advantages. Search is the traffic entry point, Android the device entry point, and YouTube the content entry point. Embedding AI into every one of them is Google's "dimensionality reduction strike": an attack from a position its rivals cannot match.


Direct Impact on Regular Users

Benefit 1: A Better Deal for Free Users

Gemini Flash-class models are free for everyone, unlike certain competitors that lock their good models behind paywalls.

Benefit 2: The Barrier to Using AI Keeps Dropping

Omni's "any input → any output" means you no longer need prompt engineering skills. Give it an image, a voice memo, or a file, and it understands what you want.

Benefit 3: AI Becomes "Affordable"

The core logic of the Flash series is "lightweight but sufficient": delivering usable AI capability at a lower compute cost. This aligns with the logic behind Kaihe A1: not "the more expensive the better," but "good enough is enough."


One-sentence summary: Google I/O 2026 didn't release a "better" model; it released an AI paradigm that is faster, more versatile, and accessible to more people. Speed matters more than precision, because no one wants to wait 5 seconds for AI to respond.


The AI Frontier column tracks global LLM developments. Follow us to keep up with where each AI wave is heading.

© KAIHE AI - Agent Computer Specialist