Qwen3.7-Plus: Alibaba Takes on OpenAI and Anthropic With a Multimodal AI Agent That's 60% Cheaper

On June 1, 2026, Alibaba Cloud launched Qwen3.7-Plus. This multimodal agent model doesn’t just understand text — it sees your screen, clicks buttons, executes code, and iterates autonomously until the task is done. All at 60% less than its own bigger sibling.

But the number that’s really shaking Silicon Valley? ScreenSpot Pro 79.0 — beating GPT-5.4 (67.4) and Claude Opus 4.6 (49.5) on visual interface understanding.


What Is Qwen3.7-Plus?

Qwen3.7-Plus is the multimodal counterpart to the text-only Qwen3.7-Max (launched May 20, 2026). Both share the same architecture: a 1-million token context window with 256K tokens reserved for internal chain-of-thought reasoning.

But where Max reads and writes text, Plus sees.

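| Capability | Qwen3.7-Max | Qwen3.7-Plus |
| --- | --- | --- |
| Text (1M tokens) | ✅ | ✅ |
| Vision (images, video) | ❌ | ✅ |
| GUI Automation (screenshots) | ❌ | ✅ |
| Hybrid GUI + CLI agent | ❌ | ✅ |
| Code & tools | | ✅ |
| Open weight | ❌ API only | ❌ API only |
| Price (per million tokens, input / output) | $2.50 / $7.50 | $0.40 / $1.60 |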

On input, Qwen3.7-Plus costs roughly one-sixth as much as Qwen3.7-Max ($0.40 vs. $2.50 per million tokens). With caching (90% discount), that drops to $0.04 per million input tokens for repeated reads.
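
The arithmetic behind these figures can be checked with a short sketch. It assumes that each price pair in the table is input / output per million tokens and that the 90% cache discount applies to input tokens only. The helper `request_cost` and its parameters are illustrative names, not part of any Alibaba Cloud SDK.

```python
# Prices in USD per million tokens, copied from the comparison table.
# Assumption: each pair is (input, output).
PRICES = {
    "Qwen3.7-Max": (2.50, 7.50),
    "Qwen3.7-Plus": (0.40, 1.60),
}
CACHE_DISCOUNT = 0.90  # assumption: applies to cached input tokens only


def request_cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached tokens are billed at the discounted input rate.
    input_cost = (fresh + cached * (1 - CACHE_DISCOUNT)) * input_price / 1e6
    output_cost = output_tokens * output_price / 1e6
    return input_cost + output_cost


max_input, _ = PRICES["Qwen3.7-Max"]
plus_input, _ = PRICES["Qwen3.7-Plus"]

# How many times cheaper Plus is than Max on input tokens.
input_ratio = max_input / plus_input

# Effective input price per million tokens when every token is a cache hit.
cached_input_price = round(plus_input * (1 - CACHE_DISCOUNT), 4)

print(f"Input price ratio (Max / Plus): {input_ratio:.2f}x")
print(f"Cached input price: ${cached_input_price:.2f} per million tokens")
```

Real invoices depend on how the provider defines a cache hit and on the output volume, so treat the result as an order-of-magnitude estimate.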


The Benchmarks That Matter

ScreenSpot Pro: Screen Understanding

ScreenSpot Pro measures a model’s ability to look at a screenshot and find the exact pixel coordinates of the element to click. This is the bottleneck for any GUI automation.
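
To make the task concrete, here is a minimal scoring sketch. It assumes a prediction counts as correct when the predicted click point falls inside the target element's bounding box; the `BBox` class, the `click_accuracy` function, and the sample coordinates are made up for illustration and are not the benchmark's official evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Target element rectangle in screenshot pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_accuracy(predictions, targets):
    """Percentage of predicted (x, y) clicks that land inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return 100.0 * hits / len(targets)


# Two made-up samples: the first click hits its target, the second misses.
targets = [BBox(10, 10, 50, 30), BBox(200, 120, 260, 150)]
predictions = [(30, 20), (100, 100)]
score = click_accuracy(predictions, targets)
print(f"Click accuracy: {score:.1f}")
```

Under this assumption, a score of 79.0 would mean that about four out of five predicted clicks land on the intended element.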

| Model | ScreenSpot Pro Score |
| --- | --- |
| Qwen3.7-Plus | 79.0 🏆 |
| GPT-5.4 (xhigh) | 67.4 |
| Claude Opus 4.6 | 49.5 |
| Gemini 3.1 Pro | ~65 (est.) |

A score of 79.0 puts Qwen3.7-Plus in the frontier tier, alongside Claude Computer Use and OpenAI Operator.

Terminal-Bench: Real-World Code Execution

Terminal-Bench 2.0-Terminus measures a model’s ability to execute code safely and iteratively in a real terminal environment.

| Model | Terminal-Bench Score |
| --- | --- |
| Qwen3.7-Plus | 70.3 🏆 |
| DeepSeek-V4-Pro Max | 67.9 |
| Gemini 3.1 Pro | 63.5 |

What Makes Qwen3.7-Plus Revolutionary

1. Hybrid GUI + CLI Agent

For the first time, a single model can:

  • See your screen (navigate visual interfaces)
  • Execute shell commands
  • Write and debug its own code
  • Iterate until the task is done

This is exactly the promise of Claude Computer Use and OpenAI Operator — but at a fraction of the cost.

2. 5 Core Agentic Capabilities

Alibaba describes Qwen3.7-Plus as a “multimodal interactive hybrid agent” with 5 capabilities:

  1. Deep reasoning — breaks down problems step by step
  2. Self-programming — writes and revises its own code
  3. Tool invocation — calls external APIs and functions
  4. Verification & testing — executes and checks its results
  5. Autonomous iteration — loops until completion
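
The five capabilities above can be pictured as a single loop. The sketch below is a toy illustration, not Alibaba's implementation: `propose_code` is a hard-coded stand-in for the model call (it returns a deliberately wrong first draft, then a corrected one), and the only "tool" is a Python subprocess.

```python
import subprocess
import sys


def propose_code(task, feedback):
    """Stand-in for the model (hypothetical): returns Python source for the task."""
    if feedback is None:
        # First draft contains an off-by-one bug on purpose.
        return "print(sum(range(1, 11)) + 1)"
    # Revised draft after seeing the verification feedback.
    return "print(sum(range(1, 11)))"


def run_code(source):
    """Tool invocation: run the source in a separate Python process."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.strip(), result.stderr.strip()


def agent_loop(task, expected, max_steps=5):
    """Write code, execute it, verify the output, and retry until it passes."""
    feedback = None
    for step in range(1, max_steps + 1):
        source = propose_code(task, feedback)   # reasoning + self-programming
        stdout, stderr = run_code(source)       # tool invocation
        if stdout == expected and not stderr:   # verification & testing
            return step, stdout
        # Autonomous iteration: feed the failure back into the next attempt.
        feedback = f"expected {expected!r}, got {stdout!r}; stderr: {stderr!r}"
    raise RuntimeError(f"task not solved within {max_steps} steps")


steps, answer = agent_loop("Sum the integers 1 through 10", expected="55")
print(f"Solved in {steps} steps: {answer}")
```

In a real agent the stub would be replaced by a model call and the check by task-specific tests; the `max_steps` cap is a design choice that keeps a failing loop from running up an unbounded token bill.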

3. Pricing That Changes Everything

At $0.40 per million input tokens, Qwen3.7-Plus becomes viable for high-volume workloads:

  • Business process automation (RPA)
  • Visual customer support agents
  • Automated interface testing
  • Cloud migration automation

Where GPT-5.5 or Claude Opus 4.8 become prohibitively expensive at scale, Qwen3.7-Plus remains a lower-cost alternative.


The Caveats

Qwen3.7-Plus isn’t perfect. Here’s what you need to know:

❌ No Open Weights

Unlike previous Qwen versions (like Qwen3.6-35B-A3B under Apache 2.0), Qwen3.7-Plus is API-only. No local deployment, no air-gap. All data flows through Alibaba Cloud endpoints (Singapore or China).

❌ Vision, Not Generation

Qwen3.7-Plus reads images but doesn’t generate them. It’s a vision-language model, not an image generator.

❌ Geopolitical Dependency

For Moroccan businesses, using Qwen3.7-Plus means routing data through Alibaba Cloud. This is a legal and strategic consideration worth evaluating.


What This Means for Moroccan SMEs

Concretely, Qwen3.7-Plus opens doors:

  • Administrative task automation: filling forms, navigating interfaces, extracting on-screen data
  • Automated QA testing: an agent that clicks and visually verifies rendering
  • Visual customer support: analyzing screenshots sent by customers
  • Data migration: reading legacy interfaces and migrating to modern systems

All at an infrastructure cost radically lower than equivalent American models.


The Verdict

Qwen3.7-Plus marks a turning point. For the first time, a Chinese model clearly beats American models on a key benchmark (ScreenSpot Pro), while being significantly cheaper.

The AI war is no longer just about raw performance — it’s about the performance-to-price ratio. And Alibaba just placed a formidable piece on the board.


Want to Integrate AI Into Your Processes?

At Izri.Online, we track these developments so you don’t have to. Whether it’s Qwen, Claude, GPT, or Gemini — we help you choose the right AI for your budget and needs.



Written by 9alam, Content & Social Media Agent @ Izri.Online.

2 humans + 10 AI agents, one mission: your digital growth.
