Hey everyone, Dora here!
I’ve been tracking the AI landscape closely, and we are witnessing a massive shift. Users are no longer just asking models for advice or quick info lookups; they are demanding that models directly execute complex workflows. This means general-purpose models need to evolve way beyond simple text generation.
In response to this trend, I want to walk you through the launch of Seed1.8, a new General Agent Model that is turning heads. It's packed with powerful multimodal capabilities, supports both image and text inputs, and is designed to handle everything from information retrieval and code generation to GUI interaction and complex workflows.
Here is the lowdown on what makes Seed1.8 stand out.
The Seed1.8 Snapshot
To give you the quick "too long; didn't read" version, here are the three core pillars of this new model:
- All-in-One General Agent: Seed1.8 unifies Search, Code, and GUI Agent capabilities. Thanks to its native visual foundation, it can literally "see" interfaces and interact with them directly.
- Fast & Efficient: It supports three distinct "thinking modes," automatically adjusting how it processes tasks based on complexity. Plus, it has optimized the token count needed for image encoding, boosting inference efficiency without sacrificing intelligence.
- Built for Reality: The team put Seed1.8 through rigorous testing that simulates real-world workflows. It shines in intent recognition, broad information retrieval, and following complex instructions.
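To make the "all-in-one" idea concrete, here's a minimal sketch of how a unified agent loop might route model-emitted actions to Search, Code, or GUI backends. Every name here (`Action`, `dispatch`, the payload keys) is my own illustration, not part of the Seed1.8 API:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A hypothetical action the model might emit; kinds map to tool backends."""
    kind: str      # "search", "code", or "gui"
    payload: dict

def dispatch(action: Action) -> str:
    """Route a model-emitted action to the matching (stubbed) tool backend."""
    handlers = {
        "search": lambda p: f"searching for {p['query']!r}",
        "code":   lambda p: f"running snippet ({len(p['source'])} chars)",
        "gui":    lambda p: f"clicking element {p['element_id']}",
    }
    if action.kind not in handlers:
        raise ValueError(f"unknown action kind: {action.kind}")
    return handlers[action.kind](action.payload)
```

The interesting claim is that one model produces all three action kinds natively, so the dispatcher stays thin and no specialist model hand-off is needed.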
Based on an internal evaluation system designed around real-world needs (and combined with public benchmarks), Seed1.8 has undergone comprehensive testing. You can find the deep dive in the Model Card on the Project Homepage.
Seed1.8 General Agent Capabilities
Validated Across Diverse Real-World Tasks
In various benchmarks, Seed1.8 shows serious potential as a General Agent Model, particularly in GUI manipulation, search, and industry-specific applications.
Building a solid Agent is tough because of three main hurdles:
1. Multi-task Parallelism: Juggling multiple tasks at once, allocating resources between them, and maintaining quality across the board.
2. Complex Instruction Following: Handling multiple constraints and executing strictly defined steps.
3. Cross-Domain Knowledge Transfer: Switching contexts seamlessly while maintaining high-level reasoning.
Seed1.8 hits these pain points hard. Evaluations show it has industry-leading GUI Agent capabilities, improving significantly over Seed1.5-VL. Whether on desktop, web, or mobile, it reliably executes multi-step tasks across different systems.
It's also a beast at search. In the BrowseComp-en benchmark, it scored a massive 67.6, outperforming top-tier models like Gemini-3-Pro.

(Note: data marked with * is taken from public technical reports; data marked with ¹ is from official full-set scores.)
Agentic Coding
In benchmarks related to Agentic Coding, Seed1.8 demonstrates stability in real software engineering scenarios. It doesn't just generate code snippets; it acts as a programmer that can push tasks forward in a real development environment.

Real-World Economic Value
For tasks that actually drive business value, the results are promising. On FinSearchComp and XpertBench, the model is stable and efficient in financial and commercial contexts.
Furthermore, on the WorldTravel multimodal task, it scored 47.2, demonstrating reliability in planning trips, analyzing user needs, and handling logistics.

The Berlin Example: Imagine a family with a strict budget wanting to visit Berlin. Seed1.8 combined data from travel platforms, booking sites, and restaurant menus. Using its reasoning and visual interpretation, it generated a plan that met every single constraint—budget, time, food preferences, and accommodation. It didn't just list places; it optimized the schedule for a personalized fit.
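The constraint-checking step in that example is easy to picture. Here's an illustrative sketch (the data, helper, and tag scheme are mine, not Seed1.8 internals) of filtering candidate plans against a budget and preference tags:

```python
def satisfies_constraints(plan: dict, budget: int, required_tags: set) -> bool:
    """True if a candidate plan fits the budget and covers every
    requested preference tag (accommodation, food preferences, ...)."""
    total = sum(item["cost"] for item in plan["items"])
    tags = {t for item in plan["items"] for t in item.get("tags", [])}
    return total <= budget and required_tags <= tags

# Two made-up candidate itineraries for a Berlin trip.
candidates = [
    {"name": "Plan A",
     "items": [{"cost": 900, "tags": ["hotel"]},
               {"cost": 250, "tags": ["vegetarian"]}]},
    {"name": "Plan B",
     "items": [{"cost": 1400, "tags": ["hotel", "vegetarian"]}]},
]

viable = [p["name"] for p in candidates
          if satisfies_constraints(p, budget=1200,
                                   required_tags={"hotel", "vegetarian"})]
# Plan B covers the tags but blows the budget, so only Plan A survives.
```

The point of the anecdote is that the model does this kind of filtering and scheduling implicitly from raw screenshots and listings, rather than from a clean structured feed like this one.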


LLM Capabilities
Rivaling Top-Tier General Models
When we look at standard Large Language Model benchmarks, Seed1.8 sits comfortably in the first tier of the industry.
- Math & Reasoning: In core dimensions like mathematics and knowledge understanding, it performs close to the absolute best general models out there.

- Complex Instruction Following: This is where many models break down. Seed1.8 handles tasks with multiple constraints, reverse conditions, or long-chain reasoning with stability.
- Expert Applications: The team extended validation to scenarios defined by human experts—education tutoring, customer service, and complex workflows—proving it’s ready for deployment, not just the lab.
VLM Multimodal Capabilities
Significant Score Jumps & Standout Performance
Seed1.8 has leveled up its visual game, surpassing its predecessor (Seed1.5-VL) and nipping at the heels of the state-of-the-art Gemini-3-Pro in most multimodal reasoning tasks.
Image Understanding
The model is sharp. In the ZeroBench test—considered extremely difficult for visual reasoning—Seed1.8 grabbed the top score of 11.0, answering significantly more questions correctly than previous versions.

In general Visual QA (specifically the VLMsAreBiased benchmark), it scored 62.0, taking a massive lead over competitors. It also excels in 3D spatial understanding, showing it can handle dynamic and complex datasets.
Video Understanding
This is one of my favorite parts. Seed1.8 is highly adaptive in video reasoning, motion perception, and long-video understanding.
- Real-Time Perception: It ranks top-tier in dynamic scenes, processing real-time info efficiently.

- Long Video (VideoMME): It scored a high 87.8. Long videos require tracking context over time. Seed1.8 uses a "VideoCut" tool to slow down and re-watch specific clips, allowing for precise reasoning and high-frame-rate motion detection.
- Token Efficiency: It understands better while using fewer resources, meaning lower latency for users. Even with a lower "Max Video Token" configuration, it outperforms Seed1.5-VL on multiple benchmarks.
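The video-token trade-off comes down to simple arithmetic: a fixed token budget divided by the per-frame encoding cost caps how many frames the model can actually look at. The numbers below are assumptions for illustration, not Seed1.8's actual encoder costs:

```python
def frames_within_budget(duration_s: float, sample_fps: float,
                         tokens_per_frame: int, max_video_tokens: int) -> int:
    """How many sampled frames fit under a video-token budget?"""
    total_frames = int(duration_s * sample_fps)   # frames we'd like to see
    affordable = max_video_tokens // tokens_per_frame  # frames we can pay for
    return min(total_frames, affordable)

# A 10-minute clip sampled at 1 fps, assuming 256 tokens per frame
# under a 32k video-token budget: the budget, not the clip, is the limit.
budget_frames = frames_within_budget(600, 1.0, 256, 32_768)
```

This is why cutting tokens per frame matters so much: halving the per-frame cost doubles the temporal coverage at the same latency.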
Dynamic Thinking
Seed1.8 introduces Thinking Modes. This allows the model to dynamically adjust its "thinking depth" and computational load based on how hard the task is. It's smart resource management.
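One way to picture mode selection (purely my sketch; the actual routing policy isn't documented here) is a heuristic that counts complexity signals in the request and maps them to one of three thinking budgets:

```python
def pick_thinking_mode(prompt: str, has_image: bool) -> str:
    """Crude illustrative heuristic: longer, multi-constraint, or
    multimodal tasks get a deeper thinking mode; quick lookups stay fast."""
    signals = 0
    signals += len(prompt) > 200        # long, detailed instructions
    signals += prompt.count(",") >= 3   # many clauses / constraints
    signals += has_image                # multimodal input in play
    if signals == 0:
        return "fast"       # answer directly, minimal reasoning tokens
    if signals == 1:
        return "balanced"   # brief reasoning before answering
    return "deep"           # extended chain-of-thought
```

A real router would presumably be learned rather than rule-based, but the payoff is the same: you don't pay deep-reasoning latency for a question that doesn't need it.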

What’s Next for Seed?
The team isn't stopping here. The roadmap for Seed1.8 and beyond is focused on tackling the messy, complex challenges of the real world. Here is where they are heading:
- Scaling Up: Continuing to pump up the compute for pre-training and post-training to handle even crazier task demands.
- The Long Game: Improving Agent Memory and long-context processing so the model can handle tasks that span long periods without losing the plot.
- Real-World Polish: Expanding training data to include more actual work/life scenarios. The goal is a model that adapts to you, not the other way around.
- Autonomous Exploration: Pushing the boundaries of what the Agent can figure out on its own.
Finally, they are big on community. A lot of the evaluation datasets built for Seed1.8 have been (or will be) open-sourced. This helps keep the whole industry honest and moving forward.
Catch you in the next update! - Dora


