On December 10, 2025, Zhipu announced on its official WeChat account that it had officially launched the GLM-ASR series of speech recognition models, along with a desktop product built on them: the Zhipu AI Input Method. Doesn't this mark another big step forward in voice interaction? Compared with traditional speech recognition tools, Zhipu's new release is an undeniably innovative move.
GLM-ASR series models: a new benchmark for speech recognition
A combination of cloud and on-device models
GLM-ASR-2512 is Zhipu's next-generation cloud-based speech recognition model. Like a super "voice translator", it converts speech into text accurately and in real time. In complex real-world conditions spanning multiple scenarios, languages, and accents, its performance stands out, with a Character Error Rate (CER) of just 0.0717, a leading result in the industry. Whether you are on a noisy street or talking with people who have different accents, it can recognize your speech accurately. Isn't that remarkable?
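The CER figure cited above is the character-level edit distance between a model's transcript and a reference transcript, divided by the reference length. A minimal sketch of the metric (this helper is illustrative, not Zhipu's evaluation code):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: Levenshtein edit distance / reference length."""
    prev = list(range(len(hypothesis) + 1))  # DP row for the empty reference prefix
    for i, rc in enumerate(reference, 1):
        cur = [i]
        for j, hc in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rc != hc)))   # substitution (0 if equal)
        prev = cur
    return prev[-1] / len(reference)

print(cer("speech recognition", "speech recognition"))  # 0.0 for a perfect transcript
```

A CER of 0.0717 means roughly 7 character errors per 100 reference characters, aggregated over the test set.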
GLM-ASR-Nano-2512 is a compact on-device model based on GLM-ASR-2512. Although it has only 1.5B parameters, it achieves state-of-the-art results among current open-source speech recognition models, and even outperforms several closed-source models on some benchmarks. It is like a small but mighty sprite, compressing strong recognition ability into something that runs locally. This not only strengthens privacy protection but also lowers interaction latency, so you can use voice recognition freely, anytime and anywhere.
Zhipu AI Input Method: A New Experience of Voice Interaction
Calling model capabilities from within the input method
The Zhipu AI Input Method is built on the GLM-ASR series, letting users interact smoothly by voice on a computer. A traditional input method is like a simple typist, only responsible for turning what you say into text. The Zhipu AI Input Method is an all-in-one assistant: beyond accurate speech-to-text, it can invoke large-model capabilities directly inside the input method to translate, rewrite, or shift the tone of text, truly realizing the idea of "fingertips as the model, voice as the instruction".
Integration from dictation to rewriting
The Zhipu AI Input Method turns dictation and rewriting into a single "select and modify" flow. It can call the underlying GLM model directly to translate, expand, or condense any text on screen, and to polish it so the output reads more naturally. The whole process happens inside the input box, combining "understand, execute, replace" in one step without switching between applications. For example, if you have written a paragraph that feels flat, a single operation inside the input method can make the text more vivid and engaging.
One sentence, many personas
The Zhipu AI Input Method also supports setting different "persona" styles, so the same sentence can be expressed differently in different scenarios. In a work setting, choosing the "facing the boss" persona instantly turns colloquial rambling into a rigorous, clearly structured work report; in daily life, switching to the "facing your partner" persona makes the text gentle and playful, close to everyday conversation. It is like having an intelligent language wizard that shifts style on demand.
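One plausible way persona switching like this can work is to wrap the dictated text in a style-specific instruction before sending it to the underlying GLM model. The persona names and prompt wording below are illustrative assumptions, not Zhipu's actual prompts:

```python
# Hypothetical persona prompt templates; the real product's prompts are not public.
PERSONAS = {
    "boss": "Rewrite the following dictated note as a concise, well-structured work report:",
    "partner": "Rewrite the following dictated note in a warm, casual, playful tone:",
}

def build_prompt(persona: str, dictated_text: str) -> str:
    """Compose the instruction an input method might send to its language model."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona}")
    return f"{PERSONAS[persona]}\n\n{dictated_text}"

print(build_prompt("boss", "so yeah the release basically went fine, a few bugs left"))
```

Under this design, adding a new persona is just adding a template, which fits the "same sentence, different expression" behavior described above.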
Vibe Coding
For developers, the Zhipu AI Input Method offers a dedicated Vibe Coding experience, integrated with the Zhipu Coding Plan account. Combining multilingual support with code understanding, it lets developers dictate code logic and comments by voice, look up forgotten Linux commands, and direct the AI in natural language to complete complex calculations or write scripts. Designers, too, can move from hands-on editing to describing designs aloud, greatly improving efficiency. For example, when brainstorming a design proposal, a designer only needs to describe the idea by voice, and the input method can quickly generate the corresponding design elements.
Whisper Capture and Efficient Hot Words
In public settings such as open offices and libraries, people often give up on voice input because speaking loudly feels embarrassing. The Zhipu AI Input Method addresses this pain point by optimizing weak-sound capture and environmental noise rejection: even soft speech is accurately converted into text, solving the "too embarrassed to use voice input in public" problem. It also lets users import custom vocabulary, project code names (such as AutoGLM), and uncommon personal or place names with one click. Add them once in the settings, and every subsequent use becomes easier.
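One common way a user lexicon like this can help, sketched purely as an illustration (this is not Zhipu's implementation), is to post-correct the transcript by fuzzy-matching each token against the imported hot words:

```python
import difflib

def correct_hotwords(tokens, lexicon, cutoff=0.8):
    """Replace transcript tokens that closely match a user-supplied hot word.

    Case-insensitive fuzzy match; `cutoff` is the minimum difflib similarity
    ratio (0..1) required before a token is replaced.
    """
    lower_map = {word.lower(): word for word in lexicon}
    corrected = []
    for tok in tokens:
        match = difflib.get_close_matches(tok.lower(), list(lower_map), n=1, cutoff=cutoff)
        corrected.append(lower_map[match[0]] if match else tok)
    return corrected

# A slightly garbled transcription of the project name "AutoGLM" gets repaired;
# ordinary words are left untouched.
print(correct_hotwords(["autogml", "handles", "this"], ["AutoGLM"]))
```

Production systems more often bias the decoder toward hot words during recognition rather than patching afterwards, but the post-correction version shows why a one-time import of rare names pays off on every later dictation.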
It is worth noting that, as AI technology continues to develop, voice interaction has become an important trend for future technology. The GLM-ASR series and the Zhipu AI Input Method clearly follow this trend, bringing users a more convenient and efficient voice interaction experience.