Outperforming Suno v5: Tencent and Tsinghua jointly release SongGeneration2, tackling pronunciation and off-key singing problems and supporting local deployment

2026-03-16

On March 9, 2026, the music foundation model SongGeneration2, jointly developed by Tencent and Tsinghua University's Human Computer Voice Interaction Laboratory, was officially released. The announcement drew wide attention across the technology and music industries.

Technological Innovation: Targeting the Three Pain Points of AI Music

In the past, AI-generated music often had a "plastic" feel, and several long-standing problems remained unsolved. SongGeneration2 targets these pain points directly.

Architecture Innovation: A "Dual-Core" Design

SongGeneration2 owes its performance to a hybrid architecture that pairs a large language model (LLM) with a diffusion model.

Composition Brain (LeLM): Global Planning and Detail Control

LeLM acts like an experienced composer, planning the overall structure of a piece as well as its singing details. It handles rhythm, melody, and harmony, addressing the key question of "how to sing" and laying the foundation for the finished work.

High-Fidelity Renderer (Diffusion): Synthesizing Complex Acoustic Details

Guided by the language model, the diffusion renderer synthesizes highly complex acoustic details. Like a skilled sound engineer, it polishes each note, giving the output high sound quality and realism.

Layered Representation: Balancing Melody and Sound Quality

SongGeneration2 models a mixed representation and a multi-track representation in parallel, balancing melodic stability against fine-grained sound quality. With this design, the generated music can combine smooth melodies, rich timbres, and delicate emotional expression.
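The two-stage flow described above (a language model plans, a renderer refines audio under that plan) can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not the project's code: `SongPlan`, `plan_song`, and `render_audio` are hypothetical names, the "planner" draws random tokens instead of running a language model, the split into vocal and accompaniment tracks is an assumed reading of "multi-track", and the "renderer" is a simple iterative refinement from noise rather than a trained diffusion model.

```python
import random
from dataclasses import dataclass


@dataclass
class SongPlan:
    """Hypothetical container for the planner output (all names are assumptions)."""
    mixed_tokens: list[int]          # coarse tokens for the full mix
    vocal_tokens: list[int]          # assumed per-track tokens: vocals
    accompaniment_tokens: list[int]  # assumed per-track tokens: accompaniment


def plan_song(lyrics: str, frames_per_word: int = 4,
              vocab_size: int = 1024, seed: int = 0) -> SongPlan:
    """Stage 1 stand-in: emit three parallel token streams of equal length.

    A real planner would be a language model conditioned on lyrics and style;
    here the tokens are random so the sketch stays self-contained.
    """
    rng = random.Random(seed)
    n_frames = max(1, len(lyrics.split())) * frames_per_word

    def draw() -> list[int]:
        return [rng.randrange(vocab_size) for _ in range(n_frames)]

    return SongPlan(draw(), draw(), draw())


def render_audio(plan: SongPlan, samples_per_frame: int = 8,
                 steps: int = 10, seed: int = 0) -> list[float]:
    """Stage 2 stand-in: start from noise and refine toward a token-derived target.

    This mimics only the shape of a diffusion sampling loop (noise in, several
    refinement steps, conditioned on the plan). It is not a trained model.
    """
    rng = random.Random(seed)
    target: list[float] = []
    for m, v, a in zip(plan.mixed_tokens, plan.vocal_tokens,
                       plan.accompaniment_tokens):
        value = ((m + v + a) % 200) / 100.0 - 1.0  # map tokens into [-1, 1)
        target.extend([value] * samples_per_frame)

    audio = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    for _ in range(steps):
        # Each step moves the signal halfway toward the conditioning target.
        audio = [x + 0.5 * (t - x) for x, t in zip(audio, target)]
    return audio


plan = plan_song("hello world")
audio = render_audio(plan)
print(len(plan.mixed_tokens), len(audio))
```

The point of the sketch is the division of labor: the planner fixes what happens in each frame across the parallel token streams, and the renderer only decides how it sounds.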

Open-Source Benefits: Lowering the Barrier to Creation So Everyone Can Compose

For developers, the open-sourcing of SongGeneration2 is welcome news. The SongGeneration-v2 large model, with 4B parameters, has been officially open-sourced and supports multilingual generation in Chinese, English, and other languages.

More notably, it runs smoothly on consumer-grade hardware with 22GB of video memory, which makes local, private creation possible. Ordinary users can take part in music creation without expensive professional equipment.
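As a rough sanity check on these figures, the memory needed just to hold 4B parameters can be estimated from the number of bytes per parameter. The sketch below is a back-of-the-envelope calculation, not a measurement: the article does not state which numeric precision the released model uses, so the precisions listed are assumptions, and the helper `weight_memory_gib` is a hypothetical name. Weights are also only part of the footprint, since activations and other components need memory as well.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 4e9  # the 4B parameter count quoted in the article

# Assumed precisions; the article does not say which one is used.
for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{name:>9}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB")
```

At 2 bytes per parameter the weights come to about 7.5 GiB, so a 22GB card leaves headroom beyond the weights themselves. How that headroom is actually used is not described in the article.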

To let users try SongGeneration-v2 more quickly, the project team has also launched a SongGeneration-v2 Fast version on HuggingFace. This version trades a small amount of sound quality for much faster generation: a complete song can be produced in about one minute, which greatly improves creative efficiency.

Summary: An Era in Which Everyone Can Be a "Composer" May Be Coming

SongGeneration2's performance suggests that AI music has moved from "geek toy" to commercial application. With the planned open-sourcing of a Medium model that runs in 12GB of video memory, along with an automated evaluation framework, the barrier to AI music creation will fall further, and more people will have the opportunity to become "composers".