LLM-BG: Singing Accompaniment Generation with LLM & Encodec
Trong-Hieu Nguyen Mau¹, Quoc-Huy Trinh¹,², Truong-Tien Nguyen¹, Minh-Van Nguyen¹,³, Khoa Tran¹, Thanh Do¹
¹ SongGen Team, Ho Chi Minh City, Vietnam
² Aalto University, Espoo, Finland
³ Technical University of Denmark, Kongens Lyngby, Denmark
Abstract
Singing Accompaniment Generation (SAG) is a crucial task in song production that aims to create accompaniment that harmonizes seamlessly with the vocal track. Recent studies have proposed unconditional generation models based on Transformers or Stable Diffusion and achieved promising results. However, these methods are difficult to apply in practice because they offer little control over the output. To address this issue, we propose LLM-BG, a Large Language Model based on QwenV2 that generates accompaniment conditioned on vocal audio input and prompt instructions. In extensive experiments, LLM-BG generated 12-second accompaniment segments that harmonize with the vocal input and can be extended to full-song accompaniment for long vocal tracks, making it a promising approach for the Singing Accompaniment Generation task.
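This page does not describe how the 12-second segments are extended to a full song. As a rough illustration only, the sketch below assumes the simplest possible scheme: the vocal track is cut into consecutive, non-overlapping 12-second windows, each window is passed to a per-segment generator, and the outputs are concatenated. The names `split_into_segments`, `accompany_full_track`, and `generate_segment` are hypothetical and are not part of LLM-BG; the actual method may use overlap, context carry-over, or something else entirely.

```python
from typing import Callable, List, Sequence

SEGMENT_SECONDS = 12  # segment length stated in the abstract


def split_into_segments(samples: Sequence[float], sample_rate: int,
                        segment_seconds: int = SEGMENT_SECONDS) -> List[List[float]]:
    """Cut a mono waveform into consecutive windows; the last one may be shorter."""
    window = sample_rate * segment_seconds
    return [list(samples[i:i + window]) for i in range(0, len(samples), window)]


def accompany_full_track(vocal: Sequence[float], sample_rate: int,
                         generate_segment: Callable[[List[float]], Sequence[float]]
                         ) -> List[float]:
    """Run a per-segment generator over each window and concatenate the outputs.

    `generate_segment` is a hypothetical stand-in for the model (assumption).
    """
    output: List[float] = []
    for segment in split_into_segments(vocal, sample_rate):
        generated = list(generate_segment(segment))
        # Trim or zero-pad so the accompaniment stays aligned with the vocal.
        generated = generated[:len(segment)] + [0.0] * (len(segment) - len(generated))
        output.extend(generated)
    return output
```

With a real model, `generate_segment` would wrap tokenization, generation, and decoding for one window; here it can be any callable that maps a list of samples to a list of samples.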
Comparison with SingSong
| | Sample 1 | Sample 2 |
|---|---|---|
| Prompt | motivational music, drumming, instrumental, energetic drums, drum fill, drum solo, fitness music, workout music | soft, instrumental, mellow, electric guitar |
| Vocal Input | [audio] | [audio] |
| Ground Truth Accompaniment | [audio] | [audio] |
| Ground Truth Mixed | [audio] | [audio] |
| SingSong Accompaniment | [audio] | [audio] |
| SingSong Mixed | [audio] | [audio] |
| LLM-BG Accompaniment | [audio] | [audio] |
| LLM-BG Mixed | [audio] | [audio] |

| | Sample 3 | Sample 4 |
|---|---|---|
| Prompt | female vocals, drumming rhythm, keyboard accompaniment, groovy, percussive bass line, dance rhythm, vocal harmony, medium tempo, percussion hits | instrumental, harmonica, acoustic guitar, bass guitar, slow tempo, acoustic drum, sentimental, advertisement jingle, beat-making |
| Vocal Input | [audio] | [audio] |
| Ground Truth Accompaniment | [audio] | [audio] |
| Ground Truth Mixed | [audio] | [audio] |
| SingSong Accompaniment | [audio] | [audio] |
| SingSong Mixed | [audio] | [audio] |
| LLM-BG Accompaniment | [audio] | [audio] |
| LLM-BG Mixed | [audio] | [audio] |

| | Sample 5 | Sample 6 |
|---|---|---|
| Prompt | groovy, rock, electric guitar, hi hats, funky, bass guitar, passionate, kick, snare, male vocal | TV series, slow tempo, bass guitar, electric guitar, teenage drama, male vocalist, acoustic drum, piano, pop, opening theme, mellow |
| Vocal Input | [audio] | [audio] |
| Ground Truth Accompaniment | [audio] | [audio] |
| Ground Truth Mixed | [audio] | [audio] |
| SingSong Accompaniment | [audio] | [audio] |
| SingSong Mixed | [audio] | [audio] |
| LLM-BG Accompaniment | [audio] | [audio] |
| LLM-BG Mixed | [audio] | [audio] |

| | Sample 7 | Sample 8 |
|---|---|---|
| Prompt | heavy metal, male vocal, distorted electric guitar, bass guitar, acoustic drums, loud, aggressive, violent video game | heavy metal, screaming vocals, distorted electric guitar, bass guitar, metal drum beat, aggressive, violent, action video game soundtrack |
| Vocal Input | [audio] | [audio] |
| Ground Truth Accompaniment | [audio] | [audio] |
| Ground Truth Mixed | [audio] | [audio] |
| SingSong Accompaniment | [audio] | [audio] |
| SingSong Mixed | [audio] | [audio] |
| LLM-BG Accompaniment | [audio] | [audio] |
| LLM-BG Mixed | [audio] | [audio] |