LLM-BG: Singing Accompaniment Generation with LLM & Encodec

Trong-Hieu Nguyen Mau (1), Quoc-Huy Trinh (1,2), Truong-Tien Nguyen (1), Minh-Van Nguyen (1,3), Khoa Tran (1), Thanh Do (1)

(1) SongGen Team, Ho Chi Minh City, Vietnam
(2) Aalto University, Espoo, Finland
(3) Technical University of Denmark, Kongens Lyngby, Denmark

Abstract

Singing Accompaniment Generation (SAG) is a crucial task in song production: it aims to create an accompaniment that harmonizes seamlessly with the vocal track. Recent studies have proposed unconditional generation models based on Transformers or Stable Diffusion and achieved promising results. However, these methods are difficult to use in real-world applications because they offer little control over the output. To address this issue, we propose LLM-BG, a Large Language Model based on QwenV2 that generates accompaniment conditioned on vocal audio input and prompt instructions. In extensive experiments, LLM-BG generated 12-second accompaniment segments that harmonize with the vocal input and can be extended to full-song accompaniment for long vocal tracks, which makes it a promising approach for the Singing Accompaniment Generation task.
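The abstract states that 12-second segments can be extended to full-song accompaniment but does not describe how. As a minimal sketch, assuming a long vocal track is simply split into consecutive, non-overlapping 12-second windows (an assumption for illustration, not the authors' stated method), the window boundaries could be computed as below. The helper name `segment_bounds` is hypothetical.

```python
from typing import List, Tuple

SEGMENT_SECONDS = 12.0  # segment length stated in the abstract


def segment_bounds(total_seconds: float,
                   segment_seconds: float = SEGMENT_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) times that cover a track in fixed-length windows.

    Assumption: windows are consecutive and non-overlapping. The last
    window is shorter when the track length is not a multiple of the
    segment length.
    """
    if total_seconds < 0 or segment_seconds <= 0:
        raise ValueError("track length must be >= 0 and segment length > 0")
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

For a 30-second vocal track this yields two full windows and one 6-second remainder. Each window would then be passed to the model together with the text prompt; how the resulting segments are stitched together is not specified on this page.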

Overview of LLM-BG

Comparison with SingSong

Sample 1
Prompt: motivational music, drumming, instrumental, energetic drums, drum fill, drum solo, fitness music, workout music
Sample 2
Prompt: soft, instrumental, mellow, electric guitar
Vocal Input
Ground Truth Accompaniment
Ground Truth Mixed
SingSong Accompaniment
SingSong Mixed
LLM-BG Accompaniment
LLM-BG Mixed

Sample 3
Prompt: female vocals, drumming rhythm, keyboard accompaniment, groovy, percussive bass line, dance rhythm, vocal harmony, medium tempo, percussion hits
Sample 4
Prompt: instrumental, harmonica, acoustic guitar, bass guitar, slow tempo, acoustic drum, sentimental, advertisement jingle, beat-making
Vocal Input
Ground Truth Accompaniment
Ground Truth Mixed
SingSong Accompaniment
SingSong Mixed
LLM-BG Accompaniment
LLM-BG Mixed

Sample 5
Prompt: groovy, rock, electric guitar, hi hats, funky, bass guitar, passionate, kick, snare, male vocal
Sample 6
Prompt: TV series, slow tempo, bass guitar, electric guitar, teenage drama, male vocalist, acoustic drum, piano, pop, opening theme, mellow
Vocal Input
Ground Truth Accompaniment
Ground Truth Mixed
SingSong Accompaniment
SingSong Mixed
LLM-BG Accompaniment
LLM-BG Mixed

Sample 7
Prompt: heavy metal, male vocal, distorted electric guitar, bass guitar, acoustic drums, loud, aggressive, violent video game
Sample 8
Prompt: heavy metal, screaming vocals, distorted electric guitar, bass guitar, metal drum beat, aggressive, violent, action video game soundtrack
Vocal Input
Ground Truth Accompaniment
Ground Truth Mixed
SingSong Accompaniment
SingSong Mixed
LLM-BG Accompaniment
LLM-BG Mixed