Sing-On-Your-Beat: Simple Text-Controllable Accompaniment Generations

Quoc-Huy Trinh (1,2,3), Minh-Van Nguyen (1,3,4), Trong-Hieu Nguyen Mau (1,4), Khoa Tran (1), Thanh Do (1)

(1) SongGen Team, Ho Chi Minh City, Vietnam
(2) Aalto University, Espoo, Finland
(3) Technical University of Denmark, Kongens Lyngby, Denmark
(4) University of Science, VNU-HCM, Vietnam

Abstract

Singing is one of the most cherished forms of human entertainment. However, creating a beautiful song requires an accompaniment that complements the vocals and matches the song’s intended instrumentation and genre. With advances in deep learning, previous work has focused on generating suitable accompaniments, but the results often lack precise alignment with the desired instrumentation and genre. To address this, we propose a straightforward method that controls the accompaniment through text prompts, so that the generated music both complements the vocals and meets the song’s instrumental and genre requirements. In extensive experiments, we generate 10-second accompaniments from a vocal input and a text prompt. Our method also demonstrates robust control over the generated accompaniment through the input prompt, improving alignment with the song’s instrumental and genre requirements.

Figure: The two stages of the Llambada model during training and inference.
Each token type is shown in its own color: vocal semantic tokens, vocal coarse tokens, CLAP text token, accompaniment semantic tokens, and accompaniment coarse tokens.
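To make the five token types in the figure more concrete, the sketch below shows one way the conditioning sequence for each stage could be assembled. It is only an illustration under assumptions that this page does not confirm: that the first stage conditions on vocal semantic tokens plus a CLAP text token to predict accompaniment semantic tokens, and that the second stage conditions on vocal coarse tokens, the CLAP text token, and the accompaniment semantic tokens to predict accompaniment coarse tokens. The names `Token`, `build_stage1_prompt`, and `build_stage2_prompt` are hypothetical, and the integer values are placeholders rather than real tokenizer outputs.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tags mirroring the five token types named in the caption.
VOCAL_SEMANTIC = "vocal_semantic"
VOCAL_COARSE = "vocal_coarse"
CLAP_TEXT = "clap_text"
ACCOM_SEMANTIC = "accom_semantic"
ACCOM_COARSE = "accom_coarse"


@dataclass(frozen=True)
class Token:
    kind: str   # one of the tags above
    value: int  # placeholder id; a real system would use tokenizer outputs


def tag(kind: str, values: List[int]) -> List[Token]:
    """Attach a token-type tag to each raw id."""
    return [Token(kind, v) for v in values]


def build_stage1_prompt(vocal_semantic: List[int], clap_text: int) -> List[Token]:
    """Assumed stage 1 conditioning: vocal semantic tokens, then the text token.

    The model would continue this sequence with accompaniment semantic tokens.
    """
    return tag(VOCAL_SEMANTIC, vocal_semantic) + [Token(CLAP_TEXT, clap_text)]


def build_stage2_prompt(
    vocal_coarse: List[int], clap_text: int, accom_semantic: List[int]
) -> List[Token]:
    """Assumed stage 2 conditioning: vocal coarse tokens, the text token, and
    the accompaniment semantic tokens produced by stage 1.

    The model would continue this sequence with accompaniment coarse tokens.
    """
    return (
        tag(VOCAL_COARSE, vocal_coarse)
        + [Token(CLAP_TEXT, clap_text)]
        + tag(ACCOM_SEMANTIC, accom_semantic)
    )


if __name__ == "__main__":
    stage1 = build_stage1_prompt([11, 12, 13], clap_text=7)
    stage2 = build_stage2_prompt([21, 22], clap_text=7, accom_semantic=[31, 32, 33])
    print([t.kind for t in stage1])
    print([t.kind for t in stage2])
```

Tagging each id with its type keeps the two stages easy to inspect. The actual ordering and tokenizers used by Llambada may differ from this sketch.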

Comparing with SingSong

Sample 1
Prompt: Accompaniment with romantic, acoustic, female vocals, piano, guitar, bass, love song, movie soundtrack

Sample 2
Prompt: Accompaniment with romantic, acoustic, female vocals, piano, guitar, bass, love song, movie soundtrack

Audio clips for each sample: Vocal Input; Ground Truth Accompaniment; SingSong Accompaniment; SingSong Mixed; Llambada Accompaniment; Llambada Mixed

Sample 3
Prompt: Accompaniment with romantic, female vocals, simple beat, arpeggiated guitar, bass, percussion

Sample 4
Prompt: Accompaniment with romantic, acoustic, female vocals, piano, guitar, bass, love song, movie soundtrack

Audio clips for each sample: Vocal Input; Ground Truth Accompaniment; SingSong Accompaniment; SingSong Mixed; Llambada Accompaniment; Llambada Mixed