AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries

This paper addresses Audio Manipulation on Specific Sources (AMSS), which aims to edit only the objects that correspond to user-specified sources, such as vocals and drums, according to a given description, while preserving the content of the sources not mentioned in the description. AMSS has many potential applications; for instance, video creation tools could use it to make audio editing easy for non-experts.

For example, users can decrease the volume of the drums by typing a simple textual instruction instead of performing time-consuming edits in a digital audio workstation.
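To make the task definition concrete, here is a minimal sketch of the output an ideal AMSS system should produce when the ground-truth source stems are available: the sources named in the query are edited and all other sources pass through unchanged. AMSS-Net itself only receives the mixture; the function name `oracle_amss_target`, the stem dictionary, and the 0.5 gain used to model "decrease the volume" are hypothetical choices for illustration.

```python
import numpy as np


def oracle_amss_target(stems, target_sources, effect):
    """Hypothetical reference output for an AMSS query, built from ground-truth stems.

    stems: dict mapping source name -> waveform (all arrays share one shape)
    target_sources: names of the sources mentioned in the query
    effect: function applied to the waveform of each mentioned source
    """
    output = np.zeros_like(next(iter(stems.values())))
    for name, wave in stems.items():
        # Only the sources named in the query are edited; the rest are preserved as-is.
        output += effect(wave) if name in target_sources else wave
    return output


rng = np.random.default_rng(0)
stems = {name: rng.standard_normal(44100) for name in ["vocals", "bass", "drums", "other"]}

# Hypothetical interpretation of "decrease the volume of drums": halve the drum gain.
edited = oracle_amss_target(stems, {"drums"}, lambda w: 0.5 * w)
```

Comparing a model's output against a target like this is one way to check that unmentioned sources were really left untouched.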

With AMSS-Net, you can edit an audio clip using text queries written in the Audio Manipulation Language.

For example, you can reduce the volume of the bass, or apply other manipulations, as follows:

model.manipulate_track(audio, 'decrease the volume of bass') 
model.manipulate_track(audio, 'pan vocals completely to the left side') 
model.manipulate_track(audio, 'apply heavy lowpass to drums, vocals') 
model.manipulate_track(audio, 'apply medium highpass to vocals, drums') # == apply highpass to drums, vocals 
model.manipulate_track(audio, 'separate vocals, bass, drums') # == extract vocals, drums, bass
model.manipulate_track(audio, 'mute bass, drums')  # == get rid of drums, bass
model.manipulate_track(audio, 'remove reverb from drums, bass') 
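The comments above state that some queries are equivalent: synonyms such as "extract" and "separate" mean the same thing, the order of the listed sources does not matter, and "apply highpass" behaves like "apply medium highpass". The following is a hypothetical sketch of a query normalizer that makes those equivalences explicit. It is not AMSS-Net's actual parser; the names `normalize_query`, `SYNONYMS`, `LEVELS`, and `DEFAULT_LEVEL`, as well as their contents, are assumptions inferred only from the example comments.

```python
import re

# Hypothetical synonym table inferred from the equivalences noted in the examples.
SYNONYMS = {"extract": "separate", "get rid of": "mute"}
# Effect levels seen in the examples; the full set is not specified here.
LEVELS = {"medium", "heavy"}
DEFAULT_LEVEL = "medium"  # assumption, based on the highpass example
KNOWN_SOURCES = {"vocals", "bass", "drums"}


def normalize_query(query):
    """Return (action, sources) so that equivalent queries compare equal."""
    text = query.lower().strip()
    for phrase, canonical in SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    words = re.findall(r"[a-z]+", text)
    # Source order is irrelevant, so collect the mentioned sources into a set.
    sources = frozenset(w for w in words if w in KNOWN_SOURCES)
    action = [w for w in words if w not in KNOWN_SOURCES]
    # "apply <effect>" without an explicit level falls back to the default level.
    if action and action[0] == "apply" and (len(action) < 2 or action[1] not in LEVELS):
        action.insert(1, DEFAULT_LEVEL)
    return " ".join(action), sources


same = normalize_query("mute bass, drums") == normalize_query("get rid of drums, bass")
```

Representing the sources as a `frozenset` is what makes source order irrelevant, while the remaining words form a canonical action string.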

Below is the list of demonstration links:

  1. Audio Manipulation Language
  2. How AMSS-Net Works: Latent Source Channels
  3. Use Cases of Progressive Manipulation, and an Ablation Study
  4. Controlling the level of audio effects