Use Cases of Progressive Manipulation, and an Ablation Study
In this demonstration, we show that the proposed method can be applied repeatedly to audio tracks that have already been manipulated. This is known as progressive manipulation, a notion used in conversational systems.
Use Case 1: Progressive Manipulation for Making Spatial Audio
AMSS-Net can make an audio clip that was recorded with a single microphone provide a better and more realistic listening experience.
For example, the above video shows people playing instruments and singing.
- The performers are positioned as follows:
  - bass guitarist: left
  - drummer: middle
  - vocalist with guitar: right
With AMSS-Net, we can improve the spatial quality of this audio clip.
- For example, we can
  - pan bass completely to the left side: left 100% and right 0% (operation 1 in the table below)
  - pan vocals to the right side: left 20% and right 80% (operation 2 in the table below)
- We can also modify the timbre of a specified instrument to avoid interference between different sources,
  - by typing "apply highpass to drums" (operation 3 in the table below)
|1||pan bass completely to the left side|
|2||pan vocals to the right side|
|3||apply highpass to drums|
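The left/right percentages above can be read as per-channel gains. The following is a minimal sketch of that reading only, not of AMSS-Net itself: it assumes the percentages are linear amplitude gains (the page does not say whether they refer to amplitude or power), and it uses hypothetical, already-separated mono stems, whereas AMSS-Net works directly on the mixture from a text description. The function name `pan` and the synthetic sine stems are illustrative.

```python
import numpy as np

def pan(mono, left, right):
    """Return a (num_samples, 2) stereo array with the given linear channel gains."""
    mono = np.asarray(mono, dtype=np.float64)
    return np.stack([left * mono, right * mono], axis=-1)

# Hypothetical mono stems: one second of synthetic audio at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
bass = np.sin(2 * np.pi * 55 * t)     # stand-in for the bass track
vocals = np.sin(2 * np.pi * 440 * t)  # stand-in for the vocal track

# Operation 1: pan bass completely to the left (left 100%, right 0%).
bass_st = pan(bass, 1.0, 0.0)
# Operation 2: pan vocals to the right (left 20%, right 80%).
vocals_st = pan(vocals, 0.2, 0.8)

# The stereo mix is the sum of the panned stems.
mix = bass_st + vocals_st
```

In this toy mix the right channel contains no bass at all, while the left channel still carries 20% of the vocal amplitude.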
Use Case 2: Progressive Manipulation to Remove Unwanted Noises
An audio clip may contain unwanted noises, such as people shouting in the middle of a concert.
- For example, in the above YouTube video (from the movie Begin Again),
  - a man shouts (starting from 0:08 in the original audio clip in the table below)
  - in the middle of a rooftop busking session to ask the musicians to stop playing
We can also observe reflections of some sounds, especially from the kick drums, which might annoy some listeners.
These reflections become more noticeable when we increase the volume of those instruments (operation 1 in the table below).
With AMSS-Net, we can then improve the audio clip by removing the unwanted sounds or the reverberation effect, as follows:
|1||increase the volume of drums|
|2||remove reverb from bass, drums|
Use Case 3: Progressive Manipulation for a More Dramatic Introduction with Musical Effects
Listeners are used to dramatic introductions built with musical effects such as fade-ins or band-pass filters.
Although some artists perform with special devices for musical effects, as shown in the figure below,
it is usually hard to apply those effects in a live performance with limited devices and engineers.
In this case, we can use AMSS-Net to post-process the recorded clip for a more dramatic introduction, as follows.
|1||extract vocals, drums, bass|
|2||apply light highpass to vocals|
|3||apply medium highpass to bass|
|4||apply light lowpass to drums|
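The four operations above form a chain in which each step takes the previous step's output as its input. A minimal sketch of that chaining follows. The `manipulate(track, description)` callable is a hypothetical stand-in for a trained model's inference function, not the actual AMSS-Net API, and the stub used here only records the descriptions it receives.

```python
from functools import reduce
from typing import Callable, List, Sequence

Track = List[float]
Manipulate = Callable[[Track, str], Track]

def progressive_manipulation(
    track: Track, descriptions: Sequence[str], manipulate: Manipulate
) -> Track:
    """Apply each description in order, feeding every output into the next step."""
    return reduce(lambda current, desc: manipulate(current, desc), descriptions, track)

# Hypothetical stub: records the description and returns the track unchanged.
history: List[str] = []

def stub_manipulate(track: Track, description: str) -> Track:
    history.append(description)
    return list(track)

descriptions = [
    "extract vocals, drums, bass",
    "apply light highpass to vocals",
    "apply medium highpass to bass",
    "apply light lowpass to drums",
]
out = progressive_manipulation([0.0, 0.1, -0.1], descriptions, stub_manipulate)
```

Replacing `stub_manipulate` with a real model call would leave the chaining logic unchanged.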
Ablation Study: Progressive Manipulation
- Methods based on neural networks sometimes introduce artifacts that are not present in the original source.
- Although these artifacts may be negligible after a single manipulation task, they can become large enough to be perceived after several tasks have been applied progressively.
- To investigate the artifacts created by progressive manipulation, we apply the same AMSS task "apply highpass to drums" to a track in a progressive manner.
|model||desc x times||Audio||Spectrogram|
|amssnet||apply highpass to drums x 20|
|wo_smpocm||apply highpass to drums x 20|
|wo_csa||apply highpass to drums x 20|
- AMSS-Net produces fewer artifacts than the other two models because each decoding block of AMSS-Net has a CSA mechanism, a unique structure that prevents unwanted noise generated by intermediate manipulated features.
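The repeated-application protocol of this ablation can be sketched as a simple loop that keeps every intermediate output. The sketch below is purely illustrative: `toy_manipulate` is a hypothetical stand-in that adds a small constant error on each pass, so it says nothing about the actual size or nature of the artifacts produced by AMSS-Net or the ablated models. It only shows how a per-pass error that is negligible once can accumulate over 20 passes.

```python
from typing import Callable, List

Track = List[float]

def apply_repeatedly(
    manipulate: Callable[[Track, str], Track],
    track: Track,
    description: str,
    times: int,
) -> List[Track]:
    """Apply the same description `times` times and return every intermediate output."""
    outputs = []
    current = track
    for _ in range(times):
        current = manipulate(current, description)
        outputs.append(current)
    return outputs

def toy_manipulate(track: Track, description: str, artifact: float = 0.001) -> Track:
    # Toy assumption: the signal is left unchanged except for a tiny constant error.
    return [x + artifact for x in track]

track = [0.0, 0.5, -0.5]
outputs = apply_repeatedly(toy_manipulate, track, "apply highpass to drums", 20)

# Largest absolute deviation from the input after each pass.
drift = [max(abs(o - x) for o, x in zip(out, track)) for out in outputs]
```

Here `drift` grows with every pass, which mirrors the motivation for listening to the output after 20 applications rather than after one.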