AMSS-Net

Usecases of Progresive Manipulation, and an ablation study

In this demonstration, we show that we can apply the proposed method repeatedly to manipulated audio tracks, which is also known as Progressive Manipulation used in conversational system.

Usecase 1: Progressive Manipulation for making Saptial Audio

“Paulie Cerra band - Take me down Quick - live at bluesmoose radio 28 juli 2021”,
by Blues Moose, licensed under Creative Commons Attribution licence (reuse allowed).

Paulie Cerra band - Take me down Quick -  live recorded for bluesmoose radio - 28 juli 2021.
at Cafe zaal de Comm - Groesbeek - coronaproof setting 1,5 meter socially distanced

Paulie Cerra – vocals, keyboard, saxophone
Ben Forrester – Guitar
Billy Haynes – Bass, Vocals
Alvino M. Bennet – Drums
https://www.pauliecerra.com/

Licence Creative Commons Attribution licence (reuse allowed)

AMMS-Net can make an audio clip, which had been recorded with a single microphone, provide a better and realistic hearing experience to users.

For example, there are people playing instruments and singing in the above video.

The location of each person is
- bass guitarist: rightmost
- vocalist, playing piano: right
- drummer: relatively left

We can make a better audio clip in terms of the spatial audio with AMSS-Net.

For example,
- pan bass completely to the left side: left 100% and right 0% (the operation 1 in the table below)
- pan vocals to the right side: left 20% and right 80% (the operation 2 in the table below)
Also we can modify the timbre of specified instrument to avoid interference of different sources.
- by typing apply highpass to drums (the operation 3 in the table below)

order	description	Audio	Spectrogram	Wav
0	original
1	pan bass completely to the right side
2	pan drums to the left side
3	pan vocals to the right side
4	apply light highpass to drums

Usecase 2: Progressive Manipulation to Remove Unwanted Noises

Copyright info.
- “Manchester Life : Reggae music on market street ”,
- by Africa Without Borders, licensed under Creative Commons Attribution licence (reuse allowed).
```
Manchester Life : Reggae music on market street
Licence Creative Commons Attribution licence (reuse allowed)
```

An audio clip may contain noises such as people shouting noise in the middle of a concert.

For example, in the above youtube (from the movie Begin again),
- a man shouts (starting from 0:08 in the original audio clip in the table below)
- in the middle of a rooftop busking to ask them to stop playing

We can also observe reflections of some sounds, especially from the kick drums, which might annoy some listeners.

We perceive them more severe when we increase the volume of those instruments (the operation 1 in the table below)

Then, we can make a better audio clip to remove unwanted sounds or remove the reverberation effect with AMSS-Net as follows:

order	description	Audio	Spectrogram	Wav
0	original
1	decrease the volume of vocals
2	increase the volume of drums, bass
3	remove reverb from bass, drums

Usecase 3: Progrssive Manipulation for More Dramatic Introduction with Musical Effects

Copyright info.
- “Zhen feat. Domenico Godino, “Time is running out””,
- by percefal, licensed under Creative Commons Attribution licence (reuse allowed).
```
Muse cover, live @ Work In Progress Music Pub
31 ottobre 2015
Licence Creative Commons Attribution licence (reuse allowed)
```

We are used to some dramatic introduction with musical effects such as fade-in or band pass filters.

Although some artists show performance with special devices for musical effects as described in the figure below,

it is usually hard to apply those effects in a real performance with limited devices and engineers.

In this case, we can use AMSS-Net to post-process the recorded clip for more dramatic introduction as follows.

order	description	Audio	Spectrogram	Wav
0	original
1	apply light highpass to vocals
2	apply medium highpass to bass
3	apply light lowpass to drums

Ablation Study: Progrssive Manipulation

Methods based on neural networks sometimes suffer from artifacts, which are not present in the original source.
Although they sound negligible after a single manipulation task, they can be large enough to be perceived after progressively applying several.
To investigate artifacts created by progressive manipulation, we apply the same AMSS task ``apply highpass to drums’’ to a track in a progressive manner.

choi hn · AMSS-Net sample - ACM Multimedia 2021

model	desc x times	Audio	Spectrogram
origin	N/A
amssnet	apply highpass to vocals x 1
amssnet	apply highpass to vocals x 20
wo_csa	apply highpass to vocals x 20
wo_smpocm	apply highpass to vocals x 20

Our AMSS-Net contains minor artifacts compare to them because each decoding block of AMSS-Net has a CSA mechanism, a unique structure that prevents unwanted noise generated by intermediate manipulated features.