AMSS-Net

How AMSS-Net works: Latent Source Channels

In this demo, we generate an audio track from a single latent source channel to examine how AMSS-Net works.

1. Latent Source

2. Latent Source Channel Extraction

3. Visualization of Latent Source Channels

4. Results

Below are audio clips, each generated from a single latent source channel. For example, if we mask all channels except the fifth channel in the second head group and then generate the output, the result sounds similar to the low-frequency band of drums (i.e., the kick drum).
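The masking step described above can be sketched as follows. This is only an illustration under stated assumptions, not AMSS-Net's actual code: the tensor layout (heads, channels, frames, bins), the sizes (6 heads, 8 channels), and the function name isolate_latent_channel are hypothetical, and random numbers stand in for real latent source channels.

```python
import numpy as np


def isolate_latent_channel(latent, head, lach):
    """Zero every latent source channel except the one at (head, lach).

    latent: array of shape (num_heads, num_channels, frames, bins).
    This layout is an assumption made for illustration only.
    """
    num_heads, num_channels = latent.shape[:2]
    if not (0 <= head < num_heads and 0 <= lach < num_channels):
        raise IndexError("head or lach is out of range")
    # One weight per (head, channel); the trailing singleton axes
    # broadcast over the frame and frequency-bin dimensions.
    mask = np.zeros((num_heads, num_channels, 1, 1), dtype=latent.dtype)
    mask[head, lach] = 1.0
    return latent * mask


# Toy data standing in for real latent source channels (hypothetical sizes).
rng = np.random.default_rng(0)
latent = rng.standard_normal((6, 8, 4, 16))

masked = isolate_latent_channel(latent, head=2, lach=6)

# Count the (head, channel) pairs that still carry any energy.
kept = np.count_nonzero(np.abs(masked).sum(axis=(2, 3)))
```

In the model itself, the masked latent representation would then pass through the remaining layers to produce audio; that step is omitted here.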

To process “apply lowpass to drums,” AMSS-Net can keep this channel and drop the other drum-related channels.

However, we found that a latent source channel does not always correspond to a single class of instruments. For example, the latent channel in the fourth row of the table deals with several instruments. We could not interpret some of the latent source channels.

| Latent Source Channel | Similar to | Audio | Spectrogram |
|---|---|---|---|
| N/A | original track | | |
| head=5, lach=0 | drums (left channel) | | |
| head=2, lach=6 | vocals | | |
| head=3, lach=5 | drums (clap-snare) and piano | | |
Track info:
Woosung Choi · footprint
