Music source separation based on a lightweight deep learning framework (DTTNET: DUAL-PATH TFC-TDF UNET)
File(s)
ICASSP_2024_aArxiv (1).pdf (311.41 KB)
Accepted version
Author(s)
Chen, Junyu
Vekkot, Susmitha
Shukla, Pancham
Type
Conference Paper
Abstract
Music source separation (MSS) aims to extract a variety of sources from a piece of mixed music. Typically, in the context of the MUSDB-18 demixing challenge, the target sources are the ‘vocals’, ‘drums’, ‘bass’ and ‘other’ tracks. While deep learning methods have shown impressive results, there is a trend toward larger models. In our paper, we introduce a novel and lightweight architecture called DTTNet, which is based on the Dual-Path Module and the Time-Frequency Convolutions Time-Distributed Fully-connected UNet (TFC-TDF UNet). DTTNet achieves 10.12 dB cSDR on ‘vocals’, compared to 10.01 dB reported for Bandsplit RNN (BSRNN), but with 86.7% fewer parameters. We also assess pattern-specific performance and model generalization for intricate audio patterns.
Date Issued
2024-03-18
Date Acceptance
2024-04-14
Citation
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
ISBN
979-8-3503-4485-1
ISSN
2379-190X
Publisher
IEEE
Journal / Book Title
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Copyright Statement
Copyright © 2024 IEEE. This is the author’s accepted manuscript made available under a CC-BY licence in accordance with Imperial’s Research Publications Open Access policy (www.imperial.ac.uk/oa-policy)
License URL
Identifier
http://dx.doi.org/10.1109/icassp48485.2024.10448020
Source
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication Status
Published
Start Date
2024-04-14
Finish Date
2024-04-19
Coverage Spatial
Seoul, Republic of Korea
Date Publish Online
2024-03-18