 # MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection

We present MarbleNet, an end-to-end neural network for Voice Activity Detection (VAD). MarbleNet is a deep residual network composed of blocks of 1D time-channel separable convolution, batch normalization, ReLU, and dropout layers. Compared to a state-of-the-art VAD model, MarbleNet achieves similar performance with roughly one-tenth of the parameters. We further conduct extensive ablation studies on training methods and parameter choices to study the robustness of MarbleNet in real-world VAD tasks.
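
To illustrate why time-channel separable convolutions are parameter-efficient, the following minimal sketch counts the weights in a standard 1D convolution and in its separable counterpart (a depthwise convolution along time followed by a pointwise 1x1 convolution across channels). The function names and the channel and kernel sizes are assumptions chosen for illustration and are not taken from the paper. The per-layer saving shown here is also not the same thing as the whole-model comparison reported in the abstract.

```python
def conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a standard 1D convolution (bias omitted)."""
    return c_in * c_out * kernel_size


def separable_conv1d_params(c_in, c_out, kernel_size):
    """Weight count of a 1D time-channel separable convolution (bias omitted)."""
    depthwise = c_in * kernel_size  # one temporal filter per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative sizes only (assumed, not from the paper).
C_IN, C_OUT, K = 128, 128, 13

standard = conv1d_params(C_IN, C_OUT, K)
separable = separable_conv1d_params(C_IN, C_OUT, K)

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {standard / separable:.1f}x")
```

For wide layers the pointwise term dominates the separable count, so the saving for a single layer approaches a factor equal to the kernel size.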



 ## Authors



Fei Jia (NVIDIA)

Somshubra Majumdar (NVIDIA)

Boris Ginsburg (NVIDIA)

 

 

 ## Publication Date



Monday, October 26, 2020

 

 ## Published in



[IEEE](https://ieeexplore.ieee.org/abstract/document/9414470)

 

 ## Research Area



[Speech Processing](/research-area/speech-processing)

 

 

 ## External Links



[Paper](https://arxiv.org/abs/2010.13886)