 # SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification


We propose SpeakerNet, a new neural architecture for speaker recognition and speaker verification tasks. It is composed of residual blocks with 1D depth-wise separable convolutions, batch normalization, and ReLU layers. The architecture uses an x-vector-based statistics pooling layer to map variable-length utterances to a fixed-length embedding (q-vector). SpeakerNet-M is a simple, lightweight model with just 5M parameters. It does not use voice activity detection (VAD) and achieves close to state-of-the-art performance, with an Equal Error Rate (EER) of 2.10% on the cleaned VoxCeleb1 trial file and 2.29% on the VoxCeleb1 trial file.
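To illustrate how statistics pooling can turn a variable number of frames into a fixed-length vector, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `statistics_pooling`, the list-of-lists input format, and the use of the population (biased) standard deviation are all assumptions made for illustration, and any layers that follow the pooling step are omitted.

```python
import math


def statistics_pooling(frames):
    """Pool T frame-level vectors of size D into one vector of size 2 * D.

    Hypothetical helper: concatenates the per-dimension mean and the
    per-dimension population standard deviation taken over the time axis.
    """
    if not frames:
        raise ValueError("at least one frame is required")

    num_frames = len(frames)
    dim = len(frames[0])

    # Mean of each feature dimension across all frames.
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]

    # Population standard deviation of each dimension (an assumption;
    # other implementations may use the unbiased estimator instead).
    stds = [
        math.sqrt(sum((frame[d] - means[d]) ** 2 for frame in frames) / num_frames)
        for d in range(dim)
    ]

    # The output length depends only on D, never on the number of frames.
    return means + stds


short_utterance = [[1.0, 2.0], [3.0, 2.0]]
long_utterance = [[0.5, 1.0]] * 50

pooled_short = statistics_pooling(short_utterance)
pooled_long = statistics_pooling(long_utterance)
```

Both calls return a vector of length 4 (twice the feature dimension), even though the inputs contain 2 and 50 frames respectively. That length invariance is what lets utterances of different durations be compared in the same embedding space.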



 ## Authors



Nithin Rao Koluguri (NVIDIA)

Jason Li (NVIDIA)

Vitaly Lavrukhin (NVIDIA)

Boris Ginsburg (NVIDIA)

 

 

 ## Publication Date



Friday, October 23, 2020

 

 ## Research Area



[Speech Processing](/research-area/speech-processing)

 

 

 ## External Links



[Paper](https://arxiv.org/abs/2010.12653)