Research

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Recent studies have shown that large language models (LLMs) can be used effectively for generative error correction (GER) on top of automatic speech recognition (ASR) output. Specifically, an LLM maps the N-best hypotheses list generated by an ASR system directly to the predicted output transcription. However, despite its effectiveness, GER introduces extra data uncertainty because the LLM is trained without access to the acoustic information available in the speech signal.
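As a rough illustration of the N-best-to-transcription mapping described above, the sketch below formats a hypotheses list into a single text prompt for an LLM. The function name `build_ger_prompt`, the prompt wording, and the sample hypotheses are all hypothetical and are not taken from the paper; the actual prompt format used by the authors may differ.

```python
def build_ger_prompt(hypotheses):
    """Format an N-best list into one correction prompt (hypothetical wording)."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    # Number the hypotheses in the order the ASR system ranked them.
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(hypotheses, start=1))
    return (
        "Below are the N-best hypotheses from a speech recognizer.\n"
        f"{numbered}\n"
        "Write the most likely correct transcription:"
    )


# Made-up 3-best list, for illustration only.
n_best = [
    "i scream for ice cream",
    "eye scream for ice cream",
    "i scream four ice cream",
]
prompt = build_ger_prompt(n_best)
print(prompt)
```

Note that the prompt contains text only: nothing from the speech signal reaches the LLM, which is the limitation the abstract points to.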

Unity ECC: Unified Memory Protection Against Bit and Chip Errors

DRAM vendors utilize On-Die Error Correction Codes (OD-ECC) to correct random bit errors internally. Meanwhile, system companies utilize Rank-Level ECC (RL-ECC) to protect data against chip errors. Separate protection increases the redundancy ratio to 32.8% in DDR5 and incurs significant performance penalties. This paper proposes a novel RL-ECC, Unity ECC, that can correct both single-chip and double-bit error patterns. Unity ECC corrects double-bit errors using unused syndromes of single-chip correction.
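One plausible way to arrive at the 32.8% figure is to multiply the two layers of overhead. The sketch below assumes a common DDR5 configuration that the abstract does not state: an on-die code with 8 parity bits per 128 data bits, and a rank-level code with 8 ECC bits per 32-bit subchannel. Both ratios are assumptions for illustration, not values confirmed by the paper.

```python
from fractions import Fraction

# Assumed configuration (not stated in the abstract):
od_ecc = Fraction(136, 128)  # on-die: 128 data bits + 8 parity bits
rl_ecc = Fraction(40, 32)    # rank-level: 32 data bits + 8 ECC bits per subchannel

# Stored bits per data bit, minus the data itself, gives the combined redundancy.
redundancy = od_ecc * rl_ecc - 1
print(f"{float(redundancy):.1%}")  # prints 32.8%
```

Under these assumptions the two layers compound to 21/64 of the data size, which rounds to the 32.8% quoted above.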

Estimates of Temporal Edge Detection Filters in Human Vision

Edge detection is an important process in human visual processing. However, to our knowledge, few attempts have been made to map the temporal edge detection filters in human vision. To that end, we devised a user study and collected data from which we derived estimates of human temporal edge detection filters based on three different models, including the derivative of the infinite symmetric exponential function and the temporal contrast sensitivity function.

Qianli Ma

Qianli Ma is a research scientist at NVIDIA Research. He received his PhD from ETH Zürich and the Max Planck Institute for Intelligent Systems (Tübingen), advised by Professor Michael Black and Professor Siyu Tang. He has also interned at Meta Reality Labs in Pittsburgh. He develops new representations and methods for reconstructing, generating, and modeling digital humans. His research interests span generative models, 3D computer vision, and graphics, with a current focus on dynamic 3D content generation.