NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or rails for short) are a specific way of controlling the output of an LLM, such as not talking about topics considered harmful, following a predefined dialogue path, using a particular language style, and more. There are several mechanisms that allow LLM providers and developers to add guardrails that are embedded into a specific model at training time, e.g., through model alignment.
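
As a minimal sketch of how programmable rails attach to an application (the config directory path and the prompt below are illustrative; the rail definitions themselves would live in that directory):

```python
from nemoguardrails import LLMRails, RailsConfig

# Load a rails configuration from a local directory (path is illustrative).
# The config files define the programmable rails: blocked topics,
# dialogue flows, language style, etc.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Generations now pass through the configured input/output rails
# before reaching (and after leaving) the underlying LLM.
response = rails.generate(messages=[
    {"role": "user", "content": "Hello! What can you do for me?"}
])
print(response["content"])
```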

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

In this paper, we propose an efficient and accurate streaming speech recognition model based on the FastConformer architecture. We adapted the FastConformer architecture for streaming applications by: (1) constraining both the look-ahead and past contexts in the encoder, and (2) introducing an activation caching mechanism that enables the non-autoregressive encoder to operate autoregressively during inference. The proposed model is designed to eliminate the accuracy disparity between training and inference that is common to many streaming models.
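
The activation-caching idea can be illustrated with a single causal convolution layer that carries its left context between streaming chunks. This is a simplified sketch (class and variable names are hypothetical, not the NeMo implementation), but it shows why chunked inference can match full-utterance processing exactly:

```python
import torch
import torch.nn as nn

class CachedCausalConv1d(nn.Module):
    """Causal 1-D convolution with an activation cache: a non-autoregressive
    layer made to run autoregressively over streaming chunks."""

    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        self.left_context = kernel_size - 1  # past frames the kernel needs
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, chunk: torch.Tensor, cache: torch.Tensor):
        # chunk: (batch, channels, chunk_len); cache holds the last
        # `left_context` frames of input from the previous chunk.
        x = torch.cat([cache, chunk], dim=-1)
        new_cache = x[..., -self.left_context:]  # carry to the next chunk
        return self.conv(x), new_cache

# Streaming loop: with the zero-initialized cache, this matches
# left-padded full-utterance processing frame for frame.
layer = CachedCausalConv1d(channels=80, kernel_size=4)
cache = torch.zeros(1, 80, layer.left_context)
for chunk in torch.randn(1, 80, 48).split(16, dim=-1):
    out, cache = layer(chunk, cache)
```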

A Chat about Boring Problems: Studying GPT-Based Text Normalization

Text normalization - the conversion of text from written to spoken form - is traditionally assumed to be an ill-formed task for language modeling. In this work, we argue otherwise. We empirically show the capacity of large language models (LLMs) for text normalization in few-shot scenarios. Combining self-consistency reasoning with linguistically informed prompt engineering, we find that LLM-based text normalization achieves error rates approximately 40% lower than those of production-level normalization systems.
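
A minimal sketch of the self-consistency idea applied to normalization (the `llm_sample` callable and the prompt are hypothetical stand-ins for the paper's prompt engineering):

```python
from collections import Counter

def normalize_with_self_consistency(text: str, llm_sample, n_samples: int = 5) -> str:
    """Self-consistency for text normalization: sample several candidate
    spoken-form renderings and return the majority answer.

    `llm_sample` is a hypothetical callable that sends a few-shot prompt
    to an LLM and returns one sampled completion (temperature > 0)."""
    prompt = (
        "Convert written text to its spoken form.\n"
        "Example: '$5' -> 'five dollars'\n"   # few-shot, linguistically
        "Example: 'Dr.' -> 'doctor'\n"        # informed examples
        f"Input: '{text}' ->"
    )
    candidates = [llm_sample(prompt) for _ in range(n_samples)]
    # Majority vote over the sampled normalizations.
    return Counter(candidates).most_common(1)[0][0]
```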

Chenhui Deng

Chenhui Deng is currently a Research Scientist at NVIDIA, where he focuses on leveraging graph-based machine learning techniques for circuit problems. Chenhui earned his PhD in Electrical and Computer Engineering from Cornell University in 2024. His research lies at the intersection of machine learning, spectral graph theory, electronic design automation, and VLSI design.

Novel Transformer Model Based Clustering Method for Standard Cell Design Automation

Standard cells are essential components of modern digital circuit designs. With process technologies advancing beyond 5nm, more routability issues have arisen due to the decreasing number of routing tracks (RTs), the increasing number and complexity of design rules, and strict patterning rules. Standard cell design automation frameworks can automatically generate standard cell layouts, but they struggle to resolve the severe routability issues that arise at advanced nodes.

Quantum computing with subwavelength atomic arrays

Photon-mediated interactions in subwavelength atomic arrays have numerous applications in quantum science. In this paper, we explore the potential of three-level quantum emitters, or “impurities,” embedded in a two-dimensional atomic array to serve as a platform for quantum computation. By exploiting the altered behavior of impurities that results from the dipole-dipole interactions mediated by the subwavelength array, we design and simulate a set of universal quantum gates consisting of the square-root iSWAP gate and single-qubit rotations.
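
For reference, the square-root iSWAP gate is the standard two-qubit entangling gate below (a textbook definition, not specific to this paper); together with arbitrary single-qubit rotations it forms a universal gate set:

```latex
\sqrt{\mathrm{iSWAP}} =
\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0\\
0 & \frac{i}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0\\
0 & 0 & 0 & 1
\end{pmatrix}
```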

Quantum Goemans-Williamson Algorithm with the Hadamard Test and Approximate Amplitude Constraints

Semidefinite programs are convex optimization problems with a wide array of applications, such as approximating difficult combinatorial problems. One such semidefinite program underlies the Goemans-Williamson algorithm, a popular integer relaxation technique. We introduce a variational quantum algorithm for the Goemans-Williamson algorithm that uses only n+1 qubits, a constant number of circuit preparations, and poly(n) expectation values in order to approximately solve semidefinite programs with up to N=2^n variables and M∼O(N) constraints.
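
For context, the semidefinite program behind Goemans-Williamson is the textbook MaxCut relaxation (standard form, not taken from this abstract): for a graph with edge weights w_ij,

```latex
\max_{X \succeq 0} \ \frac{1}{4} \sum_{i,j} w_{ij}\,(1 - X_{ij})
\quad \text{subject to} \quad X_{ii} = 1 \ \text{for all } i
```

Rounding a Cholesky factorization of the optimal X with a random hyperplane then yields the well-known 0.878-approximation for MaxCut.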