Large Language Models

Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions

A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs

Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT

Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion

Analyzing Large Language Models by Learning on Token Distribution Sequences