Large Language Models

A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs

Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT

Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion

Analyzing Large Language Models by Learning on Token Distribution Sequences