A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs
Shaona Ghosh, Amrita Bhattacharjee, Yftah Ziser, Christopher Parisien
December 2025
Natural Language Processing
Type: Conference paper
Publication: EMNLP 2025 (Main)
Large Language Models
Safety
Related
Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT
Analyzing Large Language Models by Learning on Token Distribution Sequences
Iterative Multilingual Spectral Attribute Erasure
Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning