LoRWeB: Spanning the Visual Analogy Space
with a Weight Basis of LoRAs

Technion,  NVIDIA,  Bar Ilan University

TLDR: We propose a novel modular framework that learns to dynamically mix low-rank adapters (LoRAs) to improve visual analogy learning, enabling flexible and generalizable image edits based on example transformations.


We present LoRWeB (LoRA Weight Basis), a novel modular framework for dynamically mixing low-rank adapters (LoRAs).
We use LoRWeB for visual analogy learning: given an example pair of "before" and "after" images (a and a'), we want to apply the same visual transformation to a new image (b) to produce b'. Rather than using a single fixed adapter, we learn a basis of LoRAs and a lightweight encoder that selects and weighs these basis LoRAs to construct an edit LoRA. This enables better generalization to unseen visual transformations, achieving state-of-the-art performance on visual analogy tasks without requiring test-time optimization.
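
To make this concrete, the mixed edit LoRA can be sketched as a weighted sum of the basis modules (the softmax normalization of the coefficients below is our assumption for illustration, not necessarily the exact formulation used in the paper):

\[
\Delta W \;=\; \sum_{i} \alpha_i \, B_i A_i,
\qquad
\alpha_i \;=\; \mathrm{softmax}_i\!\left(\langle e, k_i \rangle\right),
\]

where \((A_i, B_i)\) are the learned basis LoRA factors, \(k_i\) are the learned keys, and \(e\) is the encoding of the analogy context \(\{a, a', b\}\).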

Teaser.

Visual Analogy Results

Abstract

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet {a, a', b}, the goal is to generate b' such that a : a' :: b : b'. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules that spans the space of visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate that our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation.

How does it work?

LoRWeB Overview: We first use CLIP and a small learned projection module to encode a and a', which describe a visual transformation (e.g., adding a hat to the man), together with b, which should be edited analogously (e.g., adding a hat to the woman). The similarity between the resulting encoding and a set of learned keys determines the linear coefficients for combining the learned basis LoRAs into a single, mixed LoRA. This mixed LoRA is injected into a conditional flow model (e.g., Flux.1-Kontext). Next, we build a 2×2 composite image from {a, a', b}. The conditional flow model receives this composite image as input, along with a guiding edit prompt, and produces a composite image with the edited result b' in the bottom-right quadrant.
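
Below is a minimal PyTorch sketch of this mixing step. All names, dimensions, and the softmax normalization of the coefficients are illustrative assumptions rather than the released implementation; the sketch only shows how key similarities over a CLIP-encoded analogy context could turn a basis of LoRAs into a single mixed adapter.

# Minimal sketch (illustrative assumptions, not the released implementation) of
# mixing a basis of LoRAs into one edit LoRA from a CLIP-encoded analogy context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRABasisMixer(nn.Module):
    def __init__(self, clip_dim=768, key_dim=128, num_basis=16,
                 in_dim=3072, out_dim=3072, rank=4):
        super().__init__()
        # Lightweight projection of the concatenated CLIP features of {a, a', b}.
        self.proj = nn.Sequential(nn.Linear(3 * clip_dim, key_dim), nn.GELU(),
                                  nn.Linear(key_dim, key_dim))
        # One learned key per basis LoRA.
        self.keys = nn.Parameter(torch.randn(num_basis, key_dim))
        # Learned basis of low-rank factors (A_i, B_i) for a single target layer.
        self.A = nn.Parameter(torch.randn(num_basis, rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_basis, out_dim, rank))

    def forward(self, clip_a, clip_a_prime, clip_b):
        # Encode the analogy context into a single query vector.
        e = self.proj(torch.cat([clip_a, clip_a_prime, clip_b], dim=-1))  # (batch, key_dim)
        # Similarity to the learned keys gives the mixing coefficients.
        alpha = F.softmax(e @ self.keys.T, dim=-1)                        # (batch, num_basis)
        # Mix the basis into one edit LoRA: delta_W = sum_i alpha_i * B_i A_i.
        delta_W = torch.einsum("bn,nor,nri->boi", alpha, self.B, self.A)  # (batch, out_dim, in_dim)
        return delta_W  # added to a target weight of the conditional flow model

In practice, one such basis would be attached to each adapted layer of the flow model, and the resulting delta_W applied as an additive weight update while the model processes the 2×2 composite conditioned on the edit prompt.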

Comparison To Current Methods

We compare LoRWeB against four baselines on unseen tasks: a standard Flux LoRA of similar parameter capacity and three prior visual analogy methods (RelationAdapter, VisualCloze, and Edit-Transfer). Our approach generalizes across more diverse tasks and better maintains the visual details of both the subject and the analogy.

Quantitative Evaluation

Evaluation Metrics (top): (left) Accuracy of the applied edit and preservation of b in b', measured using Gemma-3; top-right is better. (right) CLIP directional similarity and LPIPS between b' and b; bottom-right is better. Our method pushes the Pareto front of edit accuracy versus preservation, achieving higher edit accuracy while strongly preserving the input image.
Pairwise Comparisons (bottom): We compare LoRWeB to four baselines on overall edit-quality preference via both a user study and a VLM. LoRWeB produces edits that are favored by both. Error bars are the 68% Wilson score interval.
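
For reference, the two automatic metrics in the right-hand plot can be sketched as follows. We assume here, for illustration, that the directional term compares CLIP image-embedding differences of the reference pair (a → a') and the edited pair (b → b'), and that preservation is standard LPIPS between b and b'; the paper's exact evaluation code may differ.

# Sketch of the assumed metric formulations (not the paper's exact evaluation code).
import torch.nn.functional as F
import lpips                                           # pip install lpips
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
lpips_fn = lpips.LPIPS(net="alex")

def clip_embed(pil_image):
    inputs = proc(images=pil_image, return_tensors="pt")
    return clip.get_image_features(**inputs)           # (1, 768)

def directional_similarity(a, a_prime, b, b_prime):
    # Cosine similarity between the CLIP edit directions of the two pairs.
    ref_dir = clip_embed(a_prime) - clip_embed(a)
    out_dir = clip_embed(b_prime) - clip_embed(b)
    return F.cosine_similarity(ref_dir, out_dir).item()

def preservation_lpips(b_tensor, b_prime_tensor):
    # LPIPS between b and b'; inputs are (1, 3, H, W) tensors scaled to [-1, 1].
    return lpips_fn(b_tensor, b_prime_tensor).item()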



Importance of Prompts and Reference Images

LoRWeB directly leverages the analogy pair to understand the details of the proposed task, applying edits that go beyond what text-based editing with the prompt alone can achieve. For example, when the prompt is "Give this creature a crown of crystals", the analogy context conveys the number and color of the crystals. The reference pair dictates the details of the analogy task, which may not be captured by textual prompts.



BibTeX

@article{manor2026lorweb,
  title={Spanning the Visual Analogy Space with a Weight Basis of LoRAs},
  author={Manor, Hila and Gal, Rinon and Maron, Haggai and Michaeli, Tomer and Chechik, Gal},
  journal={arXiv preprint},
  year={2026}
}