Proteina-Complexa


Overview. We present Proteina-Complexa, a novel fully atomistic protein binder design framework unifying conditional generative modeling and optimization. This page has two sections: core model development and wet lab validation. We carried out an experimental design campaign with over 1 million binder candidates for 133 targets in collaboration with Manifold Bio, Viva Biotech, Novo Nordisk, Duke University, Cambridge University and LMU Munich.

No Redesign

Sequences & structures generated end-to-end without inverse folding.

Test-Time Scaling

Latent generative search: More compute at inference → better binders.

Wet Lab Campaign

133 targets. >1M binders screened. Up to 63.5% hit rates. Picomolar affinities.

Carbohydrate Binders

First ever de novo designed carbohydrate binders.

Model Highlights

Mini-binder generation for protein and small molecule targets; atomistic motif scaffolding for enzyme design.
Generative pretraining with inference-time compute scaling outperforms prior generation and hallucination methods.
Teddymer: Synthetic binder-target pairs from domain-domain interactions of predicted monomer structures.
No more re-design: Binder sequences generated directly by Proteina-Complexa without additional inverse folding.
Interface hydrogen bond optimization for strong biophysical interactions; fold class guidance for control and diversity.
State-of-the-art performance on in-silico binder design metrics and in computational enzyme design benchmarks.


Extensive Wet Lab Validation

All-to-all binding on a 127-target set: off-design hits for all targets and on-design hits against 86 targets.
Method benchmark: Proteina-Complexa generates more experimentally validated hits than prior models.
63.5% hit rate and picomolar affinities for PDGFR; 40–50% hit rates for kinase mini-protein and peptide binders.
First de novo carbohydrate binders, a target class previously thought inaccessible to current methods.
Binders to Activin receptor type IIA (ActRIIA), a receptor implicated in muscle wasting, with validated blocking of myostatin signaling in cells.
Nanomolar binders against Nipah virus and joint structure-sequence re-engineering of existing binders.


1. The Proteina-Complexa Model

Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

Kieran Didi^1,2,*, Zuobai Zhang^1,3,4,*, Guoqing Zhou^1,*, Danny Reidenbach^1,*, Zhonglin Cao^1,*, Sooyoung Cha^8,9,*, Tomas Geffner¹, Christian Dallago¹, Jian Tang^3,5,6, Michael M. Bronstein^2,7, Martin Steinegger^8,9,10,11, Emine Kucukbenli^1,♢, Arash Vahdat^1,♢, Karsten Kreis^1,†

¹ NVIDIA ² University of Oxford ³ Mila - Québec AI Institute ⁴ Université de Montréal ⁵ HEC Montréal ⁶ CIFAR AI Chair ⁷ AITHYRA ⁸ School of Biological Sciences, Seoul National University ⁹ Interdisciplinary Program in Bioinformatics, Seoul National University ¹⁰ Institute of Molecular Biology and Genetics, Seoul National University ¹¹ Artificial Intelligence Institute, Seoul National University

^* Core contributor.
^♢ Equal advising.
^† Project lead.

International Conference on Learning Representations (ICLR) 2026
(Oral Presentation)


Abstract

Protein interaction modeling is central to protein design, which has been transformed by machine learning with applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Proteina-Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend recent flow-based latent protein generation architectures and leverage the domain-domain interactions of monomeric computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Proteina-Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We also demonstrate interface hydrogen bond optimization, fold class-guided binder generation, and extensions to small molecule targets and enzyme design tasks, again surpassing prior methods. Code, models and new data will be publicly released.

Proteina-Complexa Model Overview

Architecture and Target Conditioning. Proteina-Complexa builds on La-Proteina's partially latent flow matching framework and extends it to conditional binder design. Generation is achieved by iterative denoising. Key design choices:

Continuous latent protein representation. An autoencoder encodes fully atomistic proteins into alpha-carbon backbone coordinates paired with continuous per-residue latent variables that capture amino acid identity and side-chain geometry—and decodes them back. This avoids discretization artifacts and enables accurate atomistic modeling.
Target-conditioned denoising. Only the flow model's denoiser conditions on the target; the autoencoder remains shared across all target types. Target features (Atom37 coordinates, amino acid identities, hotspot tokens) are embedded and jointly processed with the binder's noisy representation through a transformer with pair-biased attention.
Joint sequence and structure output without redesign. The model generates both sequence and fully atomistic structure simultaneously. Unlike prior approaches, generated sequences are used directly; no separate redesign step is required.

See Figure 1 and our paper for full details.
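Generation by iterative denoising in a flow matching model amounts to integrating a velocity field from noise (t = 0) to data (t = 1). The minimal sketch below assumes the common linear path x_t = t·x_1 + (1 − t)·x_0 and a plain deterministic Euler integrator; the state is the partially latent pair (CA coordinates, per-residue latents), the latent width is a hypothetical value, and a toy closed-form velocity stands in for the trained target-conditioned denoiser.

```python
import numpy as np

def euler_sample(velocity_fn, ca_init, lat_init, n_steps=100):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (sample) with Euler steps.

    The state is the partially latent representation: CA coordinates (L, 3)
    plus continuous per-residue latents (L, D), updated jointly at every step.
    """
    ca, lat = ca_init.copy(), lat_init.copy()
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        v_ca, v_lat = velocity_fn(ca, lat, t)
        ca = ca + (t_next - t) * v_ca
        lat = lat + (t_next - t) * v_lat
    return ca, lat

# Toy stand-in for the trained denoiser: the exact velocity toward one fixed
# "clean" binder under the linear path x_t = t * x_1 + (1 - t) * x_0.
rng = np.random.default_rng(0)
L, D = 40, 8  # binder length and latent width (hypothetical values)
ca_clean, lat_clean = rng.normal(size=(L, 3)), rng.normal(size=(L, D))

def toy_velocity(ca, lat, t):
    return (ca_clean - ca) / (1.0 - t), (lat_clean - lat) / (1.0 - t)

ca_out, lat_out = euler_sample(toy_velocity, rng.normal(size=(L, 3)), rng.normal(size=(L, D)))
```

With the toy velocity the sampler lands exactly on the clean state. A trained model would instead predict the velocity from the noisy binder and the target features, and the decoder would then map the final CA coordinates and latents back to atoms and sequence.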

Figure 1. Proteina-Complexa's architecture. A frozen autoencoder (top) encodes fully atomistic proteins into a partially latent representation (alpha-carbon coordinates + continuous per-residue latents) and decodes them back. The target-conditioned denoiser (bottom) concatenates embedded target features (Atom37 coordinates, amino acid identity, hotspot tokens) with the binder's noisy latent and backbone embeddings, processing them jointly through multi-head pair-biased attention layers.
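The joint processing of target and binder tokens through pair-biased attention can be illustrated with a single attention head in NumPy. This is a sketch under stated assumptions: target and binder features are already embedded to a shared width, the pair bias is one scalar per token pair (here a hypothetical distance-based bias), and the projection matrices are random placeholders rather than trained Proteina-Complexa weights.

```python
import numpy as np

def pair_biased_attention(tokens, pair_bias, w_q, w_k, w_v):
    """Single-head self-attention whose logits are shifted by a pair bias.

    tokens:        (N, d) concatenated target + binder token features
    pair_bias:     (N, N) scalar bias per token pair
    w_q, w_k, w_v: (d, d) projection matrices
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1]) + pair_bias
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over keys
    return attn @ v, attn

rng = np.random.default_rng(0)
d = 16
target_tokens = rng.normal(size=(30, d))  # embedded target features (hypothetical)
binder_tokens = rng.normal(size=(20, d))  # embedded noisy binder state (hypothetical)
tokens = np.concatenate([target_tokens, binder_tokens], axis=0)

# Hypothetical pair bias: down-weight attention between distant CA positions.
ca = rng.normal(scale=10.0, size=(50, 3))
dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
pair_bias = -dist / 10.0

w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
out, attn = pair_biased_attention(tokens, pair_bias, w_q, w_k, w_v)
binder_out = out[30:]  # in this sketch, the update is read off the binder tokens
```

Because every token attends to every other token, target information reaches the binder tokens in each layer, while the autoencoder never needs to see the target.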

Test-Time Compute Scaling. Current binder design methods either rely purely on generation (training-time optimization) or purely on hallucination (inference-time optimization without a generative prior). Proteina-Complexa unifies both paradigms—to our knowledge, a first in structure-based binder design. We steer the generative denoising process using rewards from structure prediction confidence (ipAE) and interface hydrogen bond energies (see Figure 2):

Best-of-N sampling — scale compute by growing the candidate pool.
Beam search — maintain and prune parallel denoising trajectories by reward.
Feynman–Kac steering — importance sampling toward the reward-tilted distribution.
Monte Carlo tree search — explore the denoising trajectory tree, balancing exploration and exploitation.
Generate & Hallucinate — initialize hallucination refinement (e.g., BindCraft) from generative model output rather than from scratch.
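Best-of-N, the simplest of the strategies listed above, draws N independent samples and keeps the highest-reward one. A minimal sketch, assuming a hypothetical `sample_fn` that returns one complete binder and a hypothetical `reward_fn` that scores it (for example, a negated ipAE from a co-folding model); toy numeric stand-ins make it runnable.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def best_of_n(sample_fn: Callable[[], T], reward_fn: Callable[[T], float],
              n: int) -> Tuple[T, float, List[float]]:
    """Draw n independent samples; return the best one, its reward, and all rewards."""
    samples = [sample_fn() for _ in range(n)]
    rewards = [reward_fn(s) for s in samples]
    best = max(range(n), key=rewards.__getitem__)
    return samples[best], rewards[best], rewards

# Toy stand-ins: a "binder" is a number and the reward prefers values near 0.7.
rng = random.Random(0)
toy_sample = rng.random
toy_reward = lambda x: -abs(x - 0.7)

best, best_r, all_r = best_of_n(toy_sample, toy_reward, n=64)
```

Compute grows linearly with N, and because the candidate pool only grows, the expected best reward cannot decrease as N increases.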

This latent generative search framework is general and can incorporate many different objectives; the reward does not need to be differentiable. All strategies leverage Proteina-Complexa's fast, fully transformer-based architecture, making repeated rollouts computationally feasible. Scaling inference-time compute via extended search during the generative denoising process allows the model to produce high-scoring candidates even for difficult targets.
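Feynman–Kac steering keeps a population of partial denoising trajectories and periodically resamples them in proportion to exponentiated rewards, which only requires reward values, not gradients. The sketch shows just the resampling step; multinomial resampling and the inverse temperature `lam` are assumptions for illustration, and the potentials and schedule Proteina-Complexa actually uses are described in the paper.

```python
import numpy as np

def fk_resample(particles, rewards, lam, rng):
    """Resample particles with probability proportional to exp(lam * reward).

    particles: list of intermediate denoising states (any objects)
    rewards:   reward of each particle's current (e.g. rolled-out) prediction
    lam:       inverse temperature; larger values tilt harder toward high reward
    """
    r = np.asarray(rewards, dtype=float)
    w = np.exp(lam * (r - r.max()))  # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), replace=True, p=w)
    ess = 1.0 / np.sum(w ** 2)       # effective sample size of the weights
    return [particles[i] for i in idx], w, ess

rng = np.random.default_rng(0)
particles = [f"traj_{i}" for i in range(8)]
rewards = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.95, 0.4]
new_particles, weights, ess = fk_resample(particles, rewards, lam=10.0, rng=rng)
```

Between resampling steps each particle continues its own stochastic denoising; with `lam = 0` the weights are uniform and the procedure behaves like independent sampling.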

Figure 2. Proteina-Complexa's generation and inference-time optimization pipeline. (Top) Target-conditioned generation: from noise around the target, the model iteratively denoises a partially latent binder, decodes it, and produces a fully atomistic binder-target complex. (Bottom) Beam search for inference-time optimization: multiple stochastic denoising trajectories (colored paths) are maintained in parallel. At regular intervals, each candidate is rolled out to completion—decoded, co-folded with the target, and scored. Promising trajectories (green checks) are retained while low-scoring ones (red crosses) are pruned, and new branches are launched from the surviving candidates.
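The beam search in Figure 2 can be sketched as follows: keep K partial trajectories, branch each into several stochastic continuations for one segment of denoising, score every branch by rolling it out to completion, and keep the top K. Here `step_fn`, `rollout_fn` and `reward_fn` are hypothetical placeholders for a stochastic denoising segment, a full rollout with decoding and co-folding, and the scoring function; toy numeric versions make the example run.

```python
import random
from typing import Callable, List

def beam_search(init_states: List, step_fn: Callable, rollout_fn: Callable,
                reward_fn: Callable, n_segments: int, beam_width: int,
                branch_factor: int) -> List:
    """Reward-guided beam search over stochastic denoising segments."""
    beam = list(init_states)[:beam_width]
    for seg in range(n_segments):
        # Branch: each surviving trajectory spawns several stochastic continuations.
        candidates = [step_fn(s, seg) for s in beam for _ in range(branch_factor)]
        # Score each branch by completing it and evaluating the final sample.
        scored = [(reward_fn(rollout_fn(c, seg)), c) for c in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        beam = [c for _, c in scored[:beam_width]]  # prune low scorers
    return beam

# Toy problem: the state is a float, each segment adds noise, reward prefers 3.0.
rng = random.Random(0)
step = lambda s, seg: s + rng.gauss(0.0, 1.0)
rollout = lambda s, seg: s  # toy: the partial state is already "complete"
reward = lambda x: -abs(x - 3.0)

final_beam = beam_search([0.0] * 4, step, rollout, reward,
                         n_segments=6, beam_width=4, branch_factor=3)
```

The returned beam is ordered from best to worst reward, so its first element is the top candidate after the last pruning step.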

The Teddymer Dataset

Binder design requires paired binder-target data, yet experimentally resolved multimers in the PDB are scarce. We exploit the fact that domain-domain interactions within AlphaFold Database (AFDB) monomers closely resemble chain-chain interactions in real multimers. Using TED structural domain annotations, we split AFDB monomers into domains and assemble synthetic dimers: 47M AFDB50 structures → 10M dimers (filtered by proximity and CATH annotations) → 3.5M Foldseek clusters → quality-filtered training set. The resulting Teddymer dataset is an order of magnitude larger than the PDB. Proteina-Complexa trains in stages across four datasets: AFDB monomers, Teddymer dimers, PDB multimers, and PLINDER protein-ligand pairs (Figure 3).

Figure 3. The Teddymer dataset. (Left) Representative Teddymer dimer, constructed by splitting an AFDB monomer into its structural domains (colored chains). The zoom-in highlights interface hydrogen bonds, illustrating that domain-domain interfaces exhibit realistic biophysical interactions. (Right) Overview of the filtered training datasets used by Proteina-Complexa.
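The first step of the Teddymer pipeline, turning one AFDB monomer into candidate synthetic dimers, can be sketched as enumerating pairs of annotated domains and keeping those in close contact. The 8 Å CA contact cutoff and the 10-contact minimum below are hypothetical values chosen for illustration; the actual proximity criteria, CATH-based filtering and Foldseek clustering are described in the paper.

```python
import itertools
import numpy as np

def domain_pairs(ca, domains, cutoff=8.0, min_contacts=10):
    """Return (i, j, n_contacts) for domain pairs with enough inter-domain CA contacts.

    ca:      (L, 3) CA coordinates of the monomer, in angstroms
    domains: list of (start, end) residue ranges, end exclusive
    """
    pairs = []
    for (i, (s1, e1)), (j, (s2, e2)) in itertools.combinations(enumerate(domains), 2):
        d = np.linalg.norm(ca[s1:e1, None, :] - ca[None, s2:e2, :], axis=-1)
        n = int((d < cutoff).sum())
        if n >= min_contacts:
            pairs.append((i, j, n))  # each kept pair becomes a two-chain "dimer"
    return pairs

# Synthetic monomer: domains A and B run side by side 5 A apart; C is far away.
ca = np.zeros((60, 3))
ca[:, 0] = np.tile(np.arange(20) * 3.8, 3)
ca[20:40, 1] = 5.0
ca[40:60, 1] = 100.0
pairs = domain_pairs(ca, [(0, 20), (20, 40), (40, 60)])
```

Only the A–B pair survives here; in a real pipeline each kept pair would then go through the quality filters and clustering summarized above.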

Visualizations of Generated Binders

Protein Binders. Proteina-Complexa can generate in-silico mini-binder candidates against single-chain and multi-chain targets. In the examples below, the target is shown with a transparent surface and interface hydrogen bonds are highlighted in red. Quantitative evaluations are given in the performance section below.

Targets shown: H1, PDL1, Claudin1.

Small Molecule Binders. Our model can also design proteins that bind small molecules, as shown in the examples below. Alpha helices and beta sheets of the generated binders are colored purple and gold, respectively. All depicted samples in this section fulfill in-silico success criteria (see our paper for details). Quantitative evaluations are given below.

Targets shown: FAD, IAI, OQO.

Enzyme Design. Proteina-Complexa can also tackle enzyme design, following the Atomic Motif Enzyme (AME) benchmark. An atomistic motif of the enzyme active site is provided together with the substrate molecule, and the model must design a protein that faithfully reconstructs the catalytic residues while accommodating the ligand without steric clashes. See example in Figure 4.

Figure 4. AME task M0157 (glyoxalase II, PDB: 1QH5). Two structurally diverse designs generated by Proteina-Complexa for a challenging 6-residue-island active site. This zinc-dependent metalloenzyme requires precise placement of multiple histidine and aspartate residues coordinating the catalytic zinc ions (blue sphere). Left and right show two independent generations with distinct overall folds, both faithfully reconstructing the full catalytic geometry (given side chain structures are shown as thick red sticks). The zoom-ins reveal the reconstructed active site residues surrounding the zinc center and the bound glutathione-derived substrate, confirming accurate motif reconstruction.
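The requirement that a design accommodate the ligand without steric clashes is typically checked geometrically. A minimal sketch, assuming heavy-atom coordinates in angstroms and a hypothetical 2.5 Å clash cutoff; the benchmark's actual criterion may differ.

```python
import numpy as np

def ligand_clashes(protein_xyz, ligand_xyz, cutoff=2.5):
    """Count protein-ligand heavy-atom pairs closer than `cutoff` angstroms.

    Returns (number of clashing pairs, minimum protein-ligand distance).
    """
    d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    return int((d < cutoff).sum()), float(d.min())

protein = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
ligand = np.array([[1.0, 1.0, 0.0], [10.0, 10.0, 10.0]])
n_clash, min_dist = ligand_clashes(protein, ligand)
```

For a metal site such as the zinc center above, coordinating atoms legitimately sit closer than generic contacts, so a practical check would exempt metal-ligand coordination from this simple cutoff.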

Interface Hydrogen Bond Optimization. Proteina-Complexa's inference-time optimization framework can include interface hydrogen bond energies alongside structure prediction rewards, allowing us to generate binders with enhanced interface hydrogen bonding and extended interaction surfaces. This highlights the generality of our test-time scaling approach: structure-based and physical energy-based rewards can be jointly optimized—a capability not available in prior hallucination methods, which only considered folding model scores. See Figure 5.

Figure 5. Interface hydrogen bond optimization (TrkA target). (a) A binder generated with hydrogen bond energy optimization forms an extended interface with 15 hydrogen bonds (red, zoom-in), creating a dense network of biophysical interactions across a large contact area. (b) A binder generated without hydrogen bond optimization still passes in-silico success criteria but is smaller, with only a single interface hydrogen bond.
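Because rewards only need to be evaluated, a structure-confidence term and a hydrogen-bond term can simply be added together. The sketch below uses a crude geometric proxy (cross-interface N/O pairs between 2.5 and 3.5 Å, no angle check) in place of the hydrogen bond energies the paper uses, a negated ipAE for confidence, and a hypothetical weight `w_hb`.

```python
import numpy as np

def interface_hbond_count(binder_polar, target_polar, min_dist=2.5, max_dist=3.5):
    """Crude proxy: binder/target polar-atom (N, O) pairs within [min_dist, max_dist] angstroms."""
    d = np.linalg.norm(binder_polar[:, None, :] - target_polar[None, :, :], axis=-1)
    return int(((d >= min_dist) & (d <= max_dist)).sum())

def combined_reward(ipae, n_hbonds, w_hb=0.5):
    """Lower ipAE is better, so it enters negatively; w_hb is a hypothetical weight."""
    return -ipae + w_hb * n_hbonds

binder_polar = np.array([[0.0, 0.0, 0.0], [8.0, 0.0, 0.0]])
target_polar = np.array([[2.9, 0.0, 0.0], [0.0, 3.1, 0.0], [20.0, 0.0, 0.0]])
n_hb = interface_hbond_count(binder_polar, target_polar)
score = combined_reward(ipae=6.0, n_hbonds=n_hb)
```

Any of the search strategies sketched for this section could rank candidates by such a combined score without modification.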

Fold Class-Conditioned Binder Generation. Previous protein generators often produce primarily alpha-helical outputs. By conditioning Proteina-Complexa on fold class labels, we can explicitly control the secondary structure composition of generated binders, so that mainly alpha, mainly beta, or mixed alpha-beta binders can be generated on demand. See Figure 6.

Figure 6. Fold class-conditioned binder generation (IFNAR2 target). Left: mainly alpha-helical binder (purple helices). Center: mainly beta-sheet binder (gold strands). Right: mixed alpha-beta binder combining both secondary structure elements.
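One common way to apply label conditioning at sampling time is classifier-free guidance, which mixes conditional and unconditional velocity predictions. Whether Proteina-Complexa uses exactly this rule is an assumption here; the sketch only shows the arithmetic, with a hypothetical `velocity_fn(x, t, fold_class)` in which `fold_class=None` means unconditional.

```python
import numpy as np

def guided_velocity(velocity_fn, x, t, fold_class, w):
    """Classifier-free guidance: v = v_uncond + w * (v_cond - v_uncond).

    w = 0 ignores the label, w = 1 is plain conditional sampling,
    and w > 1 pushes samples further toward the requested fold class.
    """
    v_uncond = velocity_fn(x, t, None)
    v_cond = velocity_fn(x, t, fold_class)
    return v_uncond + w * (v_cond - v_uncond)

# Toy velocity field: each fold class pulls the state toward a different point.
CENTERS = {None: np.zeros(2),
           "mainly_alpha": np.array([1.0, 0.0]),
           "mainly_beta": np.array([0.0, 1.0])}
toy_velocity = lambda x, t, c: CENTERS[c] - x

v = guided_velocity(toy_velocity, np.zeros(2), 0.5, "mainly_beta", w=2.0)
```

The guided velocity would replace the plain conditional one inside the denoising loop at every step.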

In silico Performance and Benchmarking

Generative Base Model. We first evaluate Proteina-Complexa's generative model without test-time optimization against publicly available generative baselines. For each method and target, we generate 200 binders and report the number of unique successes under established in-silico success criteria based on structure prediction model confidence and alignment scores (see our paper for details), the number of targets on which each method performs best, per-sample generation time, and novelty relative to the PDB. Proteina-Complexa significantly outperforms all baselines on both protein and small molecule targets, even when using its own co-generated sequences directly without ProteinMPNN-based redesign (Table 1, Table 2).
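The bookkeeping behind a "unique successes" count can be sketched as below. The field names and thresholds are hypothetical placeholders, not the paper's success criteria, and "unique" is modeled as the number of distinct structural clusters containing at least one success, which is an assumption about how diversity is counted.

```python
from dataclasses import dataclass

@dataclass
class Design:
    ipae: float      # interface predicted aligned error (lower is better)
    plddt: float     # predicted confidence on a 0-100 scale
    rmsd: float      # designed vs. re-predicted binder RMSD, in angstroms
    cluster_id: int  # structural cluster the design falls into

# Placeholder thresholds for illustration only; see the paper for the real criteria.
MAX_IPAE, MIN_PLDDT, MAX_RMSD = 10.0, 80.0, 2.0

def is_success(d: Design) -> bool:
    return d.ipae < MAX_IPAE and d.plddt > MIN_PLDDT and d.rmsd < MAX_RMSD

def unique_successes(designs) -> int:
    """Number of distinct clusters that contain at least one successful design."""
    return len({d.cluster_id for d in designs if is_success(d)})

designs = [Design(5.0, 90.0, 1.0, 0), Design(6.0, 88.0, 1.2, 0),
           Design(12.0, 95.0, 0.8, 1), Design(4.0, 85.0, 1.5, 2)]
n_unique = unique_successes(designs)
```

Counting clusters rather than raw successes avoids rewarding a method for producing many near-identical passing designs.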

Table 1. Protein target benchmarking (19 targets, 200 samples each). Self: model-generated sequences. MPNN-FI: ProteinMPNN redesign with fixed interface. MPNN: full ProteinMPNN sequence redesign of the generated backbone.

Model	# Unique Successes (Self) ↑	# Unique Successes (MPNN-FI) ↑	# Unique Successes (MPNN) ↑	# Times Best (Self) ↑	# Times Best (MPNN-FI) ↑	# Times Best (MPNN) ↑	Time [s] ↓	Novelty ↓
RFDiffusion	—	—	4.68	—	—	3	70.8	0.87
Protpardelle-1c	—	—	0.73	—	—	0	8.13	0.77
APM	0.31	1.52	3.15	1	0	1	73.1	0.86
Complexa (ours)	9.10	13.6	14.4	14	14	14	15.6	0.80

Table 2. Small molecule target benchmarking (4 targets, 200 samples each). Unique successes per molecule. RFDiffusion-AllAtom uses LigandMPNN; Proteina-Complexa uses self-generated sequences.

Model	# Unique Successes (SAM) ↑	# Unique Successes (OQO) ↑	# Unique Successes (FAD) ↑	# Unique Successes (IAI) ↑	Time [s] ↓	Novelty ↓
RFDiffusion-AllAtom	2	3	5	8	87.4	0.72
Complexa (ours)	10	6	17	19	13.5	0.71

Inference-Time Compute Scaling. We compare Proteina-Complexa's test-time scaling methods against hallucination baselines (BindCraft, BoltzDesign, AlphaDesign), plotting unique success rate as a function of compute. For easy protein targets, simple best-of-N sampling already outperforms all baselines; for hard targets, structured search (beam search, FKS, MCTS) is required. Across the board, hallucination methods perform poorly under matched compute budgets, while Proteina-Complexa's approaches consistently lead by a large margin. The same pattern holds for small molecule targets, where Proteina-Complexa far outperforms BoltzDesign (Figure 7, Figure 8).

Figure 7. Inference-time compute scaling for protein targets. Unique success rate vs. optimization time (GPU hours) for easy targets (left) and hard targets (right). Proteina-Complexa's search methods (colored curves) consistently outperform hallucination baselines (BindCraft, BoltzDesign, AlphaDesign) under normalized compute budgets.

Figure 8. Inference-time compute scaling for small molecule targets. Unique success rate vs. optimization time, averaged over four molecule targets. Proteina-Complexa's methods again substantially outperform BoltzDesign, the only available hallucination baseline for small molecules.

Enzyme Design (AME Benchmark). We evaluate Proteina-Complexa on the Atomic Motif Enzyme (AME) benchmark, where the model must design a protein that faithfully reconstructs a given active-site motif while accommodating the substrate molecule. The benchmark comprises 41 tasks with 1–7 catalytic residue islands of increasing difficulty. Proteina-Complexa significantly outperforms RFDiffusion2 on nearly all tasks, both with self-generated sequences and LigandMPNN-redesigned sequences (Figure 9).

Figure 9. AME enzyme design benchmark results. Number of unique successes per task (41 tasks, 100 samples each) for Proteina-Complexa vs. RFDiffusion2, comparing self-generated sequences, single LigandMPNN redesign, and best-of-8 LigandMPNN redesigns. Proteina-Complexa outperforms RFDiffusion2 on the vast majority of tasks across all evaluation settings.
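Checking that a design faithfully reconstructs the motif usually comes down to an RMSD over motif atoms after optimal rigid superposition. A minimal Kabsch-algorithm sketch, assuming the designed and reference motif atoms are already matched one-to-one; the AME benchmark's exact atom selections and success thresholds are not reproduced here.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched point sets P and Q, each (N, 3), after optimal superposition."""
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c                          # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking P_c onto Q_c
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# A hypothetical motif and a rigidly moved copy of it: the RMSD should be ~0.
rng = np.random.default_rng(0)
motif_ref = rng.normal(size=(8, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
motif_design = motif_ref @ Rz.T + np.array([5.0, -2.0, 1.0])
rmsd = kabsch_rmsd(motif_design, motif_ref)
```

Superposing only on motif atoms means the rest of the scaffold is free to adopt any fold, which is what allows the structurally diverse designs shown in Figure 4.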

2. Experimental Validation of Proteina-Complexa

Latent Generative Search unlocks de novo Design of Untapped Biomolecular Interactions at Scale

Kieran Didi^1,2,*, Danny Reidenbach^1,*, Matthew Penner³, Supriya Ravichandran⁴, Marshall Case⁴, Mike Nichols⁴, Erik Swanson⁴, Alex Reis⁴, Maggie Prescott⁴, Yue Qian⁵, Dongming Qian⁵, Jingjing Yang⁵, Weiji Li⁵, Le Li⁵, Daichi Shonai⁶, Sean Gay⁶, Bhoomika Basu Mallik⁷, Ho Yeung Chim⁷, Liurong Chen⁷, Miguel Atienza Juantay⁷, Hubert Klein⁷, Anna Macintyre⁸, Maxim Secor⁸, Daniele Granata⁸, Zhonglin Cao¹, Guoqing Zhou¹, Tomas Geffner¹, Xi Chen¹, Micha Livne¹, Zuobai Zhang¹, Tianjing Zhang¹, Kyle Gion¹, Michael M. Bronstein^2,9, Martin Steinegger¹⁰, Kristine Deibler⁸, Scott Soderling⁶, Alena Khmelinskaia⁷, Florian Hollfelder³, Christian Dallago^1,6, Emine Kucukbenli¹, Arash Vahdat¹, Pierce Ogden⁴, Karsten Kreis¹

¹ NVIDIA ² University of Oxford ³ University of Cambridge ⁴ Manifold Bio ⁵ Viva Biotech ⁶ Duke University ⁷ LMU Munich ⁸ Novo Nordisk ⁹ AITHYRA ¹⁰ Seoul National University

^* Equal contribution.


Abstract

In domains from game-play to language, the most powerful AI systems combine a generative model that learns a latent representation of the solution space with adaptive search at inference time. This principle has not yet been realized in de novo protein design: generative methods produce structures in a single shot without optimizing them, while hallucination methods search over sequences without a learned generative prior to guide them. Moreover, both approaches rely on separate models for sequence design. Here, we show that joint sequence and structure generation in a continuous latent space, combined with reward-guided inference-time search, fundamentally changes what protein design can achieve. In a massive-scale screen of over one million protein designs against 127 diverse and challenging targets, our approach, Proteina-Complexa, produced hits for 86 of them, demonstrating unmatched breadth. As part of this experimental campaign, we also carried out the largest experimental head-to-head benchmark of computational binder design methods to date. Proteina-Complexa produced more validated hits than any baseline method, with model-generated sequences outperforming all baselines, including post hoc redesign, demonstrating that our underlying co-design of sequence, backbone, and side chains exceeds the quality of commonly used inverse folding models. Across five additional campaigns against challenging therapeutic targets, we achieve a 63.5% hit rate against PDGFR (top K_D = 93.6 pM), nanomolar Nipah virus binders, nanomolar binders to a muscle-wasting receptor that block myostatin signaling in cells, 40–50% hit rates for kinase mini-protein and peptide binders, and the first de novo carbohydrate binders, a target class thought to be inaccessible to current design methods. These results establish latent generative search as a new paradigm for protein design, opening target classes and scales previously out of reach.

This large-scale experimental validation effort of Proteina-Complexa is a cross-institutional collaboration led by NVIDIA with Manifold Bio, Viva Biotech, Novo Nordisk, Duke University, Cambridge University and LMU Munich. Here, we provide highlights; see the preprint for details.

Massive-Scale Benchmark: Testing 1 Million Binders for 127 Targets

We conducted a massive-scale multiplexed phage display campaign with all-to-all binding readout: more than one million designed proteins screened against 127 targets, including diverse, novel, and challenging ones.

Coverage of a 127-target panel. High-throughput screening of 467,176 Proteina-Complexa sequences across all targets and diverse hotspot combinations reveals broad coverage of the target landscape (Figure 10, Table 3). Proteina-Complexa produces on-design hits—binders that bind their intended target—for 86 of 127 targets (68%), with 74 of those yielding target-specific binders. Off-design binding—where a binder designed for one target binds a different one—is also pervasive: 126 of 127 targets have at least one specific off-design hit, indicating latent binding activity beyond the intended targets. These results indicate broad success of Proteina-Complexa across a diverse set of targets. See our preprint for detailed analyses of cross-reactivity, hotspot conditioning effects, and binding specificity patterns.

Figure 10. Proteina-Complexa results across 127-target panel. On-design hits per target, split into specific (dark) and poly-specific (light) binders.

Table 3. Proteina-Complexa coverage of the 127-target panel. On-design: binders that bind their intended target. Off-design: binders that bind any target. Specific: binding exactly one target. Poly: binding 2–4 targets. "Off-design" target counts mean that a binder designed for another target binds to the tested target.

	On-Design (All)	On-Design (Specific)	On-Design (Poly)	Off-Design (All)	Off-Design (Specific)	Off-Design (Poly)
Targets solved (of 127)	86	74	57	127	126	127
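Under the definitions in the Table 3 caption, the per-target counts can be tallied from an all-to-all binding readout as sketched below. The input format (each design's intended target plus the set of targets it was observed to bind) is an assumption about how the readout is summarized; a binder hitting more than four targets counts toward "All" only.

```python
def coverage(designs, targets):
    """Count targets 'solved' under each Table 3 category.

    designs: iterable of (intended_target, bound_targets) pairs, where
             bound_targets is the set of targets the design was observed to bind.
    """
    solved = {k: set() for k in
              ("on_all", "on_specific", "on_poly", "off_all", "off_specific", "off_poly")}
    for intended, bound in designs:
        n_bound = len(bound)
        for t in bound:
            prefix = "on" if t == intended else "off"  # on-design vs. off-design hit for t
            solved[prefix + "_all"].add(t)
            if n_bound == 1:
                solved[prefix + "_specific"].add(t)
            elif 2 <= n_bound <= 4:
                solved[prefix + "_poly"].add(t)
    panel = set(targets)
    return {k: len(v & panel) for k, v in solved.items()}

# Tiny synthetic readout over a 5-target panel (not data from the screen).
designs = [("A", {"A"}), ("A", {"A", "B"}), ("B", set()),
           ("C", {"B"}), ("C", {"A", "B", "C", "D", "E"})]
counts = coverage(designs, ["A", "B", "C", "D", "E"])
```

As in Table 3, a target can count as solved under both "Specific" and "Poly" if different designs bind it in each way, so those two columns need not sum to "All".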

Method comparison: end-to-end codesign outperforms all baselines. Separately, for one top-ranked in silico hotspot per target, we also sampled and evaluated contemporary open baselines—RFDiffusion, RFDiffusion3, BindCraft, and BoltzGen—each given an approximately equal compute budget and testing the same number of selected candidates from each method's generated pool. This enables head-to-head comparison across 75 targets where at least one method produced an on-design hit (Figure 11, Table 4):

Proteina-Complexa with self-generated sequences achieved a 2.45% hit rate averaged over all targets, more than 3× the next-best self-generated baseline (BoltzGen self, 0.76%) and about 1.35× the best redesigned baseline (BoltzGen + ProteinMPNN, 1.81%).
691 on-design hits (630 specific) from 25,707 tested sequences, compared to 514 (414) for BoltzGen+MPNN, 311 (257) for BindCraft, and 86 (83) for RFDiffusion3.
Self-generated sequences outperform re-design. For Proteina-Complexa, co-generated sequences (691 hits) outperform both ProteinMPNN-redesigned (365) and fixed-interface redesigned (374) sequences on the same backbones—the only method where this holds. This provides the first large-scale evidence that end-to-end codesign can eliminate the need for separate inverse-folding models.
High specificity. 91.2% of Proteina-Complexa's on-design hits are target-specific, meaning that a given on-design hit is very likely to bind only its intended target and not cross-react with others.

These results establish Proteina-Complexa as a state-of-the-art openly available method for de novo binder design.

Figure 11. On-design specific hit rate (%) for method–sequence re-design combinations. Methods grouped by re-design strategy: no re-design (native model output), partial re-design (fixed interface), and full re-design (dedicated inverse folding model or MPNN-generated sequences).

Table 4. On-design hits and specificity for each method–sequence re-design combination, evaluated on 75 targets where at least one method achieved an on-design hit (top in silico hotspot per target). Specificity = specific / all. (self) = native model output, (fixed) = fixed-interface redesign, (mpnn) = ProteinMPNN redesign, (IF-model) = dedicated inverse-folding model re-design.

Method	On-Design Hits (All)	On-Design Hits (Specific)	Specificity	# Seqs.
RFD (mpnn)	63	56	88.9%	17,607
RFD3 (mpnn)	86	83	96.5%	20,014
RFD3 (self)	16	10	62.5%	28,188
BindCraft (fixed)	311	257	82.6%	20,511
BoltzGen (mpnn)	514	414	80.5%	22,881
BoltzGen (IF-model)	375	300	80.0%	23,063
BoltzGen (self)	244	195	79.9%	25,509
Proteina-Complexa (mpnn)	365	314	86.0%	19,449
Proteina-Complexa (fixed)	374	334	89.3%	19,444
Proteina-Complexa (self)	691	630	91.2%	25,707
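The Specificity column is simply specific / all. The sketch below recomputes it from the table and, as an extra illustration, the pooled on-design hit rate (all hits / sequences tested). The pooled rate is not the per-target averaged hit rate quoted in the text, so the two need not agree.

```python
# On-design hits from Table 4: (all, specific, number of tested sequences).
TABLE4 = {
    "RFD (mpnn)": (63, 56, 17_607),
    "RFD3 (mpnn)": (86, 83, 20_014),
    "RFD3 (self)": (16, 10, 28_188),
    "BindCraft (fixed)": (311, 257, 20_511),
    "BoltzGen (mpnn)": (514, 414, 22_881),
    "BoltzGen (IF-model)": (375, 300, 23_063),
    "BoltzGen (self)": (244, 195, 25_509),
    "Proteina-Complexa (mpnn)": (365, 314, 19_449),
    "Proteina-Complexa (fixed)": (374, 334, 19_444),
    "Proteina-Complexa (self)": (691, 630, 25_707),
}

specificity = {m: s / a for m, (a, s, _) in TABLE4.items()}
pooled_rate = {m: a / n for m, (a, _, n) in TABLE4.items()}

for m in TABLE4:
    print(f"{m:28s} specificity {specificity[m]:6.1%}   pooled on-design rate {pooled_rate[m]:.2%}")
```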

Picomolar PDGFR Binders

Beyond the large-scale 127-target screen, we tested Proteina-Complexa on individual targets with more careful candidate selection and filtering. PDGFR is a polar receptor previously targeted with specialized approaches such as beta-strand pairing due to its challenging surface. We generated 9,000 candidates across multiple hotspot combinations, applied a two-stage filtering pipeline (physicochemical properties, confidence metrics, and monomer structure validation), and selected 192 designs for experimental testing by surface plasmon resonance (SPR).

63.5% hit rate. Of 192 designs tested (191 expressed successfully), 122 showed detectable binding to PDGFR, a 63.5% hit rate over all tested designs and an exceptionally high one for de novo binder design.
Picomolar affinity. Affinities ranged from 93.6 pM to 1.34 μM, with the strongest binder reaching the double-digit picomolar range.
See our preprint for detailed analyses of interaction types, structural diversity, and filtering metrics.
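A hit rate measured on a few hundred designs carries sampling uncertainty, and the denominator matters: 122/192 (all tested designs) gives 63.5%, while 122/191 (expressed designs only) gives about 63.9%. A minimal sketch of a Wilson score interval for such a binomial proportion, using only the counts reported above:

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95% coverage)."""
    p = hits / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

for label, n in [("all tested", 192), ("expressed", 191)]:
    lo, hi = wilson_interval(122, n)
    print(f"{label}: {122 / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The Wilson interval is preferred over the simple normal approximation because it stays inside [0, 1] and behaves better for small samples or extreme rates.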

Binders against ActRIIA blocking Myostatin Signaling in Cells

Activin receptor type IIA (ActRIIA) is a high-affinity receptor for myostatin, activins, and GDF11 that suppresses skeletal muscle growth via Smad2/3 signaling. Blocking ActRIIA is an attractive strategy against muscle wasting, relevant to cancer cachexia, sarcopenia, and lean mass loss during GLP-1 agonist therapies. We designed de novo mini-binders targeting ActRIIA's protein-protein interaction interface (Figure 12 shows the top 5 binders with functional downstream effects):

Design. Seven rounds varying search algorithms (beam search, Feynman–Kac steering, MCTS) and hotspot conditioning. Top 200 candidates after filtering and scaffold-level clustering; sequences co-generated without redesign.
Expression & binding. 192 of 200 designs expressed; 16 bound ActRIIA by SPR, 8 with sub-µM affinity. Tightest binder: #51 at K_D = 36 nM. Overall hit rate of 8.3% (16 of 192 expressed designs) for raw computational designs.
Functional validation. Selected binders blocked myostatin- and activin A-induced Smad2/3 signaling in a cellular reporter assay. Designs #51 and #104 achieved IC₅₀ values of 169 nM and 228 nM for myostatin.
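To put the SPR affinities in context: under a simple 1:1 equilibrium model with the binder in excess, receptor occupancy is c / (c + K_D), so K_D is the binder concentration that gives half occupancy. The sketch applies this textbook relation to the reported K_D of #51. It is an idealization; cellular IC₅₀ values also depend on the competing ligand concentration and assay conditions, so they need not equal K_D.

```python
def fraction_bound(conc_nm, kd_nm):
    """Equilibrium occupancy for simple 1:1 binding with the binder in excess: c / (c + K_D)."""
    return conc_nm / (conc_nm + kd_nm)

KD_51 = 36.0  # nM, tightest ActRIIA binder (#51) by SPR
for c in (3.6, 36.0, 360.0):
    print(f"{c:g} nM binder -> {fraction_bound(c, KD_51):.0%} of receptor occupied")
```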

Figure 12. De novo designed ActRIIA binders. (a) Predicted structures of binders with nanomolar affinity. (b) SPR sensorgrams with K_D values. (c) Inhibition of activin A- and myostatin-induced Smad2/3 signaling in cells.

Designing Peptides and Minibinders against Kinase Targets

Structures shown: PAK1 binders, CK1δ binders.

Protein kinases govern nearly all biological signaling through phosphorylation of serine, threonine, and tyrosine residues, yet genetically encoded tools for interrogating their catalytic activity remain limited. We designed de novo binders targeting the catalytic domains of two kinases, PAK1 and CK1δ, spanning two distinct size regimes: conventional mini-protein binders for PAK1 and short peptide binders for CK1δ (see Figure 13):

PAK1 mini-protein binders (49–74 amino acids). PAK kinases are central effectors of Rho-family GTPases regulating cytoskeletal dynamics and cell survival. We generated 50 mini-protein candidates and screened them via a split-NeoR protein-fragment complementation assay in E. coli. NGS analysis classified 20 of 50 designs (40%) as enriched. Four selected candidates were then tested by co-immunoprecipitation in HEK293T cells: Pk3 and Pk4 achieved ~4.5-fold enrichment (p<0.01 and p<0.05, respectively), while Pk6, which sat at the bacterial selection boundary, still showed significant enrichment (2.8-fold, p<0.01).
CK1δ peptide binders (<31 amino acids). CK1δ plays critical roles in circadian rhythm regulation, Wnt signaling, and neurodegeneration. To test whether Proteina-Complexa can design functional binders in the short-peptide regime, we synthesized 18 peptides via Fmoc solid-phase synthesis, each bearing a biotinylated lysine for streptavidin pull-down against purified GST-tagged CK1δc. 9 of 18 (50%) achieved significant enrichment over the scrambled control. Top hits: CK3 (6.5-fold, p=0.001), CK9 (5.5-fold), CK7 (4.5-fold). A 50% hit rate for peptides under 31 residues—where reduced conformational complexity and minimal binding surface area make design particularly challenging—underscores the generality of the latent codesign framework across the full protein-to-peptide size spectrum.
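Fold enrichment over the scrambled control is a ratio of mean signals. The sketch computes it and attaches an exact one-sided permutation p-value; the permutation test is a stand-in chosen because it needs no distributional assumptions or external libraries, and the preprint's actual statistical test may differ. The replicate values are made up solely to exercise the code.

```python
import itertools
import statistics

def fold_enrichment(sample, control):
    """Mean design signal divided by mean scrambled-control signal."""
    return statistics.mean(sample) / statistics.mean(control)

def permutation_p_value(sample, control):
    """Exact one-sided permutation test on the difference in means (sample > control)."""
    pooled = list(sample) + list(control)
    observed = statistics.mean(sample) - statistics.mean(control)
    extreme = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(sample)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if statistics.mean(a) - statistics.mean(b) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Made-up replicate signals, used only to exercise the functions.
peptide = [6.1, 6.6, 6.8, 6.5]
scrambled = [1.0, 0.9, 1.1, 1.0]
fold = fold_enrichment(peptide, scrambled)
p = permutation_p_value(peptide, scrambled)
```

With four replicates per group there are only C(8, 4) = 70 relabelings, so the smallest attainable p-value is 1/70 ≈ 0.014; very small p-values require more replicates or a parametric test.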

Figure 13. Kinase binder validation. (Left) PAK1: co-immunoprecipitation enrichment of four selected binders in HEK293T cells. (Right) CK1δ: pull-down enrichment of 18 peptide binders normalized to scrambled control.

De Novo Design and Binder Re-Engineering for the Nipah Virus

De novo binder engaging NiV-G. Hotspot residues in red. Side chains shown for binder and hotspots.

The Nipah virus attachment glycoprotein (NiV-G) is a six-bladed beta-propeller that mediates host cell entry by engaging the ephrin-B2 and ephrin-B3 receptors. Its receptor-binding site is a critical neutralization epitope, but the pocket is recessed and partially occluded, making it a challenging design target. As part of the Adaptyv binder competition, we tested Proteina-Complexa in two complementary modes and successfully hit this difficult epitope in both (quantitative results in Figure 14):

De novo design. 14 candidates generated from scratch, conditioned on hotspot residues at the receptor-binding site and selected after filtering. One hit identified with K_D = 56 nM—demonstrating that Proteina-Complexa can target difficult viral epitopes without any starting scaffold.
Diffuse-denoise re-engineering. An existing scaffold is partially noised and jointly re-generated in backbone and sequence—true structure-sequence codesign, not fixed-backbone optimization. 5 of 6 candidates showed nanomolar binding (1–21 nM).
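The diffuse-denoise protocol can be sketched with flow matching conventions: interpolate the existing binder's representation toward noise up to an intermediate time, then integrate the velocity field the rest of the way. Here `x_existing` stands for the encoded binder state (for Proteina-Complexa this would be CA coordinates plus latents, with target conditioning entering through the velocity function); the linear interpolation path, the Euler integrator and the toy velocity are assumptions for illustration.

```python
import numpy as np

def diffuse_denoise(x_existing, velocity_fn, t_start, n_steps, rng):
    """Partially noise an existing state to time t_start, then denoise to t = 1.

    Uses the linear path x_t = t * x_1 + (1 - t) * x_0 with Gaussian x_0.
    t_start near 1 keeps the result close to the input; a smaller t_start
    lets backbone and sequence be re-generated more freely.
    """
    noise = rng.normal(size=x_existing.shape)
    x = t_start * x_existing + (1.0 - t_start) * noise
    ts = np.linspace(t_start, 1.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
x_existing = rng.normal(size=(50, 3))
x_nearby_mode = x_existing + 0.3  # hypothetical nearby mode of the toy model
toy_v = lambda x, t: (x_nearby_mode - x) / (1.0 - t)
x_new = diffuse_denoise(x_existing, toy_v, t_start=0.7, n_steps=30, rng=rng)
```

Starting from a partially noised input rather than pure noise is what keeps the re-engineered binder close to the original scaffold while still letting structure and sequence change together.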

Figure 14. De novo design and binder re-engineering for NiV-G. (a) Generated de novo binder; zoom-ins into binding pocket and interface hydrogen bonding (K_D = 56 nM). (b) Re-engineered binder via diffuse-denoise protocol: an existing scaffold is partially noised and re-generated. (c) SPR sensorgrams for all binders, with equilibrium dissociation constants (K_D) annotated. Binders show nanomolar affinities.

De Novo Binder Design for Carbohydrates

Carbohydrates are small, densely polar, and present hydroxyl-rich surfaces with little hydrophobic character; no computational method had previously designed a protein that binds a free carbohydrate. We targeted the blood group B antigen, a trisaccharide central to ABO transplant compatibility (see Figure 15):

Design. 24 designs conditioned on the sugar structure with pocket burial and hydrogen bond rewards; most expressed as soluble proteins in E. coli.
Binding. 5 of 24 designs agglutinated type B red blood cells (2.6–3.6× signal vs. 1.3× positive control)—21% hit rate from a single design round.
Validation. Biolayer interferometry on top hit NV15 confirmed direct, concentration-dependent carbohydrate binding; circular dichroism showed stability beyond 95 °C.

To our knowledge, these are the first de novo designed proteins that bind a free carbohydrate.

Figure 15. De novo binders for the blood group B carbohydrate. (Top) Agglutination signal for all 24 designs; four hits substantially exceed the positive control. (Bottom) Biolayer interferometry sensorgram for NV15 confirming direct, concentration-dependent carbohydrate binding.
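A pocket-burial reward of the kind used in this campaign can be approximated by counting protein atoms around each ligand atom. The sketch below is one such proxy, with a hypothetical 8 Å radius, a 16-neighbor threshold and illustrative reward weights; it shows how burial and a hydrogen-bond count might be combined and is not the reward actually used for the carbohydrate designs.

```python
import numpy as np

def burial_fraction(ligand_xyz, protein_xyz, radius=8.0, min_neighbors=16):
    """Fraction of ligand atoms with at least `min_neighbors` protein atoms within `radius` angstroms."""
    d = np.linalg.norm(ligand_xyz[:, None, :] - protein_xyz[None, :, :], axis=-1)
    neighbors = (d < radius).sum(axis=1)
    return float((neighbors >= min_neighbors).mean())

def sugar_reward(ligand_xyz, protein_xyz, n_hbonds, w_burial=1.0, w_hb=0.2):
    """Hypothetical weighted sum of a burial term and a hydrogen-bond term."""
    return w_burial * burial_fraction(ligand_xyz, protein_xyz) + w_hb * n_hbonds

# Synthetic check: one ligand atom enclosed by 20 protein atoms, one far away.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(20, 3))
protein = 5.0 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
ligand = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
frac = burial_fraction(ligand, protein)
reward = sugar_reward(ligand, protein, n_hbonds=3)
```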

Citations

Proteina-Complexa Model:

@inproceedings{didi2026proteinacomplexa,
  title        = {Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute},
  author       = {Kieran Didi and Zuobai Zhang and Guoqing Zhou and Danny Reidenbach and Zhonglin Cao and Sooyoung Cha and Tomas Geffner and Christian Dallago and Jian Tang and Michael M. Bronstein and Martin Steinegger and Emine Kucukbenli and Arash Vahdat and Karsten Kreis},
  booktitle    = {The Fourteenth International Conference on Learning Representations (ICLR)},
  year         = {2026}
}

Experimental Validation:

@misc{didi2026complexavalidation,
  title        = {Latent Generative Search unlocks de novo Design of Untapped Biomolecular Interactions at Scale},
  author       = {Kieran Didi and Danny Reidenbach and Matthew Penner and Supriya Ravi and Marshall Case and Mike Nichols and Erik Swanson and Alex Reis and Maggie Prescott and Yue Qian and Dongming Qian and Jingjing Yang and Weiji Li and Le Li and Daichi Shonai and Sean Gay and Bhoomika Basu Mallik and Ho Yeung Chim and Liurong Chen and Miguel Atienza Juantay and Hubert Klein and Anna Macintyre and Maxim Secor and Daniele Granata and Zhonglin Cao and Guoqing Zhou and Tomas Geffner and Xi Chen and Micha Livne and Zuobai Zhang and Tianjing Zhang and Michael M. Bronstein and Martin Steinegger and Kristine Deibler and Scott Soderling and Alena Khmelinskaia and Florian Hollfelder and Christian Dallago and Emine Kucukbenli and Arash Vahdat and Pierce Ogden and Karsten Kreis},
  howpublished = {\url{https://research.nvidia.com/labs/genair/proteina-complexa/assets/proteina_complexa_validation.pdf}},
  year         = {2026}
}