Peptide Vaccine Technology

T-Cell Epitope Prediction: Computational Vaccine Design

13 min read|March 25, 2026

30,000+ MHC alleles

Pan-specific algorithms like NetMHCpan can predict peptide binding across more than 30,000 MHC alleles, enabling vaccine design for any human population without experimental testing of each allele individually.

Buckley et al., Briefings in Bioinformatics, 2022

[Figure: Diagram showing computational prediction of peptide-MHC binding for T-cell epitope identification]

Every T-cell response starts with a peptide. A short fragment of a pathogen protein, typically 8-11 amino acids for CD8+ T cells or 13-25 amino acids for CD4+ T cells, gets loaded onto an MHC molecule on the surface of an antigen-presenting cell. If a T cell recognizes that peptide-MHC complex, the immune response activates. If the wrong peptide is chosen for a vaccine, nothing happens. The entire success of a peptide vaccine depends on selecting fragments that will actually be presented by MHC molecules and recognized by T cells. For how this fits into the broader vaccine design process, see our guide to peptide vaccine design and the pillar article on self-assembling peptide nanoparticle platforms.

For decades, identifying these peptides required laborious experimental testing: synthesizing candidate peptides, testing them in binding assays, and measuring T-cell responses in cell culture. Computational epitope prediction changed this. Algorithms trained on experimental binding data can now predict which peptides from any protein will bind MHC molecules. For well-characterized MHC alleles, their binding predictions approach the accuracy of experimental measurement, although binding alone does not guarantee a T-cell response.

Key Takeaways

  • Computational epitope prediction tools can identify peptides likely to bind MHC molecules and trigger T-cell responses, reducing the candidate-selection stage of vaccine design from years of experimental screening to days of computation
  • NetMHCpan, the most widely used tool, uses neural networks trained on peptide-MHC binding data to predict binding across 30,000+ MHC alleles, including those with no experimental training data
  • Transformer-based deep learning models (2024-2026) now outperform NetMHCpan in benchmark comparisons, achieving higher accuracy by jointly modeling peptide and MHC features
  • A 2017 study found that only 8% of computationally predicted high-affinity MHC binders actually triggered T-cell responses, revealing a gap between binding prediction and immunogenicity
  • The ProTECT pipeline (2020) integrates epitope prediction with tumor mutation data to identify patient-specific neoantigen targets for personalized cancer vaccines
  • Prediction accuracy differs substantially between MHC class I (CD8+ T cells, better predicted) and MHC class II (CD4+ T cells, harder to predict due to variable-length binding groove)

The Biology Behind the Algorithm

T-cell epitope prediction is fundamentally a peptide-protein interaction problem. The MHC molecule has a binding groove, a cleft formed by two alpha-helices sitting on a beta-sheet floor. Peptide fragments from degraded proteins are loaded into this groove inside the cell and then displayed on the cell surface for T-cell inspection.

MHC class I (recognized by CD8+ cytotoxic T cells) has a closed binding groove that accommodates peptides of 8-11 amino acids. The peptide is anchored at specific positions (typically positions 2 and 9 for a 9-mer) by "anchor residues" that fit into pockets in the groove floor. The identity of these anchor residues varies between MHC alleles, which is why different people respond to different peptides from the same pathogen.
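The anchor-residue idea can be illustrated with a minimal motif scan. This is a sketch under stated assumptions, not how modern predictors work: the anchor sets below are a deliberately simplified motif of the kind commonly described for HLA-A*02:01 (hydrophobic residues at position 2 and at the C-terminus), and the function names and test protein are invented for the example.

```python
def enumerate_peptides(protein, lengths=(8, 9, 10, 11)):
    """Yield (start_index, peptide) for every window of the given lengths."""
    for k in lengths:
        for i in range(len(protein) - k + 1):
            yield i, protein[i:i + k]

# Simplified anchor preferences, for illustration only (assumption).
P2_ANCHORS = set("LM")
CTERM_ANCHORS = set("VLI")

def matches_anchor_motif(peptide):
    """True if position 2 and the C-terminal residue are preferred anchors."""
    return peptide[1] in P2_ANCHORS and peptide[-1] in CTERM_ANCHORS

def motif_candidates(protein, lengths=(9,)):
    """Return (start_index, peptide) pairs whose anchors fit the motif."""
    return [(i, p) for i, p in enumerate_peptides(protein, lengths)
            if matches_anchor_motif(p)]

# Hypothetical 15-residue protein fragment.
protein = "MKTNLVPMVATVQGA"
print(motif_candidates(protein))
```

A different allele would need different anchor sets, which is the computational counterpart of different people presenting different peptides from the same pathogen.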

MHC class II (recognized by CD4+ helper T cells) has an open-ended groove that accommodates peptides of 13-25 amino acids. The binding core is typically 9 residues, but the peptide can extend beyond the groove at both ends. This variable-length binding makes MHC-II prediction harder than MHC-I prediction.
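To make the extra difficulty concrete, the sketch below lists every way a 9-residue core can sit inside a longer class II peptide. The function name and the example peptide are illustrative assumptions; a real predictor has to score each of these registers and decide which one is the binding core.

```python
def binding_core_registers(peptide, core_len=9):
    """Return (offset, n_flank, core, c_flank) for each possible core register."""
    registers = []
    for offset in range(len(peptide) - core_len + 1):
        core = peptide[offset:offset + core_len]
        n_flank = peptide[:offset]              # residues overhanging the N-terminal end
        c_flank = peptide[offset + core_len:]   # residues overhanging the C-terminal end
        registers.append((offset, n_flank, core, c_flank))
    return registers

# A 13-residue example peptide has 5 candidate registers.
for offset, n_flank, core, c_flank in binding_core_registers("PKYVKQNTLKLAT"):
    print(offset, n_flank or "-", core, c_flank or "-")
```

A 9-mer in a closed class I groove has exactly one register, whereas a 25-mer in a class II groove has 17, so the model must infer the register as well as the binding strength.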

Florea and colleagues published one of the earliest comprehensive reviews of epitope prediction algorithms in 2003, establishing the computational framework that all subsequent tools build upon.[1] The fundamental approach is the same today: train a model on experimental peptide-MHC binding data, then use that model to score new peptide sequences for their binding likelihood.
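The train-then-score loop can be sketched with the simplest possible model, a position-specific scoring matrix (PSSM). This is a stand-in chosen for brevity, not the neural networks used by current tools, and the four "binder" peptides are invented training data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train_pssm(binders, pseudocount=1.0):
    """Build per-position log-odds scores from equal-length binder peptides."""
    length = len(binders[0])
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in binders)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm, peptide):
    """Sum the log-odds of each residue at its position."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))

# Invented 9-mer "binders" sharing L/M at position 2 and V at position 9.
toy_binders = ["ALKDEFGHV", "SLQRTWYNV", "GLPMICNDV", "KMAEHGTSV"]
pssm = train_pssm(toy_binders)
print(score(pssm, "ALAAAAAAV"), score(pssm, "AGAAAAAAG"))
```

The peptide that matches the anchors seen in training scores higher. Real tools replace the matrix with a neural network and train on many thousands of measured affinities, but the workflow is the same.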

NetMHCpan: The Gold Standard

NetMHCpan is the most widely used and cited epitope prediction tool. Developed at the Technical University of Denmark, it uses artificial neural networks trained on curated datasets of experimentally measured peptide-MHC binding affinities.

The "pan" in NetMHCpan refers to its pan-specific approach: rather than training separate models for each MHC allele, it trains a single model that takes both the peptide sequence and the MHC allele sequence as inputs. This allows predictions for any MHC allele, including the thousands of rare alleles for which no experimental binding data exists. The model learns the general rules of peptide-MHC interaction from well-characterized alleles and transfers this knowledge to uncharacterized ones.
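The pan-specific idea comes down to what goes into the model's input. The sketch below shows one plausible encoding, assuming the MHC allele is represented by a short string of groove residues (often called a pseudo-sequence). The one-hot scheme, function names, and both pseudo-sequence strings are illustrative assumptions, not NetMHCpan's actual encoding.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a sequence as a flat list of 20-dimensional one-hot vectors.

    Unknown residues (e.g. 'X') become all-zero vectors.
    """
    vec = []
    for aa in seq:
        vec.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vec

def encode_pair(peptide, mhc_pseudo_sequence):
    """Concatenate peptide and MHC encodings into a single model input."""
    return one_hot(peptide) + one_hot(mhc_pseudo_sequence)

# Made-up pseudo-sequences standing in for two different alleles.
allele_a = "YFAMYGEKVAHT"
allele_b = "YYAMYREISTNT"

x_a = encode_pair("NLVPMVATV", allele_a)
x_b = encode_pair("NLVPMVATV", allele_b)
print(len(x_a), x_a != x_b)
```

Because the allele is part of the input rather than being baked into a separate model, an allele with no binding data can still be scored: its groove residues are encoded and passed to the same trained network.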

NetMHCpan 4.1 and its successors integrate expanded training data and refined neural network architectures. The tool predicts both binding affinity (how tightly the peptide binds) and eluted ligand likelihood (the probability that the peptide would be naturally presented on the cell surface). These are related but not identical: a peptide may bind MHC tightly in vitro but never be generated by the cellular proteasome or transported into the ER where MHC loading occurs.
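Binding output is usually reported as an affinity (IC50 in nM) or a percentile rank. The sketch below shows two conventions widely used with NetMHC-family tools: a log transform that maps IC50 onto a 0-1 scale, and rank cut-offs of 0.5% and 2% for strong and weak binders. Treat the constants as conventional defaults assumed for illustration, and check the documentation of the tool and version you are using.

```python
import math

MAX_IC50_NM = 50000.0  # conventional upper bound used in the log transform

def ic50_to_score(ic50_nm):
    """Map an IC50 in nM to a 0-1 score: 1 - log(IC50) / log(50000)."""
    clipped = min(max(ic50_nm, 1.0), MAX_IC50_NM)
    return 1.0 - math.log(clipped) / math.log(MAX_IC50_NM)

def classify_by_rank(percent_rank, strong=0.5, weak=2.0):
    """Label a peptide by percentile rank (lower rank = better binder)."""
    if percent_rank <= strong:
        return "strong binder"
    if percent_rank <= weak:
        return "weak binder"
    return "non-binder"

print(round(ic50_to_score(500.0), 3))  # a 500 nM binder on the 0-1 scale
print(classify_by_rank(0.3), classify_by_rank(1.5), classify_by_rank(5.0))
```

Percentile ranks are often preferred over raw affinities when comparing across alleles, because different alleles have different overall affinity distributions.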

Buckley and colleagues (2022) systematically evaluated the performance of existing computational models in predicting CD8+ T-cell pathogenic epitopes, finding that while binding prediction accuracy is generally high for common MHC alleles, there are substantial performance differences between tools and significant drops in accuracy for rare alleles and for distinguishing binders from immunogens.[2]

The Prediction-Reality Gap

The most important limitation in the field was quantified by Schmidt and colleagues (2017), who compared computational predictions with experimental T-cell responses and found a striking divergence.[3]

Of peptides predicted to bind MHC with high affinity, only approximately 8% actually triggered T-cell responses in cell-based assays. In other words, roughly 92% of computationally predicted "good" epitopes failed to produce an immune response.
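The practical consequence for screening can be worked out with simple arithmetic. The sketch below takes the 8% figure from the text as a per-peptide success rate and assumes, as a simplification, that peptides succeed or fail independently of one another.

```python
import math

def expected_hits(n_peptides, hit_rate=0.08):
    """Expected number of immunogenic peptides among n predicted binders."""
    return n_peptides * hit_rate

def prob_at_least_one(n_peptides, hit_rate=0.08):
    """Probability that at least one of n predicted binders is immunogenic."""
    return 1.0 - (1.0 - hit_rate) ** n_peptides

def peptides_needed(target_prob, hit_rate=0.08):
    """Smallest n for which prob_at_least_one(n) reaches target_prob."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - hit_rate))

print(expected_hits(50))                # about 4 expected hits among 50 candidates
print(round(prob_at_least_one(10), 3))  # chance that a 10-peptide panel has any hit
print(peptides_needed(0.95))            # panel size for a 95% chance of one hit
```

Under these assumptions, a panel of 10 predicted binders has only a little better than even odds of containing a single immunogenic peptide, which is why predicted candidates are still validated experimentally.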

The gap has multiple causes:

Binding is necessary but not sufficient. A peptide must bind MHC, but it must also be recognized by a T-cell receptor (TCR). Binding prediction does not predict TCR recognition. The T-cell repertoire in any individual is shaped by thymic selection (which eliminates T cells reactive to self-peptides) and prior immune experience.

Antigen processing matters. The cellular proteasome must cleave the parent protein to generate the exact peptide fragment predicted. The TAP transporter must move that fragment into the ER. The peptide must compete with other peptides for MHC loading. Each step reduces the set of peptides that are actually presented.

Immunodominance hierarchy. Even among peptides that are presented and can be recognized, T-cell responses concentrate on a small number of "immunodominant" epitopes. Subdominant epitopes may bind MHC and be presented but fail to elicit measurable responses.

This gap is why the field has moved beyond pure binding prediction toward integrated pipelines that model the entire antigen presentation pathway.
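One way to picture an integrated pipeline is as a product of per-step probabilities. The sketch below is a toy model only: it assumes each step (proteasomal cleavage, TAP transport, MHC binding) yields an independent probability, and every number and name in it is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    peptide: str
    p_cleavage: float  # hypothetical probability the proteasome generates the peptide
    p_tap: float       # hypothetical probability of TAP transport into the ER
    p_binding: float   # hypothetical probability of stable MHC binding

    @property
    def p_presented(self):
        """Joint presentation probability, assuming the steps are independent."""
        return self.p_cleavage * self.p_tap * self.p_binding

def rank_by_presentation(candidates):
    """Sort candidates from most to least likely to be presented."""
    return sorted(candidates, key=lambda c: c.p_presented, reverse=True)

candidates = [
    Candidate("PEPTIDEAA", p_cleavage=0.1, p_tap=0.5, p_binding=0.95),
    Candidate("PEPTIDEBB", p_cleavage=0.8, p_tap=0.8, p_binding=0.60),
]
for c in rank_by_presentation(candidates):
    print(c.peptide, round(c.p_presented, 4))
```

Here the stronger binder ranks second because it is rarely generated, which is the kind of case that pure binding prediction misses.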

Deep Learning: The Current Frontier

The period from 2024 to 2026 has seen a paradigm shift from traditional neural networks to transformer-based deep learning models for epitope prediction.

Cheng and colleagues (2026) reviewed the progress of deep learning in CD8+ T-cell epitope prediction, documenting how transformer architectures (originally developed for natural language processing) have been adapted for protein sequence analysis.[4]

Key advances include:

Protein language models. Models pre-trained on millions of protein sequences (ESM, ProtTrans) learn general features of protein biochemistry. Fine-tuning these models on peptide-MHC binding data achieves higher accuracy than models trained from scratch on binding data alone.

Cross-attention mechanisms. The CapTransformer and similar models jointly model peptide and MHC features using cross-attention, allowing the model to learn which peptide positions are most important for binding to which MHC groove positions. This architecture outperformed NetMHCpan 4.0 in benchmark comparisons.
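The core operation can be sketched as generic scaled dot-product cross-attention, with peptide positions as queries and MHC groove positions as keys and values. This is not the architecture of any specific published model: the learned projection matrices, multiple heads, and training are all omitted, and the tiny 2-dimensional embeddings are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Attend from each query (peptide position) over all keys (MHC positions).

    Returns (outputs, weights); weights[i][j] is how strongly peptide
    position i attends to MHC position j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# Toy embeddings: 2 peptide positions, 3 MHC groove positions.
peptide_emb = [[10.0, 0.0], [0.0, 10.0]]
mhc_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
outputs, weights = cross_attention(peptide_emb, mhc_emb, mhc_emb)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of the weight matrix sums to 1 and can be read as a soft map from one peptide position to the groove positions it interacts with most, which is what makes this kind of model partly interpretable.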

Immunogenicity prediction. Newer models attempt to predict not just MHC binding but actual T-cell immunogenicity, directly addressing the prediction-reality gap. These models incorporate features of TCR recognition alongside MHC binding, though accuracy remains substantially lower than for binding prediction alone.

Cancer Application: Neoantigen Prediction

The most clinically impactful application of epitope prediction is in personalized cancer vaccines. Tumor cells accumulate mutations that create "neoantigens": peptides derived from mutated proteins that the immune system has never seen and therefore has not been tolerized against.

Rao and colleagues (2020) developed ProTECT (Prediction of T-Cell Epitopes for Cancer Therapy), an integrated pipeline that takes whole-exome and RNA sequencing data from a patient's tumor, identifies somatic mutations, predicts which mutant peptides will bind the patient's specific MHC alleles, and ranks candidates for vaccine inclusion.[5]
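The step from a somatic mutation to candidate neoantigen peptides can be sketched as follows. This is an illustrative outline, not ProTECT's code: it handles only a single missense substitution, and the function names and the 30-residue protein are hypothetical.

```python
def apply_missense(protein, position, ref, alt):
    """Return the mutant protein; position is 1-based, as in variant notation."""
    if protein[position - 1] != ref:
        raise ValueError("reference residue does not match the protein sequence")
    return protein[:position - 1] + alt + protein[position:]

def mutant_peptides(protein, position, ref, alt, lengths=(8, 9, 10, 11)):
    """List every window of the given lengths that contains the mutated residue."""
    mutant = apply_missense(protein, position, ref, alt)
    idx = position - 1
    peptides = []
    for k in lengths:
        first_start = max(0, idx - k + 1)
        last_start = min(idx, len(mutant) - k)
        for start in range(first_start, last_start + 1):
            peptides.append(mutant[start:start + k])
    return peptides

# Hypothetical protein with an R15W substitution.
protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
candidates = mutant_peptides(protein, 15, "R", "W")
print(len(candidates), candidates[:3])
```

Each candidate would then be scored against the patient's own MHC alleles, and typically also filtered by whether the gene is expressed, before ranking for vaccine inclusion.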

Morisaki and colleagues (2021) demonstrated clinical application: intranodal administration of neoantigen peptide-loaded dendritic cell vaccines, where the peptides were selected by computational prediction, elicited epitope-specific T-cell responses in cancer patients.[6] This represents the full pipeline from prediction to patient: sequence the tumor, predict the epitopes computationally, synthesize the peptides, load them onto dendritic cells, and inject.

For how these personalized neoantigen approaches work clinically, and how tumors evolve to evade peptide vaccines through immune escape, the dedicated articles cover each topic in depth.

Infectious Disease Applications

Epitope prediction is also valuable for pathogen-targeted vaccines.

Jabbar and colleagues (2018) used computational prediction to identify antigenic peptides from HPV E6 and E7 oncoproteins, selecting candidates for therapeutic HPV vaccine design that target the viral proteins responsible for cervical cancer.[7] Computational prediction reduced the candidate pool from thousands of possible peptide fragments to a manageable number for experimental validation.

The same immunoinformatics workflow extends beyond pathogens: Firuzpour and colleagues (2025) applied it to design a multi-peptide HER2 vaccine against breast cancer, using multiple prediction tools in combination to increase confidence in epitope selection.[8]

The COVID-19 pandemic accelerated the field dramatically. Within weeks of the SARS-CoV-2 sequence being published, computational groups worldwide had predicted T-cell epitopes from the spike protein and other viral proteins. These predictions guided rapid vaccine development and helped researchers understand why different individuals mounted different immune responses to infection and vaccination.

Emerging Delivery and Combination Approaches

Epitope prediction does not end at peptide selection. The predicted epitopes must be delivered effectively to generate immune responses.

Yang and colleagues (2025) described supramolecular peptide hydrogel epitope vaccines functionalized with CAR-T cells, combining computationally predicted epitopes with engineered T-cell therapy and a self-assembling peptide delivery platform.[9] This triple-component approach illustrates how epitope prediction feeds into increasingly sophisticated therapeutic platforms.

Shen and colleagues (2026) demonstrated synergistic immune protection using exosomal T-cell epitope vaccines combined with antibody-inducing vaccines, showing that predicted CD8+ epitopes delivered via exosomes enhanced the overall immune response beyond what either component achieved alone.[10]

The reverse vaccinology approach provides the genome-level framework that feeds into epitope prediction, while adjuvant selection determines whether predicted epitopes can actually generate protective immunity in vivo.

What the Tools Cannot Do Yet

Predict T-cell receptor recognition. Binding to MHC is well-modeled. TCR recognition of the peptide-MHC complex is not. The TCR repertoire varies between individuals, is shaped by immune history, and involves contacts with both the peptide and the MHC molecule in a three-body interaction that current models handle poorly.

Predict immunodominance. Which of several presented epitopes will dominate the immune response is influenced by factors beyond MHC binding: competition between epitopes, regulatory T-cell suppression, prior immune exposures, and stochastic elements of T-cell activation.

Handle post-translational modifications. Most prediction tools model only the 20 standard amino acids. Phosphorylation, glycosylation, citrullination, and other modifications that alter peptide-MHC binding are poorly represented in training data.

Predict population-level coverage. While pan-specific models can predict binding for individual alleles, predicting whether a vaccine will generate protective immunity across a genetically diverse population requires modeling MHC allele frequencies, epitope promiscuity, and herd immunity thresholds, a systems-level problem that current tools address only partially.
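The allele-frequency part of this problem can be approximated with a short calculation. The sketch below assumes Hardy-Weinberg proportions at each locus and independence between loci (ignoring linkage disequilibrium), and the allele frequencies are invented rather than taken from any real population.

```python
def population_coverage(loci, covered_alleles):
    """Fraction of people carrying at least one allele that presents a vaccine epitope.

    loci maps locus name -> {allele: frequency}. Each person carries two
    alleles per locus, so the chance of carrying no covered allele at a locus
    with cumulative covered frequency f is (1 - f) ** 2.
    """
    p_uncovered = 1.0
    for allele_freqs in loci.values():
        f = sum(freq for allele, freq in allele_freqs.items()
                if allele in covered_alleles)
        p_uncovered *= (1.0 - f) ** 2
    return 1.0 - p_uncovered

# Illustrative frequencies only (assumption).
loci = {
    "HLA-A": {"A*02:01": 0.25, "A*01:01": 0.15},
    "HLA-B": {"B*07:02": 0.10},
}
covered = {"A*02:01", "B*07:02"}
print(round(population_coverage(loci, covered), 4))
```

This estimates only the fraction of people who could present at least one vaccine epitope. Whether presentation translates into protective immunity is the part that current tools address only partially.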

The Bottom Line

T-cell epitope prediction has evolved from simple sequence motif scanning to deep learning models that predict peptide-MHC binding across 30,000+ alleles. The tools are now accurate enough to drive clinical applications in personalized cancer vaccines and rapid pathogen vaccine design. The critical remaining gap is between binding prediction (which works well) and immunogenicity prediction (which does not), reflecting the unsolved problem of T-cell receptor recognition. Deep learning advances are narrowing this gap, but experimental validation remains essential for clinical translation.

Frequently Asked Questions