enlightenbio Blog

Exciting Advancements in AI Genomic Modeling

A paper that has been generating energetic discussion and debate about Artificial Intelligence (AI) and its implications for biological discovery was posted to bioRxiv on February 21, 2025 (Brixi et al., 2025). The work is a collaboration between the Arc Institute, Stanford University, the University of California, Berkeley, Columbia University, and industry partners. The authors announced Evo 2, a biological foundation model trained on over 9.3 trillion base pairs of curated DNA sequences from bacteria, archaea, eukarya, and bacteriophages – notably excluding viruses that infect eukaryotic hosts – in other words, 100,000 species spanning the entire tree of life. The Evo 2 code is publicly accessible via Arc’s GitHub (see Figure 1). This genomic language model is a significant advance over the Evo 1 architecture, which was announced in 2024 and trained solely on prokaryotic DNA sequences (Nguyen et al., 2024). Evo 2 has impressive capabilities both for investigating biological mechanisms and for the generative design of novel DNA sequences. The computational improvements that make this possible stem in part from the StripedHyena 2 architecture, a multi-hybrid design that combines convolutional operators with attention. Though the details are beyond the scope of this review, the StripedHyena 2 architecture is what allows Evo 2 models to scale to larger parameter counts (up to 40 billion) and longer DNA context lengths (up to 1 million bases). The publication by Brixi et al. (2025) has not yet been peer-reviewed, so its full import will only become clear with time, but the potential significance indicated by the authors should have positive repercussions for years to come.

Figure 1: Evo 2 is trained on data encompassing trillions of nucleotide sequences from all domains of life. Each UMAP point indicates a single genome. (Credit and source: Brixi et al., 2025)

Evo 2 Features Are Revealed by Sparse Autoencoder (SAE)

Though Evo 2 has many interesting properties, one that is particularly fascinating is the presence of latent dimensions, or ‘features’, of the model that correspond to distinct biological concepts, such as intron-exon boundaries, transcription factor binding sites, and protein structural motifs, among others (Brixi et al., 2025). Large language models (LLMs) are typically black boxes, with important features hidden or difficult for non-experts to interpret. In the Evo 2 architecture, these features are revealed by the use of an artificial neural network subtype known as the sparse autoencoder (SAE). The SAE decomposition is powerful precisely because it exposes layers of latent information that are normally buried and thus not human-interpretable. SAEs are a general technique for extracting interpretable features from a model; since they are not the focus of this review, good explanations can be found elsewhere.
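To make the idea concrete, here is a minimal toy sketch of what an SAE does: it re-expresses a model’s dense hidden activations as a sparse combination of many learned directions, so that individual directions (“features”) can be inspected. This is a generic illustration in NumPy – the toy data, dimensions, learning rate, and penalty weight are all invented for the example and are not the authors’ actual training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model activations": 200 samples of a 16-dim hidden state built as
# sparse combinations of 32 hidden ground-truth directions.
D = rng.normal(size=(32, 16))
codes = rng.random((200, 32)) * (rng.random((200, 32)) < 0.1)
X = codes @ D

d_in, d_hid, lam, lr = 16, 32, 1e-3, 1e-2
W_enc = rng.normal(scale=0.1, size=(d_in, d_hid))
W_dec = rng.normal(scale=0.1, size=(d_hid, d_in))

for step in range(500):
    f = np.maximum(X @ W_enc, 0.0)   # sparse feature activations (ReLU)
    X_hat = f @ W_dec                # reconstruction of the activations
    err = X_hat - X
    # Gradient steps on 0.5*||X_hat - X||^2 + lam*||f||_1
    g_dec = f.T @ err / len(X)
    g_f = (err @ W_dec.T + lam * np.sign(f)) * (f > 0)
    g_enc = X.T @ g_f / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

f = np.maximum(X @ W_enc, 0.0)
sparsity = (f > 1e-6).mean()
print(f"mean reconstruction error: {np.mean((f @ W_dec - X) ** 2):.4f}")
print(f"fraction of active features: {sparsity:.2f}")
```

The point of the L1 penalty is that most features stay silent on any given input; a feature that fires only on, say, CRISPR spacer sequences becomes something a human can name and study.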

Sparse Autoencoder Features

For the purposes of seeing ‘under the hood’ of what Evo 2 knows, the mechanism is less interesting than the fact that some of these SAE features correspond to biological concepts the model was never explicitly taught! An example is f/19746, which correlates strongly with phage-derived spacer sequences within CRISPR arrays. Two other striking examples are f/28741 and f/22326, which track closely with α-helix and β-sheet structures in encoded proteins, demonstrating that Evo 2 has learned protein structure and coding-sequence boundaries without explicit supervision. The authors demonstrate the usefulness of these SAE features by annotating a stretch of woolly mammoth chromosome for the coding sequence, exon start and stop boundaries, and full intron definitions. Further investigation of these SAE features could conceivably help many researchers looking for signals in DNA sequences relevant to their particular area of study and biological context. All one needs to start investigating is a DNA, RNA, or protein sequence that aligns with the facet of interest and online access to the model.
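The annotation step described above can be pictured as reading a per-base activation track for one feature and converting the strongly firing stretches into intervals. The sketch below uses only the Python standard library; the activation values and the threshold are hypothetical stand-ins for what a real exon-associated SAE feature would produce.

```python
def active_intervals(track, thresh=0.5):
    """Turn a per-base feature activation track into (start, end) spans,
    the way a strongly firing 'exon' feature could be read as annotation."""
    spans, start = [], None
    for i, v in enumerate(track):
        if v >= thresh and start is None:
            start = i                      # feature switches on
        elif v < thresh and start is not None:
            spans.append((start, i))       # feature switches off
            start = None
    if start is not None:
        spans.append((start, len(track)))
    return spans

# Hypothetical activations of one SAE feature over a 20-bp window
track = [0.1, 0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.1, 0.6, 0.9,
         0.9, 0.8, 0.1, 0.1, 0.1, 0.7, 0.8, 0.1, 0.1, 0.1]
print(active_intervals(track))  # -> [(2, 5), (8, 12), (15, 17)]
```

Stacking several such tracks – one for exons, one for intron boundaries, one for coding frame – is essentially how a feature-based annotation of a chromosome segment would be assembled.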

Evo 2 Accurately Predicts the Functional Impacts of Genetic Variations

Perhaps more constrained, and of more immediate relevance than the SAE layers, is Evo 2’s ability to help clinicians in the realm of variant effect prediction. Here, the initial results from Evo 2 are more mixed (Brixi et al., 2025). When a diverse range of human genome variants was evaluated for pathogenicity, Evo 2 did not outperform field-standard models in every setting – for example, when calling single nucleotide variants (SNVs) located within the coding region of a gene. However, Evo 2 was superior in noncoding regions and on mutations more complex than single nucleotide substitutions. While these zero-shot performances were not best-in-class, the reasonable results obtained with no task-specific training encouraged the authors to augment Evo 2 with supervised training on variants. This supervised version of Evo 2 outperformed the field standard when classifying a set of BRCA1 variants of unknown significance in breast cancer. This supports the view that the base model is strong – even better than the field standard in noncoding regions – and readily trainable into a variant classifier (i.e., one predicting the pathogenic effects of noncoding variation) that improves significantly on existing state-of-the-art variant classifiers.
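The usual zero-shot recipe for this kind of scoring is to compare the likelihood a sequence model assigns to the reference sequence versus the variant sequence: a large drop in likelihood suggests the variant disrupts something the model has learned to expect. The sketch below implements that delta-log-likelihood logic with a deliberately tiny stand-in model – a second-order Markov model fitted on a toy “genome” – since the scoring arithmetic is the same whether the likelihoods come from a Markov model or from a large genomic language model.

```python
import math
from collections import defaultdict

# Toy reference corpus standing in for training data.
genome = "ATGGCGTACGTTAGCATGGCGTACGATAGCATGGCATACGTTAGC" * 20

# Fit next-base counts conditioned on the previous 2 bases.
counts = defaultdict(lambda: defaultdict(int))
for i in range(len(genome) - 2):
    counts[genome[i:i + 2]][genome[i + 2]] += 1

def log_likelihood(seq):
    """Sum of log P(base | previous 2 bases), with add-one smoothing."""
    total = 0.0
    for i in range(len(seq) - 2):
        ctx, nxt = seq[i:i + 2], seq[i + 2]
        c = counts[ctx]
        total += math.log((c[nxt] + 1) / (sum(c.values()) + 4))
    return total

def variant_score(ref_seq, pos, alt_base):
    """Delta log-likelihood; strongly negative values flag a disfavored variant."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return log_likelihood(alt_seq) - log_likelihood(ref_seq)

window = genome[:30]
for alt in "ACGT":
    print(alt, round(variant_score(window, 10, alt), 3))
```

Substituting the reference base back in scores exactly zero, and unusual substitutions score negative – the same ranking signal that, at Evo 2’s scale, separates benign from pathogenic variants.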

Functional Assessment of Protein and RNA Changes

Related to clinical variant prediction, but considerably broader in scope, is the assertion that Evo 2 can assess mutational effects across all three domains of life: bacteria, archaea, and eukarya (Brixi et al., 2025). A striking initial demonstration is that Evo 2’s mutation probabilities distinguish 5’ UTR from coding sequence. Evo 2 even registers mutation-effect changes that follow the three-base periodicity of codons, despite never being explicitly trained to recognize coding sequences or codon structure. Evo 2’s mutation effect predictions also align with accepted biological constraints. For example, it predicts changes in rRNA or tRNA sequences to be significantly more sensitive than changes in intergenic or other noncoding regions, and it recapitulates the rule that synonymous changes are better tolerated than non-synonymous, frameshift, or premature stop mutations. A wide range of diverse deep mutational scanning (DMS) assays was used to evaluate Evo 2’s performance against field-standard tools, and the outcome demonstrated, yet again, that Evo 2 is comparable to or even better than established tools (Brixi et al., 2025). Specifically, Evo 2 is adept at calling mutation sensitivity in both bacterial and human proteins, as well as in noncoding RNA (ncRNA). The group also showed that, with additional training, Evo 2 is a better classifier of intron/exon structure than the industry standards. Lastly, the authors showed that Evo 2 is capable of evaluating the essential versus non-essential nature of genes in prokaryotes, and even of long ncRNAs (lncRNAs) in humans. As before, these performances are striking but not necessarily a paradigm shift in abilities. However, that Evo 2 matches or exceeds a field-standard prediction tool without any explicit instruction in biological constraints is quite impressive.
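The constraint hierarchy mentioned above – synonymous changes tolerated, nonsynonymous less so, premature stops least of all – is something Evo 2 recovers implicitly, but it can be stated explicitly with nothing more than the standard genetic code. The sketch below (standard library only) classifies a point mutation within a codon; it is the kind of ground-truth labeling one would compare model likelihood scores against.

```python
# Standard genetic code, enumerated with first base varying slowest
# over the order T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

def classify(codon, pos_in_codon, alt_base):
    """Classify a single-base substitution within one codon."""
    mutant = codon[:pos_in_codon] + alt_base + codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if alt_aa == ref_aa:
        return "synonymous"        # protein unchanged: usually tolerated
    if alt_aa == "*":
        return "premature stop"    # truncates the protein: usually severe
    return "nonsynonymous"         # amino acid change: intermediate

print(classify("GGT", 2, "C"))  # GGT -> GGC, both glycine
print(classify("TAC", 2, "A"))  # TAC -> TAA, tyrosine -> stop
```

What makes Evo 2’s result notable is that it was never given this table, yet its per-base mutation scores reproduce the same ordering of severity directly from raw DNA.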

Evo 2 Can Model Genes, Genomes, and Chromosomes Starting with Seed Sequences

The last area of Evo 2’s skills we will look at is its generative ability to produce new sequences from given sequence prompts (Brixi et al., 2025). As the authors directly state, Evo 2 is a generative model trained to predict the next base pair in a sequence, and at first glance this might not seem significant. But when the authors prompted Evo 2 with a conserved upstream region plus a partial coding sequence, it completed the genes with high accuracy (in both prokaryotes and eukaryotes), outperforming its predecessor Evo 1. They also found that Evo 2 could create plausible synthetic mitochondrial genomes from prompt sequences. Extending this to very long contexts, Evo 2 was able to generate plausible bacterial genomes from a prompt, as well as reasonable yeast chromosomes when started with an appropriate eukaryotic seed sequence. While these capabilities are interesting, future use cases are harder to conceptualize – until one looks at the papers describing the previous Evo 1 (Nguyen et al., 2024) and the intermediate Evo 1.5 (Merchant et al., 2024) models. Evo 1 could create new synthetic CRISPR-Cas sequences simply by being prompted (Nguyen et al., 2024), allowing the wholesale creation of new protein-RNA complexes to hit different targets and potentially yielding new biological tools. The researchers behind the related Evo 1.5 work (Merchant et al., 2024) used their model to create new bacterial toxin-antitoxin pairs and new anti-CRISPR proteins from prompt sequences. Not only might this enable new biotechnology applications, the authors also note that some of these newly created molecules reveal previously unappreciated facets of the underlying biology.
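“Predict the next base, then feed the result back in” is the entire generative loop. The sketch below shows that autoregressive decoding pattern with a hand-made placeholder distribution; the `next_base_probs` function and its probabilities are invented for illustration – in the real system they would come from Evo 2’s network.

```python
import random

random.seed(7)
BASES = "ACGT"

def next_base_probs(context):
    """Placeholder for a genomic language model's next-base distribution.
    Here: a made-up bias toward G/C after G/C, purely illustrative."""
    if context and context[-1] in "GC":
        return {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}
    return {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def generate(prompt, n_new):
    """Autoregressive decoding: sample the next base, append, repeat --
    the same loop a 'design from a seed sequence' run uses."""
    seq = prompt
    for _ in range(n_new):
        probs = next_base_probs(seq)
        seq += random.choices(BASES, weights=[probs[b] for b in BASES])[0]
    return seq

out = generate("ATGGCG", 24)
print(out)
```

The prompt acts as the design constraint: everything the model emits is conditioned on it, which is why a conserved upstream region can steer the completion toward a plausible gene, and a chromosome-scale seed toward a plausible chromosome.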

Concluding Notes

Evo 2 is a new high-water mark and a significant leap forward as a biological AI foundation model, supplanting its elder sibling Evo 1 (published in 2024). Evo 2 comes in two impressive versions, with 7 billion and 40 billion parameters respectively, and supports a context length of 1 million base pairs. Evo 2’s capabilities include predicting and generating novel genetic sequences tailored to specific goals, potentially revolutionizing the creation of new molecular tools. Evo 2 is also positioned to support the discovery of biological mechanisms comprehensively across the tree of life. In particular, when augmented with domain-specific training, Evo 2 has the potential to become the new paradigm for scoring disease-causing mutations, especially mutations not confined to DNA coding regions. Evo 2 is especially well suited to help decipher the mysteries contained within noncoding DNA regions – for example, the complex and fascinating worlds of lncRNAs and small nucleolar RNAs (snoRNAs).

An aspect that sets Evo 2 apart is its accessibility to the wider community as an open-source AI model. By offering transparency through publicly available weights, data, and training infrastructure, Evo 2 fosters a collaborative environment that supports innovation across the biological research landscape. Its release as an open-source resource marks a significant stride toward collective advancement and knowledge sharing in the field of AI-driven biology.

References

Brixi G. et al., Genome modeling and design across all domains of life with Evo 2. (2025) bioRxiv, doi: https://doi.org/10.1101/2025.02.18.638918.

Nguyen E. et al., Sequence modeling and design from molecular to genome scale with Evo. (2024) Science, Nov 15;386(6723).

Merchant A. et al., Semantic mining of functional de novo genes from a genomic language model. (2024) bioRxiv, doi: https://doi.org/10.1101/2024.12.17.628962.

Nick Marshall

