While the first reference sequence of the human Y chromosome was published 20 years ago (Skaletsky et al., 2003), the Y chromosome in the widely used GRCh38 reference was still missing more than half of its 57.2 Mb. Two papers published online in Nature on August 23 have changed this status quo and significantly advanced our understanding of the human Y chromosome. The Y chromosome has resisted sequencing and analysis because it is replete with repeated, duplicated, inverted, and palindromic DNA elements. This work was made possible largely by recent advances in long-read sequencing, namely high-fidelity (HiFi) reads from Pacific Biosciences and ultra-long reads from Oxford Nanopore Technologies (ONT), as well as by methods development in automated genome assembly.
The newly available Y chromosome reference sequence will allow us to study and better understand disorders linked to the Y chromosome. In this blog post we take a quick dive into the specifics of the two papers: "The complete sequence of a human Y chromosome" and "Assembly of 43 human Y chromosomes reveals extensive complexity and variation."
Complete Sequence of a Human Y Chromosome
In the first paper, the Telomere-to-Telomere (T2T) consortium published the complete sequence of the human Y chromosome (T2T-Y) from the HG002 genome, spanning over 62 million base pairs (Mbp) (Rhie et al., 2023). T2T-Y adds over 30 Mbp of new sequence not contained in the GRCh38 Y chromosome, the established reference first released in 2013. In addition to correcting multiple errors, the new sequence completely defines the ampliconic structure of the TSPY, DAZ, and RBMY gene families, which encode critical Y-specific proteins. The assembly also revealed 41 additional protein-coding genes, mostly from the TSPY family. Furthermore, the complete sequences of the centromere and the q-arm heterochromatin are noteworthy, as these regions are highly repetitive and notoriously difficult to sequence with older technologies.
The consortium combined T2T-Y with its existing CHM13 genome to create a new reference standard, T2T-CHM13+Y. This outcome has several important benefits:
- A complete reference that includes all human chromosomes is now available
- The field of genomics gains a significant new resource going forward
- T2T-Y will lead to improved variant calling in genome-wide association studies
- T2T-Y will support error checking of genome assemblies from other organisms. This is important because human DNA can be a significant contaminant in the assembled genomes of other species. For example, fragmentary human DNA has led to thousands of spurious bacterial proteins being mistakenly annotated in existing databases.
Another key advance realized during the T2T-Y assembly was the creation of the Verkko assembler (Rautiainen et al., 2023), which automates the integration of HiFi and ONT data in genome assembly. Verkko was also used in the second Y chromosome paper.
Comparison of 43 Human Y Chromosomes
In the second paper, the Jackson Laboratory for Genomic Medicine published the de novo assembly and comparative analysis of 43 human Y chromosomes from individuals of diverse geographical origins in the 1000 Genomes Project dataset (Hallast et al., 2023). The GRCh38 Y and T2T-Y assemblies are predominantly European in origin and thus represent perhaps the last 50,000 years of evolution. By contrast, the assemblies from this new study include some of the deepest-rooted human Y chromosome lineages from Africa and thus represent over 180,000 years of evolutionary change. One striking result of this comparative analysis is the huge variation in Y chromosome size. The 43 Y chromosomes ranged from 45.2 to 84.9 Mbp, with the greatest size variation in the Yq12 heterochromatic arm (17.6 to 37.2 Mbp), the centromere (2.0 to 3.1 Mbp), and the DYZ19 repeat arrays (0.06 to 0.4 Mbp). The euchromatic regions showed much less size variation, with the exception of one part of the TSPY repeat array, which shows significant copy-number variation. Within Yq12, the authors found considerable expansion and contraction of the DYZ1 and DYZ2 repeat units that make up the region, yet the two were retained at a nearly 1:1 ratio, suggesting a conserved functional role.
Beyond these direct findings, the second paper offers further insights into the structure and evolution of the Y chromosome. The authors present evidence that the PAR1 (pseudoautosomal region 1) of Y may differ significantly in size from the original model, with its defining boundary shifted by perhaps 0.5 Mbp. Comparisons of the centromeric structures also showed that the defining HOR (higher-order repeat) has evolved from a distinct ancestral 36-mer to the 34-mer that predominates today. Together with the Yq12 results described above, this finding helps direct future investigation into the evolution of specialized repetitive DNA elements.
Figure 1: De novo assembly outcome with (a) showing the structure of the human Y chromosome on the basis of the GRCh38 Y reference sequence, (b) showing the phylogenetic relationships, (c) showing the proportion of contiguously assembled Y-chromosomal subregions across 43 samples, and (d) showing the geographical origin and sample size of the included 1000 Genomes Project samples. (Image Credit: Hallast et al., 2023)
This new work will hopefully lead to a better understanding of the links between specific human traits and Y chromosome sequence. Changes in the Y chromosome already have known health consequences. Some cancers are linked to changes in Y chromosome genes, and loss of the Y chromosome is observed in multiple tumor types. Indeed, as men age, some of their cells can lose the Y chromosome altogether, although the mechanism and effects are not well understood. These studies should therefore improve our understanding of human disease and aging, and also provide significant insight into human evolution and population dynamics.
Hallast et al., Assembly of 43 human Y chromosomes reveals extensive complexity and variation. (2023) Nature, Aug 23. doi: 10.1038/s41586-023-06425-6.
Rautiainen et al., Telomere-to-telomere assembly of diploid chromosomes with Verkko. (2023) Nat Biotechnol, Feb 16. doi: 10.1038/s41587-023-01662-6.
Rhie et al., The complete sequence of a human Y chromosome. (2023) Nature, Aug 23. doi: 10.1038/s41586-023-06457-y.
Skaletsky et al., The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. (2003) Nature, Jun 19;423(6942):825-37.