Hard masked PAR regions in chromosome Y

The mammalian Y chromosome contains regions, called pseudoautosomal regions (PARs), whose sequence is identical to the corresponding regions of the X chromosome. These regions allow recombination between the sex chromosomes. When the human Y chromosome was sequenced and assembled, the PARs were not sequenced and therefore were not included in the assembly. Instead, the corresponding sections of the X chromosome sequence were copied onto the Y chromosome. Analysis software must account for this sequence duplication so that allelic duplication can be distinguished from other types of duplication, such as repeats and segmental duplications.

When a DNA sample is sequenced, reads from the PARs align equally well to both the X and the Y PAR sequences. This ambiguous alignment lowers the mapping quality of the reads in these regions and creates problems with variant calling on the sex chromosomes. For this reason, the PAR sequence on the Y chromosome is replaced with 'N', or "hard masked", in the hg19 reference. In the GRCh37 reference, the PAR sequence is unmasked. Hard masking the PAR sequence on the Y chromosome preserves the PAR coordinates on the Y chromosome and eliminates the duplication at this locus. The Y chromosome in the hg19 assembly contains two PARs that are taken from the corresponding regions of the X chromosome and have identical DNA sequences.
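The hard masking described above can be illustrated with a minimal sketch. It assumes 1-based, inclusive coordinates and uses a short toy sequence rather than a real chromosome; the function name `hard_mask` is hypothetical and is not taken from any particular tool.

```python
from __future__ import annotations


def hard_mask(sequence: str, regions: list[tuple[int, int]]) -> str:
    """Replace each 1-based, inclusive (start, end) region with 'N'.

    The sequence length is unchanged, so every coordinate outside
    the masked regions keeps its original position.
    """
    masked = list(sequence)
    for start, end in regions:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"region {start}-{end} is outside the sequence")
        # Convert to a 0-based, half-open slice and overwrite with N.
        masked[start - 1:end] = "N" * (end - start + 1)
    return "".join(masked)


# Toy example: mask positions 3-5 and 10-12 of a 12-base sequence.
toy = "ACGTACGTACGT"
masked_toy = hard_mask(toy, [(3, 5), (10, 12)])
print(masked_toy)  # ACNNNCGTANNN
```

Because the masked bases are replaced rather than deleted, positions downstream of a masked region do not shift, which is why the PAR coordinates on the Y chromosome are preserved.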

Chromosome Y PAR coordinates    Corresponding chromosome X PAR coordinates
10,001–2,649,520                60,001–2,699,520
59,034,050–59,363,566           154,931,044–155,260,560
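As a quick consistency check on these hg19 coordinates, the following sketch confirms that each Y interval has the same length as its X counterpart and converts a Y PAR position to the corresponding X position. The labels `PAR1` and `PAR2` are the conventional names for the two regions and do not appear in the text above; the helper names are hypothetical.

```python
from __future__ import annotations

# (label, (chrY start, chrY end), (chrX start, chrX end)); 1-based, inclusive.
PAR_REGIONS = [
    ("PAR1", (10_001, 2_649_520), (60_001, 2_699_520)),
    ("PAR2", (59_034_050, 59_363_566), (154_931_044, 155_260_560)),
]


def interval_length(interval: tuple[int, int]) -> int:
    """Length of a 1-based, inclusive interval."""
    start, end = interval
    return end - start + 1


def y_to_x(pos: int) -> int | None:
    """Map a chrY PAR position to chrX, or return None outside the PARs."""
    for _label, (y_start, y_end), (x_start, _x_end) in PAR_REGIONS:
        if y_start <= pos <= y_end:
            return x_start + (pos - y_start)
    return None


for label, y_interval, x_interval in PAR_REGIONS:
    # Each pair of intervals should span the same number of bases.
    assert interval_length(y_interval) == interval_length(x_interval)
    print(label, interval_length(y_interval))
# PAR1 2639520
# PAR2 329517

print(y_to_x(10_001))      # 60001
print(y_to_x(5_000_000))   # None
```

The matching lengths are what one would expect if the X sequence was copied onto the Y chromosome without modification.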