Aligned read length calculation

The aligned length of a read at a given accuracy threshold is defined as the greatest position in the read at which the accuracy of the bases, up to and including that position, meets the threshold. Accuracy is specified on the Phred scale, Q = -10 × log10(error rate), where 20 represents an error rate of 1% and 17 represents an error rate of approximately 2%.
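The Phred relationship can be checked with a minimal Python sketch. The function names below are illustrative and are not part of Torrent Suite™ Software.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled score to an error rate (illustrative helper)."""
    return 10 ** (-q / 10)


def error_rate_to_phred(error_rate: float) -> float:
    """Convert an error rate (0 < error_rate <= 1) to a Phred-scaled score."""
    return -10 * math.log10(error_rate)


for q in (17, 20):
    print(f"Q{q}: error rate {phred_to_error_rate(q):.4f}")
```

A score of 17 corresponds to an error rate just under 2%, which is why the 2% figure is an approximation on the Phred scale.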

In Torrent Suite™ Software, the alignment quality (AQ) score describes alignment quality either for an individual read, as an aligned length, or for the sequencing run, as a total number of bases. The AQ20 length is the greatest length at which the error rate is 1% or less, and the AQ17 length is the greatest length at which the error rate is 2% or less. The ideal length is the longest perfectly aligned segment. The AQ score for the total number of bases is the number of aligned bases in the sequencing reaction that meet the specified AQ threshold.

For all of these calculations, the alignment is constrained to start from position 1 in the read; that is, no 5' clipping is allowed. The underlying assumption is that the reference to which the read is aligned represents the true sequence of the sample that is sequenced.

Appropriate caution must be taken when AQ values are interpreted for situations in which the sample that is sequenced differs substantially from the reference used, for example, for alignments to a rough draft genome or for samples that are expected to have high mutation rates relative to the reference. In these situations, the AQ20 and AQ17 lengths can be short even when sequencing quality is excellent.

The AQ20 length is calculated using the following steps:

  • Every base in the read is classified as being correct or not correct according to the alignment to the reference.

  • At every position in the read, the total error rate is calculated up to and including that position.

  • The greatest position at which the error rate is 1% or less is identified, and that position defines the AQ20 length.
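The steps above can be sketched in Python as follows. This is a minimal illustration, not the Torrent Suite™ Software implementation: the function name aligned_length and the per-base list of correct/incorrect flags are assumptions, and the sketch takes the classification of each base from the alignment as already done. Exact fractions are used so that a rate exactly equal to the threshold is not lost to floating-point rounding.

```python
from fractions import Fraction
from typing import Sequence


def aligned_length(correct: Sequence[bool], max_error_rate: Fraction) -> int:
    """Return the greatest 1-based position at which the cumulative error
    rate is at or below max_error_rate (hypothetical helper).

    correct[i] is True when base i + 1 of the read matches the reference.
    Counting starts at position 1, so no 5' clipping is modeled.
    """
    errors = 0
    best = 0
    for position, is_correct in enumerate(correct, start=1):
        if not is_correct:
            errors += 1
        # Cumulative error rate up to and including this position.
        if Fraction(errors, position) <= max_error_rate:
            best = position
    return best


AQ20 = Fraction(1, 100)  # error rate of 1% or less

# Hypothetical 100-base read with a single error at position 51.
read = [True] * 50 + [False] + [True] * 49
print(aligned_length(read, AQ20))  # prints 100
```

In this hypothetical read, the error rate exceeds 1% from position 51 through position 99 and returns to exactly 1% at position 100. Because the definition takes the greatest qualifying position rather than the first failing one, the AQ20 length is 100.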

For example, if a 100‑bp read consists of 80 perfect bases followed by 2 errors and then 18 more perfect bases, the total error rate at position 80 is 0%. At position 81 the total error rate is 1.2% (1/81), and at position 82 it is 2.4% (2/82). From there the error rate declines with each additional perfect base until it reaches 2% (2/100) at position 100. The greatest length at which the error rate is 1% or less is 80, and the greatest length at which the error rate is 2% or less is 100. Therefore, the AQ20 and AQ17 lengths are 80 and 100 bases, respectively.
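The arithmetic in this example can be traced position by position with a short Python sketch. The variable names are illustrative only.

```python
from fractions import Fraction

# 80 perfect bases, 2 errors, then 18 more perfect bases.
read = [True] * 80 + [False] * 2 + [True] * 18

errors = 0
running = {}  # 1-based position -> cumulative error rate
for position, is_correct in enumerate(read, start=1):
    if not is_correct:
        errors += 1
    running[position] = Fraction(errors, position)

for position in (80, 81, 82, 100):
    print(f"position {position}: {float(running[position]):.1%}")

# Greatest positions that meet the 1% and 2% thresholds.
aq20_length = max(p for p, rate in running.items() if rate <= Fraction(1, 100))
aq17_length = max(p for p, rate in running.items() if rate <= Fraction(2, 100))
print(aq20_length, aq17_length)  # prints 80 100
```

Position 81 also satisfies the 2% threshold (1/81), but position 100 is the greatest position that does, so the AQ17 length is 100.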