Overview of the BaseCaller module functionality

In addition to creating a sequence of bases from the 1.WELLS file information, the BaseCaller module also performs read filtering and read trimming.

Notes on read filtering

  • Filters out low-quality reads that were marked during signal processing.

  • Filters out reads that fail basecalling filters.

  • The removed reads do not appear in the BAM file. The BaseCaller module keeps counts of these reads but there is no record of specific reads that are filtered out.

Notes on read trimming

  • Removes specific bases from the read for quality reasons.

  • The read appears in the BAM file.

  • The removed bases do not appear in the BAM file.

The BaseCaller module performs these functions:

  1. Removes low-quality reads that were marked during the signal processing step.

  2. Performs base calling:

    1. From the signal values, creates the sequence of bases.

    2. Estimates the base quality value for each base.

  3. Performs 5′ barcode classification:

    1. Assigns each read to a barcode.

    2. Trims the barcode sequence away if the parameter --trim-barcodes=on is specified. The default is 'on'.

  4. Trims 5′ PCR handle.

  5. Trims 5' unique molecular tag.

  6. Trims extra bases at the 5' end. Trimming is controlled by the parameter --extra-trim-left. The default is 0, meaning no extra trimming.

  7. Filters out reads that are too short. Filtering is controlled by the parameters --min-read-length and --trim-min-read-len.

  8. Filters out reads that do not have the correct library key. Filtering is turned off by the parameter --keypass-filter.

  9. Trims the P1 adapter at the 3' end.

  10. Classifies and trims the 3′ barcode.

  11. Trims the 3′ PCR handle.

  12. Trims the 3′ unique molecular tag.

  13. Trims extra bases at the 3′ end. Trimming is controlled by the parameter --extra-trim-right. The default is 0, meaning no extra trimming.

  14. Performs quality trimming. Trimming is controlled by the parameters --trim-qual-window-size and --trim-qual-cutoff.

The BaseCaller module classifies and trims read elements from the outside inwards to obtain the query sequence. If an outer element cannot be identified, the BaseCaller module does not attempt to identify and trim the elements inside it. If barcode trimming is disabled with the --trim-barcodes parameter, PCR handles, unique molecular tags, and extra bases are not trimmed either. Read elements on the 3' end are trimmed only if the P1 adapter was found, because the P1 adapter is the outermost element on that end.
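The outside-in rule can be illustrated with a minimal Python sketch. This is not the BaseCaller implementation: the function name trim_outside_in, the exact-suffix matching, and the element sequences are all hypothetical, chosen only to show that trimming stops at the first element that cannot be identified.

```python
def trim_outside_in(seq, elements):
    """Strip elements from the 3' end of seq, outermost first.

    elements is a list of (name, pattern) pairs ordered from the
    outermost element to the innermost one. Matching is by exact
    suffix, which is a simplification made for this sketch.
    Returns the remaining sequence and the names that were trimmed.
    """
    trimmed = []
    for name, pattern in elements:
        if not seq.endswith(pattern):
            # Outer element not found: do not look for inner ones.
            break
        seq = seq[: len(seq) - len(pattern)]
        trimmed.append(name)
    return seq, trimmed


# Hypothetical element sequences, listed outermost first.
THREE_PRIME_ELEMENTS = [
    ("P1 adapter", "ATCACC"),
    ("3' barcode", "GGT"),
    ("3' PCR handle", "CA"),
]

# Read with all three elements present: everything is trimmed.
full_read = "ACGTACGT" + "CA" + "GGT" + "ATCACC"
print(trim_outside_in(full_read, THREE_PRIME_ELEMENTS))

# Read without the adapter: nothing inside it is trimmed.
no_adapter = "ACGTACGT" + "CA" + "GGT"
print(trim_outside_in(no_adapter, THREE_PRIME_ELEMENTS))
```

In the second call the barcode and handle are present in the read, but they are left in place because the outermost element was not found.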

Notes about quality trimming

  • The purpose of quality trimming is to identify where quality problems start toward the end of a read. The algorithm finds the point at which bases fall below a quality threshold, then trims those bases together with a few bases immediately before them.

  • The parameter --trim-qual-window-size sets the window size for quality trimming. The algorithm slides through the sequence of bases and, each time the window shifts, computes the mean Base QV value for all bases in the window.

  • If the mean Base QV value for all bases in the window falls below a threshold (set by the parameter --trim-qual-cutoff, where the default is 16), the algorithm trims all bases from the center of the window at that position to the 3' end.
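The sliding-window procedure described in these notes can be sketched in Python as follows. This is an illustration under stated assumptions, not the BaseCaller code: the function name quality_trim_length is hypothetical, the window size used in the example is arbitrary, and the handling of reads shorter than the window (left untrimmed here) is an assumption. The cutoff of 16 matches the default given above.

```python
def quality_trim_length(quals, window_size, cutoff=16.0):
    """Return how many bases to keep, counted from the start of the read.

    quals is the list of per-base quality values in read order.
    The window slides one base at a time. At the first position where
    the window mean drops below cutoff, the read is cut at the window
    center and everything from there to the end of the read is trimmed.
    """
    n = len(quals)
    if n < window_size:
        # Assumption for this sketch: a read shorter than the window
        # is not quality trimmed.
        return n

    window_sum = sum(quals[:window_size])
    for start in range(n - window_size + 1):
        if start > 0:
            # Shift the window by one base: add the new base, drop the old.
            window_sum += quals[start + window_size - 1] - quals[start - 1]
        if window_sum / window_size < cutoff:
            return start + window_size // 2
    return n


# 20 good bases followed by 20 poor ones, with an illustrative window of 10.
quals = [30] * 20 + [5] * 20
keep = quality_trim_length(quals, window_size=10, cutoff=16)
print(keep)  # 21
```

In this example the first window whose mean falls below 16 starts at index 16, so the cut lands at its center, index 21. Because the window mean lags behind the drop in individual base quality, the exact cut point depends on the window size.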