Detailed description of steps performed by racoon_clip

Quality filtering

Sequencing reads are filtered for a Phred score >= 10 inside the unique molecular identifier (UMI) at positions 1-10 of each read to ensure reliable sample and duplicate assignment. The cutoff can be changed with the racoon_clip option minBaseQuality.
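
The filter described above can be illustrated with a minimal Python sketch. It is not the racoon_clip implementation: the function name passes_umi_quality is hypothetical, and the sketch assumes Phred+33 encoded quality strings and that every base in the UMI region must reach the cutoff.

```python
def passes_umi_quality(quality_line, umi_length=10, min_base_quality=10, phred_offset=33):
    """Return True if all bases in the first `umi_length` positions reach the cutoff.

    Assumptions (not taken from racoon_clip): Phred+33 encoding, and a read
    fails as soon as a single UMI base falls below `min_base_quality`.
    """
    umi_quals = quality_line[:umi_length]
    if len(umi_quals) < umi_length:
        # Read is too short to contain a complete UMI.
        return False
    return all(ord(ch) - phred_offset >= min_base_quality for ch in umi_quals)


# Low-quality bases outside the UMI region do not affect the decision.
keep = passes_umi_quality("I" * 10 + "*" * 10)
```

Here min_base_quality plays the role of the minBaseQuality option, and umi_length reflects the UMI at positions 1-10.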

Demultiplexing, UMI & Adapter trimming

Demultiplexing and 3’ adapter trimming are performed with FLEXBAR (version 3.5.0). FLEXBAR also handles UMIs and trims barcodes.

If demultiplexing is turned on, FLEXBAR assigns reads to samples using the provided barcode_fasta, with the FLEXBAR parameters --barcodes {input.barcodes} --barcode-unassigned --barcode-error-rate 0.

By default, 3’ adapters are trimmed using the FLEXBAR options --adapter-trim-end RIGHT --adapter-error-rate 0.1 --adapter-min-overlap 1 --adapter-cycles <as_specified>. Adapter trimming can also be turned off.

At the same time, UMIs (and barcodes, if present) are trimmed from the 5’ end of the reads and stored in the read names using FLEXBAR options --umi-tags --barcode-trim-end LTAIL.
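
As a rough illustration of what moving the UMI into the read name means, the following Python sketch trims a fixed-length UMI from the 5’ end and appends it to the read identifier. The function name trim_umi, the "_" separator, and the fixed UMI length of 10 are assumptions made for this example; FLEXBAR performs this step itself and its exact read-name format may differ.

```python
def trim_umi(read_name, sequence, quality, umi_length=10, separator="_"):
    """Cut the UMI from the 5' end and store it in the read name.

    Illustrative only: assumes the UMI occupies the first `umi_length`
    bases and that no barcode is present.
    """
    umi = sequence[:umi_length]
    tagged_name = f"{read_name}{separator}{umi}"
    # Sequence and quality string are trimmed by the same amount.
    return tagged_name, sequence[umi_length:], quality[umi_length:]


name, seq, qual = trim_umi("read1", "ACGTACGTACTTTTGGGG", "I" * 18)
```

Keeping the UMI in the read name lets later steps, such as deduplication, recover it after the sequence itself has been trimmed.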

Reads that are shorter than 15 nt after trimming are discarded using the FLEXBAR option --min-read-length 15. The cutoff can be changed with the racoon_clip option flexbar_minReadLength.
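
To summarise how the FLEXBAR options listed above combine, here is a small Python sketch that assembles them into an argument list. The helper flexbar_options and its parameter names are hypothetical and not part of racoon_clip; input, output, and adapter sequence arguments are deliberately left out because they are not given in this section.

```python
def flexbar_options(adapter_cycles, barcode_fasta=None, trim_adapters=True, min_read_length=15):
    """Collect the FLEXBAR options described in this section (sketch only)."""
    # UMI handling and the length cutoff apply in every configuration.
    opts = ["--umi-tags", "--barcode-trim-end", "LTAIL",
            "--min-read-length", str(min_read_length)]
    if barcode_fasta is not None:
        # Demultiplexing turned on: exact barcode matches only (error rate 0).
        opts += ["--barcodes", barcode_fasta,
                 "--barcode-unassigned", "--barcode-error-rate", "0"]
    if trim_adapters:
        opts += ["--adapter-trim-end", "RIGHT", "--adapter-error-rate", "0.1",
                 "--adapter-min-overlap", "1", "--adapter-cycles", str(adapter_cycles)]
    return opts


# adapter_cycles has no default here because the section does not state one.
options = flexbar_options(adapter_cycles=1, barcode_fasta="barcodes.fasta")
```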

See also: FLEXBAR—Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms.

Genome alignment

Reads are aligned to the specified genome with STAR (version 2.7.10). In short, the genome is indexed with STAR --runMode genomeGenerate. Then, the reads of each sample are aligned individually to the genome with STAR --runMode alignReads --sjdbOverhang 139 --outFilterMismatchNoverReadLmax 0.04 --outFilterMismatchNmax 999 --outFilterMultimapNmax 1 --alignEndsType "Extend5pOfRead1" --outReadsUnmapped "Fastx" --outSJfilterReads "Unique". The resulting BAM files are indexed with SAMtools index (version 1.11). All parameters except --alignEndsType "Extend5pOfRead1" can be changed via racoon_clip options.
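
The effect of the three output filters in the default settings can be approximated with a short Python sketch. This is a simplified reading of the STAR parameter descriptions, not STAR's internal logic, and the function name passes_star_filters is hypothetical.

```python
def passes_star_filters(n_mismatches, read_length, n_loci,
                        max_mismatch_ratio=0.04, max_mismatches=999, max_multimap=1):
    """Approximate the default output filters for a single-end read (sketch only).

    max_mismatch_ratio ~ --outFilterMismatchNoverReadLmax
    max_mismatches     ~ --outFilterMismatchNmax
    max_multimap       ~ --outFilterMultimapNmax
    """
    return (n_mismatches <= max_mismatch_ratio * read_length
            and n_mismatches <= max_mismatches
            and n_loci <= max_multimap)


# A 50 nt read with 1 mismatch that maps to a single locus is kept.
kept = passes_star_filters(n_mismatches=1, read_length=50, n_loci=1)
```

With these defaults the absolute mismatch limit of 999 is effectively inactive, so the mismatch ratio of 0.04 is the limiting criterion, and --outFilterMultimapNmax 1 restricts the output to uniquely mapping reads.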

See also:

Deduplication

Aligned reads are deduplicated with umi_tools dedup --extract-umi-method read_id --method unique (UMI-tools version 1.1.1).
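
Conceptually, --extract-umi-method read_id reads the UMI from the read name, and --method unique treats reads at the same mapping position with an identical UMI as duplicates. The Python sketch below illustrates this idea under simplifying assumptions: alignments are plain tuples, the UMI is the last "_"-separated field of the read name, and the first read seen per group is kept, whereas UMI-tools applies its own rules for choosing the representative read. The function name dedup_unique is hypothetical.

```python
def dedup_unique(alignments, umi_separator="_"):
    """Keep one alignment per (chromosome, position, strand, UMI) group.

    `alignments` is an iterable of (read_name, chrom, pos, strand) tuples.
    """
    seen = set()
    kept = []
    for read_name, chrom, pos, strand in alignments:
        # The UMI is assumed to be the last field of the read name.
        umi = read_name.rsplit(umi_separator, 1)[-1]
        key = (chrom, pos, strand, umi)
        if key not in seen:
            seen.add(key)
            kept.append((read_name, chrom, pos, strand))
    return kept


reads = [
    ("r1_ACGT", "chr1", 100, "+"),
    ("r2_ACGT", "chr1", 100, "+"),  # same position and UMI: duplicate of r1
    ("r3_ACGA", "chr1", 100, "+"),  # UMI differs by one base: kept
    ("r4_ACGT", "chr1", 101, "+"),  # different position: kept
]
deduplicated = dedup_unique(reads)
```

Because only exact UMI matches are merged in this mode, UMIs that differ by a single base are treated as distinct molecules.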

See also: UMI-tools: modelling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy.