Trimmomatic

Trimmomatic stand-alone command-line application

Trimmomatic is a fast, multithreaded command-line tool for preprocessing Illumina sequencing data. It was developed to clean raw sequence reads by removing low-quality bases and adapter sequences before downstream analyses such as mapping or assembly. Correct trimming is an important first step that can significantly improve the accuracy and reliability of your results. The tool applies a series of user-defined trimming steps to FASTQ files and can process both single-end (SE) and paired-end (PE) data.
Trimmomatic is available at the official GitHub page https://github.com/usadellab/Trimmomatic/.

Understanding the Trimmomatic trimming modes

With simple trimming, each adapter sequence is tested against the reads, and if a sufficiently accurate match is detected, the read is clipped appropriately.

For paired-end data, Trimmomatic offers a highly accurate method for adapter removal called palindrome trimming. This mode is designed for a common scenario in library preparation: the DNA insert is shorter than the read length, so the sequencer reads through the entire fragment and continues into the adapter sequence ligated to the other end.
The palindrome mode utilises a property of paired-end reads: the forward and reverse reads are sequenced from opposite strands of the same DNA fragment, so the parts that cover the fragment are reverse complements of each other.

1. Initial search: Trimmomatic first performs a standard search, looking for known adapter sequences from the FASTA file you provide.
2. Palindrome alignment: After the initial search, it takes each read pair and attempts to align the two reads to each other. Because they originate from the same fragment, the overlapping sections should be reverse complements of each other, apart from sequencing errors.
3. Accurate clipping: A strong palindromic alignment between the two reads is an extremely reliable sign that both have read through the fragment into the adapters. The tool can then precisely identify the start of the adapter sequence and clip both reads accurately.

This method is far more sensitive than the simple search for adapter sequences, as it uses the information from the read's partner to confirm the presence of adapter contamination.
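
The read-through situation described above can be illustrated with a minimal Python sketch. It is not Trimmomatic's implementation, which scores alignments and tolerates mismatches; the sketch assumes error-free reads and exact matching. The function names, the `min_overlap` parameter, and the toy insert and adapter sequences are hypothetical and chosen only for illustration.

```python
# Toy illustration of palindrome-style read-through detection.
# Assumption: error-free reads and exact matching (Trimmomatic itself
# uses a scored alignment that tolerates mismatches).

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_insert_length(forward: str, reverse: str, min_overlap: int = 10):
    """Return the insert length if both reads ran through the insert, else None.

    With read-through, the first n bases of the forward read equal the
    reverse complement of the first n bases of the reverse read, where n
    is the insert length. Everything after position n is adapter.
    """
    max_len = min(len(forward), len(reverse))
    for n in range(max_len, min_overlap - 1, -1):
        if forward[:n] == reverse_complement(reverse[:n]):
            return n
    return None


# Toy data: a 20-base insert sequenced with 30-base reads.
insert = "ACGTTGCAAGGCTTACGATC"
adapter_fwd = "TTAGGCATCCGAT"  # made-up adapter, not a real Illumina sequence
adapter_rev = "CCATGGTACTGAA"  # made-up adapter, not a real Illumina sequence
read_len = 30

forward = (insert + adapter_fwd)[:read_len]
reverse = (reverse_complement(insert) + adapter_rev)[:read_len]

n = find_insert_length(forward, reverse)
print(n)            # 20
print(forward[:n])  # the insert, with the adapter bases clipped
```

Because the match is confirmed by the partner read, even a very short stretch of adapter at the end of a read can be located, which a search for the adapter sequence alone could not do reliably.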

Terms of use

Usage of this site and download of the Trimmomatic command-line tool follows the GDPR Privacy Notice of Plant Biotechnology Information (plantBI), IBG-4, Jülich Research Centre (FZJ). The detailed GDPR Privacy Notice is available at .

While the Trimmomatic software is licensed under the GPL (General Public License), the adapter sequences are not included in the GPL part, but owned by and used with permission of Illumina. (Oligonucleotide sequences © 2023 Illumina, Inc. All rights reserved.) Suggested adapter sequences are provided for TruSeq2 (as used in GAII machines) and TruSeq3 (as used by HiSeq and MiSeq machines), for both single-end and paired-end mode. These sequences have not been extensively tested, and depending on specific issues which may occur in library preparation, other sequences may work better for a given dataset.

Publication & Contact

Bolger AM, Lohse M, Usadel B (2014)
Trimmomatic: a flexible trimmer for Illumina sequence data.
Bioinformatics. 2014 Aug 1, 30(15): 2114-2120.

For any questions, please feel free to contact us (plabipd@fz-juelich.de). To report a problem, you can also use the official GitHub page https://github.com/usadellab/Trimmomatic/.

Download Trimmomatic

Download the Trimmomatic command-line tool from the official GitHub page https://github.com/usadellab/Trimmomatic/. The simplest option is to download the JAR file (trimmomatic.jar), but you can also download the source code and build the JAR file yourself.

How to run Trimmomatic

When you run trimmomatic.jar on the command line, the first argument specifies whether the sequencing data to be trimmed are paired-end (PE) or single-end (SE).
In addition, several optional parameters are available, for example to set the number of CPU threads used for multi-threading or the quality score encoding. All optional parameters are listed below.

Optional parameters

-threads <threads>
specifies the number of CPU threads to use for multi-threading
-phred33 | -phred64
specifies the quality score encoding. phred33 is the standard for modern Illumina data
-trimlog <log file>
writes a detailed log file
-summary <summary file>
writes a summary of trimming results to a file
-basein <template input file>
sets path to one of the paired-end input files (its mate is auto-detected)
-baseout <template output file>
sets template path used to generate the four paired-end output files
-validatePairs
performs an extra validation step on paired-end reads before trimming to ensure read pairs are consistent
-compressLevel <compression level>
sets the compression level for BZIP2/GZ output files (1=fastest, 9=best compression)
-compressStream | -compressBlock
specifies the compression mode. Block compression is the default
-quiet
suppresses progress output to the console
-version
prints the version tag

Finally, the processing pipeline is set up by selecting and parameterising at least one trimming step. All further trimming steps and their order are optional. The available trimming steps are executed in the order in which they are added to the command line. It is recommended that the trimming step ILLUMINACLIP, if required, is done as early as possible. The trimming works with FASTQ formatted files, either uncompressed or compressed (the gzip format is determined based on the .gz extension).

In paired-end (PE) mode, trimmomatic.jar requires two input FASTQ files (forward and reverse reads) and generates four output files: two for reads whose mate also survived trimming (paired) and two for reads whose mate was discarded (unpaired).
java -jar <path to trimmomatic.jar> PE [optional parameters]  \
  <input_forward> <input_reverse> \
  <output_forward_paired> <output_forward_unpaired>  \
  <output_reverse_paired> <output_reverse_unpaired>  \
  <step #1> [further optional trimming steps]

trimmomatic.jar running in single-end (SE) mode requires one input FASTQ file and generates one output file.
java -jar <path to trimmomatic.jar> SE [optional parameters]  \
  <input> <output>  \
  <step #1> [further optional trimming steps]
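
As an illustration of how the pieces of a paired-end call fit together, the following Python sketch assembles the argument list in the order shown above. It is only a sketch: the function name `build_pe_command`, the jar path, the sample file names, and the output naming scheme are hypothetical, and the step values are merely commonly seen examples rather than recommendations for a particular dataset.

```python
from typing import List, Optional


def build_pe_command(jar: str, forward: str, reverse: str,
                     out_prefix: str, steps: List[str],
                     threads: Optional[int] = None) -> List[str]:
    """Assemble a paired-end Trimmomatic call as an argument list."""
    if not steps:
        raise ValueError("at least one trimming step is required")
    cmd = ["java", "-jar", jar, "PE"]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    # Inputs first, then the four outputs in the documented order:
    # forward paired, forward unpaired, reverse paired, reverse unpaired.
    cmd += [forward, reverse,
            f"{out_prefix}_R1_P.fq.gz", f"{out_prefix}_R1_U.fq.gz",
            f"{out_prefix}_R2_P.fq.gz", f"{out_prefix}_R2_U.fq.gz"]
    # Trimming steps are executed in the order in which they are listed.
    cmd += steps
    return cmd


cmd = build_pe_command(
    jar="trimmomatic.jar",        # hypothetical path to the JAR file
    forward="sample_R1.fq.gz",    # hypothetical input files
    reverse="sample_R2.fq.gz",
    out_prefix="sample",
    steps=["ILLUMINACLIP:TruSeq3-PE.fa:2:30:10", "LEADING:3",
           "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"],
    threads=4,
)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names. ILLUMINACLIP is placed first, in line with the recommendation to run it as early as possible.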

Trimmomatic trimming steps

FASTQ files store Phred quality scores as ASCII characters with an offset of 33 (phred+33) or 64 (phred+64), depending on the Illumina pipeline used. A score of 20 corresponds to a base-call accuracy of 99% and a score of 40 to 99.99% (for further details see the Wikipedia article on Phred quality scores). The <quality> parameter in several trimming steps is expected to be a numeric Phred score.
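
The relationship between a Phred score Q and base-call accuracy is 1 - 10^(-Q/10). The short Python sketch below shows this conversion and how a FASTQ quality line is decoded with a given offset. The function names are hypothetical and only serve the illustration.

```python
def phred_to_accuracy(q: float) -> float:
    """Base-call accuracy for a Phred score: 1 - 10^(-Q/10)."""
    return 1.0 - 10 ** (-q / 10)


def decode_qualities(quality_line: str, offset: int = 33) -> list:
    """Convert a FASTQ quality line to numeric Phred scores.

    offset is 33 for phred+33 and 64 for phred+64 encoded files.
    """
    return [ord(ch) - offset for ch in quality_line]


print(f"{phred_to_accuracy(20):.4f}")  # 0.9900
print(f"{phred_to_accuracy(40):.4f}")  # 0.9999
print(decode_qualities("II5+!"))       # [40, 40, 20, 10, 0]
```

The same character therefore means different scores under the two encodings, which is why the correct -phred33 or -phred64 setting matters for every quality-based step.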

ILLUMINACLIP:<fastaWithAdapters>:<seedMismatches>:<palindromeClipThreshold>:<simpleClipThreshold>[:<minAdapterLengthPalindrome>:<keepBothReads>]
cuts adapter sequences and other Illumina-specific technical sequences from the reads. This is the most common first step.
- fastaWithAdapters | path to a FASTA file containing all the adapters
- seedMismatches | maximum number of mismatches allowed in the initial search for an adapter (2 is a common value)
- palindromeClipThreshold | minimum alignment score required for a match between the two reads of a pair in palindrome mode (30 is a common value)
- simpleClipThreshold | minimum alignment score required for a match between an adapter sequence and a read (10 is a common value)
- minAdapterLengthPalindrome | minimum adapter length that must be detected in palindrome mode (optional)
- keepBothReads | true or false; specifies whether both reads are kept after adapter removal in palindrome mode (optional)
LEADING:<quality>
cuts off bases with a quality below the threshold from the 5'-end of a read
- quality | minimum quality score required to keep a base; common values are between 3 and 20
TRAILING:<quality>
cuts off bases with a quality below the threshold from the 3'-end of a read
- quality | minimum quality score required to keep a base; common values are between 3 and 20
HEADCROP:<length>
cuts a specified number of bases from the 5'-end of a read
- length | number of bases to remove
TAILCROP:<length>
cuts a specified number of bases from the 3'-end of a read
- length | number of bases to remove
CROP:<length>
trims a read to a specified length by cutting the 3'-end
- length | number of bases to keep
SLIDINGWINDOW:<windowSize>:<quality>
scans the read from the 5'-end with a sliding window and cuts the read once the average quality score within the window falls below a threshold value
- windowSize | length of the sliding window in bases
- quality | average quality score required; common values are between 10 and 40
MAXINFO:<targetLength>:<strictness>
trims a read to maximize useful information, balancing read length against quality
- targetLength | user-defined ideal length for a read
- strictness | specifies the trade-off between length and quality, numeric value between 0.0 and 1.0
MAXLEN:<length>
discards a read that is longer than a certain length
- length | maximum permitted length
MINLEN:<length>
discards a read that is shorter than a certain length (after all previous trimming steps)
- length | minimum required length
AVGQUAL:<quality>
discards a read if its average quality is below a threshold value
- quality | minimum required average quality score; common values are between 10 and 30
BASECOUNT:<bases>:<minCount>:<maxCount>
discards a read if the count of a base (or of multiple bases) is below a specified minimum or above a specified maximum
- bases | specifies a base (or multiple bases) as a single string of concatenated characters, each representing a base (e.g. N or GC)
- minCount | minimum required count of the base (or bases) to keep a read
- maxCount | maximum permitted count of the base (or bases) to keep a read
TOPHRED33
converts quality scores to Phred+33 quality scores
TOPHRED64
converts quality scores to Phred+64 quality scores
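
To make the behaviour of the quality-based steps more concrete, here is a simplified Python sketch of LEADING, TRAILING, SLIDINGWINDOW and MINLEN applied in sequence to a single read. It is an illustration under stated assumptions, not Trimmomatic's code: all function names are hypothetical, and the sliding window simply cuts at the start of the first failing window, so the exact cut position may differ from what Trimmomatic produces.

```python
from typing import Callable, List, Optional, Tuple

Read = Tuple[str, List[int]]  # (bases, numeric Phred score per base)


def leading(read: Read, quality: int) -> Read:
    """LEADING-like: drop 5' bases while their quality is below the threshold."""
    seq, quals = read
    start = 0
    while start < len(quals) and quals[start] < quality:
        start += 1
    return seq[start:], quals[start:]


def trailing(read: Read, quality: int) -> Read:
    """TRAILING-like: drop 3' bases while their quality is below the threshold."""
    seq, quals = read
    end = len(quals)
    while end > 0 and quals[end - 1] < quality:
        end -= 1
    return seq[:end], quals[:end]


def sliding_window(read: Read, window_size: int, quality: float) -> Read:
    """SLIDINGWINDOW-like: scan from the 5' end and cut at the first window
    whose mean quality is below the threshold (simplified cut position)."""
    seq, quals = read
    for start in range(len(quals) - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < quality:
            return seq[:start], quals[:start]
    return seq, quals


def minlen(read: Read, length: int) -> Optional[Read]:
    """MINLEN-like: discard (return None) reads shorter than the minimum."""
    return read if len(read[0]) >= length else None


def run_steps(read: Read,
              steps: List[Callable[[Read], Optional[Read]]]) -> Optional[Read]:
    """Apply the steps in the given order; stop as soon as a read is discarded."""
    current: Optional[Read] = read
    for step in steps:
        current = step(current)
        if current is None:
            return None
    return current


read = ("ACGTACGTACGT", [2, 30, 32, 35, 36, 34, 33, 30, 12, 8, 5, 2])
steps = [
    lambda r: leading(r, 3),            # like LEADING:3
    lambda r: trailing(r, 3),           # like TRAILING:3
    lambda r: sliding_window(r, 4, 15), # like SLIDINGWINDOW:4:15
    lambda r: minlen(r, 5),             # like MINLEN:5
]
print(run_steps(read, steps))  # ('CGTACG', [30, 32, 35, 36, 34, 33])
```

With a stricter length filter such as a minimum of 8, the same read would be discarded, because only 6 bases remain after the quality trimming. This mirrors why the order of steps on the command line matters.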

Why were so many of my reads discarded by MINLEN?

This usually means that the initial quality of your reads was low or there was significant adapter contamination. The SLIDINGWINDOW or ILLUMINACLIP steps may be trimming a large portion of your reads, causing them to fall below the minimum length threshold. Check your raw data quality with a tool like FastQC.

What is the difference between the paired and unpaired output files in paired-end (PE) mode?

After trimming, some reads may remain while their partners are discarded (e.g., for being too short).
- Paired Output (R1_P.fq, R2_P.fq): Contains reads where both partners remained after the trimming process. These should be used for standard paired-end alignment.
- Unpaired Output (R1_U.fq, R2_U.fq): Contains reads where only one partner remained after the trimming process. These can be mapped as single-end reads.
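
The routing of a read pair to these files can be summarised in a few lines of Python. This is only a sketch of the logic described above; the function name `route_pair` is hypothetical, and the file names are the example names used in this answer.

```python
def route_pair(fwd, rev) -> dict:
    """Map a trimmed read pair to its output files.

    fwd and rev are the trimmed forward and reverse reads,
    or None if the read was discarded during trimming.
    """
    if fwd is not None and rev is not None:
        return {"R1_P.fq": fwd, "R2_P.fq": rev}  # both survived: paired output
    if fwd is not None:
        return {"R1_U.fq": fwd}                  # only forward survived
    if rev is not None:
        return {"R2_U.fq": rev}                  # only reverse survived
    return {}                                    # both discarded


print(route_pair("ACGT", "TTGA"))  # {'R1_P.fq': 'ACGT', 'R2_P.fq': 'TTGA'}
print(route_pair("ACGT", None))    # {'R1_U.fq': 'ACGT'}
```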

Which adapter file should I use for ILLUMINACLIP?

Trimmomatic comes with standard adapter files (e.g., TruSeq3-PE.fa). You should select the one that corresponds to the library preparation kit used to generate your data. If you are unsure, contact the sequencing facility that performed the run.

My job failed with an error. What can I do?

The most common cause of errors is an incorrectly formatted FASTQ file. Make sure that your file strictly complies with the FASTQ format and that each data record consists of exactly four lines. File damage or truncation during transfer can also lead to problems.
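
A quick way to check for such problems before running Trimmomatic is a small validation script. The Python sketch below is an assumption-laden example rather than a complete FASTQ validator: it assumes unwrapped four-line records, and the function name `find_fastq_problem` is hypothetical.

```python
import io
from typing import Iterable, Optional


def find_fastq_problem(lines: Iterable[str]) -> Optional[str]:
    """Return a description of the first problem found, or None if none is found.

    Assumes each record has exactly four lines: header, sequence,
    separator, and quality string.
    """
    record = []
    count = 0
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            count += 1
            header, seq, plus, qual = record
            if not header.startswith("@"):
                return f"record {count}: header does not start with '@'"
            if not plus.startswith("+"):
                return f"record {count}: third line does not start with '+'"
            if len(seq) != len(qual):
                return f"record {count}: sequence and quality lengths differ"
            record = []
    if record:
        # Typical sign of a truncated file.
        return f"file ends with an incomplete record ({len(record)} of 4 lines)"
    return None


good = io.StringIO("@r1\nACGT\n+\nIIII\n")
truncated = io.StringIO("@r1\nACGT\n+\n")
print(find_fastq_problem(good))       # None
print(find_fastq_problem(truncated))  # file ends with an incomplete record (3 of 4 lines)
```

For a real file, the function can be given an open file handle, for example `open(path)` for uncompressed files or `gzip.open(path, "rt")` for .gz files.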

I want to report an issue. What can I do?

You can report a problem on the official GitHub page https://github.com/usadellab/Trimmomatic/. We or the community can respond there and help you solve it.