Sequencing the gigabase plant genome of the wild tomato species Solanum pennellii using Oxford Nanopore single molecule sequencing

 

Contributors

Maximilian Schmidt (1), Alexander Vogel (1), Alisandra Denton (1), Benjamin Istace (6), Alexandra Wormit (1), Henri van de Geest (2), Marie E. Bolger (3), Saleh Alseekh (4), Janina Maß (3), Christian Pfaff (3), Ulrich Schurr (3), Roger Chetelat, Florian Maumus, Jean-Marc Aury (6), Alisdair R. Fernie (4), Dani Zamir (5), Anthony Bolger (1), Björn Usadel (1, 3).

(1) Institute for Botany and Molecular Genetics, BioEconomy Science Center, RWTH Aachen University, Aachen, Germany.

(2) Wageningen Plant Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands.

(3) Institute for Bio- and Geosciences (IBG-2: Plant Sciences), Forschungszentrum Jülich, Jülich, Germany.

(4) Department of Molecular Physiology, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany.

(5) Faculty of Agriculture, Hebrew University of Jerusalem, Rehovot, Israel.

(6) Genoscope (CEA) and UMR 8030 CNRS-Genoscope-Université d'Evry, 2 rue Gaston Crémieux, BP5706, 91057 Evry, France.

 

 

Background

Recent updates in Oxford Nanopore technology (R9.4) have made it possible to obtain gigabases of sequence data from a single flow cell. Moreover, unlike other next-generation sequencing technologies, Oxford Nanopore sequencing does not require any upfront capital investment. We therefore evaluated whether Oxford Nanopore sequencing can be used to analyze plant genomes. To this end, we sequenced and are assembling an accession of the wild tomato species Solanum pennellii. This accession was spuriously identified as a tomato accession. Unlike the frequently used Solanum pennellii LA716 accession, for which we have previously generated a high-quality draft genome, this new accession does not appear to exhibit any dwarfed, necrotic leaf phenotype when introgressed into modern tomato cultivars.

Here we present approximately 134 Gbases of third-generation sequencing data, representing a raw coverage of ca. 110x. Of these, 110 Gbases pass the Oxford Nanopore quality filter, representing about 90x coverage. In addition, we provide approximately 20-30x coverage of Illumina data. The average Q value given in the tables is the arithmetic mean of all Q values in a read (as delivered by e.g. FastQC) and is thus higher than the value reported by Oxford Nanopore.
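To illustrate why the two kinds of average Q value differ, here is a minimal sketch. It assumes that the lower, vendor-reported value is obtained by averaging per-base error probabilities and converting the result back to a Phred score; that is our assumption for illustration, not something stated on this page. The Q values are hypothetical toy numbers.

```python
import math

def arithmetic_mean_q(quals):
    """Plain average of the per-base Phred scores (as used in the tables here)."""
    return sum(quals) / len(quals)

def error_probability_mean_q(quals):
    """Average the error probabilities, then convert back to a Phred score.

    Assumption: this mirrors how the vendor-reported read quality is derived.
    """
    probs = [10 ** (-q / 10) for q in quals]
    mean_prob = sum(probs) / len(probs)
    return -10 * math.log10(mean_prob)

# Hypothetical per-base Q values for one short read.
quals = [5, 10, 20]

simple = arithmetic_mean_q(quals)
prob_based = error_probability_mean_q(quals)

print(f"arithmetic mean Q:        {simple:.2f}")
print(f"error-probability mean Q: {prob_based:.2f}")
```

Because low-quality bases dominate the mean error probability, the arithmetic mean of the Q values is never lower than the probability-based value, so the two numbers should not be compared directly.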

Please cite our manuscript:

De novo Assembly of a New Solanum pennellii Accession Using Nanopore Sequencing

The Plant Cell, Oct 2017. 

The data are open and are, of course, also available via EBI.

We will periodically add updates here as well.

 

MinION data

 

Columns: FAST5 files (archive size), Fastq files, then # reads, yield (bases), average read length (RL average), longest read and avg. Q for the unfiltered reads, followed by the same five values for the filtered reads.
20161027_Spenn_001_001 (512GB) 20161027_Spenn_001_001 252,424 3,339,142,817 13,228 258042 8.96 205,010 2,818,546,677 13,748 71457 9.55
20161101_Spenn_002_002 (826GB) 20161101_Spenn_002_002 439,240 5,183,715,529 11,802 171384 8.79 340,880 4,202,561,007 12,329 81554 9.49
20161103_Spenn_003_003 (1.1TB) 20161103_Spenn_003_003 520,761 6,650,042,702 12,770 160531 8.87 412,111 5,466,343,110 13,264 95933 9.43
20161108_Spenn_004_004 (753GB) 20161108_Spenn_004_004 431,400 5,252,529,782 12,176 147621 8.86 343,999 4,360,721,187 12,677 105470 9.45
20161108_Spenn_004_005 (1.1TB) 20161108_Spenn_004_005 561,300 7,058,376,081 12,575 206494 8.95 458,908 6,009,783,868 13,096 92999 9.46
20161110_Spenn_005_006 (813GB) 20161110_Spenn_005_006 380,518 5,364,783,960 14,099 190799 8.83 298,794 4,460,813,385 14,929 131801 9.44
20161110_Spenn_005_007 (732GB) 20161110_Spenn_005_007 346,686 4,956,793,607 14,297 164918 9.07 285,832 4,294,311,690 15,023 109564 9.55
20161112_Spenn_006_008 (431GB) 20161112_Spenn_006_008 219,392 2,942,797,165 13,413 131281 9.1 176,739 2,535,756,222 14,347 89450 9.63
20161112_Spenn_006_009 (732GB) 20161112_Spenn_006_009 379,071 5,171,422,068 13,642 149605 9.07 313,732 4,464,764,479 14,231 100221 9.55
20161114_Spenn_007_010 (1.1TB) 20161114_Spenn_007_010 451,070 6,721,699,609 14,901 198894 8.74 344,128 5,460,999,489 15,869 100546 9.4
20161114_Spenn_007_011 (211GB) 20161114_Spenn_007_011 92,994 1,334,258,832 14,348 107121 8.59 66,439 1,035,193,238 15,581 90489 9.39
20161116_Spenn_009_012 (799GB) 20161116_Spenn_009_012 523,760 5,360,546,441 10,238 162037 9.00 430,481 4,558,323,604 10,589 67428 9.54
20161116_Spenn_009_013 (250GB) 20161116_Spenn_009_013 166,466 1,717,528,003 10,317 114435 8.98 134,116 1,441,345,290 10,747 56948 9.57
20161118_Spenn_008_014 (174GB) 20161118_Spenn_008_014 122,505 1,146,492,883 9,358 146319 8.96 99,017 961,490,600 9,710 58788 9.59
20161118_Spenn_008_015 (296GB) 20161118_Spenn_008_015 186,513 1,815,953,333 9,736 181283 8.84 148,452 1,495,777,948 10,075 62177 9.50
20161121_Spenn_011_016 (727GB) 20161121_Spenn_011_016 328,053 4,215,219,836 12,849 161459 8.84 260,702 3,485,323,522 13,368 81602 9.50
20161121_Spenn_011_017 (423GB) 20161121_Spenn_011_017 203,146 2,594,558,262 12,771 153016 8.79 155,876 2,105,331,803 13,506 88254 9.5
20161123_Spenn_012_018 (650GB) 20161123_Spenn_012_018 327,709 3,870,110,250 11,809 180596 8.76 254,308 3,133,636,771 12,322 85535 9.49
20161123_Spenn_012_019 (896GB) 20161123_Spenn_012_019 499,116 5,886,334,951 11,793 167972 8.87 392,448 4,878,338,636 12,430 118685 9.51
20161123_Spenn_012_020 (646GB) 20161123_Spenn_012_020 370,723 4,223,196,504 11,391 181030 8.92 290,213 3,500,334,970 12,061 90173 9.54
20161125_Spenn_010_021 (564GB) 20161125_Spenn_010_021 416,857 3,459,880,516 8,299 150499 8.88 335,613 2,913,698,492 8,681 56044 9.46
20161130_Spenn_014_022 (277GB) 20161130_Spenn_014_022 144,680 1,763,487,955 12,188 188526 8.95 115,211 1,490,693,758 12,938 77113 9.52
20161130_Spenn_014_023 (559GB) 20161130_Spenn_014_023 249,427 3,061,683,900 12,274 169609 8.64 186,877 2,447,344,412 13,096 93030 9.39
20161130_Spenn_014_024 (851GB) 20161130_Spenn_014_024 395,151 5,042,395,024 12,760 230409 8.61 294,534 3,946,433,116 13,398 78966 9.41
20161202_Spenn_016_025 (784GB) 20161202_Spenn_016_025 371,137 4,578,018,827 12,335 192749 8.63 280,153 3,604,968,054 12,867 109370 9.41
20161202_Spenn_016_026 (827GB) 20161202_Spenn_016_026 431,745 5,333,852,075 12,354 202965 8.68 326,658 4,214,255,431 12,901 153099 9.43
20161202_Spenn_016_027 (949GB) 20161202_Spenn_016_027 484,175 6,103,006,087 12,604 305986 8.68 371,827 4,859,413,464 13,069 79985 9.42
20161204_Spenn_015_028 (512GB) 20161204_Spenn_015_028 251,396 2,705,390,485 10,761 140518 8.58 182,745 2,111,380,840 11,553 83721 9.42
20161204_Spenn_015_029 (600GB) 20161204_Spenn_015_029 317,257 3,566,412,110 11,241 253497 8.76 244,735 2,884,284,143 11,785 97690 9.51
20161206_Spenn_017_030 (1.2TB) 20161206_Spenn_017_030 1,133,214 7,324,445,582 6,463 163606 8.89 908,610 6,019,467,101 6,625 94944 9.55
20161206_Spenn_017_031 (1.1TB) 20161206_Spenn_017_031 1,009,641 7,161,552,200 7,093 228704 8.87 798,551 5,802,073,420 7,266 108965 9.59
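
The per-run statistics above can be cross-checked with a few lines of code. The sketch below parses one row of the table as whitespace-separated text and recomputes the average read length as yield divided by number of reads. The function and field names (`parse_row`, `yield_bases` and so on) are our own and purely illustrative.

```python
def parse_row(line):
    """Split one whitespace-separated table row into named fields."""
    tokens = line.split()
    # tokens[0:3]: FAST5 archive name, its size in brackets, Fastq file name.
    numbers = [float(t.replace(",", "")) for t in tokens[3:]]
    keys = ("reads", "yield_bases", "mean_length", "longest", "avg_q")
    return {
        "run": tokens[0],
        "unfiltered": dict(zip(keys, numbers[:5])),
        "filtered": dict(zip(keys, numbers[5:10])),
    }

# First row of the table, copied verbatim.
row = parse_row(
    "20161027_Spenn_001_001 (512GB) 20161027_Spenn_001_001 "
    "252,424 3,339,142,817 13,228 258042 8.96 "
    "205,010 2,818,546,677 13,748 71457 9.55"
)

for label in ("unfiltered", "filtered"):
    stats = row[label]
    recomputed = stats["yield_bases"] / stats["reads"]
    # Print the recomputed and the tabulated average read length side by side.
    print(label, round(recomputed), int(stats["mean_length"]))

pass_fraction = row["filtered"]["yield_bases"] / row["unfiltered"]["yield_bases"]
print(f"fraction of bases passing the filter: {pass_fraction:.2f}")
```

For this row the recomputed averages agree with the tabulated ones, and roughly 84% of the bases pass the filter.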

 

Pass filter, and downsampling thereof

All downsampling was performed per run, not overall. The sets contain 80%, 60% and 40% of the "pass" reads from each MinION run.
Reads were randomly selected independently for each level, so a smaller set is not necessarily a subset of a larger one.

Columns: % of pass reads, pass reads (file size), Canu-corrected reads (file size)
100 100% (97GB) 100% (31GB)
80 80% (77GB) Coming soon
60 60% (58GB) Coming soon
40 40% (39GB) 40% (13GB)
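
The per-run downsampling scheme described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the script that was actually used: the tool, the random seed and the rounding rule are not stated here, and the run names and read identifiers are hypothetical.

```python
import random

def downsample_per_run(reads_by_run, fraction, rng):
    """Draw round(fraction * n) reads at random from each run separately."""
    subset = {}
    for run, read_ids in reads_by_run.items():
        k = round(fraction * len(read_ids))
        subset[run] = rng.sample(read_ids, k)
    return subset

# Hypothetical toy input: run name -> identifiers of its "pass" reads.
reads_by_run = {
    "run_A": [f"A_read{i}" for i in range(10)],
    "run_B": [f"B_read{i}" for i in range(5)],
}

rng = random.Random(42)  # fixed seed only to make the sketch reproducible

# Each level is drawn afresh from the full set of pass reads, so the
# 40% set is not built by thinning the 60% or 80% set.
levels = {
    pct: downsample_per_run(reads_by_run, pct / 100, rng)
    for pct in (80, 60, 40)
}

for pct, subset in levels.items():
    print(pct, {run: len(ids) for run, ids in subset.items()})
```

Sampling within each run keeps the relative contribution of every MinION run the same at each level, which sampling from the pooled reads would only do approximately.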

 

MiSeq data

SpnLY-PF55-MS01-01-1_S1_L001_R1_001
SpnLY-PF55-MS01-01-1_S1_L001_R2_001
SpnLY-PF55-MS01-01-2_S1_L001_R1_001
SpnLY-PF55-MS01-01-2_S1_L001_R2_001
SpnLY-PF55-MS01-01-3_S1_L001_R1_001
SpnLY-PF55-MS01-01-3_S1_L001_R2_001

 

 

Assemblies

Columns: Assembly (file size), N50, L50, Total size, Largest contig, Total contigs, Illumina mapping rate (%), Qualimap discrepancy rate (%), Complete BUSCO (%)
Canu (929MB) 1.55 169 961.83 10.01 2010 98.95 0.82 96.46
SMARTdenovo (923MB) 1.06 270 955.31 5.84 1901 98.99 0.91 96.11
Miniasm (945MB) 1.75 156 977.78 9.49 2704 98.24 2.48 85.69
CanuSMARTdenovo (885MB) 2.52 106 915.60 12.72 899 98.98 0.85 96.46

All sequence lengths are in Mbp.

All assemblies were polished with five rounds of Pilon.
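
For readers unfamiliar with the N50 and L50 columns, the following sketch computes both from a list of contig lengths using the usual definitions: N50 is the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The contig lengths are hypothetical toy values, and the function name is our own.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs given")

# Hypothetical contig lengths in Mbp (total 30, half 15).
contigs = [10, 8, 6, 4, 2]
n50, l50 = n50_l50(contigs)
print(f"N50 = {n50} Mbp, L50 = {l50} contigs")
```

A higher N50 together with a lower L50 indicates a more contiguous assembly.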

 

Acknowledgement

We acknowledge partial funding from the German Ministry of Education and Research (0315961, 031A053 and 031A536C) and from the Ministry of Innovation, Science and Research within the framework of the NRW Strategieprojekt BioSC (no. 313/323-400-002 13).