Citrus sinensis Genome

Source

The Citrus sinensis v1.1 (sweet orange) assembly and the following text come from JGI Phytozome. The Citrus sinensis v2 (sweet orange) assembly was downloaded from the Citrus sinensis Annotation Project (CAP) at Huazhong Agricultural University, China.

Overview

Sweet orange (Citrus sinensis) is the largest citrus cultivar group grown in the world, accounting for about 70% of total citrus production. Brazil, Florida (USA), and China are the three largest sweet orange producers. Sweet orange is considered to be an introgression of a natural hybrid of mandarin and pummelo.

The Citrus sinensis Annotation Project (CAP) was constructed in 2012 upon the completion of the sweet orange genome sequencing with the aim of providing the scientific community with an accurate annotation of the sweet orange genome sequence.

The goal of the Citrus Genome Project is to generate a draft sequence of the sweet orange genome using next-generation (454) sequence generated by 454 Life Sciences (team headed by Chinnappa Kodira) and the University of Florida (team headed by Fred Gmitter), as well as Sanger sequence generated by JGI (team led by Daniel Rokhsar). EST sequence has been generated by JGI, the University of Florida, and 454 Life Sciences. A separate deep Sanger sequencing project, run by the International Citrus Genome Consortium, targets a haploid derived from Clementine mandarin.

Assembly details

Statistics for v2 assembly

Genome Size

This version (v2) of the assembly is 327 Mb spread over 9 chromosomes and a pseudomolecule containing the unplaced contigs.

Loci

29,655 protein-coding loci have been predicted on this assembly with 44,275 alternative transcripts.

Sequencing, Assembly, and Annotation

Whole genome sequencing strategy

A dihaploid line derived from anther culture of C. sinensis cv. Valencia was used for paired-end tag (PET) DNA sequencing on an Illumina GAII platform. Approximately 785 million 2x100 bp reads were generated from libraries with different fragment sizes (300 bp, 2 kb, 10 kb and 20 kb).

How was the assembly generated?

Contig assembly, scaffolding and gap filling of the sequencing data were performed with the assembler SOAPdenovo, with the three DNA-PET libraries (2 kb, 10 kb and 20 kb) being used to link contigs into scaffolds. Scaffolds and contigs were refined further with GapCloser, the SOAPdenovo gap-filling module used for bridging scaffold gaps, to produce an assembly with a scaffold N50 of 427 kb that covered 286 Mb of the citrus genome. To further improve the assembly, we used the optimal scaffolder Opera with DNA-PET reads (10 kb and 20 kb) and BAC ends (125 kb, 5,136 BAC-end sequences), in order of increasing library size, to construct larger scaffolds, and then filled gaps with GapCloser (final genome size of 320.5 Mb and N50 of 1.69 Mb).

Reference Publication

Qiang Xu, Ling-Ling Chen, Xiaoan Ruan, Dijun Chen, Andan Zhu, Chunli Chen, Denis Bertrand, Wen-Biao Jiao, Bao-Hai Hao, Matthew P Lyon, Jiongjiong Chen, Song Gao, Feng Xing, Hong Lan, Ji-Wei Chang, Xianhong Ge, Yang Lei, Qun Hu, Yin Miao, Lun Wang, Shixin Xiao, Manosh Kumar Biswas, Wenfang Zeng, Fei Guo, Hongbo Cao, Xiaoming Yang, Xi-Wen Xu, Yun-Jiang Cheng, Juan Xu, Ji-Hong Liu, Oscar Junhong Luo, Zhonghui Tang, Wen-Wu Guo, Hanhui Kuang, Hong-Yu Zhang, Mikeal L Roose, Niranjan Nagarajan, Xiu-Xin Deng and Yijun Ruan. The draft genome of sweet orange (Citrus sinensis). Nature Genetics. 2013 45:59-66.

Data sets 
The genome assembly, pseudomolecules, annotations and genome browser are available through the links below.
Bulk Datasets (via CAP) Pathways
BLAST Gene curation
Genome browser   


Statistics for v1.1 assembly

Genome Size

This version (v1.1) of the assembly is 319 Mb spread over 12,574 scaffolds. Half the genome is accounted for by the 236 longest scaffolds, each 251 kb or longer.
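
The "half the genome" figure is the usual N50/L50 summary of an assembly: sort the scaffolds from longest to shortest and walk down the list until the running total reaches half of the assembled length. The sketch below illustrates that calculation in Python. The function name n50_l50 and the toy scaffold lengths are hypothetical and are not taken from the orange assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total,
    taken from longest to shortest, first reaches half of the
    assembled length. L50 is how many scaffolds that takes.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, for illustration only).
toy_lengths = [100, 80, 60, 40, 20]
print(n50_l50(toy_lengths))
```

Applied to the real scaffold lengths of this assembly, the same procedure would be expected to give values matching the 251 kb and 236 scaffolds quoted above, assuming those figures follow this standard definition.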

Loci

The current gene set (orange1.1) integrates 3.8 million ESTs with homology-based and ab initio gene predictions (see below). 25,376 protein-coding loci have been predicted, each with a primary transcript. An additional 20,771 alternative transcripts have been predicted, giving a total of 46,147 transcripts. 16,318 primary transcripts have EST support over at least 50% of their length. About two-fifths of the primary transcripts (10,813) have EST support over 100% of their length.
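
The transcript counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted in this section. The variable names are illustrative.

```python
# Counts quoted for the orange1.1 gene set.
primary_transcripts = 25_376      # one per protein-coding locus
alternative_transcripts = 20_771
est_support_half = 16_318         # EST support over >= 50% of length
est_support_full = 10_813         # EST support over 100% of length

total_transcripts = primary_transcripts + alternative_transcripts
pct_half = round(100 * est_support_half / primary_transcripts, 1)
pct_full = round(100 * est_support_full / primary_transcripts, 1)

print(total_transcripts)  # 46147
print(pct_half)           # 64.3
print(pct_full)           # 42.6
```

The fully supported fraction comes out at roughly 43%, which is consistent with the "two-fifths" wording.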

Sequencing, Assembly, and Annotation

Whole genome sequencing strategy

Genomic sequence was generated using a whole genome shotgun approach, with 2.0 Gb of sequence coming from GS FLX Titanium, 2.4 Gb from FLX Standard, 440 Mb from Sanger paired-end libraries, and 2.0 Gb from 454 paired-end libraries.
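
As a rough sanity check, the per-platform yields can be summed and divided by the assembly span to estimate average sequence depth. The sketch below assumes that the 319 Mb v1.1 assembly size is a reasonable stand-in for genome size, which is an assumption of this example and not a figure stated by the source. The dictionary keys are illustrative labels.

```python
# Sequence yield per source, in Mb, as quoted above.
yield_mb = {
    "GS FLX Titanium": 2000,
    "FLX Standard": 2400,
    "Sanger paired-end": 440,
    "454 paired-end": 2000,
}

# Assumption: the v1.1 assembly span (319 Mb) approximates genome size.
assembly_mb = 319

total_mb = sum(yield_mb.values())
approx_depth = total_mb / assembly_mb

print(total_mb)                # 6840
print(round(approx_depth, 1))  # 21.4
```

Under that assumption the raw data amount to about 6.8 Gb, or roughly 21-fold depth. The true coverage depends on the actual genome size and on how many reads were usable for assembly.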

How was the assembly generated?

The 25.5 million 454 reads and 623,000 Sanger reads were generated through a collaborative effort among 454 Life Sciences, the University of Florida and JGI. The assembly was generated by Brian Desany at 454 Life Sciences using the Newbler assembler.

Reference Publication

Wu GA, Prochnik S, Jenkins J, Salse J, Hellsten U, Murat F, Perrier X, Ruiz M, Scalabrin S, Terol J, Takita MA, Labadie K, Poulain J, Couloux A, Jabbari K, Cattonaro F, Del Fabbro C, Pinosio S, Zuccolo A, Chapman J, Grimwood J, Tadeo FR, Estornell LH, Munoz-Sanz JV, Ibanez V, Herrero-Ortega A, Aleza P, Perez-Perez J, Ramon D, Brunel D, Luro F, Chen C, Farmerie WG, Desany B, Kodira C, Mohiuddin M, Harkins T, Fredrikson K, Burns P, Lomsadze A, Borodovsky M, Reforgiato G, Freitas-Astua J, Quetier F, Navarro L, Roose M, Wincker P, Schmutz J, Morgante M, Machado MA, Talon M, Jaillon O, Ollitrault P, Gmitter F, Rokhsar D. Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nature Biotechnology. 2014 32(7):656-62.


Data sets 
The genome assembly, pseudomolecules, annotations and genome browser are available through the links below.
Bulk Datasets (via Phytozome) Pathways
BLAST Gene curation
Genome browser