MCOT v1.0 files
===============

MCOTv1.0_Diaphorina_citri_protein_AHRD.fasta (Proteins with AHRD descriptions)
MCOTv1.0_Diaphorina_citri_cds.zip
MCOTv1.0_Diaphorina_citri_protein.zip
MCOTv1.0_Diaphorina_citri_transcript.zip
cufflinks.zip: Original cufflinks output used for MCOT
Oases.zip: Original Oases output used for MCOT
Trinity.zip: Original Trinity output used for MCOT

MCOT works by comparing maker and cufflinks models with de novo assemblies by Trinity and Oases. There are 23,419 transcripts and 30,557 protein sequences. All MCOT protein names end with one of these terms:

MM (original sequence from maker, select maker): 3,660
MO (original sequence from maker, select Oases): 255
MT (original sequence from maker, select Trinity): 341
CC (original sequence from cufflinks, select cufflinks): 18,015
CT (original sequence from cufflinks, select Trinity): 6,609
CO (original sequence from cufflinks, select Oases): 6,943
TT (original sequence from Trinity, select Trinity): 339
OO (original sequence from Oases, select Oases): 595

E.g.
MCOT14092.0.CT | N-carbamoylputrescine amidase | Similar to KRM11194.1 | ***- | PANTHER PTHR23088 | PANTHER PTHR23088:SF9 | Pfam PF00795

CT means “original sequence from cufflinks, select Trinity”: the query sequence is from cufflinks. After comparing this protein sequence to the Trinity and Oases models, we judged the Trinity model to be the best, so we selected it. Some genes may exist only in the de novo models from Trinity or Oases, which is why the “original sequence” may be from Trinity or Oases.

The description includes Pfam domains, GO terms and a description generated using Uniprot, Interproscan and the AHRD pipeline.

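The naming scheme above can be decoded programmatically. The following is a minimal Python sketch, not part of the MCOT release: the function name `parse_mcot_header` and the dictionary keys are hypothetical, and it assumes that header fields are separated by `|` as in the example and that a leading `>` may or may not be present.

```python
# The eight two-letter terms listed in this README, mapped to
# (source of the original sequence, source of the selected model).
MCOT_CODES = {
    "MM": ("maker", "maker"),
    "MO": ("maker", "Oases"),
    "MT": ("maker", "Trinity"),
    "CC": ("cufflinks", "cufflinks"),
    "CT": ("cufflinks", "Trinity"),
    "CO": ("cufflinks", "Oases"),
    "TT": ("Trinity", "Trinity"),
    "OO": ("Oases", "Oases"),
}


def parse_mcot_header(header):
    """Split an MCOT FASTA header into ID, code, and description fields.

    Assumption: fields are delimited by '|' and the ID is the first field.
    """
    fields = [field.strip() for field in header.lstrip(">").split("|")]
    seq_id = fields[0]
    # The term is the last dot-separated part of the ID, e.g. "CT".
    code = seq_id.rsplit(".", 1)[-1]
    if code not in MCOT_CODES:
        raise ValueError(f"unrecognised MCOT term {code!r} in {seq_id!r}")
    original, selected = MCOT_CODES[code]
    return {
        "id": seq_id,
        "code": code,
        "original": original,
        "selected": selected,
        "description": fields[1] if len(fields) > 1 else "",
        "annotations": fields[2:],
    }


example = (
    "MCOT14092.0.CT | N-carbamoylputrescine amidase | Similar to KRM11194.1 "
    "| ***- | PANTHER PTHR23088 | PANTHER PTHR23088:SF9 | Pfam PF00795"
)
record = parse_mcot_header(example)
print(record["id"], record["original"], record["selected"])
# prints: MCOT14092.0.CT cufflinks Trinity
```

Listing the eight terms explicitly, rather than accepting any pair of letters, means an unexpected suffix raises an error instead of being silently misread.
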
MCOT v1.1 file
==============

MCOTv1.1_Diaphorina_citri_protein_OGS-AHRD.fasta (Proteins with OGSv1.0 manually curated and AHRD descriptions)

Mapping file: Dcitr_NCBI_to_OGSv1.0_id_mapFile_MCOT-Dmel.txt (describes the changes between the original NCBI annotations and the OGSv1.0, MCOT, and Dmel)

E.g.
MCOT14215.0.CC | Toll-13 | Similar to N0A0P9 | *** | PANTHER PTHR24365 | Pfam PF13855 | Pfam PF00560 | Similar to curated Dcitr19281.1 (Perc. Ident. 98.29 Perc. Cov. 99.84): Toll-13

The description includes Pfam domains, GO terms and a description generated using Uniprot, Interproscan and the AHRD pipeline (Swissprot, Trembl and Dmel dbs). Additionally, descriptions contain the OGSv1.0 manually curated descriptions for matches with percentage identity >85% and percentage coverage >65%.
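
The curated-match field in v1.1 headers can be pulled out with a regular expression. This is a minimal Python sketch under stated assumptions: the names `parse_curated_match` and `passes_thresholds` are hypothetical, and the pattern is inferred only from the single example header above, so other headers may need a looser pattern.

```python
import re

# Pattern inferred from the example header in this README (an assumption,
# not a published specification of the field).
CURATED_RE = re.compile(
    r"Similar to curated (?P<ogs_id>\S+) "
    r"\(Perc\. Ident\. (?P<ident>[\d.]+) Perc\. Cov\. (?P<cov>[\d.]+)\): "
    r"(?P<name>.*)"
)


def parse_curated_match(header):
    """Return the curated OGSv1.0 match in a header, or None if absent."""
    match = CURATED_RE.search(header)
    if match is None:
        return None
    return {
        "ogs_id": match.group("ogs_id"),
        "identity": float(match.group("ident")),
        "coverage": float(match.group("cov")),
        "name": match.group("name").strip(),
    }


def passes_thresholds(curated, min_identity=85.0, min_coverage=65.0):
    """Check the cutoffs stated in this README: identity >85%, coverage >65%."""
    return curated["identity"] > min_identity and curated["coverage"] > min_coverage


example = (
    "MCOT14215.0.CC | Toll-13 | Similar to N0A0P9 | *** | PANTHER PTHR24365 "
    "| Pfam PF13855 | Pfam PF00560 | Similar to curated Dcitr19281.1 "
    "(Perc. Ident. 98.29 Perc. Cov. 99.84): Toll-13"
)
curated = parse_curated_match(example)
print(curated["ogs_id"], curated["identity"], curated["coverage"])
# prints: Dcitr19281.1 98.29 99.84
```

Headers without a curated match return `None`, so the same function can be run over every record in the FASTA file.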