Data in highlighted phylogenies
Lineage | Level | Taxa | Root |
---|---|---|---|
Viridiplantae | Genera | 513 | Klebsormidium nitens |
Liliopsida | Species | 182 | Amborella trichopoda |
Eudicots | Species | 817 | Amborella trichopoda |
Chlorophyta | Species | 83 | Klebsormidium nitens |
Fungi | Genera | 989 | Polychytrium aggregatum |
Ascomycota | Genera | 591 | Saitoella complicata |
Basidiomycota | Species | 782 | Wallemia mellicola |
Metazoa | Genera | 2405 | Amphimedon queenslandica |
Arthropoda | Genera | 993 | Limulus polyphemus |
Vertebrata | Genera | 1199 | Callorhinchus milii |
Bulk research data
Data file: Compiled genome statistics.
Description: NCBI metadata, assembly quality statistics, BUSCO annotation statistics and taxonomic information for all genomes analyzed.
Data file: CUSCO genes.
Description: Curated BUSCO gene sets for 10 lineages that have up to 7% higher precision in annotations.
Data file: BUSCO gene length statistics.
Description: Annotated gene length summary statistics for all BUSCO genes in 10 lineages across all taxa.
Data file: Gene alignments.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Gene alignments in aligned FASTA format, generated with MUSCLE v5.
Data file: Treefiles.
Description: Accounts of the 3,566 computed trees are provided in the research description. There are three sets of trees.
1. One tree each per lineage computed with amino acid states ranging from 2-14 and alignment lengths ranging from 1,000-15,000.
2. For five selected lineages (Eudicots, Ascomycota, Basidiomycota, Arthropoda, Vertebrata), 10 trees each from 5 sets of sampled sites at 9 total conditions (3 rates x 3 alignment lengths). Rate profiles used amino acid states 2, 8 and 14. Alignment lengths were 1,000, 5,000 and 10,000.
3. Based on those results, sets of about 50-100 trees for all 10 lineages, computed at the highest possible rate configurations. Amino acid states 8 and 14 were also included.
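The design of the second tree set can be enumerated as a small grid. The sketch below is only an illustration of the numbers stated above; it assumes that "10 trees each from 5 sets of sampled sites" means 50 trees per condition and lineage, and the variable names are hypothetical.

```python
from itertools import product

# Values taken from the description of the second tree set.
lineages = ["Eudicots", "Ascomycota", "Basidiomycota", "Arthropoda", "Vertebrata"]
amino_acid_states = [2, 8, 14]            # the three rate profiles
alignment_lengths = [1000, 5000, 10000]
site_samples = 5                          # sets of sampled sites per condition
trees_per_sample = 10                     # trees per set of sampled sites

# 3 rates x 3 alignment lengths = 9 experimental conditions
conditions = list(product(amino_acid_states, alignment_lengths))

# Assumed reading: every condition yields site_samples * trees_per_sample trees
trees_per_condition = site_samples * trees_per_sample
total_trees = len(lineages) * len(conditions) * trees_per_condition

print(len(conditions))       # 9
print(trees_per_condition)   # 50
print(total_trees)           # 2250
```

Under this reading, the second set would contribute 2,250 of the 3,566 trees; the research description remains the authoritative account.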
Data file: Treeset taxonomic congruity.
Description: For sets of 50 trees created under 9 experimental conditions (3 rates x 3 alignment lengths) for the 5 tested lineages, the extent of taxonomic congruity is measured by the number of families resolved as monophyletic by the phylogenies.
There are a total of 543 families that were tested.
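As a rough illustration of the congruity measure, the sketch below checks which families are resolved as monophyletic in a single rooted tree. It is not the code used to produce the data file: the nested-tuple tree representation, the function names, the toy taxa and the choice to skip families with fewer than two sampled taxa are all assumptions made for the example.

```python
def clade_leaf_sets(tree):
    """Return (leaves under tree, set of leaf sets for every clade in tree)."""
    if isinstance(tree, str):              # a leaf is a taxon name
        leaves = frozenset([tree])
        return leaves, {leaves}
    leaves, clades = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clade_leaf_sets(child)
        leaves |= child_leaves
        clades |= child_clades
    clades.add(leaves)                     # the node itself defines a clade
    return leaves, clades


def monophyletic_families(tree, family_of, min_size=2):
    """List families whose sampled taxa form exactly one clade of the tree."""
    _, clades = clade_leaf_sets(tree)
    members = {}
    for taxon, family in family_of.items():
        members.setdefault(family, set()).add(taxon)
    return sorted(
        family for family, taxa in members.items()
        # Assumption: single-taxon families are trivially monophyletic, so skip them
        if len(taxa) >= min_size and frozenset(taxa) in clades
    )


# Hypothetical rooted toy tree and family assignments
toy_tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
toy_families = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
resolved = monophyletic_families(toy_tree, toy_families)
print(resolved)   # ['A']
```

Counting the returned families for each tree in a set of 50 would give one congruity value per tree and condition.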
Data file: Conserved gene blocks.
Description: For the 10 BUSCO lineages, collinear gene blocks containing identified (true) and remnant (null) BUSCO genes that were conserved across very long divergence times were extracted.
Searches for gene blocks of up to about 8 genes were computationally feasible, and the 10 gene blocks with the highest incidence have been cataloged.
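One simple way to think about block incidence is to count how many assemblies contain each run of k adjacent BUSCO genes, ignoring strand orientation. The sketch below does only that; it is not the phyca algorithm, it does not distinguish true from null genes, and the input layout (one ordered gene list per chromosome), function names and gene IDs are hypothetical.

```python
from collections import Counter


def block_key(genes):
    """Orientation-independent key: a block and its reverse count as one block."""
    forward = tuple(genes)
    return min(forward, forward[::-1])


def count_blocks(assemblies, k):
    """Count, for each k-gene block, the number of assemblies that contain it."""
    counts = Counter()
    for chromosomes in assemblies.values():
        seen = set()                       # count each block once per assembly
        for order in chromosomes:
            for i in range(len(order) - k + 1):
                seen.add(block_key(order[i:i + k]))
        counts.update(seen)
    return counts


# Hypothetical gene orders: assembly -> list of chromosomes -> ordered gene IDs
toy_assemblies = {
    "asm1": [["g1", "g2", "g3", "g4"]],
    "asm2": [["g4", "g3", "g2", "g9"]],    # same genes on the opposite strand
    "asm3": [["g2", "g3", "g7"]],
}
block_counts = count_blocks(toy_assemblies, k=2)
print(block_counts.most_common(1))   # [(('g2', 'g3'), 3)]
```

Because the number of candidate windows grows with every assembly and every block size, a cap on block length such as the roughly 8 genes mentioned above keeps a search of this kind tractable.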
Data file: Synteny plots of Oryza chromosomes.
Description: The Oryza genus is presented as a case study demonstrating the utility of BUSCO syntenic information in assembly evaluation, because highly contiguous reference assemblies are available for several species within the genus.
The synteny plots are split by chromosome.
Data file: Compleasm annotations for all assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for all assemblies. The genes annotated here are referred to as true or identified genes.
Data file: Compleasm annotations for chromosome-level assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for chromosome-level assemblies. These assemblies are used during phyca (collinearity) analysis.
Data file: Compleasm annotations for all BUSCO-depleted assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for all assemblies after deleting all BUSCO genes from the assemblies. The genes annotated here are referred to as null or remnant genes.