Data in highlighted phylogenies
 
Lineage        Level    Taxa  Root
Viridiplantae  Genera    513  Klebsormidium nitens
Liliopsida     Species   182  Amborella trichopoda
Eudicots       Species   817  Amborella trichopoda
Chlorophyta    Species    83  Klebsormidium nitens
Fungi          Genera    989  Polychytrium aggregatum
Ascomycota     Genera    591  Saitoella complicata
Basidiomycota  Species   782  Wallemia mellicola
Metazoa        Genera   2405  Amphimedon queenslandica
Arthropoda     Genera    993  Limulus polyphemus
Vertebrata     Genera   1199  Callorhinchus milii
       
Bulk research data
   

Data file: Compiled genome statistics.
Description: NCBI metadata, assembly quality statistics, BUSCO annotation statistics and taxonomic information for all genomes analyzed.

Data file: CUSCO genes.
Description: Curated BUSCO (CUSCO) gene sets for the 10 lineages, which give up to 7% higher precision in annotations than the full BUSCO sets.

Data file: BUSCO gene length statistics.
Description: Summary statistics of annotated gene lengths for all BUSCO genes in the 10 lineages, across all taxa.
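The exact statistics stored in this file are not listed here, so the following is only a minimal sketch, assuming per-gene summaries such as count, minimum, maximum, mean, median and standard deviation computed over the annotated lengths of each BUSCO gene across taxa. The gene identifiers and lengths in the example are hypothetical.

```python
import statistics


def summarize_lengths(lengths_by_gene):
    """Summarize annotated lengths given as {gene_id: [length, ...]}.

    Assumption: one length per taxon in which the gene was annotated.
    """
    summary = {}
    for gene, lengths in lengths_by_gene.items():
        if not lengths:
            continue  # skip genes that were not annotated in any taxon
        summary[gene] = {
            "n": len(lengths),
            "min": min(lengths),
            "max": max(lengths),
            "mean": statistics.fmean(lengths),
            "median": statistics.median(lengths),
            # the sample standard deviation needs at least two values
            "stdev": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        }
    return summary


# Hypothetical example: annotated lengths (bp) of two BUSCO genes across taxa.
example = {
    "gene_A": [1200, 1350, 1280, 1410],
    "gene_B": [900, 870],
}
stats = summarize_lengths(example)
print(stats["gene_A"]["median"])  # 1315.0
```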

Data file: Gene alignments.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Aligned FASTA files of gene alignments generated with MUSCLE v5.
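In an aligned FASTA file every sequence is padded with gap characters ("-") to the same length. A minimal sketch for loading one of these files and checking that property might look like the following; it assumes that the first word of each header is a taxon identifier, which is a hypothetical convention rather than a documented one.

```python
def read_fasta(text):
    """Parse FASTA text into {header_id: sequence}.

    Assumption: the first whitespace-separated word of each header is the ID.
    """
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def is_valid_alignment(records):
    """An alignment is well formed if all sequences share one length."""
    return len({len(seq) for seq in records.values()}) == 1


def gap_fraction(records):
    """Fraction of all alignment cells that are gaps ('-')."""
    total = sum(len(seq) for seq in records.values())
    gaps = sum(seq.count("-") for seq in records.values())
    return gaps / total if total else 0.0


aln = """>taxon1
MK-LV
>taxon2
MKQLV
>taxon3
M--LV
"""
records = read_fasta(aln)
print(is_valid_alignment(records), gap_fraction(records))  # True 0.2
```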

Data file: Treefiles.
Description: Contains 3,566 computed trees, which are described in the research description. There are three sets of trees:
1. One tree per lineage, computed with amino acid states ranging from 2 to 14 and alignment lengths ranging from 1,000 to 15,000.
2. For five selected lineages (Eudicots, Ascomycota, Basidiomycota, Arthropoda, Vertebrata), 10 trees from each of 5 sets of sampled sites under 9 total conditions (3 rates × 3 alignment lengths). Rate profiles used 2, 8 and 14 amino acid states; alignment lengths were 1,000, 5,000 and 10,000.
3. Based on these results, sets of about 50-100 trees for each of the 10 lineages under the highest possible rate configurations. Configurations with 8 and 14 amino acid states were also included.
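The size of the second tree set follows from its factorial design. The sketch below enumerates that design, assuming that "10 trees each from 5 sets of sampled sites" means 10 trees per site set, i.e. 50 trees per lineage and condition (which matches the sets of 50 trees in the taxonomic congruity file). The variable names are illustrative only.

```python
from itertools import product

# Design of tree set 2, taken from the description above.
LINEAGES = ["Eudicots", "Ascomycota", "Basidiomycota", "Arthropoda", "Vertebrata"]
AA_STATES = [2, 8, 14]                # rate profiles (amino acid states)
ALN_LENGTHS = [1_000, 5_000, 10_000]  # alignment lengths
SITE_SETS = 5                         # sets of sampled sites per condition
TREES_PER_SITE_SET = 10               # assumption: 10 trees from each site set

# 3 rates x 3 alignment lengths = 9 conditions
conditions = list(product(AA_STATES, ALN_LENGTHS))
trees_per_condition = SITE_SETS * TREES_PER_SITE_SET
set2_total = len(LINEAGES) * len(conditions) * trees_per_condition

print(len(conditions), trees_per_condition, set2_total)  # 9 50 2250
```

Under this reading, tree set 2 accounts for 2,250 of the trees; set 1 adds one tree per lineage.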

Data file: Treeset taxonomic congruity.
Description: For the sets of 50 trees built under the 9 experimental conditions (3 rates × 3 alignment lengths) for the 5 tested lineages, taxonomic congruity is measured as the number of families resolved as monophyletic in the phylogenies. A total of 543 families were tested.
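A family is monophyletic in a rooted tree when some clade contains exactly that family's taxa and nothing else. The sketch below counts such families. It assumes trees already rooted (for example with the root taxa in the table above), represented as nested tuples instead of parsed Newick. It also assumes that only families with at least two sampled taxa are tested. These are assumptions for illustration, not the documented procedure.

```python
from collections import defaultdict


def clade_leaf_sets(tree):
    """Leaf sets of every clade in a rooted tree given as nested tuples of names."""
    clades = []

    def walk(node):
        if isinstance(node, str):  # a tip
            leaves = frozenset([node])
        else:  # an internal node: union of its children's leaves
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    walk(tree)
    return set(clades)


def count_monophyletic_families(tree, family_of):
    """Return (sorted monophyletic families, number of families tested).

    Assumption: family_of maps every taxon in the tree to its family, and
    families with a single sampled taxon are not tested.
    """
    clades = clade_leaf_sets(tree)
    members = defaultdict(set)
    for taxon, family in family_of.items():
        members[family].add(taxon)
    tested = {f: frozenset(t) for f, t in members.items() if len(t) >= 2}
    resolved = sorted(f for f, taxa in tested.items() if taxa in clades)
    return resolved, len(tested)


# Hypothetical rooted tree and family assignments.
tree = ((("a1", "a2"), "b1"), ("b2", ("c1", "c2")))
family_of = {"a1": "FamA", "a2": "FamA", "b1": "FamB",
             "b2": "FamB", "c1": "FamC", "c2": "FamC"}
print(count_monophyletic_families(tree, family_of))  # (['FamA', 'FamC'], 3)
```

The recursive traversal is kept for clarity; for trees with thousands of tips a dedicated tree library or an iterative traversal would be safer.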

Data file: Conserved gene blocks.
Description: For the 10 BUSCO lineages, collinear gene blocks containing identified (true) and remnant (null) BUSCO genes that are conserved across very long divergence times were extracted. Searching for gene blocks of up to about 8 genes was computationally feasible, and the 10 gene blocks with the highest incidence have been cataloged.
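One simple way to rank collinear blocks by incidence is to slide a window of k consecutive BUSCO genes along each chromosome and count how many genomes contain each run. A run and its reverse are treated as the same block. This is a hypothetical sketch of the general idea, not the phyca implementation: the input format (ordered gene IDs per chromosome) and the function names are assumptions, and gene orientation and intervening non-BUSCO genes are ignored.

```python
from collections import Counter


def canonical(block):
    """Treat a block and its reverse as the same collinear block."""
    return min(block, tuple(reversed(block)))


def top_blocks(genomes, k, top_n=10):
    """Rank blocks of k consecutive genes by the number of genomes containing them.

    genomes: {genome_id: [[gene_id, ...] for each chromosome, in positional order]}
    (a hypothetical input format).
    """
    incidence = Counter()
    for chromosomes in genomes.values():
        seen = set()
        for chrom in chromosomes:
            for i in range(len(chrom) - k + 1):
                seen.add(canonical(tuple(chrom[i:i + k])))
        incidence.update(seen)  # count each block at most once per genome
    return incidence.most_common(top_n)


# Hypothetical gene orders for three genomes.
genomes = {
    "g1": [["A", "B", "C", "D"], ["E", "F"]],
    "g2": [["D", "C", "B", "A"]],  # same order, opposite direction
    "g3": [["A", "B", "X", "C"]],
}
print(top_blocks(genomes, k=2, top_n=1))  # [(('A', 'B'), 3)]
```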

Data file: Synteny plots of Oryza chromosomes.
Description: The genus Oryza was presented as a case study demonstrating the utility of BUSCO synteny information in assembly evaluation, because highly contiguous reference assemblies are available for several species within the genus. The synteny plots are split by chromosome.
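Synteny between two assemblies can be derived from BUSCO genes by pairing the positions of genes found in both. The sketch below builds such anchors, tallies them per chromosome pair, and extracts dot-plot points for one chromosome pair. It assumes a hypothetical input of {gene_id: (chromosome, start)} restricted to single-copy genes, for example as extracted from compleasm annotations. Anchors falling off the expected chromosome pair can point to rearrangements or misassemblies.

```python
from collections import Counter


def synteny_anchors(loci_a, loci_b):
    """Pair loci of genes present in both assemblies.

    loci_x: {gene_id: (chromosome, start)} for single-copy genes (assumed format).
    """
    shared = sorted(loci_a.keys() & loci_b.keys())
    return [(gene, loci_a[gene], loci_b[gene]) for gene in shared]


def chromosome_pair_counts(anchors):
    """Number of anchors linking each (chromosome_a, chromosome_b) pair."""
    return Counter((a[0], b[0]) for _, a, b in anchors)


def dotplot_points(anchors, chrom_a, chrom_b):
    """(position_a, position_b) points for one chromosome pair, as in a synteny plot."""
    return [(a[1], b[1]) for _, a, b in anchors if a[0] == chrom_a and b[0] == chrom_b]


# Hypothetical loci in two assemblies.
loci_a = {"b1": ("chr1", 100), "b2": ("chr1", 500), "b3": ("chr2", 50), "b4": ("chr2", 900)}
loci_b = {"b1": ("Chr01", 120), "b2": ("Chr01", 610), "b3": ("Chr01", 9000), "b5": ("Chr03", 10)}

anchors = synteny_anchors(loci_a, loci_b)
print(chromosome_pair_counts(anchors))
print(dotplot_points(anchors, "chr1", "Chr01"))  # [(100, 120), (500, 610)]
```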

Data file: Compleasm annotations for all assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for all assemblies. These have been referred to as true or identified genes.

Data file: Compleasm annotations for chromosome-level assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for chromosome-level assemblies. These assemblies were used in the phyca (collinearity) analysis.

Data file: Compleasm annotations for all BUSCO-depleted assemblies.
Viridiplantae
Liliopsida
Eudicots
Chlorophyta
Fungi
Ascomycota
Basidiomycota
Metazoa
Arthropoda
Vertebrata
Description: Compiled compleasm output for all assemblies after deleting all BUSCO genes from them. The genes annotated in these depleted assemblies are referred to as null or remnant genes.
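Deleting annotated genes from a sequence amounts to merging the gene intervals and keeping only the stretches between them. The following is a minimal sketch of that operation on a single sequence. It assumes 0-based, half-open [start, end) coordinates; real annotation files may use other conventions, and the exact depletion procedure used for these assemblies is not specified here.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def delete_intervals(seq, intervals):
    """Return seq with every position covered by the intervals removed.

    Assumption: 0-based half-open coordinates that lie within the sequence.
    """
    pieces, pos = [], 0
    for start, end in merge_intervals(intervals):
        pieces.append(seq[pos:start])  # keep the stretch before this gene
        pos = end
    pieces.append(seq[pos:])  # keep the tail after the last gene
    return "".join(pieces)


seq = "AAAACCCCGGGGTTTT"
genes = [(4, 8), (6, 10), (12, 14)]  # hypothetical BUSCO gene intervals
print(delete_intervals(seq, genes))  # AAAAGGTT
```

Deletion joins the flanking sequences and shifts all downstream coordinates; masking the intervals with "N" instead would preserve coordinates, at the cost of leaving gene-sized gaps.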