    • A comparative genomics multitool for scientific discovery and conservation

      Genereux, Diane P.; Serres, Aitor; Armstrong, Joel; Johnson, Jeremy; Marinescu, Voichita D.; Murén, Eva; Juan, David; Bejerano, Gill; Casewell, Nicholas R.; Chemnick, Leona G.; et al. (2020)
      The Zoonomia Project is investigating the genomics of shared and specialized traits in eutherian mammals. Here we provide genome assemblies for 131 species, of which all but 9 are previously uncharacterized, and describe a whole-genome alignment of 240 species of considerable phylogenetic diversity, comprising representatives from more than 80% of mammalian families. We find that regions of reduced genetic diversity are more abundant in species at a high risk of extinction, discern signals of evolutionary selection at high resolution and provide insights from individual reference genomes. By prioritizing phylogenetic diversity and making data available quickly and without restriction, the Zoonomia Project aims to support biological discovery, medical research and the conservation of biodiversity.
    • Camera settings and biome influence the accuracy of citizen science approaches to camera trap image classification

      Egna, Nicole; O'Connor, David; Stacy-Dawes, Jenna; Tobler, Mathias W.; Pilfold, Nicholas W.; Neilson, Kristin; Simmons, Brooke; Davis, Elizabeth Oneita; Bowler, Mark; Fennessy, Julian; et al. (2020)
      Scientists are increasingly using volunteer efforts of citizen scientists to classify images captured by motion-activated trail cameras. The rising popularity of citizen science reflects its potential to engage the public in conservation science and accelerate processing of the large volume of images generated by trail cameras. While image classification accuracy by citizen scientists can vary across species, the influence of other factors on accuracy is poorly understood. Inaccuracy diminishes the value of citizen science derived data and prompts the need for specific best-practice protocols to decrease error. We compare the accuracy between three programs that use crowdsourced citizen scientists to process images online: Snapshot Serengeti, Wildwatch Kenya, and AmazonCam Tambopata. We hypothesized that habitat type and camera settings would influence accuracy. To evaluate these factors, each photograph was circulated to multiple volunteers. All volunteer classifications were aggregated to a single best answer for each photograph using a plurality algorithm. Subsequently, a subset of these images underwent expert review and were compared to the citizen scientist results. Classification errors were categorized by the nature of the error (e.g., false species or false empty), and reason for the false classification (e.g., misidentification). Our results show that Snapshot Serengeti had the highest accuracy (97.9%), followed by AmazonCam Tambopata (93.5%), then Wildwatch Kenya (83.4%). Error type was influenced by habitat, with false empty images more prevalent in open-grassy habitat (27%) compared to woodlands (10%). For medium to large animal surveys across all habitat types, our results suggest that to significantly improve accuracy in crowdsourced projects, researchers should use a trail camera set up protocol with a burst of three consecutive photographs, a short field of view, and determine camera sensitivity settings based on in situ testing. Accuracy level comparisons such as this study can improve reliability of future citizen science projects, and subsequently encourage the increased use of such data.
    • Does placental invasiveness lead to higher rates of malignant transformation in mammals? Response to: ‘Available data suggests positive relationship between placental invasion and malignancy’

      Boddy, Amy M.; Abegglen, Lisa M.; Aktipis, Athena; Schiffman, Joshua D.; Maley, Carlo C.; Witte, Carmel L. (2020)
      In our study, Lifetime cancer prevalence and life history traits in mammals, we reported the prevalence of neoplasia and malignancy in a select group of mammals housed at San Diego Zoo Global from 1964 to 1978 and 1987 to 2015 [1]. We also used these data to evaluate associations between life history traits and measures of population health. Our analysis showed placental invasiveness could not predict the proportion of animals diagnosed with neoplasia or malignancy. In a response to our article, Drs Wagner and colleagues describe a different calculation to test for a relationship between placental invasiveness and malignancy. They identified and included previously published veterinary neoplasia and malignancy data with our published dataset and suggest a positive relationship between placental invasiveness and development of malignancy (referred to as malignancy rate in Wagner and colleagues’ response). These data provided support for the Evolved Levels of Invasiveness (ELI) hypothesis [2]. We are pleased that other investigators find our data useful, and wholeheartedly agree with Drs Wagner and colleagues in the need to identify more data on cancer in a wide variety of species. Notwithstanding, this updated analysis brings up a number of topics that we would like to address....
    • Highly accurate long-read HiFi sequencing data for five complex genomes

      Hon, Ting; Mars, Kristin; Young, Greg; Tsai, Yu-Chih; Karalius, Joseph W.; Landolin, Jane M.; Maurer, Nicholas; Kudrna, David; Hardigan, Michael A.; Steiner, Cynthia C.; et al. (2020)
      The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.