A comprehensive, online, updateable tree of life

https://github.com/kcranston/OpenTree_NSF_2016

Where is the tree of life?

Phylogeny provides a mechanism through which to interpret the patterns and processes of evolution and to predict the responses of life to rapid environmental change. Phylogenies and phylogenetic methods are now being used to enhance agriculture, identify and combat diseases, conserve biodiversity, and predict responses to global climate change and to biological invasions.
(tl;dr: We need trees to do cool and important science)

What do we know about the tree?

Can I browse / search / download it?

    Summarize existing knowledge into a tree that is:
      1. Complete
      2. Online
      3. Updateable

Status

Original funding (2012 - 2015): 11 PIs, 10 institutions

Supplement (2015 - present): Cranston, Holder, Smith

Two Greatest Challenges

(Technical and social)

I. Publishing tree files not a community norm

phylesystem

A git-based data store for community-curated phylogenetic estimates. Bioinformatics (2015) 31 (17): 2794-2800

  • 7757 trees from 3401 studies
  • 4779 commits from 117 curators (90 non-opentree)
  • git backend + python API + curation application
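Because phylesystem is an ordinary git repository, its curation history can be summarized with standard git tools. The sketch below shows one way commit and curator counts like those above could be tallied. It assumes each curator commits under a single git author name, and the local clone path is hypothetical. It is not how those figures were actually produced, and separating Open Tree staff from outside curators would need an extra list of names.

```python
import subprocess
from collections import Counter


def commits_per_author(log_text: str) -> Counter:
    """Count commits per author from `git log --format=%an` output (one name per line)."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def read_git_log(repo_path: str) -> str:
    """Run `git log --format=%an` in a local clone (path is hypothetical) and return stdout."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Offline example with made-up author names (not real phylesystem data).
    sample = "alice\nbob\nalice\ncarol\nalice\n"
    counts = commits_per_author(sample)
    print(sum(counts.values()), "commits from", len(counts), "curators")
    print(counts.most_common(1))
```

Against a real clone, `commits_per_author(read_git_log("path/to/phylesystem"))` would give the same kind of tally; the same counts could also feed the curator leaderboards mentioned later.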

Biggest challenges

(Technical and social)

I. Publishing tree files not a community norm

II. Taxonomy resources are incomplete and out of date

Taxonomy resources

  • Many taxonomy databases, differing in coverage and fitness for purpose
  • Long lag: taxonomy publication -> taxonomy databases
    • Taxa and names missing
    • Relationships out of date
  • Much data not openly available

Open Tree Taxonomy

  • 7 input taxonomies + user-contributed patches
  • clade-based input priority
  • TNRS (Taxonomic Name Resolution Service) to resolve synonyms, homonyms, etc.
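"Clade-based input priority" means the ranking of source taxonomies can differ between parts of the tree. A minimal sketch, assuming a default ranking plus per-clade overrides in which the most specific enclosing clade wins. The source names and the Fungi override are hypothetical, not the actual Open Tree Taxonomy configuration.

```python
# Hypothetical source taxonomies, highest priority first.
DEFAULT_PRIORITY = ["source_a", "source_b", "source_c"]

# Hypothetical per-clade overrides of the default ranking.
CLADE_PRIORITY = {
    "Fungi": ["source_c", "source_a", "source_b"],
}


def priority_for(lineage):
    """Return the source ranking for a taxon.

    lineage: clade names from the root down to the taxon. The most specific
    clade on the lineage that has an override wins.
    """
    for clade in reversed(lineage):
        if clade in CLADE_PRIORITY:
            return CLADE_PRIORITY[clade]
    return DEFAULT_PRIORITY


def pick_source(lineage, available_sources):
    """Pick the highest-priority source that actually covers this taxon, or None."""
    for source in priority_for(lineage):
        if source in available_sources:
            return source
    return None


if __name__ == "__main__":
    print(pick_source(["Eukaryota", "Fungi", "Ascomycota"], {"source_a", "source_c"}))
    print(pick_source(["Eukaryota", "Metazoa"], {"source_b", "source_c"}))
```

Keying overrides by clade lets a specialist source win inside its own group while a broad source fills in everywhere else.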

Infrastructure



  • Web applications:
    • study curator
    • browsers for tree and for taxonomy
  • Pipelines:
    • taxonomy merging
    • tree synthesis
  • Databases:
    • tree store (GitHub)
    • synthetic tree and taxonomy (Neo4j)
    • tree index (PostgreSQL)
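One job of a tree index is answering "which stored trees contain all of these taxa?" The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL, with a hypothetical one-table schema (study ID, tree ID, and the OTT ID of each tip). The real index's schema and queries may differ.

```python
import sqlite3


def build_index(rows):
    """Build an in-memory index from (study_id, tree_id, ott_id) tuples (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tree_taxa (study_id TEXT, tree_id TEXT, ott_id INTEGER)")
    conn.execute("CREATE INDEX idx_ott ON tree_taxa (ott_id)")
    conn.executemany("INSERT INTO tree_taxa VALUES (?, ?, ?)", rows)
    return conn


def trees_containing_all(conn, ott_ids):
    """Return (study_id, tree_id) pairs whose tips include every requested taxon."""
    ott_ids = sorted(set(ott_ids))
    if not ott_ids:
        return []
    placeholders = ",".join("?" for _ in ott_ids)
    query = f"""
        SELECT study_id, tree_id FROM tree_taxa
        WHERE ott_id IN ({placeholders})
        GROUP BY study_id, tree_id
        HAVING COUNT(DISTINCT ott_id) = ?
        ORDER BY study_id, tree_id
    """
    return conn.execute(query, [*ott_ids, len(ott_ids)]).fetchall()


if __name__ == "__main__":
    # Toy rows with made-up IDs.
    conn = build_index([
        ("s1", "t1", 1), ("s1", "t1", 2), ("s1", "t2", 1),
        ("s2", "t1", 2), ("s2", "t1", 3),
    ])
    print(trees_containing_all(conn, [1, 2]))
```

Grouping by tree and counting distinct matched taxa keeps the "contains all" test in a single query, and the index on `ott_id` keeps the initial filter cheap.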

Interoperability via APIs

  • rotl: R package that wraps the APIs
    • provides trees for a given list of species
    • imports trees for comparative analyses
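The web services that rotl wraps can also be called directly over HTTP. A sketch in Python, assuming the v3 endpoints `tnrs/match_names` and `tree_of_life/induced_subtree` under `https://api.opentreeoflife.org/v3` (check the current API documentation before relying on these paths or payload fields). The requests are built offline; only `send()` touches the network.

```python
import json
import urllib.request

API_BASE = "https://api.opentreeoflife.org/v3"  # assumed base URL


def build_post(path, payload):
    """Build (but do not send) a JSON POST request to an Open Tree API endpoint."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def match_names_request(names):
    # TNRS: resolve species names to Open Tree Taxonomy (OTT) ids.
    return build_post("/tnrs/match_names", {"names": list(names)})


def induced_subtree_request(ott_ids):
    # Ask for the part of the synthetic tree that connects the given taxa.
    return build_post("/tree_of_life/induced_subtree", {"ott_ids": list(ott_ids)})


def send(request):
    """Send a request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The intended flow is to match names first, read OTT ids from the match results, then request the induced subtree, whose response should carry a Newick string. In rotl the corresponding functions are `tnrs_match_names()` and `tol_induced_subtree()`.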

Other collaboration

Providing taxonomy feedback
Joint workshops, webinars

Community building

Tree-for-all hackathon

U Michigan, September 2014

https://github.com/OpenTreeOfLife/hackathon

  • Invited people to build on our APIs
  • Participants from Arbor, rOpenSci, iDigBio, Supertree Toolkit, IPNI, Atlas of Living Australia, Fossil Calibration Database, Species File Group

FuturePhy / Arbor / OpenTree workshop

U Florida, February 2016

https://blog.opentreeoflife.org/2016/03/09/futurephy-clade-workshops/

  • Clades: barnacles, catfishes, beetles
  • Participants: taxonomy, systematics, ecology, phylogenetic methods, bioinformatics, genomes, ontologies, and scientific illustration
  • Tested tree input, custom synthesis, conflict visualization

Demo!

https://tree.opentreeoflife.org

What could we do better?

Synthetic tree woefully underrepresents phylogenetic knowledge

Clade        Tips in tree  Tips in input phylogenies  % in phylogenies
Embryophyta       296,611                     14,425              4.9%
Fungi             309,631                        628              0.2%
Metazoa         1,467,443                     21,323              1.5%
Insecta           979,709                      3,098              0.3%
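The last column is simply the ratio of the two counts. A quick check of the arithmetic, with the numbers copied from the table above:

```python
# Counts copied from the table: (total tips in the clade, tips in input phylogenies).
COUNTS = {
    "Embryophyta": (296_611, 14_425),
    "Fungi": (309_631, 628),
    "Metazoa": (1_467_443, 21_323),
    "Insecta": (979_709, 3_098),
}


def percent_phylo(total_tips, phylo_tips):
    """Percentage of a clade's tips that appear in at least one input phylogeny."""
    return 100 * phylo_tips / total_tips


for clade, (total, phylo) in COUNTS.items():
    print(f"{clade:<12} {percent_phylo(total, phylo):.1f}%")
```

Insecta is nested within Metazoa, so the rows overlap and should not be summed.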

Improving the tree

Incorporate more trees from phylesystem

Import trees from TreeBASE, Dryad

Scrape trees from images

Incorporate more taxonomy feedback / resources

Encourage more community input of trees
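Trees imported from archives such as TreeBASE or Dryad usually arrive as Newick strings (often wrapped in NEXUS), and their tip labels must then be matched to taxonomy names, for example via TNRS. A minimal sketch of extracting tip labels, assuming unquoted labels and no bracketed comments; a full parser such as DendroPy handles quoting, comments, and NEXUS. Underscores are left as-is, although Newick convention reads unquoted underscores as spaces.

```python
def newick_tip_labels(newick):
    """Return tip labels from a simple Newick string.

    Assumes unquoted labels and no [comments]. Internal node labels
    (text right after ')') and branch lengths (text after ':') are skipped.
    """
    tips = []
    label = []
    prev_structural = None  # last structural character seen: '(', ',', ')', ';' or None
    in_length = False       # True while reading a branch length after ':'
    for ch in newick.strip():
        if ch in "(),;":
            text = "".join(label).strip()
            # A label that follows ')' belongs to an internal node, not a tip.
            if text and prev_structural != ")":
                tips.append(text)
            label = []
            in_length = False
            prev_structural = ch
        elif ch == ":":
            in_length = True
        elif not in_length:
            label.append(ch)
    return tips


if __name__ == "__main__":
    print(newick_tip_labels("((A:0.1,B:0.2)n1:0.3,C);"))
```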

OpenTree needs community curation

Motivating producers of trees to contribute data

  • Social motivation
    • curator statistics, leaderboards
    • favorites, notifications
  • Research services
    • custom synthesis
    • conflict analysis
    • hosting
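Conflict analysis compares the clades of one tree against those of another. A minimal sketch of the standard clade-compatibility test, assuming clades are given as sets of tip names and both trees are first restricted to the tips they share: two clades conflict when they overlap but neither contains the other. The labels returned here are illustrative and simpler than what a production conflict service would report.

```python
def classify(clade, other_clades, shared_tips):
    """Compare one clade (a set of tip names) against the clades of another tree.

    Both sides are restricted to shared_tips first. Returns 'supported' if an
    identical clade exists, 'conflicts' if some clade partially overlaps it,
    'uninformative' if fewer than two shared tips remain, else 'compatible'.
    """
    a = clade & shared_tips
    if len(a) < 2:
        return "uninformative"
    restricted = [c & shared_tips for c in other_clades]
    if any(c == a for c in restricted):
        return "supported"
    for c in restricted:
        # Overlapping but not nested means the two groupings cannot both be true.
        if (a & c) and not (a <= c or c <= a):
            return "conflicts"
    return "compatible"


if __name__ == "__main__":
    shared = {"A", "B", "C", "D"}
    print(classify({"A", "B"}, [{"A", "C"}, {"A", "B", "C"}], shared))
    print(classify({"A", "B"}, [{"A", "B"}, {"C", "D"}], shared))
```

Restricting to shared tips first matters because input trees rarely sample the same taxa; without it, a missing tip would look like disagreement.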

More needed improvements


Tree visualization

Branch lengths / divergence times

Links back to raw data (sequences, specimens)

Greatly expanding dark parts of the tree

Scientific American, March 2016

    Need resources describing OTUs in microbial trees

    (aka Tell Me More about “Prevotella dentalis ES 2772 DSM 3688”)

Summary

Have a process for curating and synthesizing phylogenetic and taxonomic data into a comprehensive tree of life

Have data and services available through APIs; in use by the biodiversity informatics community

Need to motivate participation of the systematics community by providing tools and services