Computational Science Initiative Event

    "Data Discovery in Linked Science"

    Presented by Dr. Line Pouchard, Purdue University

    Friday, September 30, 2016, 2 pm
    John Dunn Seminar Room, Bldg. 463

    Hosted by: Kerstin Kleese van Dam

    Linked Science is the practice of interconnecting scientific assets by publishing, sharing, and linking scientific data and processes in end-to-end, loosely coupled workflows that enable the sharing and reuse of scientific data. Linked Science relies on provenance, curation, and preservation to obtain reproducible results and to share datasets for reuse. Many scientific questions are addressed using data that reside in numerous data centers and archives that do not expose their content directly to search engines. To find the data needed to answer a scientific question, a researcher must search many sites, formulate queries against multiple metadata schemas, and access data from heterogeneous sources.

    We will present a semantic service that enhances the discoverability of datasets in Earth science data archives, and show how it is used in a Linked Science scenario: discovering datasets for a climate change study. We will also show how the service's applicability is extended by deploying an ontology repository and creating mappings between ontology entities that link different annotations. Linked ontology entities provide new annotations for datasets. When datasets are annotated with more granular descriptions, they can be discovered, accessed, and operationalized by workflows for provenance and other tasks. This work demonstrates that using ontologies, even lightweight ones, gives domain experts a path to finding the information they need in heterogeneous datasets for complex multidisciplinary studies, builds greater trust in results, and works toward improved reproducibility.