Clojure Where it Counts: Tidying Data Science Workflows - Pier Federico Gherardini & Ben Kamphaus

Learn how to streamline data science workflows with Clojure, leveraging Datomic's schema-agnostic data structures and query engine to integrate complex, heterogeneous data with ease.

Key takeaways
  • Data sets in R can be represented the same way as RDF data: as triples, like those held in a triple store.
  • Cancer research involves complex data integration from multiple sources, including clinical information, gene expression profiles, and imaging data.
  • Datomic allows for efficient querying and integration of heterogeneous data sets.
  • A critical aspect of data science is being able to combine data from different sources to extract meaningful insights.
  • The Datomic Meta Model provides a consistent data model across all data sources.
  • A key goal is to break down data silos by making data more accessible and easier to integrate.
  • Working with complex data sets often means constructing queries by hand, which is error-prone and inefficient.
  • Schema-agnostic data structures (such as datoms) enable better handling of complex data.
  • Cogito’s Datalog support parses queries in R, translating them into Datomic-compatible structures.
  • The Datomic query engine can optimize queries using various heuristics.
  • Creating a common schema for disparate data sources allows for more efficient querying and analysis.
  • Immutable data structures ensure that modifications are tracked and reproducible, supporting analytical reproducibility.
  • Integrating R with Datomic lets data scientists focus on modeling and analysis rather than on data storage and retrieval.
  • The goal of Cognitex is to empower analysts with better tools and workflows for managing complex data sets.
  • The project comprises several components: data ingestion, query optimization, and visualization.
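
The takeaways about triples and a common schema can be made concrete with a small sketch. The Python below is not the speakers' code and does not use Datomic or R; it only illustrates the idea that rows from separate tables can be flattened into entity-attribute-value facts and then queried together. The table contents, the helper names `rows_to_triples` and `match`, and the attribute names are all hypothetical and invented for illustration.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[Any, str, Any]  # (entity, attribute, value)


def rows_to_triples(rows: List[Dict[str, Any]], id_key: str) -> List[Triple]:
    """Flatten table rows into entity-attribute-value triples.

    The column named by `id_key` identifies the entity; every other
    non-missing column becomes one triple.
    """
    triples: List[Triple] = []
    for row in rows:
        entity = row[id_key]
        for attribute, value in row.items():
            if attribute != id_key and value is not None:
                triples.append((entity, attribute, value))
    return triples


def match(triples: List[Triple], attribute: str, value: Any = None) -> List[Any]:
    """Return the sorted entities that have `attribute` (optionally equal to `value`)."""
    return sorted(
        {e for e, a, v in triples if a == attribute and (value is None or v == value)}
    )


# Hypothetical tables from two different sources (invented data).
clinical = [
    {"patient": "p1", "diagnosis": "melanoma", "age": 61},
    {"patient": "p2", "diagnosis": "lung", "age": 54},
]
expression = [
    {"patient": "p1", "gene_high": "CD8A"},
    {"patient": "p2", "gene_high": None},  # missing value: no triple is emitted
]

# Once both sources are triples, they live in one collection of facts.
facts = rows_to_triples(clinical, "patient") + rows_to_triples(expression, "patient")

# A query that spans both sources: melanoma patients with high CD8A.
melanoma = set(match(facts, "diagnosis", "melanoma"))
cd8_high = set(match(facts, "gene_high", "CD8A"))
both = sorted(melanoma & cd8_high)
```

Because every fact has the same three-part shape, adding a new data source means adding more triples rather than redesigning tables, which is the property the takeaways attribute to a shared data model.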
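
The point about immutability and reproducibility can also be sketched. The Python below is a toy append-only fact log, assuming a simplified model in which each change is recorded with a transaction number and nothing is ever overwritten. It is not the Datomic API; the class `FactLog` and its methods `transact` and `as_of` are hypothetical names chosen for this illustration, as is the sample data.

```python
from typing import Any, Iterable, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, Any]  # (entity, attribute, value)


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: Any
    tx: int       # transaction number that recorded this fact
    added: bool   # True for an assertion, False for a retraction


class FactLog:
    """Toy append-only log: changes are recorded, never applied in place."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._tx = 0

    def transact(
        self, assertions: Iterable[Triple], retractions: Iterable[Triple] = ()
    ) -> int:
        """Append retractions and assertions under a new transaction number."""
        self._tx += 1
        for e, a, v in retractions:
            self._facts.append(Fact(e, a, v, self._tx, False))
        for e, a, v in assertions:
            self._facts.append(Fact(e, a, v, self._tx, True))
        return self._tx

    def as_of(self, tx: int) -> Set[Triple]:
        """Replay the log up to and including `tx` to rebuild that view."""
        current: Set[Triple] = set()
        for fact in self._facts:
            if fact.tx > tx:
                break  # the log is ordered by transaction number
            triple = (fact.entity, fact.attribute, fact.value)
            if fact.added:
                current.add(triple)
            else:
                current.discard(triple)
        return current


# Hypothetical sample whose processing status changes over time.
log = FactLog()
t1 = log.transact([("s1", "status", "raw")])
t2 = log.transact(
    [("s1", "status", "normalized")],
    retractions=[("s1", "status", "raw")],
)

view_then = log.as_of(t1)  # the data as an analysis run at t1 saw it
view_now = log.as_of(t2)   # the data after the update
```

An analysis that records the transaction number it ran against can be rerun later on exactly the same view of the data, even after newer facts have been added.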