Clojure Where it Counts: Tidying Data Science Workflows - Pier Federico Gherardini & Ben Kamphaus
Learn how to streamline data science workflows with Clojure, leveraging Datomic's schema-agnostic data structures and query engine to integrate complex, heterogeneous data with ease.
- Data sets in R can be represented much like RDF data: each value becomes an entity-attribute-value fact, as in a triple store.
- Cancer research involves complex data integration from multiple sources, including clinical information, gene expression profiles, and imaging data.
- Datomic allows for efficient querying and integration of heterogeneous data sets.
- A critical aspect of data science is being able to combine data from different sources to extract meaningful insights.
- The Datomic Meta Model provides a consistent data model across all data sources.
- A key goal is to break down data silos by making data more accessible and easier to integrate.
- Working with complex data sets often means constructing queries by hand, which is error-prone and inefficient.
- Schema-agnostic data structures (such as datoms) enable better handling of complex data.
- Cogito’s Datalog parsing runs in R, translating queries into Datomic-compatible structures.
- The Datomic query engine can optimize queries using various heuristics.
- Creating a common schema for disparate data sources allows for more efficient querying and analysis.
- Immutable data structures ensure that modifications are tracked and reproducible, supporting analytical reproducibility.
- Integrating R with Datomic lets data scientists focus on modeling and analysis rather than on data storage and retrieval.
- The goal of Cognitex is to empower analysts with better tools and workflows for managing complex data sets.
- The project comprises several components, including data ingestion, query optimization, and visualization.
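
To make the points about triple-like facts and cross-source integration concrete, here is a minimal sketch in plain Python. It is not Datomic's API and not the speakers' tooling: the `Fact` type, the `rows_to_facts` and `query` helpers, the attribute names, and the sample patient data are all hypothetical and invented for illustration. The sketch melts two small tables into entity-attribute-value facts, then answers a question that spans both sources by matching patterns with shared variables, loosely in the spirit of a Datalog query.

```python
from typing import Any, NamedTuple


class Fact(NamedTuple):
    """One entity-attribute-value fact (hypothetical stand-in for a triple)."""
    entity: str
    attribute: str
    value: Any


def rows_to_facts(rows, id_column, namespace):
    """Melt table rows into facts, one per non-id, non-missing cell."""
    facts = []
    for row in rows:
        entity = row[id_column]
        for column, value in row.items():
            if column == id_column or value is None:
                continue
            facts.append(Fact(entity, f"{namespace}/{column}", value))
    return facts


def is_variable(term):
    """Pattern terms that start with '?' are treated as variables."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, fact, bindings):
    """Unify one pattern with one fact; return extended bindings or None."""
    new_bindings = dict(bindings)
    for term, actual in zip(pattern, fact):
        if is_variable(term):
            if term in new_bindings and new_bindings[term] != actual:
                return None  # conflicts with an earlier binding
            new_bindings[term] = actual
        elif term != actual:
            return None  # constant in the pattern does not match
    return new_bindings


def query(facts, patterns):
    """Naive nested-loop join: every pattern must match under one binding set."""
    results = [{}]
    for pattern in patterns:
        results = [
            extended
            for bindings in results
            for fact in facts
            if (extended := match_pattern(pattern, fact, bindings)) is not None
        ]
    return results


# Two made-up sources with different shapes.
clinical = [
    {"patient": "p1", "diagnosis": "melanoma", "age": 61},
    {"patient": "p2", "diagnosis": "glioma", "age": 47},
]
expression = [
    {"measurement": "m1", "patient": "p1", "gene": "CD8A", "tpm": 35.2},
    {"measurement": "m2", "patient": "p2", "gene": "CD8A", "tpm": 4.1},
    {"measurement": "m3", "patient": "p1", "gene": "PDCD1", "tpm": 12.0},
]

facts = (
    rows_to_facts(clinical, "patient", "clinical")
    + rows_to_facts(expression, "measurement", "expression")
)

# CD8A expression for melanoma patients, joining both sources on ?p.
results = query(
    facts,
    [
        ("?p", "clinical/diagnosis", "melanoma"),
        ("?m", "expression/patient", "?p"),
        ("?m", "expression/gene", "CD8A"),
        ("?m", "expression/tpm", "?tpm"),
    ],
)
print(results)
```

Because both tables end up in the same fact shape, the join needs no table-specific code: the shared variable `?p` is the only thing linking the clinical and expression sources. A real query engine would use indexes and reorder patterns rather than scanning every fact for every pattern as this sketch does.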