Spark Application Coexisting w/ NoSQL Databases - Gokul Prabagaren & Nagesh Kumar Vinnakota

Designing a schema for a NoSQL database requires careful planning. This talk shows how Spark applications can coexist with MongoDB and Cassandra while improving query performance and scalability.

Key takeaways
  • Designing a schema for a NoSQL database requires a deep understanding of the data and its relationships.
  • Distributed NoSQL databases such as Cassandra and MongoDB need careful planning and tuning to perform well.
  • Incorrect partitioning can lead to slow query performance and data inconsistencies.
  • Using the right partitioner is crucial for efficient data storage and retrieval in distributed NoSQL databases.
  • Pushing filters and projections down to the database server can improve query performance and reduce the amount of data transferred over the network.
  • Incorrectly modeling data can lead to unwanted results and anti-patterns in NoSQL databases.
  • MongoDB is suitable for use cases that require flexible schema and high scalability, while Cassandra is suitable for use cases that require low latency and high availability.
  • Spark applications can use NoSQL databases such as MongoDB and Cassandra to improve query performance and scalability.
  • Co-locating Spark executors with the database nodes improves data locality, which can speed up queries and reduce data transfer over the network.
  • The right partitioner can reduce the number of partition splits and improve query performance in distributed NoSQL databases.
  • Schema design in NoSQL databases is critical to achieving optimal performance and scalability.
  • The right data model and schema can improve query performance and reduce data inconsistencies in NoSQL databases.
  • Combining Spark with NoSQL databases can improve the performance and scalability of big data applications.
  • The right caching mechanism can improve query performance and reduce data inconsistencies in NoSQL databases.
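The partitioning takeaways above can be illustrated with a small, self-contained sketch. It assumes a hypothetical four-node cluster and uses an MD5-based hash of the partition key with simple modulo placement as a stand-in for a real partitioner. Real partitioners, such as Cassandra's default Murmur3Partitioner, use their own hash functions and token-range ownership. The sketch compares a high-cardinality partition key with a low-cardinality one: the latter can only ever reach as many nodes as it has distinct values, which leaves the remaining nodes idle and creates hot spots.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size


def node_for(partition_key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a partition key to a node by hashing it.

    Modulo placement is a simplification of real token-range ownership.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")  # 64-bit token derived from the key
    return token % num_nodes


def distribution(keys, num_nodes: int = NUM_NODES) -> list:
    """Count how many rows land on each node."""
    counts = Counter(node_for(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


# Hypothetical rows: 1,000 orders from 1,000 customers in only 2 countries.
rows = [
    {"customer_id": f"cust-{i}", "country": "US" if i % 2 == 0 else "CA"}
    for i in range(1000)
]

by_customer = distribution(r["customer_id"] for r in rows)
by_country = distribution(r["country"] for r in rows)

print("partition key = customer_id:", by_customer)
print("partition key = country:    ", by_country)
```

With `customer_id` as the key, the rows spread roughly evenly across the nodes. With `country` as the key, at most two nodes hold all of the data, no matter how large the cluster grows.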
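To make the pushdown takeaway above concrete, the following sketch simulates a hypothetical `FakeStore`. The store can either apply a filter and a column projection on the server before sending rows, or ship the whole table for the client (for example, a Spark executor) to filter. The store, its `scan` method, and the "values sent" metric are illustrative assumptions, not a connector API. Both paths return the same result, and only the volume of data transferred differs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class FakeStore:
    """Hypothetical database server that counts the values it sends over the 'network'."""
    rows: List[Row]
    values_sent: int = 0

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None,
             columns: Optional[List[str]] = None) -> List[Row]:
        # Server-side filtering (predicate pushdown).
        selected = [r for r in self.rows if predicate is None or predicate(r)]
        # Server-side column pruning (projection pushdown).
        if columns is not None:
            selected = [{c: r[c] for c in columns} for r in selected]
        # Every value returned here is "transferred" to the client.
        self.values_sent += sum(len(r) for r in selected)
        return selected


# Hypothetical table: 1,000 orders, 1 in 10 shipped, each with a bulky payload.
table = [
    {"id": i, "status": "shipped" if i % 10 == 0 else "pending", "payload": "x" * 100}
    for i in range(1000)
]

# Without pushdown: fetch everything, then filter and project on the client.
no_push = FakeStore(table)
client_side = [{"id": r["id"]} for r in no_push.scan() if r["status"] == "shipped"]

# With pushdown: the server filters and projects before sending.
push = FakeStore(table)
server_side = push.scan(predicate=lambda r: r["status"] == "shipped", columns=["id"])

print("values sent without pushdown:", no_push.values_sent)  # 3000
print("values sent with pushdown:   ", push.values_sent)     # 100
```

In Spark, the Cassandra and MongoDB connectors can translate supported DataFrame filters into server-side queries. The physical plan printed by `DataFrame.explain()` is one way to check which filters actually reached the source.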
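The co-location takeaway above rests on data locality: a task reads faster when it runs on a host that already stores the data it needs. Spark's scheduler tries to honor each partition's preferred locations, and connectors such as the Spark Cassandra Connector can report which hosts own each split. The sketch below is a simplified, hypothetical version of that idea, not Spark's actual scheduler. The replica map and host names are made up. It assigns each partition to an executor on one of its replica hosts when one is available and counts how many reads must cross the network.

```python
from typing import Dict, List

# Hypothetical cluster: which hosts hold a replica of each data partition.
replicas: Dict[str, List[str]] = {
    "p0": ["node1", "node2"],
    "p1": ["node2", "node3"],
    "p2": ["node3", "node1"],
    "p3": ["node4", "node1"],
}


def schedule(replicas: Dict[str, List[str]], executor_hosts: List[str]) -> Dict[str, str]:
    """Assign each partition to an executor host, preferring a host with a local replica."""
    assignment = {}
    for i, (partition, hosts) in enumerate(sorted(replicas.items())):
        local = [h for h in hosts if h in executor_hosts]
        # Fall back to round-robin over executors when no replica is local.
        assignment[partition] = local[0] if local else executor_hosts[i % len(executor_hosts)]
    return assignment


def remote_reads(assignment: Dict[str, str], replicas: Dict[str, List[str]]) -> int:
    """Count partitions whose assigned host holds no replica (data must cross the network)."""
    return sum(1 for p, host in assignment.items() if host not in replicas[p])


# Executors co-located with the database nodes vs. on a separate compute cluster.
co_located = schedule(replicas, ["node1", "node2", "node3", "node4"])
separate = schedule(replicas, ["spark1", "spark2"])

print("remote reads, co-located:", remote_reads(co_located, replicas))  # 0
print("remote reads, separate:  ", remote_reads(separate, replicas))    # 4
```

A real scheduler also balances load across executors and waits only a bounded time (configured by `spark.locality.wait`) for a local slot before falling back to a less local one.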