Reimagining the Relational Database for the Cloud Era
Relational databases provide a powerful platform for handling data and enable developers to rapidly create rich applications. However, while the foundational mathematics, language and model scale well, current implementations have placed unnatural limits on their usefulness. deepSQL reimagines the implementation of the relational database, unleashing its performance, scalability and flexibility through three fundamental principles:
- Minimize information to the smallest storage representation on disk, in memory and in CPU cache
- Collapse operations to the lowest cost needed to service requests
- Maximize parallel utilization across all available resources
The result is a NewSQL solution that frees the relational database to concurrently service any combination of workloads with low-latency performance at cloud scale, while retaining the well-known MySQL feature set and API, including JSON support.
Continuously Adaptive Sequential Summarization of Information (CASSI)
At the heart of deepSQL is CASSI, the combination of architecture, intelligence and process that continuously organizes and serves data based on the unique characteristics of current workloads, the available virtual or physical hardware resources and their cost efficiency. Three interdependent components inform, organize and optimize in a continuous loop: dynamic data structures, the kernel scheduler and the intelligence engine.
Dynamic Data Structures
deepSQL separates the data structures in memory, on disk and in CPU cache into independent entities. They are dynamically organized depending on the composition of the data and the needs of the application as expressed through the workloads using the system. These structures are represented virtually and include mechanisms that allow for data summarization. The result is a highly efficient information representation that accelerates CRUD operations while minimizing IOPS.
All transactions, including delta changes, are stored to disk as streaming, append-only operations in segmented column stores. This provides many of the benefits found in OLAP systems without sacrificing the transactional performance and compliance needed for OLTP systems. It virtually eliminates seeks during writes and dramatically reduces seeks during read operations, significantly improving IOPS efficiency and disk throughput for both operations.
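The append-only write path described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not deepSQL's actual implementation: a column accumulates values into fixed-size segments, every write is a sequential append to the open segment, and reads simply stream the segments in order, so no seeks occur on the write path.

```python
# Sketch of an append-only segmented column store (illustrative only).
SEGMENT_SIZE = 4  # rows per segment; deepSQL sizes segments dynamically

class AppendOnlyColumn:
    def __init__(self):
        self.segments = [[]]  # sealed segments plus one open segment

    def append(self, value):
        if len(self.segments[-1]) >= SEGMENT_SIZE:
            self.segments.append([])  # seal the full segment, open a new one
        self.segments[-1].append(value)  # always a sequential append

    def scan(self):
        # Reads stream segments in order -- no random seeks.
        for seg in self.segments:
            yield from seg

col = AppendOnlyColumn()
for v in [10, 20, 30, 40, 50]:
    col.append(v)
print(len(col.segments))   # 2: one sealed size-4 segment plus one open segment
print(list(col.scan()))    # [10, 20, 30, 40, 50]
```

In a real engine the segments would be flushed to disk as immutable column files; the point here is only that writes never modify data in place.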
Figure: Dynamic Virtual Data Representation and Summarization
CPU Cache Structures
Data and operational (code) expressions are collapsed into the minimal viable form and linearly organized in cache. Combined with lockless algorithms that minimize context switching, the CPU operates at maximum parallel efficiency regardless of workload combination.
Dynamic In-Memory Structures
deepSQL’s Dynamic Virtual Data Representation breaks table data into logical segments, or chunks, that are dynamically sized for maximum efficiency based on data type, entropy and workload request characteristics. Logical segments are then represented by virtual segments that contain summary metadata (cardinality, counts, etc.) without containing the data itself.
These virtual segments are further aggregated into summary segments, which contain summary metadata on the virtual segments, creating a hierarchy of information on the data. Not only does this capability use the least amount of memory necessary to represent all of the data in the system, but it also allows highly efficient, low-latency in-memory operations on the least amount of information needed to satisfy the request.
Dynamic Summarization uses the information in virtual and logical segments to build a predictive model of how the data changes over time and to plan data organization and layout.
For example, if a table has a column named Gender with two possible values, male and female, Dynamic Summarization will store the count of each value as metadata in a summary segment. Instead of counting the males when a query arrives, the answer is already available in-memory. This combination of segment types enables the data to be represented in a summarization hierarchy that is predictively organized.
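The Gender example can be made concrete with a small Python sketch. The data structures below are my own assumptions for illustration, not deepSQL's internal ones: logical segments hold the actual rows, virtual segments hold only per-segment metadata (here, value counts), and a summary segment aggregates that metadata so a count query is answered without ever scanning the rows.

```python
# Sketch of a summarization hierarchy (assumed structures, illustrative only).
from collections import Counter

# Logical segments: the row data itself.
logical_segments = [
    ["male", "female", "female"],
    ["male", "male", "female"],
]

# Virtual segments: per-segment summary metadata, no row data.
virtual_segments = [Counter(seg) for seg in logical_segments]

# Summary segment: aggregates the virtual segments' metadata.
summary_segment = sum(virtual_segments, Counter())

def count_where_gender(value):
    # Answered entirely from in-memory metadata; logical segments are untouched.
    return summary_segment[value]

print(count_where_gender("male"))    # 3
print(count_where_gender("female"))  # 3
```

When rows are appended, only the affected virtual segment and the summary above it need updating, which is what keeps the hierarchy cheap to maintain.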
It is this organizational process that allows deepSQL to scale to table sizes beyond 1 trillion rows while enabling low-latency performance at scale for concurrent transaction and analytic requests. The result is a database that delivers in-memory performance yet transcends the limits of physical system memory while maintaining full ACID compliance even on a single node.
Kernel Scheduler
The kernel scheduler intelligently controls when and where work is performed, resulting in a virtually lockless database. Through Dynamic Resource Awareness, it understands all the hardware available to the system and calculates its capabilities. It detects whether the database is running on a physical or virtual platform and adjusts behaviors such as kernel- or user-mode scheduling accordingly.
Work such as compression and data reorganization can be performed outside of the transaction work stream to maintain maximum system throughput while eliminating stranded system resources. The result is massive parallelism for concurrent workloads that is linearly scalable based on CPU and memory while requiring minimal IOPS.
Intelligence Engine
The Intelligence Engine monitors the system holistically and uses machine learning to make predictive determinations about how best to reorganize data across the system. The result is an unsupervised learning process that continually optimizes the database without taking it offline.
Figure: Continuously Adaptive Sequential Summarization of Information
Optimization Feedback Loop
CASSI models, learns and predictively optimizes for unique workload usage patterns, dynamically adjusting data as the conditions of the workload change over time. This adaptive process is a self-tuning loop encompassing four stages: observe & analyze, predict & adapt, orchestrate & optimize, and close the loop.
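The four stages above can be sketched as a simple control loop. This is my own framing for illustration, not CASSI's actual code; the stage functions and the latency/segment-size knobs are invented stand-ins for whatever the real engine observes and tunes.

```python
# Sketch of a four-stage self-tuning loop (illustrative only).
def observe(stats_log):
    # Observe & analyze: summarize recent measurements (here, mean latency).
    return sum(stats_log) / len(stats_log)

def predict(observed):
    # Predict & adapt: set a target from the model (here, aim 10% lower).
    return observed * 0.9

def orchestrate(layout, target):
    # Orchestrate & optimize: adjust the physical layout toward the target.
    return {**layout, "segment_size": layout["segment_size"] * 2}

def close_loop(stats_log, new_sample):
    # Close the loop: feed the measured outcome back into the observations.
    stats_log.append(new_sample)

layout = {"segment_size": 1024}
stats = [12.0, 11.0, 13.0]          # simulated latency samples
for _ in range(2):
    observed = observe(stats)
    target = predict(observed)
    layout = orchestrate(layout, target)
    close_loop(stats, observed * 0.95)  # simulated measured improvement
print(layout["segment_size"])        # 4096 after two doublings
```

The essential property is the last step: each pass measures the effect of its own reorganization, so the next pass optimizes against current rather than historical conditions.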