Reimagining the Relational Database for the Cloud Era

Relational databases provide a powerful platform for handling data and enable developers to rapidly create rich applications. However, even though the foundational mathematics, language and model scale extremely well, current implementations have placed artificial limits on their usefulness. deepSQL has reimagined the implementation of the relational database, unleashing its performance, scalability and flexibility through three fundamental principles:

  • Minimize information to the smallest storage representation on disk, in memory and in CPU cache
  • Collapse operations to the lowest cost needed to service requests
  • Maximize parallel utilization across all resources

The result is a NewSQL solution that frees the relational database to concurrently service any combination of workloads with low-latency performance at cloud scale, while retaining the well-known MySQL features and API, including JSON support.
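
Because deepSQL retains the standard MySQL API and wire protocol, existing client code should work unchanged. A minimal sketch using the stock mysql-connector-python driver (the host, credentials and events table are hypothetical placeholders, not taken from deepSQL documentation):

```python
# Minimal sketch: a stock MySQL client pointed at a deepSQL server.
# The host, credentials and `events` table are hypothetical placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="db.example.com", user="app", password="secret", database="demo"
)
cur = conn.cursor()

# Standard MySQL JSON functions apply, since the API is MySQL-compatible.
cur.execute("SELECT JSON_EXTRACT(payload, '$.device_id') FROM events LIMIT 5")
for (device_id,) in cur:
    print(device_id)

cur.close()
conn.close()
```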

Continuously Adaptive Sequential Summarization of Information (CASSI)

At the heart of deepSQL is CASSI, the combination of architecture, intelligence and process that continuously organizes and services data based on the unique characteristics of current workloads, the available virtual or physical hardware resources and their cost efficiency. Three interdependent components inform, organize and optimize in a continuous loop: dynamic data structures, the kernel scheduler and the intelligence engine.

Dynamic Data Structures

deepSQL separates data structures in memory, on disk and in CPU cache into independent entities. They are dynamically organized depending on the composition of the data and the needs of the application, as expressed through the workloads using the system. These structures are represented virtually and include mechanisms that allow for data summarization. The result is highly efficient information representation that accelerates CRUD operations while minimizing IOPS.

On-Disk Structures

All transactions, including delta changes, are stored to disk as streaming, append-only operations in segmented column stores. This provides many of the benefits found in OLAP systems without sacrificing the transactional performance and compliance needed for OLTP systems. It virtually eliminates seeks during writes and dramatically reduces seeks during read operations, significantly improving IOPS efficiency and disk throughput for both operations.
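
As a rough illustration of the write path (my own sketch under simple assumptions, not deepSQL internals), each column buffers incoming values and appends them to its file as packed segments, so ingest consists purely of sequential writes:

```python
# Toy sketch of append-only, segmented column storage (illustrative only).
# Each flush appends one packed segment to the end of the column's file,
# so writes are strictly sequential -- no seeks.
import struct

class ColumnSegmentWriter:
    def __init__(self, path, fmt="q", segment_rows=4096):
        self.file = open(path, "ab")   # append-only: every write goes to the end
        self.fmt = fmt                 # "q" = signed 64-bit integer values
        self.segment_rows = segment_rows
        self.buffer = []

    def append(self, value):
        self.buffer.append(value)
        if len(self.buffer) >= self.segment_rows:
            self.flush()

    def flush(self):
        if self.buffer:
            # One sequential write per segment; a header records the row count.
            payload = struct.pack(
                f"<I{len(self.buffer)}{self.fmt}", len(self.buffer), *self.buffer
            )
            self.file.write(payload)
            self.buffer = []

writer = ColumnSegmentWriter("amount.col")
for value in range(10_000):
    writer.append(value)
writer.flush()
writer.file.close()
```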

[Figure: Dynamic Virtual Data Representation and Summarization]

CPU Cache Structures

Data and operational (code) expressions are collapsed into their minimum viable form and organized linearly in cache. Combined with lockless algorithms that minimize context switching, this lets the CPU operate at maximum parallel efficiency regardless of the workload combination.
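
The effect of linear organization can be illustrated even from Python (a sketch of the principle only, not of deepSQL's cache layout): scanning values packed contiguously touches memory sequentially, while chasing per-row objects scatters accesses across the heap:

```python
# Sketch of the linearity principle: contiguous storage scans faster than
# pointer-chasing, because the hardware sees a linear access pattern.
from array import array
import timeit

N = 1_000_000
packed = array("q", range(N))                 # contiguous 64-bit ints
boxed = [{"value": i} for i in range(N)]      # scattered per-row objects

print("packed:", timeit.timeit(lambda: sum(packed), number=10))
print("boxed: ", timeit.timeit(lambda: sum(r["value"] for r in boxed), number=10))
```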

Dynamic In-Memory Structures

deepSQL’s Dynamic Virtual Data Representation breaks table data into logical segments, or chunks, that are dynamically sized for maximum efficiency based on data type, entropy and workload request characteristics. Logical segments are then represented by virtual segments that contain summary metadata (cardinality, counts, etc.) without containing the data itself.

These virtual segments are further aggregated into summary segments, which contain summary metadata on the virtual segments, creating a hierarchy of information about the data. Not only does this capability use the least amount of memory necessary to represent all of the data in the system, it also allows highly efficient, low-latency in-memory operations on the least amount of information needed to satisfy a request.
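
As a mental model (my sketch; the real structures are deepSQL's own), a virtual segment holds only statistics over one logical segment, and a summary segment aggregates those statistics one level up, so many questions can be answered without touching row data:

```python
# Rough sketch of the segment hierarchy described above (illustrative only).
from dataclasses import dataclass

@dataclass
class VirtualSegment:
    # Summary metadata about one logical segment -- no row data lives here.
    row_count: int
    min_value: int
    max_value: int
    cardinality: int

def summarize(values):
    """Build a virtual segment from one logical segment of raw values."""
    return VirtualSegment(len(values), min(values), max(values), len(set(values)))

@dataclass
class SummarySegment:
    # One level up: metadata about a group of virtual segments.
    children: list

    @property
    def row_count(self):
        return sum(c.row_count for c in self.children)

    @property
    def min_value(self):
        return min(c.min_value for c in self.children)

    @property
    def max_value(self):
        return max(c.max_value for c in self.children)

chunks = [[1, 2, 2, 9], [4, 4, 7], [3, 8, 8, 8]]
root = SummarySegment([summarize(chunk) for chunk in chunks])
print(root.row_count, root.min_value, root.max_value)  # answered from metadata alone
```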

Dynamic Summarization

Dynamic Summarization uses the information in virtual and logical segments to create a predictive model of how the data is changing over time and to plan the data organization and layout.

For example, if a table has a column named Gender with values of male or female, Dynamic Summarization will record data such as the count of males and females as metadata in a summary segment. Instead of counting the number of males when a query comes in, the answer is already available in memory. This combination of segment types enables the data to be represented in a predictively organized summarization hierarchy.
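
In code, the Gender example might look something like the following sketch (names and structure are mine, assuming counts are maintained incrementally on write):

```python
# Illustrative sketch of the Gender example: value counts are kept as summary
# metadata on write, so the count query reads memory instead of scanning rows.
from collections import Counter

class SummarizedColumn:
    def __init__(self):
        self.rows = []                  # underlying data (on disk in practice)
        self.value_counts = Counter()   # summary metadata, updated per write

    def insert(self, value):
        self.rows.append(value)
        self.value_counts[value] += 1   # metadata maintained incrementally

    def count(self, value):
        # Equivalent of SELECT COUNT(*) ... WHERE gender = value, answered
        # in O(1) from the summary rather than O(n) from a table scan.
        return self.value_counts[value]

gender = SummarizedColumn()
for g in ["male", "female", "female", "male", "female"]:
    gender.insert(g)
print(gender.count("male"))  # 2, without counting rows at query time
```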

It is this organizational process that allows deepSQL to scale to table sizes beyond 1 trillion rows while enabling low-latency performance at scale for concurrent transaction and analytic requests. The result is a database that delivers in-memory performance yet transcends the limits of physical system memory while maintaining full ACID compliance even on a single node.

Kernel Scheduler

The kernel scheduler intelligently controls when and where work is performed, resulting in a virtually lockless database. Through Dynamic Resource Awareness, it understands all of the hardware available to the system and calculates its capabilities. It detects whether the database is running on a physical or virtual platform and adjusts behaviors such as kernel- or user-mode scheduling accordingly.

Work such as compression and data reorganization can be performed outside of the transaction work stream to maintain maximum system throughput while eliminating stranded system resources. The result is massive parallelism for concurrent workloads that is linearly scalable based on CPU and memory while requiring minimal IOPS.
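
One way to picture moving maintenance off the transaction path (a sketch assuming a simple worker-queue model, not deepSQL's actual scheduler):

```python
# Sketch: maintenance work (compression, reorganization) is handed to a
# background worker so the transaction path never blocks on it.
import queue
import threading

maintenance = queue.Queue()

def worker():
    while True:
        task = maintenance.get()
        task()                       # e.g. compress or reorganize a segment
        maintenance.task_done()

threading.Thread(target=worker, daemon=True).start()

def commit_transaction(segment):
    # Fast path: acknowledge the write immediately...
    print(f"committed segment {segment}")
    # ...and defer the expensive reorganization to spare CPU capacity.
    maintenance.put(lambda: print(f"compressed segment {segment}"))

for s in range(3):
    commit_transaction(s)
maintenance.join()                   # wait for background work to drain
```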

Intelligence Engine

The Intelligence Engine monitors the system holistically and uses machine learning to make predictive determinations about how best to reorganize data across the system. The result is an unsupervised learning system that continually optimizes the database without taking it offline.

[Figure: Continuously Adaptive Sequential Summarization of Information]

Optimization Feedback Loop

CASSI models, learns and predictively optimizes for unique workload usage patterns, dynamically adjusting data as workload conditions change over time. This adaptive process is a self-tuning loop encompassing four stages: Observe & Analyze, Predict & Adapt, Orchestrate & Optimize and Closing the Loop. A code sketch of the loop's control flow follows the list below.

  • Observe & Analyze

    On startup, CASSI scans the host hardware environment to determine the resources available to it, including CPUs, cores, caches, memory, network and storage, and assigns them costs for use during operations. As data flows into the system, CASSI observes & analyzes the volume, velocity and variety of the data and models its behavior over time.
  • Predict & Adapt

    CASSI then configures database parameters to utilize the total available resources of the system based on those cost assignments. Using machine learning algorithms, it predicts & adapts how best to represent the data for reads and writes.
  • Orchestrate & Optimize

    The system then intelligently leverages the full resources of the machine as needed. As requests for information arrive, CASSI orchestrates & optimizes data organization, reads and writes, and memory and disk representations, making the choices that achieve the highest level of performance and scale.
  • Closing the Loop

    The loop is completed as CASSI observes & analyzes the ongoing results of the automated tuning to continue the optimization process.
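
A minimal skeleton of the four-stage loop (my sketch of the control flow only; the threshold, function names and stand-in measurement are illustrative, not deepSQL internals):

```python
# Skeleton of the CASSI feedback loop: observe, predict, orchestrate, repeat.
def sample_workload():
    return 1500                      # stand-in for a real workload measurement

def observe_and_analyze(stats):
    stats["writes_per_sec"] = sample_workload()   # volume/velocity/variety
    return stats

def predict_and_adapt(stats):
    # Placeholder for the ML step: choose a layout given observed behavior.
    return "write_optimized" if stats["writes_per_sec"] > 1000 else "read_optimized"

def orchestrate_and_optimize(plan):
    print(f"reorganizing data for plan: {plan}")

stats = {}
for _ in range(3):                   # closing the loop: run continuously
    stats = observe_and_analyze(stats)
    plan = predict_and_adapt(stats)
    orchestrate_and_optimize(plan)
```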

Unprecedented Performance at Scale

deepSQL not only greatly increases database performance but also drives scale into the hundreds of billions of rows. With other databases, you must choose to optimize for either reads or writes; deepSQL provides true hybrid transactional (OLTP) and analytical (OLAP) processing on the same data set while increasing performance at scale.

USE CASES

All tests were performed on MySQL 5.5, comparing MySQL with Deep Engine against MySQL with InnoDB.

Streaming Data Test (Machine-to-Machine)

iiBench maximum transactions per second with a single index
(25 clients, 4 GB cache, 32 cores, HDDs)

  • MySQL with Deep Engine: 3.79M/sec
  • MySQL with InnoDB: 217K/sec
  • Improvement: 17x

Transactional Workload Test (Financial)

Sysbench transaction rate
(1M rows, 32 cores, 4 GB cache, HDDs)

  • MySQL with Deep Engine: 15,083/sec
  • MySQL with InnoDB: 1,381/sec
  • Improvement: 11x

Complex Transactional Test (e-Commerce)

DBT-2 transaction rate
(50 clients, 20 warehouses, scale=1, HDDs)

  • MySQL with Deep Engine: 205,184/min
  • MySQL with InnoDB: 15,086/min
  • Improvement: 13.6x

Social Media Transactional Test (Twitter)

iiBench with 250M rows, 7 indexes with composite keys
(24 clients, 4 GB cache, 250M rows loaded, 24 cores, SSDs)

  • Database creation: Deep Engine 15 minutes, InnoDB 24 hours (96x improvement)
  • First query from cold start: Deep Engine 50 seconds, InnoDB 5.5 minutes (6.6x improvement)
  • Second query from cold start: Deep Engine 1 second, InnoDB 240 seconds (240x improvement)
  • Disk storage footprint (uncompressed): Deep Engine 29 GB, InnoDB 60 GB (42% improvement)