A New Look at Data Modeling in a NoSQL World
Schemaless by Design
By Michael Elkin, DBS-H Co-Founder
This article provides a fresh look at data modeling, by addressing the challenge of integrating increasingly popular NoSQL databases across the enterprise with SQL databases. We explore some of the concepts of NoSQL database design, which is radically different from SQL design, and look at how the business perspective of data can be leveraged for sound NoSQL database modeling. To that end, the Document Relations Diagram (DRD) is a model that facilitates schemaless database mapping, by virtue of its user-friendly and visual display. The benefits of denormalization for increasing data accessibility are also discussed.
How to Integrate SQL and NoSQL Data?
MongoDB, Cassandra, Couchbase, and Hadoop have become key players in the big data arena, thanks to their performance, ability to scale out, agile development, reduced downtime, flexible data modeling, and cloud deployment.
Designed for speed and a specific purpose, NoSQL databases don’t require the rigid database schemas of SQL. As a result, NoSQL databases allow for quick deployment and adaptation. They can handle data sets with diverse structures and fields, and run well in distributed modes.
Although NoSQL technology can outperform SQL databases on some workloads, SQL databases are still crucial to data management, by virtue of their standardized handling of transactions.
As NoSQL technology becomes more prevalent in the enterprise scene, data architects are faced with the challenge of designing data models that incorporate schemaless systems. According to IT professionals at the Enterprise Data World 2015 conference in Washington, D.C., neglecting the important issue of data modeling could lead to database disorder. Common data modeling practices need to change so that databases remain manageable and NoSQL and SQL databases can co-exist seamlessly in the same enterprise. NoSQL technologies cannot be leveraged without first thinking about how they fit into the larger IT picture.
What is a Schemaless Database?
Unlike SQL database design, NoSQL database design is not about tables and joins; it requires a paradigm shift to think about a whole new non-relational data structure.
Working with NoSQL systems requires “a fundamental mindset change” on the part of data architects who are versed in the ways of relational databases, according to Donovan Hsieh, a senior enterprise data architect at eBay Inc.
Moreover, NoSQL is not a single system; it includes four primary database models: key-value stores, wide column stores, graph databases, and document databases. Document databases are the focus of this article.
Diagram 1: NoSQL Database Models
When developers can clearly visualize the NoSQL design, it becomes clear that schema design is required in the NoSQL data world, just as it is for SQL databases.
A direct link exists between schemaless design and dynamically typed languages. Since it’s easy to represent schemaless design constructs in PHP, Python and Ruby, mapping SQL to the NoSQL database should be a natural fit.
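The link between schemaless design and dynamically typed languages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the passenger records, the field names, and the `row_to_document` helper are hypothetical and not part of any particular database driver.

```python
import json

# Hypothetical "passengers" collection: the two documents do not share
# the same fields, which a schemaless store accepts without a migration.
passengers = [
    {"_id": 1, "name": "Ana Ruiz", "email": "ana@example.com"},
    {"_id": 2, "name": "Ben Okafor",
     "frequent_flyer": {"tier": "gold", "miles": 82000}},
]


def row_to_document(columns, row):
    """Map one SQL result row (a tuple) to a document.

    Columns holding NULL (None) are left out instead of being stored.
    """
    return {col: val for col, val in zip(columns, row) if val is not None}


# Illustrative row, shaped like one returned by a SQL cursor.
columns = ("_id", "name", "email", "phone")
row = (3, "Chen Li", None, "+1-555-0100")
passengers.append(row_to_document(columns, row))

for doc in passengers:
    print(json.dumps(doc, sort_keys=True))
```

A Python dictionary maps directly onto a document, so the relational row needs no intermediate class or fixed column list to become one.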
What’s Your Point of View?
For organizations to succeed in managing all their data, they need to address the main challenge of integrating SQL and NoSQL database models. Right now, companies are looking for a solution to map their SQL-based relational data to NoSQL-based non-relational data.
Since NoSQL data modeling is not as well researched as SQL database design, it lacks the systematic theory found in relational databases. Contrary to popular opinion, NoSQL data design often requires a deeper understanding of the business perspective of data. This article outlines some of the basic design principles required for good NoSQL database modeling.
Whereas relational modeling typically depends on the structure of available data, NoSQL data modeling is driven by the types of queries. The main design theme in NoSQL database design is “what questions do I have?” or “what is my point of view?”
When putting data into a NoSQL database, developers should have a usage pattern in mind. If developers know the questions that business users will ask of data, then they can build their database objects accordingly. The questions that users ask, and not the data itself, determine how to model database mapping.
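As a rough sketch of query-driven modeling, the same flat ticket records can be grouped in two ways, depending on the question being asked. The records and the `build_documents` helper are assumptions made for this example, not a method described by the author.

```python
from collections import defaultdict

# Hypothetical flat ticket records, as they might come out of a SQL table.
tickets = [
    {"ticket_no": "T1", "flight_no": "DB101", "passenger": "Ana Ruiz", "seat": "12A"},
    {"ticket_no": "T2", "flight_no": "DB101", "passenger": "Ben Okafor", "seat": "12B"},
    {"ticket_no": "T3", "flight_no": "DB202", "passenger": "Ana Ruiz", "seat": "3C"},
]


def build_documents(rows, root_key):
    """Group flat rows into one document per distinct value of root_key.

    The remaining fields of each row become an embedded ticket document.
    """
    grouped = defaultdict(list)
    for row in rows:
        embedded = {k: v for k, v in row.items() if k != root_key}
        grouped[row[root_key]].append(embedded)
    return [{root_key: key, "tickets": items} for key, items in grouped.items()]


# Question: "Who is on flight X?" -> one document per flight.
by_flight = build_documents(tickets, "flight_no")

# Question: "Which flights has passenger Y booked?" -> one document per passenger.
by_passenger = build_documents(tickets, "passenger")
```

The source data is identical in both cases; only the question changes which field becomes the root of the document.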
Document Relations Diagram: Visual Database Mapping
Among the various models that exist for schemaless database mapping, the Document Relations Diagram (DRD) stands out with its user-friendly and visual display of NoSQL database modeling.
The idea behind the Document Relations Diagram (DRD) is that in a NoSQL environment, architects, data scientists, analysts, DBAs, and software developers can design documents according to the business case, or point of view. When planning schemaless database design, developers need to consider the user’s point of view, and ask themselves: what is the business case for this mapping? For example, in the case of a flight database, is the focus on flights, passengers, or tickets?
Based on a proprietary algorithm developed by DBS-H, the Document Relations Diagram provides a default recommendation for a specific business case. It also enables developers to look at data from different angles, and design data modeling accordingly. A different point of view changes the mapping of embedded documents.
The Document Relations Diagram allows developers to build a new structure in the big data NoSQL database, and create links between SQL and NoSQL databases with pre-defined symbols that visually represent the elaborate mapping inherent in NoSQL schema design.
A major advantage of NoSQL schema design, as reflected in the Document Relations Diagram, is that most data can be accessed from one document. One document includes an array of other documents that have the same structure, as indicated by the [ ] symbol. Developers can customize design mapping, by adding or removing embedded documents from an array of documents.
In most cases, there is no need to join different tables. One-to-many relationships are handled through embedded documents rather than joins. With NoSQL schema design, almost everything can be found in a single place. The flexible nature of NoSQL easily accommodates big documents with complex internal structures, as well as external documents that are related via connector lines.
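To make the embedded array concrete, here is a minimal sketch of a single flight document holding an array of passenger documents with the same structure. The field names and helper functions are hypothetical; the DRD itself is a diagram, and this is only one plausible shape for the document it describes.

```python
# Hypothetical flight document. "passengers" is the array of embedded
# documents that the DRD marks with the [ ] symbol.
flight = {
    "flight_no": "DB101",
    "origin": "TLV",
    "destination": "JFK",
    "passengers": [
        {"name": "Ana Ruiz", "seat": "12A"},
        {"name": "Ben Okafor", "seat": "12B"},
    ],
}


def add_passenger(doc, name, seat):
    """Embed one more passenger document in the flight's array."""
    doc["passengers"].append({"name": name, "seat": seat})


def remove_passenger(doc, name):
    """Drop the embedded passenger document(s) with the given name."""
    doc["passengers"] = [p for p in doc["passengers"] if p["name"] != name]


def seat_map(doc):
    """Answer "who sits where?" from this one document, with no join."""
    return {p["seat"]: p["name"] for p in doc["passengers"]}


add_passenger(flight, "Chen Li", "14C")
remove_passenger(flight, "Ben Okafor")
print(seat_map(flight))
```

Everything needed to answer the seating question lives in the one document, which is the access pattern the embedded design is meant to serve.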
The following DRD example shows how data for a flight database is mapped in a NoSQL database:
Diagram 2: DRD Example for a Flight Database
Denormalization: Helping to Map SQL to NoSQL
Two basic principles of good NoSQL data design are data duplication and denormalization. Denormalization involves copying the same data into multiple documents to optimize query processing. It also tailors the user’s data to a particular data model. In general, denormalization helps with the following trade-offs:
· Input/output per query vs total data volume – Denormalization can be used to group all data that is needed to process a query in a single place. This means that for different query flows, the same data is accessed in different combinations. This requires data duplication, which increases total data volume.
· Processing complexity vs total data volume – Modeling-time normalization and consequent query-time joins increase the complexity of the query processor, especially in distributed systems. Denormalization allows you to store data in a query-friendly structure that simplifies query processing.
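Both trade-offs can be illustrated with a small sketch. The airports and flights below are illustrative data, and the function names are assumptions for this example. The normalized lookup needs a second step that plays the role of a query-time join, while the denormalized documents answer the same question in one read at the cost of duplicated airport data.

```python
import json

# Normalized form: airport details are stored once and referenced by code.
airports = {
    "TLV": {"name": "Ben Gurion", "city": "Tel Aviv"},
    "JFK": {"name": "John F. Kennedy", "city": "New York"},
}
flights = [
    {"flight_no": "DB101", "origin": "TLV", "destination": "JFK"},
    {"flight_no": "DB202", "origin": "JFK", "destination": "TLV"},
    {"flight_no": "DB303", "origin": "TLV", "destination": "JFK"},
]


def denormalize(flights, airports):
    """Copy the airport details into every flight document."""
    return [
        {
            "flight_no": f["flight_no"],
            "origin": {"code": f["origin"], **airports[f["origin"]]},
            "destination": {"code": f["destination"], **airports[f["destination"]]},
        }
        for f in flights
    ]


flight_docs = denormalize(flights, airports)


def origin_city_normalized(flight_no):
    flight = next(f for f in flights if f["flight_no"] == flight_no)
    # Second lookup: the equivalent of a join at query time.
    return airports[flight["origin"]]["city"]


def origin_city_denormalized(flight_no):
    doc = next(d for d in flight_docs if d["flight_no"] == flight_no)
    # Everything needed is already inside the one document.
    return doc["origin"]["city"]


# Serialized size as a rough stand-in for total data volume.
normalized_size = len(json.dumps(flights)) + len(json.dumps(airports))
denormalized_size = len(json.dumps(flight_docs))
print(normalized_size, denormalized_size)
```

The denormalized form serializes to more bytes because each airport is repeated in every flight that uses it, which is the extra data volume traded for simpler queries.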
With data duplicated across documents, data becomes more accessible, which increases overall agility and flexibility.