A COMPARATIVE STUDY BETWEEN GRAPH DATABASE AND RELATIONAL DATABASE
Abstract- This research documents a comprehensive evaluation of emerging graph databases, along with a benchmark study comparing them to the existing relational model. Given the ease of graphical representation that Neo4j offers, we saw an opportunity to extract details about the various attributes in the dataset and to analyze this data, presenting a statistical view alongside its popular counterpart, MySQL. The ultimate goal of this study is to determine whether a traditional relational database system such as MySQL can be replaced completely in production by a graph database such as Neo4j.
Keywords: Graph Database, Relational Database, Neo4j, Cypher, Property Graph Model.
I. INTRODUCTION
The volume of data is ever increasing. We need systems that can represent, store, and manipulate complex information, detect correlations and patterns, construct data models, and so on. Furthermore, because data is independently maintained, it can change over time or even change its base structure, making it difficult for modelling systems to accommodate these changes. Current representation and storage systems are not very flexible in dealing with structural changes, nor are they able to perform complex data manipulations of the sort mentioned above.
Relational Database Management Systems are probably the ones that we are most familiar with in 21st century computer science. Relational databases store
Data objects can model relational data or advanced data types such as graphics, movies, and audio. Smalltalk, C++, Java, and others are languages used with object-oriented databases. The object-relational model is a combination of the relational and object-oriented models. Both traditional and advanced data types can be used to construct database management systems. These systems can connect to a company’s website and update records as needed.
Database Approach
The main purpose of a database is to store data so that it can be retrieved when needed. A common language called Structured Query Language (SQL) is used to store and retrieve data in a relational database. This language enables a system to run reports, modify data, or remove data from the database. A database management system (DBMS) controls all aspects of a database, including but not limited to its creation, maintenance, and use. The DBMS ensures that the proper applications are able to access the database. An important purpose of a DBMS is to maintain the data definitions (the data dictionary) for all the data elements in the database. It also enforces data integrity and security measures.
Data Models
Data models provide a contextual framework and graphical representation that aid in the definition of data elements. In a relational database, the data model lays the foundation for the database and identifies important entities,
With advances in technology there has also been a steep increase in the crime rate. Criminal investigations map closely onto the graph database model. A crime usually has a number of sources from which it can start, and these sources can be considered the nodes of the graph. Speculation about these sources usually leads to one or more paths that add further to the case; these paths are the edges that lead to newer nodes, and together they form a graph. The main benefit of this correspondence is that it avoids building a huge relational database, and recognizing it is the first step towards constructing the graph database.
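The node-and-edge view of a case described above can be sketched in a few lines of Python. This is only an illustration, not the method used in this study: the case data, the node names, and the helper `find_path` are all hypothetical, and a plain dictionary stands in for a graph database.

```python
from collections import deque

# Hypothetical case data: each key is a node (a source or a finding),
# and each list holds the nodes its outgoing edges lead to.
case_graph = {
    "tip-off": ["suspect A"],
    "cctv footage": ["suspect A", "vehicle"],
    "suspect A": ["phone record"],
    "vehicle": ["garage"],
    "phone record": ["suspect B"],
    "garage": [],
    "suspect B": [],
}


def find_path(graph, start, goal):
    """Breadth-first search: return the shortest chain of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

Calling `find_path(case_graph, "tip-off", "suspect B")` follows the edges from one source to a later finding, which is the kind of traversal a graph database performs natively.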
A relational database consists of a collection of tables, with columns representing attributes and rows representing records. This type of database uses primary keys and foreign keys. A foreign key in one table points to the primary key of another table, and this is how tables relate to each other. This permits one-to-one, one-to-many, and many-to-many relationships between the data. An advantage of relational databases is the ease of adding or modifying tables and entities without needing to change the structure of the database already in place. Relational databases have many features, including indexing, data typing, and validation tests, all of which help to ensure data integrity.
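As a minimal sketch of how a foreign key relates two tables and protects integrity, the following uses Python's built-in sqlite3 module rather than MySQL. The `author` and `book` tables and the helper functions are hypothetical and chosen only to show a one-to-many relationship.

```python
import sqlite3

# Hypothetical schema: one author can have many books (one-to-many).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE book ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES author(id))"
)
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO book (title, author_id) VALUES (?, ?)",
    [("Graphs", 1), ("Tables", 1)],
)


def books_by(author_name):
    """Join book to author through the foreign key and return sorted titles."""
    rows = conn.execute(
        "SELECT book.title FROM book"
        " JOIN author ON book.author_id = author.id"
        " WHERE author.name = ? ORDER BY book.title",
        (author_name,),
    ).fetchall()
    return [title for (title,) in rows]


def orphan_insert_rejected():
    """Try to add a book whose author does not exist; True if it is rejected."""
    try:
        conn.execute("INSERT INTO book (title, author_id) VALUES ('Orphan', 99)")
    except sqlite3.IntegrityError:
        return True
    return False
```

The join in `books_by` is how the relationship is reconstructed at query time, and `orphan_insert_rejected` shows the integrity check that the foreign key provides.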
Relational data is when you can put data in a computer one time and it grows
A relational database contains data records that do not have preset relationships, permitting users to define their own relationships when accessing the data. Because users have so much control over the data being accessed, relational databases can perform a variety of tasks, such as defining the database; querying the database; adding, editing, and deleting data; modifying the structure of the database; securing data from public access; communicating within the network; and exporting and importing data (Murthy, 2008).
Since the 1960s, efficient data management and retrieval has been a persistent challenge because of the growing needs of business and academia. To address it, a number of database models have been created. Relational databases allow data storage, retrieval, and manipulation using the standard Structured Query Language (SQL). Until now, relational databases have been the optimal enterprise storage choice. However, as the volume of stored and analyzed data has grown, relational databases have displayed a variety of limitations, notably in scalability, storage, and query efficiency over large volumes of data [1] [2].
Relational database normalization is the process of decomposing relations with anomalies to produce well-structured relations. This paper delves into relational database normalization, giving the reason why a database schema in third normal form is considered to be of higher quality than an unnormalized database schema.
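To make the idea of decomposition concrete, here is a small sketch in plain Python. The order data and the function `decompose` are hypothetical; the example assumes that a customer's city depends only on the customer, not on the order, which is the transitive dependency that third normal form removes.

```python
# Hypothetical unnormalized relation: (order_id, customer, customer_city).
# The city is repeated on every order, so changing it means touching many
# rows (an update anomaly).
orders_flat = [
    (1, "Ana", "Lisbon"),
    (2, "Ana", "Lisbon"),
    (3, "Raj", "Pune"),
]


def decompose(rows):
    """Split the flat relation into customers(customer -> city) and
    orders(order_id, customer), so each city is stored exactly once."""
    customers = {}
    orders = []
    for order_id, customer, city in rows:
        customers[customer] = city
        orders.append((order_id, customer))
    return customers, orders


customers, orders = decompose(orders_flat)
```

After the decomposition, a change of city is a single update to `customers`, and every order for that customer sees the new value through the shared key.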
Graph database: Strength: designed for data whose relations are well represented as a graph and whose elements are interconnected. Graph databases are well suited to irregular and complex structures. Weakness: Relationships are stored at the individual record level and uses more
Historically, the integration of heterogeneous systems has been a major issue. It therefore has to be considered whether the proposed system is amenable to multiple platforms and will give the same results on each. There are many applications available to enforce this, but it should be borne in mind that in a data management system many components have to be integrated, and the layering of these components becomes critical (Thuraisingham, 2001). But today's requirements have gone beyond the traditional DBMS, and the system is expected to seamlessly perform varied functions like tracking functions, data analysis and reporting, vendor
Databases allow us to easily store and retrieve data in a purely digital format. The strength of this is that large amounts of data can be stored and retrieved with minimal effort on the part of the user. As opposed to manually flipping through files, one can quickly pull up the requested data through a computer program. Many systems that were conventionally paper- and file-based have been converted to a digital format and are now stored in one or more databases.
Data has always been analyzed within companies and used to benefit the future of businesses. However, the way data is stored, combined, analyzed, and used to predict the patterns and tendencies of consumers has evolved as technology has advanced over the past century. In the 1900s databases began as “computer hard disks” and in 1965, after many other discoveries including voice recognition, “the US Government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape.” The evolution of data into large databases continued in 1991, when the internet began to emerge and “digital storage became more cost effective than paper.” With the constant increase in data supplied digitally, Hadoop was created in 2005, and from that point forward “14.7 Exabytes of new information are produced this year,” a number that is rapidly increasing with the many mobile devices people have today (Marr). The evolution of the internet, followed by the growth in the number of mobile devices, has led data to evolve to the point where companies now need large central database management systems in order to run an efficient and successful business.
A triplestore is a non-relational database that stores and retrieves triples using semantic queries. Triples store data and are imported and exported using the Resource Description Framework (RDF). A triple is a data entity comprising three components, a subject, a predicate, and an object, that together make a statement such as “We have shoes”. The subject is “We”, the predicate is “have”, and the object is “shoes”. Triples provide an easy and flexible way of modeling data that is similar to how the human brain functions. Triples have a semantic structure that can easily represent connections between structured data and free-flowing text. Triples form interconnected data networks (graphs), which are easy to read and can represent complex data structures.
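The subject, predicate, and object structure can be illustrated with a toy in-memory store in plain Python. This is a sketch only: it does not use RDF or a real triplestore, the function `match` is hypothetical, and every triple other than “We have shoes” is invented for the example.

```python
# A toy triple set. Only ("We", "have", "shoes") comes from the text above;
# the other triples are hypothetical additions for illustration.
triples = {
    ("We", "have", "shoes"),
    ("We", "have", "hats"),
    ("shoes", "are", "leather"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return, in sorted order, every triple that fits the pattern.
    A component left as None acts as a wildcard."""
    return sorted(
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )
```

A query such as `match(triples, subject="We", predicate="have")` plays the role of a semantic query: it fixes two components of the triple and asks for every object that completes the statement.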
Modern RDBMS technology is not capable of supporting unstructured data with optimal storage requirements. The design becomes overly complex and is therefore difficult for developers. Managing unstructured data is particularly troublesome with conventional RDBMS solutions (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). Moreover, an RDBMS turns out to be a costly solution for developing agile web applications with simple data analysis requirements. NoSQL is emerging as an efficient alternative in this situation, one that addresses the issues associated with RDBMS technology. The market growth can be credited to innovative launches of NoSQL solutions and to collaborative efforts by NoSQL vendors and customers. The efforts of organizations to improve their market offerings are creating demand for NoSQL as back-end support (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). The emergence of agile software development is also creating demand for NoSQL (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). NoSQL databases offer users many more ways to accept data in many different forms. NoSQL is as adaptable as SQL but offers many more uses that can apply to many organizations.
Currently, there are two major types of database management systems used to deal with data. The first is the Relational Database Management System (RDBMS), the traditional relational database, which deals with structured data and has been popular for decades, since 1970. The second is the Not only SQL (NoSQL) database, which deals with semi-structured and unstructured data. The term NoSQL was first introduced in 1998 by Carlo Strozzi and was reintroduced by Eric Evans in early 2009; NoSQL databases are now gaining popularity with the development of the internet and social media. NoSQL databases aim to overcome the drawbacks of RDBMS, such as fixed schemas, JOIN operations, and scalability problems. With the appearance of Big Data,