What is the fuss about Big Data?

on 10 Apr 2018 09:55 AM
  • Rang Technologies
  • Big Data

With the exponential growth in the use of social networking, the Internet of Things and devices such as sensors, demand has surged for technologies that can store large, complex volumes of data, scale horizontally at minimal cost and offer the flexibility to store unstructured data. Here, speed and scalability matter more than consistency and structure. Traditional databases can no longer keep up with the kind of data generated by web clickstreams, sensors, videos, file sharing and so on. This is where NoSQL databases come into the picture. NoSQL databases can effectively handle the three V's of Big Data (Velocity, Volume and Variety). In this blog, I am going to briefly touch upon traditional RDBMS, their issues in dealing with big data, and the various NoSQL databases and their features.

Reasons for the explosion of big data:
• Online interactions
• Sensors
• Social network interactions

Issues with RDBMS:
• Weak clustering
• Traditional databases do not scale linearly, i.e. adding one more node to a single instance of MySQL server does not double the performance.
Since traditional databases enforce strict ACID rules, every piece of data needs to be synchronized across all the nodes in the cluster before a transaction is completed. This adds significant overhead to the database system, making linear scaling very difficult to achieve.

Important Features of NoSQL databases:
1) Ability to store semi-structured and unstructured data.
2) Ability to scale horizontally on commodity hardware.
3) Relaxed ACID properties
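
To make the second feature more concrete, here is a minimal sketch of how keys might be spread across several commodity nodes by hashing. The names `shard_for` and `ShardedStore` are hypothetical and used only for illustration, and each "node" is just an in-memory dictionary. Real NoSQL systems typically use more elaborate schemes such as consistent hashing.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy horizontally partitioned store: one dict per simulated node."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

# Number of keys that ended up on each simulated node.
sizes = [len(shard) for shard in store.shards]
print(sizes)
```

Because every key maps to exactly one shard, reads and writes for different keys can be served by different machines. The simple modulo placement used here remaps most keys when a shard is added, which is one reason production systems prefer consistent hashing.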

The CAP theorem, also known as Brewer's theorem after Eric Brewer who proposed it, states that a distributed database can satisfy at most two of the following three properties at the same time.

Consistency: All nodes see the same data at the same time.
Availability: Each client can always read and write. In other words, each client always receives a response.
Partition Tolerance: The system continues to be up and running even when messages between nodes are dropped or lost.
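
The trade-off can be illustrated with a toy model. The sketch below is not how any real database implements replication. It assumes just two replicas and a flag that simulates a network partition, and the names `Replica`, `TinyCluster` and `run` are hypothetical. In "CP" mode the cluster refuses writes during a partition (giving up availability), while in "AP" mode it accepts them on one side only (giving up consistency).

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TinyCluster:
    """Two replicas plus a switch that simulates a network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        """Return True if the write was accepted, False if it was refused."""
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: update both copies before acknowledging.
            replica.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            # Stay consistent by refusing the write (not available).
            return False
        # AP: stay available by writing locally (replicas now disagree).
        replica.data[key] = value
        return True


def run(mode):
    cluster = TinyCluster(mode)
    cluster.write(cluster.a, "x", 1)
    cluster.partitioned = True
    accepted = cluster.write(cluster.a, "x", 2)
    return accepted, cluster.a.data["x"], cluster.b.data["x"]


cp_result = run("CP")  # (False, 1, 1): write refused, replicas agree
ap_result = run("AP")  # (True, 2, 1): write accepted, replicas diverge
print(cp_result, ap_result)
```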

Different types of NoSQL databases:
• Key Value stores
• Document databases
• Wide column or column-family stores
• Graph database

Key value Stores:
A key value database is a hash table or dictionary, where the key is used to search for and retrieve the value associated with it.
Key: used for lookups, typically alphanumeric
Value: text, lists, sets or complex objects

Some examples of key value databases are Redis, Voldemort, Berkeley DB and Riak.

Usage: Key value databases are used for storing information related to user profiles, session information, product information etc.
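
As a rough illustration of the idea, a key value store can be sketched as a thin wrapper around a Python dictionary. The `KeyValueStore` class and the key naming scheme (`session:...`, `user:...`) are assumptions made for this example and do not mirror the API of Redis or any other product.

```python
class KeyValueStore:
    """Toy key value store: lookups are only possible by key."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        return self._table.pop(key, None) is not None


store = KeyValueStore()

# Values can be plain text, lists, sets or nested objects.
store.put("session:ab12", {"user_id": 7, "cart": ["sku-1", "sku-2"]})
store.put("user:7:name", "Ann")
store.put("user:7:roles", {"admin", "beta"})

session = store.get("session:ab12")
removed = store.delete("session:ab12")
after_delete = store.get("session:ab12")
print(session, removed, after_delete)
```

Note that the store knows nothing about what is inside a value. To find every session belonging to a given user, you would have to know the keys in advance or scan all of them.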

Document databases:
In document databases, both key and value are searchable, and the value is semi-structured data made up of (name, value) pairs. The fields in the value may vary from document to document. The values are typically in JSON, BSON or XML format.
Examples include MongoDB, which stores documents in BSON, and CouchDB, which stores documents in JSON.

Usage: Storing and managing text documents, email messages and XML documents.
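
The difference from a plain key value store is that the contents of the value can be queried. Here is a minimal sketch, assuming documents are JSON-compatible Python dictionaries. `DocumentCollection` and its methods are hypothetical names, not the MongoDB or CouchDB API.

```python
import json


class DocumentCollection:
    """Toy document store: documents are searchable by key and by field."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # Round-trip through JSON to store an independent, JSON-compatible copy.
        self._docs[doc_id] = json.loads(json.dumps(document))

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, criteria):
        """Return all documents whose fields match every entry in criteria."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


emails = DocumentCollection()

# Documents in the same collection do not need to share the same fields.
emails.insert("m1", {"sender": "ann@example.com", "subject": "Hi", "tags": ["inbox"]})
emails.insert("m2", {"sender": "bob@example.com", "subject": "Report", "attachments": 2})
emails.insert("m3", {"sender": "ann@example.com", "subject": "Re: Report"})

from_ann = emails.find({"sender": "ann@example.com"})
subjects = [doc["subject"] for doc in from_ann]
print(subjects)
```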

Column-Family stores:
Column family databases store data in a column-oriented model. A column family database contains a keyspace, which is like a schema in an RDBMS. A keyspace contains all the column families (like tables in an RDBMS), which in turn contain rows, with each row containing multiple columns.
A row may also contain super columns, each of which is a named group of columns. Each wide column consists of multiple column and value pairs. Data is stored in a schema-less manner, so each row can contain a different number of columns.
[Figure: example of a wide column database structure]
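
The keyspace, column family, row and column hierarchy can be sketched with nested Python dictionaries. This is only an in-memory illustration of the layout. The `Keyspace` class and the sample data are made up for this example, and real systems such as Cassandra add partitioning, on-disk storage and much more.

```python
class Keyspace:
    """Toy wide column layout: family -> row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self.column_families = {}

    def put(self, family, row_key, column, value):
        rows = self.column_families.setdefault(family, {})
        rows.setdefault(row_key, {})[column] = value

    def get_row(self, family, row_key):
        """Return a copy of all columns stored for one row."""
        return dict(self.column_families.get(family, {}).get(row_key, {}))


shop = Keyspace("shop")

# Row "u1" has two columns, row "u2" has three different ones.
shop.put("users", "u1", "name", "Ann")
shop.put("users", "u1", "email", "ann@example.com")
shop.put("users", "u2", "name", "Bob")
shop.put("users", "u2", "last_login", "2018-04-10")
shop.put("users", "u2", "city", "Pune")

row_u1 = shop.get_row("users", "u1")
row_u2 = shop.get_row("users", "u2")
print(row_u1)
print(row_u2)
```

Both rows live in the same column family even though they do not share the same set of columns, which is what the schema-less description above refers to.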

Examples are Cassandra, Apache HBase, Google Bigtable and DynamoDB.

Usage: Useful for storing very large volumes of complex, varied and unstructured data where massive scalability is required.

Graph databases:
Graph databases are based on graph theory. They have nodes, edges and properties. Nodes are entities and edges are relationships between nodes. Each node has attributes (key-value pairs). In effect, a graph database is a structure of interconnected nodes, each carrying key-value pairs.

Examples of graph databases are Neo4j, InfoGrid, InfiniteGraph etc.
Usage: Graph databases are useful for social media network analysis, exploring relationships and analyzing web browsing behavior.
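
To show what exploring relationships can look like, here is a small sketch of a property graph with a breadth-first traversal. The `Graph` class, the `FOLLOWS` relationship and the sample people are assumptions for this example. Graph databases such as Neo4j expose this kind of traversal through their own query languages instead.

```python
from collections import deque


class Graph:
    """Toy property graph: nodes carry key-value attributes, edges are typed."""

    def __init__(self):
        self.nodes = {}  # node id -> attribute dict
        self.edges = {}  # node id -> list of (relationship, target id)

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = attributes
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, relationship, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before adding an edge")
        self.edges[source].append((relationship, target))

    def neighbors(self, node_id, relationship):
        return [t for r, t in self.edges[node_id] if r == relationship]

    def within_hops(self, start, relationship, max_hops):
        """Breadth-first search: nodes reachable in at most max_hops steps."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for nxt in self.neighbors(node, relationship):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, depth + 1))
        return found


g = Graph()
for person, city in [("ann", "Pune"), ("bob", "Austin"), ("cara", "Oslo"), ("dev", "Lima")]:
    g.add_node(person, city=city)
g.add_edge("ann", "FOLLOWS", "bob")
g.add_edge("bob", "FOLLOWS", "cara")
g.add_edge("cara", "FOLLOWS", "dev")

one_hop = g.within_hops("ann", "FOLLOWS", 1)   # ['bob']
two_hops = g.within_hops("ann", "FOLLOWS", 2)  # ['bob', 'cara']
print(one_hop, two_hops)
```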

I hope that this blog provides a clear understanding of the difference between SQL and NoSQL databases, and of the various types of NoSQL databases, their advantages and their real-world applications.