Big Data Overview

We live in the age of data. Data is everywhere, much like the water in that famous line from the poem ‘The Rime of the Ancient Mariner’: ‘Water, water, every where, nor any drop to drink.’ Thankfully, unlike that water, data is drinkable: we can derive value from it. The internet accounts for most of the data we see today. Data keeps growing and now looks indispensable to our lives. In this post, we are going to look at some, if not all, aspects of Big Data.

Let’s begin with some of its common features:

  • Because it comes from varied sources, it arrives in segments and accumulates over time.
  • It does not facilitate direct decisions. You have to derive value from it first to make informed decisions.
  • It comes in many forms and is not a substitute for structured data. It is classified into three types: structured, semi-structured, and unstructured.
  • Generally, it is wide, with hundreds of fields.
  • It is versatile and grows without pause, since new data is created every day.
  • For businesses and organizations, it is generated both internally and externally.
  • It can be managed with distributed frameworks and data stores such as Hadoop and Cassandra.

Big Data is characterized by six Vs: Volume, Velocity, Variety, Veracity, Visualization, and Value.


Volume denotes the huge scale of data, ranging from terabytes to zettabytes. The internet and social media caused an explosion of data, which is why big data is sometimes also referred to as digital data. Data has grown from gigabytes to terabytes, petabytes, exabytes, and zettabytes, and internet data is expected to exceed ten zettabytes in the next ten years.

Anyone working with Big Data should be familiar with the data-size tables. Data has been growing so drastically that new terms for data sizes have been added (see the image).
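
To make the size terms concrete, here is a minimal Python sketch of the unit ladder. It assumes decimal (SI) units, where each step is 1,000 times the previous one; the names `UNITS`, `unit_in_bytes`, and `human_size` are made up for this illustration.

```python
# Decimal (SI) storage units, each 1,000 times the previous one (an assumption;
# binary units such as KiB and MiB use steps of 1,024 instead).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_in_bytes(unit: str) -> int:
    """Return how many bytes one unit holds, e.g. 'TB' -> 10**12."""
    return 1000 ** UNITS.index(unit)


def human_size(num_bytes: int) -> str:
    """Format a byte count with the largest unit that keeps the value below 1,000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


# One zettabyte expressed in terabytes: a billion.
print(unit_in_bytes("ZB") // unit_in_bytes("TB"))  # 1000000000
print(human_size(10 * unit_in_bytes("ZB")))        # 10.0 ZB
```

Seen this way, the jump from gigabytes to zettabytes is a factor of a trillion, which is why new size terms were needed.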


Velocity accounts for the streaming of data and the movement of large volumes of data at high speed. It refers to the speed at which data grows. Today it is hard to imagine people without a gadget; as a result, data is generated continuously through tablets, mobile phones, laptops, smart devices, and so on.

The various sources of data are depicted in the picture:

Due to the increase in the global customer base and transactions and interactions with customers, the data created within an organization is growing along with external data. The contributors to this data growth are as follows:

  • Web
  • Billing
  • ERP
  • Machine Data
  • Network Elements
  • Social Media
  • Surveys


Variety refers to managing the complexity of data in different structures, ranging from relational data to logs and raw text. It refers to the different types of data, including text, images, audio, video, XML, and HTML. The three types of data are:

  • Structured data: data represented in a tabular format with a fixed schema. Example: a MySQL database.
  • Semi-structured data: data that does not follow a rigid tabular model but carries tags or markers that organize it. Example: XML files.
  • Unstructured data: data that does not have a predefined data model. Example: text files.
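
The difference between the three types is easier to see side by side. The following Python sketch uses only the standard library; the customer records and the note are invented sample data, and the CSV text merely stands in for a database table.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields (like a MySQL table).
table = io.StringIO("id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n")
rows = list(csv.DictReader(table))

# Semi-structured: tags describe the data, but records may differ in shape.
xml_text = """
<customers>
  <customer id="1"><name>Asha</name><city>Pune</city></customer>
  <customer id="2"><name>Ravi</name><phone>555-0100</phone></customer>
</customers>
"""
root = ET.fromstring(xml_text)
customers = [
    {"id": c.get("id"), **{child.tag: child.text for child in c}}
    for c in root.findall("customer")
]

# Unstructured: free text with no fields at all; meaning has to be extracted.
note = "Ravi called about a late delivery and asked for a refund."
words = note.lower().rstrip(".").split()

print(rows[0])       # every row has id, name, city
print(customers[1])  # this record has a phone but no city
print(words[:3])     # just a sequence of words
```

Note how the second XML record carries a phone number and no city: the tags keep it readable, but no schema forces both records to match.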


Veracity refers to the truthfulness of data.


Visualization refers to the presentation of data in a graphical format.


Value refers to the value an organization derives from using big data; this is essentially the job of big data analytics.

Industry-wise use of big data:

Every industry has some use for big data. Some of the big data use cases are as follows:

  • Retail Sector: used for affinity detection and market analysis.
  • Credit Card Companies: detect fraudulent purchases and alert customers; examine loan and credit history (for example, CIBIL history) before issuing a credit card.
  • Banks: examine customer data before granting a loan.
  • Medical Diagnostics: diagnose a patient’s illness based on symptoms.
  • Digital Marketing: find effective marketing channels.
  • Insurance Companies: design policies and calculate premiums.
  • Manufacturing Units and Oil Rigs: reduce the risk of equipment failure.
  • Advertising: identify the target audience.
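
As a rough idea of what affinity detection in retail means, the Python sketch below counts how often pairs of items appear in the same basket. The baskets are made-up sample data, and simple pair counting is only one possible approach, not a description of any particular retailer's method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

# Count every pair of items that was bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def support(item_a: str, item_b: str) -> float:
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)


top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)         # ('bread', 'butter') 3
print(support("butter", "bread"))  # 0.75
```

A pair with high support, like bread and butter here, is a candidate for shelf placement or bundled offers.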

Big Data Analytics:

With the advent of big data analytics, complete data sets can be analyzed instead of samples.

Big Data analytics helps in:

  • Finding associations in data
  • Predicting future outcomes
  • Performing prescriptive analysis
  • Making data-driven decisions
  • Increasing safety
  • Reducing maintenance costs
  • Preventing failures

Traditional technology can be compared with big data technology in the following ways:

Traditional Technology:

  • Limited scalability
  • Uses highly parallel processors
  • Data in one place
  • High-end hardware used
  • Uses storage technology, such as SAN

Big Data Technology:

  • Highly scalable (non-relational systems scale horizontally, whereas an RDBMS typically scales vertically)
  • Uses distributed processing
  • Data is distributed
  • Commodity hardware used
  • Uses distributed data with data redundancy
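
The idea of distributed data and distributed processing can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: threads stand in for separate machines, the log lines are invented, the function names are hypothetical, and data redundancy (keeping replicas on several nodes) is not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Spread records across nodes round-robin, so no node holds everything."""
    nodes = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        nodes[i % num_nodes].append(record)
    return nodes


def count_words(chunk):
    """Map step: a node counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(records, num_nodes=3):
    chunks = partition(records, num_nodes)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_words, chunks))
    # Reduce step: merge the partial counts into one total.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


logs = ["error disk full", "login ok", "error timeout", "login ok", "error disk full"]
totals = distributed_word_count(logs)
print(totals["error"], totals["login"])  # 3 2
```

Because each node works only on its own chunk, adding nodes lets the same job handle more data, which is the horizontal scaling the list above refers to.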
