Apache Cassandra is a NoSQL database system. To understand Cassandra deeply, one must first understand databases in general, and non-relational (NoSQL) databases in particular. Cassandra is important for business and is becoming popular for its features, which we will see later in the post. For now, let's first look at its history: where it came from and how.
It was first built at Facebook to power inbox search. In 2008 it was released as open source, so anyone can use it for free and contribute to its development. In 2009 it was accepted into the Apache Incubator, and it became a top-level Apache project in 2010. Since it is open source, it is free. There is also a DataStax version, a paid product built on top of Apache Cassandra, which adds tooling for processing and monitoring data. However, many companies prefer to use Apache Cassandra itself, which is also used by learners and developers.
Apache Cassandra Overview:
Apache Cassandra borrows from both Google’s Bigtable and Amazon’s Dynamo: its data model follows Bigtable, while its distributed design follows Dynamo. It is one of the popular NoSQL databases, like MongoDB and Apache HBase. Cassandra is open source and distributed, and in terms of the CAP theorem it is an AP system, meaning it favors availability and partition tolerance (fault tolerance).
Cassandra’s architecture consists only of nodes, and a collection of nodes is called a cluster. It doesn’t use a master and slave node method like Hadoop; here every node is equal, and one can read or write data through any node, even if one node goes down. For data replication, users choose how many replicas of the data to keep. Because the data is distributed and replicated across nodes, Cassandra processes data very fast, and the failure of a single node need not mean losing data. Another great aspect is that users can add more nodes as their data grows, so the cluster scales horizontally. Adding nodes in Cassandra is easy.
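To make the idea of equal nodes and replicas a bit more concrete, here is a minimal Python sketch of a masterless ring. It is a toy model, not Cassandra's actual implementation: the class name ToyRing, the node names and the use of MD5 as the hash are all assumptions made for illustration (Cassandra's default partitioner uses Murmur3 and virtual nodes). Each key is hashed to a position on the ring, and the next few nodes clockwise hold its copies.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 here, purely for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """A masterless ring: every node is a peer that owns one token."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be walked clockwise.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        """Return the nodes that hold copies of this key."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]

    def readable_from(self, partition_key: str, down=()):
        """Replicas that can still serve the key when some nodes are down."""
        return [n for n in self.replicas(partition_key) if n not in down]


# Hypothetical four-node cluster keeping three copies of every key.
ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)
# Take the first replica down: the other two can still answer.
print(ring.readable_from("user:42", down={owners[0]}))
```

Because no node is special, losing any single one leaves the remaining replicas able to serve the key, which is the property the paragraph above describes.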
Writes Fast: Cassandra is designed to store massive amounts of data and to write that data really fast, without sacrificing read efficiency.
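Transaction Support: Cassandra does not offer full ACID (Atomicity, Consistency, Isolation, Durability) transactions the way an RDBMS does. Writes are atomic and isolated within a single partition and durable once acknowledged, consistency is tunable per query, and lightweight transactions cover compare-and-set cases. Workloads that need multi-row ACID transactions are better served by a relational database.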
Amazing Distribution: Because every node is a peer, it is easy to replicate data, and replicas can be spread across data centers so that data can be accessed from any of them.
Flexible Data Storage: Since Cassandra is a NoSQL database, it can accommodate structured, semi-structured and unstructured data, and its schema can be changed as needs evolve.
Scalable: If the need arises to store more data, you can add nodes and scale horizontally. In an RDBMS, horizontal scaling is much harder, which is one reason Cassandra is gaining momentum. Cassandra also offers close to linear scalability: throughput grows as nodes are added, which helps keep query response times quick.
Almost Failure Free: Since Cassandra favors availability and partition tolerance (the A and P of the CAP theorem) and all nodes are equal, it has no single point of failure: if one node fails, users can be served by another node.
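The replication and failure-tolerance points in the list above can be sketched with a little arithmetic. The Python below is only an illustration under stated assumptions: the keyspace name shop and the data center names dc_east and dc_west are hypothetical, and the helper functions are not part of any Cassandra driver. It builds a CQL CREATE KEYSPACE statement that uses NetworkTopologyStrategy to set a replica count per data center, and computes how many replicas can be down while a QUORUM (majority) request still succeeds.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: a majority of the copies."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a QUORUM request still succeeds."""
    return replication_factor - quorum(replication_factor)


def create_keyspace_cql(name: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical keyspace replicated three times in each of two data centers.
statement = create_keyspace_cql("shop", {"dc_east": 3, "dc_west": 3})
print(statement)

# Compare a few replication factors: (copies, quorum size, failures tolerated).
for rf in (1, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
```

With three replicas a quorum is two, so one replica can fail without interrupting quorum reads or writes; with a single replica there is no such margin, which is why the number of replicas matters as much as the number of nodes.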
Cassandra is not a substitute for an RDBMS, but an alternative to it. Many big companies across the world use it for its performance record. In the next post, we will look at its architecture in more detail.