How to install Cassandra on CentOS 8

In this guide, you will learn how to install Cassandra on CentOS 8 and configure initial security.

What is Apache Cassandra?

Apache Cassandra is an open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Here are the key aspects of Apache Cassandra:

Key Features

  • Distributed Architecture: Cassandra is designed as a distributed system, allowing it to seamlessly scale across multiple nodes and handle large volumes of data.
  • No Single Point of Failure: Data is replicated across multiple nodes, ensuring high availability and fault tolerance. There is no single point of failure in the system.
  • Linear Scalability: Cassandra provides linear scalability by adding more nodes to the cluster. It can handle thousands of nodes across multiple data centers.
  • Flexible Schema: Cassandra uses a wide-column data model. Tables are defined in CQL, but columns can be added later without downtime and rows need not populate every column, which accommodates structured and semi-structured data.
  • High Performance: Optimized for fast reads and writes, making it suitable for use cases that require low latency and high throughput.
  • Tunable Consistency: Offers tunable consistency levels, allowing developers to choose between strong consistency for critical operations or eventual consistency for improved performance.
  • Query Language: Cassandra Query Language (CQL) is used to interact with the database, resembling SQL syntax with additional NoSQL capabilities.
  • Built-in Replication: Data is automatically replicated across nodes based on configurable replication factors, ensuring data redundancy and availability.
  • Multi-Data Center Replication: Supports replication across multiple data centers, allowing for global distribution of data with low latency access.

Use Cases

  • Big Data Applications: Used in big data environments where scalability, high availability, and performance are critical, such as real-time analytics and IoT applications.
  • Time Series Data: Suitable for storing and analyzing time-series data, such as logs, sensor data, and financial transactions.
  • Content Management: Used in content management systems where flexible schema design and horizontal scalability are essential.
  • Messaging and Chat Applications: Powers messaging platforms and chat applications that require low-latency data access and high availability.
  • Recommendation Systems: Supports recommendation engines by efficiently storing and retrieving user data and preferences.

Apache Cassandra Ecosystem

  • Apache Cassandra: The core database system.
  • DataStax Enterprise: A commercial distribution of Cassandra with additional enterprise features.
  • Cassandra Query Language (CQL): SQL-like language for interacting with Cassandra.
  • Cassandra Drivers: Client libraries in various programming languages (e.g., Java, Python, Node.js) for application integration.
  • Cassandra Tools: Various tools for monitoring, management, and data modeling.

Summary

Apache Cassandra is a robust NoSQL database known for its distributed architecture, high availability, linear scalability, and schema flexibility. It is widely adopted in industries requiring scalable and fault-tolerant solutions for handling large-scale data operations.

Environment Specification

We are using a KVM-based CentOS 8 virtual machine with the following specifications.

  • CPU – 3.4 GHz (2 cores)
  • Memory – 2 GB
  • Storage – 20 GB
  • Operating System – CentOS Linux 8.2
  • Hostname – cassandra-01.centlinux.com
  • IP Address – 192.168.116.206/24

Update your Linux Operating System

Connect to cassandra-01.centlinux.com as the root user by using an SSH client.

As a best practice, update existing software packages in your Linux operating system.

# dnf update -y

Verify the version of the active Linux kernel by using the uname command.

# uname -r
4.18.0-193.6.3.el8_2.x86_64

Verify the version of your Linux operating system.

# cat /etc/redhat-release
CentOS Linux release 8.2.2004 (Core)
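
If this check is part of a provisioning script, the major version can be extracted from the release string before continuing. The following is only a sketch: release_major is a hypothetical helper name, and it assumes the release file contains the word "release" followed by the version number, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read a redhat-release line on stdin and print the
# major version number that follows the word "release".
release_major() {
    sed -n 's/.*release \([0-9][0-9]*\).*/\1/p'
}

# Example:
# if [ "$(release_major < /etc/redhat-release)" != "8" ]; then
#     echo "this guide targets CentOS 8" >&2
# fi
```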

Install Cassandra Yum Repository

The Apache Software Foundation provides official yum repositories for each version of Cassandra.

Add the yum repository as described on the Cassandra download page.

Create a repo file by using the vi text editor.

# vi /etc/yum.repos.d/cassandra.repo

Add the following directives to this file.

[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS

Here, 311x refers to the Apache Cassandra 3.11 release series.

It is the latest version at the time of this writing, so we are using it. To install a different version of Apache Cassandra, update the version number in the repo file accordingly.
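
If you prefer to create the repo file without an editor, the same directives can be written with a shell here-document. This is a minimal sketch and not part of the original procedure: the function name write_cassandra_repo and its two arguments (the target path and the version series, such as 311x) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write the Cassandra repo definition shown above.
#   $1 = path of the repo file to create
#   $2 = version series directory, e.g. 311x for Cassandra 3.11
write_cassandra_repo() {
    repo_file=$1
    series=$2
    cat > "$repo_file" <<EOF
[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/${series}/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF
}

# Example (run as root):
# write_cassandra_repo /etc/yum.repos.d/cassandra.repo 311x
```

Keeping the version series as a parameter means only one argument changes when a different Cassandra release is wanted.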

Build the yum cache for the newly added repository. Accept the GPG keys if asked to do so.

# dnf makecache
CentOS-8 - AppStream                            7.3 kB/s | 4.3 kB     00:00
CentOS-8 - Base                                 5.0 kB/s | 3.9 kB     00:00
CentOS-8 - Extras                               162  B/s | 1.5 kB     00:09
Apache Cassandra                                2.1 kB/s | 3.6 kB     00:01
Metadata cache created.

Apache Cassandra 3.11 yum repository has been installed on your Linux server.

Install Cassandra on CentOS 8

Apache Cassandra requires a Java Virtual Machine (JVM) to run. You could install Java explicitly, but when Cassandra is installed with the dnf command, all required dependencies, including Java, are installed automatically.

Therefore, you should directly install Cassandra on CentOS 8 server by using dnf command.

# dnf install -y cassandra

cqlsh (Cassandra Query Language Shell) requires Python to run, so you also need to install Python.

The cqlsh shipped with Apache Cassandra 3.11 is only compatible with Python 2.7, so install that version on your Linux server.

# dnf install -y python2

The Cassandra package ships a SysV init script rather than a native systemd unit, so use the legacy service and chkconfig commands to start it and enable it at boot.

# service cassandra start
Starting cassandra (via systemctl):                        [  OK  ]
# chkconfig cassandra on

Verify the status of cassandra.service.

# systemctl status cassandra.service
● cassandra.service - LSB: distributed storage system for structured data
   Loaded: loaded (/etc/rc.d/init.d/cassandra; generated)
   Active: active (running) since Sat 2020-08-01 11:18:50 PKT; 51s ago
     Docs: man:systemd-sysv-generator(8)
 Main PID: 48050 (java)
    Tasks: 50 (limit: 12331)
   Memory: 1.1G
   CGroup: /system.slice/cassandra.service
           └─48050 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el8_2.x86_64>

Aug 01 11:18:46 cassandra-01.centlinux.com systemd[1]: Starting LSB: distribute>
Aug 01 11:18:46 cassandra-01.centlinux.com runuser[47978]: pam_unix(runuser:ses>
Aug 01 11:18:50 cassandra-01.centlinux.com cassandra[47966]: Starting Cassandra>
Aug 01 11:18:50 cassandra-01.centlinux.com systemd[1]: Started LSB: distributed>

Use the nodetool command to verify the status of your Cassandra cluster.

# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  70.01 KiB  256          100.0%            7d916cdb-8065-42d0-97c0-c88c68b68aa3  rack1

Apache Cassandra has been installed on your Linux server.
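
For scripted health checks, the nodetool status output can be filtered for nodes that are not in the UN (Up/Normal) state. The sketch below rests on stated assumptions: count_unhealthy_nodes is a hypothetical helper name, and it assumes that every node line starts with a two-letter status code (U or D followed by N, L, J or M), as in the legend of the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `nodetool status` output on stdin and print
# the number of node lines whose status code is anything other than UN.
# Header and legend lines are ignored because their first field is not
# a two-letter status code.
count_unhealthy_nodes() {
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { bad++ } END { print bad + 0 }'
}

# Example:
# if [ "$(nodetool status | count_unhealthy_nodes)" -eq 0 ]; then
#     echo "all nodes are Up/Normal"
# fi
```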

Configure NoSQL Database Security

Configuration files for Apache Cassandra are located in the /etc/cassandra/conf directory.

It is a safe practice to back up the original configuration file before modifying it. Therefore, create a copy of the original cassandra.yaml configuration file as follows.

# cd /etc/cassandra/conf/
# cp cassandra.yaml cassandra.yaml.bkp

Now, edit this file by using the vi text editor.

# vi /etc/cassandra/conf/cassandra.yaml

Locate the following parameters in this file.

authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000

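Update their values as follows. Setting the two validity periods to 0 disables caching of roles and permissions, so changes to them take effect immediately.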

authenticator: org.apache.cassandra.auth.PasswordAuthenticator
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
roles_validity_in_ms: 0
permissions_validity_in_ms: 0
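
The four changes can also be applied non-interactively with sed, which is convenient when several nodes are being prepared. This is a sketch under assumptions: enable_cassandra_auth is a hypothetical function name, the -i option assumes GNU sed as found on CentOS, and the patterns assume each parameter starts at the beginning of its own line. Run it only after taking the backup described above.

```shell
#!/bin/sh
# Hypothetical helper: switch a cassandra.yaml file to password
# authentication and disable the role and permission caches.
#   $1 = path to cassandra.yaml
# Commented-out lines are left alone because the patterns are anchored
# to the start of the line.
enable_cassandra_auth() {
    conf=$1
    sed -i \
        -e 's/^authenticator:.*/authenticator: org.apache.cassandra.auth.PasswordAuthenticator/' \
        -e 's/^authorizer:.*/authorizer: org.apache.cassandra.auth.CassandraAuthorizer/' \
        -e 's/^roles_validity_in_ms:.*/roles_validity_in_ms: 0/' \
        -e 's/^permissions_validity_in_ms:.*/permissions_validity_in_ms: 0/' \
        "$conf"
}

# Example (run as root):
# enable_cassandra_auth /etc/cassandra/conf/cassandra.yaml
```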

Restart the Cassandra service for the changes to take effect.

# systemctl restart cassandra.service

Create a Database Admin user

Connect to cqlsh prompt by using the Cassandra default username/password.

# cqlsh -u cassandra -p cassandra
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.7 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help. 
cassandra@cqlsh> 

Create a database admin user by executing the following command.

cassandra@cqlsh> CREATE ROLE ahmer WITH PASSWORD = 'Ahmer@1234' AND SUPERUSER = true AND LOGIN = true;
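
If the role is created from a script instead of the interactive prompt, the password has to be embedded safely in the CQL string literal. CQL escapes a single quote inside a string by doubling it. The following sketch prints the statement for a given role name and password. The helper name create_role_cql is hypothetical, and passing the result to cqlsh with its -e option is shown only as a commented example.

```shell
#!/bin/sh
# Hypothetical helper: print a CREATE ROLE statement for a superuser.
#   $1 = role name (assumed to be a plain lowercase identifier)
#   $2 = password; single quotes are doubled, which is how CQL escapes
#        a quote inside a string literal
create_role_cql() {
    role=$1
    escaped=$(printf '%s' "$2" | sed "s/'/''/g")
    printf "CREATE ROLE %s WITH PASSWORD = '%s' AND SUPERUSER = true AND LOGIN = true;\n" \
        "$role" "$escaped"
}

# Example:
# cqlsh -u cassandra -p cassandra -e "$(create_role_cql ahmer 'Ahmer@1234')"
```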

Exit from cqlsh prompt.

cassandra@cqlsh> exit

Connect to cqlsh again as the new admin user.

# cqlsh -u ahmer -p Ahmer@1234
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.7 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help. 
ahmer@cqlsh> 

For better security, it is advisable to remove or disable the default users. Therefore, revoke the superuser role and login permission from the cassandra user.

ahmer@cqlsh> ALTER ROLE cassandra WITH PASSWORD = 'cassandra' AND SUPERUSER = false AND LOGIN = false;

Revoke all permissions from the cassandra user.

ahmer@cqlsh> REVOKE ALL PERMISSIONS ON ALL KEYSPACES FROM cassandra;

Grant all permissions to new admin user.

ahmer@cqlsh> GRANT ALL PERMISSIONS ON ALL KEYSPACES TO ahmer;

Exit from cqlsh prompt.

ahmer@cqlsh> exit

Apache Cassandra has been configured. It is now ready to become part of a Cassandra cluster.

Conclusion

Installing Apache Cassandra on CentOS 8 allows you to leverage a powerful, scalable, and distributed NoSQL database for handling large volumes of data with high availability.

By following the installation steps, configuring the necessary dependencies, and starting the Cassandra service, you can set up a reliable database environment. Once installed, ensure that Cassandra is running properly, optimize performance settings, and begin managing your data efficiently.

FAQs

What is Apache Cassandra, and why is it used?
Apache Cassandra is a distributed NoSQL database designed for high availability, fault tolerance, and scalability, making it ideal for handling large amounts of data across multiple servers.

What are the prerequisites for installing Cassandra on CentOS 8?
Cassandra runs on the Java Virtual Machine (JVM). The 3.11 release used in this guide requires Java 8, which dnf installs automatically as a dependency of the cassandra package. You also need the Cassandra yum repository configured, and Python 2.7 for cqlsh.

Why is Apache Cassandra preferred over traditional relational databases?
Cassandra offers horizontal scalability, high fault tolerance, and decentralized architecture, making it more suitable for handling big data applications compared to traditional relational databases.

What configurations are required after installing Cassandra?
After installation, the cassandra.yaml configuration file should be adjusted to define cluster name, seed nodes, replication settings, and other performance optimizations based on the deployment environment.

How can I verify that Apache Cassandra is running correctly?
After starting the Cassandra service, check its status with systemctl status cassandra.service and nodetool status, review the log files for errors, and use the cqlsh shell to connect to the database and execute queries.
