In this guide, you will learn how to install Cassandra on CentOS 8 and configure initial security.

What is Apache Cassandra?

Apache Cassandra is an open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Here are the key aspects of Apache Cassandra:

Key Features

  • Distributed Architecture: Cassandra is designed as a distributed system, allowing it to seamlessly scale across multiple nodes and handle large volumes of data.
  • No Single Point of Failure: Data is replicated across multiple nodes, so the cluster remains available when individual nodes fail.
  • Linear Scalability: Throughput grows roughly linearly as nodes are added to the cluster, which can span thousands of nodes across multiple data centers.
  • Flexible Schema: Tables are defined with CQL, but columns can be added or dropped without downtime, and collection types allow structured and semi-structured data to be stored together.
  • High Performance: Optimized for fast reads and writes, making it suitable for use cases that require low latency and high throughput.
  • Tunable Consistency: Offers tunable consistency levels, allowing developers to choose between strong consistency for critical operations or eventual consistency for improved performance.
  • Query Language: Cassandra Query Language (CQL) is used to interact with the database, resembling SQL syntax with additional NoSQL capabilities.
  • Built-in Replication: Data is automatically replicated across nodes based on configurable replication factors, ensuring data redundancy and availability.
  • Multi-Data Center Replication: Supports replication across multiple data centers, allowing for global distribution of data with low latency access.

Use Cases

  • Big Data Applications: Used in big data environments where scalability, high availability, and performance are critical, such as real-time analytics and IoT applications.
  • Time Series Data: Suitable for storing and analyzing time-series data, such as logs, sensor data, and financial transactions.
  • Content Management: Used in content management systems where flexible schema design and horizontal scalability are essential.
  • Messaging and Chat Applications: Powers messaging platforms and chat applications that require low-latency data access and high availability.
  • Recommendation Systems: Supports recommendation engines by efficiently storing and retrieving user data and preferences.

Apache Cassandra Ecosystem

  • Apache Cassandra: The core database system.
  • DataStax Enterprise: A commercial distribution of Cassandra with additional enterprise features.
  • Cassandra Query Language (CQL): SQL-like language for interacting with Cassandra.
  • Cassandra Drivers: Client libraries in various programming languages (e.g., Java, Python, Node.js) for application integration.
  • Cassandra Tools: Various tools for monitoring, management, and data modeling.

Summary

Apache Cassandra is a robust NoSQL database known for its distributed architecture, high availability, linear scalability, and schema flexibility. It is widely adopted in industries requiring scalable and fault-tolerant solutions for handling large-scale data operations.

Environment Specification

We are using a KVM-based CentOS 8 virtual machine with the following specifications.

  • CPU – 3.4 GHz (2 cores)
  • Memory – 2 GB
  • Storage – 20 GB
  • Operating System – CentOS Linux 8.2
  • Hostname – cassandra-01.centlinux.com
  • IP Address – 192.168.116.206/24

Update your Linux Operating System

Connect to cassandra-01.centlinux.com as the root user by using an SSH client.

As a best practice, update existing software packages in your Linux operating system.

# dnf update -y

Verify the version of the active Linux kernel by using the uname command.

# uname -r
4.18.0-193.6.3.el8_2.x86_64

Verify the version of your Linux operating system.

# cat /etc/redhat-release
CentOS Linux release 8.2.2004 (Core)

Install Cassandra Yum Repository

The Apache Software Foundation provides an official yum repository for each release series of Cassandra.

Add the yum repository as described on the Cassandra download page.

Create a repo file by using the vi text editor.

# vi /etc/yum.repos.d/cassandra.repo

Add the following directives to this file.

[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS

Here, 311x corresponds to the Apache Cassandra 3.11 release series.

It is the latest version at the time of this writing, so we are using it. To install a different version of Apache Cassandra, update the version directory in the repo file accordingly.
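
If you prefer to create the repo file without an interactive editor, the following is a minimal sketch that writes the same directives with a here-document. The function name write_cassandra_repo and its two parameters are illustrative assumptions, not part of any Cassandra tooling; the second parameter defaults to the 311x series used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: write the Cassandra repo definition to a file.
#   $1 = target repo file
#   $2 = release series directory (defaults to 311x, as used in this guide)
write_cassandra_repo() {
    repo_target="$1"
    repo_series="${2:-311x}"
    cat > "$repo_target" <<EOF
[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/${repo_series}/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF
}
```

As root, it could be called as write_cassandra_repo /etc/yum.repos.d/cassandra.repo to produce the same file as the manual steps above.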

Build the yum cache for the newly added repository. Accept the GPG keys if asked to do so.

# dnf makecache
CentOS-8 - AppStream                            7.3 kB/s | 4.3 kB     00:00
CentOS-8 - Base                                 5.0 kB/s | 3.9 kB     00:00
CentOS-8 - Extras                               162  B/s | 1.5 kB     00:09
Apache Cassandra                                2.1 kB/s | 3.6 kB     00:01
Metadata cache created.

The Apache Cassandra 3.11 yum repository has been added to your Linux server.

Install Cassandra on CentOS 8

Apache Cassandra requires a JVM (Java Virtual Machine) to run. You can install Java explicitly on your Linux server, but if you install Cassandra by using the dnf command, it automatically installs all required dependencies, including Java.

Therefore, install Cassandra directly on your CentOS 8 server by using the dnf command.

# dnf install -y cassandra

cqlsh (Cassandra Query Language Shell) requires Python to run, so you also need to install Python.

The cqlsh shipped with Cassandra 3.11 is only compatible with Python 2.7, so install that version on your Linux server.

# dnf install -y python2
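
Before starting the service, it can help to confirm that the expected programs are actually on the PATH. The sketch below is a small preflight check; the function name missing_commands is an illustrative assumption, and the binary names passed to it in the usage note (java, python2.7, cqlsh, nodetool) are what these packages are expected to install, which you should confirm on your own system.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command from the
# argument list that cannot be found on PATH (one name per line).
missing_commands() {
    for candidate in "$@"; do
        command -v "$candidate" >/dev/null 2>&1 || printf '%s\n' "$candidate"
    done
}
```

For example, missing_commands java python2.7 cqlsh nodetool prints nothing when all four programs are available.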

The Cassandra service ships as a SysV init script, so use the legacy commands to start and enable it.

# service cassandra start
Starting cassandra (via systemctl):                        [  OK  ]
# chkconfig cassandra on

Verify the status of cassandra.service.

# systemctl status cassandra.service
● cassandra.service - LSB: distributed storage system for structured data
   Loaded: loaded (/etc/rc.d/init.d/cassandra; generated)
   Active: active (running) since Sat 2020-08-01 11:18:50 PKT; 51s ago
     Docs: man:systemd-sysv-generator(8)
 Main PID: 48050 (java)
    Tasks: 50 (limit: 12331)
   Memory: 1.1G
   CGroup: /system.slice/cassandra.service
           └─48050 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el8_2.x86_64>

Aug 01 11:18:46 cassandra-01.centlinux.com systemd[1]: Starting LSB: distribute>
Aug 01 11:18:46 cassandra-01.centlinux.com runuser[47978]: pam_unix(runuser:ses>
Aug 01 11:18:50 cassandra-01.centlinux.com cassandra[47966]: Starting Cassandra>
Aug 01 11:18:50 cassandra-01.centlinux.com systemd[1]: Started LSB: distributed>

Use the nodetool command to verify the status of your Cassandra cluster.

# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  70.01 KiB  256          100.0%            7d916cdb-8065-42d0-97c0-c88c68b68aa3  rack1
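
In the output above, UN means the node is Up and in the Normal state. If you want a script to make the same check, the following sketch reads nodetool status output on standard input and succeeds only when every listed node is UN. The function name all_nodes_up_normal is an illustrative assumption, and the parsing relies on the two-letter status column shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `nodetool status` output on stdin.
# Exit 0 only if at least one node line is present and every node
# line starts with UN (Up/Normal); exit 1 otherwise.
all_nodes_up_normal() {
    awk '
        # Node lines start with a status letter (U/D) and a state letter (N/L/J/M).
        /^[UD][NLJM][ \t]/ { total++; if ($1 != "UN") bad++ }
        END { exit (total + 0 > 0 && bad + 0 == 0) ? 0 : 1 }
    '
}
```

It could be used as nodetool status | all_nodes_up_normal && echo "cluster healthy".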

Apache Cassandra has been installed on your Linux server.

Configure NoSQL Database Security

Configuration files for Apache Cassandra are located in the /etc/cassandra/conf directory.

It is good practice to back up a configuration file before modifying it. Therefore, create a copy of the original cassandra.yaml configuration file as follows.

# cd /etc/cassandra/conf/
# cp cassandra.yaml cassandra.yaml.bkp
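
If you expect to edit the file more than once, a timestamped copy avoids overwriting an earlier backup. This is a minimal sketch; the function name backup_with_timestamp and the naming pattern of the backup file are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: copy a file to <file>.<UTC timestamp>.bkp,
# preserving mode and timestamps, and print the path of the backup.
backup_with_timestamp() {
    backup_src="$1"
    backup_dest="${backup_src}.$(date -u +%Y%m%dT%H%M%SZ).bkp"
    cp -p "$backup_src" "$backup_dest" && printf '%s\n' "$backup_dest"
}
```

For example, backup_with_timestamp /etc/cassandra/conf/cassandra.yaml prints the name of the copy it created.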

Now, edit this file by using the vi text editor.

# vi /etc/cassandra/conf/cassandra.yaml

Locate the following parameters in this file.

authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000

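Update their values as follows. This switches Cassandra to password authentication and role-based authorization. Setting the two validity periods to 0 disables the role and permission caches, so changes to roles and grants take effect immediately.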

authenticator: org.apache.cassandra.auth.PasswordAuthenticator
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
roles_validity_in_ms: 0
permissions_validity_in_ms: 0
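
The same four changes can be applied without an editor. The following is a sketch that assumes GNU sed (for the -i option) and that each parameter appears uncommented at the start of a line, as it does in the stock file; the function name enable_cassandra_auth is an illustrative assumption.

```shell
#!/bin/sh
# Hypothetical helper: apply the four security settings from this guide
# to the cassandra.yaml file given as $1, editing it in place.
# Commented-out lines (starting with '#') are left untouched.
enable_cassandra_auth() {
    sed -i \
        -e 's/^authenticator:.*/authenticator: org.apache.cassandra.auth.PasswordAuthenticator/' \
        -e 's/^authorizer:.*/authorizer: org.apache.cassandra.auth.CassandraAuthorizer/' \
        -e 's/^roles_validity_in_ms:.*/roles_validity_in_ms: 0/' \
        -e 's/^permissions_validity_in_ms:.*/permissions_validity_in_ms: 0/' \
        "$1"
}
```

It could be run as enable_cassandra_auth /etc/cassandra/conf/cassandra.yaml after the backup has been taken.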

Restart the Cassandra service for the changes to take effect.

# systemctl restart cassandra.service
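
Cassandra can take some time after a restart before it accepts client connections, so a cqlsh command issued immediately may be refused. The sketch below is a generic retry loop that you can wrap around any command; the function name retry_until and its argument order are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the command to run
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_until() {
    retry_max="$1"
    retry_delay="$2"
    shift 2
    retry_count=1
    while ! "$@"; do
        [ "$retry_count" -ge "$retry_max" ] && return 1
        retry_count=$((retry_count + 1))
        sleep "$retry_delay"
    done
    return 0
}
```

For example, retry_until 30 2 cqlsh -u cassandra -p cassandra -e 'SHOW VERSION' tries for up to about a minute before giving up.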

Create a Database Admin user

Connect to the cqlsh prompt by using the default Cassandra username and password.

# cqlsh -u cassandra -p cassandra
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.7 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help. 
cassandra@cqlsh> 

Create a database admin user by executing the following command.

cassandra@cqlsh> CREATE ROLE ahmer WITH PASSWORD = 'Ahmer@1234' AND SUPERUSER = true AND LOGIN = true;

Exit from the cqlsh prompt.

cassandra@cqlsh> exit

Connect to cqlsh again as the new admin user.

# cqlsh -u ahmer -p Ahmer@1234
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.7 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help. 
ahmer@cqlsh> 

For better security, it is always advisable to remove or disable the default users. Therefore, revoke the superuser status and login permission from the cassandra user.

ahmer@cqlsh> ALTER ROLE cassandra WITH PASSWORD = 'cassandra' AND SUPERUSER = false AND LOGIN = false;

Revoke all permissions from the cassandra user.

ahmer@cqlsh> REVOKE ALL PERMISSIONS ON ALL KEYSPACES FROM cassandra;

Grant all permissions to the new admin user.

ahmer@cqlsh> GRANT ALL PERMISSIONS ON ALL KEYSPACES TO ahmer;

Exit from the cqlsh prompt.

ahmer@cqlsh> exit
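
When the role creation needs to be repeated on several servers, the CQL statement can be generated by a script instead of being typed at the prompt. The following is a sketch under stated assumptions: the function name render_admin_role_cql is hypothetical, the role name is assumed to be a simple unquoted identifier, and single quotes in the password are doubled, which is how CQL escapes them inside a string literal.

```shell
#!/bin/sh
# Hypothetical helper: print a CQL statement that creates a superuser role.
#   $1 = role name (assumed to be a simple identifier that needs no quoting)
#   $2 = password (single quotes are escaped by doubling them)
render_admin_role_cql() {
    role_name="$1"
    role_pw_escaped=$(printf '%s' "$2" | sed "s/'/''/g")
    printf "CREATE ROLE %s WITH PASSWORD = '%s' AND SUPERUSER = true AND LOGIN = true;\n" \
        "$role_name" "$role_pw_escaped"
}
```

The output can be saved to a file and executed with cqlsh -f, for example render_admin_role_cql ahmer 'Ahmer@1234' > create_admin.cql followed by cqlsh -u cassandra -f create_admin.cql. Omitting the -p option makes cqlsh prompt for the password, which keeps it out of the shell history. Remove the generated file afterwards, because it contains the new password in plain text.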

Apache Cassandra has been configured. It is now ready to become part of a Cassandra cluster.

Conclusion

In this guide, you have learned how to install Cassandra on CentOS 8 and how to apply the recommended initial security configuration. Cassandra: The Definitive Guide: Distributed Data at Web Scale, 2nd Edition, by Jeff Carpenter is a very good book, and we strongly recommend reading it.
