Learn step by step how to install Apache Solr Server on CentOS 8. This guide covers the prerequisites, installation commands, and configuration tips needed to get your Solr server up and running.

What is Apache Solr?

Apache Solr is an open-source search platform developed by the Apache Software Foundation and built on the Apache Lucene library. It is designed for large-scale search applications that need high performance and near real-time indexing.

Key features of Solr include:

  1. Scalability and Flexibility: Solr is capable of indexing and searching large volumes of data quickly. It supports distributed searching and indexing, which allows it to scale horizontally by adding more servers to handle increased load.
  2. Powerful Full-Text Search: Solr provides advanced full-text search features such as phrase matching, wildcard search, fuzzy search, and more. It is highly optimized for text search with powerful query capabilities.
  3. Faceted Search and Filtering: Solr supports faceted search, which allows users to narrow down search results based on predefined categories. This is particularly useful in e-commerce and content management systems.
  4. Rich Document Handling: Solr can index and search a variety of document formats, including JSON, XML, CSV, and rich text formats such as PDF, Word, and more, making it versatile for different data sources.
  5. Real-Time Indexing: Solr supports near real-time indexing, enabling it to handle continuous data updates efficiently. This is essential for applications requiring up-to-date search results, such as news websites and social media platforms.
  6. Extensible and Customizable: Solr is highly extensible with a plugin architecture, allowing developers to customize its functionality to meet specific requirements. It also supports various languages and can be integrated with other big data tools.
  7. Administrative Tools: Solr includes a comprehensive set of administrative tools for managing indexes, monitoring server performance, and configuring search parameters through a web-based interface.
  8. Community and Ecosystem: As part of the Apache project, Solr benefits from a large and active community, providing extensive documentation, tutorials, and a wide range of plugins and integrations.

Overall, Apache Solr is a powerful and flexible search platform suitable for a wide range of applications, from simple websites to complex enterprise systems requiring advanced search capabilities.

Apache Solr vs Elasticsearch

Apache Solr and Elasticsearch are both leading open-source search platforms built on the Apache Lucene library. They are designed to handle large-scale search and indexing operations efficiently. While they share many similarities, there are some key differences between them:

Apache Solr

  • Architecture: Solr uses a more traditional, enterprise-centric architecture. In standalone mode it relies on master-slave replication, which can be more complex to set up but is highly reliable. Distributed search is handled by SolrCloud, described below.
  • Configuration: Solr uses XML for its configuration files. This can be seen as more verbose but also allows for very fine-grained control over configurations.
  • Community and Support: Solr has a large, active community with extensive documentation and support from the Apache Software Foundation.
  • Query Capabilities: Solr offers powerful query capabilities, including advanced faceting and filtering. It is often praised for its robust feature set for complex search applications.
  • Integration: Solr integrates well with Hadoop and other big data platforms, making it a good choice for applications within the Hadoop ecosystem.
  • SolrCloud: For distributed search, Solr uses SolrCloud, which provides features like distributed indexing, replication, and automatic failover.

Elasticsearch

  • Architecture: Elasticsearch uses a modern, distributed architecture that makes it easier to scale horizontally. It employs a peer-to-peer configuration, which simplifies cluster management.
  • Configuration: Elasticsearch uses JSON for configuration and communication, which is often seen as simpler and more user-friendly.
  • Real-Time Indexing: Elasticsearch is known for its near real-time indexing capabilities, making it ideal for applications that require rapid updates and low-latency search.
  • Community and Ecosystem: Elasticsearch has a very active community and is backed by Elastic NV, which provides commercial support and enterprise features through the Elastic Stack (ELK Stack: Elasticsearch, Logstash, Kibana).
  • Query DSL: Elasticsearch offers a rich query DSL (Domain Specific Language) that allows for complex search queries to be constructed in a very flexible manner.
  • Plugins and Integrations: Elasticsearch has a robust ecosystem with many plugins and integrations available, making it versatile for various use cases, including logging, monitoring, and analytics.

Comparison Summary

  • Scalability: Both Solr and Elasticsearch scale well, but Elasticsearch’s peer-to-peer architecture can be easier to manage.
  • Configuration: Solr’s XML configuration offers fine-grained control, while Elasticsearch’s JSON configuration is more straightforward and user-friendly.
  • Real-Time Capabilities: Elasticsearch generally has an edge in real-time indexing and search.
  • Feature Set: Solr’s feature set for complex search and faceting is very robust, whereas Elasticsearch offers flexibility with its query DSL and extensive plugin ecosystem.
  • Support and Ecosystem: Solr benefits from strong support within the Apache community, while Elasticsearch has extensive commercial support and a comprehensive ecosystem through Elastic Stack.

Choosing between Solr and Elasticsearch depends on specific requirements, including the need for real-time indexing, ease of configuration, scalability needs, and the desired ecosystem of tools and integrations. Both are powerful tools that excel in different areas and can be tailored to a wide range of search applications.

Environment Specification

We are using a minimal CentOS 8 KVM machine with the following specifications.

  • CPU – 3.4 GHz (2 cores)
  • Memory – 2 GB
  • Storage – 20 GB
  • Operating System – CentOS 8.2
  • Hostname – solr-01.centlinux.com
  • IP Address – 192.168.116.230/24

Update your Linux Server

Connect to solr-01.centlinux.com as the root user by using an SSH client.

Use the dnf command to update the software packages on your Linux server.

# dnf update -y

Verify the Linux operating system and Kernel version.

# uname -r
4.18.0-193.28.1.el8_2.x86_64

# cat /etc/redhat-release
CentOS Linux release 8.2.2004 (Core)
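
If you script these steps, a small guard can confirm the release before anything is installed. The following is only a sketch: it assumes the release string has the format shown above, and release_major is a hypothetical helper name, not part of any tool.

```shell
# release_major is a hypothetical helper: it prints the major version
# number found in a Red Hat style release string.
release_major() {
  # "CentOS Linux release 8.2.2004 (Core)" -> "8"
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\).*/\1/p'
}

# Only read the file if it exists, so the snippet is harmless elsewhere.
if [ -r /etc/redhat-release ]; then
  major=$(release_major "$(cat /etc/redhat-release)")
  if [ "$major" = "8" ]; then
    echo "OK: running an EL8 release"
  else
    echo "Warning: expected major version 8, found '$major'"
  fi
fi
```

The check only looks at the major version, so point releases such as 8.2 and 8.3 are treated the same.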

Install OpenJDK on Linux Server

Apache Solr is written in Java, so it requires Java 8 or later to run its enterprise search services.

OpenJDK is available in the standard yum repository and can be installed easily. Alternatively, you can install Oracle Java SE on your Linux server.

For the sake of simplicity, we are installing OpenJDK 11 on the Linux server.

# dnf install -y java-11-openjdk

After successful installation, verify the Java version.

# java -version
openjdk version "11.0.9" 2020-10-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.9+11-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.9+11-LTS, mixed mode, sharing)

OpenJDK has been installed on your Linux server.
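
The version check above can also be automated. This is a minimal sketch, assuming the `java -version` output format shown above; java_major is a hypothetical helper name. It handles both the legacy "1.8.0_272" scheme and the modern "11.0.9" scheme.

```shell
# java_major is a hypothetical helper: it prints the major Java version
# found in the text produced by `java -version`.
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;   # legacy scheme: 1.8.0_272 -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;   # modern scheme: 11.0.9 -> 11
  esac
}

# java -version writes to stderr, so redirect it before capturing.
if command -v java >/dev/null 2>&1; then
  major=$(java_major "$(java -version 2>&1)")
  if [ "${major:-0}" -ge 8 ] 2>/dev/null; then
    echo "OK: Java $major satisfies the Java 8 or later requirement"
  else
    echo "Warning: could not confirm Java 8 or later (found '$major')"
  fi
fi
```

If the version string cannot be parsed, the sketch prints a warning instead of guessing.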

Install Apache Solr Server on CentOS 8

You can download Apache Solr from GitHub or from the official website.

From the official download page, copy the URL of the Apache Solr version you need, and then use the wget command to download it.

# cd /tmp
# wget https://downloads.apache.org/lucene/solr/8.7.0/solr-8.7.0.tgz
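
Before running anything from the tarball, you may want to compare its checksum with the one published for the release. The sketch below assumes you have copied the expected SHA-512 value from the official download page yourself; verify_sha512 is a hypothetical helper name.

```shell
# verify_sha512 is a hypothetical helper: it compares a file's SHA-512
# digest with an expected hex string and succeeds only on a match.
verify_sha512() {
  # sha512sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha512sum "$1" | awk '{print $1}')
  # Accept the expected value in upper or lower case.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, `verify_sha512 /tmp/solr-8.7.0.tgz "<expected value>" && echo "checksum OK"` prints the message only when the digests match. The placeholder stands for the value you copied and is not filled in here.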

Extract the installation script from the downloaded tarball as follows.

# tar xf solr-8.7.0.tgz solr-8.7.0/bin/install_solr_service.sh --strip-components=2

Now, run the extracted installation script to install Apache Solr Server on your Linux machine.

# ./install_solr_service.sh solr-8.7.0.tgz
We recommend installing the 'lsof' command for more stable start/stop of Solr
id: ‘solr’: no such user
Creating new user: solr

Extracting solr-8.7.0.tgz to /opt


Installing symlink /opt/solr -> /opt/solr-8.7.0 ...


Installing /etc/init.d/solr script ...


Installing /etc/default/solr.in.sh ...

Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
*** [WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
*** [WARN] ***  Your Max Processes Limit is currently 3674.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
NOTE: Please install lsof as this script needs it to determine if Solr is listening on port 8983.

Started Solr server on port 8983 (pid=2241). Happy searching!


Found 1 Solr nodes:

Solr process 2241 running on port 8983
Solr at http://localhost:8983/solr not online.

Don’t worry about the above warnings. We will rectify them one by one.
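
The last line of the output reports Solr as "not online" because the check runs while the server is still starting. If you automate the installation, a short polling loop can wait until Solr answers. This is a sketch under assumptions: wait_until and solr_is_up are hypothetical helper names, curl is assumed to be installed, and the probe uses Solr's system information endpoint on the default port.

```shell
# wait_until is a hypothetical helper: it retries a command once per
# second until it succeeds or the attempt budget (first argument) runs out.
wait_until() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# solr_is_up is a hypothetical probe: it succeeds when Solr answers an
# HTTP request on the default port of the local host.
solr_is_up() {
  curl --silent --fail --max-time 5 \
    "http://localhost:8983/solr/admin/info/system"
}
```

Running `wait_until 30 solr_is_up && echo "Solr is online"` waits up to roughly 30 seconds, plus the time spent in each failed probe, and prints the message once the server responds.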

Post Installation Configurations

Install the lsof software package, as required by Apache Solr.

# dnf install -y lsof

Enable the Solr search service by using the following Linux command.

# systemctl enable solr
solr.service is not a native service, redirecting to systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable solr

Verify that the Solr search service is listening on the default port 8983.

# ss -tulpn | grep 8983
tcp    LISTEN  0       50                         *:8983                *:*      users:(("java",pid=2241,fd=153))

To rectify the startup warnings about the file and process limits, you need to define the security limits required by the Apache Solr Enterprise Search Server.

Open the limits.conf file in the vi editor.

# vi /etc/security/limits.conf

Add the following directives to this file.

solr   soft   nofile   65536
solr   hard   nofile   65536
solr   soft   nproc    65536
solr   hard   nproc    65536
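
If you prefer not to edit limits.conf by hand, the same four directives can be written to a separate file. The sketch below is one possible approach, not the method used in this guide: write_solr_limits is a hypothetical helper name, and the target path in the usage note is an assumption.

```shell
# write_solr_limits is a hypothetical helper: it writes the four limits
# from this guide to the file given as its first argument.
write_solr_limits() {
  cat > "$1" <<'EOF'
solr   soft   nofile   65536
solr   hard   nofile   65536
solr   soft   nproc    65536
solr   hard   nproc    65536
EOF
}
```

As root, `write_solr_limits /etc/security/limits.d/solr.conf` would create a drop-in file. The file name solr.conf is an assumption; any name ending in .conf in that directory should be read alongside limits.conf.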

Restart the Solr service using the legacy service command. The warnings should no longer appear.

# service solr restart
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 4524 to stop gracefully.
Waiting up to 180 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=4865). Happy searching!
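
To confirm that the new limits apply to the running Solr process, you can read the limits file that the kernel exposes for it under /proc. A minimal sketch, where soft_limit is a hypothetical helper name:

```shell
# soft_limit is a hypothetical helper: given a limit name as it appears
# in /proc/<pid>/limits and the path of such a file, it prints the soft
# limit for that name.
soft_limit() {
  awk -v name="$1" 'index($0, name) == 1 {
    # Drop the limit name, then take the first remaining column.
    rest = substr($0, length(name) + 1)
    split(rest, cols, " ")
    print cols[1]
  }' "$2"
}
```

With the pid reported at startup (4865 in the output above), `soft_limit "Max open files" /proc/4865/limits` and `soft_limit "Max processes" /proc/4865/limits` should both print 65536 once the limits are in effect.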

Configure Linux Firewall

Apache Solr listens on port 8983/tcp by default, so you need to allow this port in the Linux firewall.

# firewall-cmd --permanent --add-port=8983/tcp
success
# firewall-cmd --reload
success
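
You can double-check the rule by listing the open ports. The sketch below assumes that `firewall-cmd --list-ports` prints the ports separated by spaces; port_listed is a hypothetical helper name.

```shell
# port_listed is a hypothetical helper: it succeeds when the port/protocol
# pair in $2 appears as a whole word in the space-separated list in $1.
port_listed() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Run the check only where firewall-cmd is available.
if command -v firewall-cmd >/dev/null 2>&1; then
  if port_listed "$(firewall-cmd --list-ports 2>/dev/null)" "8983/tcp"; then
    echo "OK: 8983/tcp is allowed"
  else
    echo "Warning: 8983/tcp is not in the list of open ports"
  fi
fi
```

Matching on whole words avoids a false positive from a different port such as 18983/tcp.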

Create Apache Solr Collection

Create an example Solr collection on the enterprise search server. Because Solr is running in standalone mode here, the command creates a core.

# su - solr -c "/opt/solr/bin/solr create -c testcol1 -n data_driven_schema_configs"

Created new core 'testcol1'
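
To see the new core in action, you can index one document and search for it over HTTP. This is only a sketch: the helper names are hypothetical, curl is assumed to be installed, and the document with its id and title fields is made-up sample data. It uses Solr's JSON update and select request handlers on the default port.

```shell
SOLR_URL="http://localhost:8983/solr"   # assumes Solr runs on the local host
CORE="testcol1"                         # the core created above

# Hypothetical helpers that build the request URLs for a core.
update_url() { printf '%s/%s/update?commit=true\n' "$SOLR_URL" "$1"; }
select_url() { printf '%s/%s/select?q=%s\n' "$SOLR_URL" "$1" "$2"; }

# Index one sample document (made-up data) and commit it immediately.
index_sample_doc() {
  curl --silent --fail --max-time 10 \
    -H 'Content-Type: application/json' \
    --data-binary '[{"id":"doc1","title":"Hello Solr"}]' \
    "$(update_url "$CORE")"
}

# Query every document in the core with the match-all query *:*.
search_all() {
  curl --silent --fail --max-time 10 "$(select_url "$CORE" '*:*')"
}
```

Calling `index_sample_doc` and then `search_all` should return a JSON response that includes the sample document, provided the server is running and the core exists.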

Open URL http://192.168.116.230:8983/solr/ in a client browser.

You are now at the dashboard of the Apache Solr web UI. You can check the newly created collection by selecting it from the drop-down box in the left-side pane.

Final Thoughts

Installing Apache Solr Server on CentOS 8 can significantly enhance your search capabilities. By following this guide, you should now have a fully functional Solr server up and running.
