Learn step-by-step how to install Apache Solr Server on CentOS 8. This comprehensive guide covers prerequisites, installation commands, and configuration tips to get your Solr server up and running efficiently.
What is Apache Solr?
Apache Solr is an open-source search platform developed by the Apache Software Foundation, built on the robust Apache Lucene library. It is designed to handle large-scale search applications with high-performance and real-time indexing capabilities.
Key features of Solr include:
- Scalability and Flexibility: Solr is capable of indexing and searching large volumes of data quickly. It supports distributed searching and indexing, which allows it to scale horizontally by adding more servers to handle increased load.
- Powerful Full-Text Search: Solr provides advanced full-text search features such as phrase matching, wildcard search, fuzzy search, and more. It is highly optimized for text search with powerful query capabilities.
- Faceted Search and Filtering: Solr supports faceted search, which allows users to narrow down search results based on predefined categories. This is particularly useful in e-commerce and content management systems.
- Rich Document Handling: Solr can index and search a variety of document formats, including JSON, XML, CSV, and rich text formats such as PDF, Word, and more, making it versatile for different data sources.
- Real-Time Indexing: Solr supports near real-time indexing, enabling it to handle continuous data updates efficiently. This is essential for applications requiring up-to-date search results, such as news websites and social media platforms.
- Extensible and Customizable: Solr is highly extensible with a plugin architecture, allowing developers to customize its functionality to meet specific requirements. It also supports various languages and can be integrated with other big data tools.
- Administrative Tools: Solr includes a comprehensive set of administrative tools for managing indexes, monitoring server performance, and configuring search parameters through a web-based interface.
- Community and Ecosystem: As part of the Apache project, Solr benefits from a large and active community, providing extensive documentation, tutorials, and a wide range of plugins and integrations.
Overall, Apache Solr is a powerful and flexible search platform suitable for a wide range of applications, from simple websites to complex enterprise systems requiring advanced search capabilities.
Apache Solr vs Elasticsearch
Apache Solr and Elasticsearch are both leading open-source search platforms built on the Apache Lucene library. They are designed to handle large-scale search and indexing operations efficiently. While they share many similarities, there are some key differences between them:
Apache Solr
- Architecture: Solr uses a more traditional, enterprise-centric architecture. Its legacy distributed mode relies on a master-slave replication setup, which can be more complex to configure but is highly reliable; clustered deployments use SolrCloud, described below.
- Configuration: Solr uses XML for its configuration files. This can be seen as more verbose but also allows for very fine-grained control over configurations.
- Community and Support: Solr has a large, active community with extensive documentation and support from the Apache Software Foundation.
- Query Capabilities: Solr offers powerful query capabilities, including advanced faceting and filtering. It is often praised for its robust feature set for complex search applications.
- Integration: Solr integrates well with Hadoop and other big data platforms, making it a good choice for applications within the Hadoop ecosystem.
- SolrCloud: For distributed search, Solr uses SolrCloud, which provides features like distributed indexing, replication, and automatic failover.
Elasticsearch
- Architecture: Elasticsearch uses a modern, distributed architecture that makes it easier to scale horizontally. It employs a peer-to-peer configuration, which simplifies cluster management.
- Configuration: Elasticsearch uses JSON for configuration and communication, which is often seen as simpler and more user-friendly.
- Real-Time Indexing: Elasticsearch is known for its near real-time indexing capabilities, making it ideal for applications that require rapid updates and low-latency search.
- Community and Ecosystem: Elasticsearch has a very active community and is backed by Elastic NV, which provides commercial support and enterprise features through the Elastic Stack (ELK Stack: Elasticsearch, Logstash, Kibana).
- Query DSL: Elasticsearch offers a rich query DSL (Domain Specific Language) that allows for complex search queries to be constructed in a very flexible manner.
- Plugins and Integrations: Elasticsearch has a robust ecosystem with many plugins and integrations available, making it versatile for various use cases, including logging, monitoring, and analytics.
Comparison Summary
- Scalability: Both Solr and Elasticsearch scale well, but Elasticsearch’s peer-to-peer architecture can be easier to manage.
- Configuration: Solr’s XML configuration offers fine-grained control, while Elasticsearch’s JSON configuration is more straightforward and user-friendly.
- Real-Time Capabilities: Elasticsearch generally has an edge in real-time indexing and search.
- Feature Set: Solr’s feature set for complex search and faceting is very robust, whereas Elasticsearch offers flexibility with its query DSL and extensive plugin ecosystem.
- Support and Ecosystem: Solr benefits from strong support within the Apache community, while Elasticsearch has extensive commercial support and a comprehensive ecosystem through Elastic Stack.
Choosing between Solr and Elasticsearch depends on specific requirements, including the need for real-time indexing, ease of configuration, scalability needs, and the desired ecosystem of tools and integrations. Both are powerful tools that excel in different areas and can be tailored to a wide range of search applications.
Environment Specification
We are using a minimal CentOS 8 KVM machine with the following specifications.
- CPU – 3.4 GHz (2 cores)
- Memory – 2 GB
- Storage – 20 GB
- Operating System – CentOS 8.2
- Hostname – solr-01.centlinux.com
- IP Address – 192.168.116.230/24
Update your Linux Server
Connect to solr-01.centlinux.com as the root user using an SSH client.
Use the dnf command to update the software packages on your Linux server.
# dnf update -y
Verify the Linux operating system and Kernel version.
# uname -r
4.18.0-193.28.1.el8_2.x86_64
# cat /etc/redhat-release
CentOS Linux release 8.2.2004 (Core)
Install OpenJDK on Linux Server
Apache Solr is written in the Java programming language, so it requires Java 8 or later to run its enterprise search services.
OpenJDK is available in the standard yum repositories and can be installed easily. Alternatively, you can install Oracle Java SE on your Linux server.
For the sake of simplicity, we are installing OpenJDK 11 on the Linux server.
# dnf install -y java-11-openjdk
After successful installation, verify the Java version.
# java -version
openjdk version "11.0.9" 2020-10-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.9+11-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.9+11-LTS, mixed mode, sharing)
OpenJDK has been installed on your Linux server.
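If you want to script the Java check instead of reading the output by eye, the following is a minimal sketch. The function names java_major and java_version_string are hypothetical helpers introduced here for illustration, and the sketch assumes the version string appears in double quotes on the first matching line, as in the output above.

```shell
# Print the major Java version for a version string such as
# "11.0.9" (modern scheme) or "1.8.0_272" (legacy scheme).
java_major() {
    local version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # legacy scheme: 1.8.0_272 -> 8.0_272
    esac
    # Keep only the leading run of digits.
    printf '%s\n' "${version%%[!0-9]*}"
}

# Extract the quoted version string from `java -version` output read on stdin.
java_version_string() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on the server (java -version writes to stderr, hence 2>&1):
#   major=$(java_major "$(java -version 2>&1 | java_version_string)")
#   [ "$major" -ge 8 ] && echo "Java $major is new enough for Solr"
```

Handling the legacy "1.x" scheme separately keeps the check valid if you choose Java 8 instead of OpenJDK 11.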
Install Apache Solr Server on CentOS 8
You can download Apache Solr from GitHub or from the official website.
From the official download page, copy the URL of the Apache Solr version you need, and then use the wget command to download it.
# cd /tmp
# wget https://downloads.apache.org/lucene/solr/8.7.0/solr-8.7.0.tgz
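Before installing, it may be worth checking the integrity of the downloaded tarball. The sketch below compares a file's SHA-512 digest with an expected value. The function name verify_sha512 is a hypothetical helper, and the usage comment assumes you have obtained the expected digest from a checksum file published alongside the release; that file's name and layout are assumptions, not something this guide confirms.

```shell
# Return 0 when the SHA-512 digest of FILE equals EXPECTED
# (hexadecimal, compared case-insensitively).
verify_sha512() {
    local file="$1" expected="$2" actual
    actual=$(sha512sum "$file" | awk '{print $1}')
    [ -n "$actual" ] || return 1
    [ "$(printf '%s' "$actual" | tr 'A-F' 'a-f')" = \
      "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]
}

# Usage (assumes the expected digest is the first field of a
# checksum file you downloaded separately):
#   verify_sha512 /tmp/solr-8.7.0.tgz "$(awk '{print $1}' solr-8.7.0.tgz.sha512)" \
#       && echo "checksum OK"
```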
Extract the installation script from downloaded tarball as follows.
# tar xf solr-8.7.0.tgz solr-8.7.0/bin/install_solr_service.sh --strip-components=2
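The tar command above pulls a single file out of the archive, and --strip-components=2 removes the two leading directories (solr-8.7.0/bin/) so the script lands in the current directory. A minimal sketch of the same step as a reusable function follows; extract_installer is a hypothetical name, and the sketch assumes the tarball keeps the solr-VERSION/bin/ layout used by this release.

```shell
# Extract only the installer script from a Solr tarball into DEST.
# --strip-components=2 drops the leading "solr-<version>/bin/" path,
# so the script is written directly into DEST.
extract_installer() {
    local tarball="$1" version="$2" dest="${3:-.}"
    tar -xzf "$tarball" -C "$dest" --strip-components=2 \
        "solr-${version}/bin/install_solr_service.sh"
}

# Usage:
#   extract_installer /tmp/solr-8.7.0.tgz 8.7.0 /tmp
```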
Now, execute the extracted installation script to install Apache Solr Server on your Linux machine.
# ./install_solr_service.sh solr-8.7.0.tgz
We recommend installing the 'lsof' command for more stable start/stop of Solr
id: ‘solr’: no such user
Creating new user: solr
Extracting solr-8.7.0.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-8.7.0 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
*** [WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
*** [WARN] *** Your Max Processes Limit is currently 3674.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
NOTE: Please install lsof as this script needs it to determine if Solr is listening on port 8983.
Started Solr server on port 8983 (pid=2241). Happy searching!
Found 1 Solr nodes:
Solr process 2241 running on port 8983
Solr at http://localhost:8983/solr not online.
Don’t worry about the above warnings. We will rectify them one by one.
Post Installation Configurations
Install the lsof software package, as required by Apache Solr.
# dnf install -y lsof
Enable the Solr search service by using the following Linux command.
# systemctl enable solr
solr.service is not a native service, redirecting to systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable solr
Verify that the Solr search service is running on default port 8983.
# ss -tulpn | grep 8983
tcp LISTEN 0 50 *:8983 *:* users:(("java",pid=2241,fd=153))
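A plain grep for 8983 also matches unrelated text such as a PID or a longer port number. As a slightly stricter alternative, the sketch below checks the Local Address:Port column only. The function name port_is_listening is a hypothetical helper, and the sketch assumes the column layout shown in the ss output above (protocol, state, two queue sizes, then the local address).

```shell
# Read `ss -tuln` style output on stdin and succeed if a TCP socket
# is in LISTEN state on PORT (matched against the local address column).
port_is_listening() {
    local port="$1"
    awk -v port="$port" '
        $1 == "tcp" && $2 == "LISTEN" && $5 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage:
#   ss -tuln | port_is_listening 8983 && echo "Solr is listening on 8983"
```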
To rectify the startup warnings about the open file and process limits, you need to define the security limits required by the Apache Solr Enterprise Search Server.
Open the limits.conf file in the vi editor.
# vi /etc/security/limits.conf
Add the following directives to this file.
solr soft nofile 65536
solr hard nofile 65536
solr soft nproc 65536
solr hard nproc 65536
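If you prefer to apply these limits from a script rather than an editor, the following is a minimal sketch that appends each directive only when it is missing, so it can be run repeatedly. The function names ensure_line and write_solr_limits are hypothetical, and the default drop-in path /etc/security/limits.d/solr.conf is an assumed alternative to editing limits.conf directly.

```shell
# Append LINE to FILE only if an identical line is not already present.
ensure_line() {
    local file="$1" line="$2"
    touch "$file" || return 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Write the four Solr limits from this guide into FILE.
# The default path is an assumption; pass a different file if you prefer.
write_solr_limits() {
    local file="${1:-/etc/security/limits.d/solr.conf}" item kind
    for item in nofile nproc; do
        for kind in soft hard; do
            ensure_line "$file" "solr ${kind} ${item} 65536" || return 1
        done
    done
}

# Usage (as root):
#   write_solr_limits
```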
Restart the Solr service using the legacy service command. There will be no warnings this time.
# service solr restart
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 4524 to stop gracefully.
Waiting up to 180 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=4865). Happy searching!
Configure Linux Firewall
Apache Solr listens on port 8983/tcp by default. Therefore, you need to allow this port in the Linux firewall.
# firewall-cmd --permanent --add-port=8983/tcp
success
# firewall-cmd --reload
success
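To confirm that the rule is active after the reload, you can inspect the output of firewall-cmd --list-ports. The sketch below checks that output for an exact port/protocol entry. The function name port_allowed is a hypothetical helper, and it deliberately does not handle port ranges such as 8000-9000/tcp.

```shell
# Succeed if WANT (for example "8983/tcp") appears as an exact entry
# in LIST, a whitespace-separated list of port/protocol pairs.
port_allowed() {
    local want="$1" entry
    for entry in $2; do
        [ "$entry" = "$want" ] && return 0
    done
    return 1
}

# Usage:
#   port_allowed 8983/tcp "$(firewall-cmd --list-ports)" && echo "8983/tcp is open"
```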
Create Apache Solr Collection
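Create an example Solr collection on the enterprise search server. Because this installation runs Solr in standalone mode, the command creates a core, as the output shows.
# su - solr -c "/opt/solr/bin/solr create -c testcol1 -n data_driven_schema_configs"
Created new core 'testcol1'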
Open URL http://192.168.116.230:8983/solr/ in a client browser.
You are now at the dashboard of the Apache Solr web UI. You can check the newly created collection by selecting it from the drop-down box in the left-side pane.
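You can also check the new core from the command line by querying its select handler. The following is a minimal sketch that reads a JSON response and prints the number of matching documents. The function name solr_num_found is a hypothetical helper, and the sketch assumes the response contains a "numFound" field, as Solr's JSON select responses normally do.

```shell
# Read a Solr JSON select response on stdin and print its numFound value.
solr_num_found() {
    sed -n 's/.*"numFound"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on the Solr server (match-all query, no documents returned):
#   curl -s 'http://localhost:8983/solr/testcol1/select?q=*:*&rows=0' | solr_num_found
```

For a freshly created core with nothing indexed yet, the count should be zero; it grows as you index documents.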
Final Thoughts
Installing Apache Solr Server on CentOS 8 can significantly enhance your search capabilities. By following this guide, you should now have a fully functional Solr server up and running.