Enlightening - The Dark Art of Solr Search with Drupal
Why this blog post?
Often when I add a search function to a Drupal website using Apache Solr, I'm amazed at how complex some people think this is. Many developers/site builders are of the belief that this is some kind of very-hard-to-master black art. They could not be more wrong.
So what I want to contribute back to the Drupal community is an understanding of how Solr works, why/how it differs from Drupal Core Search module, and the benefits Solr has over core search.
Note: all Annertech hosting packages come with the ability for your site to use Apache Solr built-in.
This blog post is aimed at site builders. No code is required, and only a minimal amount of command line skill.
We will also look at the results of some benchmarks, which illustrate the performance benefit that can be obtained from using Apache Solr. And as we all know, better performance means more conversions and higher sales on your website/e-commerce platform.
I've used Solr in a variety of different projects over the past four years, most recently on the Royal Museums Greenwich website, where Solr is also being used to search non-Drupal content.
What is Solr?
Solr is a blazingly fast open source enterprise search platform that can rival bigger name brands such as Google.
- Written in Java
- Easy to implement in a servlet container (Tomcat, Jetty)
- REST-like API
- Index content via XML, JSON, binary or CSV over HTTP
- Query it via HTTP GET and receive XML, JSON, CSV or binary results
- Advanced Full-Text Search Capabilities
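To make the "query it via HTTP GET" point concrete, here is a minimal Python sketch of what such a request and its JSON answer look like. It assumes the local example instance at http://localhost:8983/solr and Solr's standard /select handler; the helper names and the sample response are hand-written for illustration and are not output from a real index. When you use the Drupal modules you never have to write this yourself.

```python
import json
from urllib.parse import urlencode

# Assumption: a local example Solr instance; adjust to your own setup.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(keywords, rows=10, base=SOLR_BASE):
    """Build an HTTP GET URL for Solr's standard /select handler."""
    params = {"q": keywords, "rows": rows, "wt": "json"}
    return base + "/select?" + urlencode(params)


def extract_docs(raw_json):
    """Return (total hit count, list of documents) from a JSON response."""
    response = json.loads(raw_json)["response"]
    return response["numFound"], response["docs"]


# Hand-written sample in the shape Solr uses for JSON results.
SAMPLE = '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "node/1"}]}}'
```

Opening the URL returned by build_select_url("dark art") in a browser is enough to see results from a running instance; extract_docs(SAMPLE) shows how the hit count and documents are pulled out of the reply.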
When do you use Solr?
There are several scenarios where you should consider using Solr.
- If searching is a major part/feature of your site.
- If finding content is mainly done by searching and you would like to offer autocomplete and a "Did you mean" feature (both as you know them from Google), Solr is the way to go.
- If you need fine grained search control, like access control.
- If you want to use faceted search, Solr really starts to shine.
- If you have lots of content, which I'd say starts around 5,000 nodes, Solr's speed of returning results really overtakes Drupal's core search module.
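For the curious, faceted search comes down to a few extra query parameters and a list of counts in the response. The following Python sketch shows the idea under some assumptions: the field name "type" is hypothetical (the Drupal modules generate their own field names), the helper names are my own, and the sample response is hand-written. By default Solr returns facet counts in its JSON output as a flat list of alternating values and counts, which the second helper turns into a dictionary.

```python
from urllib.parse import urlencode


def facet_params(keywords, facet_field):
    """Query string that asks Solr for facet counts on one field."""
    return urlencode({
        "q": keywords,
        "wt": "json",
        "facet": "true",
        "facet.field": facet_field,
        "facet.mincount": 1,  # hide facet values with zero hits
    })


def facet_counts(response, facet_field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample; "type" is a hypothetical field name.
SAMPLE = {"facet_counts": {"facet_fields": {"type": ["article", 12, "page", 3]}}}
```

The resulting dictionary is what a facet block displays: each value becomes a link, with the number of matching items next to it.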
Sound like a good fit? Here are the modules you'll need.
Here is a list of modules you might consider using when you want Drupal to connect to a Solr instance. There are a few more out there that might be the ones you need, but in my opinion, these are the best for beginners.
The Sarnia module is probably not needed for beginners, but the module is so cool that I had to mention it. It allows you to search in an index that was not created by Drupal. This means that you can have a 3rd party system indexing the content, and still display it in Drupal, using the views module.
Let's get started
Start a Solr instance
- Hurry, get a copy of Solr.
- Next, from your console, extract the downloaded package and enter the 'example' folder.
- Type 'java -jar start.jar' and the server will start running.
- You can verify this by visiting http://localhost:8983/solr/admin. You should see the Solr admin page.
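If you would rather check from a script than from a browser, the verification step can be sketched in Python as below. This is only an illustration: the function name is my own, it assumes the admin URL given above, and it treats any connection problem as "not running". The opener parameter exists purely so the function can be tried without a live server.

```python
from urllib.request import urlopen

# Admin URL of the example instance started in the steps above.
ADMIN_URL = "http://localhost:8983/solr/admin"


def solr_is_up(url=ADMIN_URL, opener=urlopen, timeout=5):
    """Return True if the URL answers with HTTP 200, False if unreachable."""
    try:
        with opener(url, timeout=timeout) as reply:
            return reply.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts and HTTP error codes all end up here.
        return False
```

Calling solr_is_up() with no arguments checks the local example instance.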
The Drupal part
- Enable the 'Search API' and 'Search API Solr' modules mentioned earlier. Follow the installation instructions carefully, especially those for 'Search API Solr', which explain how to copy configuration files from the module folder into the Solr server. Remember to restart the Solr instance after you copy the new configuration.
- Enable search_api_solr module
- Go to admin/config/search/search_api
- Add new server
- Add new index
- Configure which fields to index. These will be available in the view used as a search page.
- Press the ‘Index now’ button. This will send the content to the Solr instance.
- Have cake. This is a very important part! Do not skip under any circumstances.
The new search page
All you need to do now is create a view, using the new node index, set up your exposed filters (search for the fulltext field), and the fields to be displayed.
Visit your newly created search page, and watch Solr do its magic.
Here's an example of a pretty simple search page, inspired by Drupal Core's search result page.
Here is a more complex one, using some exposed filters, to narrow down the search result.
The next image shows how a page, using the faceted search feature, could look.
But how does it perform?
DISCLAIMER: This is not a valid test, and the result is only used here to give a glimpse of what the difference might look like. The results will most likely look different on your environment. You can see the shell scripts I used to perform the test in the document folder I pointed out earlier. All tests were performed on a system where content was using cache.
I used a pretty standard Drupal setup for this test: a few content types, some modules for SEO, and some for making content editing easier. What isn't ordinary about this Drupal installation is that it contains 70,000 nodes.
You can find the specs on the machine used, including server software, in the document folder I pointed out earlier, in the 'Search execution time' spreadsheet.
The results are pretty clear: Drupal core's search takes longer as you put more words in the query. This may seem fair enough, as it makes the searching more complex. But when using Solr you can barely see any difference at all. It doesn't really matter if you search for 1 or 6 keywords. The response time for a page is almost the same.
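If you want to try a comparison like this on your own site, the general idea can be sketched in a few lines of Python. To be clear, this is not the shell scripts I used; it is a hypothetical illustration in which search is any function you supply (for example, one that fetches your search page for the given keywords), and the function names are my own. It reports the median time for queries of growing length.

```python
import time
from statistics import median


def time_search(search, keywords, repeats=5):
    """Median wall-clock seconds over `repeats` calls of search(keywords)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        search(keywords)
        samples.append(time.perf_counter() - start)
    return median(samples)


def compare_query_lengths(search, words):
    """Time queries of 1..len(words) keywords; returns {keyword count: seconds}."""
    return {
        count: time_search(search, " ".join(words[:count]))
        for count in range(1, len(words) + 1)
    }
```

Run it once against a core search page and once against a Solr-backed page with the same list of words, and compare how the numbers grow with the keyword count.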
Super extra mega bonus!
I've put together a Drupal installation with a Solr bundle. Feel free to download it to test it yourself. Don't forget to read the README.txt. You can say thanks by leaving a comment below.
About using Solr in a production environment
Solr for developers is quite easy to set up, but if you want to use it in a production environment, there are more things to take into account, such as security, performance optimisation, etc.
If you are not a skilled IT administrator, I strongly recommend that you team up with experienced Solr maintainers/administrators, so they can help you set things up properly.