Multi Core Apache Solr on Ubuntu 10.04 for Drupal with Auto Provisioning

Apache Solr is an excellent full-text search engine based on Lucene. Solr is increasingly being used in the Drupal community for search, and I use it on a lot of my projects. Recently Steve Edwards at Drupal Connect blogged about setting up a multi core Solr server on Ubuntu 9.10 (aka Karmic). Ubuntu 10.04 LTS was released a couple of months ago and it makes the process a bit easier, as Apache Solr 1.4 has been packaged. An additional advantage of using 10.04 LTS is that it is supported until April 2015, whereas support for 9.10 ends in 10 months, in April 2011.

As an added bonus, in this howto you will be able to auto provision Solr cores just by calling the right URL.

In this tutorial I will be using Jetty rather than Tomcat, which some tutorials recommend, as Jetty performs well and generally uses fewer resources.

Install Solr and Jetty

Installing Jetty and Solr just requires a simple command:

$ sudo apt-get install solr-jetty openjdk-6-jdk

This will pull down Solr and all of the dependencies, which can be a lot if you have a very stripped down base server.

Configuring Jetty

Configuring Jetty is very straightforward. First we back up the existing /etc/default/jetty file like so:

sudo cp -a /etc/default/jetty /etc/default/jetty.bak

Then simply change your /etc/default/jetty to be like this (the changes are highlighted):

# Defaults for jetty see /etc/init.d/jetty for more

# change to 0 to allow Jetty to start

# change to 'no' or uncomment to use the default setting in /etc/default/rcS 

# Run Jetty as this user ID (default: jetty)
# Set this to an empty string to prevent Jetty from starting automatically

# Listen to connections from this network host (leave empty to accept all connections)
#Uncomment to restrict access to localhost
#JETTY_HOST=$(uname -n)

# The network port used by Jetty

# Timeout in seconds for the shutdown of all webapps

# Additional arguments to pass to Jetty    

# Extra options to pass to the JVM         
#JAVA_OPTIONS="-Xmx256m -Djava.awt.headless=true"

# Home of Java installation.

# The first existing directory is used for JAVA_HOME (if JAVA_HOME is not
# defined in /etc/default/jetty). Should contain a list of space separated directories.
#JDK_DIRS="/usr/lib/jvm/default-java /usr/lib/jvm/java-6-sun"

# Java compiler to use for translating JavaServer Pages (JSPs). You can use all
# compilers that are accepted by Ant's build.compiler property.

# Jetty uses a directory to store temporary files like unpacked webapps

# Jetty uses a config file to setup its boot classpath

# Default for number of days to keep old log files in /var/log/jetty/

If you don't include the JETTY_HOST entry, Jetty will only bind to the local loopback interface, which is all you need if your Drupal webserver is running on the same machine. If you set JETTY_HOST, make sure you configure your firewall to restrict access to the Solr server.
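As a minimal sketch, the active (uncommented) settings in /etc/default/jetty for this setup might look something like the following. The variable names NO_START and JETTY_PORT are assumptions based on the comments in the stock file, and solr.example.com is a placeholder hostname. Leave JETTY_HOST out entirely if Drupal runs on the same machine.

```shell
# Sketch of the active settings in /etc/default/jetty (values are assumptions)

# Allow the init script to start Jetty
NO_START=0

# Only needed when Drupal runs on a different machine; placeholder hostname
JETTY_HOST=solr.example.com

# Port used in the URLs later in this howto
JETTY_PORT=8080
```

The file is sourced by the init script as shell, so there must be no spaces around the equals signs.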

Configuring Solr

I am assuming you have already installed the Apache Solr module for Drupal somewhere. If you haven't, do that now, as you will need some config files which ship with it.

First we enable the multicore support in Solr by creating a file called /usr/share/solr/solr.xml with the following contents:

<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores" shareSchema="true" adminHandler="au.com.davehall.solr.plugins.SolrCoreAdminHandler">
 </cores>
</solr>

You need to make sure the file is owned by the jetty user if you want it to be dynamically updated. Otherwise change persistent="true" to persistent="false", don't include the adminHandler attribute and don't run the commands below. Also, if you want to auto provision cores you will need to download the jar file attached to this post and drop it into the /usr/share/solr/lib directory (which you'll need to create).

sudo chown jetty:jetty /usr/share/solr
sudo chown jetty:jetty /usr/share/solr/solr.xml
sudo chmod 640 /usr/share/solr/solr.xml
sudo mkdir /usr/share/solr/cores
sudo chown jetty:jetty /usr/share/solr/cores

To keep your configuration centralised, symlink the file from /usr/share/solr to /etc/solr. Don't do it the other way around, as Solr will ignore the symlink.

sudo ln -s /usr/share/solr/solr.xml /etc/solr/

Solr needs to be configured for Drupal. First we back up the existing config files, just in case, like so:

sudo mv /etc/solr/conf/schema.xml /etc/solr/conf/schema.orig.xml
sudo mv /etc/solr/conf/solrconfig.xml /etc/solr/conf/solrconfig.orig.xml

Now we copy the Drupal Solr config files from where you installed the module:

sudo cp /path/to/drupal-install/sites/all/modules/contrib/apachesolr/{schema,solrconfig}.xml /etc/solr/conf/

Solr needs the path to exist for each core's data files, so we create them with the following commands:

sudo mkdir -p /var/lib/solr/cores/{,subdomain_}example_com/{data,conf}
sudo chown -R jetty:jetty /var/lib/solr/cores/{,subdomain_}example_com

Each of the cores needs its own configuration files. We could implement some hacks to use a common set of configuration files, but that would make life more difficult if we ever have to migrate some of the cores. Just copy the common configuration to all the cores:

sudo bash -c 'for core in /var/lib/solr/cores/*; do cp -a /etc/solr/conf/ $core/; done'
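When more cores are added later, the directory creation and config copy above can be wrapped up in one step. The following is only a sketch: prepare_core is a hypothetical helper name, and the paths are passed in as arguments rather than hard coded so it can be tried out in a scratch directory first.

```shell
# Hypothetical helper: create the directories for one core and copy the
# shared Drupal config into it.
#   $1 = core name, e.g. example_com
#   $2 = cores directory (the howto uses /var/lib/solr/cores)
#   $3 = shared config directory (the howto uses /etc/solr/conf)
prepare_core() {
    local name="$1"
    local cores_dir="$2"
    local conf_src="$3"
    local core_dir="$cores_dir/$name"

    # Refuse empty names and names that would escape the cores directory
    case "$name" in
        ""|*/*|.*) echo "invalid core name: $name" >&2; return 1 ;;
    esac

    mkdir -p "$core_dir/data" "$core_dir/conf" || return 1
    # Copy the contents of the shared conf directory, preserving attributes
    cp -a "$conf_src/." "$core_dir/conf/" || return 1
    echo "$core_dir"
}
```

On the server it would be run as root, for example prepare_core example_com /var/lib/solr/cores /etc/solr/conf, followed by chown -R jetty:jetty on the new directory as shown earlier.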

If everything is configured correctly, we should just be able to start Jetty like so:

sudo /etc/init.d/jetty start

If you visit http://solr.example.com:8080/solr/admin/cores?action=STATUS you should get some XML that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">0</int>
	</lst>
	<lst name="status"/>
</response>

If you get the above output, everything is working properly.

If you enabled auto provisioning of Solr cores, you should now be able to create your first core. Point your browser at http://solr.example.com:8080/solr/admin/cores?action=CREATE&name=test1&i... If it works you should get output similar to the following:

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">1561</int>
	</lst>
	<str name="core">test1</str>
	<str name="saved">/usr/share/solr/solr.xml</str>
</response>

I would recommend using identifiable names for your cores, so for davehall.com.au I would call the core "davehall_com_au" so I can easily find it later on.
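If you are provisioning cores from a script, the naming convention is easy to automate. This is a small sketch; core_name_for_domain is a hypothetical helper name, not something that ships with Solr or the Drupal module.

```shell
# Hypothetical helper: turn a domain into a core name by lower casing it and
# replacing every character that is not a letter, digit or underscore with
# an underscore.
core_name_for_domain() {
    # printf avoids a trailing newline, which tr would otherwise convert too
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_' '_'
    echo
}

core_name_for_domain davehall.com.au    # prints davehall_com_au
```

The result can then be used as the name parameter when calling the core admin URL.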

Security Note: As anyone who can access your server can now provision Solr cores, make sure you restrict access to port 8080 so that only trusted IP addresses can connect.
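One way to do this on Ubuntu is with ufw. The sketch below only prints the commands so they can be reviewed first. solr_firewall_rules is a hypothetical helper, and the 192.0.2.x addresses in the usage note are documentation placeholders for your web heads.

```shell
# Hypothetical helper: print (not run) ufw commands that allow the given
# trusted addresses to reach port 8080 and deny everyone else.
# Allow rules are printed first because ufw matches rules in order.
solr_firewall_rules() {
    for ip in "$@"; do
        echo "ufw allow from $ip to any port 8080 proto tcp"
    done
    echo "ufw deny 8080/tcp"
}
```

For example, solr_firewall_rules 192.0.2.10 192.0.2.11 prints two allow rules followed by the deny rule. Once you are happy with the output, pipe it to sudo sh to apply it.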

For more information on the commands available, refer to the Solr Core Admin API documentation on the Solr wiki.

Next in this series will be how to use this auto provisioning setup to allow Aegir to provision Solr cores as sites are created.

Site Refresh

Our site hasn't changed very much over the last 4 years, but the business has changed a lot. The biggest change was the (uneventful and long overdue) upgrade to Drupal 6 a few months ago.

During the last week or so the site has been updated and refocused. The major changes include:

This also signals our return to regular blogging. There are a few posts in the pipeline. There should be a good mix of Drupal and sysadmin posts in the coming weeks.

As always, feedback is welcome.

eBook Review: Theming Drupal: A First Timer’s Guide

My experience theming Drupal, like most of my coding skills, has been developed by digging up useful resources online and some trial and error. I have an interest in graphic design, but never really studied it. I can turn out sites which look good, but my "designs" don't have the polish of a professionally designed site. I own quite a few (dead tree) books on development and project management. Generally I like to read when I am sick of sitting in front of a screen. The only ebooks I consider reading are short ones.

Emma Jane Hogbin offered her Drupal theming ebook Theming Drupal: A First Timer’s Guide to her mailing list subscribers for free. I am not a big fan of vendor mailing lists; most of the time I scan the messages and hit delete before the bottom. In the case of Emma, rumour has it that it is really worthwhile to subscribe to her list - especially if you are a designer interested in theming Drupal. Emma also offered free copies of her ebook to those who begged, so I subscribed and I begged.

The first thing I noticed about the book was the ducks on the front cover; I'm a sucker for cute animal pics. The ebook is derived from Emma's training courses and the book she coauthored with Konstantin Kaefer, Front End Drupal. Readers are assumed to have some experience with HTML, CSS and PHP. The book is pitched at designers and programmers who want to get into building themes for Drupal.

The reader is walked through building a complete Drupal theme. The writing is detailed and includes loads of references for obtaining additional information. It covers building a page theme, content type specific theming and the various base themes available for Drupal. The book is a very useful resource for anyone working on a Drupal theme.

Although I have themed quite a few Drupal sites, Emma's guide taught me a few things. The book is a good read for anyone who wants to improve their knowledge of Drupal theming. Now to finish reading Front End Drupal ...

First Impressions Motorola Dext and Drupal Editor for Android

Today I purchased a Motorola Dext (aka Cliq) from Optus. Overall I like it. It feels more polished than the Nokia N97 which I bought last year. The range of apps is good. Even though the phone only ships with Android 1.6, 2.1 for the Dext is due in Q3 2010.

The apps seem to run nice and fast. The responsive touch screen is bright and clear. I am yet to try to make a call on it from home, but the 3G data seems as fast as my Telstra 3G service, so the signal should be ok.

The keyboard is very functional, albeit cramped with my fat thumbs. The home screen is a little cluttered for my liking too, but it won't take much to clean that up. I will miss my Funambol sync, which is only available for Android 2.x.

I started writing this post using the Drupal Editor for Android app, which is pretty nice. The GPL app uses XML-RPC and Drupal core's Blog API module. Overall it feels like a stripped down version of Bilbo/Blogilo. Drupal Editor is an example of an app which does one thing and does it simply but well. The only thing I haven't liked about it happened when originally writing this post: I bumped the save button and published an incomplete and poorly written post. Next time I will untick the publish checkbox until I am ready to really publish it.

I would still like a HTC Desire, but Telstra is only offering them on a $65 plan with no value. The Nokia N900 was off my list, due to the USB port of death and Nokia's spam policies. The Nexus One was on the list too, but a local warranty was a consideration.

Solr Replication, Load Balancing, haproxy and Drupal

I use Apache Solr for search on several projects, including a few using Drupal. Solr has built in support for replication and load balancing. Unfortunately the load balancing is done on the client side and works best when using a persistent connection, which doesn't make a lot of sense for PHP based webapps. In the case of Drupal, there has been a long discussion on a patch in the issue queue to enable Solr's native load balancing, but things seem to have stalled.

In one instance I have Solr replicating from the master to a slave, with the plan to add additional slaves if the load justifies it. In order to get Drupal to write to the master and read from either node I needed a proxy or load balancer. In my case the best lightweight HTTP load balancer that would easily run on the web heads was haproxy. I could have run Varnish in front of Solr and had it do the load balancing, but that seemed like overkill at this stage.

Now when an update request hits haproxy it is directed to the master, but reads are balanced between the 2 nodes. To get this setup running on Ubuntu 9.10 with haproxy 1.3.18, I used the following /etc/haproxy/haproxy.cfg on each of the web heads:

global
    log   local0
    log   local1 notice
    maxconn 4096
    nbproc 4
    user haproxy
    group haproxy

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    maxconn 2000
    balance roundrobin
    stats enable
    stats uri /haproxy?stats

frontend solr_lb
    bind localhost:8080
    acl master_methods method POST DELETE PUT
    use_backend master_backend if master_methods
    default_backend read_backends

backend master_backend
    server solr-a weight 1 maxconn 512 check

backend slave_backend
    server solr-b weight 1 maxconn 512 check

backend read_backends
    server solr-a weight 1 maxconn 512 check
    server solr-b weight 1 maxconn 512 check

To ensure the configuration is working properly, run the following on each of the web heads:

wget http://localhost:8080/solr -O -

If you get a connection refused message, haproxy may not be running. If you get a 503 error, make sure solr/jetty/tomcat is running on the solr nodes. If you get some HTML output which mentions Solr, then it should be working properly.
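If you would rather script this check, the three outcomes can be told apart by HTTP status code. The sketch below assumes curl is installed. diagnose_solr_status is a hypothetical helper name, and treating any 2xx or 3xx response as healthy is my own simplification.

```shell
# Map the HTTP status code seen through haproxy to the likely cause
# described above. curl reports 000 when it cannot connect at all.
diagnose_solr_status() {
    case "$1" in
        000)     echo "no connection: haproxy may not be running" ;;
        503)     echo "503: check that solr/jetty/tomcat is running on the solr nodes" ;;
        2??|3??) echo "ok: haproxy is passing requests through to Solr" ;;
        *)       echo "unexpected status $1" ;;
    esac
}

# Usage (run on each web head):
#   diagnose_solr_status "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/solr)"
```

This only checks that something answers on the port. Looking at the page content, as with the wget command, is still the best confirmation that it is really Solr.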

For Drupal's apachesolr module to use this configuration, simply set the hostname to localhost and the port to 8080 in the module configuration page. Rebuild your search index and you should be right to go.

If you had a lot of index updates then you could consider making the master write only and having 2 read only slaves. Just change the IP addresses to point to the right hosts.

For more information on Solr replication refer to the Solr wiki; for more information on configuring haproxy refer to the manual. Thanks to Joe William and his blog post on load balancing CouchDB using haproxy, which helped me get the configuration I needed after I decided what I wanted.

Check Drupal Module Status Using Bash

When you run a lot of Drupal sites it can be annoying to keep track of all of the modules contained in a platform and ensure all of them are up to date. One option is to set up a dummy site with all the modules installed and email notifications enabled. This is OK, but then you need to make sure you enable the additional modules every time you add something to your platform.

I wanted to be able to check the status of all of the modules in a given platform using the command line. I started scratching the itch by writing a simple shell script which uses the Drupal updates server to check the status of all the modules. I kept on polishing it until I was happy with it. Some bits of it are a little bit ugly, but that is mostly due to the limitations of bash. If I had to rewrite it I would do it in PHP or some other language which understands arrays/lists and has HTTP client and XML libraries.

The script supports excluding modules by using an extended grep regular expression pattern and nominating a major version of Drupal. When there is a version mismatch it will be shown in bolded red, while modules where the versions match will be shown in green. The script filters out all dev and alpha releases; after all, the script is designed for checking production sites. Adding support for per module update servers should be pretty easy to do, but I don't have modules to test this with.

To use the script, download it, save it somewhere handy and make it executable:

chmod +x ~/bin/check-module-status.sh

Now it is ready for you to run:

~/bin/check-module-status.sh /path/to/drupal

Then wait for the output.
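To give a feel for the approach, here is a rough sketch of two of the building blocks. It is not the actual script. Both function names are hypothetical, and the URL layout is an assumption based on the release-history feed that Drupal core's update module reads.

```shell
# Hypothetical helper: succeed only for release versions considered fit for
# production, i.e. anything that is not a dev snapshot or an alpha.
is_production_release() {
    case "$1" in
        ""|*-dev|*-alpha*) return 1 ;;
        *)                 return 0 ;;
    esac
}

# Hypothetical helper: build the release-history URL for a project
# (e.g. views) and a major version of Drupal core (e.g. 6.x).
release_history_url() {
    echo "http://updates.drupal.org/release-history/$1/$2"
}
```

The real script then has to fetch that XML and compare the newest acceptable release with the version found in each module's .info file, which is where bash starts to get ugly.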

Packaging Drush and Dependencies for Debian

Lately I have been trying to avoid non packaged software being installed on production servers. The main reason for this is to make it easier to apply updates. It also makes it easier to deploy new servers with meta packages when everything is pre packaged.

One tool which I am using a lot on production servers is Drupal's command line tool - drush. Drush is awesome; it makes managing Drupal sites so much easier, especially when it comes to applying updates. Drush is packaged for Debian testing, unstable and lenny backports by Antoine Beaupré (aka anarcat) and will be available in universe for Ubuntu Lucid. Drush depends on PEAR's Console_Table module and includes some code which automagically installs the dependency from PEAR CVS. The Debianised package includes the PEAR class in the package, which is handy, but if you are building your own debs from CVS or the nightly tarballs, the dependency isn't included. The auto installer only works if it can write to /path/to/drush/includes, which in these cases means calling drush as root; otherwise it spews a few errors about not being able to write the file and then dies.

A more packaging friendly approach would be to build a Debian package for PEAR Console_Table and have that as a dependency of the drush package in Debian. The problem with this approach is that drush currently only looks in /path/to/drush/includes for the PEAR class. I have submitted a patch which first checks if Console_Table has been installed via the PEAR installer (or other package management tool). Combine this with the Debian source package I have created for Console_Table (see the file attached at the bottom of the post) and you can have a modular and apt managed instance of drush, without having to duplicate code.

I have discussed this approach with anarcat. He is supportive and hopefully it will be the approach adopted for drush 3.0.

Update: The drush patch has been committed and should be included in 3.0alpha2.

Upcoming Book Reviews

Packt Publishing seem to have liked my review of Drupal 6 Javascript and jQuery, so much so they have asked me to review another title. On my return from linux.conf.au and Drupal South in New Zealand, a copy of the second edition of AJAX and PHP was waiting for me at the post office. I'll be reading and reviewing the book during February.

I will cover LCA and Drupal South in other blog posts once I have some time to sit down and reflect on the events. For now I will just gloat about winning a spot prize at Drupal South. I walked away with Emma Jane Hogbin and Konstantin Käfer's book, Front End Drupal. I've wanted to buy this title for a while, but shipping from the US made it a bit too pricey even with the strong Australian Dollar. I hope to start reading it in a few weeks, with a review to follow shortly after.

Got a book for me to review? I only read books in dead tree format as I mostly read when I want to get away from the screen. Feel free to contact me to discuss it further.

Updating all of your Drupal Sites at Once - aka Lazy Person's Aegir

Aegir is an excellent way to manage multi site Drupal instances, but sometimes it can be a bit too heavy. For example, if you have a handful of sites, it can be overkill to deploy Aegir. If there is an urgent security fix and you have a lot of sites (I am talking 100s if not 1000s) to patch, waiting for Aegir to migrate and verify all of your sites can be a little too slow.

For these situations I have a little script which I use to do the heavy lifting. I keep it in ~/bin/update-all-sites and it has a single purpose: to update all of my Drupal instances with a single command. Just like Aegir, my script leverages drush, but unlike Aegir there is no parachute, so if something breaks during the upgrade you get to keep all of the pieces. If you use this script, I would recommend always backing up all of your databases first - just in case.
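The backup step can be scripted in much the same way. The following is only a sketch: it assumes your version of drush provides the sql-dump command along with the -r (root) and -l (site) options. backup_all_sites is a hypothetical helper name.

```shell
# Sketch: dump the database of every site under a Drupal root before updating.
# DRUSH_CMD can be overridden if drush isn't in your path.
DRUSH_CMD="${DRUSH_CMD:-drush}"

backup_all_sites() {
    drupal_root="$1"
    backup_dir="$2"
    mkdir -p "$backup_dir" || return 1
    # Only directories containing a settings.php are real sites
    for settings in "$drupal_root"/sites/*/settings.php; do
        [ -f "$settings" ] || continue
        site=$(basename "$(dirname "$settings")")
        echo "backing up $site"
        $DRUSH_CMD -r "$drupal_root" -l "$site" sql-dump > "$backup_dir/$site.sql" || return 1
    done
}
```

For example, backup_all_sites /path/to/drupal ~/backups writes one .sql file per site and stops at the first dump that fails.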

I keep my "platforms" in svn, so before running the script I run a svn switch or svn update depending on how major the update is. If you are using git or bzr, you would do something similar first. If you aren't using any form of version control - I feel sorry for your clients.

So here is the code. It should be pretty self explanatory; if not, ask questions via the comments.

#!/bin/sh
# Update all drupal sites at once using drush - aka lazy person's aegir
#
# Written by Dave Hall
# Copyright (c) 2009 Dave Hall Consulting http://davehall.com.au
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
# Alternatively you may use and/or distribute it under the terms
# of the CC-BY-SA license http://creativecommons.org/licenses/by-sa/3.0/

# Change this to point to your instance of drush if it isn't in your path
DRUSH_CMD="drush"

if [ $# != 1 ]; then
    SCRIPT="`basename $0`"
    echo "Usage: $SCRIPT path-to-drupal-install"
    exit 1;
fi

SITES_PATH="$1"
# Remember where we started. cd rewrites PWD, so use a separate variable.
START_DIR=$(pwd)

cd "$SITES_PATH/sites" || exit 1
# Skip version control directories, sites/all and the empty entry for ./
for site in `find ./ -maxdepth 1 -type d | cut -d/ -f2 | egrep -v '^(\.git|\.bzr|\.svn|all)?$'`; do
    if [ -f "${site}/settings.php" ]; then
        echo updating $site
        $DRUSH_CMD updatedb -y -l $site
    fi
done

# Lets go back to where we started
cd "$START_DIR"

OK, so my script isn't anywhere near as awesome as Aegir, but if you are lazy (or in a hurry) it can come in handy. Most of the time you will probably still want to use Aegir.


Make sure you make the script executable (hint: run chmod +x /path/to/update-all-sites).

If you don't have drush in your path, I would recommend you add it, but if you can't then change DRUSH_CMD="drush" to point to your instance of drush - such as DRUSH_CMD="/opt/drush/drush".

Thanks to Peter Lieverdink (aka cafuego) for suggesting the improved regex.

<?php print t('hello world'); ?>

My blog is now syndicated on Planet Drupal. I am very excited about this - thanks Simon.

For the last 8 years or so I have been running my own IT consulting business, focusing on free/open source software and web application development. My clients have ranged from micro businesses up to well known geek brands like SGI. Until recently I led the phpGroupWare project.

My Drupal profile doesn't really give much of a hint about my involvement with Drupal. My biggest regret is not signing up for a d.o account sooner. I forget when I started using Drupal 4.7, but I liked it straight away. It was the first CMS which worked the way I thought a CMS should work.

Over time I have learned how to get Drupal to do what I want it to do. Due to the massive range of contrib modules I haven't got my hands very dirty hacking on Drupal - yet.

This year I have been involved in a major Drupal project which involves hosting around 2100 sites. Aegir has made a lot of this painless, especially with our 3,000 line install profile. Over the Christmas period I hope to find the time to blog about the setup, as parts of it are pretty crazy.

I'll get around to upgrading my site to Drupal 6 one of these days when I get some time, that should coincide with a visual and content refresh. Feel free to check out some of my older Drupal related posts.