Solr Replication, Load Balancing, HAProxy and Drupal
I use Apache Solr for search on several projects, including a few with Drupal. Solr has built-in support for replication and load balancing. Unfortunately the load balancing is done on the client side and works best with a persistent connection, which doesn’t make a lot of sense for PHP-based web apps. In the case of Drupal, there has been a long discussion on a patch in the issue queue to enable Solr’s native load balancing, but things seem to have stalled.
In one instance I have Solr replicating from the primary to a secondary, with the plan to add more secondary backends if the load justifies it. To get Drupal to write to the primary and read from either node I needed a proxy or load balancer. In my case the best lightweight HTTP load balancer that would easily run on the web heads was HAProxy. I could have run Varnish in front of Solr and had it do the load balancing, but that seemed like overkill at this stage.
Now when an update request hits HAProxy it directs it to the primary, but for reads it balances the requests between the two nodes. To get this setup running on Ubuntu 9.10 with HAProxy 1.3.18, I used the following /etc/haproxy/haproxy.cfg on each of the web heads:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    nbproc 4
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    maxconn 2000
    balance roundrobin
    stats enable
    stats uri /haproxy?stats

frontend solr_lb
    bind localhost:8080
    # Writes (POST, DELETE, PUT) go to the primary; everything else is balanced
    acl primary_methods method POST DELETE PUT
    use_backend primary_backend if primary_methods
    default_backend read_backends

backend primary_backend
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check

# Defined but not referenced by the frontend above
backend secondary_backend
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check

backend read_backends
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check
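To make the routing rule easier to reason about, here is a rough Python model of what the frontend does with each request. It is only a sketch of the ACL and the round-robin choice, not how HAProxy is implemented: it ignores health checks, weights and connection limits, and the function name make_router is my own.

```python
from itertools import cycle

# Methods the frontend ACL treats as writes (copied from the config above).
WRITE_METHODS = {"POST", "DELETE", "PUT"}

# Server names and addresses as they appear in the backends above.
PRIMARY = ("solr-a 192.168.201.161:8080",)
READERS = ("solr-a 192.168.201.161:8080", "solr-b 192.168.201.162:8080")


def make_router(primary=PRIMARY, readers=READERS):
    """Return a function mapping an HTTP method to (backend, server)."""
    primary_rr = cycle(primary)
    readers_rr = cycle(readers)

    def route(method):
        # Mirrors: use_backend primary_backend if primary_methods
        if method in WRITE_METHODS:
            return "primary_backend", next(primary_rr)
        # Mirrors: default_backend read_backends (balance roundrobin)
        return "read_backends", next(readers_rr)

    return route


if __name__ == "__main__":
    route = make_router()
    for method in ["GET", "GET", "POST", "GET"]:
        print(method, "->", route(method))
```

Note that anything sent as a POST lands on the primary, even if it is logically a read.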
To check that the configuration is working properly, run
wget http://localhost:8080/solr -O -
on each of the web heads.
If you get a connection refused message, HAProxy may not be running. If you get a 503 error, make sure Solr (under Jetty or Tomcat) is running on the Solr nodes. If you get some HTML output which mentions Solr, then it should be working properly.
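If you would rather script this check, something like the following Python sketch covers the same three cases. It assumes Python 3 on the web head and the same URL as the wget command; the function names diagnose and probe are made up for this example, and the messages are just my wording of the hints above.

```python
import urllib.error
import urllib.request

# Address the frontend binds to in the configuration above.
URL = "http://localhost:8080/solr"


def diagnose(status=None, body="", refused=False):
    """Map a probe result to the likely cause described in the text."""
    if refused:
        return "connection refused: HAProxy may not be running"
    if status == 503:
        return "503: check that Solr (Jetty or Tomcat) is running on the Solr nodes"
    if status == 200 and "solr" in body.lower():
        return "ok: response mentions Solr"
    return "unexpected response: status %s" % status


def probe(url=URL, timeout=5):
    """Fetch the URL once and return a diagnosis string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return diagnose(resp.status, body)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive here, including HAProxy's 503.
        return diagnose(err.code)
    except urllib.error.URLError as err:
        if isinstance(err.reason, ConnectionRefusedError):
            return diagnose(refused=True)
        return "request failed: %s" % err.reason
    except OSError as err:
        # For example a timeout while reading the response body.
        return "request failed: %s" % err


if __name__ == "__main__":
    print(probe())
```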
For Drupal’s apachesolr module to use this configuration, set the hostname to localhost and the port to 8080 on the module configuration page. Rebuild your search index and you should be right to go.
If you have a lot of index updates, you could consider making the primary write-only and having two read-only secondary backends. Just change the server entries in read_backends so that they point to the two secondaries.
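As a rough illustration of that variant, this small Python helper prints a read_backends section that lists only secondaries, using the same server options as the config above. The helper name render_read_backends is my own, and solr-c with its address is a hypothetical third node, so substitute your own hosts.

```python
def render_read_backends(servers, name="read_backends", port=8080, maxconn=512):
    """Render an HAProxy backend section from (server_name, ip) pairs."""
    lines = ["backend %s" % name]
    for server_name, ip in servers:
        lines.append(
            "    server %s %s:%d weight 1 maxconn %d check"
            % (server_name, ip, port, maxconn)
        )
    return "\n".join(lines)


# solr-b comes from the config above; solr-c and its address are hypothetical.
SECONDARIES = [
    ("solr-b", "192.168.201.162"),
    ("solr-c", "192.168.201.163"),
]

if __name__ == "__main__":
    print(render_read_backends(SECONDARIES))
```

The primary (solr-a) is left out of the list on purpose, so it only ever receives the write traffic routed to primary_backend.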
For more information on Solr replication refer to the Solr wiki; for more information on configuring HAProxy refer to the manual. Thanks to Joe William and his blog post on load balancing CouchDB using HAProxy, which helped me get the configuration I needed after I decided what I wanted.