Solr Replication, Load Balancing, haproxy and Drupal

I use Apache Solr for search on several projects, including a few built with Drupal. Solr has built-in support for replication and load balancing. Unfortunately, the load balancing is done on the client side and works best over a persistent connection, which doesn't make a lot of sense for PHP-based web apps. In the case of Drupal, there has been a long discussion in the issue queue about a patch to enable Solr's native load balancing, but things seem to have stalled.

In one instance I have Solr replicating from the master to a slave, with the plan to add additional slaves if the load justifies it. To get Drupal to write to the master and read from either node, I needed a proxy or load balancer. In my case the best lightweight HTTP load balancer that would easily run on the web heads was haproxy. I could have run Varnish in front of Solr and had it do the load balancing, but that seemed like overkill at this stage.

Now when an update request hits haproxy it is directed to the master, while read requests are balanced between the two nodes. To get this setup running on Ubuntu 9.10 with haproxy 1.3.18, I used the following /etc/haproxy/haproxy.cfg on each of the web heads:

    global
        log   127.0.0.1 local0
        log   127.0.0.1 local1 notice
        maxconn 4096
        nbproc 4
        user haproxy
        group haproxy

    defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        maxconn 2000
        balance roundrobin
        stats enable
        stats uri /haproxy?stats

    frontend solr_lb
        bind localhost:8080
        acl master_methods method POST DELETE PUT
        use_backend master_backend if master_methods
        default_backend read_backends

    backend master_backend
        # the 192.168.1.x addresses are placeholders - point these at your Solr nodes
        server solr-a 192.168.1.101:8983 weight 1 maxconn 512 check

    backend slave_backend
        server solr-b 192.168.1.102:8983 weight 1 maxconn 512 check

    backend read_backends
        server solr-a 192.168.1.101:8983 weight 1 maxconn 512 check
        server solr-b 192.168.1.102:8983 weight 1 maxconn 512 check
To ensure the configuration is working properly, first check that it parses cleanly with haproxy -c -f /etc/haproxy/haproxy.cfg, then run

    wget http://localhost:8080/solr -O -

on each of the web heads. If you get a connection refused message, haproxy may not be running. If you get a 503 error, make sure Solr (under jetty or tomcat) is running on the Solr nodes. If you get some HTML output that mentions Solr, then it should be working properly.
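Those checks can be sketched as a small helper script. The curl invocation in the comment is an assumption (any HTTP client that reports the status code will do); the case branches mirror the troubleshooting advice above:

```shell
# interpret_status: map the HTTP status returned by the haproxy frontend
# to the troubleshooting advice above
interpret_status() {
    case "$1" in
        000) echo "connection refused: is haproxy running on this web head?" ;;
        503) echo "haproxy is up but no backend answered: check solr/jetty/tomcat on the Solr nodes" ;;
        200) echo "haproxy and Solr look healthy" ;;
        *)   echo "unexpected status $1: see the stats page at /haproxy?stats" ;;
    esac
}

# capture the status with any HTTP client, for example (an assumption, not from the post):
#   status=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/solr)
interpret_status 503
```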

For Drupal's apachesolr module to use this configuration, simply set the hostname to localhost and the port to 8080 in the module configuration page. Rebuild your search index and you should be right to go.

If you have a lot of index updates, you could consider making the master write only and having two read-only slaves; just change the server addresses in the backends to point at the right hosts.
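A sketch of that variant, reusing the backend sections from the config above; the 192.168.1.x addresses and server names are placeholders:

```
backend master_backend
    # writes only; the master no longer serves reads
    server solr-master 192.168.1.101:8983 weight 1 maxconn 512 check

backend read_backends
    # reads are balanced across the two slaves
    server solr-slave-1 192.168.1.102:8983 weight 1 maxconn 512 check
    server solr-slave-2 192.168.1.103:8983 weight 1 maxconn 512 check
```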

For more information on Solr replication, refer to the Solr wiki; for more information on configuring haproxy, refer to the manual. Thanks to Joe William and his blog post on load balancing CouchDB using haproxy, which helped me get the configuration I needed once I had decided what I wanted.

Solr Replication, Load Balancing, haproxy and Drupal

Phine wrote:

Recently, I came across an interesting article which shared deep insights into the built-in concept of replication in Solr. You can refer to it for detailed information.

Added Wed, 2010-03-31 23:51

Need to clarify haproxy with multiple Solr servers

Ashok wrote:


Please let us know how to configure haproxy with multiple Solr servers, and also how to verify that data is reaching both servers.


Added Wed, 2013-07-17 17:21

security problem ahead

Glenn Plas wrote:

Be aware that anyone who can navigate to your Solr core dashboard can wreak havoc and even drop a core.

Added Tue, 2014-10-07 00:21

RE: security problem ahead

Dave wrote:

@Glenn the security issue you've identified is a problem for all Solr instances regardless of whether haproxy is used. Various options are available for restricting access to the dashboard, including jetty/tomcat/haproxy configuration or iptables.

Added Tue, 2014-10-07 05:49
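For the haproxy option mentioned in the comments above, one hedged sketch is to deny requests to Solr's admin UI at the frontend. The /solr/admin path is an assumption (it varies by Solr version), and haproxy 1.3 uses the block directive for this (newer releases use http-request deny):

```
frontend solr_lb
    bind localhost:8080
    # deny the Solr admin UI at the proxy; adjust the path for your Solr version
    acl solr_admin path_beg /solr/admin
    block if solr_admin
    acl master_methods method POST DELETE PUT
    use_backend master_backend if master_methods
    default_backend read_backends
```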