Tricks to Running HAProxy on pfSense Embedded

HAProxy is available as an addon module for pfSense 1.2.3. This makes it really easy to have pfSense control the gateway and load balancing. There are a couple of tricks to getting it all up and running.

Although everything looked good in the web GUI, HAProxy just wouldn't start. After logging in it turned out there were two problems: firstly, as mentioned in the forums, the listen addresses must be interface or CARP addresses, not Virtual IPs, for HAProxy to work; and secondly, the file descriptor limits have to be increased. To increase the file descriptor limits, run the following commands from a shell on pfSense.

mount -o rw /dev/ufs/pfsense1  /
echo >> /etc/sysctl.conf
echo '# File descriptor limits for HAProxy' >> /etc/sysctl.conf
echo 'kern.maxfiles=2000011' >> /etc/sysctl.conf
echo 'kern.maxfilesperproc=2000011' >> /etc/sysctl.conf
sysctl kern.maxfiles=2000011
sysctl kern.maxfilesperproc=2000011
mount -o ro /dev/ufs/pfsense1  /

The mount commands are only needed on embedded pfSense: they make the CF card writeable while the changes are made, then return it to read-only once we are done. The echo commands add the new limits to /etc/sysctl.conf so the settings persist across reboots, and the sysctl commands apply them immediately.

I haven't tested whether the file descriptor issue affects the non-embedded version of pfSense; feel free to let me (and others) know via the comments.

Solr Replication, Load Balancing, haproxy and Drupal

I use Apache Solr for search on several projects, including a few using Drupal. Solr has built-in support for replication and load balancing; unfortunately the load balancing is done on the client side and works best with a persistent connection, which doesn't make a lot of sense for PHP-based web apps. In the case of Drupal, there has been a long discussion on a patch in the issue queue to enable Solr's native load balancing, but things seem to have stalled.

In one instance I have Solr replicating from the master to a slave, with the plan to add additional slaves if the load justifies it. In order to get Drupal to write to the master and read from either node, I needed a proxy or load balancer. In my case the best lightweight HTTP load balancer that would easily run on the web heads was haproxy. I could have run Varnish in front of Solr and had it do the load balancing, but that seemed like overkill at this stage.

Now when an update request hits haproxy it is directed to the master, while read requests are balanced between the two nodes. To get this setup running on Ubuntu 9.10 with haproxy 1.3.18, I used the following /etc/haproxy/haproxy.cfg on each of the web heads:

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    maxconn 4096
    nbproc 4
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    maxconn 2000
    balance roundrobin
    stats enable
    stats uri /haproxy?stats

frontend solr_lb
    bind localhost:8080
    acl master_methods method POST DELETE PUT
    use_backend master_backend if master_methods
    default_backend read_backends

backend master_backend
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check

backend slave_backend
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check

backend read_backends
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check
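The method-based routing in the frontend above boils down to a simple rule. As a hypothetical illustration (not anything haproxy itself runs), the acl/use_backend logic is equivalent to this shell function:

```shell
# Illustration only: mirrors the master_methods acl and use_backend
# rules from the frontend section above.
route() {
  case "$1" in
    POST|DELETE|PUT) echo master_backend ;;  # write methods hit the master
    *)               echo read_backends ;;   # everything else is balanced
  esac
}

route PUT   # prints master_backend
route GET   # prints read_backends
```

Anything that modifies the index (POST, DELETE, PUT) goes to the single master; GETs and everything else round-robin across the read pool.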

To ensure the configuration is working properly, run

wget http://localhost:8080/solr -O -

on each of the web heads. If you get a connection refused message, haproxy may not be running. If you get a 503 error, make sure Solr (under Jetty or Tomcat) is running on the Solr nodes. If you get some HTML output that mentions Solr, then it should be working properly.

For Drupal's apachesolr module to use this configuration, simply set the hostname to localhost and the port to 8080 on the module's configuration page. Rebuild your search index and you should be right to go.

If you had a lot of index updates, you could consider making the master write-only and having two read-only slaves; just change the IP addresses to point at the right hosts.
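For example, assuming a second slave at 192.168.201.163 (a made-up address for illustration), the backend sections of the config above would become:

```
backend master_backend
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check

backend read_backends
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check
    server solr-c 192.168.201.163:8080 weight 1 maxconn 512 check
```

The frontend rules stay the same: writes still hit the master, but it no longer serves any read traffic.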

For more information on Solr replication refer to the Solr wiki; for more information on configuring haproxy refer to the manual. Thanks to Joe William and his blog post on load balancing CouchDB using haproxy, which helped me get the configuration I needed once I'd decided what I wanted.