sysadmin

Making it Easier to Spawn php-cgi on Debian and Ubuntu

Apache is a great web server, but sometimes I need something a bit more lightweight. I already have a bunch of sites using lighttpd, but I'm planning on switching them to nginx. Both nginx and lighttpd use FastCGI for running php. Getting FastCGI up and running on Ubuntu (or Debian) involves a bit of manual work which can slow down deployment.

The normal process to get nginx and php-cgi up and running is to install the spawn-fcgi package, create a shell script such as /usr/local/bin/php-fastcgi to launch it, then a custom init script, after making both of these executable you need to run update-rc.d then finally should be right to go. Each of these manual steps increases the likelihood of mistakes being made.

Instead, I created a deb contains a configurable init script. It is pretty simple, the init script calls spawn-fcgi with the appropriate arguments. All of the configuration is handled in /etc/default/php-fastcgi. The main config options are:

  • ENABLED - Enable (or disable) the init script. default: 0 (disabled)
  • ADDRESS - The IP address to bind to. default: 127.0.0.1
  • PORT - The TCP Port to bind to. default: 9000
  • USER - The user the php scripts will be excuted as. default: www-data
  • GROUP - The group the php scripts will be executed as. default: www-data
  • PHP_CGI - The full path to the php-cgi binary. default: /usr/bin/php5-cgi

The last 3 variables are not in the defaults file as I didn't think many users would want/need to change them, feel free to add them in if you need them.

Once you have set ENABLED to 1, launch the init scipt by executing sudo /etc/init.d/php-fastcgi start. To check that it is running run sudo netstat -nplt | grep 9000 and you should see /usr/bin/php5-cgi listed. Now you can continue to configure your webserver.

The package depends on php5-cgi and spawn-fcgi, which is available in Debian testing/squeeze, unstable/sid, along with Ubuntu karmic and lucid. For earlier versions of ubuntu you can change the dependency in debian/control from spawn-fcgi to lighttpd and disable lighttpd once it is installed so you can get spawn-fcgi . I haven't tested this approach and wouldn't recommend it.

You can grab the http://davehall.com.au/sites/davehall.com.au/files/php-fastcgi_0.1-1_all.deb">binary package and install it using dpkg or build it yourself from the source tarball.

For more information on setting up nginx using php-cgi I recommend the linode howto - just skip the "Configure spawn-fcgi" step :)

Solr Replication, Load Balancing, haproxy and Drupal

I use Apache Solr for search on several projects, including a few using Drupal. Solr has built in support for replication and load balancing, unfortunately the load balancing is done on the client side and works best when using a persistent connection, which doesn't make a lot of sense for php based webapps. In the case of Drupal, there has been a long discussion on a patch in the issue queue to enable Solr's native load balancing, but things seem to have stalled.

In one instance I have Solr replicating from the master to a slave, with the plan to add additional slaves if the load justifies it. In order to get Drupal to write to the master and read from either node I needed a proxy or load balancer. In my case the best lightweight http load balancer that would easily run on the web heads was haproxy. I could have run varnish in front of solr and had it do the load balancing but that seemed like overkill at this stage.

Now when an update request hits haproxy it directs it to the master, but for reads it balances the requests between the 2 nodes. To get this setup running on ubuntu 9.10 with haproxy 1.3.18, I used the following /etc/haproxy/haproxy.cfg on each of the web heads:

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    maxconn 4096
    nbproc 4
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    maxconn 2000
    balance roundrobin
    stats enable
    stats uri /haproxy?stats

frontend solr_lb
    bind localhost:8080
    acl master_methods method POST DELETE PUT
    use_backend master_backend if master_methods
    default_backend read_backends

backend master_backend
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check

backend slave_backend
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check

backend read_backends
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check

To ensure the configuration is working properly run

wget http://localhost:8080/solr -O -
on each of the web heads. If you get a connection refused message haproxy may not be running. If you get a 503 error make sure solr/jetty/tomcat is running on the solr nodes. If you get some html output which mentions Solr, then it should be working properly.

For Drupal's apachesolr module to use this configuration, simply set the hostname to localhost and the port to 8080 in the module configuration page. Rebuild your search index and you should be right to go.

If you had a lot of index updates then you could consider making the master write only and having 2 read only slaves, just change the IP addresses to point to the right hosts.

For more information on Solr replication refer to the Solr wiki, for more information on configuring haproxy refer to the manual. Thanks to Joe William and his blog post on load balancing couchdb using haproxy which helped me get the configuration I needed after I decided what I wanted.

Check Drupal Module Status Using Bash

When you run a lot of drupal sites it can be annoying to keep track of all of the modules contained in a platform and ensure all of them are up to date. One option is to setup a dummy site setup with all the modules installed and email notifications enabled, this is OK, but then you need to make sure you enable the additional modules every time you add something to your platform.

I wanted to be able to check the status of all of the modules in a given platform using the command line. I started scratching the itch by writing a simple shell script to use the drupal updates server to check for the status of all the modules. I kept on polishing it until I was happy with it, there are some bits of which are a little bit ugly, but that is mostly due to the limitations of bash. If I had to rewrite the it I would do it in PHP or some other language which understands arrays/lists and has http client and xml libraries.

The script supports excluding modules by using a extended grep regular expression pattern and nominating a major version of drupal. When there is a version mismatch it will be shown in bolded red, while modules where the versions match will be shown in green. The script filters out all dev and alpha releases, after all the script is designed for checking production sites. Adding support for per module update servers should be pretty easy to do, but I don't have modules to test this with.

To use the script, download it, save it somewhere handy, such as

~/bin/check-module-status.sh
, make it executable (run
chmod +x ~/bin/check-module-status.sh
). Now it is ready for you to run it -
~/bin/check-module-status.sh /path/to/drupal
and wait for the output.

Howto Setup a Private Package Repository with reprepro and nginx

As the number of servers I am responsible for grows, I have been trying to eliminate all non packaged software in production. Although ubuntu and Debian have massive software repositories, there are some things which just aren't available yet or are internal meta packages. Once the packages are built they need to be deployed to servers. The simplest way to do this is to run a private apt repository. There are a few options for building an apt repository, but the most popular and simplest seems to be reprepro. I used Sander Marechal and Lionel Porcheron's reprepro howtos as a basis for getting my repository up and running.

nginx is a lightweight http server (and reverse proxy). It performs very well serving static files, which is perfect for a package repository. I also wanted to minimise the memory footprint of the server, which made nginx appealing.

To install the packages we need, run the following command:

$ sudo apt-get install reprepro nginx 

Then it is time to configure reprepro. First we create our directory structure:

$ sudo mkdir -p /srv/reprepro/ubuntu/{conf,dists,incoming,indices,logs,pool,project,tmp}
$ cd /srv/reprepro/ubuntu/
$ sudo chown -R `whoami` . # changes the repository owner to the current user

Now we need to create some configuration files.

/srv/reprepro/ubuntu/conf/distributions

Origin: Your Name
Label: Your repository name
Codename: karmic
Architectures: i386 amd64 source
Components: main
Description: Description of repository you are creating
SignWith: YOUR-KEY-ID

/srv/reprepro/ubuntu/conf/options

ask-passphrase
basedir .

If you have a package ready to load, add it using the following command:

$ reprepro includedeb karmic /path/to/my-package_0.1-1.deb \
# change /path/to/my-package_0.1-1.deb to the path to your package

Once reprepro is setup and you have some packages loaded, you need to make it so you can serve the files over http. I run an internal dns zone called "internal" and so the package server will be configured to respond to packages.internal. You may need to change the server_name value to match your own environment. Create a file called

/etc/nginx/sites-available/vhost-packages.conf
with the following content:

server {
  listen 80;
  server_name packages.internal;

  access_log /var/log/nginx/packages-access.log;
  error_log /var/log/nginx/packages-error.log;

  location / {
    root /srv/reprepro;
    index index.html;
  }

  location ~ /(.*)/conf {
    deny all;
  }

  location ~ /(.*)/db {
    deny all;
  }
}

Next we need to increase the server_names_hash_bucket_size. Create a file called

/etc/nginx/conf.d/server_names_hash_bucket_size.conf
which should just contain the following line:

server_names_hash_bucket_size 64;

Note: Many sites advocate sticking this value in the http section of the

/etc/nginx/nginx.conf config
file, but in Debian and Ubuntu
/etc/nginx/conf.d/*.conf
is included in the http section. I think my method is cleaner for upgrading and clearly delineates the stock and custom configuration.

To enable and activate the new virtual host run the following commands:

$ cd /etc/nginx/sites-enabled
$ sudo ln -s ../sites-available/packages.internal.conf .
$ sudo service nginx reload

You should get some output that looks like this

Reloading nginx configuration: the configuration file /etc/nginx/nginx.conf syntax is ok
configuration file /etc/nginx/nginx.conf test is successful
nginx.

Now you can add the new repository to your machines. I recommend creating a file called

/etc/apt/sources.list.d/packages.internal.list
and put the following line in the file:

To make the machine aware of the new repository and associated packages, simply run:

$ sudo apt-get update

That's it. Now you have a lightweight package repository with a lightweight webserver - perfect for running in a virtual machine. Depending on your setup you could probably get away with using 256Mb of RAM and a few gig of disk.

Packaging Drush and Dependencies for Debian

Lately I have been trying to avoid non packaged software being installed on production servers. The main reason for this is to make it easier to apply updates. It also makes it easier to deploy new servers with meta packages when everything is pre packaged.

One tool which I am using a lot on production servers is Drupal's command line tool - drush. Drush is awesome it makes managing drupal sites so much easier, especially when it comes to applying updates. Drush is packaged for Debian testing, unstable and lenny backports by Antoine Beaupré (aka anarcat) and will be available in universe for ubuntu lucid. Drush depends on PEAR's Console_Table module and includes some code which automagically installs the dependency from PEAR CVS. The Debianised package includes the PEAR class in the package, which is handy, but if you are building your own debs from CVS or the nightly tarballs, the dependency isn't included. The auto installer only works if it can write to /path/to/drush/includes, which in these cases means calling drush as root, otherwise it spews a few errors about not being able to write the file then dies.

A more packaging friendly approach would be to build a debian package for PEAR Console_Table and have that as a dependency of the drush package in Debian. The problem with this approach is that drush currently only looks in /path/to/drush/includes for the PEAR class. I have submitted a patch which first checks if Table_Console has been installed via the PEAR installer (or other package management tool). Combine this with the Debian source package I have created for Table_Console (see the file attached at the bottom of the post), you can have a modular and apt managed instance of drush, without having to duplicate code.

I have discussed this approach with anarcat, he is supportive and hopefully it will be the approach adopted for drush 3.0.

Update The drush patch has been committed and should be included in 3.0alpha2.