Multi Core Apache Solr on Ubuntu 10.04 for Drupal with Auto Provisioning

Apache Solr is an excellent full text search engine based on Lucene. Solr is increasingly being used in the Drupal community for search, and I use it on a lot of my projects. Recently Steve Edwards at Drupal Connect blogged about setting up a multi core Solr server on Ubuntu 9.10 (aka Karmic). Ubuntu 10.04 LTS was released a couple of months ago and it makes the process a bit easier, as Apache Solr 1.4 has been packaged. An additional advantage of using 10.04 LTS is that it is supported until April 2015, whereas support for 9.10 ends in 10 months, in April 2011.

As an added bonus, in this howto you will be able to auto provision Solr cores just by calling the right URL.

In this tutorial I will be using Jetty rather than Tomcat, which some tutorials recommend, as Jetty performs well and generally uses fewer resources.

Install Solr and Jetty

Installing Jetty and Solr just requires a simple command:

sudo apt-get install solr-jetty openjdk-6-jdk

This will pull down Solr and all of its dependencies, which can be a lot if you have a very stripped down base server.

Configuring Jetty

Configuring Jetty is very straightforward. First we back up the existing /etc/default/jetty file like so:

sudo cp -a /etc/default/jetty /etc/default/jetty.bak

Then change your /etc/default/jetty to look like this (the key changes are the NO_START and JETTY_HOST lines):

# Defaults for jetty see /etc/init.d/jetty for more

# change to 0 to allow Jetty to start
NO_START=0
#NO_START=1

# change to 'no' or uncomment to use the default setting in /etc/default/rcS 
VERBOSE=yes

# Run Jetty as this user ID (default: jetty)
# Set this to an empty string to prevent Jetty from starting automatically
#JETTY_USER=jetty

# Listen to connections from this network host (leave empty to accept all connections)
#Uncomment to restrict access to localhost
#JETTY_HOST=$(uname -n)
JETTY_HOST=solr.example.com

# The network port used by Jetty
#JETTY_PORT=8080

# Timeout in seconds for the shutdown of all webapps
#JETTY_SHUTDOWN=30

# Additional arguments to pass to Jetty    
#JETTY_ARGS=

# Extra options to pass to the JVM         
#JAVA_OPTIONS="-Xmx256m -Djava.awt.headless=true"

# Home of Java installation.
#JAVA_HOME=

# The first existing directory is used for JAVA_HOME (if JAVA_HOME is not
# defined in /etc/default/jetty). Should contain a list of space separated directories.
#JDK_DIRS="/usr/lib/jvm/default-java /usr/lib/jvm/java-6-sun"

# Java compiler to use for translating JavaServer Pages (JSPs). You can use all
# compilers that are accepted by Ant's build.compiler property.
#JSP_COMPILER=jikes

# Jetty uses a directory to store temporary files like unpacked webapps
#JETTY_TMP=/var/cache/jetty

# Jetty uses a config file to setup its boot classpath
#JETTY_START_CONFIG=/etc/jetty/start.config

# Default for number of days to keep old log files in /var/log/jetty/
#LOGFILE_DAYS=14

If you don't include the JETTY_HOST entry, Jetty will only bind to the local loopback interface, which is all you need if your Drupal web server is running on the same machine. If you do set JETTY_HOST, make sure you configure your firewall to restrict access to the Solr server.
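For a scripted install, the two edits that matter here could be applied with sed instead of by hand. This is only a sketch: the function name configure_jetty_defaults is made up for illustration, and it assumes the packaged file still contains an uncommented NO_START=1 line and at least one JETTY_HOST line (commented out or not). If either line is missing, nothing is added for it.

```shell
# Hypothetical helper: enable Jetty and set the bind host in a defaults file.
# $1 = path to the defaults file, $2 = host name or IP address to bind to.
configure_jetty_defaults() {
    file="$1"
    host="$2"
    # First expression: flip NO_START=1 to NO_START=0 so the init script starts Jetty.
    # Second expression: rewrite any JETTY_HOST line, commented out or not.
    sed -i \
        -e 's/^NO_START=1$/NO_START=0/' \
        -e "s/^#\{0,1\}JETTY_HOST=.*/JETTY_HOST=$host/" \
        "$file"
}
```

After taking the backup shown earlier, it would be called as root with something like configure_jetty_defaults /etc/default/jetty solr.example.com.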

Configuring Solr

I am assuming you have already installed the Apache Solr module for Drupal somewhere. If you haven't, do that now, as you will need some config files which ship with it.

First we enable the multicore support in Solr by creating a file called /usr/share/solr/solr.xml with the following contents:

<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores" shareSchema="true" adminHandler="au.com.davehall.solr.plugins.SolrCoreAdminHandler">
 </cores>
</solr>

You need to make sure the file is owned by the jetty user if you want it to be dynamically updated. Otherwise, change persistent="true" to persistent="false", leave out the adminHandler attribute and don't run the commands below. Also, if you want to auto provision cores you will need to download the jar file attached to this post and drop it into the /usr/share/solr/lib directory (which you'll need to create).

sudo chown jetty:jetty /usr/share/solr
sudo chown jetty:jetty /usr/share/solr/solr.xml
sudo chmod 640 /usr/share/solr/solr.xml
sudo mkdir /usr/share/solr/cores
sudo chown jetty:jetty /usr/share/solr/cores
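The commands above do not create the lib directory or copy the jar into it, so here is a small sketch of that step. The function name install_solr_plugin is hypothetical, and the jar file name in the usage note is the one listed in the attachments at the end of this post. Run it as root when targeting /usr/share/solr.

```shell
# Hypothetical helper: copy the handler jar into Solr's shared lib directory.
# $1 = path to the downloaded jar, $2 = Solr home (defaults to /usr/share/solr).
install_solr_plugin() {
    jar="$1"
    solr_home="${2:-/usr/share/solr}"
    # Stop early if the jar was not actually downloaded to the given path.
    [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
    # Create lib/ if needed, then copy the jar into it.
    mkdir -p "$solr_home/lib" && cp "$jar" "$solr_home/lib/"
}
```

For example: install_solr_plugin ./dhc-solr-plugins.jar. A missing or misplaced jar is one likely cause of the "Error loading class" failures reported in the comments.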

To keep your configuration centralised, symlink the file from /usr/share/solr to /etc/solr. Don't do it the other way around, as Solr will ignore the symlink.

sudo ln -s /usr/share/solr/solr.xml /etc/solr/

Solr needs to be configured for Drupal. First we back up the existing config files, just in case, like so:

sudo mv /etc/solr/conf/schema.xml /etc/solr/conf/schema.orig.xml
sudo mv /etc/solr/conf/solrconfig.xml /etc/solr/conf/solrconfig.orig.xml

Now we copy the Drupal Solr config files from where you installed the module:

sudo cp /path/to/drupal-install/sites/all/modules/contrib/apachesolr/{schema,solrconfig}.xml /etc/solr/conf/

Solr needs the paths for each core's data files to exist, so we create them with the following commands:

sudo mkdir -p /var/lib/solr/cores/{,subdomain_}example_com/{data,conf}
sudo chown -R jetty:jetty /var/lib/solr/cores/{,subdomain_}example_com
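The {,subdomain_}example_com part relies on bash brace expansion: it expands to the two names example_com and subdomain_example_com, which are placeholders for your own core names. If that is hard to read, the same directories can be created with an explicit list. This is a sketch only; create_core_dirs is a made-up name, and the chown step above still has to be run afterwards.

```shell
# Hypothetical helper: create data/ and conf/ for each named core.
# $1 = base directory, remaining arguments = core names.
create_core_dirs() {
    base="$1"
    shift
    for name in "$@"; do
        # -p creates missing parent directories and ignores existing ones.
        mkdir -p "$base/$name/data" "$base/$name/conf"
    done
}
```

For the two example cores, run as root: create_core_dirs /var/lib/solr/cores example_com subdomain_example_com.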

Each core needs its own configuration files. We could implement some hacks to use a common set of configuration files, but that will make life more difficult if we ever have to migrate some of the cores. Just copy the common configuration to all the cores:

sudo bash -c 'for core in /var/lib/solr/cores/*; do cp -a /etc/solr/conf/ "$core"/; done'
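The one-liner loops over every entry under /var/lib/solr/cores and copies the shared conf directory into it. Spelled out as a function it might look like the sketch below. The name copy_conf_to_cores is hypothetical, and unlike the one-liner it skips anything under the base directory that is not a directory.

```shell
# Hypothetical helper: copy a shared conf directory into every core directory.
# $1 = source conf directory, $2 = directory that holds one subdirectory per core.
copy_conf_to_cores() {
    src="$1"
    base="$2"
    for core in "$base"/*; do
        # Skip plain files, and the literal pattern if the glob matched nothing.
        [ -d "$core" ] || continue
        mkdir -p "$core/conf"
        # "$src/." copies the contents of src (hidden files included) into conf/.
        cp -a "$src/." "$core/conf/"
    done
}
```

The equivalent of the command above, run as root, would be copy_conf_to_cores /etc/solr/conf /var/lib/solr/cores. Copying by hand with one cp -a per core gives the same result.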

If everything is configured correctly, we should just be able to start Jetty like so:

sudo /etc/init.d/jetty start

If you visit http://solr.example.com:8080/solr/admin/cores?action=STATUS you should get some XML that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">0</int>
	</lst>
	<lst name="status"/>
</response>

If you get the above output, everything is working properly.
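The same check can be made from the command line rather than a browser. The sketch below assumes curl is installed. The function name solr_status_ok is made up, and it only looks for the status value of 0 shown in the response above.

```shell
# Hypothetical helper: succeed if a core admin response on stdin reports status 0.
solr_status_ok() {
    grep -q '<int name="status">0</int>'
}

# Usage, with the URL from the text above:
#   curl -s 'http://solr.example.com:8080/solr/admin/cores?action=STATUS' \
#       | solr_status_ok && echo "Solr is up"
```

An HTTP 500 error page does not contain that element, so the function returns a non-zero exit status in that case.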

If you enabled auto provisioning of Solr cores, you should now be able to create your first core. Point your browser at http://solr.example.com:8080/solr/admin/cores?action=CREATE&name=test1&i... If it works you should get output similar to the following:

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">1561</int>
	</lst>
	<str name="core">test1</str>
	<str name="saved">/usr/share/solr/solr.xml</str>
</response>

I would recommend using identifiable names for your cores, so for davehall.com.au I would call the core "davehall_com_au" so I can easily find it later on.
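If cores are created from a script, the naming convention is easy to automate. This is a sketch with a made-up function name, core_name_for. It only replaces dots with underscores and leaves every other character untouched.

```shell
# Hypothetical helper: turn a domain name into a core name by replacing dots.
core_name_for() {
    printf '%s\n' "$1" | tr '.' '_'
}
```

Calling core_name_for davehall.com.au prints davehall_com_au.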

Security Note: As anyone who can access your server can now provision Solr cores, make sure you restrict access to port 8080 so that only trusted IP addresses can connect.
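One way to do this on Ubuntu is with ufw, assuming that is the firewall you use (iptables or an upstream firewall would work just as well). The sketch below only prints the commands so you can review them before running them as root. The function name print_solr_ufw_rules is made up, and the IP addresses in the usage note are documentation placeholders.

```shell
# Hypothetical helper: print ufw commands that let only the given IPs reach a port.
# $1 = port, remaining arguments = trusted IP addresses. Nothing is executed.
print_solr_ufw_rules() {
    port="$1"
    shift
    for ip in "$@"; do
        echo "ufw allow from $ip to any port $port proto tcp"
    done
    # ufw checks rules in the order they were added, so the catch-all deny goes last.
    echo "ufw deny $port/tcp"
}
```

For example, print_solr_ufw_rules 8080 192.0.2.10 192.0.2.11 prints two allow rules followed by the deny rule.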

For more information on the commands available, refer to the Solr Core Admin API documentation on the Solr wiki.

Next in this series will be how to use this auto provisioning setup to allow Aegir to provision Solr cores as sites are created.

Attachment                     Size
dhc-solr-plugins.jar           3.31 KB
dhc-solr-plugins-src.tar.gz    2.41 KB

difference from built-in core admin?

PWolanin wrote:

From this write up, I'm not grasping exactly what aspect of the above uses your custom handler.

From our discussion in IRC, I thought the java was creating the instance dirs and copying over the conf files, but you seem to be doing that work manually in the shell?

Added Wed, 2010-06-30 10:47

RE: difference from built-in core admin?

Dave wrote:

The adminHandler attribute in the solr.xml tells Solr to use the attached custom core admin handler class. The handler does all of the copying. This way I can use the standard core admin API.

The source is also attached if you want to see how it is done.

Added Wed, 2010-06-30 13:26

Error

Paul Hart wrote:

Hi Dave, thanks for the great how-to. I followed all the steps, including adding the plugin, but when I start Jetty and try to get the status it fails (Err 500) because it can't seem to locate the plug-in.

Any suggestions would be greatly appreciated

Many Thanks Paul

Added Wed, 2010-07-28 17:59

nice tutorial

Drupal Themes Showcase wrote:

Thanks for this great tutorial.

Question: You wrote that Jetty uses fewer resources. Do you know if there are any "Tomcat vs Jetty" performance tests?

Added Fri, 2010-07-30 20:18

Re: Error

Dave wrote:

Sorry about that. I renamed the custom core admin handler class but didn't update my notes. I've updated the /etc/solr/solr.xml in the blog post and now it should work properly. Please let me know how you go with it.

Added Sun, 2010-08-01 11:35

Couple questions about this process

Boden wrote:

Great post, I think Jetty may be the way for me to go after trying out solr-tomcat. I did have some issues after following these instructions on jetty, but it could have been from something random that I did.

Basically, I'm trying to get to multi-core setup with SOLR, with localhost and external access, but I do not need the auto-provisioning of cores.

My specific questions:

In /usr/share/solr/solr.xml, if not using the auto-provisioning, do you need to specify the core names and data directories? Like: solr persistent="false" sharedLib="lib" cores adminPath="/admin/cores" shareSchema="true" core name="site1" instanceDir="site1" core name="site1" instanceDir="site1" cores solr (with the appropriate containers, of course)

What would the 'instanceDir's be? Just "name_of_folder_for_site1" or "/var/lib/solr/cores/name_of_folder_for_site1"? Something else?

Being a bit of an Ubuntu noob, how does sudo bash -c 'for core in /var/lib/solr/cores/*; do cp -a /etc/solr/conf/ $core/; done' work? I understand what it does, I'm just wondering if there is another way to manually copy the default /etc/solr/conf/ to each of the core folders in /var/lib/solr/cores/.

Any advice would be greatly appreciated, I quickly found that this process can be a bit frustrating.

Thanks, Boden

Added Thu, 2010-09-02 02:56

Error "class not found"

Tim wrote:

Hi, I am getting this error. Please help

HTTP ERROR 500

Problem accessing /solr/admin/cores. Reason:

Severe errors in solr configuration.

Check your log files for more detailed information on what may be wrong.

If you want solr to continue after configuration errors, change:

<abortOnConfigurationError>false</abortOnConfigurationError>

in null

-------------------------------------------------------------
org.apache.solr.common.SolrException: Error loading class 'au.com.davehall.solr.plugins.SolrCoreAdminHandler'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
    at org.apache.solr.core.SolrResourceLoader.newAdminHandlerInstance(SolrResourceLoader.java:423)
    at org.apache.solr.core.CoreContainer.createMultiCoreHandler(CoreContainer.java:596)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:237)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mortbay.jetty.Server.doStart(Server.java:224)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.mortbay.start.Main.invokeMain(Main.java:194)
    at org.mortbay.start.Main.start(Main.java:534)
    at org.mortbay.jetty.start.daemon.Bootstrap.start(Bootstrap.java:30)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)
Caused by: java.lang.ClassNotFoundException: au.com.davehall.solr.plugins.SolrCoreAdminHandler
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:615)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
    ... 34 more
Powered by Jetty://

Added Sun, 2011-05-22 04:30

Does this command need to be

Anonymous wrote:

Does this command need to be edited in any way?

sudo bash -c 'for core in /var/lib/solr/cores/*; do cp -a /etc/solr/conf/ $core/; done'

Do the words "subdomain" and "example" need to be edited? What should they be?

sudo mkdir -p /var/lib/solr/cores/{,subdomain_}example_com/{data,conf}

When I visit http://solr.example.com:8080/solr/admin/cores?action=STATUS

I get this: Oops! Firefox could not find solr.example.com:8080

I can not get this solr to work. This is very frustrating.

Added Thu, 2011-05-26 01:27

Search in Multiple Cores

Suneel Pandey wrote:

Hello Sir, I am using the cores concept for searching but I have a question.

1. How do I get the specific core name in the response XML doc when searching across multiple cores?

I have searched on .net but have not found an answer. Please help me.

thank you

Regard,

Suneel Pandey

Added Mon, 2011-07-25 23:08

and then what? I tuple ))

Sanich wrote:

I have http://remote solr IP:8080/solr/ikona_dp_ua/admin/ and it works, but... which Solr server URL do I specify in the D7 modules on the ikona.dp.ua site??? I'm lost )) help!

Added Sun, 2011-08-21 10:45

Re: and then what?

Dave wrote:

I haven't started using Solr with D7. I will be doing that in the next couple of weeks. It should be http://solr.example.com:8080/solr/ikona_dp_ua/

Added Sat, 2011-08-27 14:47

solr for D7

Sanich wrote:

I have Solr and D7 working, thanks to you too) but I want to have a remote Solr. Maybe you could pay close attention to this issue? Thanks))

Added Tue, 2011-08-30 05:00

I just wanted to say thank you

Samer wrote:

I just wanted to say thank you for this great tutorial.

It works very well for me on Ubuntu v11.x / Apachesolr 1.4.1

Thanks for your time.

Added Thu, 2011-11-10 18:36

How you remove a core?

Anonymous wrote:

How do you remove a core? When I did rm -R on the core folder, Solr broke.

Added Sun, 2012-02-19 14:41

RE: How you remove a core?

Dave wrote:

You also have to remove it from solr.xml.

Added Thu, 2012-03-08 09:57