1. Introduction
  2. How it works
  3. Requirements
  4. Installation & Configuration
  5. Integration with Condor

1. Introduction

The openModeller server (oM Server) 2.0 is a server implementation compatible with the openModeller web service API version 2.0. This document describes the software functionality, including installation and configuration instructions. For information about the previous version, please visit the corresponding page dedicated to oM Server 1.0.

The original version of this document was prepared for the BioVeL project with funds from the European Commission. This new version was also funded by the EUBrazil-OpenBio project with funds from CNPq and the European Commission.

2. How it works

The server was developed with gSOAP v2.8.15 and can run as a CGI application, a stand-alone server (daemon) or a multi-threaded server. Although the software (om_server.cpp) is written in C++ and could therefore interact directly with the openModeller library, in most cases it follows a different approach to benefit from high-throughput computing techniques. Since the most important operations can take a long time to complete, the service protocol follows an asynchronous design: the server is just a thin layer with minimal interaction with the openModeller library. For the most important operations it simply extracts a piece of XML from the SOAP body and stores it in the file system to be processed independently later.

Each piece of XML stored in the file system follows exactly the input format expected by the corresponding openModeller command-line tool. This way, a simple job scheduler written in shell script can process the jobs. There are two versions of the script: an old version (scheduler.sh) designed to be called from cron at configurable intervals, and a new version (om_scheduler.sh) designed to run as a daemon. Both scripts can be configured to run the command-line tools locally (default) or to use HTCondor to distribute jobs to cluster computing nodes.

The following table summarizes the server front-end behaviour for each operation:

operation                    server behaviour
-----------------------------------------------------------------------------
ping                         direct calls to the local openModeller library,
                             plus file system access to check the configuration
getAlgorithms                direct call to the local openModeller library
getLayers                    file system access to look for available layers
                             (uses the GDAL library to identify valid rasters)
createModel, testModel,      extracts the XML from the SOAP body, stores it in
projectModel,                the file system and immediately returns a ticket
evaluateModel, samplePoints
runExperiment                extracts the XML from the SOAP body, stores it in
                             the file system, parses it, creates all individual
                             jobs and returns all related tickets
getProgress, getLog,         only requires file system access to read ticket
getModel, getTestResult,     information
getProjectionMetadata,
getLayerAsUrl,
getModelEvaluation,
getSamplingResult,
getResults, cancel

The ticketing mechanism starts with the mkstemp POSIX function, which creates a temporary file with a unique six-character name. This file is later also used to store the job log. After ticket creation, the XML extracted from the SOAP request is stored in the file system following the pattern model_req.ticket, test_req.ticket, proj_req.ticket, samp_req.ticket, eval_req.ticket or exp_req.ticket, depending on the type of job. In the case of modelling experiments (runExperiment requests), the default implementation parses the request and expands all jobs, creating individual tickets following the pattern model_pend.ticket, test_pend.ticket, proj_pend.ticket, samp_pend.ticket or eval_pend.ticket. Jobs that can be started are renamed to the *_req.ticket pattern when processed locally.
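
For illustration, a runExperiment request that creates a model and then projects it might leave the following files in the ticket directory (the tickets below are made up):

exp_req.EXP001       the experiment request itself
model_pend.AAA111    pending model job (renamed to model_req.AAA111 once it can start)
proj_pend.BBB222     pending projection job, waiting for the model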

The job scheduler periodically checks the file system for "_req." jobs, giving priority to the oldest one. When a job request is processed, its name is immediately changed to "_proc.", and in the end different result files can be produced depending on the job: model_resp.ticket stores the model result, test_resp.ticket the model test result, stats.ticket the projection statistics, samp_resp.ticket the sampled points and eval_resp.ticket the model evaluation result. For all operations, prog.ticket stores the job progress, and done.ticket is a flag indicating that the job has finished. Projection jobs also produce a potential distribution map file named after the ticket, such as ticket.tif. The following examples illustrate how each command-line tool is used in this context:

# Example locations
OM_CONFIGURATION=/var/www/om_server/config/om.cfg
TICKET_DIR=/var/www/om_server/ws2/tickets
DISTMAP_DIR=/var/www/om_server/ws2/distmaps

# sample points job
om_pseudo --xml-req $TICKET_DIR/samp_proc.QWE123 --result $TICKET_DIR/samp_resp.QWE123 \
--log-file $TICKET_DIR/QWE123 --prog-file $TICKET_DIR/prog.QWE123 --config-file $OM_CONFIGURATION

# model creation job
om_model --xml-req $TICKET_DIR/model_proc.ABC123 --model-file $TICKET_DIR/model_resp.ABC123 \
--log-file $TICKET_DIR/ABC123 --prog-file $TICKET_DIR/prog.ABC123 --config-file $OM_CONFIGURATION

# model evaluation job
om_evaluate --xml-req $TICKET_DIR/eval_proc.ZXC123 --result $TICKET_DIR/eval_resp.ZXC123 \
--log-file $TICKET_DIR/ZXC123 --prog-file $TICKET_DIR/prog.ZXC123 --config-file $OM_CONFIGURATION

# model testing job
om_test --xml-req $TICKET_DIR/test_proc.DEF456 --result $TICKET_DIR/test_resp.DEF456 \
--log-file $TICKET_DIR/DEF456 --prog-file $TICKET_DIR/prog.DEF456 --config-file $OM_CONFIGURATION

# model projection job
om_project --xml-req $TICKET_DIR/proj_proc.GHI789 --dist-map $DISTMAP_DIR/GHI789.img \
--stat-file $TICKET_DIR/stats.GHI789 --log-file $TICKET_DIR/GHI789 \
--prog-file $TICKET_DIR/prog.GHI789 --config-file $OM_CONFIGURATION
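
To tie these pieces together, here is a minimal sketch of one scheduler iteration. It is a simplification of what scheduler.sh / om_scheduler.sh do (assumed paths, no dependency or error handling, and only the model job type is mapped - the others follow the same pattern):

#!/bin/bash
# Simplified scheduler iteration (illustrative sketch, not the shipped script)
TICKET_DIR=/var/www/om_server/ws2/tickets
OM_CONFIGURATION=/var/www/om_server/config/om.cfg

# Pick the oldest pending request, if any
req=$(ls -tr "$TICKET_DIR"/*_req.* 2>/dev/null | head -n 1)
[ -z "$req" ] && exit 0

# Mark the job as being processed by renaming "_req." to "_proc."
proc="${req/_req./_proc.}"
mv "$req" "$proc"
ticket="${proc##*.}"   # e.g. ABC123

case "$proc" in
  *model_proc.*)
    om_model --xml-req "$proc" --model-file "$TICKET_DIR/model_resp.$ticket" \
      --log-file "$TICKET_DIR/$ticket" --prog-file "$TICKET_DIR/prog.$ticket" \
      --config-file "$OM_CONFIGURATION"
    ;;
  # test_proc, proj_proc, samp_proc and eval_proc are handled analogously
esac

# Flag the job as finished
touch "$TICKET_DIR/done.$ticket"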

When a job request is stored in the file system, its metadata is also saved in a separate file named job.ticket. Job metadata files follow the key = value pattern and may contain:

TYPE = Type of job (samp, mod, eval, test, proj or exp).
EXP  = Experiment ticket, in case the job belongs to an experiment.
       (not used for experiment job metadata)
JOBS = Comma-separated list of job tickets that belong to this experiment,
       in case the job is an experiment.
IDS  = Comma-separated list of job identifiers that belong to this experiment,
       in case the job is an experiment. Identifiers are provided by clients in
       the request.
NEXT = Comma-separated list of job tickets that can be triggered after this job.
       (not used for experiment job metadata)
PREV = Comma-separated list of job tickets on which this job depends prefixed by their role.
       Prefixes can be: presence_, absence_, model_ or lpt_
       (not used for experiment job metadata)

Job metadata files were introduced in this version to support experiment management. The default implementation of oM Server controls an experiment workflow by running the omws_manager command-line tool after each job that belongs to an experiment finishes. omws_manager checks whether the job finished successfully and, depending on the result, triggers the subsequent jobs or cancels the remaining ones. This is only possible by inspecting the job metadata.
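
As an illustration, the metadata files for a hypothetical experiment containing a model job followed by a projection job could look like this (tickets and identifiers are made up):

job.EXP001 (the experiment):

TYPE = exp
JOBS = AAA111,BBB222
IDS  = my_model,my_projection

job.AAA111 (the model creation job):

TYPE = mod
EXP  = EXP001
NEXT = BBB222

job.BBB222 (the projection job, which depends on the model from AAA111):

TYPE = proj
EXP  = EXP001
PREV = model_AAA111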

3. Requirements

Operating system: Only tested on GNU/Linux machines (it should probably run on other UNIX flavors without changes). Although gSOAP claims to be compatible with "Windows, Unix, Linux, Pocket PC, Mac OS X, TRU64, VxWorks, etc.", parts of the openModeller SOAP server code are probably not prepared to run on other platforms - in particular SIGPIPE handling (if multi-threading is enabled) and unique temporary file name generation (the mkstemp function).

Web Server: Only tested with Apache HTTP Server. The web server is used to run the server as a CGI and also to expose the distribution maps generated by the service, allowing them to be downloaded through HTTP.

Modelling engine: openModeller framework and openModeller command-line tools.

Additional dependencies to run openModeller: GDAL (preferably version >= 1.9), PROJ.4, Expat, SQLite3 (needed by the AquaMaps algorithm), GSL (needed by the ENFA and CSM algorithms) and libcURL (needed to handle WCS and remote rasters).

Additional dependencies to compile and install openModeller: Subversion, g++, cmake, ccmake (requires ncurses) and the development packages for most of the libraries above (libgdal-dev, libexpat-dev, libsqlite3-dev, libgsl-dev, libcurl-dev).

Additional tools: Bash and cron.

Disk space: At least 50GB is recommended to store the environmental layers, and another 50GB to store results.

Note: gSOAP doesn't need to be installed to run the service - all necessary files are already included in openModeller.

4. Installation & Configuration

4.1. Running the service as a CGI

The following instructions assume the server runs as a CGI application, the job scheduler launches local command-line processes, and openModeller is installed from source code. Although the openModeller framework and command-line tools may be available as packages for some GNU/Linux distributions, the openModeller server is currently not available in binary form, so it needs to be compiled. Additionally, at the time of writing there were a few important changes to the server code that were not yet part of any openModeller release. For this reason the instructions show how to retrieve the latest (potentially unstable!) code and how to compile it.

Step 1: Install all necessary packages before installing openModeller

Note that package names vary between distributions. The following example refers to Linux Mint 9:

apt-get install g++ make cmake cmake-curses-gui subversion gdal-bin \
libgdal1-dev proj libgsl0-dev libexpat1-dev libsqlite3-dev sqlite3 curl \
libcurl-dev

Some of these libraries may not have official packages for a particular distribution, in which case it will be necessary either to find them in an unofficial repository or to download the corresponding source code and install them manually.

At this point we also assume that the other necessary tools (Bash, Cron and Apache HTTP Server) are already present on the system.

Step 2: Check out the current openModeller source code

cd /usr/local/src
svn co http://svn.code.sf.net/p/openmodeller/svn/trunk/openmodeller

The first time you check out the source you will likely be prompted to accept the SourceForge server certificate. In this case press 'p' to accept it permanently.

Note: These instructions were last tested at revision #5430.

Step 3: Prepare compilation environment

cd openmodeller 
mkdir build
cd build
ccmake ..

The last command opens the CMake ncurses GUI, where various aspects of the build can be configured. Make sure that the OM_BUILD_SERVICE option is turned ON, as this enables compilation of oM Server. When finished, press 'c' to configure, 'e' to dismiss any error messages that may appear, and 'g' to generate the make files. Note that 'c' sometimes needs to be pressed several times before the 'g' option becomes available. After generation is complete, press 'q' to exit the ccmake interactive dialog.
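
If you prefer a non-interactive configuration, the same option can be set directly on the cmake command line (all other options are left at their defaults in this sketch):

cmake -DOM_BUILD_SERVICE=ON ..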

Step 4: Compile and install openModeller

make
make install

After installing, you can run some of the command-line tools to check that openModeller is working, such as the following, which lists the available algorithms:

om_algorithm -l

Step 5: Create a specific user & group to run the modelling tasks

useradd -m -s /bin/bash modeller

This usually also creates a group "modeller" by default. If not, create the group manually and make it the user's primary group:
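
groupadd modeller
usermod -g modeller modeller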

Important: Make sure that the default shell for the user is bash, as shown in the previous command.

Step 6: Prepare the directory structure and files to run the service

mkdir /var/www/vhosts
mkdir /var/www/vhosts/modeller
mkdir /var/www/vhosts/modeller/ws2
mkdir /var/www/vhosts/modeller/ws2/cache
mkdir /var/www/vhosts/modeller/ws2/cgi
mkdir /var/www/vhosts/modeller/ws2/config
mkdir /var/www/vhosts/modeller/ws2/distmaps
mkdir /var/www/vhosts/modeller/ws2/tickets
cp /usr/local/src/openmodeller/build/src/soap/2.0/om /var/www/vhosts/modeller/ws2/cgi/om
cp /etc/openmodeller/server.conf /var/www/vhosts/modeller/ws2/config
cd /var/www/vhosts/modeller/ws2/
chown -R modeller.modeller cache cgi distmaps tickets
chmod g+w cache distmaps tickets
mkdir /tmp/om
chown -R modeller.modeller /tmp/om
chmod g+w /tmp/om
mkdir /var/log/om
chown modeller.modeller /var/log/om
chmod g+w /var/log/om
mkdir /layers
cp /usr/local/src/openmodeller/examples/*.tif /layers
chown -R modeller.modeller /layers

Important: Most of the above structure can be changed at will; however, the server configuration file (server.conf) needs to be in a directory called "config" parallel to the directory where the server program resides (this is currently hard-coded in the server program). The configuration file must be readable by both the web server user and the modelling user, in case they are different.

Step 7: Configure the web server

There are different ways to configure a web server and plenty of documentation about this, so we won't get into details here.

As an example, assuming Apache is being used, you can create a virtual host for the service by editing the Apache configuration (usually httpd.conf) and including something like this at the end of the file:

<VirtualHost your_IP:your_port>
    ServerAdmin your_email
    SuexecUserGroup modeller modeller
    DocumentRoot /var/www/vhosts/modeller
    ServerName your_domain
    ErrorLog logs/modeller.error_log
    CustomLog logs/modeller.access_log common

    ScriptAlias /ws2/ "/var/www/vhosts/modeller/ws2/cgi/"
    <Directory "/var/www/vhosts/modeller/ws2/cgi/">
        AllowOverride None
        Options ExecCGI
        Order allow,deny
        Allow from all
    </Directory>

    Alias /maps "/var/www/vhosts/modeller/ws2/distmaps/"
    <Directory "/var/www/vhosts/modeller/ws2/distmaps/">
        AllowOverride None
        Options FollowSymLinks
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>

Note that in this case SuexecUserGroup is configured so that the web server user is the same as the modelling user. If the users are different, job ticket files will be owned by (and only writable by) the web server user, so the command-line tools will fail to write the logs and exit with an error.

Step 8: Configure openModeller (optional)

If the server needs to be able to read WCS or remote rasters, it will be necessary to create a configuration file for openModeller to indicate a cache directory and to specify accepted data sources.

You can put this configuration file in the same directory as your service configuration, and you can use the same cache directory. This means editing /var/www/vhosts/modeller/ws2/config/om.cfg and typing:

CACHE_DIRECTORY = /var/www/vhosts/modeller/ws2/cache

# Include as many lines as necessary, always in lower case
# without protocol, port or path:
ALLOW_RASTER_SOURCE = 127.0.0.1
ALLOW_RASTER_SOURCE = cria.org.br

Step 9: Configure the openModeller server

Edit /var/www/vhosts/modeller/ws2/config/server.conf and check that all settings are correct for the directory structure created earlier. OM_CONFIGURATION should point to the openModeller configuration file created in the previous step.

The following table illustrates the settings given the above structure, including information about the necessary permissions in case there are different users for the web server and for the modelling tasks:

configuration               value                                       web server user  modelling user
TICKET_DIRECTORY            /var/www/vhosts/modeller/ws2/tickets/       RW               RW
DISTRIBUTION_MAP_DIRECTORY  /var/www/vhosts/modeller/ws2/distmaps/      R                RW
LOG_DIRECTORY               /var/log/om/                                RW               -
LAYERS_DIRECTORY            /layers/                                    R                R
CACHE_DIRECTORY             /var/www/vhosts/modeller/ws2/cache/         RW               -
PID_DIRECTORY               /tmp/om/                                    -                RW
OM_CONFIGURATION            /var/www/vhosts/modeller/ws2/config/om.cfg  R                R

Also make sure that the paths to the binaries are correct (OM_BIN_DIR, GDAL_BIN_DIR and CAT_BIN - the last two are only needed if you use the old scheduler version).

Finally, set the BASE_URL value according to the machine's IP address or domain name and the web server configuration, e.g.: http://your_domain/maps
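
Putting it all together, a server.conf matching the structure above might look like the following sketch (the key names appear in this document, but the exact syntax and the OM_BIN_DIR value are assumptions - compare with the sample file shipped with openModeller):

TICKET_DIRECTORY = /var/www/vhosts/modeller/ws2/tickets/
DISTRIBUTION_MAP_DIRECTORY = /var/www/vhosts/modeller/ws2/distmaps/
LOG_DIRECTORY = /var/log/om/
LAYERS_DIRECTORY = /layers/
CACHE_DIRECTORY = /var/www/vhosts/modeller/ws2/cache/
PID_DIRECTORY = /tmp/om/
OM_CONFIGURATION = /var/www/vhosts/modeller/ws2/config/om.cfg
OM_BIN_DIR = /usr/local/bin
BASE_URL = http://your_domain/maps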

Step 10: Configure the scheduler

There are two options here:

alternative 1: Using the old scheduler version with Cron

Copy the job scheduler bash script to the modelling user's home directory:

su - modeller
mkdir bin
cd bin
cp /etc/openmodeller/scheduler.sh ./

Make it executable if necessary:

chmod a+x *.sh

Then edit the cron configuration for the user:

crontab -e

And finally add:

0-59/1 * * * * /home/modeller/bin/scheduler.sh /var/www/vhosts/modeller/ws2/config/server.conf 0
0-59/1 * * * * /home/modeller/bin/scheduler.sh /var/www/vhosts/modeller/ws2/config/server.conf 10
0-59/1 * * * * /home/modeller/bin/scheduler.sh /var/www/vhosts/modeller/ws2/config/server.conf 20
0-59/1 * * * * /home/modeller/bin/scheduler.sh /var/www/vhosts/modeller/ws2/config/server.conf 30
0-59/1 * * * * /home/modeller/bin/scheduler.sh /var/www/vhosts/modeller/ws2/config/server.conf 40
0-59/1 * * * * /home/modeller/bin/scheduler.sh /var/www/vhosts/modeller/ws2/config/server.conf 50
0 1 * * * /home/modeller/bin/cleanup.sh

This makes the server process new jobs every 10 seconds and runs the cleanup script (described at the end of this step) every day at 01:00 to delete files older than a month. Please note that cron runs in a different environment than a normal user session, so if you had to include a specific setting in /etc/profile to run openModeller (such as setting LD_LIBRARY_PATH so the algorithms can be found), extra steps will be needed to get cron working properly. Cron errors are usually sent by e-mail to the corresponding user, so you may want to check the user's mailbox when testing the service.
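
For example, most cron implementations allow environment variables to be set at the top of the crontab; assuming the openModeller libraries were installed under /usr/local/lib, this could be:

LD_LIBRARY_PATH=/usr/local/lib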

alternative 2: Using the new scheduler version as a daemon

Copy the job scheduler bash script to the modelling user's home directory:

su - modeller
mkdir bin
cd bin
cp /etc/openmodeller/om_scheduler.sh ./

Start the daemon using a standard start/stop tool from your distribution (important: the service configuration file must be passed as a parameter).
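
For a quick manual test (not a substitute for a proper init script), and assuming the script takes the configuration file as its only argument, the daemon could be started like this:

su - modeller
/home/modeller/bin/om_scheduler.sh /var/www/vhosts/modeller/ws2/config/server.conf &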

In both alternatives, you can create another script to delete all files older than a month - the cleanup.sh referenced in the crontab above - with the following content:

find /var/www/vhosts/modeller/ws2/tickets/ -type f -mtime +30 -exec rm {} \;
find /var/www/vhosts/modeller/ws2/distmaps/ -name "*.img" -mtime +30 -exec rm {} \;
find /var/www/vhosts/modeller/ws2/distmaps/ -name "*.png" -mtime +30 -exec rm {} \;
find /var/www/vhosts/modeller/ws2/distmaps/ -name "*.tif" -mtime +30 -exec rm {} \;
find /var/www/vhosts/modeller/ws2/distmaps/ -name "*.asc" -mtime +30 -exec rm {} \;

Step 11: Test the service

Go to another machine where openModeller is installed and run the sample command-line client from the src/soap directory of the source tree:

perl sampleClient.pl --server=http://your_domain/ws2/om

4.2. Other ways to run the service

To run the server as a stand-alone, non-multi-threaded service, start it passing the port number as the first parameter and the number of threads (= 1) as the second, e.g.:

./om 8085 1 &

To run the server as a stand-alone multi-threaded service, start it the same way, but with a number of threads greater than 1, e.g.:

./om 8085 10 &

5. Integration with Condor

For more information about setting up a Condor cluster, please refer to the official Condor website. In a typical installation, the machine running the openModeller server is also the Condor master node, which is responsible for distributing jobs to the worker nodes. Worker nodes don't need another instance of the service - they only need the openModeller dynamic library and command-line tools to process jobs.

Integration with Condor can be enabled with the CONDOR_INTEGRATION option in the openModeller server configuration file, adjusting the other related configuration options as needed. Special care must be taken so that every node has access to the same environmental layers - either by mounting the same remote partition on their file systems or by accessing their own local copy of the layers. The same applies to the openModeller version and the available algorithms, which need to be exactly the same on all nodes.
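
As a sketch, enabling the integration could look like this in server.conf (the option name comes from this document, but the value syntax shown is an assumption - check the comments in the sample configuration file):

CONDOR_INTEGRATION = yes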