  1. Introduction
  2. Quick start
  3. Exploring the example
  4. Additional topics

1. Introduction

This manual was prepared for developers who want to add new algorithms to openModeller. For more information about openModeller and ecological niche modelling, please visit http://openmodeller.sf.net.

Instructions for checking out the current source code and compiling it can be found at the following link:

http://openmodeller.sf.net/INSTALL.html

You will need to follow the installation instructions (checking out the source code from Subversion and compiling openModeller) before going through this document. From now on, we will assume that the openModeller source files are already installed on your system in a directory that will be represented here by OM_INSTALL_DIR.

The directory where you can find all openModeller source code is:

OM_INSTALL_DIR/src

The main classes are located inside:

OM_INSTALL_DIR/src/openmodeller

While the algorithms reside in:

OM_INSTALL_DIR/src/algorithms

If you intend to make your algorithm available in an official openModeller release, please contact the openModeller developers. You can do this by subscribing and sending a message to the openModeller developers mailing list: https://lists.sourceforge.net/lists/listinfo/openmodeller-devel

Important: openModeller is a C++ framework that is currently only prepared to run algorithms that are written in the same language.

2. Quick start

2.1. Creating the files

Algorithms with just one header and one source file are usually created directly under OM_INSTALL_DIR/src/algorithms. In case there are more files or even copies of external libraries involved, then it is recommended to create a new directory under OM_INSTALL_DIR/src/algorithms. This document will cover the simplest case, so we will only create a "dummy.cpp" and its header "dummy.hh" directly in OM_INSTALL_DIR/src/algorithms.

Let's first take a look at the header file "dummy.hh":

/**
 * Declaration of a dummy algorithm.
 * 
 * @author Danilo J. S. Bellini (danilo estagio [at] gmail com)
 * $Id: $
 * 
 * LICENSE INFORMATION 
 * 
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details:
 * 
 * http://www.gnu.org/copyleft/gpl.html
 */

#ifndef _OM_DUMMY_HH
#define _OM_DUMMY_HH

#include <openmodeller/om.hh>

class DummyAlgorithm : public AlgorithmImpl {

  public:

    DummyAlgorithm();

    int initialize();

    Scalar getValue( const Sample& x ) const;

};

#endif

This is the simplest form of header for our new algorithm. If you are planning to release your algorithm with openModeller, please make sure that your license is compatible with the GPL (the license used by openModeller). We recommend that all files start with specific comments indicating: purpose of the file, author name and contact, copyright and licensing information. The header shown here follows the same style as most algorithms in openModeller, so you can simply use the same approach.

If you are wondering about the $Id in the comments, this is one of the keywords recognized by Subversion. When the keyword Id is enabled for the file, whenever the file gets changed in the repository, that line is automatically updated with all essential information about the file version. The line is then expanded to something like this:

$Id: dummy.hh 4473 2008-07-18 11:41:01Z rdg $

So it will contain the file name, revision number, date/time when it was last modified and who made the last change.

Now let's continue with the dummy.cpp file:

/**
 * Definition of the dummy algorithm.
 * 
 * @author Danilo J. S. Bellini (danilo estagio [at] gmail com)
 * $Id: $
 * 
 * LICENSE INFORMATION 
 * 
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details:
 * 
 * http://www.gnu.org/copyleft/gpl.html
 */

#include "dummy.hh"

////////////////////////////////////
///////////// Metadata /////////////

#define NUM_PARAM 1
#define TESTPAR "TestParameter"
#define TPARMIN 0.0
#define TPARMAX 5.0

// First parameter
static AlgParamMetadata parameters[NUM_PARAM] = {
  {
    TESTPAR, // Id
    "Test parameter", // Name
    Real, // Datatype
    "Test parameter for the dummy algorithm", // Overview
    "Test parameter used to demonstrate how parameters are declared", // Description
    1, // Not zero if the parameter has lower limit
    TPARMIN, // Parameter's lower limit
    1, // Not zero if the parameter has upper limit
    TPARMAX, // Parameter's upper limit
    "0.5" // Parameter's typical (default) value
  },
};

static AlgMetadata metadata = {

  "DUMMY", // Id
  "Dummy Algorithm", // Name
  "1.0", // Version
  "The dummy algorithm is cool!", // Overview
  "A dummy algorithm that was created for didactical purpose.", // Description
  "Danilo J. S. Bellini", // Algorithm author
  "None", // Bibliography
  "Danilo J. S. Bellini", // Code author
  "danilo . estagio @ gmail . com", // Code author's contact
  0, // Does not accept categorical data
  0, // Does not need (pseudo)absence points
  NUM_PARAM, parameters // Algorithm's parameters
};

///////////////////////////////////
///////////// Factory /////////////

OM_ALG_DLL_EXPORT 
AlgorithmImpl * 
algorithmFactory() {

  // Create an instance of the algorithm
  return new DummyAlgorithm();
}

OM_ALG_DLL_EXPORT 
AlgMetadata const *
algorithmMetadata() {

  return &metadata;
}

///////////////////////////////////
///////////// Methods /////////////

// Constructor
DummyAlgorithm::DummyAlgorithm() : 
AlgorithmImpl( &metadata ) {

  // Nothing here now
}

// Initialize model
int DummyAlgorithm::initialize() {

  return 0;
}

// Return the probability of presence given the environment conditions x
Scalar DummyAlgorithm::getValue( const Sample& x ) const {

  return 1.0;
}

This isn't the shortest .cpp file we can create, but it's close to it. Only the methods that are strictly required by openModeller have been included here. In your own implementation you will likely want to include additional helper methods that are required for the algorithm logic.

2.2. Compiling

openModeller uses a cross-platform build environment called CMake. This means that you don't need to write any Makefile directly. Instead, you just need to edit configuration files called CMakeLists.txt that can be found in each directory. These files are used by CMake to generate the Makefiles for your platform.

Let's open the "CMakeLists.txt" file in the algorithms directory. You will notice that there are basically 3 steps to configure the compilation of your algorithm:

  1. Specify the source files.
  2. Add a new library target for your algorithm, linking it with openModeller.
  3. Add an installation instruction for your algorithm.

To specify the sources of our new algorithm, we just need to include in the appropriate section something like:

SET (LIBDUMMY_SRCS
     dummy.cpp
)

"LIBDUMMY_SRCS" is just a variable whose value is a list of source files. Then you need to add a new library target and link it with openModeller:

ADD_LIBRARY(dummy MODULE ${LIBDUMMY_SRCS})
TARGET_LINK_LIBRARIES(dummy openmodeller)

Here "dummy" is just the library name. Finally, the installation instruction will be:

INSTALL(TARGETS dummy RUNTIME DESTINATION ${OM_ALG_DIR} LIBRARY DESTINATION ${OM_ALG_DIR})

This is the simplest case. When the algorithm has its own directory, you need to specify it with SUBDIRS and then create another "CMakeLists.txt" file in that directory. You can also compile an algorithm conditionally, depending on the platform being used or on the presence of one or more specific libraries (in which case you'll also need to link with those libraries), and you can easily add custom compiler flags.

After making the changes in CMakeLists.txt for the dummy algorithm, recompile openModeller and install it again.

2.3. Testing

After installation, you can now test your algorithm using one of the interfaces. Let's use "om_console" with the sample request.txt file that you can find in the OM_INSTALL_DIR/examples directory. Actually, the new algorithm doesn't do anything special, but at least we can already see it on the list of available algorithms and we can try to run it:

$ om_console request.txt
openModeller version 0.6.1

Choose an algorithm between:
 [0] AquaMaps (beta version)
 [1] Bioclim
 [2] Climate Space Model
 [3] Dummy Algorithm
 [4] GARP (single run) - DesktopGARP implementation
 [5] GARP with best subsets - DesktopGARP implementation
 [6] Envelope Score
 [7] Environmental Distance
 [8] GARP (single run) - new openModeller implementation
 [9] GARP with Best Subsets - new openModeller implementation
 [10] Maximum Entropy
 [11] SVM (Support Vector Machines)
 [12] Quit

Option: 3
> Algorithm used: Dummy Algorithm
The dummy algorithm is cool!
* Parameter: Test parameter

Test parameter for the dummy algorithm.
TestParameter >= 0.000000
TestParameter <= 5.000000

Enter with value [0.5]:
[Info] Creating model
[Error] Algorithm could not be initialized.
[Info] Exception occurred: Algorithm could not be initialized.

As you can see, our algorithm couldn't be initialized (we'll see the reason below) but we have taken the first step: we added the new algorithm!

3. Exploring the example

3.1. Header file

Now, let's understand what we did. First take a look at dummy.hh (by the way, never forget to include the algorithm's header in the .cpp file!). It's really small, with only 10 lines of code that are strictly needed for compilation - real algorithms will have a bigger header file. Notice that all code is inside an "#ifndef" include guard, a preprocessor directive that prevents the file from being included more than once. For your algorithm you will need to change the name _OM_DUMMY_HH to a name of your choice (in the #ifndef and #define lines). We suggest using the real file name in uppercase, replacing the dot with an underscore and prepending _OM_ to it.

Algorithm header files must include om.hh to work. Also, the algorithm class (named DummyAlgorithm in our example) must inherit from the AlgorithmImpl class (included in om.hh). The constructor must be created explicitly and the methods "initialize" and "getValue" are needed for compilation.

3.2. Metadata

Although an algorithm may not need any extra parameters (see envelope_score.cpp, for instance) it can tell openModeller that it needs one or more parameters to run. Custom parameters must be specified in a static/global vector of AlgParamMetadata. In our example this vector was named "parameters" (see dummy.cpp). NUM_PARAM is a defined constant indicating the number of custom parameters needed by the algorithm, and TESTPAR is just another defined constant for the identifier of our single parameter. Since these values are used more than once in the program, it's better to keep them as defined constants. TESTPAR is specific to this example, so you should rename it when writing a real algorithm according to your parameter name.

All parameters are specified in this way, and the comments on the side will help you remember what metadata needs to go on each line. Remember: DON'T remove any of the lines for each of your parameters, even if one of the values is null. The order and the number of lines are important (otherwise the block won't comply with the AlgParamMetadata structure anymore). Parameters are separated from each other by a comma, so the entire block inside "parameters" can be copied below itself to create a second parameter, a third one, and so on. Just remember to update NUM_PARAM in this case.

If your algorithm has no parameters, define NUM_PARAM as 0 and then use the following line to define the parameters variable:

static AlgParamMetadata * parameters = 0;

Parameter datatypes can be one of the following: Integer, Real or String. Numeric parameters can have an associated valid range, so that user interfaces can validate the input before running the algorithm.
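
As an illustration of the rules above, here is a minimal sketch of a parameter vector with two entries. So that it compiles on its own, it uses a hypothetical stand-in structure (ParamMetadataSketch) whose field names are guesses based on the comments in dummy.cpp; a real algorithm would use AlgParamMetadata from om.hh instead. The second parameter ("MaxIterations") is invented for this example.

```cpp
#include <cassert>
#include <cstring>

// Stand-in declarations (assumptions, not the openModeller definitions).
// In a real algorithm these come from <openmodeller/om.hh>.
enum ParamType { Integer, Real, String };

struct ParamMetadataSketch {
  const char *id;           // Id
  const char *name;         // Name
  ParamType   type;         // Datatype
  const char *overview;     // Overview
  const char *description;  // Description
  int         has_min;      // Not zero if the parameter has lower limit
  double      min_val;      // Parameter's lower limit
  int         has_max;      // Not zero if the parameter has upper limit
  double      max_val;      // Parameter's upper limit
  const char *typical;      // Parameter's typical (default) value
};

#define NUM_PARAM 2
#define TESTPAR "TestParameter"
#define MAXITER "MaxIterations"  // hypothetical second parameter

static ParamMetadataSketch parameters[NUM_PARAM] = {
  // First parameter: a real number restricted to [0, 5]
  {
    TESTPAR,
    "Test parameter",
    Real,
    "Test parameter for the dummy algorithm",
    "Test parameter used to demonstrate how parameters are declared",
    1, 0.0,
    1, 5.0,
    "0.5"
  },
  // Second parameter: an integer with a lower limit but no upper limit.
  // The upper limit line is kept (with a null value) to preserve the layout.
  {
    MAXITER,
    "Maximum iterations",
    Integer,
    "Maximum number of iterations",
    "Hypothetical parameter added only to show a second entry",
    1, 1.0,
    0, 0.0,
    "100"
  },
};
```

Note that every entry keeps all ten lines, even when a limit is unused, and that NUM_PARAM was updated to match the number of entries.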

Parameters are part of the algorithm metadata, which also includes things such as version, description, author, bibliography, etc. Algorithm metadata is also declared in a static/global "AlgMetadata" datatype (named "metadata" in our case). The basic rules here are almost the same for each parameter: keep all lines, adjusting the content to your needs.

It is very important to pay special attention to description and bibliography so that users can understand how the algorithm works, in which way it is different from existing algorithms, and where they can find more information about it.

If you make any change to an existing algorithm, always remember to increase the version number, whether you change the logic of the algorithm, fix a bug or just update the metadata.

3.3. Dynamic link

Algorithms in openModeller are dynamically loaded as plugins. The two functions defined after metadata (under the "Factory" comment) enable this dynamic connection between the algorithm and openModeller. The first one is used to return an instance of the algorithm and the other is used to return the algorithm metadata. The only thing you will need to change here is the name "DummyAlgorithm" - just replace it with the class name of your real algorithm.

3.4. Algorithm initialization

Now that we know how to insert a new algorithm in openModeller and how to specify its metadata, the next step is to make the dummy algorithm work. When we first tried to run the algorithm, it failed with an "Algorithm could not be initialized" exception. The reason is simply that our initialize method always returns 0 (false).

Let's change the initialize method in dummy.cpp to make it return 1 (true) under certain conditions. One of the typical actions taken by algorithms during initialization is to check whether all mandatory parameters were passed and whether their values are correct. The value of our parameter "TESTPAR" will be stored in a "_testpar" property that you should declare in the DummyAlgorithm class, preferably as private, so add these lines just before the "};" line in dummy.hh:

  private:

    Scalar _testpar;

Why is the datatype "Scalar"? Scalar is defined by openModeller to encapsulate the type used for real numbers. You can find this definition in om_defs.hh (src/openmodeller) where it's currently associated with the "double" datatype.

Now let's go back to dummy.cpp and rewrite the initialize method. The method "getParameter" that is implemented in the parent class can be used to get the parameter value:

int DummyAlgorithm::initialize() {

  if ( ! getParameter( TESTPAR, &_testpar ) ) {

    Log::instance()->error( "Parameter '" TESTPAR "' not specified.\n" );
    return 0;
  }

  return 1;
}

For numeric parameters with a valid range, remember to also verify if the value is within the expected range (maximum and minimum).
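
A minimal sketch of such a range check is shown below. It is written as a hypothetical free function (isWithinRange is not part of openModeller) so that it compiles on its own; TPARMIN and TPARMAX repeat the constants defined in dummy.cpp.

```cpp
#include <cassert>

#define TPARMIN 0.0
#define TPARMAX 5.0

// Hypothetical helper: returns 1 (true) when value lies inside the closed
// interval [min, max] and 0 (false) otherwise, following the int-as-boolean
// convention used by initialize().
static int isWithinRange( double value, double min, double max ) {

  return ( value >= min && value <= max ) ? 1 : 0;
}
```

Inside initialize, after getParameter has succeeded, a call such as isWithinRange( _testpar, TPARMIN, TPARMAX ) could be used to decide whether to log an error and return 0.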

3.5. Complete run

After making these changes, recompile the algorithm, install the new version, and if you run om_console again you should now see the following output:

Enter with value [0.5]:
[Info] Creating model
[Info] Finished creating model
[Info] Projecting model
[Info] Finished projecting model
[Info] 
Model statistics
[Info] Accuracy:           100.00%
[Info] Omission error:       0.00% (0/65)
[Info] AUC:                  0.76
[Info] Percentage of cells predicted present:   100.00%
[Info] Total number of cells: 64681
[Info] Done.

Now that's a complete run. If you look again at the getValue method you'll see that it always returns 1.0, which means that no matter what the environmental conditions are, there's always a "100% chance of finding the species". For this reason the resulting map is "white" and all cells are predicting presence.

4. Additional topics

4.1. Model creation

Our example focused only on including a new algorithm in openModeller. No attempt was made to generate a real model. In openModeller, a model can be seen as a function that returns the probability of finding a species in a location with certain environmental conditions. Actually the framework can be used with virtually anything whose distribution depends on the layers that were provided as input. Therefore, creating a model means creating that function. The notion of a function here is completely generic - each algorithm is free to build its own representation of the potential distribution, making use of any kind of resources: custom properties and methods, external libraries, web services, local databases, etc.

But how can a model be created in the previous example if the only input provided was a custom parameter? Actually, the most important input is already captured by the parent class (AlgorithmImpl) and made available in a property called "_samp". "_samp" is an instance of a SamplerPtr. In openModeller, the sampler is responsible for connecting the two basic inputs of a modelling experiment: a set of occurrence points and a set of rasters.

Non-iterative algorithms, i.e., algorithms that can directly generate a model in a single step, can create the model during algorithm initialization (see bioclim.cpp for an example) or can use the "iterate" method in a single step (see svm/svm_alg.cpp for an example). Iterative algorithms must implement the iteration in the "iterate" method (see garp/garp.cpp for an example). During model creation, this method will be called as many times as necessary by the parent class until the "done" method returns true. For this reason algorithms usually define a property "_done", which is returned by the "done" method. As soon as the model is created, "_done" should be set to "true" (remember to initialize "_done" with "false"). The iterate method must return 1 unless something goes wrong, in which case it should return 0.

Algorithms can use any number of custom properties to store their models.

When model creation involves many steps and can take a long time, you should use the "getProgress" method to return a value between 0 and 1 to give an indication of how far the algorithm is in the task of model creation.
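
The iterate/done/getProgress contract described above can be sketched with a small stand-alone class. This is only an illustration: IterativeSketch does not derive from AlgorithmImpl, the return types are assumptions, the "work" is a simple step counter, and runToCompletion is a hypothetical driver that imitates what the parent class is described as doing.

```cpp
#include <cassert>

// Hypothetical stand-alone class mimicking the iterate/done/getProgress
// contract. A real algorithm would derive from AlgorithmImpl.
class IterativeSketch {

  public:

    explicit IterativeSketch( int max_steps )
      : _done( false ), _step( 0 ), _max_steps( max_steps ) {}

    // Performs one unit of work per call.
    // Returns 1 on success and 0 if something goes wrong.
    int iterate() {

      if ( _done )
        return 1;

      ++_step;  // a real algorithm would refine its model here

      if ( _step >= _max_steps )
        _done = true;

      return 1;
    }

    // Not zero when the model is ready
    int done() const { return _done ? 1 : 0; }

    // Fraction of the work already completed, between 0 and 1
    float getProgress() const {

      if ( _max_steps <= 0 )
        return 1.0f;

      return static_cast<float>( _step ) / static_cast<float>( _max_steps );
    }

  private:

    bool _done;  // initialized with false, set to true when finished
    int _step;
    int _max_steps;
};

// Hypothetical driver: calls iterate until done, returning the number of
// calls made, or -1 if an iteration failed.
static int runToCompletion( IterativeSketch& alg ) {

  int calls = 0;

  while ( ! alg.done() ) {

    if ( ! alg.iterate() )
      return -1;

    ++calls;
  }

  return calls;
}
```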

4.2. Using the sampler

All essential data to create a model can be found in the "_samp" property:

int number_of_layers = _samp->numIndependent();

int number_of_presence_points = _samp->numPresence();

int number_of_absence_points = _samp->numAbsence();

OccurrencesPtr presence_points = _samp->getPresences();

OccurrencesPtr absence_points = _samp->getAbsences();

// This is how you can loop over the presence points
for ( int i = 0 ; i < number_of_presence_points ; i++ ) {

  Sample environment_conditions = (*presence_points)[i]->environment();

  // You probably won't need this, but it's good to know
  Coord longitude = (*presence_points)[i]->x();
  Coord latitude = (*presence_points)[i]->y();
}

// Another way to iterate over points
OccurrencesImpl::const_iterator it = presence_points->begin();
OccurrencesImpl::const_iterator end = presence_points->end();

while ( it != end ) {

  Sample environment_conditions = (*it)->environment();

  // Advance to the next point, otherwise the loop never ends
  ++it;
}

Sample is basically a vector where each value corresponds to the value of a layer in that specific location (lat/long). You can access the individual values through the "[]" operator. The size of the vector should always match the number of independent variables. There are many pointwise operators available for samples: "+=", "-=", "*=", "/=", "&=", "|=", besides the operators "equals", "sqr", "sqrt", "norm" and "dotProduct".
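
To illustrate how samples can be combined layer by layer, the sketch below computes the per-layer mean of a set of points. It uses std::vector<double> as a hypothetical stand-in for Sample (SampleSketch) so that it compiles without openModeller, and it assumes that all points have the same number of layers; with the real classes the values would come from the environment() method of each occurrence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for Sample (assumption): one value per layer
typedef std::vector<double> SampleSketch;

// Hypothetical helper: returns a sample holding the mean value of each
// layer over all points, or an empty sample if there are no points.
static SampleSketch meanSample( const std::vector<SampleSketch>& points ) {

  SampleSketch mean;

  if ( points.empty() )
    return mean;

  mean.assign( points[0].size(), 0.0 );

  // Accumulate the values of each layer
  for ( std::size_t i = 0 ; i < points.size() ; i++ ) {

    for ( std::size_t j = 0 ; j < mean.size() ; j++ ) {

      mean[j] += points[i][j];
    }
  }

  // Divide by the number of points
  for ( std::size_t j = 0 ; j < mean.size() ; j++ ) {

    mean[j] /= static_cast<double>( points.size() );
  }

  return mean;
}
```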

4.3. Normalization

Depending on the type of model being created, it is frequently necessary to normalize the environment values before working with them, otherwise layers with greater values will dominate the others during model creation. There are two normalization strategies available in openModeller (others can be implemented in the future).

AlgorithmImpl already has a property ("_normalizerPtr") to store a normalizer object, so when an algorithm needs to use normalization it can simply instantiate the desired normalizer class in the constructor:

DummyAlgorithm::DummyAlgorithm() : 
AlgorithmImpl( &metadata ) {

  // This will force environment values to be scaled between 0 and 1
  // using the maximum and minimum values of the entire layer as reference
  // (as opposed to using the maximum and minimum values of the input points)
  _normalizerPtr = new ScaleNormalizer( 0.0, 1.0, true );
}
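
The kind of rescaling mentioned in the comments above can be sketched as a plain linear mapping. This only illustrates the arithmetic and is not the ScaleNormalizer implementation: scaleValue and its parameter names are hypothetical, and returning the lower bound for a constant layer is a choice made for this sketch.

```cpp
#include <cassert>

// Hypothetical helper: linearly maps value from the interval
// [layer_min, layer_max] to the interval [out_min, out_max].
static double scaleValue( double value,
                          double layer_min, double layer_max,
                          double out_min, double out_max ) {

  // A constant layer has no range to scale (arbitrary choice for the sketch)
  if ( layer_max == layer_min )
    return out_min;

  return out_min +
         ( value - layer_min ) * ( out_max - out_min ) / ( layer_max - layer_min );
}
```

For example, in a layer whose values range from 10 to 20, the value 15 is mapped to 0.5 when scaling between 0 and 1.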

4.4. Model projection

During model projection, openModeller will iterate over each cell of the resulting distribution map, read the corresponding environmental values from each layer and then call the "getValue" method of the algorithm. This method must always return a probability between 0 and 1 that will be written in the corresponding cell of the distribution map.

Therefore, "getValue" receives a sample as a parameter, which contains the environment conditions at a specific location. The algorithm should be able to use its model to calculate a probability for the specified sample.
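
As a simple illustration of what a "getValue" implementation might compute, the sketch below uses a hypothetical model made of one minimum and one maximum value per layer: it returns 1.0 when every value of the sample falls inside the corresponding interval and 0.0 otherwise. EnvelopeSketch and getValueSketch are invented for this example, std::vector<double> stands in for Sample, and no existing openModeller algorithm is claimed to work exactly like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: lower and upper bounds for each layer
struct EnvelopeSketch {

  std::vector<double> lower;
  std::vector<double> upper;
};

// Returns a probability between 0 and 1 for the environment conditions x.
// Here the result is either 0.0 (outside the envelope) or 1.0 (inside).
static double getValueSketch( const EnvelopeSketch& model,
                              const std::vector<double>& x ) {

  // The sample must have one value per layer of the model
  if ( x.size() != model.lower.size() || x.size() != model.upper.size() )
    return 0.0;

  for ( std::size_t j = 0 ; j < x.size() ; j++ ) {

    if ( x[j] < model.lower[j] || x[j] > model.upper[j] )
      return 0.0;
  }

  return 1.0;
}
```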

4.5. Serialization

As mentioned before, algorithms can make use of any number of custom properties to store all necessary data for representing their models. However, developers must also figure out a way of representing the same data in XML, which is the format used by openModeller to serialize objects. Otherwise, models won't last long, since they will only be stored in memory. In our example, om_console was used to run the model and immediately project it, so it was not necessary to deal with serialization at all. It is very important to be able to store models so that they can be loaded at any time in the future to project results in different scenarios (past conditions, future conditions, different geographical regions, etc.).

In openModeller, serialization and deserialization can be implemented in the methods "_getConfiguration" and "_setConfiguration" respectively. These methods receive a ConfigurationPtr as a parameter. ConfigurationPtr encapsulates XML reading and writing, so all you need to know is the API of this class.

The following example only illustrates the main methods available in ConfigurationPtr (again, no real model is involved, you can look at the existing algorithms to see how different they are when representing their models).

// Use this method for model serialization
void DummyAlgorithm::_getConfiguration( ConfigurationPtr& config ) const
{
  // You may want to use a similar condition to avoid serialization when 
  // something went wrong
  if ( ! _done )
    return;

  // These two lines create a new XML element called "Dummy"
  ConfigurationPtr model_config( new ConfigurationImpl("Dummy") );
  config->addSubsection( model_config );

  // You can add as many XML attributes as you want in your element.
  // addNameValue accepts different value types (std::string, int, double, 
  // Sample, etc.)
  model_config->addNameValue( "NumLayers", _number_of_layers );

  // You can also add any sub elements if necessary
  ConfigurationPtr extra_config( new ConfigurationImpl("SubElement") );
  model_config->addSubsection( extra_config );
}

// Use this method for model deserialization
void DummyAlgorithm::_setConfiguration( const ConstConfigurationPtr& config )
{
  // Find element "Dummy"
  ConstConfigurationPtr model_config = config->getSubsection( "Dummy", false );

  if ( ! model_config )
    return;

  // Read attributes from the element. Check the documentation to know all
  // flavors of getAttributeAs...
  _number_of_layers = model_config->getAttributeAsInt( "NumLayers", 0 );

  // Read a subelement
  ConstConfigurationPtr extra_config = model_config->getSubsection( "SubElement", false );

  // You may want to use a similar way of indicating when you're done loading 
  // a model. In this case, the _done property is the same one returned by the 
  // "done" method, which is used to indicate that your algorithm is ready to 
  // use a model.
  _done = true;
}

That's all for now.

If you have any questions, comments or suggestions please contact the openModeller developers mailing list: openmodeller-devel@lists.sourceforge.net