openModeller id: MAXENT

**Current version:** 1.0 **Developer(s):** Elisangela S. da C. Rodrigues, Renato De Giovanni, Daniel Bolgheroni

**Accepts Categorical Maps:** no

**Requires absence points:** no

**Author(s):** Steven J. Phillips, Miroslav Dudík, Robert E. Schapire

The principle of maximum entropy is a method for analyzing available qualitative information in order to determine a unique epistemic probability distribution. It states that the least biased distribution that encodes certain given information is the one that maximizes the information entropy (content retrieved from Wikipedia on the 19th of May, 2008: http://en.wikipedia.org/wiki/Maximum_entropy). This implementation in openModeller follows the same approach as Maxent (Phillips et al. 2004).

It was compared with Maxent 3.3.3e through a standard experiment covering all possible combinations of parameters. The two implementations generated models with the same number of iterations, at least a 90% rate of matching best features over all iterations, distribution maps with a correlation (r) greater than 0.999, and no difference in the final loss.

Previous implementations of this algorithm (before version 1.0), however, generated quite different results. The first versions were based on an existing third-party maximum entropy library, which produced low-quality models compared with all other algorithms. After that, the algorithm was rewritten a couple of times by Elisangela Rodrigues as part of her doctoral research. Finally, the EUBrazil-OpenBio project funded the remaining work to make this algorithm compatible with Maxent.

Please note that not all functionality available in Maxent is available here. In particular, the options to account for collecting bias and to use categorical maps are not present, and neither are many specific parameters intended for advanced users. However, you should be able to get compatible results for all other available parameters.
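To make the principle concrete, the sketch below fits a Gibbs (exponential-family) distribution over a small set of background points so that its expected feature values match the mean feature values observed at the presence points. By the standard duality between maximum entropy and maximum likelihood, this is the maximum entropy distribution subject to those constraints. It is a minimal illustration, not the openModeller code: it uses plain, unregularized gradient ascent, whereas Maxent (Phillips et al. 2004) uses a regularized sequential-update algorithm, and the names `gibbs`, `entropy` and `fit_maxent` are hypothetical.

```python
import math

def gibbs(lam, background):
    """Gibbs distribution q(x) = exp(lam . f(x)) / Z over the background points."""
    scores = [sum(l * f for l, f in zip(lam, x)) for x in background]
    m = max(scores)  # shift by the largest score for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [wi / z for wi in w]

def entropy(q):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in q if p > 0)

def fit_maxent(background, presences, lr=0.5, iters=2000):
    """Match the Gibbs distribution's expected features to the presence means.

    Plain, unregularized gradient ascent: an illustrative stand-in for
    Maxent's regularized sequential-update algorithm.
    """
    k = len(background[0])
    # Constraints: empirical feature means over the presence points
    target = [sum(p[j] for p in presences) / len(presences) for j in range(k)]
    lam = [0.0] * k  # lam = 0 is the uniform (highest-entropy) distribution
    for _ in range(iters):
        q = gibbs(lam, background)
        expected = [sum(qi * x[j] for qi, x in zip(q, background)) for j in range(k)]
        # Gradient of the mean presence log-likelihood is target - expected
        lam = [l + lr * (t - e) for l, t, e in zip(lam, target, expected)]
    return lam, gibbs(lam, background)
```

For example, with one feature scaled to [0, 1], `background = [[0.0], [0.25], [0.5], [0.75], [1.0]]` and presence points whose mean is 0.75, the fit yields a positive weight, a distribution whose mean feature value is 0.75, and an entropy below log 5, the entropy of the uniform distribution over the five points.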

1) Jaynes, E.T. (1957). Information Theory and Statistical Mechanics. Physical Review, Vol. 106, No. 4, pp. 620-630.
2) Berger, A.L., Della Pietra, S.A. and Della Pietra, V.J. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, Vol. 22, pp. 39-71.
3) Darroch, J.N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, Vol. 43, pp. 1470-1480.
4) Malouf, R. (2002). A comparison of algorithms for maximum entropy parameter estimation. Proceedings of the Sixth Conference on Natural Language Learning.
5) Phillips, S.J., Dudík, M. and Schapire, R.E. (2004). A maximum entropy approach to species distribution modeling. Proceedings of the Twenty-First International Conference on Machine Learning, pp. 655-662.

**Number of background points**

openModeller id: NumberOfBackgroundPoints

Number of background points to be generated.

**Data type:** integer **Domain:** [0.0, 10000.0] **Typical value:** 10000
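The sketch below shows one plausible way to draw `NumberOfBackgroundPoints` background points: a uniform random sample, without replacement, of the grid cells that have valid data in every environmental layer. That sampling scheme, capping the sample at the number of available cells, and the helper name `sample_background` are assumptions made for illustration, not a description of the openModeller source.

```python
import random

def sample_background(cells, n, seed=None):
    """Pick up to n distinct cells at random to serve as background points.

    cells: list of (row, col) grid cells with valid data in all layers.
    Hypothetical helper; sampling without replacement is an assumption.
    """
    rng = random.Random(seed)
    n = min(n, len(cells))  # cannot draw more distinct cells than exist
    return rng.sample(cells, n)
```

With the typical value of 10000 and a small study area, every valid cell ends up in the background under this assumption.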

**Use absence points as background**

openModeller id: UseAbsencesAsBackground

When absence points are provided, this parameter instructs the algorithm to use them as background points instead of generating background points randomly. This also facilitates comparisons between different algorithms.

**Data type:** integer **Domain:** [0.0, 1.0] **Typical value:** 0

**Include input points in the background**

openModeller id: IncludePresencePointsInBackground

Include the input (presence) points in the background: 0=No, 1=Yes.

**Data type:** integer **Domain:** [0.0, 1.0] **Typical value:** 1
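How the three background parameters interact can be sketched as follows. The sketch assumes that `UseAbsencesAsBackground` replaces random generation entirely when absences exist (falling back to random points otherwise), and that `IncludePresencePointsInBackground` simply appends the presence points to whichever background was chosen. The function `build_background` and its `random_point` callback are hypothetical.

```python
import random

def build_background(presences, absences, n_background, use_absences,
                     include_presences, random_point, seed=None):
    """Assemble background points from the three parameters (illustrative).

    random_point(rng) is a caller-supplied, hypothetical generator that
    returns one random location in the study area.
    """
    rng = random.Random(seed)
    if use_absences and absences:
        # UseAbsencesAsBackground: absences replace random background points
        background = list(absences)
    else:
        background = [random_point(rng) for _ in range(n_background)]
    if include_presences:
        # IncludePresencePointsInBackground: add the presence points as well
        background.extend(presences)
    return background
```

Using the absences as background removes the randomness from this step, which is what makes runs comparable across algorithms that receive the same absence points.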

**Number of iterations**

openModeller id: NumberOfIterations

Maximum number of training iterations.

**Data type:** integer **Domain:** [1.0, oo] **Typical value:** 500

**Terminate tolerance**

openModeller id: TerminateTolerance

Tolerance for detecting model convergence.

**Data type:** real **Domain:** [0.0, oo] **Typical value:** 0.00001
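`NumberOfIterations` and `TerminateTolerance` act together as a stopping rule. The minimal sketch below assumes a simplified criterion, stopping as soon as one iteration lowers the loss by less than the tolerance. Maxent's actual convergence test may differ (for example, by looking at the loss change over several iterations), and `train` and `step` are hypothetical names.

```python
def train(step, max_iterations=500, tolerance=1e-5):
    """Run step() until the loss stops improving or the iteration cap is hit.

    step() is a hypothetical callable that performs one model update and
    returns the current loss. Returns (iterations_run, final_loss).
    """
    previous = float("inf")
    loss = previous
    for iteration in range(1, max_iterations + 1):
        loss = step()
        if previous - loss < tolerance:
            return iteration, loss  # converged: improvement below tolerance
        previous = loss
    return max_iterations, loss  # NumberOfIterations reached
```

For instance, a loss sequence of 1.0, 0.5, 0.4999999 stops after the third iteration with the default tolerance of 0.00001.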

**Output format**

openModeller id: OutputFormat

Output format: 1 = Raw, 2 = Logistic.

**Data type:** integer **Domain:** [1.0, 2.0] **Typical value:** 2
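The raw output is the maximum entropy distribution itself, so its values over all background points sum to one and are typically very small; the logistic output rescales them to the range (0, 1). The sketch below uses the transform documented for Maxent 3.x, logistic = c·raw / (1 + c·raw) with c = exp(H), where H is the entropy of the raw distribution. That openModeller applies exactly this formula is an assumption based on its stated compatibility with Maxent, and `to_logistic` is a hypothetical name.

```python
import math

def to_logistic(raw, entropy):
    """Maxent 3.x style logistic transform of a raw output value.

    raw:     raw (probability) value of a cell; raw values over all
             background points sum to 1.
    entropy: entropy H of the raw distribution (natural log).
    """
    c = math.exp(entropy)
    return c * raw / (1.0 + c * raw)
```

Under this transform, a cell whose raw value equals exp(-H) maps to exactly 0.5; for a uniform raw distribution over four points (raw = 0.25, H = log 4) every cell therefore gets 0.5.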

**Quadratic features**

openModeller id: QuadraticFeatures

Enable quadratic features: 0=No, 1=Yes.

**Data type:** integer **Domain:** [0.0, 1.0] **Typical value:** 1

**Product features**

openModeller id: ProductFeatures

Enable product features: 0=No, 1=Yes.

**Data type:** integer **Domain:** [0.0, 1.0] **Typical value:** 1

**Hinge features**

openModeller id: HingeFeatures

Enable hinge features: 0=No, 1=Yes.

**Data type:** integer **Domain:** [0.0, 1.0] **Typical value:** 1

**Threshold features**

openModeller id: ThresholdFeatures

Enable threshold features: 0=No, 1=Yes.

**Data type:** integer **Domain:** [0.0, 1.0] **Typical value:** 1
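The feature classes above are transformations of the environmental variables that the model combines linearly. The sketch below shows them for a variable `x` with observed range `[lo, hi]` (and a second variable `y` for products), following the general descriptions in the Maxent literature: quadratic is the square of the scaled variable, product multiplies two scaled variables, a hinge is zero on one side of a knot and linear on the other, and a threshold is a step at a knot. The exact scaling, the function names, and the omission of knot selection are all assumptions of this sketch.

```python
def linear(x, lo, hi):
    """Scale x to [0, 1] using the variable's observed range [lo, hi]."""
    return (x - lo) / (hi - lo)

def quadratic(x, lo, hi):
    """Square of the scaled variable."""
    return linear(x, lo, hi) ** 2

def product(x, y, x_range, y_range):
    """Product of two scaled variables."""
    return linear(x, *x_range) * linear(y, *y_range)

def forward_hinge(x, knot, hi):
    """0 below the knot, rising linearly to 1 at the maximum hi."""
    return (x - knot) / (hi - knot) if x > knot else 0.0

def reverse_hinge(x, knot, lo):
    """0 above the knot, rising linearly to 1 at the minimum lo."""
    return (knot - x) / (knot - lo) if x < knot else 0.0

def threshold(x, knot):
    """Step feature: 1 above the knot, 0 otherwise."""
    return 1.0 if x > knot else 0.0
```

Hinge and threshold features let the response change shape at a knot, which is why they need more samples to be estimated reliably than linear or quadratic ones.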

**Auto features**

openModeller id: AutoFeatures

Enable auto features, which select the feature classes automatically according to the number of samples: 0=No, 1=Yes.

**Data type:** integer **Domain:** [0.0, 1.0] **Typical value:** 1

**Product/threshold threshold**

openModeller id: MinSamplesForProductThreshold

Number of samples at which product and threshold features start being used (only when auto features are enabled).

**Data type:** integer **Domain:** [1.0, oo] **Typical value:** 80

**Quadratic threshold**

openModeller id: MinSamplesForQuadratic

Number of samples at which quadratic features start being used (only when auto features are enabled).

**Data type:** integer **Domain:** [1.0, oo] **Typical value:** 10

**Hinge threshold**

openModeller id: MinSamplesForHinge

Number of samples at which hinge features start being used (only when auto features are enabled).

**Data type:** integer **Domain:** [1.0, oo] **Typical value:** 15
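With `AutoFeatures` enabled, the three thresholds above decide which feature classes are used for a given number of samples. The sketch below assumes that linear features are always included, that each class switches on once the sample count reaches its threshold, and that the individual enable flags are handled elsewhere. `auto_feature_classes` is a hypothetical helper, and its defaults are the typical values listed above.

```python
def auto_feature_classes(n_samples, min_quadratic=10, min_hinge=15,
                         min_product_threshold=80):
    """Feature classes enabled by auto features for n_samples (illustrative)."""
    classes = {"linear"}  # assumption: linear features are always used
    if n_samples >= min_quadratic:      # MinSamplesForQuadratic
        classes.add("quadratic")
    if n_samples >= min_hinge:          # MinSamplesForHinge
        classes.add("hinge")
    if n_samples >= min_product_threshold:  # MinSamplesForProductThreshold
        classes.update({"product", "threshold"})
    return classes
```

With the typical values, 12 samples would give linear and quadratic features, while 100 samples would enable all five classes.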

The following image shows a sample model in the environmental space (temperature x precipitation) generated with the standard dataset used for tests (*Thalurania furcata boliviana* localities dataset):

Fig. 1: Maxent model with default parameters.