SVM (Support Vector Machines)

openModeller id: SVM

Current version: 0.5

Developer(s): Renato De Giovanni in collaboration with Ana Carolina Lorena

Accepts Categorical Maps: no

Requires absence points: no

Author(s): Vladimir N. Vapnik

Description

Support vector machines map input vectors to a higher-dimensional space where a maximal separating hyperplane is constructed. Two parallel hyperplanes are constructed, one on each side of the hyperplane that separates the data. The separating hyperplane is the one that maximises the distance between the two parallel hyperplanes. The assumption is that the larger the margin (the distance between these parallel hyperplanes), the lower the generalisation error of the classifier will be. The model produced by support vector classification depends only on a subset of the training data, because the cost function for building the model ignores training points that lie beyond the margin.

Content retrieved from Wikipedia on the 13th of June, 2007: http://en.wikipedia.org/w/index.php?title=Support_vector_machine&oldid=136646498.

The openModeller implementation of SVMs makes use of the libsvm library version 2.85: Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Release history:

version 0.1: initial release
version 0.2: new parameter to specify the number of pseudo-absences to be generated; upgraded to libsvm 2.85; fixed memory leaks
version 0.3: when absences are needed and the number of pseudo-absences to be generated is zero, it defaults to the number of presences
version 0.4: included missing serialization of C
version 0.5: fixed the indication of whether the algorithm needs normalized environmental data, which was not working when the algorithm was loaded from an existing model

Bibliography

1) Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag.

2) Schölkopf, B., Smola, A., Williamson, R. and Bartlett, P.L. (2000). New support vector algorithms. Neural Computation, 12, 1207-1245.

3) Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13, 1443-1471.

4) Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press.

Parameters

SVM type

openModeller id: SvmType

Type of SVM: 0 = C-SVC, 1 = Nu-SVC, 2 = one-class SVM

Data type: integer  Domain: [0.0, 2.0]  Typical value: 0

Kernel type

openModeller id: KernelType

Type of kernel function: 0 = linear: u'*v, 1 = polynomial: (gamma*u'*v + coef0)^degree, 2 = radial basis function: exp(-gamma*|u-v|^2)

Data type: integer  Domain: [0.0, 4.0]  Typical value: 2
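The three kernel formulas listed above can be written out directly. The following is a minimal Python sketch for illustration only; the function names are hypothetical and are not part of openModeller or libsvm, which compute these kernels internally.

```python
import math


def dot(u, v):
    """Inner product u'*v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    """KernelType 0: u'*v."""
    return dot(u, v)


def polynomial_kernel(u, v, gamma, coef0, degree):
    """KernelType 1: (gamma*u'*v + coef0)^degree."""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    """KernelType 2: exp(-gamma*|u-v|^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)
```

Here u and v stand for two points in environmental space, with one value per layer. Note that the radial basis kernel of a point with itself is always 1, and it decays towards 0 as the two points move apart.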

Degree

openModeller id: Degree

Degree in kernel function (only for polynomial kernels).

Data type: integer  Domain: [0.0, oo]  Typical value: 3

Gamma

openModeller id: Gamma

Gamma in kernel function (only for polynomial and radial basis kernels). When set to zero, the default value will actually be 1/k, where k is the number of layers.

Data type: real  Domain: [-oo, oo]  Typical value: 0
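The zero-means-default rule for Gamma can be sketched as follows. This is an illustrative Python helper with a hypothetical name, not the actual openModeller code; it only restates the rule given in the parameter description.

```python
def effective_gamma(gamma, num_layers):
    """Return the gamma actually used by the kernel.

    A gamma of zero is replaced by 1/k, where k is the number of
    environmental layers; any other value is used unchanged.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    return 1.0 / num_layers if gamma == 0 else gamma
```

For example, with four layers and the typical value of 0, the kernel would use a gamma of 0.25.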

Coef0

openModeller id: Coef0

Coef0 in kernel function (only for polynomial kernels).

Data type: real  Domain: [-oo, oo]  Typical value: 0

Cost

openModeller id: C

Cost (only for C-SVC types).

Data type: real  Domain: [0.001, oo]  Typical value: 1

Nu

openModeller id: Nu

Nu (only for Nu-SVC and one-class SVM).

Data type: real  Domain: [0.001, 1.0]  Typical value: 0.5

Probabilistic output

openModeller id: ProbabilisticOutput

Indicates if the output should be a probability instead of a binary response (only available for C-SVC and Nu-SVC).

Data type: integer  Domain: [0.0, 1.0]  Typical value: 1
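Several of the parameters above only take effect for particular SVM or kernel types. The sketch below collects those "only for" notes into one Python function; the function and constant names are hypothetical, and the mapping is taken solely from the parameter descriptions on this page rather than from the openModeller source.

```python
# Numeric codes as documented for SvmType and KernelType.
C_SVC, NU_SVC, ONE_CLASS = 0, 1, 2
LINEAR, POLYNOMIAL, RBF = 0, 1, 2


def relevant_parameters(svm_type, kernel_type):
    """List the parameter ids that matter for a given configuration."""
    names = ["SvmType", "KernelType"]
    # Kernel-dependent parameters (kernel codes not described on this
    # page add nothing here).
    if kernel_type == POLYNOMIAL:
        names += ["Degree", "Gamma", "Coef0"]
    elif kernel_type == RBF:
        names.append("Gamma")
    # SVM-type-dependent parameters.
    if svm_type == C_SVC:
        names.append("C")
    if svm_type in (NU_SVC, ONE_CLASS):
        names.append("Nu")
    if svm_type in (C_SVC, NU_SVC):
        names.append("ProbabilisticOutput")
    return names
```

With the typical values (SvmType 0, KernelType 2), only Gamma, C and ProbabilisticOutput influence the model; Degree, Coef0 and Nu are ignored.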

Number of pseudo-absences

openModeller id: NumberOfPseudoAbsences

Number of pseudo-absences to be generated (only for C-SVC and Nu-SVC when no absences have been provided). When absences are needed, a value of zero defaults to the number of presences.

Data type: integer  Domain: [0.0, oo]  Typical value: 0
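The rule for deciding how many pseudo-absences get generated can be sketched as below. This is a hedged Python illustration with hypothetical names; it follows the parameter description above and does not reproduce the actual openModeller implementation.

```python
# Numeric codes as documented for SvmType.
C_SVC, NU_SVC, ONE_CLASS = 0, 1, 2


def pseudo_absences_to_generate(svm_type, num_presences, num_absences, requested):
    """Return how many pseudo-absence points would be generated.

    requested is the NumberOfPseudoAbsences parameter value.
    """
    needs_absences = svm_type in (C_SVC, NU_SVC)
    # One-class SVM uses presences only, and real absences take priority.
    if not needs_absences or num_absences > 0:
        return 0
    # Zero means "same as the number of presences".
    return requested if requested > 0 else num_presences
```

For instance, a C-SVC run with 50 presence points, no absence points and the typical value of 0 would generate 50 pseudo-absences, while a one-class run would generate none.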


Sample models

The following images show two models in the environmental space (temperature x precipitation) generated with the same presence points (Thalurania furcata boliviana localities dataset) but with different parameters. Since SVM C-SVC needs absence points, the first model included a set of pseudo-absence points that were randomly generated in areas (environmental space) distant from the presence points:

fig. 1: SVM C-SVC with default parameters. Pseudo-absence points are displayed in red.
fig. 2: SVM one-class with Nu=0.5
fig. 3: SVM one-class with Nu=0.05