QBoard » Artificial Intelligence & ML » AI and ML - Conceptual » When should I use support vector machines as opposed to artificial neural networks?

When should I use support vector machines as opposed to artificial neural networks?

  • I know SVMs are supposedly 'ANN killers' in that they automatically select representation complexity and find a global optimum (see here for some SVM praising quotes).
    But here is where I'm unclear -- do all of these claims of superiority hold for just the case of a 2 class decision problem or do they go further? (I assume they hold for non-linearly separable classes or else no-one would care)
    So a sample of some of the cases I'd like to be cleared up:
    • Are SVMs better than ANNs with many classes?
    • in an online setting?
    • What about in a semi-supervised case like reinforcement learning?
    • Is there a better unsupervised version of SVMs?
    I don't expect someone to answer all of these lil' subquestions, but rather to give some general bounds for when SVMs are better than the common ANN equivalents (e.g. FFBP, recurrent BP, Boltzmann machines, SOMs, etc.) in practice, and preferably, in theory as well.
      October 12, 2021 5:10 PM IST
    0
  • Are SVMs better than ANN with many classes? You are probably referring to the fact that SVMs are in essence, either either one-class or two-class classifiers. Indeed they are and there's no way to modify a SVM algorithm to classify more than two classes.
    The fundamental feature of a SVM is the separating maximum-margin hyperplane whose position is determined by maximizing its distance from the support vectors. And yet SVMs are routinely used for multi-class classification, which is accomplished with a processing wrapper around multiple SVM classifiers that work in a "one against many" pattern--i.e., the training data is shown to the first SVM which classifies those instances as "Class I" or "not Class I". The data in the second class, is then shown to a second SVM which classifies this data as "Class II" or "not Class II", and so on. In practice, this works quite well. So as you would expect, the superior resolution of SVMs compared to other classifiers is not limited to two-class data.
    As far as i can tell, the studies reported in the literature confirm this, e.g., In the provocatively titled paper Sex with Support Vector Machines substantially better resolution for sex identification (Male/Female) in 12-square pixel images, was reported for SVM compared with that of a group of traditional linear classifiers; SVM also outperformed RBF NN, as well as large ensemble RBF NN). But there seem to be plenty of similar evidence for the superior performance of SVM in multi-class problems: e.g., SVM outperformed NN in protein-fold recognition, and in time-series forecasting.
    My impression from reading this literature over the past decade or so, is that the majority of the carefully designed studies--by persons skilled at configuring and using both techniques, and using data sufficiently resistant to classification to provoke some meaningful difference in resolution--report the superior performance of SVM relative to NN. But as your Question suggests, that performance delta seems to be, to a degree, domain specific.
    For instance, NN outperformed SVM in a comparative study of author identification from texts in Arabic script; In a study comparing credit rating prediction, there was no discernible difference in resolution by the two classifiers; a similar result was reported in a study of high-energy particle classification.
    I have read, from more than one source in the academic literature, that SVM outperforms NN as the size of the training data decreases.
    Finally, the extent to which one can generalize from the results of these comparative studies is probably quite limited. For instance, in one study comparing the accuracy of SVM and NN in time series forecasting, the investigators reported that SVM did indeed outperform a conventional (back-propagating over layered nodes) NN but performance of the SVM was about the same as that of an RBF (radial basis function) NN.
    [Are SVMs better than ANN] In an Online setting? SVMs are not used in an online setting (i.e., incremental training). The essence of SVMs is the separating hyperplane whose position is determined by a small number of support vectors. So even a single additional data point could in principle significantly influence the position of this hyperplane.
    What about in a semi-supervised case like reinforcement learning? Until the OP's comment to this answer, i was not aware of either Neural Networks or SVMs used in this way--but they are.
    The most widely used- semi-supervised variant of SVM is named Transductive SVM (TSVM), first mentioned by Vladimir Vapnick (the same guy who discovered/invented conventional SVM). I know almost nothing about this technique other than what's it is called and that is follows the principles of transduction (roughly lateral reasoning--i.e., reasoning from training data to test data). Apparently TSV is a preferred technique in the field of text classification.
    Is there a better unsupervised version of SVMs? I don't believe SVMs are suitable for unsupervised learning. Separation is based on the position of the maximum-margin hyperplane determined by support vectors. This could easily be my own limited understanding, but i don't see how that would happen if those support vectors were unlabeled (i.e., if you didn't know before-hand what you were trying to separate). One crucial use case of unsupervised algorithms is when you don't have labeled data or you do and it's badly unbalanced. E.g., online fraud; here you might have in your training data, only a few data points labeled as "fraudulent accounts" (and usually with questionable accuracy) versus the remaining >99% labeled "not fraud." In this scenario, a one-class classifier, a typical configuration for SVMs, is the a good option. In particular, the training data consists of instances labeled "not fraud" and "unk" (or some other label to indicate they are not in the class)--in other words, "inside the decision boundary" and "outside the decision boundary."
    I wanted to conclude by mentioning that, 20 years after their "discovery", the SVM is a firmly entrenched member in the ML library. And indeed, the consistently superior resolution compared with other state-of-the-art classifiers is well documented.
    Their pedigree is both a function of their superior performance documented in numerous rigorously controlled studies as well as their conceptual elegance. W/r/t the latter point, consider that multi-layer perceptrons (MLP), though they are often excellent classifiers, are driven by a numerical optimization routine, which in practice rarely finds the global minimum; moreover, that solution has no conceptual significance. On the other hand, the numerical optimization at the heart of building an SVM classifier does in fact find the global minimum. What's more that solution is the actual decision boundary.
    Still, i think SVM reputation has declined a little during the past few years.
    The primary reason i suspect is the NetFlix competition. NetFlix emphasized the resolving power of fundamental techniques of matrix decomposition and even more significantly t*he power of combining classifiers. People combined classifiers long before NetFlix, but more as a contingent technique than as an attribute of classifier design. Moreover, many of the techniques for combining classifiers are extraordinarily simple to understand and also to implement. By contrast, SVMs are not only very difficult to code (in my opinion, by far the most difficult ML algorithm to implement in code) but also difficult to configure and implement as a pre-compiled library--e.g., a kernel must be selected, the results are very sensitive to how the data is re-scaled/normalized, etc.
      October 13, 2021 1:17 PM IST
    0
  • I loved Doug's answer. I would like to add two comments.

    1) Vladimir Vapnick also co-invented the VC dimension which is important in learning theory.

    2) I think that SVMs were the best overall classifiers from 2000 to 2009, but after 2009, I am not sure. I think that neural nets have improved very significantly recently due to the work in Deep Learning and Sparse Denoising Auto-Encoders. I thought I saw a number of benchmarks where they outperformed SVMs. See, for example, slide 31 of

    http://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10-workshop-tutorial-final.pdf

    A few of my friends have been using the sparse auto encoder technique. The neural nets build with that technique significantly outperformed the older back propagation neural networks. I will try to post some experimental results at artent.net if I get some time.

      October 15, 2021 1:52 PM IST
    0
  • Support vector machine (SVM) and artificial neural network (ANN) systems were applied to a drug/nondrug classification problem as an example of binary decision problems in early-phase virtual compound filtering and screening. The results indicate that solutions obtained by SVM training seem to be more robust with a smaller standard error compared to ANN training.

    Generally, the SVM classifier yielded slightly higher prediction accuracy than ANN, irrespective of the type of descriptors used for molecule encoding, the size of the training data sets, and the algorithm employed for neural network training. The performance was compared using various different descriptor sets and descriptor combinations based on the 120 standard Ghose-Crippen fragment descriptors, a wide range of 180 different properties and physicochemical descriptors from the Molecular Operating Environment (MOE) package, and 225 topological pharmacophore (CATS) descriptors. For the complete set of 525 descriptors cross-validated classification by SVM yielded 82% correct predictions (Matthews cc = 0.63), whereas ANN reached 80% correct predictions (Matthews cc = 0.58). Although SVM outperformed the ANN classifiers with regard to overall prediction accuracy, both methods were shown to complement each other, as the sets of true positives, false positives (overprediction), true negatives, and false negatives (underprediction) produced by the two classifiers were not identical. The theory of SVM and ANN training is briefly reviewed.
      October 25, 2021 2:21 PM IST
    0
  • Just answering only this question here. Unsupervised learning can be done by so-called one-class support vector machines. Again, similar to normal SVMs, there is an element that promotes sparsity. In normal SVMs only a few points are considered important, the support vectors. In one-class SVMs again only a few points can be used to either:
    1. "separate" a dataset as far from the origin as possible, or
    2. define a radius as small as possible.
    The advantages of normal SVMs carry over to this case. Compared to density estimation only a few points need to be considered. The disadvantages carry over as well.
      October 28, 2021 6:18 PM IST
    0