QBoard » Artificial Intelligence & ML » AI and ML - Others » Randomness in Artificial Intelligence & Machine Learning

Randomness in Artificial Intelligence & Machine Learning

  • This question came to my mind while working on 2 projects in AI and ML. What If I'm building a model (e.g. Classification Neural Network,K-NN, .. etc) and this model uses some function that includes randomness. If I don't fix the seed, then I'm going to get different accuracy results every time I run the algorithm on the same training data. However, If I fix it then some other setting might give better results.

    Is averaging a set of accuracies enough to say that the accuracy of this model is xx % ?

    I'm not sure If this is the right place to ask such a question/open such a discussion.

      July 31, 2021 4:38 PM IST
    0
  • There are models which are naturally dependent on randomness (e.g., random forests) and models which only use randomness as part of exploring the space (e.g., initialisation of values for neural networks), but actually have a well-defined, deterministic, objective function.

    For the first case, you will want to use multiple seeds and report average accuracy, std. deviation, and the minimum you obtained. It is often good if you have a way to reproduce this, so just use multiple fixed seeds.

    For the second case, you can always tell, just on the training data, which run is best (although it might actually not be the one which gives you the best test accuracy!). Thus, if you have the time, it is good to do say, 10 runs, and then evaluate on the one with the best training error (or validation error, just never evaluate on testing for this decision). You can go a level up and do multiple multiple runs and get a standard deviation too. However, if you find that this is significant, it probably means you weren't trying enough initialisations or that you are not using the right model for your data.

      August 20, 2021 12:42 PM IST
    0
  • Simple answer, yes, you randomize it and use statistics to show the accuracy. However, it's not sufficient to just average a handful of runs. You need, at a minimum, some notion of the variability as well. It's important to know whether "70%" accurate means "70% accurate for each of 100 runs" or "100% accurate once and 40% accurate once".

    If you're just trying to play around a bit and convince yourself that some algorithm works, then you can just run it 30 or so times and look at the mean and standard deviation and call it a day. If you're going to convince anyone else that it works, you need to look into how to do more formal hypothesis testing.

      October 20, 2021 12:56 PM IST
    0
  • Randomness is a big part of machine learning. Randomness is used as a tool or a feature in preparing data and in learning algorithms that map input data to output data in order to make predictions. The source of randomness in machine learning is a mathematical trick called a pseudorandom number generator
      August 16, 2021 3:29 PM IST
    0