QBoard » Artificial Intelligence & ML » AI and ML - Conceptual » What are the 3 types of machine learning bias?

What are the 3 types of machine learning bias?

  • What are the 3 types of machine learning bias?
      August 13, 2021 3:46 PM IST
  • Bias in machine learning can be introduced when collecting the data used to build a model. It can come in when testing the model's outputs to verify their validity. Bias can even be introduced when interpreting valid or invalid results from an approved data model. Nearly all of the common machine learning data biases stem from our own cognitive biases. Some examples include anchoring bias, availability bias, confirmation bias, and stability bias.

    Anchoring bias occurs when choices about metrics and data are based on personal experience or preference for a specific set of data. By “anchoring” to this preference, models are built on the preferred set, which could be incomplete or even contain incorrect data, leading to invalid results. Because this is the “preferred” standard, it can be hard to recognize that the outcome is invalid or contradictory.

    Availability bias, similar to anchoring, occurs when the data set reflects only the information the modeler is most aware of. For example, if the facility collecting the data specializes in a particular demographic or comorbidity, the data set will be heavily weighted towards that information. If the resulting model is then applied elsewhere, it may recommend incorrect procedures or miss possible outcomes because of the limited scope of the original data source.
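    One way to catch this kind of skew before training is to compare the distribution of a key attribute in the collected data against the population the model will actually serve. A minimal sketch, where the training records, the `age_group` attribute, and the population figures are all hypothetical:

    ```python
    from collections import Counter

    def distribution(records, key):
        """Relative frequency of each value of `key` across `records`."""
        counts = Counter(r[key] for r in records)
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}

    # Hypothetical training data from a facility that mostly sees one age group.
    training = [{"age_group": "65+"}] * 80 + [{"age_group": "18-64"}] * 20
    # Assumed distribution of the population the model would serve elsewhere.
    population = {"65+": 0.20, "18-64": 0.80}

    train_dist = distribution(training, "age_group")
    # Gap between what the model saw and what it will face in deployment.
    skew = {g: round(train_dist.get(g, 0.0) - p, 2) for g, p in population.items()}
    print(skew)  # large gaps signal availability bias in the source data
    ```

    A large positive or negative gap for any group is a signal that the source data is weighted toward what was locally available rather than what the deployed model will encounter.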

    Confirmation bias is the tendency to choose source data or model results that align with currently held beliefs or hypotheses. The generated results and output of the model can in turn strengthen the end user's confirmation bias, leading to bad outcomes.

    Stability bias is driven by the belief that large changes typically do not occur, so non-conforming results are ignored, thrown out, or re-modeled to conform to the expected behavior. Even if we feed our models good data, the results may not align with our beliefs, and it can be easy to dismiss the real results.

      August 16, 2021 3:43 PM IST
  • There are four distinct types of machine learning bias that we need to be aware of and guard against.
    • Sample bias. Sample bias is a problem with training data: the sample does not represent the population the model will serve.
    • Prejudice bias. Prejudice bias is a result of training data that is influenced by cultural or other stereotypes.
    • Measurement bias. Measurement bias arises when the data collection or measurement process systematically distorts the recorded values.
    • Algorithm bias. Algorithm bias comes from the model or algorithm itself rather than from the data.
      August 14, 2021 10:03 PM IST
  • 1. Sample Bias
    We all have to consider sampling bias in our training data as a result of human input. Machine learning models are predictive engines that train on large volumes of historical data, and they predict only what they have been trained to predict. These predictions are only as reliable as the humans collecting and analyzing the data. Decision makers have to remember that if humans are involved at any part of the process, there is a greater chance of bias in the model.
    The sample data used for training has to represent the real scenario as closely as possible. Many factors can bias a sample from the beginning, and those factors differ across domains (e.g. business, security, medical, education).
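    The cost of an unrepresentative sample shows up directly in deployment. As a sketch, consider the simplest possible "model", a majority-class baseline, trained on a hypothetical sample whose label balance differs from the real-world stream (both datasets below are invented for illustration):

    ```python
    from collections import Counter

    def majority_baseline(labels):
        """Fit the simplest possible model: always predict the most common label."""
        return Counter(labels).most_common(1)[0][0]

    # Hypothetical collected sample that over-represents one outcome...
    biased_sample = ["approve"] * 90 + ["deny"] * 10
    # ...while the assumed real-world label stream looks quite different.
    real_world = ["approve"] * 40 + ["deny"] * 60

    model = majority_baseline(biased_sample)
    accuracy = sum(y == model for y in real_world) / len(real_world)
    print(model, accuracy)  # the baseline carries the sample's skew into deployment
    ```

    The baseline looks excellent on the biased sample (90% accuracy) but performs worse than a coin flip on the population it was meant to serve; more sophisticated models inherit the same skew in subtler ways.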

    2. Prejudice Bias
    This, again, is a result of human input. Prejudice occurs when cultural stereotypes held by the people involved in the process creep into a model. Social class, race, nationality, or gender can completely and unjustly skew its results. Unfortunately, it is not hard to believe that this may have been intentional, or simply neglected throughout the whole process.
    Involving some of these factors in statistical modelling for research purposes, or to understand a situation at a point in time, is completely different from predicting who should get a loan when the training data is skewed against people of a certain race, gender, and/or nationality.

    3. Confirmation Bias
    This is a well-known bias that has been studied in psychology and applies directly to the machine learning process. If the intended users have a pre-existing hypothesis they would like to confirm with machine learning, the people involved in the modelling process may be inclined to intentionally steer the process towards that answer. I suspect this is more common than we think: in industry, many of us may be pressured to produce a certain answer before the process even starts, rather than simply looking at what the data is actually saying.

    4. Group attribution Bias
    This type of bias results from training a model on data that contains an asymmetric view of a certain group. For example, if in a sample dataset the majority of one gender is more successful than the other, or the majority of one race earns more than another, your model will be inclined to learn these patterns as truths. There is label bias in these cases, and such labels should not make it into a model in the first place. A sample used to understand and analyse the current situation cannot simply be reused as training data without appropriate pre-processing to account for any potential unjust bias. Machine learning models are becoming more ingrained in society without the ordinary person even knowing, which makes group attribution bias all the more likely to punish a person unjustly when the necessary steps were not taken to account for the bias in the training data.
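    A first (and by itself insufficient) pre-processing step is to keep sensitive attributes out of the training records entirely. A minimal sketch, where the attribute names and the sample record are hypothetical:

    ```python
    # Hypothetical set of attributes we never want a loan model to see.
    SENSITIVE = {"gender", "race", "nationality"}

    def strip_sensitive(record, sensitive=SENSITIVE):
        """Drop sensitive attributes before the record reaches training."""
        return {k: v for k, v in record.items() if k not in sensitive}

    applicant = {"income": 52000, "tenure_years": 4, "gender": "F", "nationality": "X"}
    print(strip_sensitive(applicant))  # {'income': 52000, 'tenure_years': 4}
    ```

    Note that dropping columns alone does not remove the bias: remaining features can act as proxies for the sensitive ones (a classic example is postal code correlating with race), so the label distributions themselves still need to be examined.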
      August 15, 2021 10:31 PM IST