
Major Concepts

Business Statistics


Business Statistics is the science of decision making by analyzing and interpreting data using statistical methods.

Probability Distributions

Probability is the long-run relative frequency with which a random event occurs. A probability distribution is a rule that identifies the possible outcomes of a random variable and assigns a probability to each. There are two types of probability distributions: discrete and continuous.

A discrete distribution has a countable number of outcome values, e.g. the face value of a card, the number of bad cheques received by a bank, or the number of absent employees.

A continuous distribution can take any value within some range, e.g. sales per month or the heights of students in a class. Continuous distributions are often easier to work with and are good approximations when a discrete variable has a large number of possible values.


A histogram represents the frequency distribution, i.e. how many observations fall within each interval, for example of GMAT scores.


Requirements for a Discrete Probability Function

  • Each probability should be between 0 and 1, inclusive


  • Total of all probabilities should be equal to 1


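For example, the hourly sales of the Google Pixel on Flipkart can be described by the following distribution:

  • 0 phones sold: probability 0.05

  • 1 phone sold: probability 0.3

  • 2 phones sold: probability 0.2

  • 3 phones sold: probability 0.4

  • 4 phones sold: probability 0.05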


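Expected Value or Mean of a random variable is the weighted average of its values, with the probabilities serving as weights: E(X) = μ = Σ x · P(x), summed over all possible values x.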


By the above formula, the mean number of Google Pixel phones sold per hour in this example is (0*0.05 + 1*0.3 + 2*0.2 + 3*0.4 + 4*0.05) = 2.1.
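The two requirements and the expected-value calculation can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the dictionary `pmf` and the helper names `is_valid_pmf` and `expected_value` are made up for this example, and the probabilities are the ones used in the calculation above.

```python
import math

# Hourly sales distribution from the worked example: value -> probability.
pmf = {0: 0.05, 1: 0.30, 2: 0.20, 3: 0.40, 4: 0.05}

def is_valid_pmf(pmf, tol=1e-9):
    """Check both requirements: every p lies in [0, 1] and the total is 1."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

def expected_value(pmf):
    """Weighted average of the values, with the probabilities as weights."""
    return sum(x * p for x, p in pmf.items())

mean_sales = expected_value(pmf)
print(is_valid_pmf(pmf), round(mean_sales, 2))
```

The comparison uses a small tolerance because sums of decimal fractions are rarely exact in floating point.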

Variance and Standard Deviation

Variance (σ²) is the weighted average of the squared deviations from the mean, σ² = Σ (x - μ)² · P(x), with the probabilities serving as weights. It measures how far the values are spread out around the mean. A value of zero means that there is no variability. Its units are the square of the units of the variable.


Standard Deviation (σ) is the square root of the variance. Because it has the same units as the variable, it is easier to interpret than the variance: it indicates the typical distance of the values from the mean.
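As a rough sketch, the variance and standard deviation of the hourly sales example (mean 2.1) can be computed as follows. The variable names are illustrative only, and the probabilities are the ones used in the expected-value calculation.

```python
import math

# Hourly sales distribution from the expected-value example.
pmf = {0: 0.05, 1: 0.30, 2: 0.20, 3: 0.40, 4: 0.05}

mean = sum(x * p for x, p in pmf.items())

# Probability-weighted average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Square root of the variance; same units as the variable (phones per hour).
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 3))
```

For this distribution the variance works out to 1.09 (in squared phones per hour) and the standard deviation to about 1.04 phones per hour.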



Famous Probability Distribution Functions

Uniform Distribution is characterized by a flat distribution in which every outcome is equally likely. For example, when rolling a fair die, the outcomes 1 to 6 are equally likely. It can be defined for any number of outcomes n, or as a continuous distribution.


Bernoulli Distribution has two discrete outcomes, 0 and 1 (e.g. tails or heads), with probabilities 1 - p and p. Its graph has two lines, one at each outcome; the lines have equal height only when the two outcomes are equally probable (p = 0.5), as with a fair coin.


Binomial Distribution is the distribution of the sum of outcomes of trials that each follow a Bernoulli distribution. Use it for counting the number of successes in trials that act like coin flips, where each flip is independent and has the same probability of success p. The probability of exactly k successes in n trials is denoted b(k;n,p):

b(k;n,p) = C(n,k) p^k q^(n-k), where q = 1 - p and C(n,k) is known as the binomial coefficient.
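The formula translates directly into code. The sketch below assumes q = 1 - p and uses Python's built-in `math.comb` for the binomial coefficient; the function name `binomial_pmf` and the coin-flip numbers are illustrative choices.

```python
from math import comb

def binomial_pmf(k, n, p):
    """b(k; n, p): probability of exactly k successes in n independent trials."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

# Hypothetical example: exactly 5 heads in 10 flips of a fair coin.
prob_five_heads = binomial_pmf(5, 10, 0.5)

# The probabilities over all possible counts k = 0..n add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(prob_five_heads, total)
```

Here C(10,5) = 252 and 0.5^10 = 1/1024, so the probability of exactly 5 heads is 252/1024, roughly 0.246.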


Hypergeometric Distribution also describes a count of successes, like the binomial, but for sampling without replacement. It is close to the binomial distribution but not the same, because the probability of success changes from one draw to the next as items are removed from the population.
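To make the difference from the binomial concrete, here is a small sketch using the standard hypergeometric formula C(K,k) · C(N-K, n-k) / C(N,n). The card-drawing scenario and all names are assumptions chosen for illustration; they do not come from the text.

```python
from math import comb

def hypergeometric_pmf(k, population, successes, draws):
    """Probability of k successes in `draws` draws without replacement."""
    return (comb(successes, k)
            * comb(population - successes, draws - k)
            / comb(population, draws))

def binomial_pmf(k, n, p):
    """Probability of k successes in n draws with replacement."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

# Hypothetical example: exactly 2 hearts in 5 cards from a 52-card deck.
without_replacement = hypergeometric_pmf(2, population=52, successes=13, draws=5)
with_replacement = binomial_pmf(2, n=5, p=13 / 52)

print(round(without_replacement, 4), round(with_replacement, 4))
```

The two results are close (about 0.274 versus 0.264) but not equal, because each card drawn changes the share of hearts left in the deck.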


Poisson Distribution is also the distribution of a count: the number of times something happened. It is parameterized not by a probability p and a number of trials n but by an average rate λ, which in this analogy is simply the constant value of np. The Poisson distribution is what one can use to count events over a period of time, given a constant average rate at which the events occur.

When things like packets arrive at routers, or customers arrive at a store, or things wait in some kind of queue, think Poisson.
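The link between λ and np can be checked numerically. The sketch below uses the standard Poisson formula P(k) = e^(-λ) λ^k / k! and compares it with a binomial that has many trials and a small success probability; the rate of 3 events per period and the choice of n are hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return exp(-lam) * lam**k / factorial(k)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

lam = 3.0        # hypothetical average of 3 arrivals per period
n = 10_000       # many small "chances" for an arrival
p = lam / n      # chosen so that n * p equals lam

poisson_at_2 = poisson_pmf(2, lam)
binomial_at_2 = binomial_pmf(2, n, p)

print(round(poisson_at_2, 4), round(binomial_at_2, 4))
```

With n large and p small, the two probabilities agree to several decimal places, which is why the Poisson is often used as a convenient stand-in for such a binomial.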


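Geometric Distribution gives the probability that the first success occurs on the Nth trial, when each independent trial succeeds with probability p: P(N = n) = (1 - p)^(n-1) · p.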
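A minimal sketch, assuming the usual reading of the geometric distribution as "first success on trial n" with independent trials and a constant success probability p. The die example and the function name are illustrative.

```python
def geometric_pmf(n, p):
    """Probability that the first success happens on trial n (n = 1, 2, ...)."""
    return (1.0 - p) ** (n - 1) * p

# Hypothetical example: the first six appears on the third roll of a fair die.
p_six = 1 / 6
first_six_on_third = geometric_pmf(3, p_six)

print(round(first_six_on_third, 4))
```

The first two rolls must fail (5/6 each) and the third must succeed (1/6), giving 25/216, roughly 0.116.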



Exponential Distribution: given events whose count per unit of time follows a Poisson distribution, the time between events follows an exponential distribution with the same rate parameter λ. The exponential distribution should come to mind when thinking of "time until event", for example "time until failure".
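As an illustration, the probability that the next event arrives within a time t is given by the standard exponential formula P(T <= t) = 1 - e^(-λt). The rate of 2 customers per minute below is a made-up value, and the function name is an assumption for this sketch.

```python
from math import exp

def exponential_cdf(t, rate):
    """P(T <= t): probability that the wait for the next event is at most t."""
    if t < 0:
        return 0.0
    return 1.0 - exp(-rate * t)

RATE = 2.0  # hypothetical: on average 2 customers arrive per minute

p_within_one_minute = exponential_cdf(1.0, RATE)
mean_wait = 1.0 / RATE  # average time between arrivals, in minutes

print(round(p_within_one_minute, 4), mean_wait)
```

The mean waiting time is the reciprocal of the rate, so a higher arrival rate means shorter gaps between events.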



Normal Distribution


  • It is a continuous and symmetrical distribution

  • The graph of the pdf (probability density function) is a bell shaped curve

  • The normal random variable takes values from -∞ to +∞

  • It is symmetric and centered around the mean (which is also the median and mode)

  • Area under the curve sums to 1

  • Any normal distribution can be specified with just two parameters - the mean (μ) and the standard deviation (σ)

  • This is represented as X ~ N(μ, σ²), where the second parameter is written as the variance σ²


Coefficient of Skewness

Skewness is a measure of the asymmetry of the distribution of a variable.


  • If Skew < 0, the distribution is negatively skewed (skewed to the left)

  • If Skew = 0, the distribution is symmetric (not skewed)

  • If Skew >0, the distribution is positively skewed (skewed to the right)



Kurtosis describes how heavy the tails of a distribution are. It is measured against the normal distribution, which has a kurtosis of 3, so if the value is close to 3 the distribution is nearly normal. These nearly normal distributions are called mesokurtic.


Excess kurtosis is just kurtosis - 3. For example, the excess kurtosis of the normal distribution is 3 - 3 = 0.

  • Negative excess means there is less data in the tails than in a normal distribution (light tails); such distributions often look flatter, with less of a peak

  • Positive excess means there is more data in the tails than in a normal distribution (heavy tails); such distributions often look more sharply peaked
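Both measures can be computed from the central moments of a dataset. The sketch below uses the simple population-moment definitions (third moment over variance^1.5 for skewness, fourth moment over variance² minus 3 for excess kurtosis); statistical packages often apply small-sample corrections, so their results may differ slightly. The sample data and function names are made up for illustration.

```python
def central_moment(data, k):
    """Average of the k-th powers of the deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]       # hypothetical evenly spread sample
right_skewed = [1, 1, 1, 2, 10]   # hypothetical sample with one large value

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

The evenly spread sample has zero skew and a negative excess kurtosis, while the sample with one large value has a positive skew.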

Figure: normal distribution with bands of 1, 2 and 3 standard deviations around the mean, which contain about 68%, 95% and 99.7% of the values respectively.



Z-Scores, Standard Normal Distribution

For every value x of the random variable X, we can calculate its Z-score: Z = (x - μ) / σ.


If X ~ N(μ, σ²), then the Z-scores have a normal distribution with μ = 0 and σ = 1, i.e. Z ~ N(0, 1). This is called the Standard Normal Distribution. A Z-table gives only "less than" probabilities, P(Z <= z); a "greater than" probability is obtained as 1 - P(Z <= z).
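In code, a Z-table lookup can be replaced by the standard normal cumulative distribution function, which can be written with the error function as Φ(z) = 0.5 · (1 + erf(z / √2)). The sketch below is one possible way to do this; the function names are illustrative, and the values 740, 711 and 29 are borrowed from the GMAT example in this document.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), i.e. the value a Z-table would give."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

z = z_score(740, mu=711, sigma=29)
p_less = standard_normal_cdf(z)   # "less than" probability
p_greater = 1.0 - p_less          # "greater than" probability via the complement

print(z, round(p_less, 4), round(p_greater, 4))
```

A score of 740 is exactly one standard deviation above the mean, so about 84% of scores fall at or below it and about 16% above.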




Probability Calculation for Normal Distribution

Suppose GMAT scores can be reasonably modelled using a normal distribution with μ = 711 and σ = 29. P(X<=680) can be calculated as follows:

  1. Calculate the Z-score corresponding to 680; Z = (680-711)/29 = -1.07

  2. Calculate the probability using Z-tables, P(Z <= -1.07) = 0.14

Calculating P(697<=X<=740)

  1. Calculate P(X<=740) - P(X<=697)

  2. P(X<=740) = P(Z<=1) = 0.84

  3. P(X<=697) = P(Z<=-0.5) = 0.31, where Z = (697-711)/29 = -0.48 has been rounded to -0.5

  4. P(697<=X<=740) = 0.84 - 0.31 = 0.53
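These table lookups can be cross-checked with Python's standard library, which provides `statistics.NormalDist` (Python 3.8 or later). This is only a verification sketch under the same assumption as the text, namely that the scores are normal with μ = 711 and σ = 29; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model from the text: GMAT scores ~ normal with mean 711, sd 29.
gmat = NormalDist(mu=711, sigma=29)

# P(X <= 680): cumulative probability up to 680.
p_below_680 = gmat.cdf(680)

# P(697 <= X <= 740): difference of two "less than" probabilities.
p_between = gmat.cdf(740) - gmat.cdf(697)

print(round(p_below_680, 2), round(p_between, 2))
```

Because the library works with unrounded Z-scores, its results can differ from table-based answers in the third decimal place, but they agree with 0.14 and 0.53 after rounding.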


Normal-Quantile (Q-Q) plot


  • The data is nearly normal if the points lie along the diagonal reference line on the plot

  • Deviations are most likely at the extremes, and the bands help judge the severity of the deviation

Central Limit Theorem

The Central Limit Theorem states that the distribution of the sample mean

  • Will be normal when the distribution of the data in the population is normal

  • Will be approximately normal even if the distribution of the data in the population is not normal, if the sample size is fairly large
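A small simulation can make the second point tangible. The sketch below repeatedly draws samples from a clearly non-normal (exponential, right-skewed) population and looks at the distribution of the sample means. The population, sample size, number of repetitions and random seed are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

RATE = 1.0          # hypothetical exponential population: mean 1, sd 1
SAMPLE_SIZE = 50    # observations per sample
NUM_SAMPLES = 2000  # number of repeated samples

def sample_mean(n):
    """Mean of n draws from the right-skewed exponential population."""
    return statistics.fmean(random.expovariate(RATE) for _ in range(n))

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
expected_sd = (1.0 / RATE) / SAMPLE_SIZE ** 0.5  # population sd / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

The sample means cluster around the population mean with a spread close to σ/√n, and a histogram of `sample_means` would look roughly bell-shaped even though the underlying population is strongly skewed.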


Applications of Central Limit Theorem (CLT)

  • CLT has some very important applications in statistical practice. Many practices in statistics, such as hypothesis testing, regression and confidence intervals, make some assumptions about the population that the data was obtained from. One assumption that is initially made in a statistics course is that the populations we work with are normally distributed

  • The assumption that data is from a normal distribution simplifies matters but seems a little unrealistic. With just a little work on real-world data, outliers, skewness, multiple peaks and asymmetry show up quite routinely. The use of an appropriate sample size and the central limit theorem help us to get around the problem of data from populations that are not normal

  • Thus, even though we might not know the shape of the distribution where our data comes from, the central limit theorem says that we can treat the sampling distribution of the mean as if it were approximately normal. Of course, in order for the conclusions of the theorem to hold, we do need a sample size that is large enough. Exploratory data analysis can help us to determine how large a sample is necessary for a given situation