How to fit a dataset with a sum of two distributions?

  • Hello!
    I want to fit a dataset with a sum of two distributions: Gaussian + Poisson.
    The dataset can have up to 3000 numbers, which should be enough for a reasonable fit. Is there any convenient way to do it without programming? For example, with Origin software? Or RStudio?
      June 12, 2019 12:04 PM IST
  • RStudio is only an IDE for R, and R is a programming language. It is surely possible in R, but that includes "programming".
    Let the numeric vector x contain your data.
    Define the density, possibly as a weighted mix of a Gaussian and a Poisson:
    dens <- function(x, mu, sigma, rate, p) p*dnorm(x, mu, sigma) + (1-p)*dpois(x,rate)
    The parameters can be found by maximizing the likelihood of the data. A convenient function for this job is fitdistr from the MASS package, so you get the parameter values with
    MASS::fitdistr(x, dens, start=list(mu=a, sigma=b, rate=c, p=d))
    where a, b, c, d are sensible starting values for the parameters. Note that b>0, c>0 and 0<d<1.
    There could be problems with the fit because some of the parameters are restricted. This can be avoided by optimizing over transformations that are not restricted, like the logarithms of sigma and rate and the log odds of p.
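The transformation idea can be sketched as follows. This is only a minimal illustration under stated assumptions, not a definitive implementation: the data are simulated and integer-valued (dpois returns 0, with a warning, for non-integer x), the true values 20, 3, 5 and 0.7 and the starting guesses are made up for the example, and base R's optim is used directly instead of fitdistr so that the transformed parameters are explicit.

```r
# Simulated, integer-valued example data (an assumption for illustration):
# about 70% rounded Gaussian(mean 20, sd 3) and 30% Poisson(rate 5).
set.seed(1)
n <- 3000
from_gauss <- runif(n) < 0.7
x <- ifelse(from_gauss, round(rnorm(n, mean = 20, sd = 3)), rpois(n, lambda = 5))

# Negative log-likelihood in unrestricted parameters:
# theta = (mu, log(sigma), log(rate), logit(p))
negloglik <- function(theta, x) {
  mu    <- theta[1]
  sigma <- exp(theta[2])     # back-transform, always > 0
  rate  <- exp(theta[3])     # back-transform, always > 0
  p     <- plogis(theta[4])  # inverse log odds, always in (0, 1)
  dens  <- p * dnorm(x, mu, sigma) + (1 - p) * dpois(x, rate)
  -sum(log(dens))
}

# Rough starting guesses, e.g. read off a histogram of x
start <- c(18, log(4), log(4), qlogis(0.5))
fit <- optim(start, negloglik, x = x, method = "BFGS")

# Transform the optimum back to the original parameter scale
est <- c(mu    = fit$par[[1]],
         sigma = exp(fit$par[[2]]),
         rate  = exp(fit$par[[3]]),
         p     = plogis(fit$par[[4]]))
print(est)
```

Because optim only ever sees unrestricted values, it can never step to a negative sigma or rate or to a p outside (0, 1). Mixture likelihoods can have several local maxima, so it is worth trying a few different starting values.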
      June 12, 2019 12:06 PM IST
  • I had assumed that this case was similar to rolling dice. The value from the first die is distributed Uniform. The values from all the other dice also follow the Uniform distribution (ideally, with no cheating). So I now take the sum of several dice. If I look at the distribution of the sum of enough dice, I will get approximately the Gaussian distribution.
    I could do the same thing with random number generators. To get my first value I will take a number from a Gaussian distribution and a value from a Poisson distribution and add them together. I can do this over and over and plot a histogram of the results. At about 1,000,000 values one gets a clear picture of what the distribution looks like.
    I agree that the result will be neither Gaussian nor Poisson.
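The simulation described above might look like this in R. It is a sketch only: the Gaussian parameters (mean 0, sd 1) and the Poisson rate (3) are arbitrary values picked for illustration. Note that this adds the two random variables, which is a different model from the weighted mix of two densities in the earlier reply.

```r
# Draw one Gaussian and one Poisson value per observation and add them.
# Parameter values are arbitrary, chosen only for illustration.
set.seed(42)
n <- 1e6
s <- rnorm(n, mean = 0, sd = 1) + rpois(n, lambda = 3)

# Histogram of the sums, scaled as a density
hist(s, breaks = 200, freq = FALSE,
     main = "Sum of a Gaussian and a Poisson variable", xlab = "value")

# For independent variables, means and variances add:
# expected mean = 0 + 3, expected variance = 1^2 + 3
print(c(mean = mean(s), var = var(s)))
```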
      September 6, 2021 1:27 PM IST
  • I'm not sure if the original problem is correctly stated. The Gaussian distribution is continuous while the Poisson is discrete. To put them on an equal footing one has to discretize the original data, say by producing a histogram. Not only is the Gaussian no longer Gaussian after such a transformation, but another question arises: what is the correct number of bins in the resulting histogram?
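One possible way to put both components on the same footing, assuming the data are integer counts, is to replace the Gaussian density by the probability of the unit-width bin centred on each integer. The sketch below is an illustration of that option and not the only choice; the helper names dnorm_binned and mix_pmf and all parameter values are hypothetical.

```r
# Probability that a Gaussian falls into the unit-width bin centred on integer k
dnorm_binned <- function(k, mu, sigma) {
  pnorm(k + 0.5, mu, sigma) - pnorm(k - 0.5, mu, sigma)
}

# Mixture on the integers: both components are now probabilities
mix_pmf <- function(k, mu, sigma, rate, p) {
  p * dnorm_binned(k, mu, sigma) + (1 - p) * dpois(k, rate)
}

# Check that the probabilities add up to 1 over a wide range of integers
k <- -50:200
total <- sum(mix_pmf(k, mu = 20, sigma = 3, rate = 5, p = 0.7))
print(total)
```

With unit bins fixed on the integers, the number of bins is set by the data and is no longer a free choice.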
      June 14, 2019 12:52 PM IST