How do I calculate percentiles with python/numpy?

  • Is there a convenient way to calculate percentiles for a sequence or single-dimensional numpy array?

    I am looking for something similar to Excel's percentile function.

    I looked in NumPy's statistics reference, and couldn't find this. All I could find is the median (50th percentile), but not something more specific.

      September 19, 2020 11:59 AM IST
    0
  • You might be interested in the SciPy Stats package. Its scoreatpercentile() function does what you're after, and the package has many other statistical goodies.

    numpy also provides percentile():

    import numpy as np
    a = np.array([1,2,3,4,5])
    p = np.percentile(a, 50)  # returns the 50th percentile, i.e. the median
    print(p)
    3.0
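
    As a small extension of the example above, np.percentile also accepts a sequence of percentiles, so several can be computed in one call. This is a minimal sketch; the variable name quartiles is only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Passing a list of percentiles returns one value per requested percentile.
quartiles = np.percentile(a, [25, 50, 75])
print(quartiles)
```

    The result is a numpy array with the 25th, 50th and 75th percentiles, in that order.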
      September 19, 2020 1:10 PM IST
    0
  • Here is an example with numpy:

    import numpy as np
    a = [154, 400, 1124, 82, 94, 108]
    print(np.percentile(a, 95))  # prints the 95th percentile
      September 19, 2020 1:12 PM IST
    0
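  • Here's how to calculate a percentile without numpy, using only the Python standard library. This uses the nearest-rank method, so the result is always an element of the data and can differ from np.percentile, which interpolates linearly by default.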

    import math
    
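    def percentile(data, percent):
        """Return the nearest-rank percentile (0-100) of a non-empty sequence."""
        size = len(data)
        # Clamp the rank to 1 so that percent=0 returns the minimum, not sorted(data)[-1].
        rank = max(int(math.ceil(size * percent / 100.0)), 1)
        return sorted(data)[rank - 1]
    
    mylist = [154, 400, 1124, 82, 94, 108]  # sample data
    p5 = percentile(mylist, 5)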
    p25 = percentile(mylist, 25)
    p50 = percentile(mylist, 50)
    p75 = percentile(mylist, 75)
    p95 = percentile(mylist, 95)
     
      September 19, 2020 1:47 PM IST
    0
  • To calculate the percentile rank of each element in a series (its rank divided by the number of elements), run:

    from scipy.stats import rankdata
    import numpy as np
    
    def calc_percentile(a, method='min'):
        # Accept any array-like input (list, range, tuple); a no-op for arrays.
        a = np.asarray(a)
        return rankdata(a, method=method) / float(len(a))

    For example:

    a = range(20)
    print({val: round(float(pct), 3) for val, pct in zip(a, calc_percentile(a))})
    {0: 0.05, 1: 0.1, 2: 0.15, 3: 0.2, 4: 0.25, 5: 0.3, 6: 0.35, 7: 0.4, 8: 0.45, 9: 0.5, 10: 0.55, 11: 0.6, 12: 0.65, 13: 0.7, 14: 0.75, 15: 0.8, 16: 0.85, 17: 0.9, 18: 0.95, 19: 1.0}
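
    For comparison, the same kind of percentile rank can be sketched without scipy. This version assumes the 'min' tie-breaking rule (tied values share the lowest rank) and uses bisect from the standard library; the function name percentile_ranks is hypothetical.

```python
from bisect import bisect_left

def percentile_ranks(values):
    """Return rank / n for each value, where tied values share the lowest rank."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_left counts the elements strictly smaller than v, so +1 gives the 'min' rank.
    return [(bisect_left(ordered, v) + 1) / n for v in values]

ranks = percentile_ranks([10, 20, 20, 40])
print(ranks)  # [0.25, 0.5, 0.5, 1.0]
```

    The two tied values both get rank 2 out of 4, which is why they share 0.5.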
      September 19, 2020 1:49 PM IST
    0
  • By the way, there is a pure-Python implementation of a percentile function, in case you don't want to depend on scipy. It interpolates linearly between the two nearest values. The function is copied below:

    ## {{{ http://code.activestate.com/recipes/511478/ (r1)
    import math
    import functools
    
    def percentile(N, percent, key=lambda x:x):
        """
        Find the percentile of a list of values.
    
        @parameter N - is a list of values. Note N MUST BE already sorted.
        @parameter percent - a float value from 0.0 to 1.0.
        @parameter key - optional key function to compute value from each element of N.
    
        @return - the percentile of the values
        """
        if not N:
            return None
        k = (len(N)-1) * percent
        f = math.floor(k)
        c = math.ceil(k)
        if f == c:
            return key(N[int(k)])
        d0 = key(N[int(f)]) * (c-k)
        d1 = key(N[int(c)]) * (k-f)
        return d0+d1
    
    # median is 50th percentile.
    median = functools.partial(percentile, percent=0.5)
    ## end of http://code.activestate.com/recipes/511478/ }}}
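
    The recipe expects already-sorted input and a fraction between 0.0 and 1.0. As a convenience, here is a minimal variation on the same linear interpolation that sorts the data itself and takes a percentile from 0 to 100. The name percentile_linear is hypothetical and not part of the recipe.

```python
import math

def percentile_linear(data, q):
    """Linearly interpolated percentile; q is 0-100, data need not be sorted."""
    if not data:
        return None
    ordered = sorted(data)
    # Fractional index of the requested percentile in the sorted data.
    k = (len(ordered) - 1) * q / 100.0
    lower = math.floor(k)
    upper = math.ceil(k)
    if lower == upper:
        return ordered[int(k)]
    # Weight the two neighbours by their distance from k.
    return ordered[lower] * (upper - k) + ordered[upper] * (k - lower)

result = percentile_linear([154, 400, 1124, 82, 94, 108], 95)
print(result)
```

    For this data the fractional index is 4.75, so the result lies three quarters of the way from 400 to 1124.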
      September 19, 2020 2:42 PM IST
    0