I'm working on an application that requires a great deal of statistical processing and output as images in a .NET desktop application. The problems, including generating the output images, seem like a natural fit for R http://www.r-project.org/
Is there a wrapper, API, SDK, or port that will allow me to call R from .NET?
How would you go about coding an interactive website to display stats/graphs? Say I wanted to create something interactive for people to look at Stack Overflow stats, something that looks like awstats / Google Analytics but allows you to drill down to stats/graphs like:
All questions: total, by hour of day, by day of week (interesting timezone challenge there, or just stick to UTC).
Tags (e.g. C# questions, app-engine questions): totals, by hour of day, by day of week
Select a user: totals, by hour of day, by day of week
Extra cool: the ability to add x number of users / tag, date ranges.
Is the answer "code it yourself"? I guess I could pre-crunch a lot of data and find a library to create the graphs for me.
Or is there a library/package suited to this sort of thing? I've spent some time looking at datamining applications (Splunk, SQL Server Analysis Services). But these look like interactive applications to build up queries, not something to create interactive output.
I'm not attached to any...
I need to compute combinatorials (nCr) in Python but cannot find the function to do that in math, numpy or stat libraries. Something like a function of the type:
comb = calculate_combinations(n, r)
I need the number of possible combinations, not the actual combinations, so itertools.combinations does not interest me. Finally, I want to avoid using factorials, as the numbers I'll be calculating the combinations for can get too big and the factorials are going to be monstrous. This seems like a REALLY easy question to answer; however, I am being drowned in questions about generating all the actual combinations, which is not what I want.
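One way to get the count without building large factorials is the multiplicative formula, where every intermediate value stays an exact integer. The sketch below is only an illustration: the name `n_choose_r` is hypothetical (the `calculate_combinations` above is just a placeholder), and on Python 3.8 or later the standard library's `math.comb(n, r)` returns the same value directly.

```python
def n_choose_r(n, r):
    """Number of ways to choose r items from n, without full factorials."""
    if r < 0 or r > n:
        return 0
    r = min(r, n - r)  # use symmetry: C(n, r) == C(n, n - r)
    result = 1
    for i in range(1, r + 1):
        # After this step result == C(n - r + i, i), so the division is exact.
        result = result * (n - r + i) // i
    return result


print(n_choose_r(52, 5))
```

Because Python integers have arbitrary precision, the loop does not overflow; it simply avoids computing n! in full when r is small relative to n.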
I've got some multivariate data of beauty vs ages. The ages range from 20-40 at intervals of 2 (20, 22, 24, ..., 40), and for each record of data, they are given an age and a beauty rating from 1-5. When I do boxplots of this data (ages across the X-axis, beauty ratings across the Y-axis), there are some outliers plotted outside the whiskers of each box.
I want to remove these outliers from the data frame itself, but I'm not sure how R calculates outliers for its box plots. Below is an example of what my data might look like.
How can I plot the empirical CDF of an array of numbers in matplotlib in Python? I'm looking for the cdf analog of pylab's "hist" function.
One thing I can think of is:
from scipy.stats import cumfreq
a = array() # my array of numbers
num_bins = 20
b = cumfreq(a, num_bins)
plt.plot(b)
Is that correct though? Is there an easier/better way?
thanks.
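For comparison, an empirical CDF can also be computed without binning: sort the values and assign the i-th smallest value the cumulative fraction i/n. The sketch below is a minimal illustration in plain Python; the helper name `ecdf` is hypothetical, and the plotting calls are left as comments on the assumption that matplotlib is installed.

```python
def ecdf(values):
    """Return sorted values and the fraction of data <= each sorted value."""
    x = sorted(values)
    n = len(x)
    y = [(i + 1) / n for i in range(n)]
    return x, y


x, y = ecdf([3.0, 1.0, 2.0, 2.0])
print(x, y)

# Plotting as a step function (assumes matplotlib is available):
#   import matplotlib.pyplot as plt
#   plt.step(x, y, where="post")
#   plt.show()
```

Unlike a binned cumulative frequency, this keeps one step per observation, so there is no `num_bins` choice to make.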
I can't seem to find any Python libraries that do multiple regression. The only things I find only do simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.). For example, with this data:
print('y x1 x2 x3 x4 x5 x6 x7')
for t in texts:
    print("{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}"
          .format(t.y, t.x1, t.x2, t.x3, t.x4, t.x5, t.x6, t.x7))
(output for above:)
y x1 x2 x3 x4 x5 x6 x7
-6.0 -4.95 -5.87 -0.76 14.73 4.02 0.20 0.45
-5.0 -4.55 -4.52 -0.71 13.74 4.47 0.16 0.50
-10.0 -10.96 -11.64 -0.98 15.49 4.18 0.19 0.53
-5.0 -1.08 -3.36 0.75 24.72 4.96 0.16 0.60
-8.0 -6.52 -7.45 -0.86 16.59 4.29 0.10 0.48
-3.0 -0.81 -2.36 -0.50 22.44 4.81 0.15 0.53
-6.0 -7.01 -7.33 -0.33 13.93...
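As a rough sketch of what regressing y on several predictors can look like, ordinary least squares can be solved with NumPy's `np.linalg.lstsq`. The data below is synthetic (the table above is truncated, so it is not reused), and the coefficients, sample size, and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 3

# Synthetic predictors x1..x3 and made-up "true" parameters (assumptions).
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([1.5, -2.0, 0.5])
true_intercept = 4.0
y = true_intercept + X @ true_coef + rng.normal(scale=0.01, size=n_samples)

# Prepend a column of ones so the first fitted parameter is the intercept.
A = np.column_stack([np.ones(n_samples), X])
params, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = params[0], params[1:]

print("intercept:", intercept)
print("coefficients:", coef)
```

This returns only the fitted parameters; standard errors, p-values, and similar diagnostics would need a dedicated statistics package.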
I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this function
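import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)  # Euclidean (L2) length of v
    if norm == 0:
        return v  # the zero vector has no direction; return it unchanged
    return v / norm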
Is there something like that in sklearn or numpy? This function works in a situation where v is the 0 vector.
I want to do a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for each state so that at the end I have a vector of lm responses. I can imagine doing a for loop over the states, doing the regression inside the loop, and adding the results of each regression to a vector. That does not seem very R-like, however. In SAS I would do a 'by' statement and in SQL I would do a 'group by'. What's the R way of doing this?
I'm looking for some good tools/scripts that allow me to generate a few statistics from a git repository. I've seen this feature on some code hosting sites, and they contained information like...
1. commits per author
2. commits per day/week/year/etc.
3. lines of code over time
4. graphs
5. ... much more
Basically I just want to get an idea how much my project grows over time, which developer commits most code, and so on.
I have a problem involving a collection of continuous probability distribution functions, most of which are determined empirically (e.g. departure times, transit times). What I need is some way of taking two of these PDFs and doing arithmetic on them. E.g. if I have two values x taken from PDF X, and y taken from PDF Y, I need to get the PDF for (x+y), or any other operation f(x,y).
An analytical solution is not possible, so what I'm looking for is some representation of PDFs that allows such things. An obvious (but computationally expensive) solution is Monte Carlo: generate lots of values of x and y, and then just measure f(x, y). But that takes too much CPU time.
I did think about representing the PDF as a list of ranges where each range has a roughly equal probability, effectively representing the PDF as the union of a list of uniform distributions. But I can't see how to combine them.
Does anyone have any good solutions to this problem?
Edit: The goal is to create a mini-language (aka Domain...
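For the specific case of x+y with X and Y independent, the density of the sum is the convolution of the two densities, which can be approximated numerically when both PDFs are tabulated on grids with the same spacing. The sketch below assumes independence and a shared grid spacing `dx`; the function name `pdf_of_sum` and the uniform example are hypothetical, and a general f(x, y) is not covered by this approach.

```python
import numpy as np


def pdf_of_sum(pdf_x, x0, pdf_y, y0, dx):
    """Grid approximation to the density of X + Y for independent X and Y.

    pdf_x[i] is the density of X on the cell starting at x0 + i*dx,
    and likewise for pdf_y with y0. Returns (grid, density) for the sum.
    """
    density = np.convolve(pdf_x, pdf_y) * dx
    grid = (x0 + y0) + dx * np.arange(density.size)
    return grid, density


# Example: two Uniform(0, 1) densities tabulated on 100 cells each.
dx = 0.01
uniform = np.ones(100)
grid, density = pdf_of_sum(uniform, 0.0, uniform, 0.0, dx)

print("total probability:", density.sum() * dx)
print("peak density:", density.max())
```

The result for the two uniforms is the expected triangular shape on roughly [0, 2]. The cost is a single convolution rather than a large number of random draws, at the price of discretization error controlled by `dx`.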
It's been quite a while since I did any statistics so I am struggling with the definitions of a Poisson distribution. What I understand by the "rate is constant" is that if a customer purchases 1 thing on average in a week, they purchase 4 things on average in a four-week period. Is this correct?
Where I believe I am confused is with the final sentence. Is this saying that the time between a customer's purchases would grow exponentially as time goes on? Doesn't this contradict the idea that we have a constant rate of purchase?
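A small simulation can make the two ideas concrete. In a Poisson process with a constant rate, the gaps between events are drawn from an exponential distribution; that describes how the gap lengths are distributed, not a gap that grows over time. The sketch below assumes a rate of 1 purchase per week; the function name, seed, and horizon are arbitrary choices for illustration.

```python
import random


def simulate_purchases(rate_per_week, weeks, seed=0):
    """Simulate purchase times of a constant-rate Poisson process."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_week)  # gap ~ Exponential(rate)
        if t > weeks:
            return times
        times.append(t)


weeks = 40_000
times = simulate_purchases(rate_per_week=1.0, weeks=weeks)

# Average number of purchases per four-week window.
per_four_weeks = len(times) / (weeks / 4)

# Average gap between purchases, early in the run versus late in the run.
gaps = [b - a for a, b in zip(times, times[1:])]
half = len(gaps) // 2
early_mean = sum(gaps[:half]) / half
late_mean = sum(gaps[half:]) / (len(gaps) - half)

print(per_four_weeks, early_mean, late_mean)
```

Under these assumptions the count per four weeks comes out near 4, and the average gap is about one week in both the early and late halves of the run, so the gaps do not lengthen as time goes on.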
I want to get started on HMMs, but don't know how to go about it. Can people here give me some basic pointers on where to look?
More than just the theory, I like to do a lot of hands-on work, so I would prefer resources where I can write small code snippets to check my learning, rather than just dry text.