I have completed my graduate public policy program, but it was not at all tech heavy - some economics and econometrics, but nothing requiring any CS knowledge. A good portion of the research jobs in DC require a basic level of programming knowledge. Mostly they want people who can perform advanced search and retrieval functions on large datasets and save output in different formats on their servers. They also want Stata/stats knowledge, which I have some of.
My question is this: where is the best place to start learning some programming to get to this level? For instance, is Java, SQL, VBA, or something else the most useful for these purposes? And how much math do I need to write and run simple requests?
Thanks
I need to compute combinatorials (nCr) in Python, but I cannot find a function to do that in the math, numpy, or stats libraries. I am looking for a function of the form:
comb = calculate_combinations(n, r)
I need the number of possible combinations, not the actual combinations, so itertools.combinations does not interest me. Finally, I want to avoid using factorials, as the numbers I'll be calculating combinations for can get so big that the factorials become monstrous. This seems like a REALLY easy question to answer, but I am drowning in questions about generating all the actual combinations, which is not what I want.
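For reference, one way to get the count without ever forming a factorial is to build the product incrementally; a minimal sketch (the name `calculate_combinations` is just the placeholder from above):

```python
def calculate_combinations(n, r):
    """Compute nCr multiplicatively, avoiding huge intermediate factorials."""
    if r < 0 or r > n:
        return 0
    r = min(r, n - r)  # use the symmetry nCr == nC(n-r)
    result = 1
    for i in range(1, r + 1):
        # after each step, result is itself a binomial coefficient
        # C(n-r+i, i), so the integer division is always exact
        result = result * (n - r + i) // i
    return result

print(calculate_combinations(52, 5))  # 2598960, the number of 5-card poker hands
```

On Python 3.8+ the standard library has `math.comb(n, r)` for exactly this, and `scipy.special.comb(n, r, exact=True)` is another exact option.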
How would you go about coding an interactive website to display stats/graphs? Say I wanted to create something interactive for people to look at Stack Overflow stats - something that looks like AWStats / Google Analytics but allows you to drill down to stats/graphs like:
All questions: total, by hour of day, by day of week (interesting timezone challenge there, or just stick to UTC).
Tags (e.g. C# questions, app-engine questions): totals, by hour of day, by day of week
Select a user: totals, by hour of day, by day of week
Extra cool: the ability to add x number of users / tag, date ranges.
Is the answer "code it yourself"? I guess I could pre-crunch a lot of data and find a library to create the graphs for me.
Or is there a library/package suited to this sort of thing? I've spent some time looking at data-mining applications (Splunk, SQL Server Analysis Services), but these look like interactive applications for building up queries, not for creating interactive output.
I'm not attached to any...
My PHP SQL statement is failing due to a pound (#) sign. How can I get around this (other than fixing the database name)?
$sql = "SELECT CMCD, TK#, TECH, STATS from LIB.TICKET FETCH FIRST 10 ROWS ONLY ";
$rs = odbc_exec($conn,$sql);
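For reference, a common workaround - assuming the backend honours standard SQL delimited identifiers (the `FETCH FIRST` syntax suggests DB2, which does) - is to put the offending column name in double quotes so the `#` is treated as part of the identifier; a sketch, not verified against this particular ODBC driver:

```sql
-- double quotes make "TK#" a delimited identifier, so the # is literal
SELECT CMCD, "TK#", TECH, STATS
FROM LIB.TICKET
FETCH FIRST 10 ROWS ONLY
```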
I can't seem to find any Python libraries that do multiple regression. The only things I find do only simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.). For example, with this data:
print 'y x1 x2 x3 x4 x5 x6 x7'
for t in texts:
    print "{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}" \
        .format(t.y, t.x1, t.x2, t.x3, t.x4, t.x5, t.x6, t.x7)
(output for above:)
y x1 x2 x3 x4 x5 x6 x7
-6.0 -4.95 -5.87 -0.76 14.73 4.02 0.20 0.45
-5.0 -4.55 -4.52 -0.71 13.74 4.47 0.16 0.50
-10.0 -10.96 -11.64 -0.98 15.49 4.18 0.19 0.53
-5.0 -1.08 -3.36 0.75 24.72 4.96 0.16 0.60
-8.0 -6.52 -7.45 -0.86 16.59 4.29 0.10 0.48
-3.0 -0.81 -2.36 -0.50 22.44 4.81 0.15 0.53
-6.0 -7.01 -7.33 -0.33 13.93...
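For reference, a minimal sketch of one common route - ordinary least squares with `numpy.linalg.lstsq` on a made-up toy dataset (the variable names just mirror the x1…x7 layout above):

```python
import numpy as np

# toy data: y depends linearly on two predictors plus an intercept
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.5 + 2.0 * x1 - 3.0 * x2 + 0.01 * rng.normal(size=50)

# design matrix: a column of ones for the intercept, then the predictors
X = np.column_stack([np.ones_like(x1), x1, x2])

coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [1.5, 2.0, -3.0]
```

`statsmodels` (`sm.OLS(y, X).fit()`) and scikit-learn's `LinearRegression` do the same fit with richer diagnostics (p-values, R², etc.).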
Let's assume I have some data I obtained empirically:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
size = 10000
x = 10 * stats.expon.rvs(size=size) + 0.2 * np.random.uniform(size=size)
It is exponentially distributed (with some noise) and I want to verify this using a chi-squared goodness-of-fit (GoF) test. What is the simplest way of doing this using the standard scientific libraries in Python (e.g. scipy or statsmodels), with the least number of manual steps and assumptions? I can fit a model with:
param = stats.expon.fit(x)
plt.hist(x, density=True, color='white', hatch='/')
grid = np.linspace(0, 100, 10000)
plt.plot(grid, stats.expon.pdf(grid, *param))
Calculating the Kolmogorov-Smirnov test is very elegant:
>>> stats.kstest(x, lambda x : stats.expon.cdf(x, *param))
(0.0061000000000000004, 0.85077099515985011)
However, I can't find a good way of calculating the chi-squared test. There is a chi-squared GoF function in statsmodels, but it assumes a discrete distribution (and the exponential distribution is...
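For reference, one manual route sketched here: bin the data, compute expected counts from the fitted CDF, and feed both to `scipy.stats.chisquare`. The number of bins and the `ddof` adjustment for the fitted parameters are choices, not a fixed standard:

```python
import numpy as np
from scipy import stats

# stand-in for the empirical data above
x = stats.expon.rvs(scale=10, size=10000, random_state=1)

# fit the candidate distribution
param = stats.expon.fit(x)

# choose bins with roughly equal expected probability mass
n_bins = 20
edges = stats.expon.ppf(np.linspace(0, 1, n_bins + 1), *param)
edges[0], edges[-1] = -np.inf, np.inf  # catch both tails

observed, _ = np.histogram(x, bins=edges)
# expected counts from the fitted CDF over each bin
expected = len(x) * np.diff(stats.expon.cdf(edges, *param))

# ddof accounts for the 2 parameters (loc, scale) estimated from the data
chi2_stat, p_value = stats.chisquare(observed, expected, ddof=2)
print(chi2_stat, p_value)
```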
I have applied simple forecasting models such as the naive forecast, moving average, simple exponential smoothing, and Holt's linear trend model to the 2018 sales data of a salesperson.
All of the models resulted in a prediction line that flattens at zero. Could it be an issue with the data, since most of the data is itself flat at zero?
model = ARIMA(train_log, order=(0, 1, 2))
output = model.fit(disp=-1)

# convert fitted values into a series
output_series = pd.Series(output.fittedvalues, copy=True)
print(output_series.head())

# cumulative sum to undo the differencing
output_series_cumsum = output_series.cumsum()
print(output_series_cumsum.head())

# convert the predicted ARIMA values back to the original scale
convert_output = np.exp(output_series_cumsum)
plt.title('RMSE: %.4f' % np.sqrt(((convert_output - np.exp(train_log)) ** 2).mean()))
Date Sales
---- -----
2018-01-27 1
2018-01-30 ...
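For intuition on why a mostly-zero history produces a flat forecast near zero, here is a minimal hand-rolled simple-exponential-smoothing sketch (not the statsmodels implementation) on a zero-heavy series like the one described:

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: the forecast for every future
    step is the final smoothed level, i.e. a flat line."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# zero-heavy sales history, like the table above
sales = [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0]
forecast = ses_forecast(sales)
print(forecast)  # a small value near zero, projected flat into the future
```

With mostly zeros, the smoothed level decays toward zero between the occasional sales, so a flat line at roughly zero is the expected behaviour of these models on such data rather than necessarily a bug; intermittent-demand methods (e.g. Croston's method) are the direction usually suggested for series like this.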
I'm trying to do a little bit of distribution plotting and fitting in Python, using SciPy for stats and matplotlib for the plotting. I'm having good luck with some things, like creating a histogram:
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt

np.random.seed(2)
alpha = 5
loc = 100
beta = 22
data = ss.gamma.rvs(alpha, loc=loc, scale=beta, size=5000)
myHist = plt.hist(data, 100, density=True)
Brilliant!
I can even take the same gamma parameters and plot the probability density function as a line (after some googling):
rv = ss.gamma(5,100,22)
x = np.linspace(0,600)
h = plt.plot(x, rv.pdf(x))
How would I go about plotting the histogram myHist with the PDF line h superimposed on top of the histogram? I'm hoping this is trivial, but I have been unable to figure it out.
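For reference, a minimal sketch of one way to overlay the two: draw the density-normalized histogram and the PDF line on the same axes (the Agg backend here is only so the sketch runs headlessly):

```python
import numpy as np
import scipy.stats as ss
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt

np.random.seed(2)
data = ss.gamma.rvs(5, loc=100, scale=22, size=5000)

fig, ax = plt.subplots()
# density=True puts the histogram on the same vertical scale as the PDF
counts, bins, patches = ax.hist(data, 100, density=True, alpha=0.5)
x = np.linspace(data.min(), data.max(), 500)
ax.plot(x, ss.gamma.pdf(x, 5, loc=100, scale=22), 'r-', lw=2)
fig.savefig('gamma_fit.png')
```

The key point is that both calls target the same `ax`; with `density=True` the histogram integrates to 1, so the PDF line lands directly on top of it.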
I want to produce 100 random numbers with a normal distribution (µ=10, σ=7) and then draw a quantity diagram for these numbers. How can I produce random numbers with a specific distribution in Excel 2010?

One more question: when I produce, for example, 20 random numbers with RANDBETWEEN(Bottom, Top), the numbers change every time the sheet recalculates. How can I keep this from happening?
Periodically I program sloppily. OK, I program sloppily all the time, but sometimes that catches up with me in the form of out-of-memory errors. I start exercising a little discipline in deleting objects with the rm() command and things get better. I see mixed messages online about whether I should explicitly call gc() after deleting large data objects. Some say that R will run gc() before it returns a memory error, while others say that manually forcing gc() is a good idea. Should I run gc() after deleting large objects in order to ensure maximum memory availability?