What exactly is the difference between Apache's Mesos and Google's Kubernetes? I understand both are server cluster management software. Can anyone elaborate on where the main differences are, and when each framework would be preferred?
Why would you want to use Kubernetes on top of Mesosphere?
According to Learning Spark: "Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the number of RDD partitions."
One difference I get is that with repartition() the number of partitions can be increased or decreased, but with coalesce() the number of partitions can only be decreased.
If the partitions are spread across multiple machines and coalesce() is run, how can it avoid data movement?
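For context, a minimal PySpark sketch contrasting the two calls; the local SparkContext and toy RDD are illustrative, not from the original question:

from pyspark import SparkContext

sc = SparkContext("local[4]", "coalesce-vs-repartition")
rdd = sc.parallelize(range(1000), 8)   # start with 8 partitions

fewer = rdd.coalesce(2)      # merges whole partitions into fewer ones; avoids a full shuffle
more = rdd.repartition(16)   # always performs a full shuffle; can increase the count

print(fewer.getNumPartitions(), more.getNumPartitions())  # 2 16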
Does tensorflow have something similar to scikit-learn's one hot encoder for processing categorical data? Would using a placeholder of tf.string behave as categorical data?
I realize I can manually pre-process the data before sending it to tensorflow, but having it built in is very convenient.
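For what it's worth, a minimal sketch of the built-in tf.one_hot op; it takes integer indices, so the string-to-id vocabulary below is a made-up preprocessing step:

import tensorflow as tf

vocab = {"red": 0, "green": 1, "blue": 2}   # hypothetical label vocabulary
labels = ["red", "blue", "blue"]
ids = [vocab[label] for label in labels]

one_hot = tf.one_hot(ids, depth=len(vocab))  # shape (3, 3)
with tf.Session() as sess:
    print(sess.run(one_hot))
# [[ 1.  0.  0.]
#  [ 0.  0.  1.]
#  [ 0.  0.  1.]]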
I need to compute combinatorials (nCr) in Python but cannot find the function to do that in math, numpy or stat libraries. Something like a function of the type:
comb = calculate_combinations(n, r)
I need the number of possible combinations, not the actual combinations, so itertools.combinations does not interest me.
Finally, I want to avoid using factorials, as the numbers I'll be calculating the combinations for can get too big, and the factorials are going to be monstrous.
This seems like a REALLY easy question to answer; however, I am being drowned in questions about generating all the actual combinations, which is not what I want.
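For illustration, a minimal sketch of the multiplicative formula, which never forms a full factorial; calculate_combinations is the hypothetical name from the question, and scipy.special.comb(n, r, exact=True) is a library alternative:

def calculate_combinations(n, r):
    # nCr via the multiplicative formula; each intermediate product is itself
    # a binomial coefficient, so the integer division is always exact.
    r = min(r, n - r)
    if r < 0:
        return 0
    result = 1
    for i in range(1, r + 1):
        result = result * (n - r + i) // i
    return result

print(calculate_combinations(52, 5))  # 2598960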
I've got some multivariate data of beauty vs ages. The ages range from 20-40 at intervals of 2 (20, 22, 24, ..., 40), and for each record of data, they are given an age and a beauty rating from 1-5. When I do boxplots of this data (ages across the X-axis, beauty ratings across the Y-axis), there are some outliers plotted outside the whiskers of each box.
I want to remove these outliers from the data frame itself, but I'm not sure how R calculates outliers for its box plots. Below is an example of what my data might look like.
I need to delete about 2 million rows from my PG database. I have a list of IDs that I need to delete. However, any way I try to do this is taking days.
I tried putting them in a table and doing it in batches of 100. Four days later, this is still running with only 297,268 rows deleted. (I had to select 100 IDs from an ID table, delete WHERE IN that list, then delete those 100 from the IDs table.)
I tried:
DELETE FROM tbl WHERE id IN (SELECT * FROM ids)
That's taking forever, too. It's hard to gauge how long, since I can't see its progress until it's done, but the query was still running after 2 days.
I'm just kind of looking for the most effective way to delete from a table when I know the specific IDs to delete, and there are millions of them.
I am trying to display a tree graph of my class hierarchy using networkx. I have it all graphed correctly, and it displays fine, but as a circular graph with crossing edges. It is a pure hierarchy, and it seems I ought to be able to display it as a tree.
I have googled this extensively, and every solution offered involves using pygraphviz, but "PyGraphviz does not work with Python 3" (documentation from the pygraphviz site).
Has anyone been able to get a tree graph display in Python 3?
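For context, a minimal pure-Python sketch (no pygraphviz) that computes positions by BFS depth so nx.draw renders the hierarchy as a tree; the class graph here is a made-up example:

import networkx as nx
import matplotlib.pyplot as plt

def tree_layout(graph, root):
    # Group nodes by their distance from the root, then spread each level horizontally.
    levels = {}
    for node, depth in nx.shortest_path_length(graph, root).items():
        levels.setdefault(depth, []).append(node)
    pos = {}
    for depth, nodes in levels.items():
        for i, node in enumerate(nodes):
            pos[node] = ((i + 1) / float(len(nodes) + 1), -depth)
    return pos

G = nx.DiGraph([("object", "A"), ("object", "B"), ("A", "A1"), ("A", "A2"), ("B", "B1")])
nx.draw(G, pos=tree_layout(G, "object"), with_labels=True)
plt.show()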
I was looking for alternative ways to save a trained model in PyTorch. So far, I have found two alternatives.
torch.save() to save a model and torch.load() to load a model.
model.state_dict() to save a trained model and model.load_state_dict() to load the saved model.
I have come across a discussion where approach 2 is recommended over approach 1.
My question is, why is the second approach preferred? Is it only because torch.nn modules have those two functions and we are encouraged to use them?
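For reference, a minimal sketch of the two approaches side by side; the tiny Net class and file names are placeholders, not from the original question:

import torch
import torch.nn as nn

class Net(nn.Module):          # stand-in model so the snippet runs end to end
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Linear(4, 2)

model = Net()

# Approach 1: serialize the whole model object (pickles the class by reference).
torch.save(model, "model.pth")
model = torch.load("model.pth")

# Approach 2: serialize only the parameters; re-instantiate the architecture, then load.
torch.save(model.state_dict(), "state.pth")
model = Net()
model.load_state_dict(torch.load("state.pth"))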
How can I plot the empirical CDF of an array of numbers in matplotlib in Python? I'm looking for the CDF analog of pylab's "hist" function.
One thing I can think of is:
from scipy.stats import cumfreq
import numpy as np
import matplotlib.pyplot as plt

a = np.random.randn(1000)  # stand-in for my array of numbers
num_bins = 20
b = cumfreq(a, num_bins)   # returns (cumulative counts, lowerlimit, binsize, extrapoints)
plt.plot(b[0])
Is that correct though? Is there an easier/better way?
Thanks.
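For comparison, a minimal sketch that plots the ECDF directly from the sorted data, with no binning; the random sample stands in for the real array:

import numpy as np
import matplotlib.pyplot as plt

a = np.random.randn(1000)                       # stand-in data
x = np.sort(a)
y = np.arange(1, len(x) + 1) / float(len(x))    # fraction of points <= each x

plt.step(x, y, where="post")
plt.xlabel("value")
plt.ylabel("ECDF")
plt.show()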
I can't seem to find any Python libraries that do multiple regression. The only things I find only do simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.). For example, with this data:
print 'y x1 x2 x3 x4 x5 x6 x7'
for t in texts:
print "{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}" /
.format(t.y,t.x1,t.x2,t.x3,t.x4,t.x5,t.x6,t.x7)
(output for above:)
y x1 x2 x3 x4 x5 x6 x7
-6.0 -4.95 -5.87 -0.76 14.73 4.02 0.20 0.45
-5.0 -4.55 -4.52 -0.71 13.74 4.47 0.16 0.50
-10.0 -10.96 -11.64 -0.98 15.49 4.18 0.19 0.53
-5.0 -1.08 -3.36 0.75 24.72 4.96 0.16 0.60
-8.0 -6.52 -7.45 -0.86 16.59 4.29 0.10 0.48
-3.0 -0.81 -2.36 -0.50 22.44 4.81 0.15 0.53
-6.0 -7.01 -7.33 -0.33 13.93...
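For reference, a minimal numpy least-squares sketch; the two rows are copied from the table above just to show the shapes, and sklearn.linear_model.LinearRegression().fit(X, y) fits the same model:

import numpy as np

X = np.array([[-4.95, -5.87, -0.76, 14.73, 4.02, 0.20, 0.45],
              [-4.55, -4.52, -0.71, 13.74, 4.47, 0.16, 0.50]])
y = np.array([-6.0, -5.0])

# Append a column of ones so the fit includes an intercept, then solve min ||Xb - y||^2.
X1 = np.column_stack([X, np.ones(len(X))])
coeffs = np.linalg.lstsq(X1, y)[0]   # one coefficient per predictor, plus the intercept
print(coeffs)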
What are the benefits of using either Hadoop or HBase or Hive?
From my understanding, HBase avoids using map-reduce and has column-oriented storage on top of HDFS. Hive is a SQL-like interface for Hadoop and HBase.
I would also like to know how Hive compares with Pig.
When I run sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) I get InternalError: Blas SGEMM launch failed. Here is the full error and stack trace:
InternalErrorTraceback (most recent call last)
<ipython-input-9-a3261a02bdce> in <module>()
1 batch_xs, batch_ys = mnist.train.next_batch(100)
----> 2 sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
338 try:
339 result = self._run(None, fetches, feed_dict, options_ptr,
--> 340 run_metadata_ptr)
341 if run_metadata:
342 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
562 try:
563 results = self._do_run(handle, target_list, unique_fetches,
--> 564 ...
ANN (Artificial Neural Networks) and SVM (Support Vector Machines) are two popular strategies for supervised machine learning and classification. It's not often clear which method is better for a particular project, and I'm certain the answer is always "it depends." Often, a combination of both along with Bayesian classification is used.
These questions on Stackoverflow have already been asked regarding ANN vs SVM:
ANN and SVM classification
what the difference among ANN, SVM and KNN in my classification question
Support Vector Machine or Artificial Neural Network for text processing?
In this question, I'd like to know specifically what aspects of an ANN (specifically, a Multilayer Perceptron) might make it desirable to use over an SVM? The reason I ask is because it's easy to answer the opposite question: Support Vector Machines are often superior to ANNs because they avoid two major weaknesses of ANNs:
(1) ANNs often converge on local minima rather than global minima, meaning that they are...
I would like to know how to apply gradient clipping to this network, an RNN where there is a possibility of exploding gradients.
tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)
This is an example that could be used, but where do I introduce it? In the definition of the RNN?
lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps
tf.clip_by_value(_X, -1, 1, name=None)
But this doesn't make sense, as the tensor _X is the input, not the gradient, which is what is to be clipped.
Do I have to define my own optimizer for this, or is there a simpler option?
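For context, a common TF1 pattern clips the gradients between compute_gradients and apply_gradients rather than clipping the inputs; loss here stands for the network's cost tensor, and the optimizer choice is illustrative:

import tensorflow as tf

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
grads_and_vars = optimizer.compute_gradients(loss)   # loss: the model's cost tensor
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]   # skip variables with no gradient
train_op = optimizer.apply_gradients(clipped)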
I am confused about the method view() in the following code snippet.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)   # flatten the conv feature maps for the linear layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
My confusion is regarding the following line.
x = x.view(-1, 16*5*5)
What does the tensor.view() function do? I have seen its usage in many places, but I can't understand how it interprets its parameters. What happens if I give negative values as parameters to the view() function? For example, what happens if I call tensor_variable.view(1, 1, -1)? Can...
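For illustration, a minimal sketch of view() reshaping a tensor; -1 asks PyTorch to infer that dimension from the total element count:

import torch

t = torch.arange(12)     # 12 elements
a = t.view(3, 4)         # shape (3, 4): same elements, new shape
b = t.view(-1, 6)        # shape (2, 6): -1 is inferred as 12 / 6 = 2
c = t.view(1, 1, -1)     # shape (1, 1, 12)
print(a.shape, b.shape, c.shape)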
I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'd rather just have one big LabelEncoder object that works across all my columns of data.
Throwing the entire DataFrame into LabelEncoder creates the error below. Please bear in mind that I'm using dummy data here; in actuality I'm dealing with about 50 columns of string-labeled data, so I need a solution that doesn't reference any columns by name.
import pandas
from sklearn import preprocessing
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/label.py", line 103, in fit
    y = column_or_1d(y, warn=True)
  File "/Users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 306, in...
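For context, a minimal sketch of one common per-column workaround; the dummy DataFrame is illustrative, and DataFrame.apply keeps column names out of the code:

import pandas
from sklearn import preprocessing

df = pandas.DataFrame({"pets": ["cat", "dog", "cat"],
                       "owner": ["Champ", "Ron", "Brick"]})

# fit_transform runs once per column, so each column still gets its own encoding.
encoded = df.apply(preprocessing.LabelEncoder().fit_transform)
print(encoded)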
I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this function
import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm
Is there something like that in sklearn or numpy? This function works in the situation where v is the 0 vector.
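For reference, a minimal sketch using sklearn.preprocessing.normalize, which expects a 2-D array, so a lone vector is reshaped first; the sample vector is illustrative:

import numpy as np
from sklearn.preprocessing import normalize

v = np.array([3.0, 0.0, 4.0])
unit = normalize(v.reshape(1, -1))[0]   # [0.6, 0.0, 0.8]; zero vectors stay zero
print(unit)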
I don't understand which accuracy in the output to use to compare my 2 Keras models to see which one is better.
Do I use the "acc" (from the training data?) one or the "val acc"... moreI don't understand which accuracy in the output to use to compare my 2 Keras models to see which one is better.
Do I use the "acc" (from the training data?) one or the "val acc" (from the validation data?) one?
There are different accs and val accs for each epoch. How do I know the acc or val acc for my model as a whole? Do I average all of the epochs' accs or val accs to find the acc or val acc of the model as a whole?
Model 1 Output
Train on 970 samples, validate on 243 samples
Epoch 1/20
0s - loss: 0.1708 - acc: 0.7990 - val_loss: 0.2143 - val_acc: 0.7325
Epoch 2/20
0s - loss: 0.1633 - acc: 0.8021 - val_loss: 0.2295 - val_acc: 0.7325
Epoch 3/20
0s - loss: 0.1657 - acc: 0.7938 - val_loss: 0.2243 - val_acc: 0.7737
Epoch 4/20
0s - loss: 0.1847 - acc: 0.7969 - val_loss: 0.2253 - val_acc: 0.7490
Epoch 5/20
0s - loss: 0.1771 - acc: 0.8062 - val_loss: 0.2402 - val_acc: 0.7407
Epoch 6/20
0s - loss: 0.1789 - acc: 0.8021 - val_loss: 0.2431 - val_acc: 0.7407
Epoch 7/20
0s - loss: 0.1789 - acc: 0.8031 -...
Suppose I have a Tensorflow tensor. How do I get the dimensions (shape) of the tensor as integer values? I know there are two methods, tensor.get_shape() and tf.shape(tensor), but I can't get the shape values as int32 integer values.
For example, below I've created a 2-D tensor, and I need to get the number of rows and columns as int32 so that I can call reshape() to create a tensor of shape (num_rows * num_cols, 1). However, the method tensor.get_shape() returns values as Dimension type, not int32.
import tensorflow as tf
import numpy as np
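The original snippet is truncated after the imports; for illustration, a minimal sketch of both routes on a made-up 2-D tensor:

import tensorflow as tf
import numpy as np

t = tf.constant(np.zeros((3, 4)))

# Static shape: known at graph-construction time; as_list() yields plain Python ints.
num_rows, num_cols = t.get_shape().as_list()
reshaped = tf.reshape(t, [num_rows * num_cols, 1])

# Dynamic shape: tf.shape returns an int32 tensor, evaluated at run time.
with tf.Session() as sess:
    print(sess.run(tf.shape(t)))   # [3 4]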
I'd like to take data of the form
before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
attr type
1 1 foo_and_bar
2 30 foo_and_bar_2
3 4 foo_and_bar
4 6 foo_and_bar_2
and use split() on the column "type" from above to get something like this:
attr type_1 type_2
1 1 foo bar
2 30 foo bar_2
3 4 foo bar
4 6 foo bar_2
I came up with something unbelievably complex involving some form of apply that worked, but I've since misplaced it. It seemed far too complicated to be the best way. I can use strsplit as below, but then it's unclear how to get that back into two columns in the data frame.
> strsplit(as.character(before$type),'_and_')
"foo" "bar"
"foo" "bar_2"
"foo" "bar"
"foo" "bar_2"
Thanks for any pointers. I've not quite grokked R lists just yet.
I was wondering if it was possible to save a partly trained Keras model and continue the training after loading the model again.
The reason for this is that I will have more training data in the future and I do not want to retrain the whole model again.
The functions which I am using are:
#Partly train model
model.fit(first_training, first_classes, batch_size=32, nb_epoch=20)
#Save partly trained model
model.save('partly_trained.h5')
#Load partly trained model
from keras.models import load_model
model = load_model('partly_trained.h5')
#Continue training
model.fit(second_training, second_classes, batch_size=32, nb_epoch=20)
Edit 1: added fully working example
With the first dataset, after 10 epochs, the loss of the last epoch will be 0.0748 and the accuracy 0.9863.
After saving, deleting, and reloading the model, the loss and accuracy of the model trained on the second dataset will be 0.1711 and 0.9504 respectively.
Is this caused by the new training data or by a completely re-trained model?
"""
Model...