I am totally new to machine learning and I have been working with unsupervised learning techniques.
My sample data (after all cleaning) is shown in this screenshot: Sample Data
I have these two pipelines built to clean the data:
num_attribs = list(housing_num)
cat_attribs =
print(type(num_attribs))
num_pipeline = Pipeline()
cat_pipeline = Pipeline()
Then I took the union of these two pipelines; the code for that is shown below:
from sklearn.pipeline import FeatureUnion
full_pipeline = FeatureUnion(transformer_list=)
Now I am trying to call fit_transform on the data, but it's showing me an error.
Code for Transformation:
housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared
Error message:
fit_transform() takes 2 positional arguments but 3 were given
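This error is usually raised when a transformer inside one of the pipelines (classically `LabelBinarizer` in the housing example) has a `fit_transform(self, X)` signature, while `Pipeline` always calls `fit_transform(X, y)`, i.e. three positional arguments counting `self`. A sketch of a working union, under the assumption that the stripped pipeline definitions looked roughly like the book's housing example; the column names, toy data, and `DataFrameSelector` helper here are illustrative stand-ins, not your actual code:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy frame standing in for the cleaned housing data.
housing = pd.DataFrame({
    "median_income": [1.5, 3.2, 2.7, 5.1],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"],
})

class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select a subset of DataFrame columns and return a NumPy array."""
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

num_attribs = ["median_income"]
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([("selector", DataFrameSelector(num_attribs))])
# OneHotEncoder accepts fit_transform(X, y), so it works inside a Pipeline,
# unlike LabelBinarizer, whose fit_transform only takes (self, X).
cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_attribs)),
    ("encoder", OneHotEncoder()),
])

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

housing_prepared = full_pipeline.fit_transform(housing)
print(housing_prepared.shape)  # (4, 4): 1 numeric column + 3 one-hot columns
```

If your pipeline does use `LabelBinarizer`, replacing it with `OneHotEncoder` (or wrapping it in a small transformer that ignores `y`) is the usual fix.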
I'm having an issue filtering my result DataFrame with an or condition. I want my result df to contain all column var values that are above 0.25 or below -0.25. The logic below gives me an ambiguous truth value error, yet it works when I split the filtering into two separate operations. What is happening here? I'm not sure where to use the suggested a.empty, a.bool(), a.item(), a.any() or a.all().
result = result[(result > 0.25) or (result < -0.25)]
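The problem is Python's `or` itself: it asks for the truth value of an entire Series, which pandas refuses to define, hence the "ambiguous truth value" error. The elementwise operator `|` combines the two boolean masks instead. A minimal sketch with made-up values:

```python
import pandas as pd

result = pd.DataFrame({"var": [0.3, -0.4, 0.1, 0.26, -0.25]})

# `or` would try to collapse each whole mask to a single True/False.
# `|` compares the masks element by element; the parentheses around each
# comparison are required because | binds more tightly than > and <.
filtered = result[(result["var"] > 0.25) | (result["var"] < -0.25)]
print(filtered["var"].tolist())  # [0.3, -0.4, 0.26]
```

The suggested `a.any()` / `a.all()` are for the different situation where you genuinely want to reduce a mask to one boolean.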
I'm going through the ML class on Coursera on logistic regression and also the Manning book Machine Learning in Action, and I'm trying to learn by implementing everything in Python. I'm not able to understand the difference between the cost function and the gradient. There are examples on the net where people compute the cost function, and then there are places where they don't and just go with the gradient descent update w := w - alpha * grad(J(w)). What is the difference between the two, if any?
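The short answer: the cost J(w) is the scalar you want to minimize; the gradient is its derivative with respect to w, i.e. the direction of steepest increase. Gradient descent only ever needs the gradient, which is why some implementations never compute the cost explicitly (it is still implicitly being minimized; computing it is useful for monitoring). A small sketch with hypothetical data showing that stepping along the negative gradient lowers the cost, and that the gradient matches the numerical slope of the cost:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, X, y):
    """Logistic-regression cost J(w): average cross-entropy."""
    h = sigmoid(X @ w)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(w, X, y):
    """Analytic gradient of J(w): the derivative gradient descent follows."""
    h = sigmoid(X @ w)
    return X.T @ (h - y) / len(y)

# Tiny hypothetical dataset (first column is the intercept term).
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0], [1.0, -0.5]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.zeros(2)

# One gradient-descent step lowers the cost: gradient and cost are not
# two competing objectives but a function and its slope.
before = cost(w, X, y)
w_new = w - 0.1 * gradient(w, X, y)
after = cost(w_new, X, y)
```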
This is my problem: I am doing Assignment 2 of the Coursera course on Applied Data Science in Python.
Question 1 Which country has won the most gold medals in summer games? This function should return a single string value.
This my code:
def answer_one():
return df[df == df.index(0)
answer_one()
This is the error which I am getting:
NameError: name 'df' is not defined
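A `NameError` for `df` means the DataFrame was never created in the current session: the notebook cell that reads the Olympics CSV and assigns `df` was not run (or the kernel was restarted afterwards), so run that cell before calling `answer_one()`. Once `df` exists, a common way to answer this kind of question is `idxmax()`, which returns the index label of the largest value. A sketch with a hypothetical stand-in frame, since the real one comes from the assignment's CSV:

```python
import pandas as pd

# Hypothetical stand-in for the assignment's Olympics DataFrame;
# in the notebook, df is built from the CSV in an earlier cell.
df = pd.DataFrame(
    {"Gold": [10, 25, 7]},
    index=["Algeria", "United States", "Argentina"],
)

def answer_one():
    # idxmax() returns the index label (here, the country) of the max value.
    return df["Gold"].idxmax()

print(answer_one())  # "United States"
```

The column name `"Gold"` is an assumption based on the question text; check the actual column names in your frame.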
I'm working on a machine learning problem and want to use linear regression as the learning algorithm. I have implemented two different methods to find the parameters theta of the linear regression model: gradient (steepest) descent and the normal equation. On the same data they should both give approximately equal theta vectors. However, they do not.
Both theta vectors are very similar on all elements but the first one, which is the one multiplying the column of all 1s added to the data.
Here is how the thetas look (first column is the output of gradient descent, second the output of the normal equation):
Grad desc    Norm eq
-237.7752     -4.6736
  -5.8471     -5.8467
   9.9174      9.9178
   2.1135      2.1134
  -1.5001     -1.5003
 -37.8558    -37.8505
  -1.1024     -1.1116
 -19.2969    -19.2956
  66.6423     66.6447
 297.3666    296.7604
-741.9281   -744.1541
 296.4649    296.3494
 146.0304    144.4158
  -2.9978     -2.9976
  -0.8190     -0.8189
What can cause the difference in theta(1, 1) returned by gradient descent compared to theta(1, 1) returned by the normal equation? Do I have a bug in my implementation?
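A pattern like this (everything matches except the intercept) is typically not a bug but under-convergence: with unscaled features, the intercept is the slowest coordinate of gradient descent to converge, so stopping too early leaves it far from the exact least-squares value while the other coefficients already look right. A sketch on synthetic data (all names and numbers here are illustrative) reproducing the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 100, n)            # deliberately unscaled feature
X = np.column_stack([np.ones(n), x])  # column of 1s for the intercept
y = 3.0 + 0.5 * x + rng.normal(0, 0.1, n)

# Normal equation: the exact least-squares solution in one solve.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

def grad_desc(X, y, alpha, iters):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * X.T @ (X @ theta - y) / len(y)
    return theta

# Few iterations: the slope is close but the intercept is way off,
# exactly the mismatch in the table above. Many iterations (or feature
# scaling, which conditions the problem) closes the gap.
theta_short = grad_desc(X, y, alpha=2e-4, iters=100)
theta_long = grad_desc(X, y, alpha=2e-4, iters=200_000)
```

So the first things to check are your convergence criterion (run far more iterations and see if theta(1) drifts toward the normal-equation value) and whether you are normalizing the features.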
I'm trying to make some plots in Python 3 for a data science project, and I'm having an issue where there is no color behind the text on my axes when I save the figure. Here's my code with an example plot:
plt.plot(play_num_2019, home_prob_2019, color = getColor(home_teams_2019))
plt.plot(play_num_2019, away_prob_2019, color = getColor(away_teams_2019))
plt.xlabel("Play Number")
plt.ylabel("Win Probability")
plt.legend([home_teams_2019, away_teams_2019])
fig = plt.figure()
fig.patch.set_facecolor('xkcd:white')
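The likely issue is ordering: calling `plt.figure()` *after* plotting creates a brand-new, empty figure, so the facecolor is applied to a figure that contains nothing while the real plot keeps its default (transparent-on-save) background. Depending on the matplotlib version, `savefig` can also override the figure facecolor unless told otherwise. A sketch with made-up stand-in data (the `*_2019` names below are placeholders for your series):

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the win-probability series.
play_num_2019 = [1, 2, 3, 4]
home_prob_2019 = [0.5, 0.6, 0.55, 0.7]
away_prob_2019 = [0.5, 0.4, 0.45, 0.3]

# Create the figure *before* plotting, not after.
fig = plt.figure()
fig.patch.set_facecolor("white")
plt.plot(play_num_2019, home_prob_2019)
plt.plot(play_num_2019, away_prob_2019)
plt.xlabel("Play Number")
plt.ylabel("Win Probability")
plt.legend(["Home", "Away"])

# Pass the facecolor to savefig explicitly so saving can't override it.
buf = io.BytesIO()
fig.savefig(buf, format="png", facecolor=fig.get_facecolor())
```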
I have five text files that I input to a CountVectorizer. When specifying min_df and max_df on the CountVectorizer instance, what does the min/max document frequency exactly mean? Is it the frequency of a word within its particular text file, or is it the frequency of the word across the entire corpus (the 5 txt files)?
How is it different when min_df and max_df are provided as integers versus as floats?
The documentation doesn't seem to provide a thorough explanation, nor does it supply an example demonstrating the use of min_df and/or max_df. Could someone provide an explanation or an example?
I cannot find any resources about whether one of the following three methods for getting a list of column names is preferred over the others. The first and simplest seems to work with my current example. Is there any reason I should not use it?
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(5,3))
>>> df.columns
RangeIndex(start=0, stop=3, step=1)
>>> list(df.columns)
>>> df.columns.get_values().tolist()
>>> list(df.columns.get_values())
Update
Performance-related answer here: https://stackoverflow.com/a/27236748/605328
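One caveat worth knowing: `Index.get_values()` was deprecated and later removed in modern pandas, so the last two variants above no longer run on recent versions. The surviving spellings all return the same plain Python list, and `list(df.columns)` is perfectly fine. A quick check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 3))

# All three produce the same plain list of column labels; .tolist() on
# the Index is the usual replacement for the removed get_values() idiom.
a = list(df.columns)
b = df.columns.tolist()
c = df.columns.values.tolist()
```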
What is the benefit of using gradient descent in the linear regression setting? It looks like we can solve the problem (finding the theta_0..theta_n that minimize the cost function) with an analytical method, so why do we still want to use gradient descent to do the same thing?
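The analytical route is the normal equation, which requires solving an n x n linear system built from X^T X, roughly O(n^3) in the number of features n; gradient descent costs O(mn) per step, scales to huge n, works in streaming/mini-batch settings, and generalizes to models with no closed-form solution (logistic regression, neural networks). A sketch of the analytical solve on synthetic noise-free data, where it recovers theta exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 5
X = rng.normal(size=(m, n))
true_theta = np.arange(1.0, n + 1)   # hypothetical ground-truth parameters
y = X @ true_theta                   # noise-free targets for illustration

# Normal equation theta = (X^T X)^{-1} X^T y, computed as one linear
# solve: exact, but the O(n^3) solve dominates once n is very large.
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

So for a modest number of features the normal equation is the simpler choice; gradient descent wins when n is large or the model stops being linear.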
I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input, so I won't know how many columns there will be or what they will be called.
For example, if I'm given a DataFrame like this:
>>> my_dataframe
y gdp cap
0 1 2 5
1 2 3 9
2 8 7 2
3 3 4 7
4 6 7 7
5 4 8 3
6 8 2 8
7 9 9 10
8 6 6 4
9 10 10 7
I would get a list like this:
>>> header_list
['y', 'gdp', 'cap']
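Either of the usual spellings works regardless of how many columns the user-supplied frame has. A sketch on a hypothetical reconstruction of the frame above (values abridged):

```python
import pandas as pd

# Abridged stand-in for my_dataframe shown above.
my_dataframe = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# .columns.tolist() is the explicit form; iterating a DataFrame directly
# also yields its column labels, so list(df) is an equivalent shorthand.
header_list = my_dataframe.columns.tolist()
also_header_list = list(my_dataframe)
```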
I have been driving myself crazy trying to install xgboost in Python on Windows 10. I have looked through several suggested articles but still can't seem to find a proper solution. If anyone has done this before, kindly share your method; other suggestions are also welcome.
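For some time now, xgboost has published prebuilt Windows wheels on PyPI, so a plain pip install is usually all that is needed; the older guides describing a manual MinGW/Visual Studio build of the C++ sources mostly predate those wheels. A sketch of the straightforward path (assuming a 64-bit Python, which the wheels target):

```shell
# Upgrade pip first so it can find and use the binary wheel.
python -m pip install --upgrade pip
python -m pip install xgboost

# Verify the install worked.
python -c "import xgboost; print(xgboost.__version__)"
```

If pip still tries to build from source, it is worth checking that the Python interpreter is 64-bit and reasonably recent, since wheels are only published for supported versions.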
I am developing an application in Python which gives job recommendations based on an uploaded resume. I am trying to tokenize the resume before processing it further, and I want to tokenize groups of words. For example, "Data Science" is a single keyword, but when I tokenize I get "data" and "science" separately. How do I overcome this? Is there a library in Python which does this kind of extraction?
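One option is NLTK's `MWETokenizer`, which merges predefined multi-word expressions into single tokens after ordinary word tokenization. A sketch where the phrase list is a hypothetical skills vocabulary; in the real application it might come from a curated keyword file:

```python
from nltk.tokenize import MWETokenizer

# Merge these multi-word expressions into one token each; the phrase
# list here is an illustrative stand-in for a real skills vocabulary.
tokenizer = MWETokenizer(
    [("data", "science"), ("machine", "learning")],
    separator=" ",
)

words = "experienced in data science and machine learning".split()
tokens = tokenizer.tokenize(words)
# ["experienced", "in", "data science", "and", "machine learning"]
```

For discovering multi-word phrases automatically rather than listing them, gensim's `Phrases` model or character/word n-grams (e.g. `ngram_range` in scikit-learn vectorizers) are common alternatives.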
I have a machine learning classification problem with 80% categorical variables. Must I use one-hot encoding if I want to use a classifier? Can I pass the data to a classifier without the encoding?
I am trying to do the following for feature selection:
I read the train file:
num_rows_to_read = 10000
train_small = pd.read_csv("../../dataset/train.csv", nrows=num_rows_to_read)
I change the type of the categorical features to 'category':
non_categorial_features =
for categorical_feature in list(train_small.columns):
if categorical_feature not in non_categorial_features:
train_small[categorical_feature] = train_small[categorical_feature].astype('category')
I use one hot encoding
train_small_with_dummies = pd.get_dummies(train_small, sparse=True)
The problem is that the third part often gets stuck, although I am using a strong machine.
Thus, without the one hot encoding I can't do any feature selection, for determining the importance of the features.
What do you recommend?
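If `get_dummies` stalls, it is usually because high-cardinality columns explode into a huge number of dummy columns. One alternative is to skip one-hot encoding entirely and map each category to its integer code; tree-based classifiers (random forests, gradient boosting) generally cope well with such codes, whereas linear models typically still need one-hot. A sketch with a hypothetical small frame standing in for `train_small` (column names are made up):

```python
import pandas as pd

# Hypothetical stand-in for train_small.
train_small = pd.DataFrame({
    "city": ["NY", "LA", "NY", "SF"],
    "device": ["mobile", "desktop", "mobile", "mobile"],
    "clicks": [3, 1, 4, 2],
})
non_categorial_features = ["clicks"]

# Replace each categorical column with its integer category codes:
# one column in, one column out, so memory stays flat regardless of
# how many distinct categories a column has.
for col in train_small.columns:
    if col not in non_categorial_features:
        train_small[col] = train_small[col].astype("category").cat.codes
```

With the codes in place, feature-importance-based selection (e.g. from a random forest) can run without ever materializing the dummy matrix.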
I'm trying to predict age from a given picture. I built the model below, but the problem is that I'm getting a very large loss value with low accuracy while fitting the model. I think the problem is choosing the wrong loss function (here mean_squared_error). What can the problem be here?
import tensorflow as tf
from tensorflow import keras
X = X.reshape(-1, image_size, image_size, 1)
model = keras.models.Sequential()
model.add(keras.layers.Conv2D(32, (5, 5), activation='relu', input_shape=X.shape))
model.add(keras.layers.MaxPooling2D((2, 2)))
model.add(keras.layers.Conv2D(32, (3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D(2, 2))
model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(60, activation='relu'))
model.add(keras.layers.Dropout(0.4))
model.add(keras.layers.Dense(1, activation='softmax'))
model.compile(optimizer='adam', loss=keras.losses.mean_squared_error, metrics=)
model.fit(X, Y, epochs=170, shuffle=True, ...)
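The loss function is not the main culprit here; mean squared error is a reasonable choice for age regression. The fatal line is `Dense(1, activation='softmax')`: softmax normalizes its inputs to sum to 1, so with a single output unit the network outputs exactly 1.0 no matter what its weights are, and nothing can be learned. A regression head needs a linear activation (`Dense(1)` with no activation). Two smaller issues: `input_shape` should be the per-sample shape `X.shape[1:]`, not `X.shape` (which includes the batch dimension), and 'accuracy' is not a meaningful metric for regression. A quick numpy check of the softmax point:

```python
import numpy as np

def softmax(logits):
    """Standard softmax: exponentiate, then normalize to sum to 1."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

# With one output unit there is only one logit, so normalization forces
# the output to 1.0 regardless of what the network computed.
outputs = [softmax(np.array([z]))[0] for z in (-50.0, 0.0, 3.7, 100.0)]
```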
When I train my neural network with Theano or TensorFlow, they report a variable called "loss" per epoch.
How should I interpret this variable? Is higher loss better or worse, and what does it mean for the final performance (accuracy) of my neural network?
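Lower loss is better: the loss is the quantity the optimizer is explicitly minimizing, a measure of how far the network's predictions are from the targets on the training data. It is on its own scale (not a percentage), so falling loss during training is the good sign, and it usually, but not always perfectly, tracks rising accuracy. A tiny sketch with cross-entropy, a common classification loss, on made-up predicted probabilities:

```python
import numpy as np

def cross_entropy(p_true_class):
    """Loss for one sample: -log of the probability given to the true class."""
    return -np.log(p_true_class)

# The more probability the network assigns to the correct class, the
# lower the loss -- confident correct beats unsure beats confident wrong.
confident_correct = cross_entropy(0.9)
unsure = cross_entropy(0.5)
confident_wrong = cross_entropy(0.1)
```

Accuracy only asks whether the argmax class is right, while the loss also rewards confidence, which is why the two numbers can move slightly out of step.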