I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the DataFrame has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'd rather have one big LabelEncoder object that works across all my columns of data. Throwing the entire DataFrame into LabelEncoder produces the error below. Please bear in mind that I'm using dummy data here; in actuality I'm dealing with about 50 columns of string-labeled data, so I need a solution that doesn't reference any columns by name.
import pandas
from sklearn import preprocessing
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/label.py", line 103, in fit
    y = column_or_1d(y, warn=True)
  File "/Users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 306, in ...
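For reference, a minimal reproduction of the attempt described above, using dummy data (the column names here are illustrative only). The last line shows one commonly suggested workaround: it still fits one encoder per column under the hood, but it never references a column by name.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# dummy two-column frame standing in for the real ~50 columns
df = pd.DataFrame({"pets": ["cat", "dog", "cat"],
                   "owner": ["Champ", "Ron", "Brick"]})

le = LabelEncoder()
try:
    le.fit(df)            # fit expects a 1-D array of labels, not a 2-D frame
except ValueError as e:
    print(e)              # newer sklearn versions raise ValueError here

# per-column encoding in one line, without naming any column
encoded = df.apply(LabelEncoder().fit_transform)
print(encoded)
```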
I don't understand which accuracy in the output to use to compare my 2 Keras models to see which one is better.
Do I use the "acc" (from the training data) or the "val_acc" (from the validation data)?
There are different acc and val_acc values for each epoch. How do I know the acc or val_acc for my model as a whole? Do I average all of the epochs' acc or val_acc values to find the acc or val_acc of the model as a whole?
Model 1 Output:
Train on 970 samples, validate on 243 samples
Epoch 1/20
0s - loss: 0.1708 - acc: 0.7990 - val_loss: 0.2143 - val_acc: 0.7325
Epoch 2/20
0s - loss: 0.1633 - acc: 0.8021 - val_loss: 0.2295 - val_acc: 0.7325
Epoch 3/20
0s - loss: 0.1657 - acc: 0.7938 - val_loss: 0.2243 - val_acc: 0.7737
Epoch 4/20
0s - loss: 0.1847 - acc: 0.7969 - val_loss: 0.2253 - val_acc: 0.7490
Epoch 5/20
0s - loss: 0.1771 - acc: 0.8062 - val_loss: 0.2402 - val_acc: 0.7407
Epoch 6/20
0s - loss: 0.1789 - acc: 0.8021 - val_loss: 0.2431 - val_acc: 0.7407
Epoch 7/20
0s - loss: 0.1789 - acc: 0.8031 - ...
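For scale, a plain-Python sketch using the val_acc values printed in the log above (epochs 1-6). The usual convention is to compare models on validation accuracy, and to summarize a training run by the best (or final) epoch's val_acc rather than an average over epochs:

```python
# val_acc per epoch, copied from the Model 1 log above (epochs 1-6)
val_acc = [0.7325, 0.7325, 0.7737, 0.7490, 0.7407, 0.7407]

# single-number summary: the best validation accuracy and the epoch it occurred
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i])
print(best_epoch + 1, val_acc[best_epoch])  # epoch 3, 0.7737
```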
I'm facing an issue with allocating huge arrays in NumPy on Ubuntu 18, while not facing the same issue on macOS.
I am trying to allocate memory for a NumPy array with shape (156816, 36, 53806) with
I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with
np.zeros((156816, 36, 53806), dtype='uint8')
I'm getting an error on Ubuntu:
>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (156816, 36, 53806) and data type uint8
I'm not getting it on macOS:
>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)
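For scale, the requested array works out to roughly 283 GiB, which makes the MemoryError unsurprising. The difference between the two systems is commonly attributed to how each OS overcommits virtual memory for pages that are never touched; that explanation is not verified here, but the arithmetic below is:

```python
# size of np.zeros((156816, 36, 53806), dtype='uint8'): one byte per element
n_bytes = 156816 * 36 * 53806
print(n_bytes)            # total bytes requested
print(n_bytes / 2**30)    # ~282.9 GiB
```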
I am training on 970 samples and validating on 243 samples.
How big should batch size and number of epochs be when fitting a model in Keras to optimize the val_acc? Is there any sort of rule of thumb to use based on data input size?
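As a concrete illustration of how batch size interacts with the 970 training samples mentioned above (plain arithmetic, not a tuning recommendation): the batch size determines how many gradient updates happen per epoch.

```python
import math

n_train = 970  # training samples from the question
steps = {bs: math.ceil(n_train / bs) for bs in (16, 32, 64)}
for bs, s in steps.items():
    print(f"batch_size={bs}: {s} gradient updates per epoch")
```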
I am totally new to Machine Learning and I have been working with unsupervised learning techniques.
The image shows my sample data (after all cleaning). Screenshot: Sample Data
I have these two pipelines built to clean the data:
num_attribs = list(housing_num)
cat_attribs = [...]
print(type(num_attribs))
num_pipeline = Pipeline([...])
cat_pipeline = Pipeline([...])
Then I did the union of these two pipelines; the code for that is shown below:
from sklearn.pipeline import FeatureUnion
full_pipeline = FeatureUnion(transformer_list=[...])
Now I am trying to do fit_transform on the data, but it's showing me the error below.
Code for Transformation:
housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared
Error message:
fit_transform() takes 2 positional arguments but 3 were given
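This error typically means that some transformer inside the pipeline defines fit_transform with only (self, X), while Pipeline and FeatureUnion call it as fit_transform(X, y). A minimal standalone sketch of that mismatch (the class name here is made up for illustration):

```python
class TwoArgTransformer:
    # fit_transform accepts only X, but sklearn pipelines pass (X, y)
    def fit_transform(self, X):
        return X

t = TwoArgTransformer()
try:
    t.fit_transform([[1], [2]], [0, 1])  # called the way a Pipeline would call it
except TypeError as e:
    msg = str(e)
    print(msg)  # e.g. "fit_transform() takes 2 positional arguments but 3 were given"
```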
I don't know if this is the right place to ask this question, but a community dedicated to Data Science should be the most appropriate place, in my opinion.
I have just started with Data Science and Machine learning. I am looking for long term project ideas which I can work on for like 8 months.
A mix of Data Science and Machine learning would be great.
A project big enough to help me understand the core concepts and also implement them at the same time would be very beneficial.
I'm trying to follow some of the best practices of the "open science" movement. In my thesis, I've performed all of the analyses in R (a non-proprietary, open-source program for analyzing data), and my datasets are in the non-proprietary CSV format.
I would like to be as transparent as possible, by sharing my datasets and R analysis/code files with my thesis committee, and ultimately with the public once my thesis is finalized and placed in a repository. How can I best do this?
I was thinking about uploading my files to the Open Science Framework (http://osf.io) and citing them with a regular HTTPS link. Once my thesis is finalized, I would then "freeze" them on the OSF website (as I understand, this would prevent post-hoc changes), then get a DOI that points to the frozen files and cite that.
Are there any better options?
I am currently working as a data scientist. I am planning to appear for a few data-science interviews in the distant future, and I am aiming to have in-depth statistical knowledge (at par with Statistics grads) as well as machine-learning knowledge. Can you suggest the best set of books or videos to prepare myself?