I'm having trouble with some of the concepts in machine learning through neural networks. One of them is backpropagation. In the weight updating equation,
delta_w = a*(t - y)*g'(h)*x
t is the "target output", which would be your class label, or something similar, in the case of supervised learning. But what would the "target output" be for unsupervised learning? Can someone kindly provide an example of how you'd use BP in unsupervised learning, specifically for clustering or classification? Thanks in advance.
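For reference, here is how I understand the supervised version of this update for a single logistic unit (a minimal sketch; the numbers and the choice of g are my own illustration):

import numpy as np

def g(h):
    # logistic activation, one common choice for g
    return 1.0 / (1.0 + np.exp(-h))

a = 0.1                              # learning rate
x = np.array([0.5, -0.2])            # input vector
w = np.array([0.3, 0.8])             # current weights
t = 1.0                              # target output (the class label here)

h = w @ x                            # weighted input to the unit
y = g(h)                             # unit output
g_prime = y * (1.0 - y)              # g'(h), using the logistic identity
delta_w = a * (t - y) * g_prime * x  # the update rule from above
w = w + delta_w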
From my understanding, HBase is the Hadoop database and Hive is the data warehouse.
Hive allows you to create tables and store data in them; you can also map your existing HBase tables to Hive and operate on them.
Why should we use HBase if Hive does all that? Can we use Hive by itself? I'm confused :(
I am trying to collect data from a .txt file and add it into a matrix in Matlab for plotting purposes, but there seems to be an error when collecting the data. It seems to be happening with the time record.
I am using textscan with the format string shown below, and get the following error:
Error using textscan
Unable to parse the format character vector at position 16 ==> %{HH:MM:SS}T %f %f %f %f %f %f %f %d %d %d %f %f %f %f %f %f %f %f %f %f %f %f %f %f...
I am learning RPA using tools such as UiPath and BluePrism. Can someone explain to me what surface automation (SA) techniques are in RPA, or direct me to where I can read more about SA techniques?
How does surface automation help in automating Flash objects used in websites?
Thanks, vds1
This question pops up in my mind because in many developing countries internet connectivity is at present very poor, or there is no connectivity at all, while the customer base is very large. In this situation, how can IoT help in making life easier?
How is the convolution operation carried out when multiple channels are present at the input layer? (e.g. RGB)
After doing some reading on the architecture/implementation of a CNN, I understand that each neuron in a feature map references NxM pixels of an image, as defined by the kernel size. Each pixel is then factored by the feature map's learned NxM weight set (the kernel/filter), summed, and input into an activation function. For a simple greyscale image, I imagine the operation would adhere to the following pseudocode:
for i in range(0, image_width - kernel_width + 1):
    for j in range(0, image_height - kernel_height + 1):
        sum = 0.0
        for x in range(0, kernel_width):
            for y in range(0, kernel_height):
                sum += kernel[x][y] * image[i + x][j + y]
        feature_map[i][j] = act_func(sum)
However, I don't understand how to extend this model to handle multiple channels. Are three separate weight sets required per feature map, shared between each colour?
Referencing this tutorial's...
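For comparison, here is a small runnable sketch (my own, in NumPy) of what I believe the extension looks like: a single kernel per feature map that spans all input channels, with the per-channel products summed into one scalar before the activation, rather than three separate weight sets:

import numpy as np

num_channels = 3                                  # e.g. RGB
image = np.random.rand(8, 8, num_channels)        # hypothetical input image
kernel = np.random.rand(3, 3, num_channels)       # one kernel spanning all channels
act_func = np.tanh                                # stand-in activation function
kh, kw = kernel.shape[:2]
feature_map = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))

for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):
        # sum over height, width, AND channel before the activation
        feature_map[i, j] = act_func(np.sum(kernel * image[i:i + kh, j:j + kw, :]))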
Is it possible to delete or insert a step in a sklearn.pipeline.Pipeline object? I am trying to do a grid search with or without one step in the Pipeline object, and I am wondering whether I can insert or delete a step in the pipeline. I saw in the Pipeline source code that there is a self.steps object holding all the steps, and we can get the steps via named_steps(). Before modifying it, I want to make sure I do not cause unexpected effects. Here is an example code:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
clf = Pipeline(estimators)
clf
Is it possible that we do something like steps = clf.named_steps(), then insert or delete in that list? Does this cause undesired effects on the clf object?
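To make the question concrete, this is the kind of edit I mean (a sketch only; whether it is safe is exactly what I am asking):

# Pipeline.steps is a plain list of (name, estimator) tuples
clf.steps.pop(0)                            # remove the PCA step
clf.steps.insert(0, ('reduce_dim', PCA()))  # insert a (fresh) step at the front
print([name for name, _ in clf.steps])      # ['reduce_dim', 'clf']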
What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?
In my opinion, 'VALID' means there will be no zero padding outside the edges when we do max pool.
According to A guide to convolution arithmetic for deep learning, there will be no padding in the pooling operator, i.e. just use 'VALID' in tensorflow. But what is the 'SAME' padding of max pool in tensorflow?
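To make the difference concrete, here is a small example (mine) on a 5x5 input with a 2x2 window and stride 2; with 'VALID' the output size is floor((5 - 2)/2) + 1 = 2, while 'SAME' pads so that the output size is ceil(5/2) = 3:

import tensorflow as tf

x = tf.reshape(tf.range(25, dtype=tf.float32), [1, 5, 5, 1])  # batch, height, width, channels
valid = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
same = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(valid.shape)  # (1, 2, 2, 1): the window never leaves the input edges
print(same.shape)   # (1, 3, 3, 1): padding added so every input column is covered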
I know this is not a new concept by any stretch in R, and I have browsed the High Performance and Parallel Computing Task View. With that said, I am asking this question from a point of ignorance, as I have no formal training in computer science and am entirely self-taught.
Recently I collected data from the Twitter Streaming API and currently the raw JSON sits in a 10 GB text file. I know there have been great strides in adapting R to handle big data, so how would you go about this problem? Here are just a handful of the tasks that I am looking to do:
Read and process the data into a data frame
Basic descriptive analysis, including text mining (frequent terms, etc.)
Plotting
Is it possible to use R entirely for this, or will I have to write some Python to parse the data and throw it into a database, in order to take random samples small enough to fit into R?
Simply, any tips or pointers that you can provide will be greatly appreciated. Again, I won't take offense if you describe solutions at a 3rd...
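For reference, this is the sort of Python pre-processing pass I have in mind (a sketch only; the file name and the tweet fields I keep are hypothetical):

import json
import sqlite3

conn = sqlite3.connect('tweets.db')
conn.execute('CREATE TABLE IF NOT EXISTS tweets (created_at TEXT, text TEXT)')
with open('tweets.json', 'r', encoding='utf-8') as f:
    for line in f:                         # stream the 10 GB file line by line
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue                       # skip malformed records
        conn.execute('INSERT INTO tweets VALUES (?, ?)',
                     (tweet.get('created_at'), tweet.get('text')))
conn.commit()
conn.close()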
Is there an easy way to transfer to a tableau format (which only keeps the parent level once)? I understand that the target format is not good for data science, but it makes the report easy to read, and I can easily write it to Excel and send the report to my boss.
library(data.table)
(dt <- data.table(Parent_Product=c("A","A","A","B","B","B"),
                  Sub_Product=c("red","red","blue","yellow","pink","pink"),
                  Sub_Product1=c(1,2,3,4,5,6),
                  Value=c(100,200,300,400,500,600)))
#> Parent_Product Sub_Product Sub_Product1 Value
#> 1: A red 1 100
#> 2: A red 2 200
#> 3: A blue 3 300
#> 4: B yellow 4 400
#> 5: B pink 5 500
#> 6: B pink 6 600
(target_dt <- data.table(Parent_Product=c("A",NA,NA,"B",NA,NA),
                         Sub_Product=c("red",NA,"blue","yellow","pink",NA),
                         Sub_Product1=c(1,2,3,4,5,6),
                         Value=c(100,200,300,400,500,600)))
I have a large set of vectors in 3 dimensions. I need to cluster these based on Euclidean distance such that all the vectors in any particular cluster have a Euclidean distance between each other less than a threshold "T".
I do not know how many clusters exist. At the end, there may be individual vectors that are not part of any cluster, because their Euclidean distance to every other vector in the space is not less than "T".
What existing algorithms / approach should be used here?
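For reference, one approach I have come across is hierarchical clustering with complete linkage, cutting the dendrogram at T; since complete linkage merges on the maximum pairwise distance, every resulting cluster satisfies the threshold, and outliers fall out as single-point clusters (a sketch with made-up data, assuming scipy):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

T = 0.2                                    # hypothetical threshold
X = np.random.rand(1000, 3)                # stand-in for the 3-D vectors
Z = linkage(X, method='complete')          # merge cost = max pairwise distance
labels = fcluster(Z, t=T, criterion='distance')
print(len(np.unique(labels)), 'clusters found')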
Currently I use the following code:
from keras.callbacks import EarlyStopping

callbacks = [EarlyStopping(monitor='val_loss', patience=2, verbose=0)]
model.fit(X_train.astype('float32'), Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          shuffle=True, verbose=1, validation_data=(X_valid, Y_valid),
          callbacks=callbacks)
It tells Keras to stop training when the loss didn't improve for 2 epochs. But I want to stop training once the loss becomes smaller than some constant "THR":
if val_loss < THR:
break
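Something like this custom callback is what I imagine (a sketch against the older Keras API used above; the class name and threshold value are mine):

from keras.callbacks import Callback

class EarlyStoppingByLossVal(Callback):
    def __init__(self, monitor='val_loss', value=0.001):
        super(EarlyStoppingByLossVal, self).__init__()
        self.monitor = monitor
        self.value = value

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor)
        if current is not None and current < self.value:
            self.model.stop_training = True  # Keras stops after this epoch

callbacks = [EarlyStoppingByLossVal(monitor='val_loss', value=0.001)]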
Is there any possible way to call all the R packages/libraries and functions (packages like raster, rgdal, maptools, etc.) in the .NET framework, so that I am able to access all the features of R and run R scripts from a .NET frontend?
I'm working on an application that requires a great deal of statistical processing and output as images in a .NET desktop application. The problems, including generating the output images, seem like a natural fit for R (http://www.r-project.org/).
Is there a wrapper, API, SDK, or port that will allow me to call R from .net?
I have to deal with very big data (point clouds, generally more than 30,000,000 points) using MATLAB. I can read ASCII data using the textscan function. After reading, I need to detect invalid data (points with 0,0,0 coordinates) and then do some mathematical operations on each point or each line in the data. My current approach: first I read the data with textscan and assign it to a matrix, then I use for loops to detect invalid points and to do the mathematical operations on each point or line. A sample of my code is shown below. According to MATLAB's profile tool, textscan takes 37% and the line
transformed_list((i:i),(1:4)) = coordinate_list((i:i),(1:4))*t_matrix;
takes 35% of all computation time. I tried it with another point cloud (around 5,500,000 points) and the profile tool reported the same results. Is there a way of avoiding the for loops, or another way of speeding up this computation?
fileID = fopen('C:\Users\Mustafa\Desktop\ptx_all_data\dede5.ptx'); ...
I am trying to extract the Experience field from the text. But after converting the PDF to a text file, a few extra lines appear, due to which I am not able to extract the data properly. Below is the text yielded after the conversion. Can someone please tell me how to extract the Experience field from this file?
The below code works perfectly for text files in which there are no blank lines.
with open('E:/cvparser/sampath.txt', 'r', encoding='utf-8') as f:
    exp_summary_flag = False
    exp_summary = ''
    for line in f:
        if line.startswith('EXPERIENCE'):
            exp_summary_flag = True
        elif exp_summary_flag:
            exp_summary += line
            if not line.strip():
                break
print(exp_summary)
Here is the text file which I got after conversion using pdfminer.
Sampath XYZ
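A variant I am experimenting with (an untested sketch; the assumption that the next section heading, e.g. 'EDUCATION', is all-caps is mine) skips the stray blank lines and stops only at the next heading:

exp_summary = ''
exp_flag = False
with open('E:/cvparser/sampath.txt', 'r', encoding='utf-8') as f:
    for line in f:
        stripped = line.strip()
        if stripped.startswith('EXPERIENCE'):
            exp_flag = True
            continue
        if exp_flag:
            if stripped and stripped.isupper():  # assumed: next all-caps heading
                break
            if stripped:                         # ignore the extra blank lines
                exp_summary += stripped + '\n'
print(exp_summary)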
I have some data and the Y variable is a factor - Good or Bad. I am building a support vector machine using the 'train' method from the 'caret' package. Using the 'train' function I was able to finalize the values of the various tuning parameters and get the final support vector machine. For the test data I can predict the 'class', but when I try to predict probabilities for the test data, I get the warning below. (For example, my model tells me that the 1st data point in the test data has y='good', but I want to know the probability of getting 'good'. Generally, a support vector machine will calculate the probability of a prediction; if the Y variable has 2 outcomes, the model will predict the probability of each outcome, and the outcome with the maximum probability is considered the final solution.)
**Warning message: In probFunction(method, modelFit, ppUnk) : kernlab class probability calculations failed; returning NAs**
Sample code below:
library(caret)
trainset <- data.frame(class=factor(c("Good", "Bad", "Good", ...
Right now I'm importing a fairly large CSV as a dataframe every time I run the script. Is there a good solution for keeping that dataframe constantly available in between runs, so I don't have to spend all that time waiting for the script to run?
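One idea I have been considering is caching the parsed dataframe in a binary format between runs (a minimal sketch; file names are hypothetical):

import pandas as pd

try:
    df = pd.read_pickle('data_cache.pkl')  # fast reload on later runs
except FileNotFoundError:
    df = pd.read_csv('large_file.csv')     # slow CSV parse, first run only
    df.to_pickle('data_cache.pkl')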
I have been toying with the idea of creating software "robots" to help with different areas of the development process: repetitive tasks, automatable tasks, etc. I have quite a few ideas about where to begin. My problem is that I work mostly alone, as a freelancer, and work tends to pile up, and I don't like to extend or "blow" deadline dates. I have investigated, and use, quite a few productivity tools. I have investigated code generation and am planning a tool to generate portions of code. I use code-reuse techniques, etc. Has anyone got thoughts about this? Are there any good articles?
Using machine learning in R, when generating a formula like ~., data, what does the . indicate? For example:
fit <- svm(factor(outcome)~., data= train, probability= T)
pre <- predict(fit, test, decision.value= T, probability= T)
I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible as a piece of software for numerous other reasons.
One day I hope to replace my use of SAS with python and pandas, but I currently lack an out-of-core workflow for large datasets. I'm not talking about "big data" that requires a distributed network, but rather files too large to fit in memory but small enough to fit on a hard-drive.
My first thought is to use HDF store to hold large datasets on disk and pull only the pieces I need into dataframes for analysis. Others have mentioned MongoDB as an easier to use alternative. My question is this:
What are some best-practice workflows for accomplishing the following:
Loading flat files into a permanent, on-disk database structure
Querying that database to retrieve data to feed into a pandas data structure
Updating the database after manipulating pieces in pandas
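To sketch my first thought in code (column and file names are hypothetical; pandas' HDFStore requires PyTables):

import pandas as pd

store = pd.HDFStore('dataset.h5')
# 1. load the flat file into a permanent, on-disk, queryable table, chunk by chunk
for chunk in pd.read_csv('flat_file.csv', chunksize=500000):
    store.append('df', chunk, data_columns=['field_of_interest'])
# 2. pull back only the rows that satisfy a condition, sized to fit in memory
subset = store.select('df', where='field_of_interest > 0')
store.close()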
I want to set some of my model frozen. Following the official docs:
import torch
from torch import nn

with torch.no_grad():
    linear = nn.Linear(1, 1)
    linear.eval()
    print(linear.weight.requires_grad)
But it prints True instead of False. If I want to set the model in eval mode, what should I do?
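What I suspect I actually need for the freezing part, rather than no_grad or eval (a sketch; please correct me if this is wrong):

from torch import nn

linear = nn.Linear(1, 1)
for p in linear.parameters():
    p.requires_grad_(False)           # exclude these parameters from autograd
print(linear.weight.requires_grad)    # False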