I am working with a very large data set which I am downloading from an Oracle database. The data frame has about 21 million rows and 15 columns. My OS is Windows XP (32-bit) and I have 2 GB of RAM. Short-term I cannot upgrade my RAM or my OS (it is at work; it will take months before I get a decent PC).
library(RODBC)
sqlQuery(Channel1, "SELECT * FROM table1", stringsAsFactors = FALSE)
Here I already get stuck with the usual "cannot allocate x Mb to vector" error. I found some suggestions about using the ff package. I would appreciate it if anybody familiar with the ff package could tell me whether it would help in my case. Do you know another way to get around the memory problem? Would a 64-bit solution help? Thanks for your suggestions.
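One workaround I am considering is fetching the table in chunks instead of all at once. Here is a minimal sketch of the idea in Python (using the standard sqlite3 module as a stand-in for the Oracle connection; the table and columns are made up, and the same fetchmany() pattern applies to any DB-API driver such as cx_Oracle):

```python
import sqlite3

# Stand-in for the Oracle connection; the chunking pattern is the same
# with any DB-API driver: fetchmany() instead of fetchall().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, value REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(10)])

cursor = conn.execute("SELECT * FROM table1")
total_rows = 0
while True:
    chunk = cursor.fetchmany(4)  # pull a few rows at a time, not the whole table
    if not chunk:
        break
    total_rows += len(chunk)     # process/aggregate each chunk, then discard it

print(total_rows)  # 10
```

The point is that only one chunk is ever resident in memory, so the 21-million-row table never has to fit in 2 GB at once.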
I have a problem where I am trying to create a neural network for Tic-Tac-Toe. However, for some reason, training the neural network causes it to produce nearly the same output for any given input.
I did take a look at Artificial neural networks benchmark, but my network implementation is built for neurons with the same activation function for each neuron, i.e. no constant neurons.
To make sure the problem wasn't just due to my choice of training set (1218 board states and moves generated by a genetic algorithm), I tried to train the network to reproduce XOR. The logistic activation function was used. Instead of using the derivative, I multiplied the error by output*(1-output), as some sources suggested that this was equivalent to using the derivative. I can put the Haskell source on HPaste, but it's a little embarrassing to look at. The network has 3 layers: the first layer has 2 inputs and 4 outputs, the second has 4 inputs and 1 output, and the third has 1 output. Increasing to 4 neurons in the...
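To double-check the "multiply by output*(1-output)" step, I verified numerically that it really equals the derivative of the logistic function (small sketch, independent of my Haskell code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Analytic identity: d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)).
# Compare it against a central finite difference at a few points.
h = 1e-6
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    out = sigmoid(x)
    identity = out * (1.0 - out)
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(identity - numeric) < 1e-8
```

So that substitution is sound, which makes me suspect the weight-update or layer-wiring code instead.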
I have been toying with the idea of creating software "robots" to help with different areas of the development process: repetitive tasks, automatable tasks, etc. I have quite a few ideas about where to begin. My problem is that I work mostly alone, as a freelancer, and work tends to pile up, and I don't like to extend or "blow" deadline dates. I have investigated and used quite a few productivity tools. I have looked into code generation and I am planning a tool to generate portions of code. I use code-reuse techniques, etc. Has anyone thought about this? Are there any good articles?
First of all, sorry for my english skill.
I'm a high school student from South Korea who's doing project with Azure IoT Hub.
I am working on a project where a raspberry pi device is sending values to an Azure IoT Hub. I would like to save this data in Azure Table Storage as this data will be used by some other services (Azure WebApp for example).
So I tried to save the Raspberry Pi values in Azure Table Storage. But when I add endpoints to the IoT Hub, the only option I can use is a Blob Storage container.
Of course, I still don't fully understand IoT Hub, so please bear with me.
In a nutshell
I want to send Raspberry Pi values to Azure Table Storage and not Blob Storage; however, the only option available to me when setting endpoints for Azure IoT Hub is Blob Storage.
How can I send values to Table Storage via Azure IoT Hub?
Or, by any chance, is my approach to Azure completely wrong?
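What I imagine is shaping each device message into a table entity and writing it from something like an Azure Function with an IoT Hub trigger, since as far as I can tell IoT Hub routing itself only targets Blob Storage, Event Hubs, and Service Bus endpoints, which would explain what I am seeing. A sketch of just the shaping step in plain Python (the function name and payload fields are made up; the actual write would go through the Azure Table SDK):

```python
import json
from datetime import datetime, timezone

def to_table_entity(message_body: bytes, device_id: str) -> dict:
    """Shape a raw IoT Hub message into an Azure Table Storage entity.

    Table Storage requires PartitionKey and RowKey on every entity;
    partitioning by device and keying by timestamp is one common layout.
    """
    payload = json.loads(message_body)
    return {
        "PartitionKey": device_id,
        "RowKey": datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f"),
        **payload,
    }

entity = to_table_entity(b'{"temperature": 21.5}', "raspberrypi-01")
print(entity["PartitionKey"])  # raspberrypi-01
```

The resulting dict is what I would then pass to the Table Storage insert call inside the function.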
This is my problem: in the Coursera course on Applied Data Science with Python, I am doing Assignment 2.
Question 1: Which country has won the most gold medals in summer games? This function should return a single string value.
This is my code:
def answer_one():
    return df[df == df.index(0)
answer_one()
This is the error which I am getting:
NameError: name 'df' is not defined
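The NameError itself suggests the cell defining df was never run in this session. For reference, a self-contained version of what I am trying to get to work (with a toy DataFrame standing in for the assignment data, and assuming a 'Gold' column as in the assignment) would be:

```python
import pandas as pd

# Toy stand-in for the Olympics DataFrame from the assignment
df = pd.DataFrame({"Gold": [10, 27, 5]}, index=["Canada", "USA", "Chile"])

def answer_one():
    # idxmax() returns the index label of the row with the most gold medals
    return df["Gold"].idxmax()

print(answer_one())  # USA
```

With the real data, df would come from the assignment's read/cleanup cell, which has to be executed before calling answer_one().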
Situation now: I have a data warehouse job profile that publishes a .txt file into a Data folder every day in the morning. I open a Tableau workbook, which automatically updates the data visualisations because of a union I made. I save this workbook as an extract, and colleagues without Tableau Desktop can view it via Tableau Reader.
What I need: This reporting format is heavily dependent on me and I need to automate this.
Is this even possible without Tableau Server?
I need to ETL data into my Cloud SQL instance. This data comes from API calls. Currently, I'm running custom Java ETL code in Kubernetes with CronJobs that makes requests to collect this data and load it into Cloud SQL. The problem comes with managing the ETL code and monitoring the ETL jobs. The current solution may not scale well when more ETL processes are incorporated. In this context, I need to use an ETL tool.
My Cloud SQL instance contains two types of tables: common transactional tables and tables that contain data that comes from the API. The second type is mostly read-only from an "operational database perspective", and a huge part of those tables are bulk-updated every hour (in batch) to discard the old data and refresh the values.
Considering this context, I noticed that Cloud Dataflow is the ETL tool provided by GCP. However, it seems that this tool is more suitable for big data applications that need to do complex transformations and ingest data in multiple formats. Also, in Dataflow, the...
It seems like R is really designed to handle datasets that it can pull entirely into memory. What R packages are recommended for signal processing and machine learning on very large datasets that cannot be pulled into memory?
If R is simply the wrong way to do this, I am open to other robust free suggestions (e.g. SciPy, if there is some nice way to handle very large datasets).
I am new to this concept and still learning. I have 10 TB of JSON files in total in AWS S3 and 4 instances (m3.xlarge) in AWS EC2 (1 master, 3 workers). I am currently using Spark with Python on Apache Zeppelin. I am reading files with the following command:
hcData = sqlContext.read.option("inferSchema", "true").json(path)
In the Zeppelin interpreter settings:
master = yarn-client
spark.driver.memory = 10g
spark.executor.memory = 10g
spark.cores.max = 4
It takes approximately 1 minute to read 1 GB. What more can I do to read big data more efficiently?
Should I do more on coding?
Should I increase instances?
Should I use another notebook platform?
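One idea I want to try is dropping the inferSchema option, since inferring types forces Spark to scan the data before the real read; supplying an explicit schema avoids that extra pass over 10 TB. Below is a sketch that builds a schema in the JSON shape Spark accepts (pure Python; the field names are made-up placeholders, and the dict would go through pyspark.sql.types.StructType.fromJson and then to sqlContext.read.schema(...)):

```python
import json

# Spark's JSON schema representation: a struct with a list of fields.
# Field names here are hypothetical stand-ins for my real JSON keys.
def make_schema(fields):
    return {
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": True, "metadata": {}}
            for name, dtype in fields
        ],
    }

schema = make_schema([("device_id", "string"), ("value", "double")])
print(json.dumps(schema, indent=2))
```

The read would then become sqlContext.read.schema(StructType.fromJson(schema)).json(path), with no inferSchema option.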
'ascii' codec can't encode character u'\u2019' in position 80: ordinal not in range(128)
Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File ...
I have installed Python versions 3.5 and 3.6, and Anaconda.
The following error occurs when trying to install TensorFlow following the steps here https://www.tensorflow.org/install/install_windows using Anaconda:
(tensorflow) C:> pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.0.1-cp35-cp35m-win_amd64.whl
tensorflow-1.0.1-cp35-cp35m-win_amd64.whl is not a supported wheel on this platform.
As I am new to Python, I do not know how to circumvent this problem. I am using Windows 10, 64-bit.
Thanks a lot and best,
Martin
This simple code, which tries to replace semicolons (at positions specified by i) with colons, does not work:
for i in range(0, len(line)):
    if line[i] == ";" and i in rightindexarray:
        line[i] = ":"
It gives the error
    line[i] = ":"
TypeError: 'str' object does not support item assignment
How can I work around this to replace the semicolons with colons? Using replace does not work, as that function takes no index; there might be some semicolons I do not want to replace.
Example
In the string I might have any number of semicolons, eg "Hei der! ; Hello there ;!;"
I know which ones I want to replace (I have their indices in the string). Using replace does not work, as I'm not able to use an index with it.
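What I am after is something like the following (converting the string to a list, since strings are immutable but lists support item assignment, then joining back):

```python
def replace_at(line, rightindexarray, old=";", new=":"):
    # Strings are immutable, so work on a list of characters instead
    chars = list(line)
    for i in rightindexarray:
        if chars[i] == old:
            chars[i] = new
    return "".join(chars)

s = "Hei der! ; Hello there ;!;"
# Only the semicolons at the chosen indices become colons
print(replace_at(s, [9, 23]))
```

The helper name replace_at is mine; the point is just the list conversion and the per-index check.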
I want to transform the string 'one two three' into one_two_three.
I've tried "_".join('one two three'), but that gives me o_n_e_ _t_w_o_ _t_h_r_e_e_...
how do I insert the "_" only at spaces between words in a string?
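In other words, I want something like this (join iterates over the characters when given a string, which explains the o_n_e output; joining the list of words works):

```python
s = "one two three"
# str.join treats a string as a sequence of characters, so join the
# list of words instead (split() also collapses repeated spaces):
result = "_".join(s.split())
print(result)  # one_two_three

# If only literal single spaces should change, replace() works too:
print(s.replace(" ", "_"))  # one_two_three
```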
I am developing (for my senior project) a dumbbell that is able to classify and record different exercises. The device has to be able to classify a range of these exercises based on the data given by an IMU (Inertial Measurement Unit). I have acceleration, gyroscope, compass, pitch, yaw, and roll data.
I am leaning towards using an Artificial Neural Network in order to do this, but am open to other suggestions as well. Ultimately I want to pass in the IMU data into the network and have it tell me what kind of exercise it is (Bicep curl, incline fly etc...).
If I use an ANN, what kind should I use (recurrent or not) and how should I implement it? I am not sure how to get the network to recognize an exercise when I am passing it a continuous stream of data. I was thinking about constantly performing an FFT on a portion of the inputs and sending a set number of frequency magnitudes into the network, but am not sure whether that will work either. Any suggestions/comments?
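The windowing idea from the last paragraph could be sketched like this (pure Python; window width and step are arbitrary placeholder values, and each window would then be FFT'd or fed to the network):

```python
def sliding_windows(samples, width, step):
    """Split a continuous sensor stream into fixed-size, overlapping windows."""
    return [samples[i:i + width]
            for i in range(0, len(samples) - width + 1, step)]

# Fake 1-D IMU stream; a real one would have several channels per sample
stream = list(range(10))
windows = sliding_windows(stream, width=4, step=2)
print(len(windows))  # 4 windows: [0..3], [2..5], [4..7], [6..9]
```

Each window becomes one fixed-size input for the classifier, which sidesteps the "continuous stream" problem.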
Are there any benchmarks that can be used to check if implementation of ANN is correct?
I want to have some input and output data, and some information like: "The output of a feedforward neural network with 3 layers should be correct on 90% of the test data."
I need this information to be sure that this kind of ANN is able to deal with such a problem.
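As a concrete example of the kind of benchmark I mean, XOR plus an accuracy threshold could look like this (the "model" here is a placeholder; a trained network's thresholded output would take its place):

```python
# XOR truth table as a minimal correctness benchmark for an ANN
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(predict, data):
    """Fraction of test cases a model gets right."""
    correct = sum(1 for inputs, target in data if predict(inputs) == target)
    return correct / len(data)

# Placeholder standing in for a trained network's thresholded output
perfect = lambda inputs: inputs[0] ^ inputs[1]
assert accuracy(perfect, XOR_DATA) >= 0.9  # the 90%-style criterion
```

A benchmark suite would pair each dataset (XOR, iris, MNIST, ...) with a threshold like this.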
I know SVMs are supposedly 'ANN killers' in that they automatically select representation complexity and find a global optimum (see here for some SVM praising quotes).
But here is where I'm unclear -- do all of these claims of superiority hold for just the case of a 2 class decision problem or do they go further? (I assume they hold for non-linearly separable classes or else no-one would care)
So a sample of some of the cases I'd like to be cleared up:
Are SVMs better than ANNs with many classes?
in an online setting?
What about in a semi-supervised case like reinforcement learning?
Is there a better unsupervised version of SVMs?
I don't expect someone to answer all of these lil' subquestions, but rather to give some general bounds for when SVMs are better than the common ANN equivalents (e.g. FFBP, recurrent BP, Boltzmann machines, SOMs, etc.) in practice, and preferably, in theory as well.
I've been using var.test and bartlett.test to check basic ANOVA assumptions, among others homoscedasticity (homogeneity, equality of variances). The procedure is quite simple for one-way ANOVA:
bartlett.test(x ~ g) # where x is numeric, and g is a factor
var.test(x ~ g)
But for 2x2 designs, i.e. two-way ANOVAs, I want to do something like this:
bartlett.test(x ~ c(g1, g2)) # or with a list; see below:
var.test(x ~ list(g1, g2))
Of course, ANOVA assumptions can be checked with graphical procedures, but what about "an arithmetic option"? Is that at all manageable? How do you test homoscedasticity in a two-way ANOVA?
I have a non-public Tableau dashboard that I load through an .aspx file, which supplies the required authentication (username and password) and allows the site visitor to view the Tableau dashboard on the website with the ticket it receives.
So, now, I want the Tableau to load on the website with filters already applied through the Javascript API.
Also, how can I use the "onFirstInteractive" option of the JavaScript API? The problem is that I don't need the JavaScript API to load the Tableau dashboard; I just need it to apply filters on the dashboard.
function tableauFilter() {
    var placeholderDiv = document.getElementById("viz2"); // Don't need this
    var url = "https://public.tableau.com/views/Test_1228/Dashboard1"; // Don't need this either
    var options = {
        onFirstInteractive: function(FilterName, Value) { // This is what I want to be able to use
            activesheet = viz.getWorkbook().getActiveSheet();
        }
    };
}
I'm aware of this question, but it is for an outdated function. Let's say I'm trying to predict whether a person will visit country 'X' given the countries they have already visited and their income.
I have a training data set in a pandas DataFrame in the following format. Each row represents a different person, each unrelated to the others in the matrix. The first 10 columns are all names of countries, and the values in the columns are binary (1 if they have visited that country, 0 if they haven't). Column 11 is their income, a continuous decimal variable. Lastly, column 12 is another binary column that says whether they have visited 'X' or not.
So essentially, if I have 100,000 people in my dataset, then I have a DataFrame of dimensions 100,000 x 12. I want to be able to properly pass this into a linear classifier using TensorFlow, but I'm not sure even how to approach this. I am trying to pass the data into this function:
estimator = LinearClassifier(
    n_classes=n_classes, feature_columns=,...
I'm new to Tableau.
I am trying to create a calculated column that changes a cell to say 'Open' when it is equal to 0.
Currently I have this, but I can't compare an int and a string.
IF = 0
THEN = 'Open'
ELSE
END
Any way to possibly do this without changing the SQL?
What are the differences between Apache Spark SQLContext and HiveContext ?
Some sources say that since HiveContext is a superset of SQLContext, developers should always use HiveContext, which has more features than SQLContext. But the current APIs of the two contexts are mostly the same.
In what scenarios is SQLContext/HiveContext more useful?
Is HiveContext more useful only when working with Hive?
Or is SQLContext all that is needed for implementing a big data app using Apache Spark?
I am looking for some guidance and tips in understanding what it would take to do a reasonable Hadoop proof of concept in the cloud. I am a complete noob to the big data analytics world and would be more than happy for any suggestions you might have based on your experience.
I am facing a problem where TensorFlow is not running in a Jupyter notebook; it shows me
No module named tensorflow
But it runs fine in the Anaconda prompt. How do I fix this?
Google just released Cloud Firestore, their new Document Database for apps.
I have been reading the documentation but I don't see a lot of differences between Firestore and Firebase DB.
The main point is that Firestore uses documents and collections, which allow easy querying, compared to Firebase, which is a traditional NoSQL database with a JSON base.
I would like to know a bit more about their differences, or usages, or whether Firestore just came to replace Firebase DB?