I'm considering using Data Lake technologies, which I have been studying for the last few weeks, compared with the traditional SSIS ETL scenarios, which I have been working with for many years.
I think of Data Lake as something closely linked to big data, but where is the line between using Data Lake technologies and SSIS?
Is there any advantage to using Data Lake technologies with 25 MB ~ 100 MB ~ 300 MB files? Parallelism? Flexibility? Extensibility in the future? Is there any performance gain when the files to be loaded are not as big as U-SQL's best-case scenario?
What are your thoughts? Would it be like using a hammer to crack a nut? Please, don't hesitate to ask me any questions to clarify the situation. Thanks in advance!!
21/03 EDIT More clarifications:
it has to be in the cloud
the reason I considered using ADL is that there is no substitute for SSIS in the cloud. There is ADF, but it's not the same: it orchestrates the data, but it's not as flexible as SSIS
I thought I could use U-SQL for some...
I am developing an application in Python which gives job recommendations based on an uploaded resume. I am trying to tokenize the resume before processing it further, and I want to tokenize groups of words. For example, "Data Science" is one keyword, but when I tokenize I get "data" and "science" separately. How can I overcome this? Is there any library in Python which does this kind of extraction?
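One common approach is to tokenize normally and then merge known multi-word keywords back together (nltk's MWETokenizer does essentially this, if you maintain a phrase list). Below is a minimal, dependency-free sketch; the `merge_phrases` helper is made up for illustration:

```python
def merge_phrases(tokens, phrases):
    """Greedily merge known multi-word phrases into single tokens.

    phrases: a set of lowercase tuples, e.g. {("data", "science")}.
    """
    max_len = max((len(p) for p in phrases), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest phrases first so "data science" wins over "data"
        for n in range(max_len, 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in phrases:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

words = "I love Data Science and machine learning".split()
keywords = {("data", "science"), ("machine", "learning")}
print(merge_phrases(words, keywords))
# ['I', 'love', 'Data Science', 'and', 'machine learning']
```

For a real resume parser you would build the phrase list from a skills dictionary rather than hard-coding it.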
I'd like to stop various messages that appear on the Spark shell. I tried to edit the log4j.properties file in order to stop these messages. Here are the contents of log4j.properties:
# Define the root logger with appender file
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
But messages are still getting displayed on the console.
Here are some example messages
15/01/05 15:11:45 INFO SparkEnv: Registering BlockManagerMaster
15/01/05 15:11:45 INFO DiskBlockManager: Created local...
I'm new to the ML scene and I want to create a PhoneGap app involving TensorFlow, but I'm unsure where to start or whether this is even possible. Can anyone give me a hand (probably by linking me to some resources)? My app will just use TensorFlow image recognition (probably pre-trained).
Thanks, Felix. (This is a repost of this same question in the data science category which failed to garner a response)
I'm trying to make some plots in Python 3 for a data science project, and I'm having an issue where there is no color behind the text on my axes when I save the figure. Here's my code with an example plot:
plt.plot(play_num_2019, home_prob_2019, color = getColor(home_teams_2019))
plt.plot(play_num_2019, away_prob_2019, color = getColor(away_teams_2019))
plt.xlabel("Play Number")
plt.ylabel("Win Probability")
plt.legend([home_teams_2019, away_teams_2019])
fig = plt.figure()
fig.patch.set_facecolor('xkcd:white')
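One likely culprit in the snippet above is that plt.figure() is called after the plots are drawn, so the facecolor is set on a brand-new, empty figure; savefig also applies its own default facecolor unless told otherwise. A minimal sketch of the usual fix, with dummy data standing in for the question's variables:

```python
import matplotlib
matplotlib.use("Agg")            # headless backend; renders without a display
import matplotlib.pyplot as plt

# Dummy data standing in for the win-probability series in the question
plt.plot([1, 2, 3], [0.5, 0.6, 0.4])
plt.xlabel("Play Number")
plt.ylabel("Win Probability")

fig = plt.gcf()                  # the figure that already holds the plot,
fig.patch.set_facecolor("white") # not a brand-new empty one from plt.figure()

# Pass the figure's facecolor through, since savefig otherwise
# falls back to the savefig.facecolor rcParam
fig.savefig("win_prob.png", facecolor=fig.get_facecolor())
```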
We have over 100 million rows of analytics data in BigQuery. Each record is an event attached to an ID.
A simplification:
ID EventId Timestamp
Is it possible to flatten this to one table holding rows like:
ID timestamp-period event1 event2 event3 event4
Where the event columns hold the counts of the number of events for that id in that time period?
So far, I've managed to do it on small data sets with two queries: one to create rows that hold counts for an individual event ID, and another to flatten these into one row afterwards. The reason I haven't yet been able to do this across the whole data set is that BigQuery runs out of resources; I'm not entirely sure why.
These two queries look something like this:
SELECT
  VideoId,
  date_1,
  IF(EventId = 1, INTEGER(count), 0) AS user_play,
  IF(EventId = 2, INTEGER(count), 0) AS auto_play,
  IF(EventId = 3, INTEGER(count), 0) AS pause,
  IF(EventId = 4, INTEGER(count), 0) AS replay,
  IF(EventId = 5, INTEGER(count), 0) AS stop,
  IF(EventId = 6, INTEGER(count), 0) AS seek,
  ...
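The flattening described here is a pivot: group by ID and time period, then turn each EventId into its own count column. As a small illustration of the same reshaping (in pandas rather than BigQuery, with made-up column names and toy data):

```python
import pandas as pd

# Toy stand-in for the events table: one row per (VideoId, EventId, period)
events = pd.DataFrame({
    "VideoId": [1, 1, 1, 2, 2],
    "EventId": [1, 1, 2, 1, 3],
    "period":  ["d1", "d1", "d1", "d1", "d1"],
})

# Count events per (id, period, event type), then pivot the event types
# into columns: the two-query approach collapsed into one reshaping step.
flat = (events.groupby(["VideoId", "period", "EventId"]).size()
              .unstack("EventId", fill_value=0)
              .reset_index())
print(flat)  # columns: VideoId, period, 1, 2, 3 (one count column per event)
```

In BigQuery-style SQL the single-query equivalent would be along the lines of `SUM(IF(EventId = 1, 1, 0)) AS user_play ... GROUP BY VideoId, period`, which sums and pivots in one pass and avoids the intermediate table.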
I'm pretty new to Python object-oriented programming and I have trouble understanding the super() function (new-style classes), especially when it comes to multiple inheritance.
For example if you have something like:
class First(object):
    def __init__(self):
        print "first"

class Second(object):
    def __init__(self):
        print "second"

class Third(First, Second):
    def __init__(self):
        super(Third, self).__init__()
        print "that's it"
What I don't get is: will the Third class inherit both constructor methods? If yes, then which one will be run with super(), and why?
And what if you want to run the other one? I know it has something to do with the Python method resolution order (MRO).
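A Python 3 sketch of the same classes may help: Third inherits a single __init__ resolved through the MRO, and super() simply dispatches to the next class in that order.

```python
class First:
    def __init__(self):
        print("first")

class Second:
    def __init__(self):
        print("second")

class Third(First, Second):
    def __init__(self):
        super().__init__()   # dispatches to the next class in the MRO
        print("that's it")

# The MRO fixes which __init__ super() finds: Third -> First -> Second -> object
print([c.__name__ for c in Third.__mro__])

Third()  # prints "first" then "that's it"; Second.__init__ never runs,
         # because First.__init__ does not itself call super().__init__()
```

To have every constructor in the chain run, each class must itself call super().__init__() (cooperative multiple inheritance); otherwise the chain stops at the first class that doesn't.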
In a Python 3.5 notebook, backed by an Apache Spark service, I had installed BigDL 0.2 using pip. After removing that installation and trying to install version 0.3 of BigDL, I get this error (line breaks added for readability):
AssertionError: Multiple .dist-info directories:
/gpfs/fs01/user/scbc-4dbab79416a6ec-4cf890276e2b/.local/lib/python3.5/site-packages/BigDL-0.3.0.dist-info,
/gpfs/fs01/user/scbc-4dbab79416a6ec-4cf890276e2b/.local/lib/python3.5/site-packages/BigDL-0.2.0.dist-info
However, neither of these directories exists:
!ls -al /gpfs/fs01/user/scbc-4dbab79416a6ec-4cf890276e2b/.local/lib/python3.5/site-packages/
total 0
drwx------ 2 scbc-4dbab79416a6ec-4cf890276e2b users 4096 Nov 8 06:12 .
drwx------ 3 scbc-4dbab79416a6ec-4cf890276e2b users 4096 Nov 8 06:12 ..
Having read this question, I would like to ask some additional questions:
The Cluster Manager is a long-running service; on which node does it run?
Is it possible for the Master and the Driver nodes to be the same machine? I presume there is a rule somewhere stating that these two nodes should be different?
In case the Driver node fails, who is responsible for re-launching the application? And what will happen exactly, i.e. how will the Master node, Cluster Manager, and Worker nodes get involved (if they do), and in which order?
Similarly to the previous question: in case the Master node fails, what will happen exactly, and who is responsible for recovering from the failure?
In the TensorFlow ML Basics with Keras tutorial for making a basic text classification, when preparing the trained model for export, the tutorial suggests including the TextVectorization layer in the model so it can "process raw strings". I understand why to do this.
But then the code snippet is:
export_model = tf.keras.Sequential([
  vectorize_layer,
  model,
  layers.Activation('sigmoid')
])
Why, when preparing the model for export, does the tutorial also include a new activation layer, layers.Activation('sigmoid')? Why not incorporate this layer into the original model?
In every project I've tried to create in Android Studio, all usages of R are marked in red with the error message "Cannot resolve symbol R", but the compilation succeeds and the application runs. This is really annoying, as it blocks auto-completion and shows huge red wavy lines all over my code.
I'm running Android Studio 1.7.0 and creating the project with default settings. A screenshot is attached:
This is my build.gradle:
buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath 'com.android.tools.build:gradle:0.4'
    }
}
apply plugin: 'android'
I'm kicking the tires on BI tools, including, of course, Tableau. Part of my evaluation includes correlating the SQL generated by the BI tool with my actions in the tool.
Tableau has me mystified. My database has 2 billion things; however, no matter what I do in Tableau, the query Redshift reports as having been run is "Fetch 10000 in SQL_CURxyz", i.e. a cursor operation. In the screenshot below, you can see the cursor ids change, indicating new queries are being run -- but you don't see the original queries.
Is this a Redshift or Tableau quirk? Any idea how to see what's actually running under the hood? And why is Tableau always operating on 10000 records at a time? less
I trained a Quora question-pair detection model with an LSTM, but the training accuracy is very low and changes every time I train. I don't understand what mistake I made.
I tried changing the loss and the optimizer, and increasing the number of epochs.
import numpy as np
from numpy import array
from keras.callbacks import ModelCheckpoint
import keras
from keras.optimizers import SGD
import tensorflow as tf
from sklearn import preprocessing
import xgboost as xgb
from keras import backend as K
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from keras.preprocessing.text import Tokenizer , text_to_word_sequence
from keras.preprocessing.sequence import pad_sequences
from keras.layers.embeddings import Embedding
from keras.models import Sequential, model_from_json, load_model
from keras.layers import LSTM, Dense, Input, concatenate, Concatenate, Activation, Flatten
from keras.models import Model
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from ...
I'm currently learning how to use the new Cloud Functions for Firebase, and the problem I'm having is that I can't access the function I wrote through an AJAX request. I get the "No 'Access-Control-Allow-Origin'" error. Here's an example of the function I wrote:
exports.test = functions.https.onRequest((request, response) => {
response.status(500).send({test: 'Testing functions'});
})
The function sits in this url: https://us-central1-fba-shipper-140ae.cloudfunctions.net/test
The Firebase docs suggest adding CORS middleware inside the function; I've tried it, but it's not working for me: https://firebase.google.com/docs/functions/http-events
This is how I did it:
var cors = require('cors');
exports.test = functions.https.onRequest((request, response) => {
cors(request, response, () => {
response.status(500).send({test: 'Testing functions'});
})
})
What am I doing wrong? I would appreciate any help with this.
UPDATE: Doug Stevenson's answer helped. Adding ({origin: true}) fixed the issue, I...
We are trying to use Amazon Web Services Internet of Things (AWS IoT) to send messages from/to a web browser. Given that AWS IoT supports JavaScript, we expect that this is possible...
We have searched the AWS IoT documentation but only found server-side examples (which expose AWS secrets/keys...).
Are there any good working examples or tutorials for using AWS IoT to send/receive messages via WebSockets/MQTT in the browser (e.g: authenticating with AWS Cognito)? Thanks!
I did come across a mini tutorial for data preprocessing using Spark here: http://ampcamp.berkeley.edu/big-data-mini-course/featurization.html
However, it only discusses text file parsing. Is there a way to parse XML files with Spark?
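There is a spark-xml package from Databricks for reading XML into DataFrames (worth checking for compatibility with your Spark version). A more manual route is to read whole files and parse them yourself; here is a small sketch using only Python's standard library, with a made-up `<record>` schema, that you could map over `sc.wholeTextFiles`:

```python
import xml.etree.ElementTree as ET

def parse_records(xml_text):
    """Turn one XML document into a list of dicts, one per <record> element."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in rec} for rec in root.iter("record")]

doc = """
<records>
  <record><id>1</id><name>alpha</name></record>
  <record><id>2</id><name>beta</name></record>
</records>
"""
rows = parse_records(doc)
print(rows)  # [{'id': '1', 'name': 'alpha'}, {'id': '2', 'name': 'beta'}]

# In Spark you could then distribute this over many files, roughly:
#   sc.wholeTextFiles("hdfs:///data/*.xml").flatMap(lambda kv: parse_records(kv[1]))
```

Note that naively splitting a single huge XML file across workers breaks the markup, which is why whole-file reading (or a record-aware input format like spark-xml's rowTag) is the usual approach.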
I have been driving myself crazy trying to install xgboost in Python on Windows 10. I have looked through several suggested articles but still can't find a proper solution. If anyone has done this before, kindly share your method; other suggestions are also welcome.
I have a problem where I am trying to create a neural network for Tic-Tac-Toe. However, for some reason, training the neural network causes it to produce nearly the same output for any given input.
I did take a look at Artificial neural networks benchmark, but my network implementation is built for neurons with the same activation function for each neuron, i.e. no constant neurons.
To make sure the problem wasn't just due to my choice of training set (1218 board states and moves generated by a genetic algorithm), I tried to train the network to reproduce XOR. The logistic activation function was used. Instead of using the derivative, I multiplied the error by output*(1-output) as some sources suggested that this was equivalent to using the derivative. I can put the Haskell source on HPaste, but it's a little embarrassing to look at. The network has 3 layers: the first layer has 2 inputs and 4 outputs, the second has 4 inputs and 1 output, and the third has 1 output. Increasing to 4 neurons in the...
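As a sanity check on one of those steps: multiplying by output*(1-output) really is the derivative of the logistic function, so that substitution is not the bug. A quick numerical check (in Python rather than Haskell):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# d/dx sigmoid(x) == sigmoid(x) * (1 - sigmoid(x)); verify by central difference
for x in (-2.0, 0.0, 0.5, 3.0):
    s = sigmoid(x)
    analytic = s * (1.0 - s)
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(analytic - numeric) < 1e-9
print("output*(1-output) matches the derivative of the logistic")
```

A more likely cause of identical outputs for every input is elsewhere, for example missing bias terms (the question mentions no constant neurons, and XOR is hard to fit without a bias) or weights saturating the sigmoid.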
Context
Take a look at Wordle: http://www.wordle.net/
It's much better looking than any other word cloud generator I've seen.
Note: the source is not available - read the FAQ: http://www.wordle.net/faq#code
My Questions
Is there an algorithm available that does what Wordle does?
If not, what are some alternatives that produce similar kinds of output?
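Wordle's author has reportedly described the core idea as greedy placement: lay words down one at a time, largest first, and nudge each along a spiral until it no longer intersects anything already placed, using bounding-box (or finer, glyph-level) intersection tests. A rough sketch of that loop with plain rectangles standing in for rendered words:

```python
import math

def overlaps(a, b):
    # a, b: (x, y, w, h) axis-aligned boxes
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_words(sizes):
    """Greedy spiral placement: try each box at the centre, then walk it
    outward along an Archimedean spiral until it hits no placed box."""
    placed = []
    for (w, h) in sorted(sizes, reverse=True):   # biggest (most frequent) first
        t = 0.0
        while True:
            r = 2.0 * t                          # spiral radius grows with angle
            x = r * math.cos(t) - w / 2
            y = r * math.sin(t) - h / 2
            box = (x, y, w, h)
            if not any(overlaps(box, p) for p in placed):
                placed.append(box)
                break
            t += 0.1
    return placed

boxes = place_words([(80, 20), (60, 15), (40, 10), (40, 10)])
# No two placed boxes intersect
assert all(not overlaps(a, b) for i, a in enumerate(boxes) for b in boxes[i + 1:])
```

The real thing replaces the rectangles with actual glyph outlines (often via a hierarchy of bounding boxes) and scales font size by word frequency, but the spiral-and-test loop is the heart of it.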
I am trying to export some data from BigQuery. This is done by first saving the table and then exporting it to Google Cloud Storage. This used to work just fine, but recently, apparently, some tables have nested schemas, so exporting as CSV does not work anymore. Exporting as JSON should work, and the export job claims to succeed, but the data is not available on Google Cloud Storage. Is anyone experiencing similar issues? Is Google having some problems?
I'm trying to iterate over the words of a string.
The string can be assumed to be composed of words separated by whitespace.
Note that I'm not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answer.
The best solution I have right now is:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);
    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}
Is there a more elegant way to do this?
I'm currently coding a basic neural network that is supposed to calculate XOR using backpropagation. However, it instead outputs the average of its target outputs (for XOR returning {0,1,1,0}, that is 0.5).
I followed both the following articles and can't find my error. That guy supposedly had the same problem, but never found an answer.
Anyway, here's my code:
network.c
void initialise_network(Network *network) {
    assert(network != NULL);
    network->inputs = 1.0;
    network->hidden = 1.0;
    for (int i = 0; i < network->num_inputs + 1; i++) {
        for (int j = 0; j < network->num_hidden; j++) {
            network->ithw[i][j] = rnd_double(-1, 1);
            network->delta_hidden[i][j] = rnd_double(0, 0);
            printf("ithw[%d][%d]: %f\n", i, j, network->ithw[i][j]);
        }
    }
    for (int i = 0; i < network->num_hidden + 1; i++) {
        for (int j = 0; j < network->num_outputs; j++) {
            network->htow[i][j] = rnd_double(-1, 1);
            network->delta_output[i][j] = rnd_double(0, 0);
            // printf("htow[%d][%d]: %f\n", i, j, network->htow[i][j]);
        }
    }
}

void pass_forward(double* inputs, Network *network) {
    log_info("pass_forward()...
From C:\Anaconda3\envs\tensorflow_cpu\lib\site-packages\tensorflow\python\platform\app.py:125: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Anaconda3\envs\tensorflow_cpu\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "C:\Anaconda3\envs\tensorflow_cpu\lib\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\arfan\Documents\TensorFlow\models\research\object_detection\legacy\trainer.py", line 248, in train
detection_model = create_model_fn()
File "C:\Users\arfan\Documents\TensorFlow\models\research\object_detection\builders\model_builder.py", line 122, in build
raise ValueError('Unknown meta...
I tried to install the XGBoost package in Python. I am using Windows OS, 64-bit. I have gone through the following:
The package directory states that xgboost is unstable for Windows and is disabled: "pip installation on windows is currently disabled for further invesigation, please install from github." https://pypi.python.org/pypi/xgboost/
I am not well versed in Visual Studio and am facing problems building XGBoost. I am missing opportunities to use the xgboost package in data science.
Please guide, so that I can import the XGBoost package in python.
Thanks