Has anyone successfully been able to use Tableau to connect to Denodo via ODBC? I'm doing a proof of concept with Tableau, trying to connect to a Denodo data source. I've been told that there are no issues doing this; however, I'm receiving the standard Tableau warning:

Tableau identified the following warnings for the ODBC data source named 'my denodo datasource':

Along with that standard warning, when I try to slice my data by Year, Quarter, Month and configure my date field to display Quarter in my column, I get:

ODBC escape convert error.

I'm testing this with Tableau 9.1.0 (9100.15.0828.1711) 64-bit. I've also tried it on Tableau 9.0. If you've gotten this to work, what versions of each are you using? (I can see that Denodo even has a mini tutorial on using Tableau to connect to Denodo, but the screenshot shows them using months instead of quarters and I can't tell which version of Tableau they are using.) Thank you!
Let's assume for the following that only one Spark job is running at any point in time.

What I get so far

Here is what I understand happens in Spark:
When a SparkContext is created, each worker node starts an executor. Executors are separate processes (JVMs) that connect back to the driver program. Each executor has the jar of the driver program. Quitting the driver shuts down the executors. Each executor can hold some partitions.
When a job is executed, an execution plan is created according to the lineage graph.
The job is split into stages, where a stage contains as many neighbouring (in the lineage graph) transformations and actions as possible, but no shuffles. Thus stages are separated by shuffles.
I understand that
A task is a command sent from the driver to an executor by serializing the Function object.
The executor deserializes (with the driver jar) the command (task) and executes it on a partition.
but
Question(s)
How is the stage split into those tasks?
Specifically:
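As a rough mental model of what I'm asking about (an assumption drawn from the description above, not an authoritative answer): a stage seems to fan out into one task per partition. A toy sketch in plain Python of that mapping:

```python
# Toy model of task creation, NOT real Spark code: assume each stage
# produces one task per partition of the stage's final RDD.
def tasks_for_stage(stage_id, num_partitions):
    return [(stage_id, p) for p in range(num_partitions)]

# A job with 2 shuffle-separated stages over an RDD with 6 partitions:
tasks = [t for s in range(2) for t in tasks_for_stage(s, 6)]
print(len(tasks))  # 12
```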
I am installing Hadoop on my laptop. SSH works fine, but I cannot start Hadoop.

munichong@GrindPad:~$ ssh localhost
Welcome to Ubuntu 12.10 (GNU/Linux 3.5.0-25-generic x86_64)
* Documentation: https://help.ubuntu.com/
0 packages can be updated.
0 updates are security updates.
Last login: Mon Mar 4 00:01:36 2013 from localhost
munichong@GrindPad:~$ /usr/sbin/start-dfs.sh
chown: changing ownership of `/var/log/hadoop/root': Operation not permitted
starting namenode, logging to /var/log/hadoop/root/hadoop-munichong-namenode-GrindPad.out
/usr/sbin/hadoop-daemon.sh: line 136: /var/run/hadoop/hadoop-munichong-namenode.pid: Permission denied
usr/sbin/hadoop-daemon.sh: line 135: /var/log/hadoop/root/hadoop-munichong-namenode-GrindPad.out: Permission denied
head: cannot open `/var/log/hadoop/root/hadoop-munichong-namenode-GrindPad.out' for reading: No such file or directory
localhost: chown: changing ownership of `/var/log/hadoop/root': Operation not permitted
localhost: starting datanode, logging to...
I have just started learning Azure IoT and it's quite interesting. I am confused about whether IoT Hub stores data somewhere. For example, suppose I am passing room temperature to IoT Hub and want to store it in a database for further use. How is that possible? I am clear on how device-to-cloud and cloud-to-device messaging works with IoT Hub.
In Hadoop when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?
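For reference, Hadoop exposes this threshold as a job configuration property; to my knowledge it is `mapreduce.job.reduce.slowstart.completedmaps` (named `mapred.reduce.slowstart.completed.maps` in older releases), a fraction of completed map tasks with a default around 0.05. A sketch of how it might be set (the 0.80 value is purely illustrative):

```xml
<!-- Fraction of map tasks that must finish before reducers are scheduled.
     Property name per Hadoop 2.x; the value here is an illustrative choice. -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.80</value>
</property>
```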
I prefer Python over Scala. But as Spark is natively written in Scala, I was expecting my code to run faster in the Scala version than in the Python version, for obvious reasons. With that assumption, I thought I'd learn and write the Scala version of some very common preprocessing code for about 1 GB of data. The data is picked from the SpringLeaf competition on Kaggle. To give an overview of the data: it contains 1,936 dimensions and 145,232 rows, composed of various types, e.g. int, float, string, boolean. I am using 6 cores out of 8 for Spark processing; that's why I used minPartitions=6, so that every core has something to process.

Scala code
val input = sc.textFile("train.csv", minPartitions=6)
val input2 = input.mapPartitionsWithIndex { (idx, iter) =>
if (idx == 0) iter.drop(1) else iter }
val delim1 = "\001"
def separateCols(line: String): Array[String] = {
val line2 = line.replaceAll("true", "1")
val line3 = line2.replaceAll("false", "0")
val vals: Array[String] = line3.split(",")
I am really new to Data Science/ML and have been working with TensorFlow to implement linear regression on the California Housing Prices dataset from Kaggle.

I tried to train a model in two different ways:
Using a Sequential model
Custom implementation
In both cases, the loss of the model was really high and I have not been able to figure out how to improve it.
Dataset prep
df = pd.read_csv('california-housing-prices.zip')
df = df
print('Shape of dataset before removing NAs and duplicates {}'.format(df.shape))
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)
input_train, input_test, target_train, target_test = train_test_split(df.values, df.values, test_size=0.2)
scaler = MinMaxScaler()
input_train = input_train.reshape(-1,1)
input_test = input_test.reshape(-1,1)
input_train = scaler.fit_transform(input_train)
input_test = scaler.fit_transform(input_test)
target_train = target_train.reshape(-1,1)
target_train = scaler.fit_transform(target_train)
target_test = ...
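One thing worth double-checking in the prep above (an observation, not a verified diagnosis): a scaler should be fit on the training split only and then reused to transform the test split; calling fit_transform on both splits applies two different scalings. A minimal pure-Python stand-in for MinMaxScaler that illustrates the idea:

```python
# Minimal min-max scaling done the "fit on train, transform test" way.
# Pure-Python illustration only; sklearn's MinMaxScaler does the same job.
def fit_minmax(values):
    return min(values), max(values)

def transform_minmax(values, lo, hi):
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

train = [10.0, 20.0, 30.0]
test = [15.0, 40.0]            # 40.0 lies outside the train range

lo, hi = fit_minmax(train)     # fit on the training data only
print(transform_minmax(test, lo, hi))  # [0.25, 1.5] -- same scale as train
```

Note that the test value outside the training range correctly maps above 1.0 instead of being squashed by its own separate fit.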
I installed RStudio 1.2.1335 on my PC running Windows 7 x64. After installation, when I try to open RStudio it shows the error "api-ms-win-crt-runtime-l1-1-0.dll is missing". How do I fix this error?
I tried reinstalling RStudio, but the problem continues.
Has anyone tried building an Angular2 application with Tableau visualizations integrated in it using the Tableau JavaScript API?
According to the documentation, you're supposed to include the following script in your file, which will create a Tableau global variable:

<script src="https://YOUR-SERVER/javascripts/api/tableau-2.js"></script>
I'm not sure how to access this global variable within an Angular2 class.
I have an idea of how to extract table data to Cloud Storage using the bq extract command, but I would rather like to know whether there are any options to extract a BigQuery table as newline-delimited JSON to my local machine.
I can extract table data to GCS via the CLI and can also download JSON data from the web UI, but I am looking for a solution using the bq CLI to download table data as JSON to my local machine. Is that even possible?
Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/nickopotamus/.local/share/r-miniconda/envs/r-reticulate/lib:/usr/lib/R/lib::/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server
sudo find / -name 'libcudart.so.11.0' finds the file in:
/home/nickopotamus/.local/share/r-miniconda/envs/r-reticulate/lib/libcudart.so.11.0
/home/nickopotamus/anaconda3/pkgs/cudatoolkit-11.3.1-h2bc3f7f_2/lib/libcudart.so.11.0
/home/nickopotamus/anaconda3/pkgs/cudatoolkit-11.2.0-h73cb219_8/lib/libcudart.so.11.0
/home/nickopotamus/anaconda3/pkgs/cudatoolkit-11.2.72-h2bc3f7f_0/lib/libcudart.so.11.0
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudart.so.11.0
The top entry at least appears to be in the path that the error is searching, so I'm at a bit of a loss as to what to try next. Is it a conflict with the other anaconda packages (which I can't seem to remove), or am I simply being...
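One low-risk thing to try (a sketch, not a verified fix): prepend one of the directories found above to LD_LIBRARY_PATH before launching R, so the dynamic loader searches it first. The path below is the system CUDA install from the find output; note that exporting inside an already-running R session won't help, since the loader reads the variable at process start.

```shell
# Prepend the CUDA 11.2 runtime directory (one of the paths found above)
# so the loader can resolve libcudart.so.11.0. Adjust the path as needed.
export LD_LIBRARY_PATH="/usr/local/cuda-11.2/targets/x86_64-linux/lib:${LD_LIBRARY_PATH}"
echo "${LD_LIBRARY_PATH%%:*}"
```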
We are confused about the difference between R and RStudio. We do the majority of our work in RStudio, but we were required to download R as well. Is regular R necessary for RStudio to work?
Out of curiosity, I've been reading up a bit on the field of Machine Learning, and I'm surprised at the amount of computation and mathematics involved. One book I'm reading through uses advanced concepts such as Ring Theory and PDEs (note: the only thing I know about PDEs is that they use that funny-looking character). This strikes me as odd, considering that mathematics itself is a hard thing to "learn."
Are there any branches of Machine Learning that use different approaches?
I would think that approaches relying more on logic, memory, construction of unfounded assumptions, and over-generalizations would be a better way to go, since that seems more like the way animals think. Animals don't (explicitly) calculate probabilities and statistics, at least as far as I know.
I just got into SQL to do some data science and was wondering why my code was running but not affecting the MySQL database in any way. I am using PyCharm and the MySQLdb module.
import MySQLdb
db = MySQLdb.connect(host="localhost",
user="root",
passwd="********", #Password blocked
db="test")
cur = db.cursor()
cur.execute("SELECT * FROM movies")
cur.execute("Update movies set genre = 'action' where id = 1")
for row in cur.fetchall() :
print row[0], " ", row[1], " ", row[2]
My code runs and returns no errors, but when I delete the
cur.execute("Update movies set genre = 'action' where id = 1")
line, it just prints out the table as it was before. Just for reference, here is the table:
1 Interstellar sci-fi
2 Thor: Ragnarok action
3 Thor: The Dark World action
How can I make the commands in Python actually affect the table? Thank you so much for your help!
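The usual culprit in this pattern (an assumption about the code above, since the symptom matches) is that DB-API connections don't autocommit, so the UPDATE is rolled back when the connection closes unless `db.commit()` is called. A self-contained sketch using Python's stdlib sqlite3, which follows the same DB-API commit semantics as MySQLdb:

```python
import sqlite3

# In-memory database standing in for the MySQL 'test' database.
db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("CREATE TABLE movies (id INTEGER, title TEXT, genre TEXT)")
cur.execute("INSERT INTO movies VALUES (1, 'Interstellar', 'sci-fi')")

cur.execute("UPDATE movies SET genre = 'action' WHERE id = 1")
db.commit()  # without this, the driver may roll the change back on close

cur.execute("SELECT genre FROM movies WHERE id = 1")
print(cur.fetchone()[0])  # action
```

With MySQLdb, adding the same `db.commit()` after the UPDATE should make the change stick in the actual database.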
I am using https://github.com/databricks/spark-csv and I am trying to write a single CSV, but I am not able to; it keeps making a folder.
I need a Scala function which will take parameters like path and file name and write that CSV file.
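For context on why a folder appears: Spark writes one part-file per partition into the output directory. Common workarounds are coalescing to a single partition before writing, or merging the part-files afterwards. As an illustration of the merge step (shown in Python here; the same logic ports to Scala), assuming the folder contains `part-*` CSV files that each repeat the header:

```python
import glob
import os

def merge_spark_csv(folder, out_path):
    """Concatenate Spark part-files into one CSV, keeping a single header."""
    parts = sorted(glob.glob(os.path.join(folder, "part-*")))
    with open(out_path, "w") as out:
        for i, part in enumerate(parts):
            with open(part) as f:
                lines = f.readlines()
            # keep the header only from the first part-file
            out.writelines(lines if i == 0 else lines[1:])
```

Whether each part-file actually carries a header depends on the writer options used, so the `lines[1:]` slice is an assumption to adjust for your output.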
I have checked the PyTorch tutorial and questions similar to this one on Stack Overflow.
I get confused: does the embedding in PyTorch (Embedding) make similar words closer to each other? And do I just need to give it all the sentences? Or is it just a lookup table for which I need to code the model?
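For what it's worth, my understanding (worth verifying against the docs) is that `nn.Embedding` is just a trainable lookup table: it starts with random vectors and only moves similar words closer together if the training objective pushes it to. A minimal sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(num_embeddings=10, embedding_dim=3)  # 10-word vocab, 3-d vectors

idx = torch.tensor([4, 4, 7])   # word indices, not words
out = emb(idx)                  # rows 4, 4 and 7 of the weight table

print(out.shape)                # torch.Size([3, 3])
# The same index always returns the same (randomly initialized) row:
print(torch.equal(out[0], out[1]))  # True
```

So sentences are not handed to the embedding directly; you tokenize to integer indices and train a model on top of the lookup.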
I have been using knitr via RStudio and think it is pretty neat. I have a minor issue, though. When I source a file in an R chunk, the knitr output includes external comments as follows:
+ FALSE Loading required package: ggplot2
+ FALSE Loading required package: gridExtra
+ FALSE Loading required package: grid
+ FALSE Loading required package: VGAM
+ FALSE Loading required package: splines
+ FALSE Loading required package: stats4
+ FALSE Attaching package: 'VGAM'
+ FALSE The following object(s) are masked from 'package:stats4':
I have tried to set R chunk options in various ways, but still couldn't avoid the problem:
```{r echo=FALSE, cache=FALSE, results=FALSE, warning=FALSE, comment=FALSE, warning=FALSE}
source("C:/Rscripts/source.R");
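In case it helps future readers: package startup noise like "Loading required package: ..." is emitted as R messages, so to my understanding the relevant chunk option is message=FALSE (and results='hide' rather than results=FALSE for suppressing printed output). A sketch of the chunk header under those assumptions:

```{r echo=FALSE, message=FALSE, warning=FALSE, results='hide'}
source("C:/Rscripts/source.R")
```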
__future__ frequently appears in Python modules. I do not understand what __future__ is for and how/when to use it, even after reading Python's __future__ doc. Can anyone explain with examples?

A few answers regarding the basic usage of __future__ I've received seemed correct. However, I need to understand one more thing about how __future__ works. The most confusing concept for me is how a current Python release includes features for future releases, and how a program using a feature from a future release can be compiled successfully in the current version of Python.

I am guessing that the current release is packaged with potential features for the future. However, the features are available only by using __future__ because they are not the current standard. Let me know if I am right.
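A concrete example of the pattern, using `from __future__ import annotations` (postponed evaluation of annotations, which recent Python 3 releases ship but don't yet enable by default): the feature's machinery is already in the interpreter, and the import just switches it on for this one module.

```python
from __future__ import annotations  # opt in to postponed annotation evaluation

# Without the import, this def would raise NameError, because Widget is
# not defined yet. With it, annotations are stored as strings instead of
# being evaluated at definition time.
def make(x: Widget) -> Widget:
    return x

class Widget:
    pass

print(make.__annotations__["x"])  # Widget  (stored as the string 'Widget')
```

Older examples of the same mechanism include `from __future__ import division` and `print_function` in Python 2, which enabled Python 3 behavior ahead of time.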
I began to fall in love with a Python visualization library called Altair, and I use it with every small data science project I've done.
Now, in terms of industry use cases: does it make sense to visualize big data directly, or should we just take a random sample?
I trained Quora question-pair detection with an LSTM, but the training accuracy is very low and changes every time I train. I don't understand what mistake I made.

I tried changing the loss and optimizer, and training with more epochs.
import numpy as np
from numpy import array
from keras.callbacks import ModelCheckpoint
import keras
from keras.optimizers import SGD
import tensorflow as tf
from sklearn import preprocessing
import xgboost as xgb
from keras import backend as K
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from keras.preprocessing.text import Tokenizer , text_to_word_sequence
from keras.preprocessing.sequence import pad_sequences
from keras.layers.embeddings import Embedding
from keras.models import Sequential, model_from_json, load_model
from keras.layers import LSTM, Dense, Input, concatenate, Concatenate, Activation, Flatten
from keras.models import Model
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from ...
I'm following a course on edX on programming with Python for data science. When using a given function to plot the results of my linear regression model, the graph seems very off, with all the scatter points clustered at the bottom and the regression line way up top.

I'm not sure whether the defined function drawLine is incorrect or something else is wrong with my modeling process.
Here is the defined function:
def drawLine(model, X_test, y_test, title, R2):
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(X_test, y_test, c='g', marker='o')
ax.plot(X_test, model.predict(X_test), color='orange', linewidth=1, alpha=0.7)
plt.show()
Here is the code I wrote:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import linear_model
from sklearn.model_selection import ...
How do I convert a PyTorch tensor into a Python list?

My current use case is to convert a tensor of size into a list of 2048 elements.

My tensor has floating-point values. Is there a solution that also accounts for int and possibly other data types?
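For reference, `torch.Tensor.tolist()` converts a tensor into (possibly nested) Python lists with the matching Python element types, floats for float tensors and ints for int tensors, so one call covers both cases. A small sketch:

```python
import torch

t_float = torch.tensor([1.5, 2.5, 3.5])
t_int = torch.tensor([[1, 2], [3, 4]])

print(t_float.tolist())  # [1.5, 2.5, 3.5] -- plain Python floats
print(t_int.tolist())    # [[1, 2], [3, 4]] -- nested lists of Python ints
```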