This is kind of a naive question, but I am new to the NoSQL paradigm and don't know much about it. Could somebody help me clearly understand the difference between HBase and Hadoop, or give some pointers that might help me understand the difference?

So far I have done some research, and according to my understanding Hadoop provides a framework to work with raw chunks of data (files) in HDFS, while HBase is a database engine on top of Hadoop which works with structured data instead of raw data chunks. HBase provides a logical layer over HDFS, just as SQL does. Is that correct?
If shell is True, the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user's home directory.

However, on Cygwin, the output from bash isn't the same as from Python's subprocess, viz:

Bash:
$ ls -ls ~rbarakx
total 0
0 drwxr-xr-x 1 Administrator None 0 Aug 21 17:54 bash
0 drwxr-xr-x 1 Administrator None 0 Jul 11 09:11 python
Python:
>>> subprocess.call(,shell=True)
RCS mecha.py print_unicode.py req.py requests_results.html selen.py
0
It looks as if subprocess.call is executing just ls. Can you suggest why?

My environment:
Python: Python 2.7.3 (default, Dec 18 2012, 13:50:09) on cygwin
cygwin: CYGWIN_NT-6.1-WOW64 ... 1.7.22(0.268/5/3) ... i686 Cygwin
windows: Windows 7 Ultimate
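For reference, a minimal sketch of how the string and list forms of the first argument behave differently with shell=True on POSIX systems. This is one plausible cause of the behavior above if the elided argument was a list rather than a string; the echo command here is illustrative.

```python
import subprocess

# String form: the whole command line is handed to the shell, so pipes,
# wildcards and ~ expansion all work as in bash.
out_string = subprocess.check_output("echo one two", shell=True)

# List form with shell=True (POSIX): only the FIRST element is treated as
# the shell command; the remaining elements become positional parameters
# of the shell itself, not arguments of the command. So this runs a bare
# `echo`, the same way ["ls", "-ls", "~user"] would run a bare `ls`.
out_list = subprocess.check_output(["echo", "one", "two"], shell=True)

print(out_string)  # b'one two\n'
print(out_list)    # b'\n'
```

If the goal is shell features (pipes, ~ expansion), pass a single string; if the goal is an argument list, drop shell=True entirely.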
I am trying to publish to Sonar using Gradle on a Java 8 project, which is failing with the following error:
INFO: ------------------------------------------------------------------------
INFO: EXECUTION FAILURE
INFO: ------------------------------------------------------------------------
Total time: 1:18.786s
Final Memory: 25M/764M
INFO: ------------------------------------------------------------------------
ERROR: Error during Sonar runner execution
ERROR: Unable to execute Sonar
ERROR: Caused by: Rule 'squid:S1192' can not use 'Constant/issue' remediation function because this rule does not have a fixed remediation cost.
If I select my project to use the FindBugs quality profile then everything works and stats are uploaded to sonar. However if I turn on the sonar way profile the error above is thrown.
Looking at the error, it seems it cannot find a remediation cost (which I think is required to work out how many days it would take to fix all tech debt).

I have...
I am looking into PubNub to use in my real-time data visualization with Rickshaw. But I do not understand: are the channels already configured, or do we have to configure them? If so, how can we configure a channel for a data viz? Also, I am getting the data from the Python Ceilometer API; how can I push that data into PubNub?
My company has been using Jira for production issue tracking for the last 6–8 years, and as a result there is a huge amount of production issue detail logged in our Jira.

Usually each Jira ticket for a production support issue contains some useful information such as:
Error Message
System Involved
Root Cause
Resolution
Time Taken
etc.
My company has its own team chat service that supports the Chatbot API in Java / Python / etc. I would like to build the smart chatbot (if not AI) that is smart enough to exchange conversation like this in the chatroom:
DevOps) Hey Jirabot, what do you know about this error message?
Jirabot) Hi there, in which systems did this occur? Can you choose from one of the followings?
System A
System B
DevOps) 1
Jirabot) Right, it looks like the following Jira tickets have experienced similar issues. Please check these tickets:
Jira-12zx
Jira-52123zz
Jira-vvvbbb
I would like to ask people with experience in implementing something similar to this, or who have any...
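As a starting point, the retrieval core of such a bot can be sketched without any NLP at all: rank past tickets by token overlap between the new error message and the logged ones, filtered by system. Everything below (field names, ticket data) is hypothetical; a real bot would pull these records from the Jira REST API rather than an in-memory list.

```python
from collections import Counter

def tokenize(text):
    # Drop very short tokens like "to"/"in" to reduce noise.
    return [t for t in text.lower().split() if len(t) > 2]

def rank_tickets(query, tickets, system=None, top_n=3):
    """Rank past tickets by shared-token count with the query error message.

    tickets: list of dicts with keys 'key', 'system', 'error_message'
    (hypothetical field names -- adapt to your Jira export).
    """
    q = Counter(tokenize(query))
    scored = []
    for t in tickets:
        if system is not None and t["system"] != system:
            continue
        overlap = sum((q & Counter(tokenize(t["error_message"]))).values())
        if overlap:
            scored.append((overlap, t["key"]))
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_n]]

tickets = [
    {"key": "JIRA-12", "system": "A", "error_message": "connection timeout to payments service"},
    {"key": "JIRA-52", "system": "A", "error_message": "null pointer in batch job"},
    {"key": "JIRA-77", "system": "B", "error_message": "connection timeout to payments service"},
]
print(rank_tickets("payments connection timeout", tickets, system="A"))  # ['JIRA-12']
```

The system filter mirrors the "which system did this occur in?" turn of the dialogue; swapping the overlap score for TF-IDF or embeddings is a natural next step.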
Below is fully functional, working code. When I copy-paste it to a text file testFile.html and then open it with a browser, it works fine. But I want the selectCollege function to execute right after the initViz function. I tried this:

<body onload="initViz();selectCollege('Engineering');"> . . .

But it didn't work. How can I make the selectCollege function execute right after initViz?
<!DOCTYPE html>
<html>
We are working on a data warehouse for a bank and have pretty much followed the standard Kimball model: staging tables, a star schema, and an ETL to pull the data through the process.
Kimball talks about using the staging area for import, cleaning, processing and everything until you are ready to put the data into the star schema. In practice this typically means uploading data from the sources into a set of tables with little or no modification, followed by taking data optionally through intermediate tables until it is ready to go into the star schema. That's a lot of work for a single entity, no single responsibility here.
Previous systems I have worked on have made a distinction between the different sets of tables, to the extent of having:
Upload tables: raw source system data, unmodified
Staging tables: intermediate processing, typed and cleansed
Warehouse tables
You can stick these in separate schemas and then apply differing policies for archive/backup/security, etc. One of the other guys...
I've been asked to port a legacy data processing application over to Java.
The current version of the system is composed of a number of (badly written) Excel sheets. The sheets implement a big loop: a number of data sources are polled. These sources are a mixture of CSV and XML-based web services.
The process is conceptually simple:
It's stateless: the calculations that run are purely dependent on the inputs. The results from the calculations are published (currently by writing a number of CSV files to some standard locations on the network).

Once the results are published, the polling cycle begins again.

The process will not need an admin GUI; however, it would be neat if I could implement some kind of web-based control panel. It would be nothing pretty and purely for internal use. The control panel would do little more than display stats about the source feeds and possibly force-refresh the input feeds in the event of a problem. This component is purely optional in the first delivery round.
A...
I am trying to follow this tutorial: https://medium.com/@natu.neeraj/training-a-keras-model-on-google-cloud-ml-cb831341c196 to upload and train a Keras model on Google Cloud Platform, but I can't get it to work.

Right now I have downloaded the package from GitHub, and I have created a cloud environment with AI Platform and a bucket for storage. I am uploading the files (with the suggested folder structure) to my Cloud Storage bucket (basically to the root of my storage), and then trying the following command in the cloud terminal:
gcloud ai-platform jobs submit training JOB1 \
    --module-name=trainer.cnn_with_keras \
    --package-path=./trainer \
    --job-dir=gs://mykerasstorage \
    --region=europe-north1 \
    --config=gs://mykerasstorage/trainer/cloudml-gpu.yaml
But I get errors. First, the cloudml-gpu.yaml file can't be found ("no such folder or file"), and when I try to just remove it, I get errors because it says the __init__.py file is missing, but it isn't, even if it is empty (which it...
I am attacking a combinatorial optimization problem similar to the multi-knapsack problem. The problem has an optimal solution, and I prefer not to settle for an approximate solution.
Are there any recommended tutorials regarding the quick prototyping and deployment of combinatorial optimization solutions (for senior software engineers that are also Big Data newbies)? I want to move quickly from prototype to deployment onto a docker cluster or AWS.
My background is in distributed systems (a focus on .NET, java, kafka, docker containers, etc...), thus I'm typically inclined to solve complex problems by parallel processing across a cluster of machines (via scaling on a docker cluster or AWS). However, this particular problem can NOT be solved in a brute force manner as the problem space is too large (roughly 100^1000 combinations are possible).
I have limited experience with "big data", but I'm studying up on knapsack solvers, genetic algorithms, reinforcement learning, and some other AI/ML...
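To ground the "exact vs. brute force" distinction: even the classic single 0/1 knapsack can be solved exactly without enumerating all 2^n subsets, via dynamic programming. Its cost grows with capacity rather than with the number of combinations, which hints at why truly huge instances push people toward dedicated solvers (branch-and-bound, ILP) rather than raw parallel enumeration. A minimal sketch with toy data:

```python
def knapsack_max_value(items, capacity):
    """Exact 0/1 knapsack by dynamic programming.

    items: list of (value, weight) pairs; capacity: int.
    Runs in O(len(items) * capacity) time -- exact, but only feasible
    when the capacity is modest.
    """
    best = [0] * (capacity + 1)  # best[c] = max value using capacity c
    for value, weight in items:
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]

# Toy instance: the optimum picks the 100- and 120-value items (weight 50).
print(knapsack_max_value([(60, 10), (100, 20), (120, 30)], 50))  # 220
```

Multi-knapsack generalizations lose this clean structure, which is where ILP solvers (e.g. CBC via a modeling layer) usually take over.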
I'm using google-bigquery on the Chicago crime dataset. I want to find the most frequent crime type from the primary_type column for each distinct block. To do so, I came up with the following standard SQL.

Data: since the Chicago crime data is rather big, there is an official website where you can preview the dataset: crime data on Google Cloud.

My current standard SQL:
SELECT primary_type,block, COUNT(*) as count
FROM `bigquery-public-data.chicago_crime.crime`
HAVING COUNT(*) = (SELECT MAX(count)
FROM (SELECT primary_type, COUNT(*) as count FROM `bigquery-public-data.chicago_crime.crime` GROUP BY primary_type, block) `bigquery-public-data.chicago_crime.crime`)
The problem with my above query is that it has an error now, and to me the query looks quite inefficient even if I fix the error. How can I fix and optimize the above query?

How to work with regex in standard SQL: to count the most frequent type for each block, covering both North and South, I have to deal with regex. For example, for 033XX S WOOD ST, I should...
I'm going through the ML class on Coursera on logistic regression and also the Manning book Machine Learning in Action. I'm trying to learn by implementing everything in Python.

I'm not able to understand the difference between the cost function and the gradient. There are examples on the net where people compute the cost function, and then there are places where they don't and just go with the gradient descent update w := w − α·∇w f(w).

What is the difference between the two, if any?
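To make the distinction concrete: the cost function J(w) is a single number measuring how badly the current parameters fit the data, while the gradient ∇w J is the vector of its partial derivatives, which the descent update uses to change w. The update itself never needs the cost value; the cost is usually computed only to monitor convergence. A toy logistic-regression sketch on hypothetical one-dimensional data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, xs, ys):
    """Cross-entropy cost J(w, b): one number, lower is better."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(w * x + b)
        total += -(y * math.log(h) + (1 - y) * math.log(1 - h))
    return total / len(xs)

def gradient(w, b, xs, ys):
    """Partial derivatives of J: the direction in which to move w and b."""
    dw = db = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y
        dw += err * x
        db += err
    n = len(xs)
    return dw / n, db / n

# Toy data: label is 1 exactly when x > 0.
xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
w = b = 0.0
for _ in range(500):              # gradient descent: w := w - alpha * dJ/dw
    dw, db = gradient(w, b, xs, ys)
    w -= 0.5 * dw
    b -= 0.5 * db

# At w = b = 0 the cost is log(2) ~ 0.693; descent drives it down.
print(cost(w, b, xs, ys) < 0.693)  # True
```

So the examples that "skip" the cost aren't using a different algorithm; they just don't log J while running the same gradient updates.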
I'm using linear_model.LinearRegression from scikit-learn as a predictive model, and it works perfectly. But I have a problem evaluating the predicted results using the accuracy_score metric.

This is my true data:
array()
My predicted Data:
array()
My code:
accuracy_score(y_true, y_pred, normalize=False)
Error message:
ValueError: Can't handle mix of binary and continuous target
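This error arises because accuracy_score is a classification metric: it expects discrete class labels, while LinearRegression predicts continuous values, hence the "mix of binary and continuous target" complaint. Regression output is evaluated with regression metrics instead (in scikit-learn, e.g. r2_score or mean_squared_error). A plain-Python sketch of R², using hypothetical stand-ins for the elided arrays:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination R^2: 1.0 is a perfect fit.

    Equivalent in spirit to sklearn.metrics.r2_score; written out here
    to make the formula explicit.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical values standing in for the elided y_true / y_pred arrays.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r2_score(y_true, y_pred), 4))  # 0.9486
```

If class labels are genuinely what is wanted, the predictions would need to be thresholded into discrete classes first; otherwise, swap the metric, not the model.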