I have to deal with very large data sets (point clouds, generally more than 30 000 000 points) using Matlab. I can read the ASCII data with the textscan function. After reading, I need to detect invalid data (points with 0,0,0 coordinates) and then perform some mathematical operations on each point or each line of the data. My approach is to first read the data with textscan and assign it to a matrix. Then I use for loops to detect invalid points and to perform the mathematical operations on each point or line. A sample of my code is shown below. According to Matlab's profile tool, textscan takes 37% and the line
transformed_list((i:i),(1:4)) = coordinate_list((i:i),(1:4))*t_matrix;
takes 35% of the total computation time. I tried it with another point cloud (around 5 500 000 points) and the profile tool reported the same results. Is there a way to avoid the for loops, or another way to speed up this computation?
fileID = fopen('C:\Users\Mustafa\Desktop\ptx_all_data\dede5.ptx'); ...
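One common way to avoid a per-row loop like this is to filter out all invalid rows in one pass and then apply the transformation to every remaining row as a single matrix product. The sketch below illustrates the idea in plain Python rather than Matlab, on made-up data. The names `coordinate_list`, `t_matrix` and `transformed_list` mirror the question; the helper `transform_points`, the sample values, and the assumption that the fourth column is the homogeneous coordinate 1 are illustrative only. In Matlab the analogous form would be a logical mask such as `valid = any(coordinate_list(:,1:3) ~= 0, 2)` followed by `coordinate_list(valid,1:4) * t_matrix`; whether that removes the bottleneck for this data has not been measured here.

```python
# Sketch only: the data below is made up and the helper name is hypothetical.

def transform_points(coordinate_list, t_matrix):
    """Drop (0, 0, 0) points, then compute row * t_matrix for every remaining row."""
    # A point is valid if at least one of x, y, z is non-zero.
    valid = [row for row in coordinate_list if any(c != 0 for c in row[:3])]
    # Columns of the 4x4 matrix, so each output entry is a row-by-column dot product.
    cols = list(zip(*t_matrix))
    return [[sum(r * c for r, c in zip(row, col)) for col in cols] for row in valid]


# Translation by (10, 20, 30) in row-vector convention (offsets in the last row).
t_matrix = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [10, 20, 30, 1],
]
coordinate_list = [
    [1.0, 2.0, 3.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],  # invalid point, filtered out
    [4.0, 5.0, 6.0, 1.0],
]
transformed_list = transform_points(coordinate_list, t_matrix)
print(transformed_list)
```

The design point is that validity is decided once for the whole data set and the product is expressed over all rows, instead of indexing and assigning one row per loop iteration.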
Docker data volumes live on the host, or inside boot2docker on the local VM.
Keeping big data from MongoDB in a data container and running MongoDB in another container seems to be the way to go. Will this scale on Google Compute Engine, Azure Virtual Machines, or other platforms? I mean, if all of this runs within ONE virtual machine, like boot2docker or another VM in the cloud. Normally you would scale VMs by creating new VM instances, but how is this possible with Docker?
Sorry to ask this on StackExchange, but there is no category for it on dba, whereas StackExchange has a category for Docker.
I'm a big data architect with no cloud skills.
I have always worked with Hadoop on-premises, and I know that server locality is a very serious concern because poor locality can mean higher latency.
Now that Hadoop is integrated with the cloud, I'm wondering:
Can cloud providers (AWS, Azure) place the hosts of the same cluster in the same location to reduce latency?
How do we manage the latency of transferring huge amounts of data from local machines to the cloud?
Nathan Marz, in his book "Big Data", describes how to maintain data files in HDFS and how to keep file sizes as close as possible to the native HDFS block size, using his Pail library running on top of MapReduce.
Is it possible to achieve the same result in Google Cloud Storage?
Can I use Google Cloud Dataflow instead of MapReduce for this purpose?
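To make the consolidation idea concrete, here is a minimal Python sketch of the planning step only: grouping small files into batches whose total size approaches a target block size. It is not how Pail works internally and it does not touch HDFS or Google Cloud Storage. The function `plan_batches`, the 128 MB target and the sample sizes are all assumptions chosen for illustration.

```python
# Sketch only: plans which files to merge; it performs no I/O.

def plan_batches(file_sizes, block_size):
    """Greedy first-fit-decreasing grouping of file sizes into batches <= block_size."""
    batches = []  # each batch is a list of file sizes to be merged into one file
    for size in sorted(file_sizes, reverse=True):
        for batch in batches:
            if sum(batch) + size <= block_size:
                batch.append(size)
                break
        else:
            # No existing batch has room (or the file alone exceeds the target).
            batches.append([size])
    return batches


MB = 1024 * 1024
block_size = 128 * MB  # assumed target size
sizes = [100 * MB, 60 * MB, 50 * MB, 28 * MB, 10 * MB, 8 * MB]
batches = plan_batches(sizes, block_size)
for batch in batches:
    print([s // MB for s in batch], "->", sum(batch) // MB, "MB")
```

The actual merge would then be carried out by whatever engine is available, which is the part the question about Dataflow versus MapReduce is really asking about.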
I want to start a data warehouse in Google BigQuery, but I'm not sure how to actually schedule jobs to get the data into the cloud.
To give some background: I have a MySQL database hosted on-prem, which I currently take a dump of each night as a backup. My idea is to send this dump to Google Cloud and have it import the data into BigQuery. I thought I could send the dump and then probably use a cloud scheduler function to run something that opens the dump and does the import, but I'm unsure how these services all fit together.
I'm a bit of a newbie with Google Cloud, so if there is a better way to achieve this, I'm happy to change my plan of action.
Thanks in advance.
Google announced a beta version of a new machine learning environment today. Can someone update me on where Google CloudML stands? The docs seem to have changed overnight. I'm guessing that commands that were
gcloud beta ml
are now
gcloud ml-engine
I'm having trouble parsing the notes released today:
https://cloud.google.com/ml-engine/docs/resources/release-notes
I was following the tutorial here:
https://cloud.google.com/blog/big-data/2016/12/how-to-train-and-classify-images-using-google-cloud-machine-learning-and-cloud-dataflow
What else should I look out for?
I have come across many NoSQL and SQL databases. There are various parameters for measuring the strengths and weaknesses of these databases, and scalability is one of them. What is the difference between scaling these databases horizontally and vertically?
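As a rough illustration of the distinction: vertical scaling gives a single server more CPU, memory or disk, while horizontal scaling adds servers and spreads the data across them. The Python sketch below shows one simple way data can be spread, hash-based sharding by key. The function names, the key format and the node counts are hypothetical, and real databases use their own (often more elaborate) partitioning schemes.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (not Python's randomized hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Assign every key to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user-{i}" for i in range(1000)]  # made-up keys
two_nodes = distribute(keys, 2)
four_nodes = distribute(keys, 4)
print("2 nodes:", [len(s) for s in two_nodes])
print("4 nodes:", [len(s) for s in four_nodes])
```

Going from two to four nodes lowers the number of keys each node holds, which is the horizontal-scaling effect; the cost is that keys must be moved when the node count changes, something this naive modulo scheme handles poorly.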
I am trying to read data from a .txt file into a matrix in Matlab for plotting purposes, but there seems to be an error when reading the data. It seems to happen with the time record.
I am using the following code snippet.
Error using textscan
Unable to parse the format character vector at position 16 ==> %{HH:MM:SS}T %f %f %f %f %f %f %f %d %d %d %f %f %f %f %f %f %f %f %f %f
%f %f %f %f...
Could someone please tell me what is meant by 'Big Data implementation over Cloud'?
I have been using Amazon S3 to store data and Hive to query it, which I have read is one such cloud implementation. I would like to know what exactly this means and all the possible ways to implement it.
Thanks
What exactly is the difference between Apache's Mesos and Google's Kubernetes? I understand both are server cluster management software. Can anyone elaborate on where the main differences are, and when each framework would be preferred?
Why would you want to use Kubernetes on top of Mesosphere?