I'm making a PUT request to upload data to Google Storage. But I'd like to upload big data, files around 2 GB or so, and I'd like to make a multipart request, I mean to upload an object in smaller parts, which my application doesn't do so far. Does anyone know if this is possible using the PUT method? As I saw in Google Cloud's documentation, they use the POST method: https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload
But I'd like to use the PUT method instead.
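For reference, this is roughly what I have in mind, a minimal sketch assuming the JSON API's resumable upload (the bucket name, object name, file path, and token below are placeholders): one POST opens the session, then each part goes up with PUT and a Content-Range header.

import os
import requests

BUCKET = "my-bucket"             # placeholder
OBJECT_NAME = "big-file.bin"     # placeholder
FILE_PATH = "/path/to/big-file.bin"
TOKEN = "ACCESS_TOKEN"           # OAuth access token with storage write scope
CHUNK = 8 * 1024 * 1024          # part size, a multiple of 256 KiB

total = os.path.getsize(FILE_PATH)

# One POST starts the resumable session; the session URI comes back in Location.
init = requests.post(
    f"https://storage.googleapis.com/upload/storage/v1/b/{BUCKET}/o",
    params={"uploadType": "resumable", "name": OBJECT_NAME},
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "X-Upload-Content-Length": str(total),
    },
)
session_uri = init.headers["Location"]

# Each part is then sent with PUT + Content-Range against that session URI.
with open(FILE_PATH, "rb") as f:
    sent = 0
    while sent < total:
        part = f.read(CHUNK)
        end = sent + len(part) - 1
        requests.put(
            session_uri,
            data=part,
            headers={"Content-Range": f"bytes {sent}-{end}/{total}"},
        )
        sent += len(part)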
I'm confronting a strange situation here with Google Cloud SQL.
I'm migrating a 15.7 GB MySQL database to Google Cloud. I've followed the migration process exactly as the docs say, and everything worked perfectly: absolutely no issues during the process, and my application works just fine. The only problem is that the size used by the DB as shown on Google Cloud is much bigger than the original DB. Right now I have a 39 GB SQL database, from a 15.7 GB database.
After some research and testing, I've come to the conclusion that it's down to the way Google counts the data on their side.
I just wanted to know if somebody has any idea, or can confirm what I'm saying.
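For reference, this is roughly how I'm measuring the size on the source database (a quick sketch; the host and credentials are placeholders, and I'm assuming pymysql as the driver):

import pymysql  # assuming pymysql; any MySQL driver runs the same query

conn = pymysql.connect(host="HOST", user="USER", password="PASSWORD")
with conn.cursor() as cur:
    # Data + index size per schema, plus space MySQL has allocated but not reclaimed.
    cur.execute("""
        SELECT table_schema,
               ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS used_gb,
               ROUND(SUM(data_free) / 1024 / 1024 / 1024, 2) AS unreclaimed_gb
        FROM information_schema.tables
        GROUP BY table_schema
    """)
    for schema, used_gb, unreclaimed_gb in cur.fetchall():
        print(schema, used_gb, unreclaimed_gb)
conn.close()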
Thank you for your answers.
I have an idea of how to extract table data to Cloud Storage using the bq extract command, but I would rather like to know if there are any options to extract a BigQuery table as newline-delimited JSON to a local machine.
I can extract table data to GCS via the CLI and also download JSON data from the web UI, but I am looking for a solution using the bq CLI to download table data as JSON to my local machine. Is that even possible?
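To make it concrete, this is the two-step route I can already do today and am hoping to shortcut, sketched with the Python client instead of the CLI (the project, dataset, table, and bucket names are placeholders):

from google.cloud import bigquery, storage

bq = bigquery.Client()

# Step 1: export the table to GCS as newline-delimited JSON.
job = bq.extract_table(
    "my_project.my_dataset.my_table",
    "gs://my-bucket/export/my_table-*.json",
    job_config=bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
    ),
)
job.result()  # wait for the export job to finish

# Step 2: pull the exported shards down to the local machine.
gcs = storage.Client()
for blob in gcs.list_blobs("my-bucket", prefix="export/"):
    blob.download_to_filename(blob.name.rsplit("/", 1)[-1])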
traceback:
Traceback (most recent call last):
File "/Users/soubhagyapradhan/Desktop/upwork/baby/data-science/env/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/Users/soubhagyapradhan/Desktop/upwork/baby/data-science/env/lib/python3.8/site-packages/grpc/_channel.py", line 923, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Users/soubhagyapradhan/Desktop/upwork/baby/data-science/env/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details =...
I am looking for some guidance and tips on understanding what it would take to do a reasonable Hadoop proof of concept in the cloud. I am a complete noob to the big data analytics world, and I will be more than happy to hear any suggestions you might have based on your experience.
I'm relatively new to GCP and just starting to set up and evaluate my organization's architecture on GCP.
Scenario: Data will flow into a Pub/Sub topic (high frequency, low amount of data). The goal is to move that data into Bigtable. From my understanding, you can do that either by having a Cloud Function trigger on the topic or with Dataflow.
Now I have previous experience with cloud functions which I am satisfied with, so that would be my pick.
I fail to see the benefit of choosing one over the other, so my question is: when should I choose which of these products?
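For context, here is roughly what the Cloud Function option looks like in my head, just a sketch with placeholder project, instance, table, and column-family names:

import base64
from google.cloud import bigtable

# Client created at module level so it is reused across invocations.
client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("my-table")

def pubsub_to_bigtable(event, context):
    """Background Cloud Function triggered by a Pub/Sub message."""
    payload = base64.b64decode(event["data"])   # Pub/Sub data arrives base64-encoded
    row = table.direct_row(context.event_id)    # naive row key, just for the sketch
    row.set_cell("cf1", b"payload", payload)    # column family "cf1" is assumed to exist
    row.commit()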
Thanks
Ops: this does not belong on Server Fault because it focuses on programming architecture.
I have the following questions regarding the differences between cloud and virtualization.
How is the cloud different from virtualization?
I recently looked at the pricing of Rackspace, Amazon, and similar cloud providers, and found that our current 6 dedicated servers come out cheaper than their offerings. So how can one claim the cloud is cheaper? Is it cheaper only in comparison to regular hosting?
We reorganized our infrastructure into a virtual environment to reduce our configuration overhead at the time of a failure, and we did not have to rewrite any piece of code already written for the earlier setup. So moving to virtualization does not require any reprogramming. But the cloud is absolutely different, and it will require reprogramming everything, right?
Is it really worth recoding when our current IT costs are 3-4 times lower than cloud hosting, including RAID backups and all sorts of clustering for high availability?
I'm using google-bigquery on the Chicago crime dataset. I want to find the most frequent crime type from the primary_type column for each distinct block. To do so, I came up with the following standard SQL.
Data: Since the Chicago crime data is rather big, there is an official website where you can preview the dataset: crime data on Google Cloud.
My current standard SQL:
SELECT primary_type,block, COUNT(*) as count
FROM `bigquery-public-data.chicago_crime.crime`
HAVING COUNT(*) = (SELECT MAX(count)
FROM (SELECT primary_type, COUNT(*) as count FROM `bigquery-public-data.chicago_crime.crime` GROUP BY primary_type, block) `bigquery-public-data.chicago_crime.crime`)
The problem with my query above is that it currently has an error, and to me the query would still be quite inefficient even if I fixed the error. How can I fix and optimize the above query?
How to work with regex in standard SQL: To count the most frequent type for each block, including both North and South, I have to deal with regex; for example, for 033XX S WOOD ST, I should...
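For what it's worth, this is the direction I've been experimenting with instead, ranking the types within each block with a window function and keeping the top one. It's a sketch run through the Python client, not a final answer:

from google.cloud import bigquery

client = bigquery.Client()

# Candidate rewrite: rank crime types within each block, keep the most frequent one.
query = """
SELECT block, primary_type, cnt
FROM (
  SELECT block,
         primary_type,
         COUNT(*) AS cnt,
         ROW_NUMBER() OVER (PARTITION BY block ORDER BY COUNT(*) DESC) AS rn
  FROM `bigquery-public-data.chicago_crime.crime`
  GROUP BY block, primary_type
)
WHERE rn = 1
ORDER BY cnt DESC
"""

for row in client.query(query).result():
    print(row.block, row.primary_type, row.cnt)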
Task: We have to set up a periodic sync of records from Spanner to BigQuery. Our Spanner database has a relational table hierarchy.
Option considered: I was thinking of using Dataflow templates to set up this data pipeline.
Option 1: Set up a job with the Dataflow template 'Cloud Spanner to Cloud Storage Text' and then another with the Dataflow template 'Cloud Storage Text to BigQuery'. Con: the first template works only on a single table, and we have many tables to export.
Option 2: Use the 'Cloud Spanner to Cloud Storage Avro' template, which exports the entire database. Con: I only need to export selected tables within a database, and I don't see a template to import Avro into BigQuery.
Question: Please suggest the best option for setting up this pipeline.
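To illustrate Option 1, this is roughly how I imagined working around the single-table limitation: launching the 'Cloud Spanner to Cloud Storage Text' template once per table from Python. The template path and, especially, the parameter names are my assumptions from the template documentation, so treat this purely as a sketch:

from googleapiclient.discovery import build

PROJECT = "my-project"                   # placeholder
TABLES = ["Singers", "Albums", "Songs"]  # placeholder table list

dataflow = build("dataflow", "v1b3")

for table in TABLES:
    body = {
        "jobName": f"spanner-to-gcs-{table.lower()}",
        "parameters": {                  # parameter names below are assumptions
            "spannerProjectId": PROJECT,
            "spannerInstanceId": "my-instance",
            "spannerDatabaseId": "my-database",
            "spannerTable": table,
            "textWritePrefix": f"gs://my-bucket/spanner-export/{table}/",
        },
    }
    dataflow.projects().templates().launch(
        projectId=PROJECT,
        gcsPath="gs://dataflow-templates/latest/Spanner_to_GCS_Text",  # assumed path
        body=body,
    ).execute()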
I am trying to export some data from BigQuery. This is done by first saving the table and then exporting it to Google Cloud Storage. This used to work just fine, but recently, apparently, some tables have nested schemas, so exporting as CSV does not work anymore. Exporting as JSON should work, and the export job claims to succeed, but the data is not available in Google Cloud Storage. Is anyone experiencing similar issues? Is Google having problems?
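In case it helps narrow this down, this is how I've been inspecting the export job afterwards with the Python client (the job id and location are placeholders copied from the UI):

from google.cloud import bigquery

client = bigquery.Client()
job = client.get_job("job_abc123", location="US")  # placeholder job id / location

print(job.state)                        # "DONE" even when something went wrong
print(job.errors)                       # None only if it truly succeeded
print(job.destination_uris)             # where the export was supposed to land
print(job.destination_uri_file_counts)  # how many files were actually written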
In a Python 3.5 notebook, backed by an Apache Spark service, I had installed BigDL 0.2 using pip. When removing that installation and trying to install version 0.3 of BigDL, I get this error (line breaks added for readability):
AssertionError: Multiple .dist-info directories:
/gpfs/fs01/user/scbc-4dbab79416a6ec-4cf890276e2b/.local/lib/python3.5/site-packages/BigDL-0.3.0.dist-info,
/gpfs/fs01/user/scbc-4dbab79416a6ec-4cf890276e2b/.local/lib/python3.5/site-packages/BigDL-0.2.0.dist-info
However, neither of these directories exists:
!ls -al /gpfs/fs01/user/scbc-4dbab79416a6ec-4cf890276e2b/.local/lib/python3.5/site-packages/
total 0
drwx------ 2 scbc-4dbab79416a6ec-4cf890276e2b users 4096 Nov 8 06:12 .
drwx------ 3 scbc-4dbab79416a6ec-4cf890276e2b users 4096 Nov 8 06:12 ..
I need to ETL data into my Cloud SQL instance. This data comes from API calls. Currently, I'm running custom Java ETL code in Kubernetes with CronJobs that makes requests to collect this data and loads it into Cloud SQL. The problem comes with managing the ETL code and monitoring the ETL jobs. The current solution may not scale well when more ETL processes are incorporated. In this context, I need to use an ETL tool.
My Cloud SQL instance contains two types of tables: common transactional tables and tables that contain data coming from the API. The second type is mostly read-only from an "operational database" perspective, and a huge part of those tables are bulk updated every hour (in batch) to discard the old data and refresh the values.
Considering this context, I noticed that Cloud Dataflow is the ETL tool provided by GCP. However, it seems that this tool is more suitable for big data applications that need to do complex transformations and ingest data in multiple formats. Also, in Dataflow, the...
'ascii' codec can't encode character u'\u2019' in position 80: ordinal not in range(128)
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1102, in __call__
return handler.dispatch()
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File...
I am a student studying computer science, and I am taking a big data course this semester. As part of the curriculum, I am supposed to develop a private cloud using Ubuntu machines, and another team needs to build the authentication techniques for that private cloud. I am very new to big data and the cloud, even though I have a basic understanding of the concepts. While going through the internet for resources on how to build a cloud, I came across OpenStack, which works best with Ubuntu, but before going any further into it I wanted to ask the Stack Overflow community for guidance on the best tools and technology I can use to set up the private cloud. We need to set up a very small private cloud as a research project. Can anyone give me an idea of tools and technology, please? I appreciate your time.
I'm considering using Data Lake technologies, which I have been studying for the last few weeks, compared with the traditional SSIS ETL scenarios, which I have been working with for so many years.
I think of Data Lake as something very linked to big data, but where is the line between using Data Lake technologies versus SSIS?
Is there any advantage to using Data Lake technologies with 25 MB ~ 100 MB ~ 300 MB files? Parallelism? Flexibility? Extensibility in the future? Is there any performance gain when the files to be loaded are not as big as U-SQL's best-case scenario?...
What are your thoughts? Would it be like using a hammer to crack a nut? Please, don't hesitate to ask me any questions to clarify the situation. Thanks in advance!!
EDIT 21/03: more clarifications:
It has to be in the cloud.
The reason I considered using ADL is that there is no substitute for SSIS in the cloud. There is ADF, but it's not the same: it orchestrates the data, but it's not as flexible as SSIS.
I thought I could use U-SQL for some...