I want to start a data warehouse in Google BigQuery, but I'm not sure how to actually schedule jobs to get the data into the cloud.
To give some background: I have a MySQL database hosted on-prem, which I currently take a dump of each night as a backup. My idea is that I can send this dump to Google Cloud and have it import the data into BigQuery. I have thought that I could send the dump and then use Cloud Scheduler to run something that opens the dump and does the import, but I'm unsure how these services all fit together.
I'm a bit of a newbie with Google Cloud, so if there is a better way to achieve this then I'm happy to change my plan of action.
Thanks in advance.
Now that the new EXTERNAL_QUERY function has been launched, you can query a Cloud SQL instance directly from BigQuery. Your best shot right now is to replicate your on-prem MySQL database to a Cloud SQL instance and create a BigQuery connection resource that points to it (you can do this in the console or with bq mk --connection).
This way you get live access to your relational database.
Example query that you can run from BigQuery:
SELECT * FROM EXTERNAL_QUERY(
  'connection_id',
  '''SELECT * FROM mysqltable AS c ORDER BY c.customer_id''');
You can even join a BigQuery table with a Cloud SQL table:
Example:
SELECT c.customer_id, c.name, SUM(t.amount) AS total_revenue,
       rq.first_order_date
FROM customers AS c
INNER JOIN transaction_fact AS t ON c.customer_id = t.customer_id
LEFT OUTER JOIN EXTERNAL_QUERY(
  'connection_id',
  '''SELECT customer_id, MIN(order_date) AS first_order_date
     FROM orders
     GROUP BY customer_id''') AS rq ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;
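If you also want to materialize the data on a schedule rather than always querying it live, one option is to wrap such a federated query in an HTTP-triggered Cloud Function and have Cloud Scheduler invoke it nightly. Here is a minimal sketch in Node.js; the dataset, table, and connection names are placeholders you would substitute with your own:

'use strict';

const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

// HTTP-triggered function; point a Cloud Scheduler HTTP job at its URL.
exports.snapshotFirstOrders = async (req, res) => {
  const query = `
    INSERT INTO mydataset.first_orders (customer_id, first_order_date)
    SELECT customer_id, first_order_date
    FROM EXTERNAL_QUERY(
      'connection_id',
      '''SELECT customer_id, MIN(order_date) AS first_order_date
         FROM orders
         GROUP BY customer_id''')`;

  // Run the federated query as a job and wait for the insert to finish.
  const [job] = await bigquery.createQueryJob({query});
  await job.getQueryResults();
  res.status(200).send(`Job ${job.id} complete`);
};

You would then create a Cloud Scheduler HTTP job pointed at the function's URL with your desired cron schedule.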
Alternatively, if you prefer your original dump-and-load plan: export the dump as CSV, create a Cloud Storage bucket for it, and create a Cloud Function that is triggered whenever a file is uploaded to that bucket and loads it into BigQuery:

'use strict';

const {Storage} = require('@google-cloud/storage');
const {BigQuery} = require('@google-cloud/bigquery');

// Instantiate the clients
const storage = new Storage();
const bigquery = new BigQuery();
/**
* Creates a BigQuery load job to load a file from Cloud Storage and write the data into BigQuery.
*
* @param {object} data The event payload.
* @param {object} context The event metadata.
*/
exports.loadFile = (data, context) => {
  const datasetId = 'Your_Dataset_name';
  const tableId = 'Your_Table_ID';
  const jobMetadata = {
    skipLeadingRows: 1,
    writeDisposition: 'WRITE_APPEND'
  };

  console.log(`Loading from gs://${data.bucket}/${data.name} into ${datasetId}.${tableId}`);

  // Loads data from the Cloud Storage file into the table.
  // Return the promise so the function does not terminate before the load job is created.
  return bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(data.bucket).file(data.name), jobMetadata)
    .catch(err => {
      console.error('ERROR:', err);
    });
};
Deploy the function with a Cloud Storage trigger on your bucket (the google.storage.object.finalize event), then create your BigQuery dataset and table using your desired schema.
And now you can upload your CSV file into your bucket and you will see the uploaded data appear in BigQuery.
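To complete the pipeline, the nightly job on your on-prem server only needs to copy the dump into that bucket; the function above handles the load automatically. Here is a minimal sketch in Node.js using the @google-cloud/storage client (the bucket name, file paths, and service-account key path are placeholders for your own values):

'use strict';

const {Storage} = require('@google-cloud/storage');

// Authenticate with a service account key, since this runs outside Google Cloud.
const storage = new Storage({keyFilename: '/path/to/service-account-key.json'});

async function uploadDump() {
  const date = new Date().toISOString().slice(0, 10);
  // Upload the nightly CSV dump; the upload triggers the loadFile function above.
  await storage.bucket('your-bucket-name').upload('/backups/nightly-dump.csv', {
    destination: `dumps/${date}.csv`,
  });
  console.log('Dump uploaded');
}

uploadDump().catch(console.error);

Running this from the same cron job that produces the dump covers the scheduling part of your question without needing Cloud Scheduler at all.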