I'm making a PUT request to upload data to Google Cloud Storage. I'd like to upload large files, around 2 GB or so, as a multi-part request; that is, to upload an object in smaller parts, which my application doesn't do so far. Does anyone know if this is possible using the PUT method? As far as I can see in Google Cloud's documentation, they use the POST method: https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload
But I'd like to use the PUT method instead.
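If I read the linked docs right, a resumable upload may be what you're after: only the session initiation is a POST, and every data chunk is then sent with PUT. A minimal sketch, assuming the JSON API endpoint from the docs; the bucket, object name, and token below are placeholders:

```python
import urllib.request

CHUNK = 8 * 256 * 1024  # GCS requires chunk sizes in multiples of 256 KiB

def content_range(start, end, total):
    """Content-Range header for one PUT chunk, e.g. 'bytes 0-2097151/5000000'."""
    return f"bytes {start}-{end}/{total}"

def chunk_plan(total, chunk=CHUNK):
    """Yield (start, end, header) for each PUT request needed."""
    start = 0
    while start < total:
        end = min(start + chunk, total) - 1
        yield start, end, content_range(start, end, total)
        start = end + 1

def resumable_upload(data, bucket, name, token):
    # 1. A single POST initiates the session; the session URI comes back
    #    in the Location header.
    init_req = urllib.request.Request(
        f"https://storage.googleapis.com/upload/storage/v1/b/{bucket}/o"
        f"?uploadType=resumable&name={name}",
        method="POST",
        data=b"",
        headers={"Authorization": f"Bearer {token}",
                 "X-Upload-Content-Length": str(len(data))},
    )
    with urllib.request.urlopen(init_req) as resp:
        session_uri = resp.headers["Location"]
    # 2. Each part of the object then goes up with PUT + Content-Range.
    for start, end, header in chunk_plan(len(data)):
        put_req = urllib.request.Request(
            session_uri, method="PUT",
            data=data[start:end + 1],
            headers={"Content-Range": header},
        )
        urllib.request.urlopen(put_req)
```

So you don't avoid POST entirely, but the 2 GB of payload itself is uploaded in PUT requests, and an interrupted chunk can be retried without restarting the whole upload.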
Advika Banerjee said: Traditional data warehouse environments are being overwhelmed by the soaring volumes and wide variety of data pouring in from cloud, mobile, social media, machine, sensor, and other sources. And the problem will only worsen as big data continues to grow. IT organizations that need to address performance degradation in warehouses approaching their capacity are already considering costly upgrades. However, an upgrade is not the most effective way to manage an excess of seldom-used data. Nor does it save valuable CPU cycles currently consumed by the need to execute compute-intensive extract, load, and transform (ELT) jobs. To keep pace with exploding data volumes, the data warehouse itself needs to evolve.
One emerging strategy is data warehouse optimization using Hadoop as an enterprise data hub to augment an existing warehouse infrastructure. By deploying the Hadoop framework to stage and process raw or rarely used data, you can reserve the warehouse for high-value information frequently accessed by business users. This white paper outlines a new reference architecture for this strategy, jointly developed by Informatica and Cloudera to help organizations speed time to value, maximize productivity, lower costs, and minimize risk.
You can use the ODBC driver provided by Cloudera:
http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html
For Irene: you can use the same driver; the one above is based on the Simba driver.
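For example, once the connector is installed you can query Impala from Python through pyodbc. The driver name and host below are assumptions; check the name actually registered in your odbcinst.ini after installing the connector:

```python
def impala_conn_str(host, port=21050,
                    driver="Cloudera ODBC Driver for Impala"):
    """Build an ODBC connection string for the Impala connector."""
    return f"DRIVER={{{driver}}};HOST={host};PORT={port}"

# Requires `pip install pyodbc` plus the installed driver:
# import pyodbc
# conn = pyodbc.connect(impala_conn_str("impala.example.com"), autocommit=True)
# for row in conn.cursor().execute("SELECT 1"):
#     print(row)
```

Alternatively, define a DSN in odbc.ini and connect with just `"DSN=Impala"`.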
The new Data Warehouse Optimization (DWO) reference architecture, designed specifically for Enterprise Data Hub deployments, addresses the challenges facing traditional data warehouse infrastructures, where capacity is too quickly consumed by increasing data volumes, leading to performance bottlenecks and costly upgrades. The DWO architecture empowers companies to optimally deploy an Enterprise Data Hub, a central system to land and work with all data in a variety of ways, together with the tools, security, and governance customers require. An Enterprise Data Hub is a complementary technology to data warehouse implementations, enabling them to store and process data at any scale, dramatically reduce data warehouse costs, and boost developer productivity by up to a factor of five.
The proven core building blocks for implementing the DWO architecture are Cloudera Enterprise, a subscription offering that combines CDH (Cloudera's 100 percent open source distribution of Apache Hadoop), Cloudera Manager, and Cloudera Navigator, and Informatica PowerCenter Big Data Edition powered by Informatica Vibe. Informatica Vibe is the world's first and only embeddable virtual data machine (VDM), with "map once, deploy anywhere" data integration.
“Legacy environments are not going away, but they need to be augmented by Hadoop-based solutions to meet the demands of big data,” said Todd Goldman, vice president and general manager, Enterprise Data Integration, Informatica. “The Cloudera and Informatica Data Warehouse Optimization reference architecture helps companies leverage their existing environment with emerging technologies using readily available skills, so organizations can more affordably and efficiently unlock the massive potential of big data.”
Fast-growing data volumes and new types of data sources, ranging from cloud and mobile apps to social media and machine data, are placing substantial demands on current data warehouse infrastructures. To optimize their data warehouse environments, organizations are seeking ways to support unlimited data volumes while leveraging industry-standard hardware and software to reduce infrastructure costs and existing skills to minimize operational costs. They are also seeking ways to support all types of data, and easily integrate new and existing types of infrastructure.
The DWO reference architecture addresses all these requirements through the combination of Informatica and Cloudera technologies. Informatica delivers a broad and mature set of data integration and data management capabilities around Hadoop. Cloudera Enterprise enables cost-effective, scalable storage and processing on commodity infrastructure, along with enterprise-grade security, high availability, cluster management, and low-latency querying. The joint reference architecture includes technologies and solutions that:
· Lower infrastructure and operational costs - Running Informatica on Cloudera lets organizations cost-effectively scale data storage and processing on industry-standard hardware and open-source software using readily available skills.
· Use existing skills to staff projects - Many data warehouse organizations already have ETL developers and consultants on staff trained on Informatica. With Informatica PowerCenter Big Data Edition, every Informatica developer is now a Hadoop developer without having to become a Hadoop expert. With Informatica's and Cloudera's world-class support and training organizations, users can staff the development and administration of data warehouse projects on Cloudera with readily available skills.
· Future-proof the data warehouse and drive productivity - Informatica Vibe enables data integration and ETL processes to be written just once and deployed anywhere. This means that existing ETL processes created using Informatica's codeless visual development paradigm can be redeployed on Cloudera Enterprise with minimal effort, resulting in a more resilient data warehouse infrastructure and an up-to-5x productivity gain for developers. Rapid development is further enhanced with Informatica's Vibe for rapid ETL prototyping and Cloudera's Impala for real-time interactive queries to discover insights faster.
· Optimize data warehouse performance - Informatica PowerCenter Big Data Edition deploys on Cloudera Enterprise to load, profile, parse and transform for analysis of data in a high performance and cost-effective fashion. Optimal processing flows can be defined quickly using Informatica's visual design interface and extensive library of pre-built transforms.
· Handle virtually all types of data and sources - With Informatica, nearly all types of data - including legacy, ERP, CRM, social and machine - can be accessed and integrated through a variety of methods ranging from batch to replication, change data capture (CDC) and real-time streaming. Newly released Informatica Vibe Data Stream for Machine Data technology, for example, collects and streams high-volume, real-time machine data into Hadoop to drive new levels of operational intelligence.
· Ensure data quality - Informatica Data Quality Big Data Edition executes data quality and matching rules on Cloudera Enterprise to ensure trust in the data.
· Ensure enterprise-ready deployments that meet business SLAs - With Informatica's Vibe "map once, deploy anywhere" virtual data machine technology, users can immediately deploy ETL jobs from development into production. The combination of Informatica's unified administration and Cloudera Manager makes it easy to manage ETL workloads on Cloudera for data warehouse projects.