How to process all HBase data with Hive

  • I have an HBase table with 750 GB of data. All of it is time-series sensor data, and my row key design is:

    deviceID,sensorID,timestamp

    I want to prepare all of the data in HBase for batch processing (for example, as CSV on HDFS). But there is a lot of data in HBase. Can I prepare the data using Hive without fetching it piecemeal? If I fetch data by sensor ID (a scan query with start and end rows), I have to specify the start and end row every time, and I don't want to do that.

      August 7, 2020 2:56 PM IST
  • You can try Hive-HBase integration: map the HBase table's data to a Hive table.

    Then, through that Hive-HBase table, you can create a full dump of the HBase table into a regular Hive table (ORC, Parquet, etc.).
    Step-1: Create HBase-Hive integrated table:
    hive> CREATE EXTERNAL TABLE <db_name>.<hive_hbase_table_name> (key int, value string) 
          STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
          TBLPROPERTIES ("hbase.table.name" = "<hbase_table_name>");
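
    As a minimal sketch applied to the question's time-series table (the database sensordb, HBase table sensor_data, and column d:value are assumed names, not from the thread), the composite row key can be mapped to a single Hive string:

    -- Assumed names: sensordb, sensor_data, column family "d", qualifier "value".
    -- The composite row key (deviceID,sensorID,timestamp) maps to one string column.
    hive> CREATE EXTERNAL TABLE sensordb.hbase_sensor_data (rowkey string, reading string)
          STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:value")
          TBLPROPERTIES ("hbase.table.name" = "sensor_data");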

    Step-2: Create Hive dump of HBase table:

    hive> CREATE TABLE <db_name>.<table_name> STORED AS ORC AS 
             SELECT * FROM <db_name>.<hive_hbase_table_name>;
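
    As an optional sanity check before exporting, you can compare row counts between the HBase-backed table and the dump (note that counting the HBase-backed table triggers a full HBase scan):

    hive> SELECT count(*) FROM <db_name>.<hive_hbase_table_name>;
    hive> SELECT count(*) FROM <db_name>.<table_name>;
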
    Step-3: Exporting to CSV format:

    hive> INSERT OVERWRITE DIRECTORY '<hdfs_directory>'
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          SELECT * FROM <db_name>.<hive_hbase_table_name>;
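
    Because the question's row key is the composite string deviceID,sensorID,timestamp, a variation of Step-3 can split it into separate CSV columns using Hive's built-in split(). This sketch reuses the hypothetical names from the Step-1 sketch above; /tmp/sensor_csv is an arbitrary output directory:

    -- Split the composite row key into device ID, sensor ID, and timestamp columns
    -- while writing the CSV dump (names here are the assumed ones from Step-1).
    hive> INSERT OVERWRITE DIRECTORY '/tmp/sensor_csv'
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          SELECT split(rowkey, ',')[0] AS deviceid,
                 split(rowkey, ',')[1] AS sensorid,
                 split(rowkey, ',')[2] AS ts,
                 reading
          FROM sensordb.hbase_sensor_data;
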
    Refer to the Hive documentation for more details and options on exporting Hive tables.
      August 7, 2020 3:01 PM IST
  • A change to Hive in HDP 3.0 is that all StorageHandlers must be marked as “external”; there is no such thing as a non-external table created by a StorageHandler. If the corresponding HBase table exists when the Hive table is created, it mimics the HDP 2.x semantics of an “external” table. If the corresponding HBase table does not exist when the Hive table is created, it mimics the HDP 2.x semantics of a non-external table (e.g. the HBase table is dropped when the Hive table is dropped).

    From the Hive shell, create an HBase-backed Hive table:
    CREATE EXTERNAL TABLE hbase_hive_table (key int, value string) 
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    TBLPROPERTIES ("hbase.table.name" = "hbase_hive_table", "hbase.mapred.output.outputtable" = "hbase_hive_table");
    1. The hbase.columns.mapping property is mandatory. The hbase.table.name property is optional. The hbase.mapred.output.outputtable property is also optional; it is needed if you plan to insert data into the table.
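
    As an illustration, hbase.mapred.output.outputtable is what lets you write back to HBase through the integrated table. In this hedged example, some_hive_source is a hypothetical Hive table with a matching schema:

    -- some_hive_source is a hypothetical source table, not from the original post.
    INSERT INTO TABLE hbase_hive_table SELECT key, value FROM some_hive_source;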


      July 31, 2021 4:24 PM IST
  • You can access an existing HBase table through Hive using CREATE EXTERNAL TABLE:
    CREATE EXTERNAL TABLE hbase_table_2 (key int, value string) 
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    TBLPROPERTIES ("hbase.table.name" = "some_existing_table", "hbase.mapred.output.outputtable" = "some_existing_table");

    • You can use different types of column mappings to map the HBase columns to Hive (a combined sketch follows the list below):
      • Multiple Columns and Families

        To define four columns, the first being the rowkey: “:key,cf:a,cf:b,cf:c”

      • Hive MAP to HBase Column Family

        When the Hive datatype is a Map, a column family with no qualifier may be used. The keys of the Map are then used as the column qualifiers in HBase: “cf:”

      • Hive MAP to HBase Column Prefix

        When the Hive datatype is a Map, a prefix for the column qualifier can be provided which will be prepended to the Map keys: “cf:prefix_.*”

        Note: The prefix is removed from the column qualifier when it becomes a key in the Hive Map. For example, with the above column mapping, the HBase column “cf:prefix_a” would appear in the Map under the key “a”.
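
      A combined sketch of the three mappings above (the table name and the families cf, t, and m are illustrative, not from the original post):

      -- Fixed columns from family "cf", a MAP over all of family "t",
      -- and a MAP over qualifiers in family "m" that start with "prefix_".
      CREATE EXTERNAL TABLE hbase_map_demo (key string, a string, b string, c string,
                                            tags map<string,string>,
                                            metrics map<string,string>)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:a,cf:b,cf:c,t:,m:prefix_.*")
      TBLPROPERTIES ("hbase.table.name" = "hbase_map_demo");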

    • You can also define composite row keys. Composite row keys use multiple Hive columns to generate the HBase row key.
      • Simple Composite Row Keys

        A Hive column with a datatype of Struct will automatically concatenate all elements of the struct, using the termination character specified in the DDL (see the sketch after this list).

      • Complex Composite Row Keys and HBaseKeyFactory

        Custom logic can be implemented by writing a Java KeyFactory implementation and supplying it to the DDL via the table property key “hbase.composite.key.factory”.
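
      As a hedged sketch, the question's deviceID,sensorID,timestamp row key could be mapped as a simple composite key (the table and column names are assumptions, not from the thread):

      -- The struct elements are joined/split on the "," terminator to form the row key.
      CREATE EXTERNAL TABLE hbase_sensor_composite (
        key struct<deviceid:string, sensorid:string, ts:string>,
        reading string)
      ROW FORMAT DELIMITED
      COLLECTION ITEMS TERMINATED BY ','
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:value")
      TBLPROPERTIES ("hbase.table.name" = "sensor_data");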

      December 22, 2021 1:27 PM IST