Hbase quickly count number of rows

  • Right now I count rows by iterating over a ResultScanner like this:

    for (Result rs = scanner.next(); rs != null; rs = scanner.next()) {
        number++;
    }
     

    Once the table reaches millions of rows, this takes a long time. I want the count in real time, so I don't want to use MapReduce.

    How can I quickly count the number of rows?

      September 28, 2020 1:27 PM IST
  • Go to the HBase home directory and run this command:

    ./bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'namespace:tablename'

    This launches a MapReduce job whose output shows the number of rows in the HBase table.
      August 18, 2021 2:29 PM IST
  • If you're using a scanner, have it return as few qualifiers as possible. In fact, the qualifier(s) you do return should be the smallest (in byte size) you have available. This will speed up your scan tremendously.

    Unfortunately this will only scale so far (millions to billions of rows?). To take it further you can serve the count in real time, but you will first need to run a MapReduce job to count all rows.

    Store the MapReduce output in a cell in HBase. Every time you add a row, increment the counter by 1. Every time you delete a row, decrement the counter.

    When you need the number of rows in real time, read that cell from HBase.

    Otherwise there is no fast way to count the rows that scales. A scan can only count so fast.
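
    The bookkeeping above can be sketched with a small in-memory model. This is not HBase client code: the map stands in for the table, the AtomicLong stands in for the counter cell, and the class and method names are made up for illustration. One detail it makes explicit is an assumption the advice relies on: the counter should only move when a row key is really new or really removed, otherwise overwriting an existing row would inflate the count.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for a table plus a counter cell (hypothetical names,
// no HBase API involved).
final class CountedTable {
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    private final AtomicLong rowCount;

    // Seed the counter once from a full count, e.g. the MapReduce job's output.
    CountedTable(Map<String, String> existingRows) {
        rows.putAll(existingRows);
        rowCount = new AtomicLong(rows.size());
    }

    // Increment only when the row key is new; an overwrite leaves the count alone.
    void put(String rowKey, String value) {
        if (rows.put(rowKey, value) == null) {
            rowCount.incrementAndGet();
        }
    }

    // Decrement only when a row was actually removed.
    void delete(String rowKey) {
        if (rows.remove(rowKey) != null) {
            rowCount.decrementAndGet();
        }
    }

    // Reading the count does not depend on the table size.
    long count() {
        return rowCount.get();
    }

    public static void main(String[] args) {
        CountedTable table = new CountedTable(Map.of("r1", "a", "r2", "b"));
        table.put("r3", "c");     // new row: count goes 2 -> 3
        table.put("r1", "z");     // overwrite: count stays 3
        table.delete("r2");       // existing row: count goes 3 -> 2
        table.delete("missing");  // unknown row: count stays 2
        System.out.println(table.count()); // prints 2
    }
}
```

    In a real table the "is this row new?" check and the counter update are separate operations, so the stored count can drift if a writer fails between them. Re-running the full count from time to time is one way to correct it.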

      January 1, 2022 2:14 PM IST
  • Use RowCounter. RowCounter is a MapReduce job that counts all the rows of a table. It is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns about metadata inconsistency. It runs the MapReduce job in a single process, but it runs faster if you have a MapReduce cluster in place for it to exploit.

    $ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>
    
    Usage: RowCounter [options] <tablename> [
            --starttime=[start]
            --endtime=[end]
            [--range=[startKey],[endKey]]
            [<column1> <column2>...]
        ]
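
    To show how the optional parts of that usage string fit together, here is a small Java sketch that only assembles the argument list in the order the usage text gives (table name, then the range, then columns). It does not launch anything. The class and method names are invented for this example, and the table, row keys and column in the usage note are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Assembles a RowCounter command line from optional pieces (illustrative helper,
// not part of HBase).
final class RowCounterCommand {
    static List<String> build(String table, String startKey, String endKey, List<String> columns) {
        List<String> cmd = new ArrayList<>(List.of(
                "hbase", "org.apache.hadoop.hbase.mapreduce.RowCounter", table));
        // --range=[startKey],[endKey]: either side may be left empty.
        if (startKey != null || endKey != null) {
            cmd.add("--range=" + (startKey == null ? "" : startKey)
                    + "," + (endKey == null ? "" : endKey));
        }
        // Optional column arguments are passed through unchanged.
        cmd.addAll(columns);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder table, row keys and column.
        List<String> cmd = build("ns:t1", "row100", "row200", List.of("cf:q1"));
        System.out.println(String.join(" ", cmd));
    }
}
```

    The resulting list could be handed to java.lang.ProcessBuilder on a machine where the hbase launcher is on the PATH.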
      September 28, 2020 5:49 PM IST
  • A simple and efficient way to count rows in HBase:

        1. Whenever you insert a row, call this API on your table instance (here called table) to increment a dedicated counter cell:

            table.incrementColumnValue(Bytes.toBytes("count"), Bytes.toBytes("details"), Bytes.toBytes("count"), 1L);

        2. To check the number of rows in the table, just use the Get or Scan API on that counter row, 'count'.

    With this method you can get the row count in less than a millisecond, because it is a single-row read instead of a full scan. This post was edited by Nitara Bobal at September 28, 2020 5:54 PM IST
      September 28, 2020 5:51 PM IST
  • You can use the count command in the HBase shell to count the number of rows. But yes, counting rows of a large table can be slow.

    count 'tablename' [interval]

    Return value is the number of rows.

    This operation may take a LONG time (run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a counting MapReduce job). The current count is shown every 1000 rows by default. The count interval may be optionally specified. Scan caching is enabled on count scans by default. The default cache size is 10 rows. If your rows are small, you may want to increase this parameter.

    Examples:

    hbase> count 't1'
    
    hbase> count 't1', INTERVAL => 100000
    
    hbase> count 't1', CACHE => 1000
    
    hbase> count 't1', INTERVAL => 10, CACHE => 1000

    The same commands can also be run on a table reference. Suppose you had a reference t to table 't1'; the corresponding commands would be:

    hbase> t.count
    
    hbase> t.count INTERVAL => 100000
    
    hbase> t.count CACHE => 1000
    
    hbase> t.count INTERVAL => 10, CACHE => 1000
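
    To get a feel for why the CACHE setting matters, here is a back-of-the-envelope Java sketch. It assumes each scanner fetch returns up to CACHE rows, so a full count needs roughly ceil(rows / CACHE) round trips to the servers. It ignores size limits on a fetch, region boundaries and the final empty fetch, so treat the numbers as rough. The class name and the row total are made up for the example.

```java
// Rough model of scan round trips for a full-table count (illustrative only).
final class ScanRoundTrips {
    // Ceiling division: number of fetches needed to pull `rows` rows,
    // `caching` rows at a time.
    static long estimate(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical table size
        System.out.println(estimate(rows, 10));   // prints 100000
        System.out.println(estimate(rows, 1000)); // prints 1000
    }
}
```

    Under these assumptions, raising CACHE from the default of 10 to 1000 cuts the number of fetches by a factor of 100, at the cost of more memory per fetch.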
      September 29, 2020 1:10 PM IST
    • Viaan Prakash
      This counter runs very slowly and can be used from the HBase shell only. It's not recommended for large tables.
      September 29, 2020
  • To count the records of an HBase table on a proper YARN cluster, you also have to set the MapReduce job queue name:

    hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dmapreduce.job.queuename=<queue name you have SUBMIT access to> <TABLE_NAME>
     
      September 29, 2020 1:12 PM IST