
Hive alternative for big data query

  • From the official Hive documentation:

    Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries.

    I'm not an expert in database architecture, and I would like to know whether there is an alternative for the case where that description does not fit, that is, when queries run over a big data set.

      September 22, 2021 2:40 PM IST
    • Apache Impala: an open-source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. ...
    • Presto DB: another alternative to Hive, developed by Facebook. ...
    • Spark SQL. ...
    • Shark (the predecessor of Spark SQL, no longer developed). ...
    • Big SQL by IBM.
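
The engines above split a query into fragments that run on many nodes at once. The sketch below is a minimal single-machine illustration of one common pattern, the partial-then-final aggregation used for a `GROUP BY`: each "node" aggregates only its local rows, and a coordinator merges the partial results. It is a toy, not any engine's real code. The sample rows and function names are hypothetical, and the thread pool only stands in for cluster nodes, so it shows the data flow rather than a real speedup. Hive on MapReduce also splits work this way. The latency difference of engines like Impala and Presto comes largely from how they execute these steps (long-running daemons and in-memory pipelining instead of batch jobs), which this toy does not model.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy "table" of (region, amount) rows.
ROWS = [("eu", 10), ("us", 5), ("eu", 7), ("apac", 3), ("us", 1), ("eu", 2)]


def split(rows, n):
    """Round-robin the rows into n partitions, like data blocks spread over nodes."""
    return [rows[i::n] for i in range(n)]


def partial_sum(partition):
    """Per-node step: SUM(amount) GROUP BY region over local rows only."""
    acc = Counter()
    for region, amount in partition:
        acc[region] += amount
    return acc


def merge(partials):
    """Coordinator step: add up the partial aggregates into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


def parallel_group_sum(rows, workers=3):
    """Run the per-partition step concurrently, then merge (threads stand in for nodes)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, split(rows, workers)))
    return merge(partials)


print(sorted(parallel_group_sum(ROWS).items()))
# [('apac', 3), ('eu', 19), ('us', 6)]
```

Because only the small partial aggregates travel to the coordinator, this pattern scales with the number of nodes rather than with the size of any single machine.
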
      November 26, 2021 12:24 PM IST
  • There are a couple of alternatives that make queries run significantly faster. I won't go into the details of each, but you can explore the following:

    1. Cloudera Impala: developed by Cloudera http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html

    2. Presto DB: developed by Facebook http://prestodb.io/

    3. Spark SQL: built on top of Spark (https://spark.apache.org/sql/)

    There are many good articles comparing Hive, Impala, and Presto, including their performance. Read a few and pick the one that best suits your use case. This one compares their advantages and disadvantages: http://bigdatanerd.wordpress.com/2013/11/19/war-on-sql-over-hadoop/

      September 23, 2021 1:52 PM IST
  • From your question, I gather that you want to reduce query latency but are fine with HDFS as the data store. In that case you have several alternatives, such as Presto and Spark SQL: both integrate smoothly with Hive (they can query tables defined in the Hive metastore) but offer considerable performance benefits. The other option is to move the data to a NoSQL database. If you want to keep HDFS as the underlying store, HBase can give some performance benefit; other options include MongoDB, Cassandra, etc.
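
Where HBase helps is with access by row key: HBase keeps rows sorted by row key, so a lookup or a scan over a key range (for example, all rows sharing a prefix) can go straight to the matching rows instead of reading the whole data set. A query that does not filter on the row key gets no such benefit. The sketch below is a minimal pure-Python model of that idea under these assumptions; it is NOT the HBase client API, and the class name, methods, and `user#date` row-key layout are hypothetical.

```python
import bisect


class SortedRowStore:
    """Toy model of a store that keeps rows sorted by row key, as HBase does.
    Hypothetical illustration only; not the HBase API."""

    def __init__(self, rows):
        # Keep keys and values in parallel lists, sorted by key.
        items = sorted(rows.items())
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]

    def get(self, key):
        """Point lookup: binary search on the sorted keys, no full scan."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, stop):
        """Range scan over [start, stop): touches only the keys in that range."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


store = SortedRowStore({
    "user1#2021-09-22": 3,
    "user1#2021-09-23": 5,
    "user2#2021-09-22": 7,
})
print(store.get("user1#2021-09-23"))  # 5
# "$" sorts right after "#", so this range covers every key with prefix "user1#".
print(store.scan("user1#", "user1$"))
# [('user1#2021-09-22', 3), ('user1#2021-09-23', 5)]
```

The design choice this illustrates is that row-key layout decides which queries are fast: here all rows for one user are adjacent, so "everything for user1" is a short range scan, while "everything for one date" would still need a full scan.
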

      October 2, 2021 2:16 PM IST