
Monday, October 30, 2023

What is Hive?

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at large scale. Its Hive Metastore (HMS) provides a central repository of metadata that can be readily analysed to make data-driven decisions, which makes it an essential part of many data lake architectures. Hive is built on top of Apache Hadoop and, through the Hadoop file system layer, supports storage on HDFS as well as on S3, ADLS, GS, and other platforms. Hive users can read, write, and manage petabytes of data using SQL.


Hive Metastore Server (HMS):

  • The Hive Metastore (HMS) is a central repository of metadata for Hive tables and partitions, stored in a relational database. Clients such as Hive, Impala, and Spark access this metadata through the metastore service API.

  • It has become a fundamental component of data lakes that draw on the wide range of open-source tools, including Apache Spark and Presto.

  • The Hive Metastore serves as the foundation for an entire ecosystem of tools, some of which are depicted in this diagram.
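
To make the idea of a shared metadata repository concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the real metastore service API: the class names `TableMeta` and `ToyMetastore`, the bucket path, and the table names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableMeta:
    """Metadata a metastore might keep for one table (hypothetical shape)."""
    columns: Dict[str, str]                      # column name -> type
    location: str                                # where the data files live
    partitions: List[str] = field(default_factory=list)


class ToyMetastore:
    """In-memory stand-in for a central metadata repository."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, str], TableMeta] = {}

    def create_table(self, db: str, name: str, meta: TableMeta) -> None:
        key = (db, name)
        if key in self._tables:
            raise ValueError(f"table {db}.{name} already exists")
        self._tables[key] = meta

    def add_partition(self, db: str, name: str, spec: str) -> None:
        self._tables[(db, name)].partitions.append(spec)

    def get_table(self, db: str, name: str) -> TableMeta:
        return self._tables[(db, name)]


metastore = ToyMetastore()
metastore.create_table(
    "sales",
    "orders",
    TableMeta(
        columns={"id": "bigint", "amount": "double"},
        location="s3a://example-bucket/warehouse/sales.db/orders",  # made-up path
    ),
)
metastore.add_partition("sales", "orders", "dt=2023-10-30")

# Two different "engines" resolve the same table through the same repository.
for engine in ("hive", "spark"):
    table = metastore.get_table("sales", "orders")
    print(engine, table.location, table.partitions)
```

The point of the sketch is that no engine owns the table definition: each one looks up the schema, location, and partitions from the same place, which is the role the bullets above describe for HMS.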







Hive ACID:

Hive provides full ACID support for ORC tables out of the box and insert-only support for all other formats.
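
The difference between the two modes shows up in the table definition. The sketch below is a small Python helper that builds the HiveQL `CREATE TABLE` statement for each case. The helper name `transactional_ddl` and the example tables are hypothetical; the `transactional` and `transactional_properties` table properties are the ones used by Hive 3 as far as I know, so check them against the documentation for your Hive version.

```python
from typing import Dict


def transactional_ddl(table: str, columns: Dict[str, str], file_format: str = "ORC") -> str:
    """Build a CREATE TABLE statement for a Hive transactional table.

    ORC tables are declared fully transactional; any other file format
    is additionally marked insert-only.
    """
    fmt = file_format.upper()
    cols = ", ".join(f"{name} {typ}" for name, typ in columns.items())
    props = {"transactional": "true"}
    if fmt != "ORC":
        # Assumption: non-ORC formats only support insert-only transactions.
        props["transactional_properties"] = "insert_only"
    tblprops = ", ".join(f"'{key}'='{value}'" for key, value in props.items())
    return f"CREATE TABLE {table} ({cols}) STORED AS {fmt} TBLPROPERTIES ({tblprops})"


orc_ddl = transactional_ddl("orders", {"id": "BIGINT", "amount": "DOUBLE"})
parquet_ddl = transactional_ddl("events", {"id": "BIGINT"}, "parquet")
print(orc_ddl)
print(parquet_ddl)
```

The helper only produces the statement text; running it against a cluster would need a Hive client, which is left out here.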

ACID stands for four traits of database transactions:  
  1. Atomicity (an operation either succeeds completely or fails; it does not leave partial data).
  2. Consistency (once an application performs an operation the results of that operation are visible to it in every subsequent operation).
  3. Isolation (an incomplete operation by one user does not cause unexpected side effects for other users).
  4. Durability (once an operation is complete it will be preserved even in the face of machine or system failure).

These traits have long been expected of database systems as part of their transaction functionality.  
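
Atomicity is the easiest of the four traits to demonstrate in a few lines. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a transactional database. It is not Hive, and the `accounts` table and `transfer` function are made up for illustration. A transfer that would overdraw an account fails partway through, and the credit that was already applied is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move funds in one transaction; any failure undoes both updates."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a name -> balance mapping."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 500)   # overdraws alice, so it is rolled back
after_failure = balances(conn)
succeeded = transfer(conn, "alice", "bob", 40)
after_success = balances(conn)
print(failed, after_failure)
print(succeeded, after_success)
```

After the failed transfer, both balances are unchanged even though the credit to `bob` had already been executed, which is the "no partial data" guarantee in item 1.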