A distributed, fault-tolerant data warehousing system, Apache Hive allows for large-scale analytics. Hive Metastore (HMS) is an essential part of many data lake systems because it offers a central repository of metadata that can be readily analysed to make data-driven choices. Hive is based on Apache Hadoop and uses HDFS to provide storage on S3, adls, gs, and other platforms. SQL can be used by Hive users to read, write, and manage petabytes of data.
Hive Metastore Server (HMS):
- Using the metastore service API, clients (such as Hive, Impala, and Spark) can access the central repository of metadata for Hive tables and partitions in a relational database, which is called the Hive Metastore (HMS).
- It is now a fundamental component of data lakes that make use of the wide range of open-source tools, including Apache Spark and Presto.
- Actually, the Hive Metastore serves as the foundation for an entire ecosystem of tools, some of which are depicted in this diagram.
Hive ACID:
- Atomicity (an operation either succeeds completely or fails, it does not leave partial data).
- Consistency (once an application performs an operation the results of that operation are visible to it in every subsequent operation).
- Isolation (an incomplete operation by one user does not cause unexpected side effects for other users).
- Durability (once an operation is complete it will be preserved even in the face of machine or system failure).
Social Buttons