Written by
Devil
9:25 PM
Hive stores the data in HDFS and the schema (table metadata) in an RDBMS (Derby, MySQL, etc.), known as the metastore.
- When a user creates a table, its schema is recorded in the RDBMS.
- When data is inserted, files are created in HDFS. A user can also put files directly into HDFS without interacting with the RDBMS.
- Schema on read: when the table is read, Hive looks up the schema and, most importantly, the line delimiter and field delimiter.
Rows and fields are then read from the file according to these delimiters, and a table is assembled and returned to the user.
For example, suppose the table definition sets the line delimiter to '\n' (new line) and the field delimiter to ',' (comma).
The file in HDFS would then look like this:
1,Employee_Name1,1000
2,Employee_Name2,2000
While reading this file, Hive would produce a table with 2 rows of 3 columns each.
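The read path described above can be mimicked in a few lines of Python. This is only a sketch of the idea, not Hive's actual code: the function name `read_rows` and its parameters are made up for illustration, and every field is kept as a plain string.

```python
def read_rows(raw, line_delim="\n", field_delim=","):
    """Split raw file content into rows, then split each row into fields."""
    # A trailing line delimiter ends the last row; it does not start a new one.
    if raw.endswith(line_delim):
        raw = raw[: -len(line_delim)]
    return [line.split(field_delim) for line in raw.split(line_delim)]


# Same content as the example file in HDFS.
raw = "1,Employee_Name1,1000\n2,Employee_Name2,2000\n"
rows = read_rows(raw)
for row in rows:
    print(row)
```

The file itself carries no structure. The rows and columns appear only when the delimiters from the table definition are applied at read time.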
The interesting part -
- Even if the file we put directly into HDFS is something unrelated, like the lyrics of a song, Hive will not throw an exception.
- Hive simply uses the line delimiter to split the file into rows and the field delimiter to split each row into columns.
- If neither delimiter appears in the file, the whole text of the lyrics ends up in the first column of the first row, and the remaining columns are NULL. (This assumes the first column is a string; a value that does not fit the column's type is read as NULL.)
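A rough Python sketch of this behaviour follows. It is an illustration under assumptions, not Hive's implementation: `apply_schema` and the column names `id`, `name`, `salary` are hypothetical, all columns are treated as strings, and `None` stands in for NULL.

```python
def apply_schema(raw, columns, line_delim="\n", field_delim=","):
    """Map arbitrary text onto a fixed list of column names without raising."""
    rows = []
    for line in raw.split(line_delim):
        fields = line.split(field_delim)
        # Pad missing fields with None (NULL); drop fields beyond the schema.
        padded = (fields + [None] * len(columns))[: len(columns)]
        rows.append(dict(zip(columns, padded)))
    return rows


# Text with no comma and no newline: neither delimiter is present.
lyrics = "Twinkle twinkle little star how I wonder what you are"
table = apply_schema(lyrics, ["id", "name", "salary"])
print(table)
```

Nothing here validates the content against the schema. The text is split wherever the delimiters happen to occur, which is why an unrelated file still yields a table instead of an error.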