Apache Hive is an open-source data warehouse infrastructure that provides tools for data summarization, querying, and analysis. It is designed to support the analysis of large datasets stored in the Hadoop Distributed File System (HDFS) and compatible file systems, such as Amazon S3. Hive was initially developed by data engineers at Facebook in 2008 and is now used by many other companies.