- Modern Big Data Processing with Hadoop
- V. Naresh Kumar Prashant Shindgikar
- 90字
- 2025-04-04 17:12:20
Parquet
Parquet stores nested data structures in a flat columnar format. Parquet is more efficient in terms of storage and performance than any row-level file formats. Parquet stores binary data in a column-oriented way. In the Parquet format, new columns are added at the end of the structure. Cloudera mainly supports this format for Impala implementation but is aggressively becoming popular recently. This format is good for SQL queries, which read particular columns from a wide table having many columns because only selective columns are read to reduce I/O cost.