Apache Arrow & Parquet
Comments revolve around Apache Arrow and Parquet as columnar data formats, with users comparing them to the discussed project, suggesting integrations, and highlighting differences and benefits for efficient data storage and processing.
Sample Comments
Unless it's Apache Arrow or Parquet.
Will it use Apache Arrow data format?
Have a look at Apache Arrow: https://arrow.apache.org/
I grep'd for parquet and yours is the only comment that mentions it. parquet -> arrow -> AI
Arrow + Parquet is brilliant! Right now I'm writing tools in Python (Python!) to analyse several 100 TB datasets in S3. Each dataset is made up of 1000+ 6 GB parquet files (tables UNLOADed from AWS Redshift db). Parquet's columnar compression gives a 15x reduction in on-disk size. Parquet also stores chunk metadata at the end of each file, allowing reads to skip over most data that isn't relevant. And once in memory, the Arrow format gives zero-copy compatibility with Numpy and
Any plans on integration with Apache Arrow?
Apache Arrow is not competing with Apache Parquet or Apache ORC.
What's the difference between Apache Parquet and Apache Arrow? They are both columnar formats right?
How does this compare to dremio, that also uses Apache Arrow? Is this a competitor?
isn't it what Apache Arrow [1], Apache CarbonData [2] and others are for?
[1] https://arrow.apache.org/docs/format/Columnar.html
[2] https://carbondata.apache.org/introduction.html