Apache Arrow & Parquet

Comments discuss Apache Arrow and Parquet as columnar data formats: users compare them with the project under discussion, suggest integrations, and point out how the two differ and what each offers for efficient data storage and processing.

➡️ Stable 0.8x Databases
Comments: 2,408
Years Active: 18
Top Authors: 5
Topic ID: #9403

Activity Over Time

Year  Comments
2009  3
2010  1
2011  5
2012  5
2013  19
2014  15
2015  32
2016  110
2017  122
2018  83
2019  76
2020  258
2021  303
2022  206
2023  317
2024  438
2025  397
2026  18

Keywords

DataLoader CPU S3 JS PyTorch AWS executor.rs CSV traildb.io slideshare.net parquet arrow apache format data formats s3 pytorch pandas memory

Sample Comments

deepsun Jul 4, 2025

Unless it's Apache Arrow or Parquet.

lesotholand Feb 10, 2021

Will it use Apache Arrow data format?

kstenerud Feb 12, 2021

Have a look at Apache Arrow: https://arrow.apache.org/

eternalban Jan 9, 2023

I grep'd for parquet and yours is the only comment that mentions it. parquet -> arrow -> AI

stevesimmons Nov 10, 2020

Arrow + Parquet is brilliant! Right now I'm writing tools in Python (Python!) to analyse several 100TB datasets in S3. Each dataset is made up of 1000+ 6GB parquet files (tables UNLOADed from AWS Redshift db). Parquet's columnar compression gives a 15x reduction in on-disk size. Parquet also stores chunk metadata at the end of each file, allowing reads to skip over most data that isn't relevant. And once in memory, the Arrow format gives zero-copy compatibility with Numpy and
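
The comment above mentions chunk metadata stored at the end of each file, which lets a reader skip data that cannot match a query. The sketch below illustrates that idea with a toy layout using only the Python standard library. It is not the actual Parquet format: the function names `write_toy_file` and `read_matching`, the JSON footer, and the zlib compression are all assumptions made for illustration.

```python
import io
import json
import struct
import zlib


def write_toy_file(chunks):
    """Serialize non-empty lists of ints as compressed chunks followed by a footer.

    Toy layout (hypothetical, not real Parquet):
    [chunk 0][chunk 1]...[JSON footer][4-byte little-endian footer length]
    """
    buf = io.BytesIO()
    meta = []
    for values in chunks:
        raw = struct.pack(f"<{len(values)}q", *values)  # 64-bit signed ints
        blob = zlib.compress(raw)
        meta.append({
            "offset": buf.tell(),
            "length": len(blob),
            "count": len(values),
            "min": min(values),  # per-chunk statistics used for skipping
            "max": max(values),
        })
        buf.write(blob)
    footer = json.dumps(meta).encode("utf-8")
    buf.write(footer)
    buf.write(struct.pack("<I", len(footer)))  # footer length goes last
    return buf.getvalue()


def read_matching(data, lo, hi):
    """Return values in [lo, hi], decompressing only chunks whose stats overlap."""
    (footer_len,) = struct.unpack("<I", data[-4:])
    meta = json.loads(data[-4 - footer_len:-4])
    out = []
    chunks_read = 0
    for m in meta:
        if m["max"] < lo or m["min"] > hi:
            continue  # no value in this chunk can match, so skip it entirely
        blob = data[m["offset"]:m["offset"] + m["length"]]
        values = struct.unpack(f"<{m['count']}q", zlib.decompress(blob))
        out.extend(v for v in values if lo <= v <= hi)
        chunks_read += 1
    return out, chunks_read


data = write_toy_file([
    list(range(0, 1000)),
    list(range(1000, 2000)),
    list(range(2000, 3000)),
])
values, chunks_read = read_matching(data, 1500, 1502)
print(values, chunks_read)  # [1500, 1501, 1502] 1
```

Only one of the three chunks is decompressed here, because the footer statistics rule out the other two before any chunk data is touched. With a file in object storage, the same pattern would translate into fetching the footer first and then requesting only the byte ranges that matter.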

monstrado Jul 28, 2020

Any plans on integration with Apache Arrow?

wesm Oct 31, 2017

Apache Arrow is not competing with Apache Parquet or Apache ORC.

nosefouratyou Jul 15, 2017

What's the difference between Apache Parquet and Apache Arrow? They are both columnar formats right?

polskibus Jul 17, 2019

How does this compare to dremio, that also uses Apache Arrow? Is this a competitor?

brutuscat Apr 27, 2022

isn't it what Apache Arrow [1], Apache CarbonData [2] and others are for?
[1] https://arrow.apache.org/docs/format/Columnar.html
[2] https://carbondata.apache.org/introduction.html