Apache Arrow & Parquet
Comments revolve around Apache Arrow and Parquet as columnar data formats, with users comparing them to the discussed project, suggesting integrations, and highlighting differences and benefits for efficient data storage and processing.
Sample Comments
Unless it's Apache Arrow or Parquet.
Will it use Apache Arrow data format?
Have a look at Apache Arrow: https://arrow.apache.org/
I grep'd for parquet and yours is the only comment that mentions it. parquet -> arrow -> AI
Arrow + Parquet is brilliant! Right now I'm writing tools in Python (Python!) to analyse several 100 TB datasets in S3. Each dataset is made up of 1000+ 6 GB parquet files (tables UNLOADed from AWS Redshift db). Parquet's columnar compression gives a 15x reduction in on-disk size. Parquet also stores chunk metadata at the end of each file, allowing reads to skip over most data that isn't relevant. And once in memory, the Arrow format gives zero-copy compatibility with Numpy and
Any plans on integration with Apache Arrow?
Apache Arrow is not competing with Apache Parquet or Apache ORC.
What's the difference between Apache Parquet and Apache Arrow? They are both columnar formats right?
How does this compare to dremio, that also uses Apache Arrow? Is this a competitor?
isn't it what Apache Arrow [1], Apache CarbonData [2] and others are for?
[1] https://arrow.apache.org/docs/format/Columnar.html
[2] https://carbondata.apache.org/introduction.html