Data Wrangling and ETL

The cluster discusses challenges in cleaning, transforming, and preparing messy data, the time-intensive nature of data wrangling for analysts and scientists, and tools or workflows to simplify ETL processes without heavy custom coding.

➡️ Stable 0.6x Databases

Comments: 2,940

Topic ID: #3274

Activity Over Time

2007

2008

2009

2010

2011

2012

2013: 113

2014: 111

2015: 108

2016: 162

2017: 169

2018: 189

2019: 233

2020: 284

2021: 276

2022: 247

2023: 294

2024: 228

2025: 275

2026

Top Contributors

danso (22) hermitcrab (12) dagw (8) minimaxir (8) lincpa (8)

Keywords

EVERYTHING AI DBA SQL OP DE ETL XML data.sets PM data etl cleaning format transform rows processing organize data processing half baked

Sample Comments

bjoli • Feb 20, 2025

I think that the data isn't really the problem, it is the workflow.

nsonha • Aug 13, 2024

why not, many data related tasks are rather ad-hoc, it's a waste of time to make a long lasting software out of every ad-hoc request

dcnstrct • Nov 11, 2010

Wow this is amazing! In the real world data can be messy and this looks like a great tool to transform it without an extensive custom ETL process that requires code

GFK_of_xmaspast • Dec 23, 2016

Why are you doing data wrangling in production?

Leftium • Aug 16, 2023

I think you're interested in keyword: ETL (Extract, transform, load) (Specifically the "transform" part.)

- https://en.wikipedia.org/wiki/Extract,_transform,_load
- https://hn.algolia.com/?q=etl

jnpatel • Mar 21, 2016

What's a scalable way to handle data cleaning, while still keeping things simple?

lastofus • Jan 20, 2018

What are some of the types of problems that needed to be solved when cleaning data that required heavy tooling?

Muddassar • Oct 27, 2019

Data Scientists and Analysts spend 70-80% of their time cleaning and preparing data. Hope this can help cut down the time by half.

nicbou • Mar 30, 2021

There's quite a bit of data wrangling happening both in the backend and in the frontend.

gryn • Jan 29, 2022

yes I'm aware it's nitpicking, which is why I said I agree and that it just need more clarification.

but you don't need machine learning to hit these cases just a mismatch between what want to do and how/where it's stored.

examples:

- a noSQL db that cant directly do what you want and now you need to download lots of document to string them together before updating that field.
- a SQL data schema not matching your tasks where need lots and lots of joins and aggre