Data Wrangling and ETL

The cluster discusses challenges in cleaning, transforming, and preparing messy data, the time-intensive nature of data wrangling for analysts and scientists, and tools or workflows to simplify ETL processes without heavy custom coding.

➡️ Stable 0.6x Databases
Comments: 2,940
Years Active: 20
Top Authors: 5
Topic ID: #3274

Activity Over Time

2007: 6
2008: 15
2009: 29
2010: 47
2011: 53
2012: 81
2013: 113
2014: 111
2015: 108
2016: 162
2017: 169
2018: 189
2019: 233
2020: 284
2021: 276
2022: 247
2023: 294
2024: 228
2025: 275
2026: 20

Keywords

EVERYTHING AI DBA SQL OP DE ETL XML data.sets PM data etl cleaning format transform rows processing organize data processing half baked

Sample Comments

bjoli Feb 20, 2025

I think that the data isn't really the problem, it is the workflow.

nsonha Aug 13, 2024

why not, many data related tasks are rather ad-hoc, it's a waste of time to make a long lasting software out of every ad-hoc request

dcnstrct Nov 11, 2010

Wow this is amazing! In the real world data can be messy and this looks like a great tool to transform it without an extensive custom ETL process that requires code

GFK_of_xmaspast Dec 23, 2016

Why are you doing data wrangling in production?

Leftium Aug 16, 2023

I think you're interested in keyword: ETL (Extract, transform, load) (Specifically the "transform" part.)
- https://en.wikipedia.org/wiki/Extract,_transform,_load
- https://hn.algolia.com/?q=etl

jnpatel Mar 21, 2016

What's a scalable way to handle data cleaning, while still keeping things simple?

lastofus Jan 20, 2018

What are some of the types of problems that needed to be solved when cleaning data that required heavy tooling?

Muddassar Oct 27, 2019

Data Scientists and Analysts spend 70-80% of their time cleaning and preparing data. Hope this can help cut down the time by half.

nicbou Mar 30, 2021

There's quite a bit of data wrangling happening both in the backend and in the frontend.

gryn Jan 29, 2022

yes I'm aware it's nitpicking, which is why I said I agree and that it just needs more clarification.
but you don't need machine learning to hit these cases, just a mismatch between what you want to do and how/where it's stored.
examples:
- a noSQL db that can't directly do what you want and now you need to download lots of documents to string them together before updating that field.
- a SQL data schema not matching your tasks where you need lots and lots of joins and aggre