Data Wrangling and ETL
The cluster discusses challenges in cleaning, transforming, and preparing messy data, the time-intensive nature of data wrangling for analysts and scientists, and tools or workflows to simplify ETL processes without heavy custom coding.
Sample Comments
I think that the data isn't really the problem, it is the workflow.
Why not? Many data-related tasks are rather ad hoc; it's a waste of time to build long-lasting software out of every ad hoc request.
Wow this is amazing! In the real world data can be messy and this looks like a great tool to transform it without an extensive custom ETL process that requires code
Why are you doing data wrangling in production?
I think you're interested in the keyword: ETL (Extract, transform, load) (specifically the "transform" part).
- https://en.wikipedia.org/wiki/Extract,_transform,_load
- https://hn.algolia.com/?q=etl
What's a scalable way to handle data cleaning, while still keeping things simple?
What are some of the types of problems that needed to be solved when cleaning data that required heavy tooling?
Data Scientists and Analysts spend 70-80% of their time cleaning and preparing data. Hope this can help cut down the time by half.
There's quite a bit of data wrangling happening both in the backend and in the frontend.
Yes, I'm aware it's nitpicking, which is why I said I agree and that it just needs more clarification. But you don't need machine learning to hit these cases, just a mismatch between what you want to do and how/where it's stored.
Examples:
- a NoSQL db that can't directly do what you want, so now you need to download lots of documents to string them together before updating that field.
- a SQL data schema not matching your tasks, where you need lots and lots of joins and aggre