ETL/ELT Tools
Discussions center on ETL/ELT data integration tools like dbt, Fivetran, Stitch, and their use with cloud data warehouses such as Snowflake, including comparisons, recommendations, and data engineering workflows.
Sample Comments
It's a bit unclear on what the scope of this tool is. But there are a couple of new classes of tools for ETL/ELT or data engineering as it's called now. There's the "Data Integration Tools" like Fivetran, Stitch, and this. They are collections of connectors that they have coded to ease ingesting data from lots of different database products and stores to another. That's valuable, I wouldn't start writing my own script to pull changes from my RDBMS'
You should definitely check out dbt :) Some links in my comment: https://news.ycombinator.com/item?id=36911937
Thanks for the feedback! We are looking for more info on user needs in this space. Sounds like you currently use Alteryx + Snowflake. Any additional information you could provide about your use case/needs would be helpful. Seems like some people are more interested in open source tools that can be run on their own computer (like DBT) while others are looking for more of an enterprise use case. What about you?
Luigi, AWS S3, DBT, Snowflake and Re:dash (currently analyzing Metabase or Looker to allow queries without SQL). Luigi runs our scrapers and other workflow management tasks (e.g. DB backups). All raw data lives in S3. We make an effort to be able to recreate the whole data warehouse from the raw data, so if any cleaning/normalization process fails, we have this safety net. I'm curious to hear if others use a similar pattern, or if there are better options. DBT handles both loading
If you're an analyst, I second the recommendation for dbt. Here's a podcast interview with the CEO of the company behind dbt that explains a lot of the philosophy, and I think will help you even if you don't end up using dbt: https://softwareengineeringdaily.com/2020/03/09/dbt-data-bui...
You’re missing the need this product addresses. It’s not about your database, it’s about the data someone sent you that you’re about to ETL into your database. I used to work at a medical ML company, the datasets we got from insurance companies and medical providers were generally awful. Occasionally, it didn’t even match the data dictionary they themselves provided. If you need to connect to or ingest outside data sources or check the output of an ETL pipeline, this tool is extremely useful.
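The kind of check this comment describes, comparing incoming rows against the data dictionary that came with them before loading anything, might look roughly like the sketch below. It is not the tool under discussion; `Field`, `validate_row`, and the column names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Field:
    """One entry of a (hypothetical) data dictionary."""
    name: str
    parse: Callable[[str], object]  # raises ValueError on a bad value
    required: bool = True


def validate_row(row: Dict[str, str], dictionary: List[Field]) -> List[str]:
    """Return a list of problems found in one incoming row."""
    problems = []
    known = {f.name for f in dictionary}
    # Columns the provider sent that the dictionary never mentions.
    for extra in sorted(set(row) - known):
        problems.append(f"unexpected column: {extra}")
    for field in dictionary:
        raw = row.get(field.name)
        if raw is None or raw == "":
            if field.required:
                problems.append(f"missing value: {field.name}")
            continue
        try:
            field.parse(raw)
        except ValueError:
            problems.append(f"bad value for {field.name}: {raw!r}")
    return problems


# Illustrative dictionary; the column names are assumptions.
DICTIONARY = [
    Field("member_id", int),
    Field("claim_amount", float),
    Field("notes", str, required=False),
]
```

Running `validate_row` over each row of a delivered file before the load step gives a per-row list of mismatches, so a bad delivery can be rejected or quarantined instead of being written to the warehouse.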
Many BI / analytics tools don't have great support for Data Lakes, so part of the reason could be supporting those tools (e.g. they still load some of their data to snowflake to power BI / dashboards)
How do you build a data warehouse without ETL?
from any data warehouse as well?
Excited to follow your progress! I view this problem as one of the biggest gaps in today's "Cloud Data Ecosystem". Tools like Stitch and Fivetran make it super easy to extract data from source systems; next-gen cloud data platforms like Snowflake make storing, transforming, and querying that data a breeze (especially with the help of tools like dbt and dataform); and there are a ton of powerful and easy to use BI tools for visualizing and digesting that data. But the minute yo