Autodesk Is All in on the Modern Data Stack
Key Takeaway
With a modern data stack, Autodesk Construction Services unifies the Fortune 500 company's data architecture across multiple acquisitions. With Fivetran, Snowflake and dbt added to its stack, pipeline maintenance is all but eliminated. Data loading no longer requires manual backfill or thousands of lines of Python code, and run times that could take hours now take minutes for BIM 360 and BuildingConnected, two of its four recent construction product acquisitions.
- Pipeline: Fivetran
- Sources: Amplitude, AWS Lambda, Amazon S3, JIRA, Marketo, Mixpanel, MongoDB, NetSuite, PostgreSQL, Salesforce, SFTP, Stripe, Webhooks, Zendesk
- Destination: Snowflake
- Transformations: dbt
Aligning Applications, Databases & Tools Amidst Acquisitions
The main business challenges for the Autodesk Construction Services data team included data extraction, storage and transformations. Evin Anderson, Data Engineering Manager, explained the situation for BuildingConnected:
We had been using Alooma and whenever a column had a different format or an unexpected value appeared, the team had to troubleshoot each individual issue one by one. It was often easier to just reset the table, which was cumbersome to troubleshoot. An estimated 3-5% of analyst time was spent troubleshooting Alooma to accommodate for column formatting changes.
The team was also running into concurrency issues with its storage solution. Too many simultaneous calls to the database and queries running in parallel slowed performance, with the longest recorded runtime reaching one hour and 40 minutes. Storage was finite, and scaling compute speed required a plan upgrade. Over 3,000 lines of Python code were required to unpack JSON and format the data into a usable form for analytics.
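To give a sense of what that kind of unpacking code involves, here is a minimal Python sketch that flattens one nested JSON record into flat, analytics-ready columns. It is an illustration only, not the team's actual code: the `flatten` helper, the underscore naming scheme and the sample record are all assumptions.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            flat.update(flatten(value, prefix=f"{column}_"))
        elif isinstance(value, list):
            # Lists are kept as JSON text here; a real pipeline might
            # instead explode them into one row per element.
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


# Hypothetical source record, made up for this example.
raw = '{"id": 7, "project": {"name": "Tower A", "owner": {"email": "pm@example.com"}}, "tags": ["bid", "open"]}'
row = flatten(json.loads(raw))
print(row)
```

Every new nesting level, list shape or renamed field in the source tends to add another special case to code like this, which is one way such a script can grow to thousands of lines.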
Raul Maldonado, Data Analyst, explains that the same issues existed for BIM 360:
Extraction required daily monitoring, with an analyst going into the system to make sure no data was missing and no schema changes from upstream affected downstream workflows. There were over 20 downstream models that the business had to ensure were sound. Limitations on parameters and backfill functionality meant that there could be days' worth of data missing that required manual backfill, and transformations could take up to 23 hours.
Building the Data Architecture
The business recognized the need to bring together the acquisitions to deliver a more cohesive product and experience to the customer. They also wanted to provide insights to the organization to encourage data literacy and informed decisions. A group including Anderson and Maldonado set out to define their ideal data architecture, which involved getting the data sources into a warehouse and performing the transformations. There were a few critical requirements for improving the architecture, which Autodesk was able to easily achieve through a modern data stack consisting of Fivetran, Snowflake and dbt:
- Create highly denormalized tables for source standardization
- Adjust warehouse sizes
- Increase concurrency
- Implement processing of JSON and list transformations in the warehouse
Fivetran: “With Fivetran we can do automatic schema migrations, so there are no stoppages in the flow of data. We have tests to troubleshoot specific areas as they may come along, but it doesn’t stop our end users from being able to use our reporting tools. The design is intuitive and simple, the connector coverage meets most of our tooling needs and customer service is great.”
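The idea behind schema migration without stoppages can be sketched in a few lines of Python: when an incoming record carries a column the destination has not seen before, the schema is widened and loading continues instead of failing. This is a conceptual sketch under stated assumptions, not Fivetran's implementation; the `sync_batch` function, the in-memory "table" and the column names are hypothetical.

```python
def sync_batch(schema: dict[str, str], table: list[dict], batch: list[dict]) -> list[str]:
    """Append batch rows to table, widening the schema when new columns appear.

    Returns the names of any columns that were added during this batch.
    """
    added: list[str] = []
    for record in batch:
        for column, value in record.items():
            if column not in schema:
                # Unknown column: register it instead of stopping the load.
                schema[column] = type(value).__name__
                added.append(column)
        table.append(dict(record))
    # Rows loaded before the change get NULL (None) for the new columns.
    for row in table:
        for column in schema:
            row.setdefault(column, None)
    return added


# Hypothetical destination state and incoming batch, made up for this example.
schema = {"id": "int", "status": "str"}
table = [{"id": 1, "status": "open"}]
batch = [{"id": 2, "status": "closed", "region": "west"}]

added = sync_batch(schema, table, batch)
print(added)
print(table)
```

Downstream reports keep working because existing columns are untouched; the new column simply appears alongside them.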
Snowflake: “With Snowflake, we have infinite scaling and elastic concurrency. Previously, if we had too many queries running it slowed down our experience. Now, we can easily manage traffic and it won’t interfere with run times. The independent compute resources allow us to size up our warehouses if we need a bit more speed to our queries. In general, we're going to implement all of our processes related to JSON extraction in the actual data warehouse itself. The secure data sharing and cost-efficient storage are important for sharing data with the business.”
dbt: “dbt allows us to set up repeatable data transformations. We can schedule jobs to create data tables for us that surface in downstream tables. With this setup we should be able to remain agnostic to the specific tools that we use,” Anderson says.
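Repeatable transformations that feed downstream tables depend on running models in dependency order. dbt works this out from the references between SQL models; the Python sketch below only illustrates the ordering idea with the standard library, and the model names are hypothetical rather than Autodesk's.

```python
from graphlib import TopologicalSorter

# Each model maps to the set of models it depends on (hypothetical names).
models = {
    "stg_projects": set(),
    "stg_bids": set(),
    "fct_bids": {"stg_projects", "stg_bids"},
    "bid_dashboard": {"fct_bids"},
}

# static_order() yields every model after all of its dependencies.
run_order = list(TopologicalSorter(models).static_order())

for name in run_order:
    print(f"building {name}")
```

Because the order is derived from the declared dependencies, a scheduled job can rebuild the same tables the same way on every run.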
Results for BIM 360 and BuildingConnected
With Fivetran, Snowflake and dbt, both acquisitions have seen massive savings in time and maintenance, and the business is well on its way to creating a single source of truth for Autodesk and its multiple acquisitions. With its architecture in place, Autodesk can now work on consolidating its BI dashboards, determining a machine learning infrastructure and enhancing dbt tests and documentation. The results for both products are summarized below:
- Pipeline maintenance has dropped from 3-5% of analyst time to less than 1%, spent simply adding new connectors
- Unpacking JSON no longer requires any Python code
- All transformations are done in-warehouse
- With elastic concurrency and the ability to create larger tables, transformation run times have been reduced by 68%
- The extraction process no longer requires daily monitoring – Fivetran provides automated email alerts if a connector is delayed or a sync fails
- Data is automatically backfilled since Fivetran pulls data based on the last successful sync
- Transformations finish within minutes, which helps analysts deliver insights more quickly
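The automatic backfill noted in the list above follows from syncing relative to the last successful sync rather than to a fixed daily window. A minimal Python sketch of that idea, assuming a hypothetical `incremental_sync` function and made-up sample dates (this is not Fivetran's code):

```python
from datetime import date


def incremental_sync(source_rows: list[dict], cursor: date) -> tuple[list[dict], date]:
    """Return rows changed after the cursor, plus the advanced cursor."""
    new_rows = [row for row in source_rows if row["updated_on"] > cursor]
    # The cursor only moves forward when rows were actually pulled.
    new_cursor = max((row["updated_on"] for row in new_rows), default=cursor)
    return new_rows, new_cursor


# One source row per day for five days (hypothetical data).
source = [{"id": n, "updated_on": date(2020, 3, n)} for n in range(1, 6)]

# Suppose the last successful sync was on day 2 and the syncs on
# days 3 and 4 failed, so the cursor never advanced past day 2.
last_successful_sync = date(2020, 3, 2)

rows, synced_through = incremental_sync(source, last_successful_sync)
print([row["id"] for row in rows])
print(synced_through)
```

The next successful run picks up the rows from the missed days along with the current day's rows, so no one has to backfill the gap by hand.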
About Fivetran: Shaped by the real-world needs of data analysts, Fivetran technology is the smartest, fastest way to replicate your applications, databases, events and files into a high-performance cloud warehouse. Fivetran connectors deploy in minutes, require zero maintenance, and automatically adjust to source changes — so your data team can stop worrying about engineering and focus on driving insights.
About Snowflake: Snowflake is the leading data warehouse built for the cloud. Its unique architecture delivers proven breakthroughs in performance, concurrency and simplicity. For the first time, multiple groups can access petabytes of data at the same time, up to 200 times faster and 10 times less expensive than solutions not built for the cloud. Snowflake is a fully managed service with a pay-as-you-go model that works on structured and semi-structured data.
About dbt: dbt is a development environment built and maintained by Fishtown Analytics that speaks the preferred language of data analysts everywhere—SQL. With dbt, analysts take ownership of the entire analytics engineering workflow, from writing data transformation code to deployment and documentation.