By replacing homemade ETL with automated ELT, Autodesk cut its data ingestion process from six months to six days.
Since joining Autodesk in 2019, Jesse Pederson, Autodesk’s VP of Data Platform and Insights, has been laser-focused on burning the “right calories.”
“If you look today at where we’re spending all of our mental calories, we’re building pipelines to bring data into the warehouse and into our lake — rather than focusing on the important part: why you even build a data lake in the first place,” Jesse said at the Gartner Data & Analytics Summit 2021.
“I came to my group, and said: Look, we need to put more focus on joins, and less on pipelines.”
Once primarily known for AutoCAD, Autodesk is now a global design software provider with over 11,000 employees. “Whether a new bridge, a skyscraper, a smart car, or a new blockbuster movie, our software helps people design and make anything.” Autodesk’s rapid growth and acquisition strategy have created immense opportunities and internal demands.
Jesse challenged his team: “Today it takes us six months to bring a new data set into the data platform. I bet we can get there in six days. Six days to gather requirements from customers, to get the vendor set up, and to actually have this live through all security and privacy controls.”
“Soup to nuts, the entire thing six business days. That was the goal.”
Jesse kicked off the transformation within Autodesk by focusing on four guiding principles to align his team’s mindset and mission:
Buy > build: Put valuable engineering calories on the hard problems, and let vendors do the rest.
Keep it simple: Have informed opinions on a strict set of tools that we’ll make available.
Minimize time to impact: Make it fast for customers to bring data to Autodesk Data Platform and get value in return.
Secure and private out the gate: Treat data with respect from day one.
When Jesse inherited Autodesk’s data stack, ingestion was a major problem. “We were using a whole slew of different ingestion mechanisms to bring data into our data lake — Attunity, Glue, Kinesis and custom scripts — and frankly a whole lot of duct tape to make sure that this data made its way into our lake.”
The team had been importing data from Salesforce, SAP, Siebel, and Autodesk’s own products such as AutoCAD, Revit and Maya into an S3 data lake. Visualization of data presented its own challenge, “with a whole mess of different tools: Looker or Power BI, notebooks, and a ton of other ways to visualize this data.”
To simplify the process, Jesse brought in Fivetran and Snowflake — and clearly delineated a process for data ingestion.
Today, Autodesk has bifurcated its data pipelines, with two opinionated routes for data ingestion and storage:
If the source is a structured data store, Jesse’s team uses Fivetran for ingestion. Structured data is stored in Snowflake.
If the source is an unstructured data store – for example, usage metrics from Autodesk’s own products and software – Autodesk uses AWS Kinesis for high-volume ingestion into S3.
Data is replicated between the two repositories as necessary. Product usage data is promoted to Snowflake for easier visualization and analytics, subject to Autodesk’s privacy controls, and snapshots of Snowflake data are persisted into S3 for historical reference and machine learning purposes.
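The two-route setup above boils down to a simple routing rule. A minimal sketch in Python (the `Source` type and `route` function are illustrative, not Autodesk's actual code; only the tool names come from the article):

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    structured: bool  # True for SaaS apps and databases like Salesforce or SAP


def route(source: Source) -> tuple[str, str]:
    """Return an (ingestion tool, storage target) pair for a data source."""
    if source.structured:
        # Structured stores (Salesforce, SAP, Siebel) are ingested by
        # Fivetran and land in Snowflake.
        return ("Fivetran", "Snowflake")
    # Unstructured, high-volume streams (e.g. product usage events)
    # flow through AWS Kinesis into S3.
    return ("AWS Kinesis", "S3")


print(route(Source("Salesforce", structured=True)))
print(route(Source("usage-events", structured=False)))
```

Replication between the two stores (usage data promoted to Snowflake, Snowflake snapshots persisted to S3) sits on top of this routing and is governed separately by Autodesk's privacy controls.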
“I think it’s really helped to free up our teams, frankly, to work on problems that they'd rather be working on.” With the time saved, Jesse’s team can now focus on building solutions that help grow the business, moving from a cost center to a revenue center.
The change has been dramatic — not only in time saved, but in emails avoided. “I don't get emails anymore saying, ‘Urgent — pipelines are broken.’ I get emails now that are like, ‘Hey, when can I put my data in?’”