Decrease Cloud Costs: FinOps for Your Cloud Data Warehouse
Congratulations! You and your team built a successful data platform, and now other teams across your organization use it to make data-informed decisions. But now you’re a victim of your own success.
As more people use the platform, you receive more feature requests. More source systems are ingested, more data models are created, and more queries are run against your data warehouse. You might be surprised to realize that being a great data team has caused your cloud data warehouse compute costs to balloon!
If this is your organization, or you fear it could be in the future, you’re probably wondering how to keep making an impact without blowing up your budget. That’s where we can help. We have tips for monitoring your data platform’s usage and performance while keeping your cloud costs down and increasing the value your organization realizes from your data warehouse.
Monitor Your Cost Data
To monitor your cost data, use tools such as select.dev or vantage.sh, or build your own reporting on the cost tables your warehouse auto-generates. Set up alerts, and use metadata tags to map workloads and queries to specific teams or departments so you can attribute costs to the right areas of your organization.
Once you know your most expensive assets in terms of compute, you can create an optimization plan.
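As a minimal sketch of the attribution step, the snippet below rolls up spend by team tag from cost rows like those you might export from your warehouse's metering views. The field names, warehouse names, and credit figures are all illustrative assumptions, not any vendor's actual schema.

```python
from collections import defaultdict

# Hypothetical cost rows, as you might export them from your warehouse's
# auto-generated metering/usage views (field names are illustrative).
cost_rows = [
    {"warehouse": "ANALYTICS_WH", "team_tag": "marketing", "credits": 42.0},
    {"warehouse": "ANALYTICS_WH", "team_tag": "finance", "credits": 18.5},
    {"warehouse": "LOADING_WH", "team_tag": "data-eng", "credits": 73.2},
    {"warehouse": "ADHOC_WH", "team_tag": "marketing", "credits": 12.3},
]

def cost_by_team(rows):
    """Roll up credit spend by the team tag attached to each workload."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["team_tag"]] += row["credits"]
    # Sort descending so the most expensive teams surface first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

for team, credits in cost_by_team(cost_rows):
    print(f"{team}: {credits:.1f} credits")
```

A ranked view like this is usually enough to decide where to start optimizing.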
Reduce Ingestion Costs
Many organizations today follow an ELT (extract-load-transform) pattern, where all raw data is loaded into the data warehouse before any filtering or transformation happens. To reduce data ingestion costs, move to an “EtLT” pattern and shift data cleaning and prep left, before data enters your warehouse. Inspect your organization’s usage patterns to see whether you can perform simple transformations or filtering upstream: converting data types, removing unnecessary columns (especially personally identifiable information, or PII), or filtering out unwanted rows. This way you’re not paying premium compute costs for your data warehouse to run these transformations. Jake Thomas, an engineer at Okta, gave a great talk at Data Council 2024 on exactly this topic!
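The “small-t” transformations above can be sketched as a pre-load cleaning step. Everything here is a hypothetical example: the column names, the PII list, and the `is_test` flag are assumptions standing in for whatever your source system actually emits.

```python
# Columns we never want to land in the warehouse (illustrative PII list).
PII_COLUMNS = {"email", "ssn"}

def clean_record(record):
    """Filter, drop PII, and cast types before a record is loaded."""
    # Filter: skip records flagged as internal test traffic.
    if record.get("is_test") == "true":
        return None
    cleaned = {k: v for k, v in record.items() if k not in PII_COLUMNS}
    cleaned.pop("is_test", None)
    # Cast: amounts arrive as strings from the source system.
    cleaned["amount"] = float(cleaned["amount"])
    return cleaned

raw = [
    {"order_id": "1", "amount": "19.99", "email": "a@example.com", "is_test": "false"},
    {"order_id": "2", "amount": "0.00", "email": "qa@example.com", "is_test": "true"},
]
ready_to_load = [c for c in (clean_record(r) for r in raw) if c is not None]
print(ready_to_load)  # one clean record; the test row and PII are gone
```

Running this in your extraction layer means the warehouse only ever stores (and bills you for) data someone might actually use.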
Reduce Serving Costs
To reduce your serving costs, pay attention to what your teams use, and don’t be afraid to turn off jobs that create tables no one uses. If ad hoc queries are a large cost driver, consider SQL training for your users so they can build the skills and understanding needed to use the data warehouse more efficiently. For platforms with separate compute resources such as Snowflake or Databricks, investigate usage and performance data for your warehouses. Are they sized appropriately? Should any clustering configurations be adjusted? Uncovering the answers to these questions can provide ideas for lowering your costs around serving data.
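Finding jobs that build tables no one uses can be as simple as a set difference between what your scheduled jobs produce and what recent query history actually touches. The table names below are made up for illustration; in practice you would pull both sets from your orchestrator and your warehouse's query history views.

```python
# Hypothetical inputs: tables your scheduled jobs rebuild, and tables
# referenced anywhere in the last 90 days of query history.
tables_built_by_jobs = {"orders_daily", "legacy_kpi_rollup", "customer_360"}
tables_queried_recently = {"orders_daily", "customer_360", "sessions"}

# Jobs that rebuild tables nobody has queried recently are candidates
# to pause or retire.
unused_candidates = tables_built_by_jobs - tables_queried_recently
print(sorted(unused_candidates))  # ['legacy_kpi_rollup']
```

Before turning anything off, confirm the table isn't consumed by something query history can't see, such as an external export.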
Set Guardrails
You can proactively reduce your data ingestion and serving costs by using your data warehouse’s tracking features (e.g., resource monitors and budgets). That way you can monitor your spending and act before your costs balloon out of control.
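Warehouse-native resource monitors and budgets handle most of this for you, but the underlying guardrail logic is simple enough to sketch. The budget figure, threshold, and status strings below are illustrative assumptions; month-to-date spend would come from your warehouse's cost views.

```python
# A minimal budget guardrail check, assuming you can pull month-to-date
# spend from your warehouse's cost tables (all values are illustrative).
MONTHLY_BUDGET_CREDITS = 1000.0
ALERT_THRESHOLD = 0.8  # warn once 80% of the budget is consumed

def check_budget(month_to_date_credits, budget=MONTHLY_BUDGET_CREDITS):
    """Return a status string a scheduled job could route to Slack or email."""
    used = month_to_date_credits / budget
    if used >= 1.0:
        return "over_budget"
    if used >= ALERT_THRESHOLD:
        return "warning"
    return "ok"

print(check_budget(850.0))  # "warning": 85% of budget consumed
```

Run a check like this on a schedule and you hear about runaway spend in days, not at the end-of-month invoice.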
Take a Methodical Approach to Monitoring
While it’s exciting to see increased adoption of your data platform across your organization, it can lead to rapidly increasing compute costs. However, by methodically monitoring your data warehouse’s usage and performance, you can keep your costs reasonable while increasing the value your organization receives from your data warehouse.
Want to learn more about how you can control your cloud data warehouse costs? Contact us.