How We Saved 300K+ USD in AWS Cost Optimisation in 2024

Om Vikram Thapa
9 min readMar 15, 2025



Who doesn’t want to save money? And when it comes to the organisation’s bottom line, everyone is on top of their game.

Last year we started this org-wide initiative and began exploring various tools out there which could give us monthly reports as well as cost-saving recommendations.

We also took some expert help on cloud diagnostics.

First things first, we created a Confluence page to set a goal for our team and our product line, i.e. if we are starting the year at ~150K USD per month, then our goal should be to optimise spend down to ~100K USD per month.

Once the goal was set, we started analysing the system infrastructure and found some low-hanging, some trivial and some new cost-saving opportunities. Let me explain a few of the initiatives we took in 2024 which resulted in huge savings for us:

1) Intel-Based to Graviton-Based Instances

In 2018, AWS released its own processor, Graviton, advertised as offering up to 40% better price-performance for certain instances.

You can find Graviton-based instances in all popular AWS instance families.

[Figure: EC2 instance families]

While Intel still holds the majority share of the EC2 landscape, the adoption of AMD and Graviton instances is steadily increasing, reflecting customer willingness to consider alternative processor architectures based on their specific workload requirements.
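
To put rough numbers on it, here is a minimal sketch of the comparison we ran; the instance types and hourly rates below are illustrative examples, not current AWS prices:

```python
# Illustrative monthly saving from moving a fleet from Intel (m5) to
# Graviton (m6g) instances. Rates are example figures, not live AWS pricing.
HOURS_PER_MONTH = 730

def monthly_saving(count: int, intel_hourly: float, graviton_hourly: float) -> float:
    """Monthly USD saved by moving `count` instances to Graviton."""
    return count * (intel_hourly - graviton_hourly) * HOURS_PER_MONTH

# e.g. 50 instances at an assumed $0.192/hr (m5.xlarge) vs $0.154/hr (m6g.xlarge)
saving = monthly_saving(50, 0.192, 0.154)  # roughly 1,387 USD per month
```

Even a few cents per hour per instance compounds quickly across a fleet running 24x7.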

2) GP2 to GP3 Migration

In December 2020, AWS announced general availability of a new Amazon EBS General Purpose SSD volume type, gp3. AWS designed gp3 to provide a predictable 3,000 IOPS baseline and 125 MiB/s throughput, regardless of volume size.

With gp3 volumes, you can provision IOPS and throughput independently, without increasing storage size, at costs up to 20% lower per GB compared to gp2 volumes. This means you can provision smaller volumes while maintaining high performance, at a cheaper cost.
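
A quick sketch of that 20% figure, using assumed per-GB-month prices rather than your region’s actual rate card:

```python
# Illustrative gp2 vs gp3 storage cost for the same volume size.
# Example per-GB-month prices (assumptions; check your region's rate card).
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08

def monthly_storage_cost(size_gb: int, per_gb: float) -> float:
    """Monthly USD for a volume of `size_gb` at the given per-GB rate."""
    return size_gb * per_gb

vol_gb = 500
gp2_cost = monthly_storage_cost(vol_gb, GP2_PER_GB)
gp3_cost = monthly_storage_cost(vol_gb, GP3_PER_GB)
saving_pct = (gp2_cost - gp3_cost) / gp2_cost * 100  # 20% cheaper per GB
```

The migration itself is a volume-type change, so the saving applies to every gp2 volume you convert.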

3) Right Sizing of RDS Cluster

You can scale your database instances by adjusting memory or compute power up or down as performance and capacity requirements change. The following are some things to consider when scaling a database instance:

  • Storage and instance type are decoupled. When you scale your database instance up or down, your storage size remains the same and is not affected by the change.
  • You can separately modify your Amazon RDS DB instance to increase the allocated storage space or improve the performance by changing the storage type (such as General Purpose SSD to Provisioned IOPS SSD).
  • Determine when you want to apply the change. You have an option to apply it immediately or during the maintenance window specified for the instance.
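
Our right-sizing rule of thumb can be sketched like this; the size ladder and CPU threshold are assumptions for illustration, not AWS guidance:

```python
# Sketch of a right-sizing rule: if sustained p95 CPU stays well under
# capacity, one instance size down is worth evaluating. The ladder and
# the 40% threshold are illustrative assumptions.
SIZE_LADDER = ["db.r5.large", "db.r5.xlarge", "db.r5.2xlarge", "db.r5.4xlarge"]

def recommend_size(current: str, p95_cpu_percent: float, threshold: float = 40.0) -> str:
    """Suggest one size smaller when sustained p95 CPU is below threshold."""
    idx = SIZE_LADDER.index(current)  # raises ValueError for unknown sizes
    if p95_cpu_percent < threshold and idx > 0:
        return SIZE_LADDER[idx - 1]
    return current
```

Because storage and instance type are decoupled, a recommendation like this only changes compute cost, not your allocated storage.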

4) Keeping the Disaster Recovery Infra Lean

To keep your disaster recovery (DR) site lean, focus on minimal, essential resources and infrastructure optimized for rapid recovery, while regularly testing and updating your DR plan and procedures to ensure efficiency and effectiveness.

A) Define Clear Recovery Objectives

B) Optimize Infrastructure and Resources

C) Focus on Core Functionality

We focused on section B to keep the disaster recovery infra lean by:

  • Considering using cloud-based DR solutions for flexibility and scalability, allowing you to provision resources only when needed.
  • Utilizing virtualization technologies to efficiently manage and allocate resources, enabling quick scaling and deployment of virtual machines.
  • Implementing efficient data replication strategies to minimize data loss and ensure rapid recovery.
  • Automating as many recovery tasks as possible to reduce manual intervention and speed up the recovery process.

But this is still Backup as a Service (BaaS); if possible, try to move towards Disaster Recovery as a Service (DRaaS).

5) Decommissioned Unused Microservices

Whether a microservice is unused yet still consuming infrastructure cost can be detected either through an architecture review or from AWS monitoring logs. CPU utilisation at zero, or below 5%, can also be a trigger to question whether we require the service at all.

Last year we consolidated 2 microservices based on functionality and removed 2 microservices because they were obsolete or no longer worth maintaining. Once we took this decision, the savings showed up directly from the next billing cycle :)
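
The “below 5% CPU” trigger boils down to something like this; the metrics dict is illustrative sample data, not a real AWS API response:

```python
# Flag services whose average CPU utilisation over the review window is
# below the cutoff; these become candidates for a decommissioning review.
def flag_idle_services(avg_cpu_by_service: dict, cutoff: float = 5.0) -> list:
    """Return service names that are candidates for decommissioning."""
    return sorted(name for name, cpu in avg_cpu_by_service.items() if cpu < cutoff)

# Hypothetical service names and average CPU percentages:
candidates = flag_idle_services({"orders": 42.0, "legacy-export": 0.8, "old-reports": 3.1})
```

A flag here is only the start of the conversation: confirm with an architecture review before actually decommissioning anything.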

6) Merge Static & Configuration Databases

When we analysed our RDS structure, we found we had 3 different MySQL databases used for Transaction, Static Data and Configuration data.

This setup was replicated in the QA, PREPROD and PROD environments. We didn’t want to touch Transaction, but Static and Configuration data could co-exist. These are simple but important decisions, because sometimes people do not want to touch something which is legacy but running JUST FINE.

7) Reserved vs On Demand Instances

A Reserved Instance offers cost savings of up to 72% over the On-Demand price, and three-year Reserved Instance terms offer much greater savings than one-year terms. For the same instance type, an On-Demand Instance will always cost more than a Reserved Instance used at capacity for the full term.
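
A back-of-the-envelope version of that comparison, with assumed hourly rates (real discounts depend on term, payment option, instance type and region):

```python
# Reserved vs on-demand saving for an instance running 24x7 for a year.
# The rates below are illustrative assumptions, not live AWS prices.
HOURS_PER_YEAR = 8760

def ri_saving_pct(on_demand_hourly: float, ri_effective_hourly: float) -> float:
    """Percentage saved by a reserved instance running at capacity all year."""
    on_demand = on_demand_hourly * HOURS_PER_YEAR
    reserved = ri_effective_hourly * HOURS_PER_YEAR
    return (on_demand - reserved) / on_demand * 100

# e.g. an assumed $0.192/hr on-demand vs an effective $0.077/hr on a 3-year term
pct = ri_saving_pct(0.192, 0.077)  # roughly 60% saved
```

The flip side: a reserved instance that sits idle still bills, so reserve only your steady baseline load.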

[Figure: reserved vs on-demand instance pricing]

To know when to use reserved instances, do read this AWS blog:

8) Removed unused EBS Volumes

The lifecycle of an Amazon Elastic Block Store (Amazon EBS) volume is typically independent from the lifecycle of the Amazon Elastic Compute Cloud (Amazon EC2) instance to which it is attached. Unless you select the Delete on Termination option at launch, terminating the EC2 instance detaches the EBS volume but doesn’t delete it.

[Figure: AWS EBS volumes]

We followed the documentation and instructions mentioned by AWS:
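
The core of the sweep is simply finding volumes in the “available” state, i.e. attached to nothing but still billed. The record shape below loosely mimics a `describe_volumes` response and is an assumption:

```python
# Find orphaned EBS volumes: a volume in the "available" state is not
# attached to any instance but continues to accrue storage charges.
def unattached_volumes(volumes: list) -> list:
    """Return IDs of volumes not attached to any instance."""
    return [v["VolumeId"] for v in volumes if v["State"] == "available"]

# Sample records with hypothetical volume IDs:
vols = [
    {"VolumeId": "vol-111", "State": "in-use"},
    {"VolumeId": "vol-222", "State": "available"},  # orphaned, deletion candidate
]
orphans = unattached_volumes(vols)
```

Snapshot anything you are unsure about before deleting; a snapshot is far cheaper than a live volume.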

9) Open Source Hosted Solution vs Cloud Services

Over a period of time we had procured a lot of open-source software and hosted it on EC2 instances, essentially a self-managed solution.

We reviewed all of that software and asked the three questions below:

> Is this really important for our ecosystem?

> Does a cloud-based solution already exist for the same?

> Is the cloud-based solution cheaper in the long term than the self-hosted one?

We found two pieces of software that were costing us around 30K USD per annum in infra costs across QA, PREPROD and PROD, while a cloud-based enterprise solution existed for 1,200 USD per annum.

[Figure: on-premise vs cloud-based]

10) Auto Scaling and Descaling AWS Infra

Everyone in technology has heard about “Auto Scaling”, but how often do they sit with a DevOps team member to review the Auto Scaling Group scaling limits? Do spend some time there; there can be an opportunity to save cost.

In the context of AWS, “descaling” refers to reducing the number of resources (like EC2 instances or ECS tasks) that are running to match a lower demand, often achieved through services like EC2 Auto Scaling or Application Auto Scaling. We followed this decision tree —
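
A toy version of that kind of decision tree: scale in by one instance only when CPU has stayed below the scale-in threshold for every recent evaluation period, never going below the group minimum. The thresholds are assumptions:

```python
# One evaluation step of a descale decision: remove one instance when the
# whole recent CPU window sits under the scale-in threshold, respecting
# the group's minimum capacity. Threshold and window are illustrative.
def next_capacity(desired: int, minimum: int, recent_cpu: list,
                  scale_in_threshold: float = 30.0) -> int:
    """Return the new desired capacity after one evaluation."""
    if recent_cpu and all(cpu < scale_in_threshold for cpu in recent_cpu):
        return max(minimum, desired - 1)
    return desired
```

Requiring the whole window to be quiet (rather than a single data point) avoids flapping between scale-out and scale-in.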

[Figure: descale decision logic]

11) Reduce AWS CloudWatch Usage and Push the Data Somewhere Else

For your paid AWS account, do check the AWS CloudWatch Logs frequency, size and cost.

To save on AWS CloudWatch Logs costs, you can implement log retention policies, audit and remove unused metrics and alarms, and use metric filters instead of custom metrics:

  • Configure log groups to automatically delete logs after a specific period, preventing unnecessary storage costs.
  • Consider archiving logs to Amazon S3 for long-term storage and analysis, which is generally cheaper than keeping them in CloudWatch Logs.
  • Identify and remove any metrics or alarms that are no longer needed or used.
  • By using metric filters, you can reduce the amount of data that CloudWatch needs to store and process.
  • Optimize CloudWatch dashboards: Avoid exceeding the free tier limits for dashboards and API calls.
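
As a rough idea of what archiving buys you, here is a sketch with assumed per-GB-month storage prices (check your own rate card):

```python
# Estimate the monthly saving from holding long-term logs in S3 instead
# of CloudWatch Logs. Per-GB-month prices are illustrative assumptions.
CLOUDWATCH_GB_MONTH = 0.03
S3_STANDARD_GB_MONTH = 0.023

def archive_saving(log_gb: float) -> float:
    """Monthly USD saved by keeping `log_gb` of logs in S3 rather than CloudWatch."""
    return log_gb * (CLOUDWATCH_GB_MONTH - S3_STANDARD_GB_MONTH)

saving = archive_saving(2000)  # e.g. 2 TB of retained logs
```

Combine this with a short CloudWatch retention window so only recent, actively queried logs stay in the expensive tier.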

12) Trusting AI-Based Recommendations

Nowadays you don’t need to hire an AWS cost optimisation specialist team; there are multiple AI tools which can sniff your ecosystem and give you the best recommendations to rightsize or decommission AWS infrastructure.

Cloudability vs Densify vs Spot by NetApp vs FinOpsly vs PointFive

After comparing the pros and cons of many rightsizing tools, we decided to go with PointFive, a cloud cost optimization platform that uses its “DeepWaste™ Detection Engine” to identify and address cloud inefficiencies by analysing infrastructure at a granular level, similar to threat detection systems, and providing actionable insights and remediation workflows.

Result:

[Figure: AWS cost optimisation, 2024 vs 2023]

This year we are going to try more AI-based cost optimisation techniques and hopefully bring the cost down to 100K USD per month or less :)

If you have more suggestions/recommendations, please pen down your thoughts in the comments section of the blog. Claps, likes and shares will be highly appreciated.

If you like the blog, do follow me!

