MySQL Database Optimisation in Simple but Effective Way

Om Vikram Thapa
9 min readMay 11, 2023
database-optimization

Once in your software life-cycle career you will end up meeting this stage where everyone will be talking about — Let’s optimise our database :) This is my story working in e-commerce industry for last 15 years and I have seen this stage multiple times in my career and I was part of them directly or indirectly.

Thus I would like to share my first hand experience on DB Optimisation step by step and trust me most of them are small steps which I am going to share in this blog but the whole process revolves around it.

Let’s process our thoughts around what is the need to optimise at this point of time. Is my master DB getting overcooked? Are the QPS is something you are worried about? Are you going to launch for Big Offer Day? Are you going to launch the system for another vertical or countries? Is there lag between your master and slaves? The reason may very from one system to another depending upon your use cases.

Once you have the answer for above query then figure out the steps which we generally consider for DB Scaling activity are —

  1. DB Segregation
  2. DB Size Reduction (Archival)
  3. DB Query Optimisation
  4. Finding DB Lags — binlog, GTID
  5. DB Tuning
  6. DB Partitioning
  7. Proxy SQL
  8. Re-arch your service — Caching, Queueing
  9. Upgrade/Degrade DB Machines
  10. Deploy a Performance Monitoring Tool

1. DB Segregation

What do I mean by this is find out if your master DB has got master tables, mapping tables and transaction tables and where do you see the most number of hits coming to the set of tables. For eg. In any product system we get multiple updates from the suppliers about the rates and inventory of the product. Can we find out what is static and what is dynamic tables in the database and if it make sense can we segregate them into 2 or multiple database systems so that the throughput in one of the dynamic tables won’t affect the updates on other tables in the same database.

static-dynamic-table-segregation

Its always good to have a separate datastore for Analytical purpose, those queries should not be part of your current master database. You can have a dedicated slave for the same or you can push the data from all your databases (static + dynamic) to Analytical service dedicated datastore for ease of computation.

2. DB Size Reduction (Archival)

How much data is relevant for your application and do you want all of them in your master database? Sometimes we keep saving all historical data in our master database as book keeper (just in case) but this creates issues as

More the data ~ More the number of rows ~ More the time taken for search queries ~ we end up indexing more

because we think indexing is the only solution to increase the search performance :) Thus my suggestion is try to give more breathing space to the database by planning proper archival mechanism so that you still has the access of old data but only on demand basis.

database-archive-data-access

3. DB Query Optimisation

This is a no brainer and must to have for your DB queries to be optimised. The DB Query optimisation should be part of both application layer as well as Data layer where you actually fire the queries.

Some of the best query optimisation techniques are well explained here -

4. Finding DB Lags

Most of us know that we write in master database and read from slave replicas but there is always a chance of master slave lags (read more about this in my previous medium blog https://omvikram.medium.com/understanding-mysql-db-replication-in-its-simplest-form-37df899ca166 )

You need to ask your DBA for the slow log queries or if you admin rights you need to enable the slow log query in your database —

https://www.a2hosting.com/kb/developer-corner/mysql/enabling-the-slow-query-log-in-mysql/

Slow Queries will hold the resources in your database and slows down the whole process of binlog to slave data replcation which leads to DB Lag.

database-lag-scenario

Start Using GTID (Global Transaction ID)

GTIDs bring transaction-based replication to the MySQL databases. With GTID replication, every transaction can be identified and tracked as it is committed on the originating source server and applied by replicas. You don’t have to refer to any log files when starting the replica servers. GTID replication uses auto-positioning replication, which reduces the work spent on configuration.

You need to ask your DBA to enable the GTID based replication or if you admin rights you need to enable the GTID in your database —

https://www.redhat.com/sysadmin/gtid-replication-mysql-servers

5. DB Tuning

This is one of key practice of knowing the right configuration in database which we as developers ignore and most of the times we trust that the default settings may be working fine but but there are few check points we should double check if its working in our application favour or not.

For eg.

SQL Cache https://www.percona.com/blog/the-mysql-query-cache-how-it-works-and-workload-impacts-both-good-and-bad/

mysql> show variables like ‘query_cache_size’;
+ — — — — — — — — — + — — — — -+
| Variable_name | Value |
+ — — — — — — — — — + — — — — -+
| query_cache_size | 1048576 |
+ — — — — — — — — — + — — — — -+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE “qcache%”;
+ — — — — — — — — — — — — -+ — — — — -+
| Variable_name | Value |
+ — — — — — — — — — — — — -+ — — — — -+
| Qcache_free_blocks | 1 |
| Qcache_free_memory | 1031320 |
| Qcache_hits | 0 |
| Qcache_inserts | 0 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 1 |
| Qcache_queries_in_cache | 0 |
| Qcache_total_blocks | 1 |
+ — — — — — — — — — — — — -+ — — — — -+
8 rows in set (0.01 sec)

Buffer Pool Sizehttps://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html

innodb-buffer-pool-size

Page Size — https://stackoverflow.com/questions/4401910/mysql-what-is-a-page

The default database page size in InnoDB is 16KB. By recompiling the code, you can set it to values ranging from 8KB to 64KB.

6. DB Partitioning

Ohhh Finally we are here. There are various schemes to perform partitioning of the Database tables but the question you should ask first what do you need partitioning for? Is it really required? What will be the basis for partitioning in your database tables?

For eg. For our product system it made sense to us to have rate and inventory database tables to be partitioned “month-wise” (YYYYMM)

database-partitioning

The data can be partitioned between only 1 server or no. of servers (shards)

database-partitioning-schemes

I found a very useful youtube video which explains this — https://youtu.be/wXvljefXyEo

7. Proxy SQL

Just like Application Load Balancer if you are searching for a system which can act as — Query Load Balancer + Router + Sharding Handler

proxysql-as-query-load-balancer
proxysql-as-router

https://www.digitalocean.com/community/tutorials/how-to-use-proxysql-as-a-load-balancer-for-mysql-on-ubuntu-16-04

8. Re-arch your service

If you think still the throughput is high and your Master DB is not able to handle that much load OR you have so many reads that you can’t bear the expense of Read Replica Slaves scaling up. Then you need to take a step back and think about your Architecture again. 2 simple solutions are Caching & Queueing.

Caching for Reads — You can always have multiple slaves but we can still cache some of the static information which is commonly used for SELECT queries and serve it via distributed cache rather than fetching is directly from the Slave Databases for eg. Product_basic, Product_info etc

redis-as-distributed-cache

Queueing for Writes — You scaled up the database, put caching for reads, used proxysql also to load balance the query but the input requests via your application layer is so much that you are forced to either put throttling or you need to bear with the latency which in turn result into your DB Lag. Thus Queueing of the input requests can surely help you to manage the DB requests and there many queueing stack available in market for eg. RABBITMQ, KAFKA, AWS SQS etc.

9. Upgrade/Degrade DB Machines

The DB machines will come to play only when you have hosted solution for your database, already managed services like AWS RDS or SQL Server will have its own configuration and management systems. But hosted solution requires your database server instance type choice and its configuration.

For eg. in AWS you can select between different EC2 instance types -

aws-ec2-instance-types

If you are using a managed service like RDS then check the RDS instance types based on your need here — https://www.cloudzero.com/blog/rds-instance-types

You can always vertically scale using EBS (Elastic Block Storage) on top of the instance type you use but nowadays there is NVMe (Non Volatile Memory Express) or SATA-based solid state drives (SSD) to deliver high random I/O performance. This is a good option when you need storage with very low latency, but you don’t need the data to persist when the instance terminates or you can take advantage of fault-tolerant architectures.

Apart from this if your database version is also behind I would recommend you to upgrade the same for eg. MySQL 8.5 has much more enhanced feature for optimisation and scaling than MySQL 5.7

https://support.cpanel.net/hc/en-us/articles/4408954578967-What-s-the-difference-between-MySQL-5-7-and-MySQL-8-

10. Deploy a Performance Monitoring Tool

This one I am open for any suggestion from you guys as I have used Profiler, Percona & Zabbix but you can discuss this with your DBA and based on the use cases as well as budget try to conclue on one of the suggested DB performance monitoring tools available in the market

There is a difference in between Database Optimisation, Tuning and Scaling. I would like to touch upon each topic in my upcoming blogs. For now Optimisation can have much more than what I have suggested above but these are the ones I am familiar with and worked upon in my career. They do have their own pros and cons but based on your application need, your business scale and your dev + infra cost limitations you can take the call to utilise them in your system. Remember “Nothing is bulls eye”

P.S — Thanks to all the references I mentioned in my blog to enrich the content. I have personally hand picked those youtube, medium, stack overflow links after going through many relevant articles :) Happy DB Optimisation !!

--

--