Info Rhino are experts in data and databases, and understand how to optimise them for better performance and scalability. Feel free to get in touch with us to help improve your system's performance. Contact Info Rhino here
Azure can dramatically lower the cost of enterprise technology infrastructure. However, this cuts both ways - not paying attention to the finer points of optimising that infrastructure can lead to dramatically higher costs.
We took on an assignment to improve database performance for the sake of the user experience, but we quickly switched gears to focus on reducing costs by improving performance.
Azure provides tooling and informational views covering all manner of metrics across its cloud services. These metrics, in our estimation, are far from perfect, but there is one really important feature: costs by resource group and type. We can view the costs of different resources to understand what we are paying for over time.
We can now start to think about ways to reduce costs, and we will focus on Azure SQL databases.
Those unfamiliar with Azure can immediately see parallels with on-premises architecture, in that we can add more processors and memory to improve performance. To really make the point, even the largest and worst-optimised database tables can return query data very quickly if we are running at maximum capacity. The problem is that maximum capacity can quickly cost tens of thousands of pounds a year.
Website developers will have significant experience across many different technology stacks and frameworks. They will know their way around database development: writing SQL queries and coming up with a reasonable database model. The challenge is that most of these developers won't have the aptitude for really getting down to performance improvements on the database itself. Their performance work tends to focus on the website application tier.
The typical mid-level website developer
Now we can really start to see some problems. These developers have little interest in using a database at all; they tend to rely on an object-relational mapper to move data back and forth between the website and the database.
This version has fewer features than either SQL Server Standard Edition or SQL Server Enterprise Edition. Missing features include: the database consistency checker, the ability to create filegroups, in-memory tables and in-memory table variables, and other important performance features. We are already at a disadvantage in today's scalable data technology world. This does not mean that Azure isn't a fantastic platform for hosting a wide range of software and processes - we should simply be mindful that these gaps can make our developers' jobs a lot harder. Some will say that Azure and Amazon make their lives easier, but anyway.
This is not going to be a lesson in database optimisation - a highly specialised area. We want to discuss a few principles to take into consideration.
Adding the indexes suggested by database analytics software will often fail to deliver better performance on the database.
Database tables should not have lots of indexes covering the same columns.
If your database table has almost as many indexes as it does columns something is very wrong.
If we have database indexes that cover the same columns, we are simply adding to the database's storage, and to its cost.
If the tables are fast-moving and we have many indexes, performance can actually be made worse, because every write must maintain each index.
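To illustrate the redundancy point, here is a sketch that lists each index's key columns from the system catalog views, so overlapping definitions sort next to each other. The `dbo.Orders` table is hypothetical; in practice you would run this against your own tables.

```sql
-- Sketch: list each index's key columns so overlapping definitions
-- are easy to spot. Table name below is illustrative.
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name                   AS index_name,
    STRING_AGG(c.name, ', ') WITHIN GROUP (ORDER BY ic.key_ordinal) AS key_columns
FROM sys.indexes i
JOIN sys.index_columns ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.Orders')  -- hypothetical table
  AND ic.is_included_column = 0
GROUP BY i.object_id, i.name
ORDER BY key_columns;
```

Two indexes sharing the same leading key columns will appear together, making candidates for consolidation obvious.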
We need to think about the ordering of columns during index creation, so that the leading column is the most selective for the queries we expect.
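A sketch of the ordering principle, using hypothetical columns, and assuming the workload filters on `CustomerId`, which has far more distinct values than `Status`:

```sql
-- Assumption: queries filter on CustomerId (millions of distinct values),
-- while Status holds only a handful of values.

-- Weaker for those queries: the leading column filters out very little.
CREATE NONCLUSTERED INDEX IX_Orders_Status_Customer
    ON dbo.Orders (Status, CustomerId);

-- Better: the most selective column leads; Status can trail, or move
-- to INCLUDE if it is only returned rather than filtered on.
CREATE NONCLUSTERED INDEX IX_Orders_Customer_Status
    ON dbo.Orders (CustomerId, Status);
```

Which ordering wins always depends on the actual query shapes, which is why this is a design decision rather than something to accept blindly from a tuning advisor.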
We must really think about whether we should be using partitioning, and whether columnstore indexes would serve us better than non-clustered indexes.
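As a minimal sketch of the columnstore option, on a hypothetical large, append-mostly history table, a single clustered columnstore index can sometimes replace several wide non-clustered indexes for analytical scans, cutting both storage and query cost. Availability depends on the Azure SQL Database service tier.

```sql
-- Sketch: dbo.OrderHistory is a hypothetical large fact table read
-- mostly by aggregating queries. Columnstore compresses heavily and
-- scans far faster than row-store indexes for that workload.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_OrderHistory
    ON dbo.OrderHistory;
```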
We should consider caching data outside of the database, within the application tier, for the more commonly used reference tables.
We should consider whether SQL Server is the right platform for our data. We think Azure SQL Database is incredible; we have to make this point for transparency.
Consider the structure of the database itself, how the tables are designed.
"hmm, that's a lot of work"
When our database has the maximum memory and cores, queries run very quickly without needing optimisation, although we still get performance issues. We want to use features within the database to reduce the amount of memory and the number of cores being used, to lower our costs.
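Once the optimisation work allows it, the service objective can be lowered directly in T-SQL. The database name and tiers below are illustrative; the change triggers a brief reconnection as the database moves tier, so it should be scheduled carefully.

```sql
-- Sketch: scale a hypothetical database down from a larger Standard
-- tier to S3, run from the master database.
ALTER DATABASE MyAppDb
    MODIFY (SERVICE_OBJECTIVE = 'S3');
```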
Our current costs are £100k a year. Our database will grow, but much of that data won't need to be made available to the application. We want to try to reduce our costs to £50k a year in year two, and £25k in year three.
Our conceptual target identified that we could spend £30k in year one and maybe £15k in year two, and still save £30k through our cost-reduction efforts. Note - none of this is a promise, but a target. We may be able to exceed it; we may not achieve it.
Remember, too, that the more work the database is performing, the more work the application tier is likely to have to do as well. We should be able to re-architect parts of our application tier to be more judicious in its handling of data.
One thing we must be aware of is that Azure does not provide enough AI-driven metrics on how to optimise the database and application. If we get a seasoned DBA to go through a large database and analyse how it has been maintained, they will find far more than the typical recommendations made by Microsoft Azure's AI.
Similarly the staff that have built this solution are unlikely to be the staff to optimise the solution and repair a lot of the issues.
We will now present a high-level overview on how we can optimise systems like this.
1 - Storage - table structure and index analysis
We will analyse tables, sizes, and settings. We want to understand whether there are enough indexes, too many indexes, or the wrong type of indexes, and how fragmented the data is. Can some data be stored on cheaper storage?
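A sketch of the fragmentation side of this analysis, using the standard dynamic management function; the page-count filter is illustrative:

```sql
-- Sketch: fragmentation and size per index, to flag rebuild candidates
-- and oversized tables. Run against the user database.
SELECT
    OBJECT_NAME(ps.object_id) AS table_name,
    i.name                    AS index_name,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.page_count > 1000  -- ignore tiny indexes where fragmentation is noise
ORDER BY ps.avg_fragmentation_in_percent DESC;
```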
2 - Data traffic.
When is the database being hit, and what kind of tables are being hit? Do we have any metrics on slow-running queries? Do we see deadlocks?
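For the slow-query question, the Query Store (enabled by default on Azure SQL Database) already holds the answer. A sketch of pulling the worst offenders by average duration:

```sql
-- Sketch: top 20 queries by average duration from the Query Store.
-- avg_duration is reported in microseconds.
SELECT TOP (20)
    qt.query_sql_text,
    rs.avg_duration / 1000.0 AS avg_duration_ms,
    rs.count_executions
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt
  ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan p
  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs
  ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;
```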
3 - Data retrieval and reporting
We start to look at the types of queries that are issued against the database. We want to understand if we should have a reporting schema within the database.
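A minimal sketch of the reporting-schema idea, with hypothetical table and column names: pre-shaped views in a separate schema keep analytical reads away from the transactional tables' hot paths, and give reporting queries one sanctioned place to go.

```sql
-- Sketch: a dedicated reporting schema with a pre-aggregated view.
CREATE SCHEMA reporting;
GO
CREATE VIEW reporting.DailyOrderTotals
AS
SELECT CAST(OrderDate AS date) AS order_day,
       COUNT(*)                AS orders,
       SUM(TotalAmount)        AS revenue
FROM dbo.Orders
GROUP BY CAST(OrderDate AS date);
```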
4 - Load Testing.
We want to understand whether our performance improvements can achieve similar or better results while scaling levels are reduced.
At this point, we should note that we aren't going all-in on improving the database. We have many options but we want to achieve a minimal implementation to gauge our Return on Investment (ROI).
There were three main aims:
Index removal and remodelling (different indexing strategies)
Regular Database Optimisation
Scale down through Load Testing - we wrote a small dynamic application that could simulate load to test query performance.
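The regular optimisation aim can be sketched as a simple maintenance pass: reorganise lightly fragmented indexes, rebuild heavily fragmented ones, and refresh statistics. The thresholds and index name below are illustrative; in practice the work is driven by the fragmentation figures and scheduled rather than run by hand.

```sql
-- Sketch of routine maintenance on a hypothetical index.
-- Roughly 5-30% fragmentation: reorganise in place.
ALTER INDEX IX_Orders_Customer_Status ON dbo.Orders REORGANIZE;

-- Above roughly 30%: rebuild, online so the table stays available.
ALTER INDEX IX_Orders_Customer_Status ON dbo.Orders
    REBUILD WITH (ONLINE = ON);

-- Keep the optimiser's estimates honest after heavy data movement.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```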
The tools we used:
Microsoft C# and the NUnit testing framework
NBi Data Testing Framework (.Net)
Azure Data Factory for workload and database automation
Azure SQL Database tooling and analytics
Azure Cost Management views to understand breakdowns of costs by resource group and resource type
Azure Data Studio and SQL Server Management Studio
Thank you for taking the time to read this article. We hope it has given you some good ideas on how to dramatically reduce costs on cloud data architecture. We think it is better to get external support to do this. Those enterprises with a DBA function may think it better to hand this work over to them, but unfortunately DBAs rarely have oversight of the application. (Note: many organisations do have DBA functions highly attuned to the needs of their applications.)
Again, just drop us an email at solutions@inforhino.co.uk if you wish to discuss anything on this.
If you are a DBA or data expert, drop us an email too - we are happy to talk about some of the concepts in this article.
Written with StackEdit.