Info Rhino provide data solutions to enterprises and SMEs. A recent assignment involved a typical software application setup: a website where data can be edited and uploaded, website reports (charts and tables), and other common functionality. The application was hosted in Azure, and the database was Azure SQL Server. In a separate article, we discussed our efforts to improve the performance of the application by optimising the database.
Do get in touch with Info Rhino if you want us to build this solution for you - www.inforhino.co.uk/contact
A word of caution to the reader. We never divulge personal or commercially sensitive information about a client, and this article contains no client code or Intellectual Property. The solution described is one any client can benefit from, and we would build it from scratch.
Note - Azure Load Testing is in preview at the time of writing; please bear this in mind.
We are also fully aware of Azure DevTest Labs, which we should look to integrate into our CI/CD DevOps pipelines.
We keep finding that things which are easy to do on a server or desktop can be harder to implement in Azure. The minute you start developing Software as a Service in Azure, your organisation is locking itself into how Microsoft wants you to do things. The tail often wags the dog. Our approach is to concede to using Azure features where required, but to be very mindful about portability - if only because a new preview feature may pop up that makes your life easier in future.
Many will say that it is "horses for courses" and that we would face just as much work on a traditional on-premise server architecture. The answer is no: the maturity of on-premise architecture trounces cloud server management software. We are fully on board with Azure and other cloud infrastructure for the enterprise, but we have to be frank about the pros and cons of cloud versus on-premise.
In that article, we explained how Azure SQL Server can run on a large number of cores and offer large amounts of memory. With enough processor and memory, even the largest poorly maintained database tables will return data quickly but, very soon, increased database load and concurrency will destroy the value of the extra money paid. The article discusses the importance of improving database performance using traditional approaches. The objective was clear: improving database performance by half would allow us to scale down and save costs.
To understand whether we can scale down (reduce the CPU and memory allocation), we need to load test Azure. This article provides an overview of how we set about solving this challenge. Remember, this is representative of the way we would solve this problem and is not specific to our client.
A system that runs for many years, taking large amounts of data into the database without very careful management, will see reporting performance decline dramatically. There are many ways to improve application performance in the website tier. Many will look to the CQRS pattern as a means of separating writes from reads, but that alone achieves little if reporting queries take data from the same database. Worse, inexperienced proponents of CQRS may keep separate read and write stores and do crazy things like joining in the application tier. It is important, if not essential, to consider caching of data. Another technique we suggest is caching lookups and, if they are small, potentially injecting them into the database for joining. Azure SQL restricts memory-optimised tables to its premium tiers - a disaster for those wishing to take advantage of the more advanced performance features of SQL Server.
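To make the lookup-caching idea concrete, here is a minimal sketch using ASP.NET Core's `IMemoryCache`. The repository interface and type names are ours for illustration, not from any client codebase.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public record Country(int Id, string Name);

// Hypothetical data access abstraction for a small reference table.
public interface ICountryRepository
{
    Task<IReadOnlyList<Country>> GetAllAsync();
}

// Small, rarely-changing reference data is served from memory instead of
// being re-queried (and re-joined) on every report request.
public sealed class CountryLookupCache
{
    private readonly IMemoryCache _cache;
    private readonly ICountryRepository _repository;

    public CountryLookupCache(IMemoryCache cache, ICountryRepository repository)
    {
        _cache = cache;
        _repository = repository;
    }

    public async Task<IReadOnlyList<Country>> GetCountriesAsync() =>
        await _cache.GetOrCreateAsync("lookup:countries", entry =>
        {
            // Expire hourly so the cache cannot drift too far from the database.
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(1);
            return _repository.GetAllAsync();
        }) ?? Array.Empty<Country>();
}
```

If the lookup is small enough, the same cached list could be passed back into queries (for example as a table-valued parameter), avoiding a join against a large, contended table.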
So, we have the above approaches to improve performance but, really, we are shooting in the dark if we want a handle on whether we can scale down. Some will say that Azure provides fantastic tooling and dashboards to understand exactly what is causing performance issues - we say good luck.
On previous sites we have undertaken full-stack performance analysis, considering performance from the browser, the JavaScript libraries, the web server, and the database request and command return. However, we know that with a web application under strong user demand, most of the performance issues will lie firmly with the database. We know that database locking is a major problem. It is likely the database model will need to be modified too, but let's focus on a working model we can take forward.
If we are working in a delivery-focused SDLC, we are likely to see regular releases deployed into the deployment pipeline. The testing team will undoubtedly be running front-end tests, and we may see developers-in-test (DIT) building automated tests with BDD, Selenium, Playwright, or Cypress, to name a few. We are big fans of NBi for database unit testing. We want to be able to manually create database executions and run them at a later stage but, in a CI SDLC, manually created tests go stale very quickly. That is not what we want: we don't want to spend a significant length of time creating configuration to execute queries, stored procedures, and functions, only to find that the minute we have created a test file or class, somebody has added a new parameter or field. This model is unsustainable for any technology department.
A typical application that accepts user input and returns data from a database will do something like the following:

- accept and validate the user's input in the browser;
- send a request to the web server, where it is routed to a handler or controller;
- build a parameterised database command, via an ORM or hand-written SQL;
- execute the command against the database and wait for the result set;
- map the results back into application objects;
- render the data to the user as tables or charts.
There are additional steps to consider, but we can see there is a lot there.
We want to intercept those requests and save them so that they are replayable at the database command stage. Our objective is to capture a number of requests and save them for retrieval at a later stage. At this point, some will say - "Oh, but we can just run SQL Server Profiler". This is possibly true: we could extract a session's trace and replay it, but that ties us to SQL Server tooling, captures a lot of noise, and gives us none of the grouping and filtering we describe below.
As said, this is a generic solution to a generic problem. We won't use client code or objects in our discussion.
Those familiar with dependency injection will be aware that we can have different concrete implementations of the same abstraction. For example, we can decide to go to a file system or a database and, as long as the interface is the same, the application behaves as normal. This is a powerful feature because it lets us take an actor/double approach. The idea of the test double is common in software engineering.
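As a sketch of that idea (all names here are hypothetical, not taken from any client codebase), the capturing implementation can simply decorate the real one:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// The abstraction the application depends on.
public interface IQueryExecutor
{
    Task<IReadOnlyList<T>> QueryAsync<T>(string sql, object? parameters = null);
}

// Where captured commands are persisted (e.g. the blob repository described later).
public interface ICommandStore
{
    Task SaveAsync(string sql, object? parameters);
}

// The "double": records every command, then delegates to the real executor.
public sealed class CapturingQueryExecutor : IQueryExecutor
{
    private readonly IQueryExecutor _inner;
    private readonly ICommandStore _store;

    public CapturingQueryExecutor(IQueryExecutor inner, ICommandStore store)
    {
        _inner = inner;
        _store = store;
    }

    public async Task<IReadOnlyList<T>> QueryAsync<T>(string sql, object? parameters = null)
    {
        await _store.SaveAsync(sql, parameters); // capture for later replay
        return await _inner.QueryAsync<T>(sql, parameters);
    }
}
```

Swapping between the normal executor and the capturing one is then a single change at the composition root, so the rest of the application is untouched.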
In our example, we want to capture the query command objects, serialise them, and save them for reuse. We can then create an execution library that runs these database commands, recording the start and end times.
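Here is a minimal sketch of what the captured shape and the timed replay might look like, assuming Dapper; the record layout is ours, and a real library would carry more context (command type, timeouts, and so on).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Dapper;

// A serialisable snapshot of a database command - deliberately generic.
public sealed record CapturedCommand(
    string Sql,
    Dictionary<string, object?> Parameters,
    string ProcessName,            // lets us group runs later, e.g. "before-indexes"
    DateTimeOffset CapturedAtUtc);

public sealed record ReplayResult(string Sql, TimeSpan Duration, int Rows);

public static class CommandReplayer
{
    // Replays one captured command and records how long it took.
    public static async Task<ReplayResult> ReplayAsync(IDbConnection connection, CapturedCommand command)
    {
        var stopwatch = Stopwatch.StartNew();
        var rows = await connection.QueryAsync(command.Sql, new DynamicParameters(command.Parameters));
        stopwatch.Stop();
        return new ReplayResult(command.Sql, stopwatch.Elapsed, rows.Count());
    }
}
```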
Our goal is to be able to scale down and see how our efforts to improve the database performance work out. Over time, we can group our runs and assess how well these items perform. We could have a "before indexes" and an "after indexes" group, for example.
Azure gives you many ways to achieve the same goal, and it is not uncommon to explore one resource type before settling on another. In our situation, we decided upon:

- Azure Blob Storage, to persist the captured commands and the results of each run;
- a set of C# capture-and-replay libraries usable from the web application, a unit test project, or a virtual machine.
In our case the application used Dapper ORM, but it could be Entity Framework or any other micro ORM. We wrote a wrapper around the calls that saves the commands as JSON for retrieval later on. Many will frown upon this, but we very much want this to be runnable on any environment - including production if required. Remember how we, as developers and testers, constantly struggle to replicate the issues production users are finding? It could be that Monday morning surge. For a minimal amount of overhead in capturing and persisting the database command executions, we get a much more accurate reflection of what happens in production.
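By way of illustration, persisting a capture is then little more than serialising the snapshot (reusing the hypothetical `CapturedCommand` shape sketched above):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

var command = new CapturedCommand(
    Sql: "SELECT * FROM Reports WHERE RunDate >= @from",   // illustrative query only
    Parameters: new Dictionary<string, object?> { ["from"] = new DateTime(2023, 1, 1) },
    ProcessName: "monday-morning-surge",
    CapturedAtUtc: DateTimeOffset.UtcNow);

string json = JsonSerializer.Serialize(command, new JsonSerializerOptions { WriteIndented = true });
// The JSON document is what gets persisted. Note that on deserialisation the
// parameter values come back as JsonElement and need mapping to CLR types.
```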
Many organisations have monthly and quarterly processes. We aren't claiming to be able to solve this problem with our approach alone but we should be mindful of its potential.
A relatively simple repository on top of Blob Container Storage saves the commands with a number of properties which we can filter on.
Blob Storage in Azure lets you attach metadata to objects, and we build filtering into our repository pattern using tags. Azure Blob Storage has relatively poor implicit typing and filtering capabilities, but we can abstract this functionality so that we could switch away from blob storage to another storage architecture if required.
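A sketch of that repository idea over blob index tags, assuming the Azure.Storage.Blobs SDK; the class, container, and tag names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

public sealed class BlobCommandStore
{
    private readonly BlobContainerClient _container;

    public BlobCommandStore(BlobContainerClient container) => _container = container;

    public async Task SaveAsync(string blobName, string json, string processName)
    {
        var blob = _container.GetBlobClient(blobName);
        await blob.UploadAsync(BinaryData.FromString(json), new BlobUploadOptions
        {
            // Blob index tags allow server-side filtering without listing every blob.
            Tags = new Dictionary<string, string>
            {
                ["processName"] = processName,
                ["capturedOn"] = DateTime.UtcNow.ToString("yyyy-MM-dd")
            }
        });
    }

    public async IAsyncEnumerable<string> FindByProcessAsync(string processName)
    {
        // Tag filter syntax: tag names in double quotes, values in single quotes.
        await foreach (var item in _container.FindBlobsByTagsAsync($"\"processName\" = '{processName}'"))
            yield return item.BlobName;
    }
}
```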
The most useful captures are likely to come from production. It would also be useful to have this running in a test environment and to ask testers and developers to manually run reports and tasks on the application over, say, a working day.
Once we have our window of captures, we can simply change the database connection string and run them, changing the scaling of the database to test the difference in query times. Testers may be happy just running these processes from a virtual machine. We could spin up a new database in a separate environment, or perhaps integrate this into a daily CI build.
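A sketch of what such a run might look like, reusing the hypothetical `CommandReplayer` above; the connection strings and capture loading are placeholders, and we assume Microsoft.Data.SqlClient.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

// Placeholder connection strings - one per database tier under test.
var environments = new Dictionary<string, string>
{
    ["current-tier"] = "<connection string for the existing scale>",
    ["scaled-down"]  = "<connection string for the reduced scale>"
};

// In practice these would be loaded from the blob repository.
IReadOnlyList<CapturedCommand> capturedCommands = new List<CapturedCommand>();

foreach (var (label, connectionString) in environments)
{
    using var connection = new SqlConnection(connectionString);
    foreach (var command in capturedCommands)
    {
        var result = await CommandReplayer.ReplayAsync(connection, command);
        Console.WriteLine($"{label}: {result.Duration.TotalMilliseconds:F0} ms ({result.Rows} rows) - {result.Sql}");
    }
}
```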
The results are persisted to storage. We want to be able to save these under different process names so we can compare them later on.
The minute a system is conceptualised, we can start to list the pros and cons of how it can work. Perhaps we would want to write something that despatches the same calls at exactly the same time; perhaps we would want to store the data inside a database. What was important was to stop developers and testers manually creating test files that go stale quickly, and to let us test specific environmental hardware settings. We have remained relatively agnostic in not attaching application-contextual information when persisting the database calls, but we have the choice to extend this if required.
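If we did want to despatch captured calls at the same time, a sketch might look like the following, with one connection per in-flight command since a single `SqlConnection` cannot run commands concurrently. The `capturedCommands` and `connectionString` variables are the placeholders from the earlier sketch.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

var tasks = capturedCommands.Select(async command =>
{
    // One connection per task, so the commands genuinely overlap.
    using var connection = new SqlConnection(connectionString);
    return await CommandReplayer.ReplayAsync(connection, command);
});

ReplayResult[] results = await Task.WhenAll(tasks);
```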
The approach taken very much has a foot in both camps. We deliberately wrote C# libraries that can be used in different architectures, such as a web application or a unit test project. We are using Azure Blob Storage, but it wouldn't be hard to create a new implementation to persist this in ServiceStack or MongoDB.
We hope you liked this article; we tried to avoid going too deep into the technical details of this approach. We welcome opinions on how you would attempt something similar.