Introduction to our Data Calculation Engine
Our Data Calculation Engine (DCE) is currently known as the Analyser Processor, although we strongly suspect this name will change in the future. You may have seen that this application is currently listed in the beta section of our website. We have tested the DCE intensively and plan to use it for a new cryptocurrency platform we will launch soon.
The Data Calculation Engine is unique in the way data can be passed between its different generation modes to be used for further calculations. Our core objective was to avoid having to write large amounts of code and data analysis language.
What is a data calculation engine?
We have worked on multiple projects where the client requires data to be analysed by applying aggregations and calculations to it. Typically, information is passed to an application at intervals, calculations are performed on it, and the results are then evaluated and marked as a pass or a fail. The important benefit of a data calculation engine is that it does not expose programming logic (code) to the end user. A non-technical person with reasonable maths skills can set up calculations and tests that tell them something they are interested in knowing.
How does a data calculation engine vary from a spreadsheet such as Microsoft Excel?
The main difference is that Excel relies upon significant effort to create formulas relative to individual data values, which can quickly become unmaintainable. Another challenge with Excel is that it is a very insular application: its data is not commonly portable to other platforms. Indeed, Excel has acquired the unfortunate tag of "Death by Excel" in the world of End User Computing. Excel is a truly remarkable application offering functionality far beyond what most applications can offer, but the key point is that Excel is too opinionated in how it works with data.
Is your application based upon your previous client experiences?
Absolutely not. We are working on a price analysis solution that will generate trade signals to feed into our trading application. Largely, this is a technical analysis tool, but it is not limited to financial analysis.
How performant is the Analyser Processor?
It is not designed to compete with large enterprise database platforms, but if we see the need to scale up rather than scale out, we will do so. The application goes through a data preparation phase, which does mean it has some latency.
What makes the Analyser Processor a different beast to other data calculation engines?
An important difference is that calculations can be injected into the application, which makes it extensible. We can write new calculations and load them into the engine without having to write new code within the data calculation engine itself.
The Analyser Processor works along the following lines:
- We deliver one or more source files we expect to process. An example would be stock or cryptocurrency prices.
- We have a folder of instances of an entity, such as a range of cryptocurrency pairs.
- We have a set of time series in which we calculate slices for a measure, such as the maximum value every 10 seconds.
- We set up tests: range tests that say whether those values fall within specified bounds.
- We work with data files that are JSON arrays.
- The application is cyclical in nature: the output of one calculation mode can be used as the input to another.
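To make the slicing and range-testing steps concrete, here is a minimal sketch in Python. It is purely illustrative and not the engine's actual code: the tick format, window size, and bounds are all assumptions invented for the example.

```python
import json
from collections import defaultdict

# Hypothetical input: a JSON array of price ticks for one entity instance.
ticks = json.loads("""[
  {"timestamp": 0,  "price": 101.0},
  {"timestamp": 4,  "price": 103.5},
  {"timestamp": 11, "price": 99.2},
  {"timestamp": 17, "price": 104.1}
]""")

WINDOW_SECONDS = 10

# Slice the time series into 10-second windows, taking the max per window.
windows = defaultdict(list)
for tick in ticks:
    windows[tick["timestamp"] // WINDOW_SECONDS].append(tick["price"])
slices = {w: max(prices) for w, prices in sorted(windows.items())}

# Range test: does each window's maximum fall within acceptable bounds?
LOW, HIGH = 100.0, 105.0
results = {w: LOW <= value <= HIGH for w, value in slices.items()}
print(slices)   # {0: 103.5, 1: 104.1}
print(results)  # {0: True, 1: True}
```

Because both the input and the output are plain JSON-compatible structures, the `results` dictionary could itself be exported and fed into another calculation mode, matching the cyclical design described above.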
An additional powerful feature (the only part that uses a database) is that we can take a JSON file and then write SQL against it. This is useful because we can join up entity instances and run analysis across them. For example, we could test how many cryptocurrencies increased by 10% in an hour, and then use SQL to report the top 10%. This data is also exported as JSON.
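The "SQL against a JSON file" idea can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden illustration, not the DCE's implementation: the field names and the percentage-change figures are invented for the example.

```python
import json
import sqlite3

# Hypothetical per-entity results exported by the engine as a JSON array.
rows = json.loads("""[
  {"pair": "BTC-USD",  "pct_change_1h": 12.5},
  {"pair": "ETH-USD",  "pct_change_1h": 8.1},
  {"pair": "DOGE-USD", "pct_change_1h": 15.0}
]""")

# Load the JSON array into an in-memory table, then query it with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (pair TEXT, pct_change_1h REAL)")
conn.executemany("INSERT INTO results VALUES (:pair, :pct_change_1h)", rows)

# Which pairs increased by more than 10% in an hour?
gainers = conn.execute(
    "SELECT pair, pct_change_1h FROM results "
    "WHERE pct_change_1h > 10 ORDER BY pct_change_1h DESC"
).fetchall()

# Export the query result back out as JSON, as the article describes.
export = json.dumps([{"pair": p, "pct_change_1h": c} for p, c in gainers])
print(export)
```

The round trip here (JSON in, SQL analysis, JSON out) mirrors the workflow described above without the data ever needing a permanent database home.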
The main benefit is that the only mode requiring programming skills is the transformation mode of the software.
The DCE was written to process cryptocurrency metrics and decision points. This data will flow into the cryptocurrency platform we will build shortly.
Interestingly, we could do certain data processing outside the application. This may allow us to use more powerful software for certain data calculations.
Initial questions on our Data Calculation Engine
We had some discussions with a few respected colleagues in the data architecture and business intelligence space. We will try to answer a couple of their questions; unfortunately, the answers will be quite lengthy, but this cannot be avoided.
"The application doesn't use a database, does this not mean there could be data quality issues, or data mapping challenges?"
This question reflects a great deal of experience. The reason for asking it, for those outside of the data space, is that traditional data architecture focuses on the consistency and quality of data within our systems. We bring this data into what are known as structured data repositories, because this allows us to understand it effectively and produce consistent results. The moment we start working with JSON files outside of a database, we forgo the many decades of expertise put into ensuring data is held optimally within databases for reuse.
At Info Rhino, we take a very different approach to building systems. We think of applications as performing small units of work that each focus on a specific task. If we find ourselves putting significant functionality into an application when it seems this would be better placed outside of it, we write a new application or service to do the job. We won't overburden an application with noise that may destroy the purity of what it is intended to do. In a simple example, we have three clear units of work:
1 - Data delivery. Data from an external source is delivered to a location that can be accessed by the Data Calculation Engine.
2 - DCE invocation. At a certain point, the DCE must start its work.
3 - DCE Data Processing. One of the DCE's modes is executed.
The data architect's question - what happens between steps one and two?
What if the data is of poor quality? Novice developers would build extra validation processing into the DCE, and may keep adding more validation as data requirements increase. Instead, it is the responsibility of the data delivery application to ensure the data is in a good state. Our preferred approach may be to build middleware that performs the validation independently. The objective is to keep the DCE lean, performing only the tasks we expect it to do.
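A minimal sketch of what such validation middleware might look like, sitting between data delivery and the DCE. The field names, types, and sample records are hypothetical; the point is only that bad rows are filtered and reported before the engine ever runs.

```python
import json

# Hypothetical schema: fields the delivered records must carry, with types.
REQUIRED_FIELDS = {"timestamp": (int, float), "price": (int, float)}

def validate_records(records):
    """Split records into valid rows and per-row error reports,
    so malformed data never reaches the DCE."""
    valid, errors = [], []
    for i, record in enumerate(records):
        problems = [
            field for field, types in REQUIRED_FIELDS.items()
            if not isinstance(record.get(field), types)
        ]
        if problems:
            errors.append((i, problems))
        else:
            valid.append(record)
    return valid, errors

raw = json.loads('[{"timestamp": 1, "price": 10.5}, {"timestamp": "bad"}]')
valid, errors = validate_records(raw)
print(len(valid), errors)  # 1 [(1, ['timestamp', 'price'])]
```

Keeping this logic in its own small unit of work means the schema can grow without the DCE itself accumulating validation code.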
"The DCE doesn't use a database?"
Database developers and data architects tend to think in terms of databases, which is perfectly normal. Databases are optimised for performance and for ensuring data quality. They provide a great language for analysing data, and their structure lends itself to advanced data analysis using tools such as OLAP and reporting software. Our current version does use a database for the transformation mode, but we are considering removing the database altogether. The reason is that databases can be an expensive cost, one that even larger enterprises struggle to bear. As data experts, we can find ways to use databases without spending lots of money on licensing, legally, but there is still a cost in terms of the time it takes to get data into databases.
The importance of structured data inside JSON files
We have to concede that data inside JSON files has some degree of consistency. For example, it will have been produced by an application with specific types for its data items, so we can have some confidence that we can work with the data and that it carries a degree of data integrity. Suddenly, we can store lots of data and files without having to worry about database management systems or NoSQL databases, although nothing stops us from using those systems. What is important is that most programming languages have undertaken significant work to make the portability of data between different systems easier through JSON. Having data available in a format that is relatively easy to consume, transform, and export between systems is useful.
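The point about types surviving inside JSON can be shown in a few lines. The record below is an invented example; the takeaway is that a JSON round trip preserves the basic types the producing application emitted, which is the baseline of consistency the paragraph above relies on.

```python
import json

# An invented record with mixed types, as an upstream app might emit it.
record = {"pair": "BTC-USD", "price": 101.5, "volume": 3, "active": True}

# Serialise to JSON text and parse it back, as two systems exchanging a file would.
restored = json.loads(json.dumps(record))

assert restored == record
print({k: type(v).__name__ for k, v in restored.items()})
# {'pair': 'str', 'price': 'float', 'volume': 'int', 'active': 'bool'}
```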
The practical use of the DCE for a data analytics website
We've talked a lot about our Web Data Platform (WDP) on our website. We can think of the DCE as middleware that investigates cryptocurrency price data, performs calculations and transformations on it, tests for acceptable criteria, and further transforms the results for use by our WDP. The end goal is to give users access to many reports on cryptocurrency price metrics from a fairly small amount of configuration. What is really powerful is that our transformation mode can link to database table data to enrich the metrics coming out of the DCE. Imagine if we could break down cryptocurrencies by dimensions and other metrics not provided by cryptocurrency price sources.
How does the DCE compare to applications such as R?
They don't really overlap. R is a fantastic solution for analysing data mathematically and statistically. The DCE takes data, analyses it, and makes it available for further processing or consumption by other applications.
Can you give us more practical examples of the Analyser Processor?
For now, we are working on other phases of our cryptocurrency platform. As we build up more use cases, we will document them and disseminate the information.
The best way to think of the DCE is as time series analysis: looking at different calculations within a time period before determining whether those values pass or fail range tests.
Thank you for reading this article
Please don't hesitate to contact us at Info Rhino if you would like to find out more.