RBB Speed and Optimization
This page documents performance-tuning techniques and scalability limitations of the Relational Blackboard (RBB) software.
A main objective of RBB is high performance (locating and retrieving data quickly), so that applications can rely heavily on the blackboard concept. This means using the blackboard itself as the primary data structure in an application, instead of retrieving long-lived local copies of data for processing (maintaining local copies is extra work for the implementor and leads to inconsistent state across the application).
For interactive Java applications, selective transparent mirroring of RBB data is provided by the EventCache and TimeseriesCache classes. They manage local copies of Events matching a specified tagset, updated in soft real-time. The cached information includes time coordinate transformation parameters, which drastically improves performance when time coordinates are used.
If an RBB Is Slow to Open
The underlying H2 database is slow to open if the database contains many tables. This typically happens when there are many Timeseries, since each one is stored as a table. One solution is to run a standalone server. To do this, run the H2 server with rbb.jar in its classpath using the command
java -jar rbb.jar server
Then in the application, use a server URL that sets DB_CLOSE_DELAY to a sufficiently large number of seconds, or to -1 to leave the database open indefinitely. For example:
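A minimal sketch of such a URL, assuming the server runs on localhost at H2's default TCP port and using a hypothetical database path (~/myrbb). The helper name serverUrl is illustrative and is not part of RBB; only the jdbc:h2:tcp:// prefix and the DB_CLOSE_DELAY setting come from H2 itself.

```java
class RbbServerUrl {
    /**
     * Builds an H2 TCP server URL with the DB_CLOSE_DELAY setting appended.
     * host and dbPath are placeholders; substitute the values for your server.
     * A closeDelaySeconds of -1 keeps the database open until the server exits.
     */
    static String serverUrl(String host, String dbPath, int closeDelaySeconds) {
        return "jdbc:h2:tcp://" + host + "/" + dbPath
                + ";DB_CLOSE_DELAY=" + closeDelaySeconds;
    }

    public static void main(String[] args) {
        // Hypothetical host and path, for illustration only.
        System.out.println(serverUrl("localhost", "~/myrbb", -1));
        // prints: jdbc:h2:tcp://localhost/~/myrbb;DB_CLOSE_DELAY=-1
    }
}
```

The resulting string is what the application would pass wherever it currently supplies the RBB's database URL.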
The standalone server will still be slow to open the database the first time, but after that clients can quickly connect to the server and access the RBB.
Creating a Single-Tag Primary Key
Many functions in RBB must find a set of events from a tagset. This takes longer if the database contains a large number of distinct tagsets, or if the ‘find’ tagset contains several tags (each tag corresponds to a SQL JOIN). If you frequently find the same tagsets, try combining all the values of that tagset into a single tag (or at least combining several tags that are often used together).
For example, H2SEvent.sequence(db, t, idTags, infoTags) calls 'find' on 'idTags' every time. Try replacing:
H2SEvent.sequence(db, 29, "firstName=Bob,lastName=Jones,DOB=19601004", "currentLocation=home");
with:
H2SEvent.sequence(db, 29, "id=Bob:Jones:19601004", "firstName=Bob,lastName=Jones,DOB=19601004,currentLocation=home");
The result is the same except each event has an extra ‘id’ tag.
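The combined tag can be derived mechanically from the identifying tags. The following is a minimal sketch, not part of RBB: the class and method names (SingleTagKey, idTag, tagset) are hypothetical, and the ':' separator simply follows the example above. It assumes no tag value contains ':' or ',', since otherwise two different tagsets could produce the same id.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class SingleTagKey {
    /** Joins the tag values, in insertion order, into a single "id=..." tag. */
    static String idTag(Map<String, String> idTags) {
        StringJoiner values = new StringJoiner(":", "id=", "");
        for (String value : idTags.values()) {
            values.add(value);
        }
        return values.toString();
    }

    /** Formats tags as a comma-separated name=value tagset string. */
    static String tagset(Map<String, String> tags) {
        StringJoiner pairs = new StringJoiner(",");
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            pairs.add(tag.getKey() + "=" + tag.getValue());
        }
        return pairs.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the tags in the order they were added.
        Map<String, String> id = new LinkedHashMap<>();
        id.put("firstName", "Bob");
        id.put("lastName", "Jones");
        id.put("DOB", "19601004");

        // The original identifying tags move into the info tagset.
        Map<String, String> info = new LinkedHashMap<>(id);
        info.put("currentLocation", "home");

        System.out.println(idTag(id));    // id=Bob:Jones:19601004
        System.out.println(tagset(info)); // firstName=Bob,lastName=Jones,DOB=19601004,currentLocation=home
    }
}
```

The two printed strings are the third and fourth arguments of the second H2SEvent.sequence call shown above.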
Timeseries (Table) Overhead
Each timeseries in RBB is a table. The H2 database (as of Oct 2010) uses substantial resources for each table in the database whenever the database is opened. As a result the practical limit on the number of timeseries in an RBB is on the order of 100,000. The remainder of this section is supporting data from a benchmark.
The following table shows runtimes and memory usage for RBBs with varying numbers of timeseries. The computer is a Core 2 Duo T9600 laptop @ 2.80GHz with 4GB RAM on AC power (not battery) with swap disabled. The columns are:
- NumTimeseries: a new RBB was created and populated with the specified number of very short timeseries (two one-dimensional observations each).
- Heapsize: the minimum value of -Xmx that allowed the RBB to be populated and then opened without raising an OutOfMemoryError. Tested values for -Xmx were: 100M 500M 1000M 2000M.
- PutTime: CPU ("user") seconds to populate the RBB using RBB.put.
- OpenTime: CPU ("user") seconds to open and close the database with the H2 shell.
- PutRate / OpenRate: NumTimeseries divided by PutTime and OpenTime, respectively.
- DBSize: size of the database on disk, in megabytes.
- SeqSize: size of the same set of timeseries in textual (.seq) format, in megabytes.
|NumTimeseries||Heapsize (MB)||PutTime (s)||OpenTime (s)||PutRate||OpenRate||DBSize (MB)||SeqSize (MB)|
* For 100,000 and 150,000 timeseries the "real" runtime was about twice the "user" (CPU) time reported, perhaps indicating insufficient system memory.
** It is strange that a 150,000-timeseries database could be created with a 2000MB heap but not subsequently opened; however, that is what happened.