RBB Speed and Optimization

This page documents performance-tuning techniques and scalability limitations of the Relational Blackboard (RBB) software.

A main objective of RBB is high performance (locating and retrieving data quickly), so that applications can rely heavily on the blackboard concept. This means using the blackboard itself as the primary data structure in an application instead of retrieving long-lived local copies of data for processing, since maintaining local copies is extra work for the implementor and leads to inconsistent state across the application.

Distributed Caching

For interactive Java applications, selective transparent mirroring of RBB data is provided by the EventCache and TimeseriesCache classes. They manage local copies of Events matching a specified tagset, updated in soft real-time. The cached information includes time coordinate transformation parameters, which greatly speeds up operations that use time coordinates.

If an RBB Is Slow to Open

The underlying H2 database is slow to open if the database contains many tables. This typically happens when there are many Timeseries, because each one is stored as a table. One solution is to run a standalone server. To do this, run the H2 server with rbb.jar in its classpath using the command
java -jar rbb.jar server
Then, in the application, use a server URL that sets DB_CLOSE_DELAY to a sufficiently large number of seconds, or to -1 to leave the database open indefinitely. For example:
jdbc:h2:tcp:localhost/mydb;DB_CLOSE_DELAY=-1
The standalone server will still be slow to open the database the first time, but after that clients can quickly connect to the server and access the RBB.
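
As a minimal sketch of the client side, the following Java builds a server URL in the same form as the example above. The class name RbbServerUrl and the method serverUrl are hypothetical helpers invented for illustration; they are not part of RBB or H2. Only the URL shape and the DB_CLOSE_DELAY setting come from this page.

```java
// Sketch only: RbbServerUrl and serverUrl are hypothetical names,
// not part of RBB or H2.
final class RbbServerUrl {

    /**
     * Builds a URL in the form used on this page.
     * closeDelaySeconds is the number of seconds the database stays open
     * after the last connection closes; -1 leaves it open indefinitely.
     */
    static String serverUrl(String host, String dbName, int closeDelaySeconds) {
        return "jdbc:h2:tcp:" + host + "/" + dbName
                + ";DB_CLOSE_DELAY=" + closeDelaySeconds;
    }

    public static void main(String[] args) {
        String url = serverUrl("localhost", "mydb", -1);
        System.out.println(url);
        // With the H2 JDBC driver on the classpath and the standalone
        // server running, a client could then connect with:
        //   java.sql.Connection conn = java.sql.DriverManager.getConnection(url);
    }
}
```

Keeping the delay in one helper makes it easy to use the same setting for every client of the server.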

Creating a Single-Tag Primary Key

Many functions in RBB must find a set of events from a tagset. This takes longer if the database contains a large number of distinct tagsets, or if the ‘find’ tagset contains several tags (each tag corresponds to a SQL JOIN). If you frequently find the same tagsets, try combining all the values of that tagset into a single tag (or at least combining several tags that are often used together).

For example, H2SEvent.sequence(db, t, idTags, infoTags) calls 'find' on 'idTags' every time. Try replacing:

H2SEvent.sequence(db, 29, "firstName=Bob,lastName=Jones,DOB=19601004", "currentLocation=home");
with:
H2SEvent.sequence(db, 29, "id=Bob:Jones:19601004", "firstName=Bob,lastName=Jones,DOB=19601004,currentLocation=home");
The result is the same except each event has an extra ‘id’ tag.
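
The combined tag can be generated rather than typed by hand. The sketch below is plain string handling and does not call any RBB API. The class CompositeIdTag and its methods are hypothetical names, and the ':' separator simply follows the example above; it assumes no tag value itself contains ':'.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: CompositeIdTag is a hypothetical helper, not part of RBB.
final class CompositeIdTag {

    /** Joins the tag values, in insertion order, into a single "id" tag. */
    static String idTag(Map<String, String> tags) {
        StringJoiner values = new StringJoiner(":");
        for (String value : tags.values()) {
            values.add(value);
        }
        return "id=" + values;
    }

    /** Renders a map as a comma-separated name=value tagset string. */
    static String tagset(Map<String, String> tags) {
        StringJoiner pairs = new StringJoiner(",");
        for (Map.Entry<String, String> e : tags.entrySet()) {
            pairs.add(e.getKey() + "=" + e.getValue());
        }
        return pairs.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the tags in the order they were added.
        Map<String, String> identity = new LinkedHashMap<>();
        identity.put("firstName", "Bob");
        identity.put("lastName", "Jones");
        identity.put("DOB", "19601004");

        // The original identifying tags move into the info tagset.
        Map<String, String> info = new LinkedHashMap<>(identity);
        info.put("currentLocation", "home");

        System.out.println(idTag(identity)); // id=Bob:Jones:19601004
        System.out.println(tagset(info));    // firstName=Bob,lastName=Jones,DOB=19601004,currentLocation=home
    }
}
```

The two printed strings correspond to the idTags and infoTags arguments in the replacement call above.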

Timeseries (Table) Overhead

Each timeseries in RBB is a table. The H2 database (as of Oct 2010) uses substantial resources for each table in the database whenever the database is opened. As a result, the practical limit on the number of timeseries in an RBB is on the order of 100,000. The remainder of this section is supporting data from a benchmark.

The following table shows runtimes and memory usage for RBBs with varying numbers of timeseries. The columns are:

NumTimeseries
a new RBB was created and populated with the specified number of very short timeseries (two, one-dimensional observations).
Heapsize
the minimum value of -Xmx used to populate and then open the RBB without raising an OutOfMemory exception. Tested values for -Xmx were: 100M 500M 1000M 2000M.
PutTime
CPU ("user") seconds to populate the RBB using RBB.put.
OpenTime
CPU ("user") seconds to open and close the database with the H2 shell.
PutRate / OpenRate
NumTimeseries divided by PutTime and OpenTime, respectively.
DBSize
size of the database on disk, in megabytes.
SeqSize
size of the same set of timeseries in textual (.seq) format, in megabytes.

The computer is a Core 2 Duo T9600 laptop @ 2.80GHz with 4GB RAM on AC power (not battery) with swap disabled.

NumTimeseries  Heapsize (MB)  PutTime (s)  OpenTime (s)  PutRate  OpenRate  DBSize (MB)  SeqSize (MB)
10             100            0.5          0.5           20       20        0.13         0.01
100            100            1.6          1.3           62.5     76.92     0.78         0.01
1,000          100            4.4          2.1           227.27   476.19    7.4          0.03
10,000         500            15           7.7           666.67   1,298.70  73.9         0.37
50,000         1,000          78.45        35.04         637.35   1,426.90  378.43       1.98
100,000        2,000          221.52*      78.86         451.43   1,268.10  759.22       3.98
150,000        2,000          740.57*      - **          202.7    -         1116         6.13
200,000        0              -            -             -        -         -            8.29

* For 100,000 and 150,000 timeseries the "real" runtime was about twice the "user" (cpu) time reported, perhaps indicating insufficient system memory.

** It is strange that a 150,000-timeseries database could be created with a 2000MB heap but not subsequently opened; however, that is what happened.
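
As a check on how the rate columns are derived, the sketch below recomputes PutRate and OpenRate for the 1,000-timeseries row from the definitions above (NumTimeseries divided by PutTime and OpenTime). The class BenchmarkRates and its methods are hypothetical names used only for this illustration.

```java
// Sketch only: BenchmarkRates is a hypothetical helper for checking the table.
final class BenchmarkRates {

    /** Timeseries processed per CPU ("user") second. */
    static double rate(int numTimeseries, double cpuSeconds) {
        return numTimeseries / cpuSeconds;
    }

    /** Rounds to two decimal places, matching the precision of the table. */
    static double round2(double x) {
        return Math.round(x * 100) / 100.0;
    }

    public static void main(String[] args) {
        // Row for 1,000 timeseries: PutTime 4.4 s, OpenTime 2.1 s.
        System.out.println(round2(rate(1000, 4.4))); // 227.27
        System.out.println(round2(rate(1000, 2.1))); // 476.19
    }
}
```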