(adapted from an article on my UNIFACE page)
Where there is data associated with a particular object that may change over a period of time there may be a requirement to keep a history of those changes so that you can tell what values were in effect for a particular date. Not only is this useful for keeping a record of changes that have been made in the past, it may also be useful for entering changes that will not come into effect until a date in the future. Typical examples of this requirement are:
In my long career I have seen several different ways of satisfying this requirement, some methods being better than others, so I want to share with you what I consider to be the most effective and efficient design.
First we must state the rules that must be satisfied in the design:
As we may be holding multiple history records for an object the database design should be obvious - a one-to-many relationship between the object and its history, as shown in figure 1 below:
Figure 1 - E-R diagram of OBJECT and OBJECT_HISTORY
|
|
|
The only questionable area now is the layout of the OBJECT_HISTORY table. Below is one design that I came across quite recently:
Column Name | Description |
---|---|
ID | Technical primary key |
OBJECT_ID | Foreign key to OBJECT table |
VALUE | Object value |
START_DATE | Starting date for this value |
I do not like this design as the use of an unnecessary technical primary key requires the maintenance of a counter and the creation of a second index for the foreign key. For further insight into my opinion on the indiscriminate versus intelligent usage of technical keys please refer to Technical Keys - Their Uses and Abuses.
A second design I came across several years ago was similar to the following:
Column Name | Description |
---|---|
OBJECT_ID | Primary key, and Foreign key to OBJECT table |
DATE | Primary key, Starting date for this value |
VALUE | Object value |
I do not like this design as it has the start date built into the primary key, which means that it cannot be changed. I remember the panic this caused when some butter-fingered user accidentally entered the wrong date and wanted to change it in a hurry.
I dislike both of these designs as they do not hold the end date for each entry, therefore they both require to access more than 1 occurrence in order to find the single occurrence that matches the target date. The implementation I saw for Design 1 required separate stored procedures to accomplish the following steps:
If your DBMS can handle subselects it is actually possible to complete these two actions in a single query similar to the following:
SELECT value FROM 'object_history' WHERE object_id = '$object_id' AND date = (SELECT MAX(date) FROM 'object_history' WHERE object_id = '$object_id' AND date <= $today))
The following design is the one that I prefer to use as it makes the retrieval of data extremely fast and efficient:
Design 3 - my favourite
Column Name | Description |
---|---|
OBJECT_ID | Primary key, Foreign key to OBJECT table |
SEQ_NO | Primary key, starts at 1 for each object |
VALUE | Object value |
START_DATE | Starting date for this value |
END_DATE | Ending date for this value |
This design has the following advantages over the others:
SELECT ... WHERE (start_date <= 'target_date' & end_date >= 'target_date')
Note that if an entry does not yet have a value for END_DATE I do not leave it as null. I always use a dummy date such as '9999-12-31' to simulate 'sometime in the future', as explained in Dealing with null End Dates.
The maintenance of these history occurrences is not a problem provided that you keep to the following rules:
SELECT
statement as follows:
$query = "SELECT max(seq_no) FROM 'object_history' WHERE (object_id = $object_id)"; $result = mysql_query($query); // issue query $data = mysql_fetch_row($result); // fetch first row $count = $data[0]; // extract count object_history.seq_no = $count + 1 // increment countor by using a counter on the parent record as follows:
object.last_seq_no = object.last_seq_no + 1 object_history.seq_no = object.last_seq_no
Note that the SELECT
statement above is very efficient as it references a field which is indexed. This requires only a single database access.
Note also that the use of a sequential number in the compound primary key makes the identification and retrieval of the previous and next occurrences very simple and very efficient:
This simplicity and efficiency is lost if you employ one of the solutions shown in Design 1 and Design 2.
Although this is a common and relatively simple requirement which can appear to be satisfied in several different ways there may be hidden drawbacks in a particular design that do not make themselves apparent until after it has been implemented. I have personally witnessed the weaknesses of some designs and have therefore created my own solution which does not contain any of those weaknesses. I hope that you can benefit from my experience and thus avoid your own painful learning curve.
23 August 2005 | Added sample code to read a single record via a subselect. |