Keeping a history of changes by date

Posted on 1st January 2004 by Tony Marston

Amended on 23rd August 2005

(adapted from an article on my UNIFACE page)

Introduction
The Business Rules
The Database Design
Conclusion
Amendment History
Comments

Introduction

Where the data associated with a particular object may change over time, there may be a requirement to keep a history of those changes so that you can tell what values were in effect on a particular date. Not only is this useful for keeping a record of changes that have been made in the past, it may also be useful for entering changes that will not come into effect until a date in the future. Typical examples of this requirement are:

Keeping track of changes of address in a Personnel system
Keeping track of price changes in a Parts system

In my long career I have seen several different ways of satisfying this requirement, some methods being better than others, so I want to share with you what I consider to be the most effective and efficient design.

The Business Rules

First we must state the rules that must be satisfied in the design:

The system must hold a value (or set of values) that applies to an object over a range of dates.
There must not be any overlapping dates (i.e. there must not be a date on which more than one value applies).
There must not be any missing dates (i.e. there must not be a date on which no value applies).
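
As an illustration of these rules, here is a minimal sketch in Python of a checker that reports overlaps and gaps in a set of date ranges. The function name `check_history` and the representation of each range as a `(start, end)` pair of inclusive dates are assumptions made for this example only.

```python
from datetime import date

def check_history(ranges):
    """Return a list of rule violations for (start, end) pairs, both ends inclusive."""
    problems = []
    ordered = sorted(ranges)  # order by start date, then end date
    for start, end in ordered:
        if end < start:
            problems.append(f"range {start}..{end} ends before it starts")
    # Compare each range with the one that follows it.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        days_apart = (next_start - prev_end).days
        if days_apart < 1:
            problems.append(f"overlap: {next_start} is not after {prev_end}")
        elif days_apart > 1:
            problems.append(f"gap: no value between {prev_end} and {next_start}")
    return problems

history = [
    (date(2004, 1, 1), date(2004, 6, 30)),
    (date(2004, 7, 1), date(9999, 12, 31)),
]
print(check_history(history))  # prints [] because the ranges are contiguous
```

A history satisfies the rules when each range starts exactly one day after the previous range ends.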

The Database Design

As we may be holding multiple history records for an object the database design should be obvious - a one-to-many relationship between the object and its history, as shown in figure 1 below:

Figure 1 - E-R diagram of OBJECT and OBJECT_HISTORY (one OBJECT has many OBJECT_HISTORY occurrences)

The only questionable area now is the layout of the OBJECT_HISTORY table. Below is one design that I came across quite recently:

Design 1 - not good

Column Name	Description
ID	Technical primary key
OBJECT_ID	Foreign key to OBJECT table
VALUE	Object value
START_DATE	Starting date for this value

I do not like this design as the use of an unnecessary technical primary key requires the maintenance of a counter and the creation of a second index for the foreign key. For further insight into my opinion on the indiscriminate versus intelligent usage of technical keys please refer to Technical Keys - Their Uses and Abuses.

A second design I came across several years ago was similar to the following:

Design 2 - not good

Column Name	Description
OBJECT_ID	Primary key, and Foreign key to OBJECT table
DATE	Primary key, Starting date for this value
VALUE	Object value

I do not like this design as it has the start date built into the primary key, which means that it cannot be changed. I remember the panic this caused when some butter-fingered user accidentally entered the wrong date and wanted to change it in a hurry.

I dislike both of these designs as they do not hold the end date for each entry, so both must access more than one occurrence in order to find the single occurrence that matches the target date. The implementation I saw for Design 1 required separate stored procedures to accomplish the following steps:

Extract all occurrences except those where START_DATE > target date.
Of the remainder extract the one with the highest value of START_DATE.

If your DBMS can handle subselects it is actually possible to complete these two actions in a single query similar to the following:

SELECT value FROM object_history
 WHERE object_id = '$object_id'
   AND date = (SELECT MAX(date) FROM object_history
                WHERE object_id = '$object_id'
                  AND date <= '$today')
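
To show the subselect at work, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table follows Design 2; the helper name `value_on`, the sample rows and the use of ISO date strings are assumptions for this example, and bound parameters replace the interpolated variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Design 2: the starting date is part of the primary key.
conn.execute("""
    CREATE TABLE object_history (
        object_id INTEGER NOT NULL,
        date      TEXT    NOT NULL,   -- ISO dates compare correctly as text
        value     TEXT,
        PRIMARY KEY (object_id, date)
    )""")
conn.executemany("INSERT INTO object_history VALUES (?, ?, ?)", [
    (1, "2004-01-01", "old address"),
    (1, "2004-07-01", "new address"),
])

def value_on(conn, object_id, target_date):
    """Return the value in effect on target_date, or None if none applies."""
    row = conn.execute(
        """SELECT value FROM object_history
            WHERE object_id = ?
              AND date = (SELECT MAX(date) FROM object_history
                           WHERE object_id = ?
                             AND date <= ?)""",
        (object_id, object_id, target_date),
    ).fetchone()
    return row[0] if row else None

print(value_on(conn, 1, "2004-03-15"))  # old address
```

The inner query still has to find the latest starting date before the outer query can fetch the row.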

The following design is the one that I prefer to use as it makes the retrieval of data extremely fast and efficient:

Design 3 - my favourite

Column Name	Description
OBJECT_ID	Primary key, Foreign key to OBJECT table
SEQ_NO	Primary key, starts at 1 for each object
VALUE	Object value
START_DATE	Starting date for this value
END_DATE	Ending date for this value

This design has the following advantages over the others:

A separate index is not required for the foreign key as the primary key caters for both.
The start date and end date for an occurrence can be changed quite easily.
The single occurrence which matches any target date can be retrieved very efficiently in a single step by using code similar to the following:
```
SELECT ... WHERE (start_date <= 'target_date' AND end_date >= 'target_date')
```

Note that if an entry does not yet have a value for END_DATE I do not leave it as null. I always use a dummy date such as '9999-12-31' to simulate 'sometime in the future', as explained in Dealing with null End Dates.
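
The single-step retrieval, together with the dummy end date, can be sketched as follows using an in-memory SQLite database from Python. The helper name `value_on`, the constant `OPEN_END` and the sample prices are assumptions for this example.

```python
import sqlite3

OPEN_END = "9999-12-31"  # dummy date standing in for "sometime in the future"

conn = sqlite3.connect(":memory:")
# Design 3: compound primary key of OBJECT_ID and SEQ_NO, with both dates held.
conn.execute("""
    CREATE TABLE object_history (
        object_id  INTEGER NOT NULL,
        seq_no     INTEGER NOT NULL,
        value      TEXT,
        start_date TEXT NOT NULL,   -- ISO dates compare correctly as text
        end_date   TEXT NOT NULL,   -- never null: open entries use OPEN_END
        PRIMARY KEY (object_id, seq_no)
    )""")
conn.executemany("INSERT INTO object_history VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "10.00", "2004-01-01", "2004-06-30"),
    (1, 2, "12.50", "2004-07-01", OPEN_END),
])

def value_on(conn, object_id, target_date):
    """Return the value whose date range contains target_date, or None."""
    row = conn.execute(
        """SELECT value FROM object_history
            WHERE object_id = ?
              AND start_date <= ? AND end_date >= ?""",
        (object_id, target_date, target_date),
    ).fetchone()
    return row[0] if row else None

print(value_on(conn, 1, "2004-06-30"))  # 10.00
print(value_on(conn, 1, "2030-01-01"))  # 12.50
```

Because the open entry carries a real date, the same comparison works for past, present and future target dates without any special handling of nulls.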

The maintenance of these history occurrences is not a problem provided that you keep to the following rules:

When creating a new occurrence the value for SEQ_NO is incremented by 1. Remember that each OBJECT_ID has its own sequence. This value can be obtained either by the use of a SELECT statement as follows:

$query = "SELECT max(seq_no) FROM object_history
          WHERE (object_id = $object_id)";
$result = mysqli_query($link, $query);  // issue query ($link is assumed to be an open mysqli connection)
$data = mysqli_fetch_row($result);      // fetch first row
$max_seq_no = (int) $data[0];           // highest SEQ_NO so far (0 if there is no history yet)
$seq_no = $max_seq_no + 1;              // value to use for object_history.seq_no

or by using a counter on the parent record as follows:

object.last_seq_no = object.last_seq_no + 1
object_history.seq_no = object.last_seq_no

When entering or changing the START_DATE for an occurrence the END_DATE of the previous occurrence must be set to 1 day earlier, but no earlier than its own START_DATE.
When changing the END_DATE for an occurrence the START_DATE of the next occurrence must be set to 1 day later, but no later than its own END_DATE.
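
The first two rules can be sketched together as follows, again using an in-memory SQLite database from Python. The function name `add_occurrence` and the decision to raise an error, rather than adjust the dates, when the new START_DATE would push the previous END_DATE before its own START_DATE are assumptions for this example.

```python
import sqlite3
from datetime import date, timedelta

OPEN_END = "9999-12-31"  # dummy end date for the current occurrence

SCHEMA = """
    CREATE TABLE object_history (
        object_id  INTEGER NOT NULL,
        seq_no     INTEGER NOT NULL,
        value      TEXT,
        start_date TEXT NOT NULL,
        end_date   TEXT NOT NULL,
        PRIMARY KEY (object_id, seq_no)
    )"""

def add_occurrence(conn, object_id, value, start_date):
    """Append an occurrence starting on start_date and close the previous one."""
    with conn:  # commit on success, roll back on error
        (max_seq,) = conn.execute(
            "SELECT MAX(seq_no) FROM object_history WHERE object_id = ?",
            (object_id,),
        ).fetchone()
        seq_no = (max_seq or 0) + 1  # each OBJECT_ID has its own sequence
        if max_seq is not None:
            (prev_start,) = conn.execute(
                "SELECT start_date FROM object_history"
                " WHERE object_id = ? AND seq_no = ?",
                (object_id, max_seq),
            ).fetchone()
            if start_date <= date.fromisoformat(prev_start):
                raise ValueError("new START_DATE must be after the previous one")
            # The previous occurrence ends 1 day before the new one starts.
            prev_end = start_date - timedelta(days=1)
            conn.execute(
                "UPDATE object_history SET end_date = ?"
                " WHERE object_id = ? AND seq_no = ?",
                (prev_end.isoformat(), object_id, max_seq),
            )
        conn.execute(
            "INSERT INTO object_history VALUES (?, ?, ?, ?, ?)",
            (object_id, seq_no, value, start_date.isoformat(), OPEN_END),
        )
    return seq_no

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
add_occurrence(conn, 1, "10.00", date(2004, 1, 1))
add_occurrence(conn, 1, "12.50", date(2004, 7, 1))
```

After the second call the first occurrence ends on 2004-06-30 and the second runs from 2004-07-01 to the dummy end date, so there is neither a gap nor an overlap.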

Note that the SELECT max(seq_no) statement above is very efficient as SEQ_NO is part of the primary key, which is indexed, so it requires only a single database access.

Note also that the use of a sequential number in the compound primary key makes the identification and retrieval of the previous and next occurrences very simple and very efficient:

The primary key of the previous occurrence has a value for SEQ_NO which is always 1 less than the value on the current occurrence.
The primary key of the next occurrence has a value for SEQ_NO which is always 1 greater than the value on the current occurrence.
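
A minimal sketch of this lookup, using an in-memory SQLite database from Python, is shown below. The function name `neighbours` and the sample rows are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE object_history (
        object_id  INTEGER NOT NULL,
        seq_no     INTEGER NOT NULL,
        value      TEXT,
        start_date TEXT NOT NULL,
        end_date   TEXT NOT NULL,
        PRIMARY KEY (object_id, seq_no)
    )""")
conn.executemany("INSERT INTO object_history VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "10.00", "2004-01-01", "2004-06-30"),
    (1, 2, "12.50", "2004-07-01", "2004-12-31"),
    (1, 3, "15.00", "2005-01-01", "9999-12-31"),
])

def neighbours(conn, object_id, seq_no):
    """Return the (previous, next) occurrences as (seq_no, value) or None."""
    def fetch(n):
        # A direct read on the full primary key, with no searching by date.
        return conn.execute(
            "SELECT seq_no, value FROM object_history"
            " WHERE object_id = ? AND seq_no = ?",
            (object_id, n),
        ).fetchone()
    return fetch(seq_no - 1), fetch(seq_no + 1)

print(neighbours(conn, 1, 2))  # ((1, '10.00'), (3, '15.00'))
```

The first and last occurrences simply return None for the neighbour that does not exist.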

This simplicity and efficiency are lost if you employ one of the solutions shown in Design 1 and Design 2.

Conclusion

Although this is a common and relatively simple requirement which can appear to be satisfied in several different ways there may be hidden drawbacks in a particular design that do not make themselves apparent until after it has been implemented. I have personally witnessed the weaknesses of some designs and have therefore created my own solution which does not contain any of those weaknesses. I hope that you can benefit from my experience and thus avoid your own painful learning curve.

Amendment History

23 August 2005

Added sample code to read a single record via a subselect.

