Tony Marston's Blog About software development, PHP and OOP

Keeping a history of changes by date

Posted on 1st January 2004 by Tony Marston

Amended on Amended 23rd August 2005

(adapted from an article on my UNIFACE page)

Introduction
The Business Rules
The Database Design
Conclusion
Amendment History
Comments

Introduction

Where there is data associated with a particular object that may change over a period of time there may be a requirement to keep a history of those changes so that you can tell what values were in effect for a particular date. Not only is this useful for keeping a record of changes that have been made in the past, it may also be useful for entering changes that will not come into effect until a date in the future. Typical examples of this requirement are:

In my long career I have seen several different ways of satisfying this requirement, some methods being better than others, so I want to share with you what I consider to be the most effective and efficient design.

The Business Rules

First we must state the rules that must be satisfied in the design:

The Database design

As we may be holding multiple history records for an object the database design should be obvious - a one-to-many relationship between the object and its history, as shown in figure 1 below:

Figure 1 - E-R diagram of OBJECT and OBJECT_HISTORY

OBJECT
arrow-right (1K)
OBJECT
HISTORY

The only questionable area now is the layout of the OBJECT_HISTORY table. Below is one design that I came across quite recently:

Design 1 - not good

Column Name Description
ID Technical primary key
OBJECT_ID Foreign key to OBJECT table
VALUE Object value
START_DATE Starting date for this value

I do not like this design as the use of an unnecessary technical primary key requires the maintenance of a counter and the creation of a second index for the foreign key. For further insight into my opinion on the indiscriminate versus intelligent usage of technical keys please refer to Technical Keys - Their Uses and Abuses.

A second design I came across several years ago was similar to the following:

Design 2 - not good

Column Name Description
OBJECT_ID Primary key, and Foreign key to OBJECT table
DATE Primary key, Starting date for this value
VALUE Object value

I do not like this design as it has the start date built into the primary key, which means that it cannot be changed. I remember the panic this caused when some butter-fingered user accidentally entered the wrong date and wanted to change it in a hurry.

I dislike both of these designs as they do not hold the end date for each entry, therefore they both require to access more than 1 occurrence in order to find the single occurrence that matches the target date. The implementation I saw for Design 1 required separate stored procedures to accomplish the following steps:

If your DBMS can handle subselects it is actually possible to complete these two actions in a single query similar to the following:

SELECT value FROM 'object_history' 
 WHERE object_id = '$object_id'  
   AND date = (SELECT MAX(date) FROM 'object_history' 
                WHERE object_id = '$object_id' 
	                AND date <= $today))

The following design is the one that I prefer to use as it makes the retrieval of data extremely fast and efficient:

Design 3 - my favourite

Column Name Description
OBJECT_ID Primary key, Foreign key to OBJECT table
SEQ_NO Primary key, starts at 1 for each object
VALUE Object value
START_DATE Starting date for this value
END_DATE Ending date for this value

This design has the following advantages over the others:

Note that if an entry does not yet have a value for END_DATE I do not leave it as null. I always use a dummy date such as '9999-12-31' to simulate 'sometime in the future', as explained in Dealing with null End Dates.

The maintenance of these history occurrences is not a problem provided that you keep to the following rules:

Note that the SELECT statement above is very efficient as it references a field which is indexed. This requires only a single database access.

Note also that the use of a sequential number in the compound primary key makes the identification and retrieval of the previous and next occurrences very simple and very efficient:

This simplicity and efficiency is lost if you employ one of the solutions shown in Design 1 and Design 2.

Conclusion

Although this is a common and relatively simple requirement which can appear to be satisfied in several different ways there may be hidden drawbacks in a particular design that do not make themselves apparent until after it has been implemented. I have personally witnessed the weaknesses of some designs and have therefore created my own solution which does not contain any of those weaknesses. I hope that you can benefit from my experience and thus avoid your own painful learning curve.


Amendment history:

23 August 2005 Added sample code to read a single record via a subselect.

counter