I recently came across a blog post from Shawn McCool called Active Record: How We Got Persistence Perfectly Wrong which complains about certain problems appearing when using the Active Record pattern, and he wants people to stop using it in favour of a "proper" object oriented approach (whatever that means). I have read this document and concluded that there is nothing inherently wrong with this pattern and that all the problems are actually caused by a faulty implementation. I have been using my own version of this pattern for over 20 years and I have never encountered any of his problems, so I will attempt to explain why I believe my different approach is better.
I found the following description at wikipedia.org:
In software engineering, the active record pattern is an architectural pattern. It is found in software that stores in-memory object data in relational databases. It was named by Martin Fowler in his 2002 book Patterns of Enterprise Application Architecture. The interface of an object conforming to this pattern would include functions such as Insert, Update, and Delete, plus properties that correspond more or less directly to the columns in the underlying database table. The active record pattern is an approach to accessing data in a database. A database table or view is wrapped into a class. Thus, an object instance is tied to a single row in the table. After creation of an object, a new row is added to the table upon save. Any object loaded gets its information from the database. When an object is updated, the corresponding row in the table is also updated. The wrapper class implements accessor methods or properties for each column in the table or view.
This pattern is commonly used by object persistence tools and in object-relational mapping (ORM). Typically, foreign key relationships will be exposed as an object instance of the appropriate type via a property.
The Active Record pattern is described by Martin Fowler, the author of Patterns of Enterprise Application Architecture (PoEAA), as follows:
An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.
An object carries both data and behavior. Much of this data is persistent and needs to be stored in a database. Active Record uses the most obvious approach, putting data access logic in the domain object. This way all people know how to read and write their data to and from the database.
The book adds to this description as follows:
The essence of an Active Record is a Domain Model (116) in which the classes match very closely the record structure of an underlying database. Each Active Record is responsible for saving and loading to the database and also for any domain logic that acts on the data. This may be all the domain logic in the application, or you may find that some domain logic is held in Transaction Scripts (110) with common and data-oriented code in the Active Record.
That means having a separate class for each table in the database that handles the data and the operations which act on that data. In RADICORE this includes methods for each of the Create, Read, Update and Delete (CRUD) operations as well as all the business rules. I never put any domain logic into Transaction Scripts.
The data structure of the Active Record should exactly match that of the database: one field in the class for each column in the table. Type the fields the way the SQL interface gives you the data - don't do any conversion at this stage.
Each table (Model) class in RADICORE has the structure of the underlying table loaded into its common table properties which are populated with data which is extracted from the database, which means that the class is always in step with its table. I do not have a separate property for each field as this would require a separate getter and setter for each and would contribute to tight coupling which is a bad thing. Instead I pass all data around in a single $fieldarray argument as this contributes to loose coupling which is a good thing. This means that all operations on all tables can be covered by the same set of common table methods which are inherited from the abstract table class. This approach is the reason why I have so much polymorphism in my applications which I can utilise using dependency injection.
You may consider Foreign Key mapping (236), but you may also leave the foreign keys as they are. You can use views or tables with Active Record, although updates through views are obviously harder. Views are particularly useful for reporting purposes.
While in OO parlance there are things known as associations, if there is a parent-child relationship between two tables in the database then the child table contains a column (or columns) known as the foreign key which maps to the primary key of a record in the parent table. There is nothing in the column definition which identifies it as a foreign key, it is a column with a name and a value just like every other column. In RADICORE this information is held in either the $parent_relations or $child_relations arrays. I do not use any views as I am able to construct whatever query is most appropriate, including sub-queries, JOINS and Common Table Expressions within the _cm_pre_getData() method of the table class.
The Active Record class typically has methods that do the following:
- Construct an instance of the Active Record from an SQL result set row.
- Construct a new instance for later insertion into the table.
- Static finder methods to wrap commonly used SQL queries and return Active Record objects.
- Update the database and insert into it the data in the Active Record.
- Get and Set the fields.
- Implement some pieces of business logic.
The getting and setting methods can do some other intelligent things, such as convert from SQL-oriented types to better in-memory types. Also, if you ask for a related table, the getting method can return the appropriate Active Record, even if you aren't using Identity Field (216) on the data structure (by doing a lookup).
Every piece of data that comes into a PHP script, either from an HTML form or an SQL query, is presented as a string, and as PHP's default behaviour is to coerce strings into other types as and when necessary I do not need to do any type conversions. All I have to do on input or update operations is check that the contents of each string can successfully be coerced into the type which is expected by the database, and this validation is performed automatically by the framework using my standard validation object. If I wish to retrieve data from a related table I can either build a JOIN into the SELECT query, or I can put code into the _cm_getForeignData() method.
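The kind of check described above can be sketched as follows. This is only a minimal illustration, not the framework's actual validation object: the $fieldspec layout, the validateFields() function and the column names are all assumptions made for the example.

```php
<?php
// Hypothetical column specifications. A real framework would derive these
// from the database schema, and its format may well differ.
$fieldspec = [
    'user_id'    => ['type' => 'integer', 'required' => true],
    'start_date' => ['type' => 'date'],
    'salary'     => ['type' => 'numeric'],
];

/**
 * Check that each string in $fieldarray can be coerced into the column type.
 * Returns an array of error messages keyed by field name (empty if valid).
 */
function validateFields(array $fieldarray, array $fieldspec): array
{
    $errors = [];
    foreach ($fieldspec as $name => $spec) {
        $value = trim((string) ($fieldarray[$name] ?? ''));
        if ($value === '') {
            if (!empty($spec['required'])) {
                $errors[$name] = 'A value is required';
            }
            continue;
        }
        switch ($spec['type']) {
            case 'integer':
                if (filter_var($value, FILTER_VALIDATE_INT) === false) {
                    $errors[$name] = 'Not a valid integer';
                }
                break;
            case 'numeric':
                if (!is_numeric($value)) {
                    $errors[$name] = 'Not a valid number';
                }
                break;
            case 'date':
                // Round-trip the value so that overflowing dates are rejected.
                $date = DateTime::createFromFormat('Y-m-d', $value);
                if ($date === false || $date->format('Y-m-d') !== $value) {
                    $errors[$name] = 'Not a valid date (YYYY-MM-DD)';
                }
                break;
        }
    }
    return $errors;
}

// Every value arrives as a string, exactly as it would from $_POST.
$errors = validateFields(
    ['user_id' => '42', 'start_date' => '2024-02-30', 'salary' => 'lots'],
    $fieldspec
);
print_r($errors);
```

The values stay as strings throughout. The function only reports whether they could be coerced, which leaves the actual conversion to PHP and the database.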
In this pattern the classes are convenient, but they don't hide the fact that a relational database is present. As a result you usually see fewer of the other object-relational mapping patterns present when you're using Active Record.
It makes no sense to me to hide the fact that I am using a relational database as there is no viable alternative for the applications which I write. I have also found it better to work with the database than attempt to fight against it. Just because several mapping patterns exist does not mean that I am obliged to use them. I see the need for mapping between opposing structures as a mistake, which is why I take steps to avoid that need.
Active Record is very similar to Row Data Gateway (152). The principal difference is that a Row Data Gateway (152) contains only database access while an Active Record contains both data source and domain logic. Like most boundaries in software, the line between the two isn't terribly sharp, but it's useful.
As I deliberately chose to design my framework around the 3 Tier Architecture from the outset, all database access has been taken out of the Business layer and placed into a separate Data Access layer. Note that I have one Data Access Object (DAO) for each supported DBMS, not each table. I have found that the boundary between the two layers can be as sharp as the intellect of the designer.
Because of the close coupling between the Active Record and the database, I more often see static find methods in this pattern. However, there's no reason that you can't separate out the find methods into a separate class.
An SQL query is nothing but a string which is comprised of a fixed set of substrings, some of which are optional, such as <select_string>, <from_string>, <where_string> and <having_string>. As PHP is perfectly capable of manipulating strings with simple code within the table class I can see no benefit in moving that code to a separate finder class.
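As a rough sketch of what assembling a query from optional substrings can look like, the hypothetical function below concatenates the parts named above. The function name and parameter order are assumptions for illustration, and any values embedded in the WHERE string are assumed to have been escaped already.

```php
<?php
/**
 * Assemble a SELECT statement from optional substrings.
 * The parameter names echo the placeholders in the text, but the
 * function itself is a hypothetical illustration.
 */
function buildSelect(
    string $from_string,
    string $select_string = '',
    string $where_string = '',
    string $having_string = ''
): string {
    // Default to all columns when no select list is supplied.
    $query = 'SELECT ' . ($select_string !== '' ? $select_string : '*')
           . ' FROM ' . $from_string;
    if ($where_string !== '') {
        $query .= ' WHERE ' . $where_string;
    }
    if ($having_string !== '') {
        $query .= ' HAVING ' . $having_string;
    }
    return $query;
}

echo buildSelect('person'), PHP_EOL;
echo buildSelect('person', 'person_id, last_name', "country = 'GB'"), PHP_EOL;
```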
Active Record is a good choice for domain logic that isn't too complex, such as creates, reads, updates and deletes. Derivations and validations based on a single record work well with this structure.
There is no rule that says that the domain logic must be simple, or must be restricted to a single record. There is no rule that says I cannot modify the implementation and add as much complexity as I see fit. In RADICORE I can deal with as many rows as I like, I can read from and write to as many other tables as I like, and I can make the business rules as simple or as complex as I like. It is important to remember that design patterns are just that - designs - and they do not dictate or restrict the many ways in which the pattern can be implemented. That is down to the intellect and skill of the individual developer who may choose to create an implementation based on different aspects of several different patterns.
In an initial design for a Domain Model (116) the main choice is between Active Record and Data Mapper (165). Active Record has the primary advantage of simplicity. It's easy to build Active Records, and they are easy to understand. Their primary problem is that they work well only if the Active Record objects correspond directly to the database tables: an isomorphic schema.
I do not see that objects which map directly to database tables create any sort of problem. On the contrary, this gives me the ability to generate table classes using data extracted from the INFORMATION SCHEMA in the application database. Each table (Model) class inherits all its standard code from an abstract table class, and custom code can be added later using any of the available "hook" methods. I see an isomorphic schema as a way to remove the need for that abomination called an Object Relational Mapper (ORM).
If your business logic is complex, you'll soon want to use your object's direct relationships, collections, inheritance, and so forth. These don't map easily onto Active Record, and adding them piecemeal gets very messy. That's what will lead you to use Data Mapper (165) instead.
Using relationships with other tables has never been a problem in RADICORE. I have even added the ability to automatically create SELECT queries with JOINS to parent tables. Collections (aggregations) are not a problem as I never create a single object to deal with an aggregation, I create a separate task to deal with each parent-child relationship. Inheritance is not a problem as I only ever inherit from an abstract class.
Another argument against Active Record is the fact that it couples the object design to the database design. This makes it more difficult to refactor either design as a project goes forward.
I do not see any sort of problem with keeping the object design and database design perfectly matched as I have automated the procedure for extracting changes to a table's schema and importing them into that table's object using a table structure script. Because of this I do not need any sort of Object Relational Mapper (ORM).
Active Record is a good pattern to consider if you're using Transaction Script (110) and are beginning to feel the pain of code duplication and the difficulty in updating scripts and tables that Transaction Script (110) often brings. In this case you can gradually start creating Active Records and then slowly refactor behaviour into them. It often helps to wrap the tables as a Gateway (466) first, and then start moving behaviour so that the tables evolve to an Active Record.
I do not use the Transaction Script pattern - I have Controllers which call Models where the business rules are within the Model. I did not start to create my table (Model) classes by reading about the Active Record pattern and then implementing it in exactly the same way as everybody else. I followed the advice of Erich Gamma who, in How to Use Design Patterns, said: "Do not start immediately throwing patterns into a design, but use them as you go and understand more of the problem. Because of this I really like to use patterns after the fact, refactoring to patterns."
I do not have any code duplication in my table classes as I had the foresight to move all code which could be duplicated into an abstract table class which could then be inherited by every concrete table (Model) class. This then enabled me to implement the Template Method Pattern which is the backbone of my framework. Over the last 20 years I have made numerous enhancements to the contents of the abstract class, none of which has affected any concrete classes as they contain nothing but "hook" methods.
In RADICORE the table classes function differently:
Perhaps my implementation of this pattern is not identical to anyone else's, but why should it be? No design pattern comes supplied with a definitive implementation, just a series of objectives. I do not pick a pattern and then duplicate someone else's implementation, I write code that works according to my own rules, and if a pattern appears then that just proves that great minds think alike. If an existing pattern does not appear then it indicates that I have followed my mantra of innovate, don't imitate and invented a new pattern. The efficacy of any particular implementation is governed solely by the skill, or lack thereof, of its implementor.
Before I switched to using PHP in 2002 I had 20 years' experience of developing database applications (now known as enterprise applications) in two different languages using a mixture of hierarchical, network and relational databases. I had written frameworks in both of those languages.
I had knowledge of the following:
After working with databases for 20 years I also knew the following:
Smart data structures and dumb code works a lot better than the other way around.
In a large ERP application, such as the GM-X Application Suite, which comprises a number of subsystems, each subsystem has a unique set of attributes:
Despite the fact that these two areas are completely different for each subsystem, they each have their own patterns and so can be handled using standard reusable code provided by the framework:
More information on this topic can be found at Evolution of the RADICORE framework.
Before switching to a language with object-oriented capabilities I asked myself the questions Why should I switch? What new things does OO bring to the table?
The best explanation I found was:
Object Oriented Programming is programming which is oriented around objects, thus taking advantage of Encapsulation, Inheritance and Polymorphism to increase code reuse and decrease code maintenance.
This told me that I was supposed to use the features of the OO language in order to increase the amount of reusable code. This seemed a good idea to me as the more reusable code you have at your disposal then the less code you have to write to get things done, and the less code you have to write then the quicker you can get things done, which in turn means that you become more productive.
My initial understanding of OO programming was that it is exactly the same as procedural programming in that they both deal with imperative statements which are executed in a linear fashion, with the only exception being that one has encapsulation, inheritance and polymorphism while the other does not. Encapsulation, the ability to create classes and objects, provides the ability to bundle data and behaviour together in the same component, something which cannot be done in procedural languages. My understanding has not changed over the years.
I had no knowledge of the following "rules" relating to Object-Oriented Programming (OOP), so I did not follow them:
Some people act as if these "rules" were set in stone by some sort of a supreme being and handed down from the mountain top. I do not. They are merely suggestions devised by people based on their experiences and the choices which worked for them. I choose to make decisions based on my experience and what works best for me, and I refuse to revise my decisions just to be consistent with everyone else. By being consistent with bad practices I would be doing nothing but being consistently bad. That idea does not float my boat.
I have produced criticisms of some of these principles in the following:
I also noticed, after reading many articles on the theories behind OOP, that they seemed to have been written by academics as a sort of learning exercise or proof of concept. They did not appear to have been written by professional programmers working in a commercial environment to produce business applications. They also seemed to be restricted to compiled languages tied to a bit-mapped GUI, such as Smalltalk, which were around in the 1970s and 1980s. Consequently the languages and the hardware on which they ran were primitive and slow. Fast forward 35+ years and the programming environment is totally different - we have hardware which is orders of magnitude faster, data storage mechanisms which are orders of magnitude bigger, programming languages which are orders of magnitude more sophisticated, and relational databases which are orders of magnitude more powerful. Last but not least we have the internet which allows any computer to connect and communicate with other computers anywhere in the world using the HTTP protocol and HTML forms. Consequently the implementations "suggested" by the early pioneers of OOP need to be adjusted to take advantage of what is available now instead of what was available then.
The PHP manual informed me how to deal with encapsulation and inheritance, but I had to work out polymorphism for myself. All the descriptions I read were long on words and short on meaning, as in:
Polymorphism is a feature of object-oriented programming languages that allows a specific routine to use variables of different types at different times.
Polymorphism allows the use of a single interface with different underlying forms, such as data types or classes.
Polymorphism is the characteristic of being able to assign a different meaning or usage to something in different contexts - specifically, to allow an entity such as a variable, a function, or an object to have more than one form.
It took several years for me to condense it down to the simplest explanation that a novice like me could understand -
Polymorphism is the ability to have the same method signature in different classes, usually but not necessarily via inheritance, so that calling the same method on different objects can produce different results.
That may provide a definition of polymorphism, but it does not describe how you make use of it. The simple answer is:
You write code which performs one or more known operations on an unknown object where the identity of that object is not supplied until runtime.
Although I created my own implementation in 2003, it was not until years later that I discovered the technique was called Dependency Injection. This realisation was delayed by the fact that the official description was long on words but short on substance and failed to get to the point in a succinct and easily comprehensible manner.
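The idea of performing a known operation on an object whose identity is not supplied until runtime can be illustrated with a small sketch. The Table, Person and Product classes and the listRows() function are hypothetical stand-ins, not code taken from the framework.

```php
<?php
// Two hypothetical table classes that share one method signature.
abstract class Table
{
    abstract public function getData(string $where = ''): array;
}

class Person extends Table
{
    public function getData(string $where = ''): array
    {
        // A real class would query the database; this returns canned data.
        return [['person_id' => '1', 'last_name' => 'Smith']];
    }
}

class Product extends Table
{
    public function getData(string $where = ''): array
    {
        return [['product_id' => 'A1', 'description' => 'Widget']];
    }
}

/**
 * A reusable "controller": it knows which method to call, but not which
 * class it is calling it on. The class name arrives at runtime.
 */
function listRows(string $table_id, string $where = ''): array
{
    $dbobject = new $table_id();        // identity supplied at runtime
    return $dbobject->getData($where);  // same call, different result
}

print_r(listRows('Person'));
print_r(listRows('Product'));
```

Because listRows() depends only on the shared getData() signature, it never needs to change when another table class is added.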
Every OO programmer should be aware that a design pattern simply defines a design and not an implementation. It identifies what should be done, not how it should be done. This means that each implementation of that pattern can be totally different. If you come across sample code in a book or online tutorial you should treat that as an example of how it could be done and not how it should be done. You should never treat that implementation as the one source of truth. You should never be afraid to try something different as you may end up by finding a way that is actually better. You will never be able to prove your own worth if all you do is copy someone else's work. In other words you should never be afraid to innovate, not imitate.
When you are a novice programmer you have to learn the capabilities of the programming language so that you can become more effective in that language. If you are a junior member of a team then that team may have a document called "standards" or "best practices" which should help you write effective and maintainable code and to avoid mistakes. Bear in mind that while a particular set of standards may be used by one programming team, a different team may have a different set of standards. There is no such thing as a single set of programming standards which are universally applicable just as there is no such thing as a single implementation for each design pattern. This is one of the reasons that I don't treat those OO rules as if they were cast in stone and handed down from the mountain top by some supreme being. They are merely the random thoughts of some individual, or group of individuals, which may be debunked, replaced, superseded or enhanced at any time. Some ideas stand the test of time while others are in fashion one minute and out of fashion the next.
This is why in my own implementation, without even knowing that such patterns existed, I ended up by deviating from the Active Record pattern where an object instance is tied to a single row and merged it with the Table Module pattern which handles the business logic for all rows in a database table. I also made the changes described below.
The only architectural pattern which I had in mind before I started coding was the 3 Tier Architecture which I had encountered in UNIFACE, my previous language. This language started out using a 2 Tier Architecture in version 5, but upgraded it to 3 tiers in version 7. This was better than the monolithic single tier architecture which I had used in my COBOL days. Note that OO programming naturally forces you to implement a minimum of 2 tiers as after writing a class with methods and properties you need a separate piece of code to instantiate that class into an object so that you can call its methods. The object being called exists in the Business layer while the object doing the calling exists in the Presentation layer. I had also decided that I would create all my HTML output using XML and XSL transformations, so I created a single reusable object to do this. It required only two pieces of information - the name of the object from which it could extract the data and convert it to XML, and the name of the XSL stylesheet which would be used in the transformation process. This then meant that I had effectively split my Presentation layer into two parts - a Controller and a View, which left me with a Model in the Business layer. This architecture has the structure shown in Figure 1 below:
Figure 1 - Combination of 3 Tier Architecture and Model-View-Controller design pattern
The flow of control is as follows:
In the RADICORE framework these four modules/components are implemented as follows:
Note that I did not set out to implement the MVC design pattern because at that time I had never heard of it. It happened by accident, not by design (pun intended).
It is the Models in the above diagram which are constructed around the pattern which is the subject of this article.
After getting PHP, Apache and MySQL installed on my home PC I taught myself to write code in this new language using various books and online tutorials. This was 2002, so what was available at that time was pretty primitive. I certainly did not see a style which I wanted to copy. I started by building a sample database, then I built a concrete class for the first table containing the following methods:
Note that the first three of these methods deal with one record at a time. The getData() method can return any number of rows, with the number returned being governed by the settings of $object->setRowsPerPage() and $dbobject->setPageNo().
Later on, when the need arose, I added the following methods:
I later found a need for the following methods:
Note here that the above methods are defined in objects which reside in the business/domain layer (which I later refer to as Models) and which are called by other objects which reside in the presentation layer (which I later refer to as Controllers). I saw too many code samples which had separate methods for load(), validate() and store() which always had to be called in the same sequence, but in this situation I remembered a technique which I had first adopted in the 1980s in my COBOL days - if you always have to perform a group of functions in a set sequence in order to carry out a high-level objective then it would be a good idea to wrap that group in a high-level function so that you can replace a group of subroutine calls with a single call to that wrapper function. This then results in a series of wrapper functions which resemble the following:
Note here that there is no separate load() method as the entire contents of the $_POST array is copied from the Controller into the Model as a single parameter on the method call. This is explained in A single property for all table data. There is a different validation method for each of these operations because - guess what? - the validation rules are different for each operation. Likewise there are different _dml_* methods to construct and execute the SQL query because - guess what? - the structure of the query is different for each operation. Note that if the validateXXX() method detects any sort of error then the following _dml_* method will not be executed. Instead it will return an array of error messages to the Controller which will be displayed to the user so that the error(s) can be corrected.
Those of you who are still awake and have brains which are firing on all cylinders should be able to recognise that this arrangement of wrapper functions inside an abstract class opens the door to using the Template Method Pattern. This means that at any time in the future the contents of the wrapper function may be altered without having to change any places where the wrapper function is called. This is how I later introduced my sets of "hook" methods to deal with custom processing in each concrete subclass.
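A minimal sketch of a wrapper function acting as a Template Method, with empty "hook" methods that a concrete subclass may override, might look like this. The names preInsert(), postInsert(), validateInsert() and dmlInsert() are simplified assumptions for illustration and are not the framework's actual method names. The database call is replaced by a log entry so that the example is self-contained, and it requires PHP 7.4 or later for typed properties.

```php
<?php
abstract class AbstractTable
{
    public array $errors = [];
    public array $log = [];            // stands in for real database activity
    protected array $required = [];    // column names which must not be empty

    // The template method: a fixed sequence with customisable steps.
    public function insertRecord(array $fieldarray): array
    {
        $fieldarray   = $this->preInsert($fieldarray);       // hook
        $this->errors = $this->validateInsert($fieldarray);
        if ($this->errors === []) {
            // Only reached when validation found no errors.
            $this->dmlInsert($fieldarray);
            $fieldarray = $this->postInsert($fieldarray);    // hook
        }
        return $fieldarray;
    }

    protected function validateInsert(array $fieldarray): array
    {
        $errors = [];
        foreach ($this->required as $name) {
            if (trim((string) ($fieldarray[$name] ?? '')) === '') {
                $errors[$name] = 'A value is required';
            }
        }
        return $errors;
    }

    protected function dmlInsert(array $fieldarray): void
    {
        // A real implementation would build and execute an INSERT query.
        $this->log[] = 'INSERT ' . json_encode($fieldarray);
    }

    // Hooks do nothing unless a subclass overrides them.
    protected function preInsert(array $fieldarray): array { return $fieldarray; }
    protected function postInsert(array $fieldarray): array { return $fieldarray; }
}

class Person extends AbstractTable
{
    protected array $required = ['email'];

    // Custom processing lives in the hook, not in the wrapper.
    protected function preInsert(array $fieldarray): array
    {
        if (isset($fieldarray['email'])) {
            $fieldarray['email'] = strtolower($fieldarray['email']);
        }
        return $fieldarray;
    }
}

$person = new Person();
$person->insertRecord(['email' => 'Fred@Example.COM']);
$person->insertRecord([]);   // fails validation, so no INSERT is logged
print_r($person->log);
print_r($person->errors);
```

The caller only ever sees insertRecord(), so steps can be added to the sequence inside the abstract class without touching any calling code.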
The Active Record pattern only mentions three of the four CRUD operations, which I thought very strange, so I added the missing Read operation as I kinda guessed that I might want to read records from the database at some point. I created the following method:
This created and executed a query along the lines of:
SELECT * FROM <table> [WHERE $where]
where the WHERE string, if supplied, could be anything at all. I had never heard about the idea of writing finder methods, so I followed the simplest path and used bog standard PHP code to build my query string. I found the string handling functions in PHP very easy to work with, and I could see no reason to write a complicated object for doing something so simple. An example of how the query can be customised is shown in the _cm_pre_getData() method.
Note here that a SELECT query is capable of returning zero or more rows, so I saw no reason to restrict each object to a single row.
I discovered many years later that my coding style contravened the "advice" given by some proponents of Domain Driven Design (DDD) who advocate the use of a separate method for each use case. This to me sounds like a bad idea as it would wipe out the ability to reuse code via that mechanism known as polymorphism which is absolutely essential for Dependency Injection. Let me explain. In my ERP application I have over 4,000 use cases, which are implemented as separate tasks, where I have over 400 table classes which are accessed through 45 reusable Controllers. This is all possible because all 400 table classes use exactly the same generic methods and all 45 Controllers communicate with the table classes using these methods. This arrangement provides 400 x 45 = 18,000 (yes, EIGHTEEN THOUSAND) opportunities for polymorphism which I can then take advantage of using my version of Dependency Injection.
If I were to use 4,000 unique methods instead of 4 shared methods I would have to kiss goodbye to all that polymorphism and the benefits that it provides. Instead of having 45 Controllers which could be reused with any table class (by virtue of the fact that they call the same methods irrespective of which table class they are given) I would have to have a customised Controller for each table class. Reducing the amount of reusable code at my disposal would be cancelling out the reason for using OOP in the first place. It would be as sensible as buying a new car to go somewhere quicker, then refusing to take it out of first gear or disengage the handbrake.
As explained further in Separate Controllers for each task (use case) the processing for each use case is actually split between the Controller and the Model - the Model contains a large number of methods while each Controller calls only a specific subset of those methods in order to achieve a specific result.
In all the code samples which I found in books and online tutorials I saw getters and setters being used to move data into and out of each object. As I played with PHP I was amazed at the flexibility of arrays, especially when compared with my previous languages, so when I wrote my first script to perform an INSERT operation on a table and saw that all the data in the HTML form was being passed in via the $_POST array I asked myself a simple question: Can the code handle data which exists in a single variable which is an array instead of separate variables for each column?
I did a quick test and discovered that the answer was YES. There is no effective difference between:
$fieldarray['column1']
and
$this->column1
This meant that I could end up with the following set of method calls in my Controllers:
$fieldarray = $object->insertRecord($_POST);
$fieldarray = $object->updateRecord($_POST);
$fieldarray = $object->deleteRecord($_POST);
$fieldarray = $object->getData($where);
Note here that $fieldarray can either be an associative array representing a single row, or it can be an indexed array containing any number of rows. This single array is always used to pass data between objects and even between methods. The View object can extract all this data using a single call to the getFieldArray() method, thus avoiding the need to have a separate object for each Model due to them having different sets of columns.
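One way for receiving code to cope with both shapes of $fieldarray is to normalise it into a list of rows before processing. The toRows() helper below is a hypothetical sketch of that idea and is not how the framework itself necessarily handles it.

```php
<?php
/**
 * Normalise $fieldarray into an indexed array of rows.
 * An associative array (column => value) is treated as a single row;
 * an array whose first element is itself an array is treated as many rows.
 */
function toRows(array $fieldarray): array
{
    if ($fieldarray === []) {
        return [];                      // zero rows
    }
    $first = reset($fieldarray);
    return is_array($first)
        ? array_values($fieldarray)     // already a set of rows
        : [$fieldarray];                // wrap the single row
}

$single   = ['person_id' => '1', 'last_name' => 'Smith'];
$multiple = [
    ['person_id' => '1', 'last_name' => 'Smith'],
    ['person_id' => '2', 'last_name' => 'Jones'],
];

echo count(toRows($single)), PHP_EOL;    // one row
echo count(toRows($multiple)), PHP_EOL;  // two rows
```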
Note also that I had no intention of including the table name in each method name in order to make them unique as I had already learned that method names need only be unique within a class, unlike function names which have to be unique within the entire application. The ability to have the same method names used in multiple objects produces polymorphism, and I later learned that polymorphism is an essential ingredient for dependency injection, which is another useful coding technique.
While some of you may be surprised or even shocked at my choice of an array to pass data from one component to another, there is method in my madness. As I stated earlier in My prior knowledge I was already aware of the principle of coupling with the aim of keeping it as loose as possible. Coupling is all about how modules interact, and this is all done through method signatures - one module contains a method signature and another module interacts with it by calling that method with a compatible set of arguments (parameters). If the signature in the callee is changed then it must also be changed in every caller, thus creating what is known as a ripple effect. In order to achieve loose coupling you must devise a method signature which would need to be changed as little as possible. How can this be done? Look at the following examples:
<?php
// Option #1: a separate setter for each column
$dbobject = new Person();
$dbobject->setUserID($_POST['userID']);
$dbobject->setEmail($_POST['email']);
$dbobject->setFirstname($_POST['firstname']);
$dbobject->setLastname($_POST['lastname']);
$dbobject->setAddress1($_POST['address1']);
$dbobject->setAddress2($_POST['address2']);
$dbobject->setCity($_POST['city']);
$dbobject->setProvince($_POST['province']);
$dbobject->setCountry($_POST['country']);
if ($dbobject->updatePerson($db) !== true) {
    // do error handling
}
?>
// Option #2: a separate argument for each column
$result = $dbobject->update($_POST['userID'],
                            $_POST['email'],
                            $_POST['firstname'],
                            $_POST['lastname'],
                            $_POST['address1'],
                            $_POST['address2'],
                            $_POST['city'],
                            $_POST['province'],
                            $_POST['country']);
<?php
// Option #3: all the data in a single array
require_once "classes/$table_id.class.inc"; // $table_id is provided by the previous script
$dbobject = new $table_id;
$result = $dbobject->updateRecord($_POST);
if ($dbobject->errors) {
    // do error handling
}
?>
There is no rule which states which one of the above options should be followed. Just because a sample implementation of a design pattern shows a particular method does not mean that it is a requirement; it is just a suggestion. It is up to the individual to decide which implementation is best for them, which has the most advantages and the fewest disadvantages, but then they must be prepared to live with the consequences of that decision. Copying someone else's example and then complaining about it fits the old saying that "a bad workman always blames his tools".
When you consider that in the life of a database application you may, at any time, have to deal with columns being either added to or removed from a table, which one of the approaches shown above could absorb those additions or deletions without requiring any changes to any class properties or method signatures? If you have to change ANY method signatures then you have created a ripple effect which means that your code is tightly coupled, which is considered to be bad. Not only does option #3 produce a result where a Controller is NOT tied to a particular Model, it actually has the advantage that any method called by a Controller can be used with ANY Model. It is not physically possible to produce coupling which is looser than that.
One quick way to verify the level of coupling is to check whether the methods are polymorphic or not. Those in option #1 and #2 are not polymorphic as they are tied to a particular Model containing particular columns. The method signature in option #3 does not contain any table or column names, and as it is shared by every Model class through inheritance it is polymorphic and therefore demonstrates loose coupling.
Another issue that leapt out at me with options #1 and #2 concerns partial updates where some of the columns are not supplied: how can you tell the difference between a missing value and a value being changed to NULL? With option #3 there is no such issue as any column which is not to be changed will not be in the array.
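The difference between "not supplied" and "supplied as NULL" can be sketched in a few lines. This is only an illustration and not framework code: the function name buildUpdateSet() and the column names are made up for the example. It relies on PHP's array_key_exists(), which, unlike isset(), reports true for a key whose value is NULL.

```php
<?php
// Hypothetical helper (not part of any framework): work out which columns
// a partial update should touch, given the array of supplied values.
function buildUpdateSet(array $fieldarray, array $columns): array
{
    $changes = [];
    foreach ($columns as $column) {
        // array_key_exists() is true even when the value is null, so a column
        // explicitly set to NULL is kept while an absent column is skipped.
        if (array_key_exists($column, $fieldarray)) {
            $changes[$column] = $fieldarray[$column];
        }
    }
    return $changes;
}

$columns = ['email', 'address2', 'city'];              // assumed column names
$input   = ['address2' => null, 'city' => 'London'];   // 'email' is not supplied

$changes = buildUpdateSet($input, $columns);
var_dump(array_key_exists('email', $changes));    // bool(false): left alone
var_dump(array_key_exists('address2', $changes)); // bool(true): will be set to NULL
```

With a fixed list of setters or arguments the caller would need some extra convention to express "leave this column alone", whereas with an array the absence of the key says it all.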
Not only does option #3 avoid the ripple effect caused by adding or removing columns from a database table, it also demonstrates my implementation of Dependency Injection which means that the code can operate on any database table with any mixture of columns. It also solves the performance issue caused by the N+1 problem.
One of the vitally important things that every database programmer should know is that you must never trust a user's data as it may contain values that are not compatible with the table's definition. Each column in a database table has a name, a data type, a size, and perhaps other limitations, and the program must validate each piece of data to check that it won't be rejected by the database. This check must be performed BEFORE it is sent to the database so that in the event of a validation error the data can be sent back to the user with a suitable error message so that it can be corrected. Without this check the database will reject the query by terminating the program, and this will not please the user.
In order to carry out this validation the table class must know the names of its columns along with their data specifications. I did not like the idea of hard-coding this validation, so I looked for a way to automate it. I already knew that I had to match each column's data with its specifications in the DDL data, so I copied this data into a standard property called $fieldspec. Initially I did this manually, but I found this to be boring and repetitious, so I automated it by building a process which extracted this data from the database and copied it into a <tablename>.dict.inc file which could then be loaded into a table object when it was instantiated.
An astute programmer should instantly recognise the advantage of having one array of field values and a second array of field specifications - it then becomes incredibly easy to write a standard function which can iterate through one of these arrays, find a corresponding entry in the other array, then verify that the value matches its specifications.
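As a rough sketch of that idea, the function below walks through the specifications and checks the corresponding value in the data array. It is not the framework's actual validation object: the function name validateFields() and the keys used inside $fieldspec ('type', 'size', 'required') are assumptions made purely for this example.

```php
<?php
// A sketch only: the keys used in $fieldspec ('type', 'size', 'required')
// are assumptions for this example, not the framework's actual layout.
function validateFields(array $fieldarray, array $fieldspec): array
{
    $errors = [];
    foreach ($fieldspec as $field => $spec) {
        $value = $fieldarray[$field] ?? null;
        if ($value === null || $value === '') {
            // Nothing supplied: only an error if the column is mandatory.
            if (!empty($spec['required'])) {
                $errors[$field] = 'This field is required';
            }
            continue;
        }
        if ($spec['type'] === 'integer' && filter_var($value, FILTER_VALIDATE_INT) === false) {
            $errors[$field] = 'Value must be an integer';
        } elseif ($spec['type'] === 'string' && strlen((string) $value) > $spec['size']) {
            $errors[$field] = "Value is longer than {$spec['size']} characters";
        }
    }
    return $errors;
}

$fieldspec = [
    'user_id' => ['type' => 'integer', 'required' => true],
    'email'   => ['type' => 'string', 'size' => 20, 'required' => true],
    'city'    => ['type' => 'string', 'size' => 10],
];

$errors = validateFields(
    ['user_id' => 'abc', 'email' => 'a@example.com', 'city' => ''],
    $fieldspec
);
print_r($errors);   // only 'user_id' fails: it is not an integer
```

Because the function knows nothing about any particular table, the same code can serve every table class; only the contents of the two arrays differ.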
The abstract table class contains a set of common table properties which are loaded with different values within each concrete table class.
It is important to note here that this metadata is stored within each table class but not processed by any code within that class. Any processing which is required is performed by the framework as and when necessary, either with invariant methods within the abstract class, or functions within each Controller.
After I had built a concrete class for the first table I then moved on to the second table. I did this by copying the original table class, then going through it to change any references to table and column names. As you can imagine this produced a lot of duplicated code, so I decided to eliminate this duplication using this magic concept called "inheritance". I created an abstract class, changed both table classes to inherit from this abstract class, then moved all duplicated methods from the subclasses to the abstract class. The end result was a large abstract class full of methods while each subclass had nothing but a constructor which loaded the table's metadata from a separate <tablename>.dict.inc file into standard variables.
Note that I started my PHP development with version 4, which did not support the "abstract" keyword in class definitions, so I created what I called a "generic" class instead. This did not have a constructor which provided any identifying details, so any attempt to use it without going through a subclass would result in a fatal error.
By causing each concrete table class to inherit from an abstract table class this means that all the public methods are shared by each subclass and are available polymorphically which in turn means that I can access them using dependency injection. It also meant that I could later implement the Template Method Pattern.
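The shape of this arrangement can be sketched as follows. It is an illustration under stated assumptions, not the framework's code: the class names GenericTable, Product and Customer and the $tablename property are invented for the example, the method body is a placeholder, and the metadata is written inline where the text describes loading it from a <tablename>.dict.inc file.

```php
<?php
// Illustrative sketch: class and property names other than $fieldspec and
// insertRecord() are assumptions. The metadata is inlined here, whereas the
// article loads it from a <tablename>.dict.inc file.
abstract class GenericTable
{
    public string $tablename = '';
    public array $fieldspec = [];
    public array $errors = [];

    // Shared by every subclass; contains no table or column names.
    public function insertRecord(array $fieldarray): array
    {
        $this->errors = [];
        // Keep only the columns that this table actually has.
        $fieldarray = array_intersect_key($fieldarray, $this->fieldspec);
        // A real implementation would validate and then issue the INSERT here.
        return $fieldarray;
    }
}

class Product extends GenericTable
{
    public function __construct()
    {
        $this->tablename = 'product';
        $this->fieldspec = ['product_id' => ['type' => 'string'], 'price' => ['type' => 'numeric']];
    }
}

class Customer extends GenericTable
{
    public function __construct()
    {
        $this->tablename = 'customer';
        $this->fieldspec = ['customer_id' => ['type' => 'string'], 'name' => ['type' => 'string']];
    }
}

// The caller is identical for every table; only the class name varies.
$input = ['product_id' => 'P1', 'price' => '9.99', 'name' => 'Fred'];
foreach (['Product', 'Customer'] as $class) {
    $dbobject = new $class();
    $row = $dbobject->insertRecord($input);
    echo $dbobject->tablename, ': ', implode(',', array_keys($row)), "\n";
}
```

Each subclass contains nothing but a constructor, yet the calling code can invoke insertRecord() on either of them without knowing which one it has been given.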
After perfecting my implementation I heard such ideas as favour composition over inheritance and inheritance breaks encapsulation. I dismissed these notions as pure bunkum for two reasons:
As a devout follower of the if it ain't broke, don't fix it principle I fail to see any justification in fixing a "problem" that does not exist.
It was not until several years later after reading Designing Reusable Classes and Object Composition vs. Inheritance that I discovered the so-called "problem" was not with inheritance itself, but with its overuse. Over-eager developers were creating deep inheritance hierarchies with a concrete class as its root. In order to avoid any problems that this creates the advice was always the same - only inherit from an abstract class. When I came to use inheritance for the very first time I had two concrete classes which shared a lot of code, so did I make the second concrete class a subclass of the first? No I did not, simply because I saw straight away that the first concrete class contained things that should not be shared by the second one. Instead I created a generic class that contained only those elements that could be shared, then I changed my two concrete classes to inherit from this third generic class. It turns out that I went down the right path not because I was taught to go down that particular path, but because I was not taught to go down the wrong path.
More of my thoughts on this topic can be found at Inheritance is NOT evil.
Dealing with tables which needed nothing but primary validation was easy, but I very quickly encountered a situation where I needed some specific validation in a table subclass which could not be automated. After a bit of pondering I remembered something that I had heard about years before involving using customisable pre- and post- processing functions which could be processed before and after a standard function. I did not know at the time that it was called the Template Method Pattern, but it sounded like a good idea, so I implemented it. This was very easy as I had already coded the insertRecord() method to call sub methods called _validateInsertPrimary() and _dml_insertRecord(), so all I needed to do was insert some new "hook" methods into the processing flow. I decided to give them a prefix of '_cm_' to indicate that they were customisable methods. I implemented them as concrete methods, not abstract, which meant that they did not need to be defined in a subclass unless they were actually needed. I ended up with the following set of methods:
Public method | Internal methods |
---|---|
insertRecord ($rowdata) |
|
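The general shape of a template method with optional hooks can be sketched like this. The names insertRecord(), _validateInsertPrimary() and _dml_insertRecord() come from the description above; the two hook names, the class names and every method body are assumptions made for the illustration, and the $stored array merely stands in for the database.

```php
<?php
// Sketch of the Template Method Pattern. insertRecord(), _validateInsertPrimary()
// and _dml_insertRecord() are named in the text; the two _cm_ hook names and all
// method bodies are assumptions made for this illustration.
abstract class TemplateTable
{
    public array $errors = [];
    public array $stored = [];   // stands in for the database in this sketch

    // The template method: a fixed sequence of steps with hooks in between.
    public function insertRecord(array $rowdata): array
    {
        $this->errors = [];
        $rowdata = $this->_cm_pre_insertRecord($rowdata);
        if (empty($this->errors)) {
            $rowdata = $this->_validateInsertPrimary($rowdata);
        }
        if (empty($this->errors)) {
            $rowdata = $this->_dml_insertRecord($rowdata);
            $rowdata = $this->_cm_post_insertRecord($rowdata);
        }
        return $rowdata;
    }

    // Hooks are concrete and do nothing by default, so a subclass
    // overrides one only when it needs custom processing.
    protected function _cm_pre_insertRecord(array $rowdata): array { return $rowdata; }
    protected function _cm_post_insertRecord(array $rowdata): array { return $rowdata; }

    protected function _validateInsertPrimary(array $rowdata): array
    {
        return $rowdata;   // standard validation would go here
    }

    protected function _dml_insertRecord(array $rowdata): array
    {
        $this->stored[] = $rowdata;
        return $rowdata;
    }
}

class Order extends TemplateTable
{
    // Custom rule that cannot be derived from the column specifications.
    protected function _cm_pre_insertRecord(array $rowdata): array
    {
        if (($rowdata['quantity'] ?? 0) <= 0) {
            $this->errors['quantity'] = 'Quantity must be greater than zero';
        }
        return $rowdata;
    }
}

$order = new Order();
$order->insertRecord(['quantity' => 0]);
var_dump(count($order->stored));   // int(0): the hook reported an error, so nothing was stored
$order->insertRecord(['quantity' => 3]);
var_dump(count($order->stored));   // int(1)
```

Because the hooks are concrete methods with empty bodies, a subclass which needs no custom processing does not have to mention them at all.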
Another area where I was informed that my implementation was "impure" and "incorrect" was regarding the Controller pattern in the GRASP principles. The more I read about this pattern the more reasons I found to ignore it, mainly because it would cause me to lose vast amounts of reusability that I had already achieved. I had originally based my framework on the 3 Tier Architecture with its separate Presentation, Business and Data Access layers, but because I had split my Presentation layer into two separate components - one to handle the HTTP request and another to handle the HTTP response - I discovered that I had actually combined it with an implementation of the Model-View-Controller design pattern which resulted in the structure shown in Figure 1 above. My implementation of the Controller did not conform to that which was suggested in the GRASP pattern in the following areas:
GRASP description | My Implementation |
---|---|
It is not part of the User Interface (UI). | I disagree. It sits in the Presentation layer, therefore it is part of the UI. |
It is part of the application/service layer. | I disagree. All application knowledge resides in the Models which exist in the business/domain layer. Controllers and Views are services which reside in the Presentation layer. |
It receives a request from the client and translates this into method calls on the relevant business/domain object(s). | I agree. Each HTTP request is either a GET or a POST, and will result in one or more method calls on one or more domain objects. |
It can handle more than one use case. | I disagree. It can only handle a single use case. |
Each Model requires its own controller. | I disagree. Each Controller can communicate with any Model as the methods which it calls are available in every Model via polymorphism. |
In my early COBOL days it was common practice to create a separate program for each entity which always started off in LIST mode, but had separate function keys which enabled it to switch into one of the other modes - ADD, READ, UPDATE and DELETE. Each "mode" can be regarded as a separate use case (which I used to call user transaction but now call task). This single program could handle multiple modes with multiple screens, and once activated the user could switch from one mode to any other simply by pressing a function key. Because all the logic was mixed together it became difficult to ensure that the right logic was executed at the right time. This problem became worse when in the mid-1980s a customer for a new bespoke system insisted on having a method whereby users could only access those modes (use cases) to which they had specifically been granted access. This was when I designed and built my first Role Based Access Control (RBAC) system. This required its own database with its own set of maintenance screens. It became rather tricky to have extra code in each program to check a user's permissions to either allow or disallow the pressing of a function key, so this is when I decided to split the program dealing with multiple modes into a separate subprogram for each mode. Each mode (use case) had its own entry on the TASK table which then enabled it to have its own entry on the ROLE-TASK and MENU tables. I had a separate program which handled the LOGON screen (which identified the user) which then displayed a screen of menu options after those to which the user had not been granted access had been filtered out. When an option was chosen the program would then activate the relevant subprogram. This meant that each user could only see those options to which he had been granted access. This MENU program then had code, equivalent to a router of today, which took the user's choice of task and translated it into a subroutine call.
This meant that none of the application subprograms needed to contain any code to check the user's permissions as this was all handled in the MENU program before the subprogram was activated. This idea is discussed further in Component Design - Large and Complex vs. Small and Simple and is something which I have followed in every implementation of my framework. The most common set of maintenance tasks which can be used on a database table is shown in Figure 2:
Figure 2 - A typical Family of Forms
Note that each of the boxes in the above diagram is a clickable link.
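The way in which the MENU program filtered out options and checked access before activating a task, as described earlier, can be sketched as follows. This is a simplified illustration and not the original design: the arrays are invented stand-ins for the MENU and ROLE-TASK tables, and the task names, role names and function names are all hypothetical.

```php
<?php
// Sketch only: the array layouts and task names below are invented stand-ins
// for the TASK, ROLE-TASK and MENU tables described in the text.
$menu = ['customer(list)', 'customer(add)', 'customer(update)', 'customer(delete)'];

$roleTask = [
    'clerk'   => ['customer(list)', 'customer(add)'],
    'manager' => ['customer(list)', 'customer(add)', 'customer(update)', 'customer(delete)'],
];

// Remove every menu option to which the role has not been granted access.
function filterMenu(array $menu, array $roleTask, string $role): array
{
    $allowed = $roleTask[$role] ?? [];
    return array_values(array_intersect($menu, $allowed));
}

// The permission check happens once, before the task is activated,
// so the task itself needs no access-control code.
function activateTask(string $task, array $visibleMenu): string
{
    if (!in_array($task, $visibleMenu, true)) {
        return "access denied: $task";
    }
    return "running: $task";
}

$visible = filterMenu($menu, $roleTask, 'clerk');
echo activateTask('customer(add)', $visible), "\n";     // running: customer(add)
echo activateTask('customer(delete)', $visible), "\n";  // access denied: customer(delete)
```

Having one task per mode is what makes this possible, as a permission can only be granted or withheld for something which has its own identity.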
I don't have a separate Controller for each Model as I have learned to take advantage of Inheriting from an abstract class and having A single property for all table data to produce Controllers which can be used with any Model using the power of Dependency Injection. This comes from two simple facts which I first observed decades ago in my COBOL days:
It wasn't until I started using a language which offered me encapsulation, inheritance and polymorphism that I was able to write code which allowed me to implement those facts in a more reusable way:
In the following table I compare the results of writing code which follows the traditional and "proper" rules with the code that I write which follows a different set of rules:
traditional | effect on the database | the Tony Marston way |
---|---|---|
createProduct() | insert a record into the PRODUCT table | $table_id = 'product'; .... require "classes/$table_id.class.inc"; $dbobject = new $table_id; $result = $dbobject->insertRecord($_POST); if (!empty($dbobject->errors)) { ... handle error here ... } |
createCustomer() | insert a record into the CUSTOMER table | $table_id = 'customer'; .... require "classes/$table_id.class.inc"; $dbobject = new $table_id; $result = $dbobject->insertRecord($_POST); if (!empty($dbobject->errors)) { ... handle error here ... } |
createInvoice() | insert a record into the INVOICE table | $table_id = 'invoice'; .... require "classes/$table_id.class.inc"; $dbobject = new $table_id; $result = $dbobject->insertRecord($_POST); if (!empty($dbobject->errors)) { ... handle error here ... } |
payInvoice() | insert a record into the PAYMENT table | $table_id = 'payment'; .... require "classes/$table_id.class.inc"; $dbobject = new $table_id; $result = $dbobject->insertRecord($_POST); if (!empty($dbobject->errors)) { ... handle error here ... } |
Here you should see that the only difference in the blocks of code in the right-hand column is where it assigns a value for $table_id. This is actually done inside a separate script which I call a component script. Each task in the application will have one of these scripts, and while new Model classes can be created at any time to deal with new database tables, the reusable Controllers are predefined and supplied by the framework. They are all documented in Transaction Patterns.
In Shawn McCool's article he highlights several areas where he considers the implementation of the Active Record pattern to be sub-optimal, and these are dissected below.
This is not true in my implementation. I have combined Martin Fowler's Active Record pattern with his Table Module pattern to provide an object that can handle any number of rows in a given database table. An entity is therefore a database table and not just a single row in that table. Some operations deal with a single row while others can handle several rows. A SELECT query can return any number of rows, so the software should be able to deal with any number of rows. My implementation can handle any number of rows due to my decision to have a single property for all table data.
He opens this section with the following statement:
The Active Record pattern attempts to deliver extreme value through applying extreme coupling between relational database schema and application use-case implementation.
His idea that this pattern results in "extreme coupling" between the database and the application is a misuse of the term Coupling which can have different strengths varying from tight to loose. In software development the term can only be used to describe how different modules within the application interact, how one module calls another using its API. If a Model has method signatures (APIs) which contain table and/or column names it means that these methods can only be used by other components to communicate with just that one Model. This means that if the table's structure is altered, such as by adding or removing columns, there may have to be corresponding changes to method signatures. This is known as the ripple effect and is a sign of tight coupling which is supposed to be bad. If the methods are generic in that they do not contain any table or column names and the same methods can be used on any Model, then they are polymorphic methods which can be used with any Model. As such any changes to a table's structure do not require corresponding changes to any method signatures in any Models or Controllers so they do not suffer from the ripple effect. This is a sign of loose coupling, which is supposed to be good.
The AR pattern does not dictate how the method signatures should be constructed, so whether the coupling turns out to be tight or loose is entirely down to the implementor.
Because these entities represent database records, object modifications are 1:1 mirrored to database record modifications.
This statement is absolutely correct. When writing a database application to maintain the data on numerous entities it would be a good idea to mirror the way in which the DBMS maintains that data. It splits the data into tables and columns, and each table can be manipulated using the same CRUD methods.
$data = new Data;
$data->a = 5;
$data->b = "yyz";
$data->save(); // results in immediate and synchronous execution of an INSERT query
Here I can see several glaring mistakes. Why isn't he using the insert() method which was mentioned in Martin Fowler's description of the Active Record pattern? Where is the validation being performed?
This may be the way that Shawn McCool was taught to write code, but I would never use this approach in a month of Sundays. Why not? Because it automatically produces tight coupling. I can achieve exactly the same outcome with a different technique, based on wrapper methods, which produces loose coupling.
The method I use does not have these issues and is far more flexible:
$dbobject = new $table_id;
$fieldarray = $dbobject->insertRecord($_POST);
if (empty($dbobject->errors)) {
    $dbobject->commit();
} else {
    $dbobject->rollback();
}
This code exists in the Controller for an ADD1 pattern where the contents of $table_id is supplied from a component script and can be any table in the database. The insertRecord() method performs the load(), validate() and store() with a single call, but will only update the database if there are no validation errors.
As these objects represent database records:
Incorrect. An entity represents an entire table, not just one row in a table. While it is true that some operations deal with one record at a time there may be other operations which deal with multiple records.
- With few exceptions, changes to these entities cannot be made in isolation from database changes.
- The inverse, with few exceptions, schema or semantics changes to the backing database tables cannot be done in isolation from changes to the entities.
While it is true that each class which handles the business rules for a table should be kept synchronised with the structure of that table, how that is handled is up to the skill of the individual implementor. It is important to note that the structure of the table itself - the columns which it contains and the type and size of each column - forms part of those business rules. There are two ways in which a table class can be kept synchronised with the structure of the table which it represents:
Option #2 was a logical choice for me as I had already created a similar procedure called COPYGEN way back in the 1980s in my COBOL days. However, for my RADICORE framework I required a much more sophisticated solution so instead of a quick-and-dirty program I designed and built my own Data Dictionary which contained support for the following activities:
Later on I added a 4th option to create the scripts for individual tasks. This was made easy by the fact that I had already created a library of reusable Controllers, so all I had to do was create a process where the user could link a Transaction Pattern with a database table and press a button to generate the necessary scripts as well as performing the associated database updates.
This section contains the following statement:
There are a few flavors of implementation. In one approach, all database columns must be specified within the entity class. In another, column listings aren't used, and by convention if a public field is written to, the ORM will attempt to add it to generate queries.
Here he is identifying two choices when in fact there is a third - instead of having a separate property for each column defined within the class I have a single property for all table data. In this way I do not have to handle an entity's data one column at a time, I can insert or extract ALL the data in a single array with a single standard method call. I can change a table's structure at any time to add or remove columns, and I don't have to change any method signatures which means I don't have the ripple effect which is an unavoidable consequence of tight coupling.
As well as having a single $fieldarray property to hold all the data I also have a $fieldspec array which holds all the specifications for the columns within that table. By doing this it has enabled me to create a standard validation object which can validate all user input before it gets written to the database. Custom validation rules can be added to a Model class using any of the predefined "hook" methods which are a consequence of implementing the Template Method Pattern.
As for using that abomination called an Object-Relational Mapper (ORM), these are the biggest cause of inefficient SQL queries on the planet. As far as I am concerned any database programmer who does not know SQL is as useless as a web programmer who does not know HTML. It is easier to learn how to write efficient SQL queries and write them yourself than it is to learn how to get your ORM to write them for you.
For Active Record implementations that do not require column listings, all fields become a part of the entity's public API. A database schema update can modify an entity's public API without deploying a code change
This does not compute! If a database schema update causes a corresponding change to a public API then it MUST cause a code change in every place where that API is used. If a table class contains a separate property for each column then any change to a table's columns must have a corresponding change to that list of properties. This is called the ripple effect which is an unavoidable consequence of tight coupling.
This section contains several statements which I find questionable, such as:
Data access occurs through publicly accessible properties. These are usually wired through "magic catch-all functions" to access data that is stored in a protected array.
There is no rule which says that each table column MUST have its own property in a table class, so I do not follow that rule, and I do not encounter the problems faced when following that rule.
Making the entity's data directly accessible through public properties prevents the leverage of encapsulation as a consistency mechanism.
I do not understand what is meant by that statement. The use of getters and setters has nothing to do with either encapsulation or data consistency.
Idiomatically, features are implemented by running entities through service objects that manipulate their fields. Much of the use-case behavior is implemented externally to the entities in these service objects. These models are often called "anemic models" or "data models" because they contain little logic of their own.
If you think that the AR pattern forces the creation of anemic domain models then you are very much mistaken. Such models have data but no behaviour, but Martin Fowler's description of this pattern specifically states that "An object carries both data and behavior". The RADICORE framework contains a mixture of Models, Views, Controllers and DAOs as shown in Figure 1. It is only the Models, being stateful entities, which contain business rules which can change that entity's data. All other objects are stateless services which are devoid of business rules and which cannot affect the consistency of an entity's data.
Without encapsulation, there is not a clear boundary between the inside and the outside of the entity. In order to comprehend and safely refactor an AR entity, it is not enough to look at the entity's code. Instead, one must audit all components which access these public properties. Only this provides the necessary context to safely modify the entity.
Encapsulation is the act of bundling together an entity's data and behaviour into a single class, and this is what the AR pattern does. There are no business rules in any of the other stateless service objects, so there most definitely is a clear boundary between entities and services.
Application code which directly couples to the database through these entities is directly coupled to the database schema. Any application code that interacts with the leaked relational schema is now coupled directly to the database structure. This includes use-case implementations whether they be simple CRUD controllers in smaller applications or more well-defined service layer interactions.
The idea that having the relational schema "leaked" into a Model "couples" that Model directly to the database structure I consider to be an example of bad phraseology. The term "leaked" implies that it is an accident which should not happen. The term "coupled" should only be used when describing how one module interacts with another through its API. The DBMS, just like the web browser, is not a module within the application, it is an external entity.
The fact that each Model has intimate knowledge of the structure of the database table which it represents is not something to complain about as it does not cause a problem - at least not in my world. That knowledge has to exist somewhere within the application otherwise it would not be able to communicate with those tables, and putting it somewhere other than in the Models would be the cause of problems.
I disagree that CRUD controllers can only be used for simple use cases, and that anything more complicated requires interactions with separate service layer components. It has been my experience that every use case in a database application starts off with the same basic functionality - it performs one or more CRUD operations on one or more tables. A simple use case can become more complicated with the addition of extra business rules. How these business rules are defined and then processed is an implementation detail which is down to the individual developer. Personally I have found the use of the Template Method Pattern and "hook" methods to be very effective. I don't have a separate service layer as all business/domain knowledge exists within the business/domain layer. All standard behaviour is inherited from an abstract table class which means that each concrete table subclass need only contain the processing which is unique to that table.
In normalized databases, relationships between entities are specified using foreign keys. The type of relationship is determined by the location of the foreign key and whether the related entity comes singularly or as a collection.
Relationships in a database are simpler than that. A table is initially defined as a stand-alone entity which means that it can be operated on directly without being forced to go through another table. Each relationship is between just two tables - the parent and the child - where the child has a foreign key which links to the primary key of the parent. Two variations are supported:
Note also that a parent table can be related to any number of child tables, and a child table can be related to any number of parent tables. A table can be the child in one relationship and the parent in another. Note also that a table can be related to itself, or it can have more than one relationship to the same table. If there is a hierarchy of relationships, what may be termed as a "collection" or "aggregation", this cannot be defined in the database structure - each table is only aware of its immediate parents or children, not its grandparents or grandchildren.
To mirror this concept, Active Record entities define relationships by specifying the relationship type and which entity class will represent the related type.
In my implementation no table class contains any code to deal with any relationship, it merely contains metadata which identifies what relationships exist. This metadata is held in two arrays:
It is the framework itself which processes the contents of these arrays as and when necessary. For example, when a record is deleted part of the standard validation is to iterate through the $this->child_relations array and take action according to the value of type:
Access to these entities is typically implemented using the same direct property access pattern that's used for modifying fields.
foreach ($invoice->lineItems as $lineItem) {
    $lineItem->description = "New description";
}
Just because this is the way in which it is typically done does not mean that it is the way in which it must or should be done. No database ever forces you to go through a parent table in order to access a child table, so why on earth should you build such a mechanism into your software? A table may have more than one parent, so which table should you go through? What happens if you want to access a table directly without going through any of its parents? In order to deal with each possibility you should recognise that each possibility requires its own use case which, in my implementation, is built on a different Transaction Pattern. This provides a multitude of solutions:
Note also that I have additional patterns to deal with many-to-many relationships.
Anywhere that you have access to an entity, you have access to its relationships. Since each entity serves as a locator for other entities, they function similar to service locator pattern, in which it becomes easier to include dependencies without concern for the design and more difficult to audit them.
This is only true if you deliberately build into an entity the mechanism to access other entities which are related to it. There is no rule which says that you should do that, so why do that and then complain about it? I have never used an entity in my software as a locator for other entities because I never have to use a table in a database as a locator for other tables. I write my software to mirror how the database works, and no database works that way. By avoiding the use of an artificial construct I avoid the problems that the construct produces. Trying to match something which is artificial with something which is real does not sound like a good idea to me.
Database normalization often inverts the flow of knowledge.
Only if you code it that way.
An Invoice object contains a collection of line items. The invoice holds the reference to the line items. The line items do not have a reference to the invoice.
In a normalized database, this is reversed. Line items hold references to the invoice and the invoice itself is unaware of the line items.
This is not an "either-or" situation. In my implementation it is possible to make any table aware of any relationships with other tables. The Invoice class has an entry in the $child_relations array for the lineItem table, and the lineItem class has an entry in the $parent_relations array for the Invoice table. Note that there is no processing within any table class to handle any of these relationships as that can be done by framework code as and when necessary.
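As a rough illustration of relationships being held as nothing more than metadata, here is a minimal sketch. It is not the actual framework source: the base class `TableBase`, the array keys (`child`, `parent`, `type`, `fields`) and the `RES` code are all assumptions made for this example.

```php
<?php
// Hypothetical base class: it holds relationship metadata only, no relationship logic.
abstract class TableBase
{
    public string $tablename = '';
    public array $child_relations = [];   // tables for which this table is the parent
    public array $parent_relations = [];  // tables for which this table is the child
}

class Invoice extends TableBase
{
    public function __construct()
    {
        $this->tablename = 'invoice';
        // Key names and the 'RES' code are assumptions made for illustration.
        $this->child_relations[] = [
            'child'  => 'line_item',
            'type'   => 'RES',
            'fields' => ['invoice_id' => 'invoice_id'], // parent PK column => child FK column
        ];
    }
}

class LineItem extends TableBase
{
    public function __construct()
    {
        $this->tablename = 'line_item';
        $this->parent_relations[] = [
            'parent' => 'invoice',
            'fields' => ['invoice_id' => 'invoice_id'], // child FK column => parent PK column
        ];
    }
}

$invoice  = new Invoice();
$lineItem = new LineItem();
echo $invoice->child_relations[0]['child'], PHP_EOL;    // line_item
echo $lineItem->parent_relations[0]['parent'], PHP_EOL; // invoice
```

Neither class contains a method which reads or writes the other table. Each one simply declares what it is related to, leaving any processing to whatever code reads the arrays.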
These two models exist to solve for different concerns. The purpose of the normalized database is to efficiently store related data. The purpose of the code is to maintain encapsulation and manage coupling through knowledge.
You are misusing the word coupling again. The purpose of each table class is to maintain the data on its associated database table by issuing SQL queries using validated data, and for this to happen without any issues the two must work together in unison. The table class is given data which it processes according to its internal business rules, then it sends that data, via the DAO, to the database where the results are stored. The fact that one computes results while the other stores them does not signify different concerns, but two parts of the same concern, the same user transaction.
By inviting the semantics of the normalized database schema into our object models, we've traded away software design principles that enable components to evolve independently.
But why should your software component evolve independently from the database table which it represents? Each table class should be built to handle all the business rules for a single database table, so why would you want to change one of these and force it to be out of step with the other? The table class should always be kept synchronised with the table's structure, otherwise problems will appear. You cannot add or remove a column in either the software or the database without making a corresponding change in the other, otherwise the generated SQL query could fail. The ability to "evolve independently" is nothing more than a red herring as they must always evolve together.
Rarely do developers make the choice to trade away the benefits of encapsulation.
That depends on their interpretation of encapsulation. In my world it means "the act of placing data and the operations that perform on that data in the same class", where I have a separate class for each database table. This means that I do not have a class which is responsible for more than one table, nor do I split a table's data or operations across multiple classes.
Database normalization concerns are leaked into the application's object models and results in multiple opposing idioms, one in which we isolate and manage knowledge dependencies and one in which we turn the object inside out and expose its inner-workings to its environment.
There you go using that word "leaked" again. If a plumber installs pipes that leak then surely he is at fault. If a programmer writes code that leaks then surely he is at fault. There is no rule which says that you have to write code which causes such problems, so I humbly suggest that you change tack and start writing code which solves problems instead of creating them.
Try this: Create a new project and add some business functionality intentionally without any persistence concerns. Write it using object-oriented programming. Once it's all done and tested, create a repository interface and write an implementation.
In my experience that would be a recipe for disaster, so it is not something that I would ever attempt. You cannot possibly write and test code which writes to a database table without having a physical database to write to in the first place. That would be like teaching someone to swim without putting them in the water. I always start with a database design which has resulted from an analysis of requirements, then I build the components to work with that design. In that way I can detect any problems as early as possible. If the database needs to change, then I change it and use my Data Dictionary to keep the software in sync, as explained in The Code is the Database is the Code.
I assert that coupling is directly related to the cost of change. If coupling between two components is non-existent, then we're not accruing additional costs related to a change to the other.
If the coupling between two components is non-existent then there is no call from one of those components to the other, there is no interaction between them. In this case you can change one of the components without having any effect on the other. If there is coupling then it can be classified as either tight or loose, where loose coupling is preferred as it does not produce the ripple effect when a change to one component causes a corresponding change to the other.
The more coupling between components, the more possibility that changes in one will impact others.
In software coupling only exists when one module calls another. If you make a change to an API (a method signature) then you are forced to make a corresponding change in every place where that method signature is called. This is known as the ripple effect which you should take steps to minimise as much as possible. Not only is coupling affected by the complexity of each method call, it is also affected by the number of method calls. The more method calls you have to make in order to obtain a particular result then the more code you will need to change to deal with the ripple effect.
Additional design and development concerns don't only exist once, they exist whenever the relationships between the two components must be considered. In this way, more highly coupled components are likely to become more expensive to change as a function of their coupling. The cost is likely to accrue repeatedly throughout the future change of these components.
The risk of increased costs rise as these components become additionally coupled to others. Limiting coupling can limit cost.
If you look at Figure 1 you will see a simplified view of the RADICORE architecture which shows the interaction (coupling) between the various components. The various method calls are as follows:
There are a large number of Controllers and Models, but smaller numbers of DAOs and Views, so the total number of method calls in the application can run into the tens of thousands. In order to make any changes as easy and inexpensive as possible the aim should be to avoid tight coupling in favour of loose coupling. I have achieved this by taking the following steps:
In this way I have managed to standardise all method calls and make them as loosely coupled as physically possible. Over the years I have modified numerous framework components but with minimal impact on application components.
If I were to create Models with unique methods I would also have to create separate Controllers for each Model in order to call those unique methods. Each Controller would then be tightly coupled to a single Model. By having unique methods I would lose the power of polymorphism and lose the ability to employ Dependency Injection.
One of the most common changes I make to application components, apart from changing the business rules which are confined to the various "hook" methods, is to change the structure of a database table. While this may be a daunting and expensive exercise for some, for me it is a walk in the park. After changing the database structure I import the changes into my Data Dictionary and then export them to the application. I do not have to change any APIs, nor do I have to change any class properties. The only time I may need to change any Models is if a changed column is referenced in any "hook" methods. If any HTML forms are affected then all I need do is change a small screen structure script.
Coupling is necessary for producing value. It's possible to decouple components to a degree in which we lose value. Imagining and implementing models of cohesion that balance this concern is a critical and central aspect of software design.
This is why in my design I have taken steps to maximise my levels of cohesion and to minimise my levels of coupling. I have achieved this using my own version of "best practices".
The idea that you can decouple components is completely wrong. If ModuleA calls ModuleB then they are coupled, with the only variation being whether the coupling is tight or loose. The only way to remove the coupling is to remove the call, but then the code won't work. When some people use the term "decouple" what they actually mean is calling polymorphic methods using Dependency Injection. This simply means that you call a known polymorphic method on an unknown object where the identity of that unknown object is not supplied until runtime. Whichever way you look at it, if you call a method on an object then you are coupled to that object whether you like it or not.
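Calling a known polymorphic method on an unknown object can be sketched as follows. This is a simplified illustration only: the class `AbstractTable`, the function `addController` and the in-memory row storage are assumptions for this example, and the method names are used here purely as examples of a shared API.

```php
<?php
// Hypothetical abstract class supplying the method names shared by every table class.
abstract class AbstractTable
{
    protected array $rows = [];

    // The signature never names a column, so it is identical for every table.
    public function insertRecord(array $fieldarray): array
    {
        $this->rows[] = $fieldarray; // stand-in for a real database insert
        return $fieldarray;
    }

    public function getData(): array
    {
        return $this->rows;
    }
}

class Invoice extends AbstractTable {}
class LineItem extends AbstractTable {}

// A reusable "add" controller: it calls a known method on an unknown object.
// Which table class it receives is not decided until runtime.
function addController(AbstractTable $model, array $input): array
{
    return $model->insertRecord($input);
}

$models = ['invoice' => new Invoice(), 'line_item' => new LineItem()];
$table  = 'line_item'; // in a real application this would come from the request
addController($models[$table], ['invoice_id' => 1, 'description' => 'Widget']);

echo count($models['line_item']->getData()), PHP_EOL; // 1
echo count($models['invoice']->getData()), PHP_EOL;   // 0
```

The controller is still coupled to the model because it calls it, but the coupling is loose: the same controller works unchanged with any class that inherits the shared methods.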
My assertion is that the penalty increases with the addition of coupling between the database and the application code in inverse proportion to the code's cohesion.
Your assertion is incorrect. There is no coupling between the database and the application code as coupling only exists when one module/component calls another. The application contains code which communicates with the database by constructing and issuing SQL queries, and that code may reside inside the same component as the business rules, the Model, or it may reside in a separate component, the Data Access Object (DAO).
Coupling can only exist when one module calls another, such as when a Controller calls a Model, or a Model calls the Data Access Object. If there is no inter-module call there is no coupling. This coupling can either be tight or loose, where loose coupling is considered to be better. If you have a separate Controller for each Model then those two are tightly coupled as changes to any APIs require changes in both components. In the RADICORE framework there is no tight coupling as any Controller can be used with any Model, and there is no ripple effect when a table's structure is altered as none of the APIs, as shown in common table methods, refer to any columns by name.
Cohesion relates to the contents of a module, the functional relatedness of the contents of a module. If the functions within a module are not related, or functions which are related are not in the same module, then that is an example of low cohesion. The RADICORE framework achieves high cohesion by grouping functionality into the modules shown in Figure 1. Each of those modules has a specific purpose, and all the functions related to that purpose exist within that module.
In a scenario in which highly cohesive components are coupled, and component A needs to change, there is a higher likelihood that component B needs to be changed for reasons of essential complexity.
In a scenario in which components which have low cohesion are coupled, and one needs to change, there is a higher likelihood that the other will need to be changed for reasons of accidental complexity.
If a change to component A requires corresponding changes to component B, this is called the ripple effect and is caused by tight coupling. It has absolutely nothing to do with the level of cohesion, which describes the contents of each of those components and not how they interact. The ripple effect can only be avoided by having loose coupling.
He is also using terms such as "essential complexity" and "accidental complexity" without explaining the difference. I may be naive, but surely if there is any complexity in the code it is because the person who wrote the code put it there. If you follow the YAGNI principle then you only add complexity when it is actually needed. If any complexity suddenly appears by accident it is still down to the person who wrote the code.
This is one of the many reasons why cohesion is such an import aspect of boundary design. Components within a boundary are more highly coupled than components outside the boundary.
This statement makes no sense. If the contents of a component are functionally related then that component is said to have high cohesion. If that functionality is spread across multiple components then those components exhibit low cohesion. A boundary is what separates one component from another, and if one component calls another then that boundary is being crossed and the two components are coupled. This indicates that when related functionality is spread across a number of components with low cohesion there will be an increase in the volume of inter-component calls and therefore an increase in inter-component coupling. The volume of coupling does not contribute to the ripple effect unless that coupling is tight instead of loose. The way to reduce coupling is to produce a small number of components with high cohesion instead of a large number with low cohesion. The way to reduce the ripple effect is to make the coupling as loose as possible.
The RADICORE framework produces coupling which is as loose as it could possibly be, which enables its pre-built Controllers and Views to be linked with any Model by using common table methods which are inherited from an abstract class.
Reducing coupling between components with low cohesion is an effective way to reduce paying additional costs because of accidental complexity.
Reducing the volume of coupling is not the objective - it should be making the coupling as loose as possible. Dealing with low cohesion is not the objective - it should be making the cohesion as high as possible. If you build complexity into your software by designing components with low cohesion and high coupling then this is not accidental, it is deliberate.
Active Record encourages high coupling, low cohesion through a combination of direct property access and relationships.
I disagree entirely, for the following reasons:
AR entity relationships are entities. All entities have active database handles and can execute queries.
Here the author is mixing up his terminology and tying himself in knots. In relational theory a "relation" (a word which I never use) is known as a table while a "relationship" is a link between two tables. An Active Record (AR) entity is not a "relationship", it is a "table". Each class constructed using the AR pattern exists in order to represent the needs of a single database table. If that table has relationships with other tables then the details of those relationships can be stored in the class as nothing more than metadata. This metadata can then be handled by code in the framework and not custom code in each class.
So where do the business rules live?
The business rules for each table should exist nowhere but within that table's class, as described in Martin Fowler's definition. No data should be added to a database table until all the business rules for that table have been passed. Note that the table's structure - the size and data type for each column - is part of those business rules, so it is imperative that each class has accurate details regarding the structure of the table which it represents. In this way all user input can be checked to ensure that when an SQL query is constructed and executed it will not fail because a value for a column is not consistent with that column's specifications in the database.
In my framework I deal with data validation in two different ways - Primary Validation is handled automatically by the framework using the built-in validation object while Secondary Validation is handled using "hook" methods within each table subclass.
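Checking user input against a table's column specifications before any SQL is built might look something like the sketch below. It is not the framework's validation object: the `$fieldspec` layout, the function `validateAgainstSpec` and the error messages are assumptions made for illustration.

```php
<?php
// Hypothetical field specifications, as might be exported from a data dictionary.
$fieldspec = [
    'invoice_id'  => ['type' => 'integer', 'required' => true],
    'description' => ['type' => 'string', 'size' => 20, 'required' => true],
    'status'      => ['type' => 'string', 'size' => 12],
];

// Check each value against its column specification.
// Returns an array of error messages keyed by field name (empty when valid).
function validateAgainstSpec(array $fieldarray, array $fieldspec): array
{
    $errors = [];
    foreach ($fieldspec as $name => $spec) {
        $value = $fieldarray[$name] ?? null;
        if ($value === null || $value === '') {
            if (!empty($spec['required'])) {
                $errors[$name] = 'This field is required';
            }
            continue;
        }
        if ($spec['type'] === 'integer'
            && filter_var($value, FILTER_VALIDATE_INT) === false) {
            $errors[$name] = 'Value must be an integer';
        } elseif ($spec['type'] === 'string' && isset($spec['size'])
            && strlen((string) $value) > $spec['size']) {
            $errors[$name] = "Value exceeds {$spec['size']} characters";
        }
    }
    return $errors;
}

$errors = validateAgainstSpec(
    ['invoice_id' => 'abc', 'description' => str_repeat('x', 25)],
    $fieldspec
);
print_r($errors); // one message for invoice_id, one for description
```

Because the routine is driven entirely by the specifications, the same code can serve every table. Only rules which cannot be expressed as column specifications would need custom code.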
Some programmers think that all data should be validated BEFORE it gets loaded into an object. This is wrong as it would violate the principle of Encapsulation which states that ALL the data and ALL the operations which act on that data should be in the SAME class. Putting some of the operations (business rules) in other classes would violate this principle.
The article contains code samples which generate various complaints, but I won't duplicate that code here, just my criticisms of that sample code.
In this example, the Invoice object has reference to the line items. This is the inverse of the relational schema that is used to store the data.
In the DBMS the existence of a parent-child (one-to-many) relationship is indicated by having a foreign key on the child table whose column(s) point to the primary key column(s) on the parent table. There is nothing in the definition of the parent table which indicates that this relationship exists. In the software it is necessary to go through the parent object in order to provide the foreign key for the child, but it is the implementation of the "go through" part which most OO fanatics get wrong. They implement this by going into the parent object, and while still within the parent object have custom code which accesses the child object. The more child objects there are the more custom code is required.
This is not how it is done in RADICORE. I do not need any code in any parent object to handle the communication with a child object. Instead I go into the parent object, obtain the primary key, then leave that object. I then convert that primary key into the foreign key for the child, then I call the child object with that foreign key. This is handled by creating a user transaction using the LIST2 pattern. I can use this pattern for every parent-child relationship in the application without inserting any special code into any parent object. All that is required is an entry in the $child_relations array for the parent object. This information is extracted from the Data Dictionary and passed to the parent object via the table structure file.
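The step of converting a parent's primary key into a foreign key for the child could be sketched like this. The function `parentKeyToChildWhere` and the shape of the `fields` mapping are hypothetical, and the quoting is deliberately simplified.

```php
<?php
// Build a WHERE string for the child table from a selected parent row.
// $fields maps each parent primary-key column to its child foreign-key column
// (the shape of this mapping is an assumption made for illustration).
function parentKeyToChildWhere(array $parentRow, array $fields): string
{
    $parts = [];
    foreach ($fields as $parentColumn => $childColumn) {
        // Quoting is simplified here; real code should escape via the database driver.
        $value   = str_replace("'", "''", (string) $parentRow[$parentColumn]);
        $parts[] = "$childColumn='$value'";
    }
    return implode(' AND ', $parts);
}

$invoiceRow = ['invoice_id' => 42, 'status' => 'In progress'];
$relation   = ['child' => 'line_item', 'fields' => ['invoice_id' => 'invoice_id']];

$where = parentKeyToChildWhere($invoiceRow, $relation['fields']);
echo $where, PHP_EOL; // invoice_id='42'
```

The parent object supplies nothing but its key values. The resulting string is then handed to the child object, so neither class needs to know how the other works.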
In order to deal with the relationship between the Invoice object and the lineItem object I would create an instance of the LIST1 pattern to deal with the Invoice object and an instance of the LIST2 pattern with the Invoice object as the parent and the lineItem object as the child. When I activate the ADD2 task from within the LIST2 screen that will automatically pass the correct foreign key to the lineItem object.
Some OO fanatics will assume that it is wrong to create a row in the Invoice table without any corresponding rows in the lineItem table, but that is not how databases work. The Invoice table contains a status value that has values such as "In progress", "Complete", "Authorised", "Rejected" and "Paid". There are two business rules within the Invoice object:
- ... lineItem rows.
- ... lineItem rows.
A rule such as "an Invoice cannot have more than one discounted lineItem" would therefore go in the lineItem class when attempting to add a discount.
When deleting a row from the Invoice table there are two possible rules which can be enforced using the value for type in the $child_relations array:
It is important to note here that no table class contains any code to deal with any relationships in any way. All it has is metadata which identifies those relationships, and this metadata is processed by the framework as and when necessary.
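A framework routine which reads this metadata when a parent row is deleted might be sketched as follows. This is an assumption-laden illustration rather than the framework's code: the function `checkDelete`, the type codes `RES` (refuse the delete) and `CAS` (cascade the delete) and the row-counting callable are all hypothetical.

```php
<?php
// Hypothetical framework routine: decide what a delete of the parent row implies.
// $countChildRows is a callable returning the number of matching rows in a child
// table; it stands in for a real database query.
function checkDelete(array $childRelations, callable $countChildRows): array
{
    $actions = [];
    foreach ($childRelations as $relation) {
        $count = $countChildRows($relation['child']);
        if ($count === 0) {
            continue; // no child rows, so nothing to do for this relationship
        }
        switch ($relation['type']) {
            case 'RES': // restricted: refuse the delete while child rows exist
                throw new RuntimeException(
                    "Cannot delete: {$count} row(s) exist on {$relation['child']}"
                );
            case 'CAS': // cascade: child rows are deleted along with the parent
                $actions[] = "delete {$count} row(s) from {$relation['child']}";
                break;
        }
    }
    return $actions;
}

$childRows = ['line_item' => 3];
$counter   = fn(string $table): int => $childRows[$table] ?? 0;

$actions = checkDelete([['child' => 'line_item', 'type' => 'CAS']], $counter);
echo $actions[0], PHP_EOL; // delete 3 row(s) from line_item

try {
    checkDelete([['child' => 'line_item', 'type' => 'RES']], $counter);
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL; // Cannot delete: 3 row(s) exist on line_item
}
```

The routine never mentions Invoice or lineItem by name, so it can be written once and applied to every table which declares child relationships.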
The author then asks three questions about potential problems with the various sample implementations:
- The idiom of Active Record perceives all entities to be equivalent. None are children and none are parents. Will your developers know in which cases you broke with idiom in order to guard business rules in a model?
In the RADICORE framework all tables are equal in that they are all subject to the same CRUD operations as every other table, which is why they all inherit from the same abstract class and implement the same set of common methods. If any relationships exist then entries will appear in one or both of the $parent_relations or $child_relations arrays. Particular relationships can be dealt with by creating tasks from particular Transaction Patterns. The only difficult part is choosing the right pattern. Note that you may process the same Model with as many different patterns as you see fit.
- Would you even guard the business rule in a model, or maybe would you make a separate service object which represents the behavior of "adding a line item" which itself has the rule?
The business rules for each table always go into the Model which deals with that table. Each table will have its own family of tasks, so "Add a lineItem" is a task just like "Add an Invoice". Parent-child relationships are handled by the framework with specific Transaction Patterns, so no custom code is required in any Model. The idea of putting business rules in service objects does not fit Martin Fowler's description of the AR pattern.
- Will your developers have the system knowledge and the discipline not to implement code that bypasses a single method that guards business rules?
When business rules are inserted into the correct "hook" methods they will be automatically processed by the framework. The developer does not have to write code to call any of these methods, so these methods cannot be bypassed in any way.
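The way in which an inherited method can call "hook" methods automatically is an example of the Template Method pattern, which can be sketched as below. The names `preInsertHook` and `store`, and the use of `final`, are assumptions made for this example and are not taken from the framework.

```php
<?php
abstract class AbstractTable
{
    public array $errors = [];

    // The method which the Controller calls. It is marked final in this sketch,
    // so a subclass cannot replace it and skip the hooks.
    final public function insertRecord(array $fieldarray): array
    {
        $this->errors = [];
        $fieldarray = $this->preInsertHook($fieldarray); // custom rules run here
        if ($this->errors === []) {
            $fieldarray = $this->store($fieldarray);
        }
        return $fieldarray;
    }

    // Empty by default; a table class overrides it only when it has rules to add.
    protected function preInsertHook(array $fieldarray): array
    {
        return $fieldarray;
    }

    // Stand-in for the call to the data access object.
    protected function store(array $fieldarray): array
    {
        $fieldarray['stored'] = true;
        return $fieldarray;
    }
}

class LineItem extends AbstractTable
{
    protected function preInsertHook(array $fieldarray): array
    {
        if (($fieldarray['quantity'] ?? 0) <= 0) {
            $this->errors['quantity'] = 'Quantity must be greater than zero';
        }
        return $fieldarray;
    }
}

$lineItem = new LineItem();
$result   = $lineItem->insertRecord(['quantity' => 0]);
var_dump(isset($result['stored'])); // bool(false)
print_r($lineItem->errors);
```

The developer only fills in the hook. The call to it is made by the inherited method, so any caller which inserts a record must pass through the rule.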
Once a developer has completed a program it is up to the analyst who produced the program specification to test that the program contains a proper implementation of that specification.
Business rules for a single entity can be validated by that entity at assignment or before save. In this way, the rules related to an entity's properties are located cohesively within the entity.
This is actually correct. I also note the phrase "located cohesively within the entity", which means that you agree with me that all the business rules for an entity should be defined within the class which is responsible for that entity. This is different from some people who tell me that, according to their definition of the Single Responsibility Principle (SRP), each business rule provides a different "reason for change" and therefore should be in its own class.
- AR entities are responsible for ONLY their own consistency.
- They have unrestrained control to modify database state.
Point #1 is correct.
Point #2 needs to be qualified by saying that each AR entity can only change the state of that entity provided that all business rules are obeyed.
Aggregates are nothing more than collections of parent-child relationships, and each individual relationship is handled in the same way as described in Referential Integrity.
Consistency larger than a single model (aggregate consistency) is not part of this design paradigm. It must be added by convention by the engineering team and the convention must be followed by each member in all circumstances. Rather than being able to model the system to ensure consistency, it must be implemented by convention.
It is up to each team to decide how best to deal with relationships between tables. You can either have specialised code within each Model, with specialised methods, or you can do what I do and use a framework that provides standard mechanisms for dealing with relationships. The way that the framework deals with different relationships then becomes the "convention".
When designing object-oriented software, entities often arrange themselves into natural hierarchies with parents who manage consistency across other objects. These aggregate "root" entities (like Invoice in our example) have a natural hierarchical position above their children (Line Items).
As explained in Referential Integrity I do not have custom code in any parent class which is responsible for reading records from a child table. Instead I have specialised Controllers which deal with each relationship as and when required. Imagine a collection of tables as shown in Figure 3:
Figure 3 - a compound "order" object
Some people may think that this collection represents a single entity, an Order, therefore would require a single Model class in order to deal with the relationships between each of the components. They may even think that this single Model requires a single Controller to deal with all the different relationships and all the different user transactions (use cases). That is far too complicated for me. Each of those 11 entities is a separate table in the database with its own structure and its own business rules, which is why I always create a separate Model class for each table and then build a number of tasks (use cases) to manipulate each Model. If there is a parent-child relationship between two tables, and it should not be possible to manipulate the child without going through the parent, then I create a task (use case) based on a Transaction Pattern which deals with that relationship. Each of the 11 tables has its own family of forms as follows:
The LIST1 pattern can be activated from a menu button. It does not need any selection criteria to be supplied in the $where string, so can be used to display the entire contents of the table using whatever default ORDER BY has been specified.
The LIST2 pattern can be activated from a navigation button within the LIST1/2 screen for the task which deals with the parent/outer entity. One or more entries on the parent screen must be selected so that their primary keys can be passed down to the LIST2 screen. The remaining ADD, ENQUIRE, UPDATE, DELETE and SEARCH tasks can be activated from a navigation button within the LIST1/2 screen for the same family.
The ADD1 screen, which is activated from a LIST1 screen, does not need anything passed to it in the $where string. The ADD2 screen, which is activated from a LIST2 screen, does require a non-blank $where string as it is used to attach a child record to its parent. For example, you cannot add an ORDER-ITEM to an ORDER-HEADER unless you supply the primary key of the ORDER-HEADER record.
While I can create an entry on the ORDER_HEADER table at any time I cannot create entries on the other 10 tables without specifying the foreign key to its immediate parent. In this situation I start at the LIST1 task and then use the relevant navigation button to make my way down the hierarchy of tasks until I reach the one that I want. Note that a valid order need only contain one or more order items, all the other tables are entirely optional.
When writing object-oriented software, this relationship manifests naturally.
The idea of an object which is an aggregation of several objects which should be treated as a single entity with a single controller is an alien concept to me, just as it is alien to every DBMS that I have ever used. All I see is a collection of independent tables which are all subject to the same set of CRUD operations and which all require their own set of tasks (use cases) to perform those operations. When some of those tables exist in a parent-child relationship all that is necessary is to provide a value for the foreign key on the child table. Each table class contains the code to deal with its own business rules while the code to handle parent-child relationships is handled by the framework.
When writing software with Active Record, the naturally occurring and logically coherent approach is entirely bypassed.
Which logically coherent approach is that? I have seen several examples of how other developers have dealt with relationships, but I have never seen any which I would call either "logical" or "coherent", which is why I invented my own.
Instead, each line-item is able to be individually queried, manipulated, and saved outside of the context of its aggregate root, invalidating the root's ability to maintain the consistency of the aggregate.
That is precisely how a relational database works. Each table is a separate entity with its own set of business rules and subject to its own set of CRUD operations. The idea of having to go through an aggregate root in order to access a member of that aggregation does not exist in any DBMS, so why should it exist in the software? All the DBMS requires is that for each parent-child relationship you provide valid values for the foreign key on the child, and how this is achieved is an implementation detail which will be handled differently by different developers. In my world this is handled by the framework, as described in Referential Integrity.
Because the Active Record architectural pattern has no answer for maintaining aggregate consistency, it is poorly suited for systems which benefit from consistency constraints.
Just because the AR pattern does not specify how referential integrity should be done, just as it does not specify how each SQL query should be built, does not mean that it cannot be done. Remember that this is a design pattern which does not specify any particular implementation, so how it is implemented is entirely up to the individual developer. Some will want to have special code within each table class while others will have standard and reusable code in their framework.
Rather than letting individual developers build user transactions using their own conventions, they should use a framework, such as RADICORE, which handles standard processing in a pre-determined and consistent fashion, thus leaving the developers to concentrate their efforts on the business rules. Dealing with aggregations is no more complex than dealing with a number of parent-child relationships, and each parent-child relationship can be dealt with using the relevant Transaction Pattern, as described in Referential Integrity.
Domain modeling is the practice of writing code that matches a conceptual model in structure and function. By creating software with the same shape as our understanding of the domain concepts, when a change occurs in our concepts, we benefit from the change in the software being of proportionate size.
- A small change in our domain understanding results in a relatively small change in the domain model.
- A large change in our domain understanding will result in a larger change in the domain model.
This leads to beneficial outcomes because large conceptual changes generally always require large changes to software.
I agree so far. The domain in which I operate is that of applications which use HTML forms at the front end and a relational database at the back end, with software in the middle to handle the business rules. That means that every entity in the middle Business layer models a single database table. By having a separate class for each table it means that if there is a change to either the structure or the business rules for that table then you only have to change that one class. Adding or removing tables is not a problem. Adding or removing columns is not a problem. Adding or removing relationships is not a problem.
But small conceptual changes may require even massive changes to code, depending on its structure. Transaction scripts are notorious for needing expensive changes to account for small shifts in understanding.
No sensible OO programmer should ever use a transaction script. The main idea behind OOP is that you break the processing down into different components with their own responsibilities similar to what is shown in Figure 1. This increases the amount of reusable code at your disposal and therefore decreases the cost of maintenance.
Domain modeling requires tools that enable the creation of software in the shape of concepts. Some programming languages offer more tools than others.
As I recognised that I would need a separate class file for each table I devised a mechanism for creating those files using data obtained from the database's INFORMATION_SCHEMA, similar to what I had done 20 years earlier in my COBOL days. I later enhanced this Data Dictionary with the ability to generate tasks (use cases) for each table from my library of Transaction Patterns.
Object-oriented domain modeling relies on objects serving as much as possible a single master "representation". They exist to represent a concept. They do not need to compromise with other concerns such as persistence. They rely on idiom such as object-oriented encapsulation to ensure expressive components and consistent states.
The AR pattern still requires the use of standard methods which match the different CRUD operations that can be performed on each table, but communication with the physical database can be carried out by utilising a separate Data Access Object, as shown in Figure 1. This enables the choice of DBMS to be switched very easily without requiring any changes to the table class.
By removing object-oriented idiom such as encapsulation, overriding constructors, and by otherwise forcing persistence concerns into each object, preventing it from serving a single master, it becomes a poor tool for domain modeling.
Active Record is a poor fit for domain modeling as its primary emphasis is on persistence concerns, rather than representation. It removes the ability to create expressive representational models and the benefits that come with that.
I disagree completely. You still encapsulate all the business rules, the properties and methods, for each table in a single class. There is no requirement to modify constructors in any way. Although there are methods which mirror the CRUD operations, the generation and execution of SQL queries should be handled in a separate DAO component. Each table class still handles all the business rules for that table, so still acts as the master representation for that table.
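As a minimal sketch of this separation, the table class below holds a business rule and hands all storage to a separate data access component. Every name here (`DataAccessObject`, `InMemoryDao`, `Customer`, `insertRecord`, `getData`) is hypothetical and chosen for illustration. The in-memory DAO is only a stand-in so that the example runs without a database, and passing the DAO through the constructor is a convenience of the sketch rather than something the pattern requires.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: the only component that talks to the data store.
interface DataAccessObject
{
    public function insert(string $table, array $row): void;
    public function select(string $table, array $where): array;
}

// Stand-in DAO that keeps rows in memory so the sketch runs without a DBMS.
final class InMemoryDao implements DataAccessObject
{
    private array $tables = [];

    public function insert(string $table, array $row): void
    {
        $this->tables[$table][] = $row;
    }

    public function select(string $table, array $where): array
    {
        $rows = $this->tables[$table] ?? [];
        // Keep the rows whose values match every key/value pair in $where.
        return array_values(array_filter(
            $rows,
            fn(array $row): bool => array_intersect_assoc($where, $row) == $where
        ));
    }
}

// One class per table: the business rules live here, the storage code does not.
class Customer
{
    public function __construct(private DataAccessObject $dao)
    {
    }

    public function insertRecord(array $fieldarray): array
    {
        // Illustrative business rule: a customer must have a name.
        if (trim((string) ($fieldarray['customer_name'] ?? '')) === '') {
            throw new InvalidArgumentException('customer_name is required');
        }
        $this->dao->insert('customer', $fieldarray);
        return $fieldarray;
    }

    public function getData(array $where): array
    {
        return $this->dao->select('customer', $where);
    }
}
```

Swapping `InMemoryDao` for a class that generates SQL for a particular DBMS would not require any change to `Customer`.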
What is the term "expressive representational models" supposed to mean? Each entity in the Business/Domain layer is supposed to represent an entity in the outside world, and in a database application that entity is a database table. How the contents of a table are presented to the user is not the responsibility of objects in the Business/Domain layer as that is what the separate Presentation layer is for.
The Active Record architecture reinforces itself. Injecting anemic data models without aggregate consistency boundaries into a system has an impact.
This is nonsense. The AR pattern does not provide a complete architectural solution on its own, it provides one method of creating the domain models in the structure shown in Figure 1. This is a mixture of the 3 Tier Architecture and Model-View-Controller design pattern. This structure breaks an application down into groups of modules which provide high cohesion.
The AR pattern does not advocate the use of anemic data models as they are supposed to contain both data and behaviour. Nor does it prevent aggregate consistency from being performed in an effective manner as that is an implementation detail which is the responsibility of the individual developer. Anemic data models are those which contain data but no behaviour. Anyone who says that the AR pattern promotes the creation of such models is mistaken as the description of this pattern specifically states that it should have methods which correspond to the CRUD operations which can be performed on any table as well as having properties (or in my case a single property) to hold the data.
Tightly coupling database schema to all application use-cases has an impact.
There is no such thing as tight coupling with the database schema, as discussed in High Coupling, Low Cohesion. It is logical (at least to me it is) to have one entity in your software for each entity in your database where the structure of the two are always kept synchronised. Difficulties arise when the two are NOT synchronised as that would involve the use of an Object Relational Mapper (ORM). Tight coupling is produced when you create method signatures which are not polymorphic and have to be changed whenever you change the table's structure. The trick is to use method signatures which are generic and which do not have to be changed, as shown in A single property for all table data.
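The difference between a column-specific signature and a generic one can be sketched as follows. The class names, the `$fieldspec` property and the column names are assumptions made for this example only. The point is that adding a column changes a list of names, not the method signature or its callers.

```php
<?php
declare(strict_types=1);

// Tightly coupled: the signature names each column, so adding a column
// to the table forces a change here and in every caller.
function insertProductByColumns(string $productId, string $productName, string $unitPrice): array
{
    return ['product_id' => $productId, 'product_name' => $productName, 'unit_price' => $unitPrice];
}

// Loosely coupled: a single array argument carries whatever columns exist.
class Product
{
    // Hypothetical list of column names for this table.
    protected array $fieldspec = ['product_id', 'product_name', 'unit_price'];

    public function insertRecord(array $fieldarray): array
    {
        // Keep only the entries that correspond to known columns.
        return array_intersect_key($fieldarray, array_flip($this->fieldspec));
    }
}

// After a column is added only the field list changes; the signature of
// insertRecord() and all code that calls it stay exactly the same.
class ProductWithCurrency extends Product
{
    protected array $fieldspec = ['product_id', 'product_name', 'unit_price', 'currency_code'];
}
```

A caller can pass the whole of an HTML form's input to `insertRecord()` in both cases, and entries that are not columns, such as a submit button, are simply dropped.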
Ill-fitting solutions introduce accidental complexity. This complexity introduces friction. We attempt to mitigate the friction by applying additional solutions. This cascade of reactionary design (being architectural) has a significant impact on the system. It's sometimes difficult to identify poorly performing patterns when they're useful to patch up problems caused by other poorly performing patterns.
The trick here is to not use ill-fitting solutions. You should take care to use design patterns which are appropriate and implement them in an efficient manner. Using the wrong pattern does not indicate a fault with that pattern, just your choice of which pattern to use. Creating a poor implementation of a pattern does not indicate a fault with that pattern, just your method of implementation.
Injecting anemic data models into an application generally results in a chain reaction of design pattern and approaches that increases coupling between units of low-cohesion. It's a slippery slope.
Then don't create anemic data models. If you implemented the AR pattern correctly then none of your models should be anemic.
Unfortunately, once a developer becomes familiar with many of these problems, they become fond of many of their mitigation strategies. Instead of solving the problem at the source, they build a series of mitigation practices including:
- Correctly identifying other people's implementations as accidental complexity and mistaking their own accidental complexity as essential
- Blaming management for not funding refactoring efforts
- When management finally clears the expense for a rewrite, the developers then build the application making the same architectural mistakes that created the original outcome
- Developers struggle to understand why the negative outcomes persist over and over again, often assuming it's a fundamental aspect of capitalism
This has nothing to do with the AR pattern in particular, or any design pattern in general. It is a fact of life that writing cost-effective software is an art, so it requires people with the right artistic talent. It is not something you can be taught, you either have it or you don't. Someone with talent may attempt to document how they do what they do, but simply reading that document will not let you duplicate that talent. A great pianist may write a book called "Piano Playing for Dummies", but if anyone without musical talent studies that book they will still not become a great piano player no matter how hard they try. Just because some "experts" have documented certain principles or patterns which have helped them does not mean that by copying those patterns you will automatically produce software at the same "expert" level. You should look upon these works as teaching you how to think, not what to think. Different experts can often come up with different solutions to similar problems, so instead of blindly copying one solution, and if that doesn't work then blindly copy another, the true artist should study the different solutions to work out the pros and cons of each one, and then try to come up with a new solution which maximises the pros and minimises the cons. If all you can do is copy the work that other people have done you could end up as being nothing more than a Cargo Cult Programmer. You should never be afraid to innovate, not imitate.
Developers who advocate the use of Active Record are rarely familiar with other approaches. They are generally familiar with years of mitigating the problems of this architecture. This familiarity with these problems can lead to underestimating their negative impact and over-estimating the amount of essential complexity that they're facing.
The same could be said for any pattern or group of patterns. Too many programmers are taught that there is only one way that things should be done, there is only one set of design patterns that should be used, and that there is only one implementation for each pattern which is acceptable. Some programmers are forced to use a particular framework because that is the one which was chosen by their employer. I spent the early part of my career working with other people's ineffective and bug-ridden solutions, and after gaining more experience I said to myself "I can do better than that", and that's precisely what I did when I built my first framework.
It's necessary to invest in and master other approaches, otherwise effective comparisons cannot be made.
That is true, but how many companies will let their workers experiment with other approaches? I managed it simply because I taught myself PHP and built my framework in my own time on my own PC to follow my own thought processes. My previous experience with enterprise applications and enterprise frameworks proved to be invaluable, and I was able to spot early mistakes and refactor around them. I have been extending and enhancing this framework continuously for the past 20 years, and this enabled the creation of an ERP package which can now be found at GM-X Application Suite.
Some concepts that are often mistaken:
- Unfortunate focus on reducing "boiler-plate" over maintaining independent evolvability
- Mistakenly assuming that other approaches also share the "profile" of global data graphs, when in reality this approach results in dramatically different systems than most other approaches
- Over-investment in convention over configuration
The ability to replace boilerplate code which is duplicated with standard functions which can be reused is always a good sign and should be encouraged. There are various ways in which this can be achieved. I have had great success with my abstract table class which is inherited by every concrete table class. This enabled me to implement the Template Method Pattern so that each concrete table class need contain nothing but "hook" methods. Because every concrete table class inherits the same set of common methods from the abstract table class this has created huge amounts of polymorphism which has then enabled me to inject any Model into my collection of Controllers using Dependency Injection. My use of an HTML templating engine means that I can create huge numbers of web pages from a small number of templates.
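The combination of an abstract table class, "hook" methods and a Controller that accepts any Model can be sketched like this. It is an illustration under assumed names (`AbstractTable`, `preInsert`, `postInsert`, `store`, `addController`), not a copy of any framework's actual code, and `store()` is a stand-in for the call to a data access object.

```php
<?php
declare(strict_types=1);

abstract class AbstractTable
{
    protected array $errors = [];
    protected array $rows = [];

    // Template method: the sequence of steps is fixed here and not overridden.
    final public function insertRecord(array $fieldarray): array
    {
        $this->errors = [];
        $fieldarray = $this->preInsert($fieldarray);      // hook
        if ($this->errors === []) {
            $this->store($fieldarray);
            $fieldarray = $this->postInsert($fieldarray); // hook
        }
        return $fieldarray;
    }

    public function getErrors(): array
    {
        return $this->errors;
    }

    public function countRows(): int
    {
        return count($this->rows);
    }

    // Default hooks do nothing; a concrete class overrides only what it needs.
    protected function preInsert(array $fieldarray): array
    {
        return $fieldarray;
    }

    protected function postInsert(array $fieldarray): array
    {
        return $fieldarray;
    }

    // Stand-in for handing the row to a data access object.
    protected function store(array $fieldarray): void
    {
        $this->rows[] = $fieldarray;
    }
}

// A concrete table class contains nothing but its own hook.
class OrderTable extends AbstractTable
{
    protected function preInsert(array $fieldarray): array
    {
        if ((int) ($fieldarray['quantity'] ?? 0) <= 0) {
            $this->errors['quantity'] = 'Quantity must be greater than zero';
        }
        return $fieldarray;
    }
}

// A table with no custom rules needs no code at all.
class NoteTable extends AbstractTable
{
}

// One Controller works with any Model because they all share insertRecord().
function addController(AbstractTable $model, array $post): bool
{
    $model->insertRecord($post);
    return $model->getErrors() === [];
}
```

Because `addController()` depends only on the inherited methods, the same function serves `OrderTable`, `NoteTable` and any other table class added later.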
What on earth are "global data graphs"? If you mean an Entity Relationship Diagram (ERD) then why don't you say so?
There is nothing necessarily wrong with convention over configuration provided that you use sensible conventions. For example, my framework uses the following:
- Each table has a class file in the classes directory which is named <tablename>.class.inc.
- Each table has a second file in the classes directory which is named <tablename>.dict.inc (because it is exported from the Data Dictionary).
What I do NOT do is assume that every table has a primary key field called id. This is explained in Technical Keys - Their Uses and Abuses.
Without experience with other approaches, without understanding the ramifications to the rest of the system, a developer can't effectively judge the impact that an architectural decision makes on their system. One can NOT choose the right tool for the job if one has only one tool. A developer's intuition about unfamiliar techniques is not a replacement for experience. It's important to observe the manifestation of consequences over time.
I had 20 years of experience with enterprise applications and enterprise frameworks before switching to PHP, so I was familiar with what needed to be done as well as what approaches worked better than others. My ability to leverage the opportunity of creating greater volumes of reusable code that the OO features of PHP presented to me allowed me to increase my levels of productivity by a large factor.
It's one thing to say that poorly performing code could be written in any idiom. That's true. It's another thing to encourage it.
A developer realizes that they have access to an entity and need access to a collection of children, so they directly access the collection. No problem.
Another developer, in a nearby section of code has one of those child instances and realizes that they need a collection of related entities, so they directly access the collection. Now we have an exponential explosion of queries.
There are several concepts here which are totally alien to me.
It's possible to profile each request and each process to find this kind of problem. Most devs probably do. Nonetheless, we keep discovering them long after they've been introduced. The pattern of ad-hoc database access through relationships encourages accidents. These accidentals are made more likely by the Active Record idiom.
Preventing the lazy loading of relationships can improve this significantly. Also, some Active Record implementations allow you to define your own queries, which can be used to hydrate models.
If ad-hoc database access through relationships encourages accidents then the simple answer is - don't do it. I learned as far back as the 1980s that you should only read those records that you actually need at the time instead of those which you think you may need later. The only exception to this rule is when you have a small number of records which you know you will have to access in multiple places, in which case it is more efficient to read those records just once and store them in memory. Programmers do what they are taught to do, and very few have either the time or the mental ability to find better solutions. Even fewer have the audacity to question that what they have been taught might not actually be the best solution.
I also notice that he has not identified another common reason for poor database performance which is the "N+1 problem". This is caused by the fact that by forcing each model to have separate properties for each column all SELECT queries are limited to dealing with only those columns which exist on that table. This automatically prevents the creation of SELECT queries which contain a JOIN, thus being able to read from several tables with a single query. Why is this a problem? Imagine you have a task which lists entries from the ORDER table, and you want it to include the customer's name. This name appears on a separate CUSTOMER table, and there is a foreign key on the ORDER table which points to the relevant CUSTOMER. In order to combine data from two tables you first have to SELECT a number of rows from the ORDER table, then you loop through that collection and for each row you SELECT the name from the CUSTOMER table. So if you read 10 rows from the ORDER table the number of SQL queries which you need to execute is 10+1: 1 query to read the 10 rows from the ORDER table, followed by 10 queries, one for each row, to read the name from the CUSTOMER table.
The way to reduce these SELECT queries to just 1 is to include a JOIN to the CUSTOMER table when reading the ORDER table. However, this means extracting the value for customer_name from the ORDER entity, but it does not have a property for that column. However, with my practice of having a single property for all table data I can build queries with as many JOINS as I want and extract all that data with a single call to the getFieldArray() method.
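The difference in query counts can be demonstrated with a small simulation. The `CountingDb` class below is purely hypothetical: it holds two tiny tables in memory and counts how many "queries" are issued, so the sketch runs without a real database. The SQL in the comments shows what each method stands for.

```php
<?php
declare(strict_types=1);

// Stand-in for a database connection which counts every query it receives.
final class CountingDb
{
    public int $queries = 0;
    private array $orders = [];
    private array $customers = ['C1' => 'Alice', 'C2' => 'Bob'];

    public function __construct()
    {
        // Ten orders shared between two customers.
        for ($i = 1; $i <= 10; $i++) {
            $this->orders[] = [
                'order_id'    => (string) $i,
                'customer_id' => $i % 2 === 0 ? 'C2' : 'C1',
            ];
        }
    }

    // SELECT * FROM order
    public function selectOrders(): array
    {
        $this->queries++;
        return $this->orders;
    }

    // SELECT customer_name FROM customer WHERE customer_id = ?
    public function selectCustomerName(string $customerId): string
    {
        $this->queries++;
        return $this->customers[$customerId];
    }

    // SELECT order.*, customer_name FROM order
    //   LEFT JOIN customer ON (customer.customer_id = order.customer_id)
    public function selectOrdersWithCustomer(): array
    {
        $this->queries++;
        return array_map(
            fn(array $order): array => $order + ['customer_name' => $this->customers[$order['customer_id']]],
            $this->orders
        );
    }
}

// N+1 approach: one query for the orders, then one more for each row.
function listOrdersOneByOne(CountingDb $db): array
{
    $rows = $db->selectOrders();
    foreach ($rows as $index => $row) {
        $rows[$index]['customer_name'] = $db->selectCustomerName($row['customer_id']);
    }
    return $rows;
}

// JOIN approach: a single query returns the same rows as a plain array.
function listOrdersWithJoin(CountingDb $db): array
{
    return $db->selectOrdersWithCustomer();
}
```

Both functions return identical rows, but the first issues 11 queries for 10 orders while the second issues 1. Because each row is a plain associative array, the extra `customer_name` column needs no matching property on an ORDER class.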
Active Record encourages the proliferation of primitives throughout the application.
This is patently untrue. The use of either primitives (scalars) or value objects is a property of the programming language, not the design pattern. PHP has never had any dealings with value objects. It is a simple fact that when data is sent to a PHP script from an HTML form it is presented as an array of strings, not value objects. When data is read from a database it is presented as an array of strings, not value objects. Anyone who cannot handle programming with primitives should not be using PHP in the first place.
Primitives are poor representatives of domain values. They force behavior into outer scopes which introduces awkward external implementations of domain algorithms and business rules. Primitives are not able to represent behavioral models.
Nonsense. Primitives have always represented data values, not behaviour. The two are separate and always have been. Databases hold their data as primitives, not objects. HTML forms hold their data as primitives, not objects. It is the software which sits between the two which is supposed to contain the behaviour, the business rules. While the data may constantly change the behaviour is fixed in the program code. If the programming language cannot handle the primitive values which are used in SQL databases and HTML forms then it is not fit for purpose.
It's possible that a line item's "amount" can be stored in an integer field.
    $lineItem->amount = 1100;
This property can be used across the application.
    <li>Amount: {$lineItem->amount}</li>
So what's wrong with that?
All components that bind to the field 'amount' are directly bound to the database schema, which cannot change unless all code references to this field are changed, since there is no boundary between the database and consumers of the object.
This is a meaningless statement. The term "bind" means to have a physical connection, to become attached, and there is no physical connection between a value in a database, a value in the software and a value in an HTML form. They are not actually the same value as changing it in one place does not automatically change it in all the other places. Each place - the database, the software, the HTML form - has its own copy of the value, and it is the responsibility of the software to move data between the HTML form and the database in order to update that copy of the value. A piece of data may have the same name in all three places, but so what? It would cause more problems if it didn't have the same name.
However, when we decide that our system should support multiple currencies, we must add a currency field. Now, the concept of amount, which was previously represented by an integer, must be represented by multiple data and additional algorithms.
The fact that you now need to store both the amount as well as its currency is not a problem for a competent programmer. These can only be stored as two separate fields in the database, and they can only be represented as two separate fields in the HTML form. The idea that you should combine them into a single value object within the software would then incur the overhead of having to combine them on the way in and split them again on the way out. This is not how I dealt with it in my pre-OO days, so I prefer to stick with a simple solution that has already proven its worth.
The idea that you must now have a value object called $money which now contains a monetary value as well as a currency code may appear to provide theoretical benefits, but in the applications which I write, which are ERP applications for multi-national corporations, this would still fall far short of being perfection personified. One glaring omission is that it is a requirement to store values in both functional and transaction currencies, and have an option on every screen to toggle between the two. In order to do this I need the exchange rate, but where is that in your so-called "proper" solution?
In case you are unfamiliar with the terms "functional" and "transaction" currencies let me explain. A multi-national corporation may be comprised of a group of companies which operate in different countries with different currencies. My partner company, for example, has its main office in the USA and operates in US dollars, but it also has a subsidiary in Thailand which operates in Thai Baht and another in Singapore which operates in Singapore dollars. They all use the same application and the same database, so these parties are defined as separate organisations with the classification of Functional Unit, each with their own functional currency. Every sale or purchase is therefore between a functional unit and another party which will either be classified as a customer or supplier depending on the transaction type. Note that it is common for each financial transaction to be in the currency of the other party, which is why it is vital to identify both the functional currency and the transaction currency, and if they are different then you will also need to store the exchange rate to convert from one currency to the other. This requires the following columns of the following tables:
- Invoice table: currency_code_fn is fixed for the functional unit. Neither currency_code_tx nor exchange_rate can be changed once any lineItems have been added.
- lineItem table: as the currency codes and exchange rate are held on the Invoice record they cannot be changed for individual lineItems.
Note that all amounts are always input in transaction currency (unit_price_tx) and then converted to functional currency (unit_price_fn) using exchange_rate. The adjusted_price_tx/fn is then calculated separately. Note that adjusted_price_fn is not calculated by converting adjusted_price_tx using exchange_rate as this may produce a slightly different amount because of rounding to the required number of decimal places.
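The rounding difference can be shown with a small worked example. This is only a sketch under several assumptions that do not come from the text above: amounts are held as integers in minor units (cents), the exchange rate is an integer scaled by 10000, the transaction amount is multiplied by the rate to obtain the functional amount, the adjustment is a percentage discount, and adjusted_price_fn is derived by applying that discount to unit_price_fn. The function names `scale()` and `priceLine()` are hypothetical.

```php
<?php
declare(strict_types=1);

// Multiply a non-negative integer amount by $num/$den, rounding half up.
function scale(int $amount, int $num, int $den): int
{
    return intdiv($amount * $num + intdiv($den, 2), $den);
}

// Build the four price columns for one line item.
function priceLine(int $unitPriceTx, int $rateTimes10000, int $discountPercent): array
{
    $unitPriceFn = scale($unitPriceTx, $rateTimes10000, 10000);
    $keep = 100 - $discountPercent;

    return [
        'unit_price_tx'     => $unitPriceTx,
        'unit_price_fn'     => $unitPriceFn,
        'adjusted_price_tx' => scale($unitPriceTx, $keep, 100),
        // Assumption: calculated from unit_price_fn,
        // NOT by converting adjusted_price_tx with the exchange rate.
        'adjusted_price_fn' => scale($unitPriceFn, $keep, 100),
    ];
}

// 9.99 in transaction currency, exchange rate 1.3700, 15% discount.
$line = priceLine(999, 13700, 15);

// What converting adjusted_price_tx with the exchange rate would give instead.
$converted = scale($line['adjusted_price_tx'], 13700, 10000);
```

With these figures unit_price_fn is 1369 and adjusted_price_tx is 849. Applying the discount to unit_price_fn gives 1164, whereas converting adjusted_price_tx gives 1163, a difference of one cent caused purely by rounding at each step.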
It's then probable that we end up with code that calculates the full amount of an invoice like this:
    function invoice_amount(Invoice $invoice): string
    {
        $invoiceAmount = 0;
        $invoiceCurrency = $invoice->line_items
            ->first()
            ->currency;
        foreach ($invoice->line_items as $lineItem) {
            $invoiceAmount += $lineItem->amount;
        }
        return $invoiceCurrency . ' ' . number_format($invoiceAmount / 100, 2);
    }
There are multiple problems with this code, including the fact that we may be adding numbers with different currencies.
Then it is your code which is wrong. You should NEVER allow different lineItem records for the same Invoice to use different currencies. By storing the amounts in the two currencies in separate and distinct columns it is therefore a simple exercise to accumulate those two columns from each lineItem into two separate totals for each Invoice. That is how I have been doing it for years, so I know that it works.
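Accumulating the two columns into two totals could be sketched as below. The column names adjusted_price_tx and adjusted_price_fn come from the description above, while the `quantity` column, the `total_tx`/`total_fn` keys, the function name and the use of integer minor units are assumptions made for the example.

```php
<?php
declare(strict_types=1);

// Sum the transaction and functional amounts of an invoice's line items.
// Each line item is a plain array as it might be read from the database.
function invoiceTotals(array $lineItems): array
{
    $totals = ['total_tx' => 0, 'total_fn' => 0];

    foreach ($lineItems as $item) {
        $quantity = (int) $item['quantity'];
        // The two currencies never mix: each column feeds its own total.
        $totals['total_tx'] += (int) $item['adjusted_price_tx'] * $quantity;
        $totals['total_fn'] += (int) $item['adjusted_price_fn'] * $quantity;
    }

    return $totals;
}
```

For example, two line items with prices of 849/1164 (quantity 2) and 500/685 (quantity 1) give a transaction total of 2198 and a functional total of 3013. As every line item on an invoice shares the invoice's currencies, no currency check is needed inside the loop.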
Active Record actively encourages both primitive-obsession and the ability to bypass object-oriented consistency boundaries.
This is nonsense. The AR pattern does not encourage primitive-obsession as the use of primitives is a function of the programming language, not the pattern. It does not encourage the bypassing of any boundaries as it does not specify which boundaries you should or should not have. It simply provides an abstract design for a solution, not a detailed description of how that solution should be implemented. Different programmers may produce different implementations for the same pattern, but the effectiveness of each solution is entirely dependent on how it was implemented.
Dogmatic Active Record fans want to ignore encapsulation and consistency boundaries while throwing together features.
You must be using a peculiar definition of encapsulation. In my book it means "the act of placing data and the operations that perform on that data in the same class".
In this context "data" is the plural of "datum", so encapsulating a single value, a single datum, would appear to be going too far and producing an artificial construct which does nothing but introduce unnecessary complexity. By defining a class which deals with the data for a table as well as containing the methods which act on that data, the business rules, the AR pattern is encapsulating an entity which has behaviour, not a piece of data which does not.
Primitive obsession creates significant refactoring costs.
I would say the opposite. Value objects do not exist in PHP, so there would be an instant overhead in writing extra code to create value object classes, merge multiple primitives into a value object only to split them out again as data passes through the entity. I cannot see any benefits from using value objects, so I cannot justify the effort in creating and using them.
When values are stored as primitives, there's no single authoritative location where this behavior belongs.
Data and behaviour exist in separate places in an object - data is held in properties and is transient, passing in and out, while behaviour is held in code and is fixed in place. When data passes through an entity's object it may pass through several methods, and that behaviour may not be required in every method, just a particular method. The behaviour inside the value object will not be activated until code within the entity's class calls the method on the value object. To me it seems less of a hassle for the code inside the entity to execute the behaviour directly than indirectly via a method call on that value object. Whether the code for that behaviour exists in one place within the entity or in one place within the value object it still exists in one place, as either place is as good as the other.
The negative impact that Active Record has on a system can be reduced by ensuring that data graphs are small and encapsulated.
What do you mean by "data graphs"? Where are they described in any OO manual? How can a data graph be encapsulated when encapsulation is only supposed to cover properties and methods?
The smaller the data model and the fewer use case implementations, the less that the entire system must be modified in lock-step in order to implement change.
The number and size of data models and the number of use cases has no bearing on the number of modifications which may be needed to implement a particular change. That is all down to how well the system has been assembled in order to achieve high cohesion, loose coupling and maximum reusability.
I have used my framework to build a large ERP application which has over 400 table classes, over 1,200 relationships, over 4,000 user transactions (use cases), and all this with a fixed set of 45 reusable Controllers, one for each Transaction Pattern. Every transaction has its own component script which does nothing but identify which Model, View and Controller are required to process that transaction. The same Model can be accessed by any number of Controllers, and the same Controller can be used with any number of models.
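The idea of a component script which does nothing but name its parts can be sketched as follows. This is a simplified illustration: the array layout, the function names and the two stand-in Model classes are all hypothetical and are not taken from the framework's real component scripts.

```php
<?php
declare(strict_types=1);

// Two stand-in Models which share the same method name (polymorphism).
class Customer
{
    public function getData(): array
    {
        return [['customer_name' => 'Acme']];
    }
}

class Product
{
    public function getData(): array
    {
        return [['product_name' => 'Widget']];
    }
}

// One reusable View: turns any set of rows into an HTML list.
function htmlListView(array $rows): string
{
    $items = '';
    foreach ($rows as $row) {
        $items .= '<li>' . htmlspecialchars(implode(', ', $row)) . '</li>';
    }
    return '<ul>' . $items . '</ul>';
}

// One reusable Controller: it is told which Model and View to work with.
function listController(object $model, callable $view): string
{
    return $view($model->getData());
}

// Hypothetical component scripts: each one only identifies its parts.
function components(): array
{
    return [
        'customer_list' => ['model' => Customer::class, 'view' => 'htmlListView', 'controller' => 'listController'],
        'product_list'  => ['model' => Product::class, 'view' => 'htmlListView', 'controller' => 'listController'],
    ];
}

// Run one task (use case) from its component definition.
function runTask(array $component): string
{
    $class = $component['model'];
    $model = new $class();
    return call_user_func($component['controller'], $model, $component['view']);
}
```

Adding a new use case in this sketch means adding one more entry to `components()`; neither the Controller nor the View has to change.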
Reduce the number of relationships available in each model. Avoid lazy loading relationships. If possible disable lazy-loading functionality.
The number of relationships is irrelevant, it is how and when you deal with them which is important. Do you have custom code within each Model, custom code within each Controller, or standard reusable code provided by the framework? The AR pattern does not dictate how this should be done, so it is an implementation detail which is down to the skill of the developer. In the RADICORE framework there is no code within a Model to manage relationships with other tables, or to perform lazy loading, as a parent-child (one-to-many) relationship is handled by a LIST2 controller which addresses both the parent and child tables as independent entities and automatically handles the translation of primary key to foreign key.
Perform all retrieval and storage of "root" entities in repositories (like an Invoice in our example). Discourage retrieval and storage of "child" entities (like lineItems) in order to reduce consistency errors.
Seek to implement use-cases using command objects that follow this pattern:
- Retrieve an entity aggregate by querying the root and optimize the query with eager relationship loading.
- Operate ONLY upon the entity aggregate through the aggregate's root entity.
- Persist the aggregate's root entity with a repository to centralize storage logic. Cascading persistence of children. (This can be implemented as manually as you like.)
Cascading saves from a root entity to child entities remains awkward, but can be managed.
As stated in Aggregates I never have code in a parent model in order to access its children. This is not how databases work, so it's not how my software works. I do, however, have a Controller which accesses the parent and child models as separate and independent entities. This Controller can be used to deal with any number of parent-child relationships.
I do not have cascading saves in a root entity to handle the data for any child table because I do not have aggregate roots. Instead all CRUD activity for a table is handled by that table's class.
The reason that Active Record entity relationships tend to form a single monolithic data graph, is that the same entities are used across many different contexts in the system.
There you go, using the term "data graph" without describing what it means. If you are talking about attempting to manage a group of tables, as shown in Figure 3, as a single monolithic entity, then that is a fault in your implementation, not a fault in the pattern.
This not only leads to unwanted low-cohesion coupling,
The term cohesion relates to how well you implement the Single Responsibility Principle (SRP). In my framework this is achieved by implementing a multi-layered architecture as shown in Figure 1 where each type of component has a different instance to deal with different circumstances. Each Model class in the business layer is therefore responsible for a single entity (table) in the database as having it responsible for anything other than a single entity would be a violation of SRP.
The number of tables or the number of relationships between those tables has nothing to do with the level of coupling, as mentioned earlier.
but prevents individual components from evolving independently.
It may do in your implementation, but it does not in mine. Because each table's class is only responsible for the CRUD operations and business rules for that one table it means that I can change the business rules, and even the structure of a table, without having to amend the contents of another table's class.
Instead, seek to define "soft" service boundaries within your application and refuse to communicate across these boundaries with database queries or transactions.
That is why using a structure like that shown in Figure 1 is so important. Control logic exists only within Controllers. View logic exists only within Views. Business logic exists only within Models. Communication between those different objects is only possible through a fixed set of polymorphic APIs. The logic to generate and execute SQL queries only exists within a Data Access Object. It is not possible for anything in the Presentation layer to access the database without going through an object in the Business layer. Every database transaction is performed within a single task.
Communicate between these boundaries using messages like commands / events, with class interfaces, or even with HTTP requests depending on your needs. A healthy graph of entity relationships are small clusters of related entities that exist in isolation from other clusters. Instead of coupling across these boundaries with database joins, use messaging.
Every HTTP request starts by going into the Presentation layer via a small component script which will only access those parts of the application which are necessary to carry out the selected task (use case). All communication between components is carried out by synchronous method calls, not asynchronous messaging. I do not use interfaces as I have found abstract classes to be much more powerful, especially as they allow the use of the Template Method Pattern. In a DBMS a relationship describes nothing more than a link between a child table and a parent table using a foreign key. The concept of grouping relationships into clusters is not supported in any DBMS that I have used, so there is no such clustering in my software.
With enough growth and without extreme discipline, this pattern inevitably leads to such highly coupled systems that non-trivial changes cannot be done rapidly without disruption.
The AR pattern does not in itself promote tight coupling. As already discussed in Coupling and Costs and High Coupling, Low Cohesion that is controlled by how the various method signatures are constructed, and that is an implementation detail.
My advice is straight-forward.
- When designing OOP software, program using objects, not using anemic data models.
That happens automatically when you implement encapsulation properly by creating a separate class to contain the data for an entity and the operations that perform on that data. Creating a class that has data but no methods (an anemic data model) is something which every programmer should avoid.
- Respect that some entities are naturally suited to be children of others. The parent entities become a root of an aggregate which exists to manage aggregate consistency.
Whether or not a table is related to other tables should have no bearing on how the class for that table should be constructed. Instead of having code inside each Model to deal with its relationships I have found it easier to move that code to reusable components within the framework. All that is necessary is for each Model to provide a list of its relationships in the form of a $parent_relations array and $child_relations array.
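The idea of a Model that only declares its relationships, leaving the processing to a shared component, can be sketched roughly as follows. Only the property names $parent_relations and $child_relations come from the text above; the class name, the array keys, the 'restricted' type and the canDelete() function are hypothetical names invented for this illustration and are not taken from the framework.

```php
<?php
// Hypothetical sketch: class name, array keys and canDelete() are illustrative only.

class OrderHeader
{
    // Metadata only: each entry names a related table and the linking columns.
    public array $parent_relations = [
        ['parent' => 'customer', 'fields' => ['customer_id' => 'customer_id']],
    ];

    public array $child_relations = [
        ['child' => 'order_line', 'type' => 'restricted', 'fields' => ['order_id' => 'order_id']],
    ];
}

/**
 * Reusable check that lives outside the Model: may this row be deleted?
 * $countChildren is a callable (table name, where array) => int supplied by the caller.
 */
function canDelete(object $model, array $row, callable $countChildren): bool
{
    foreach ($model->child_relations as $relation) {
        if ($relation['type'] !== 'restricted') {
            continue;
        }
        // Translate the parent's column values into a lookup on the child table.
        $where = [];
        foreach ($relation['fields'] as $parentColumn => $childColumn) {
            $where[$childColumn] = $row[$parentColumn];
        }
        if ($countChildren($relation['child'], $where) > 0) {
            return false; // child rows exist, so the delete is refused
        }
    }
    return true;
}

// Stand-in for a database lookup: pretend two order lines exist for any order.
$counter = fn(string $table, array $where): int => $table === 'order_line' ? 2 : 0;
$allowed = canDelete(new OrderHeader(), ['order_id' => 7], $counter);
```

Because canDelete() reads nothing but the metadata, the same function would work for any Model that supplies a $child_relations array in this shape.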
- Relegate data persistence concerns to the data layer, behind repository interfaces. Do not let these concerns leak into the application.
Agreed. That is why I have a separate Data Access Object (DAO) which constructs and executes the SQL queries for each DBMS that my framework supports. Each DAO can then handle any differences in the syntax.
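As a minimal sketch of how one DAO per DBMS can absorb syntax differences, the example below builds the same paginated query for two engines. The class and method names are hypothetical and are not the framework's own; only query construction is shown, with no connection or execution. The two differences used (backtick versus square-bracket identifier quoting, and LIMIT/OFFSET versus OFFSET ... FETCH NEXT) are standard MySQL and SQL Server syntax.

```php
<?php
// Hypothetical sketch: class names are illustrative and only query
// construction is shown; connecting and executing are left out.

abstract class SketchDao
{
    /** Quote a table or column name in the style of this DBMS (no escaping in this sketch). */
    abstract protected function quote(string $identifier): string;

    /** Return the pagination clause in the syntax of this DBMS. */
    abstract protected function paginate(int $limit, int $offset): string;

    /** Shared logic: the overall shape of the query is the same everywhere. */
    public function buildSelect(string $table, string $orderBy, int $limit, int $offset): string
    {
        return 'SELECT * FROM ' . $this->quote($table)
            . ' ORDER BY ' . $this->quote($orderBy)
            . ' ' . $this->paginate($limit, $offset);
    }
}

class MySqlSketchDao extends SketchDao
{
    protected function quote(string $identifier): string
    {
        return '`' . $identifier . '`';
    }

    protected function paginate(int $limit, int $offset): string
    {
        return "LIMIT $limit OFFSET $offset";
    }
}

class SqlServerSketchDao extends SketchDao
{
    protected function quote(string $identifier): string
    {
        return '[' . $identifier . ']';
    }

    protected function paginate(int $limit, int $offset): string
    {
        return "OFFSET $offset ROWS FETCH NEXT $limit ROWS ONLY";
    }
}

$mysql = (new MySqlSketchDao())->buildSelect('person', 'last_name', 10, 20);
$mssql = (new SqlServerSketchDao())->buildSelect('person', 'last_name', 10, 20);
```

The calling code is identical for both engines; only the object that is instantiated differs.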
- Provide repositories for entity aggregates only, these repositories can store the entire state for the aggregate. Because there are no repositories for child objects, it's certain that the 'aggregate consistency boundary' is maintained.
As discussed previously I do not see the need to have special classes for aggregates. I do not have repositories for aggregates either. Each table is a separate entity in its own right with its own structure and its own business rules, and it goes through the same DAO to perform CRUD operations. Whenever a child table has to be accessed through a parent then I have a framework component to deal with that situation. I have different components for one-to-many and many-to-many relationships.
These are not RULES, these are just sound advice. Be your own judge, but be critical. Why should this pattern be the exception to almost every bit of good advice that can be found about software design?
While that is good in theory it does not work that way in practice. Far too many developers are only ever exposed to one piece of "advice", so they follow that advice religiously as if it was carved in stone and brought down from the mountain top. If they ever attempt to criticise this "advice" they will be treated as heretics by the paradigm police and threatened with being burnt at the stake if they don't repent and learn to be consistent with everyone else.
It's impossible to cover enough software design in this article to fill the gaps that abandoning Active Record will leave. But here are some concepts to keep in mind that may help avoid pitfalls.
Instead of abandoning Active Record for a different pattern entirely do you think that it would be a better idea to analyse the deficiencies in your current implementation? If you do not take the time to identify the problem areas and work out how those problems can be avoided then how can you guarantee that you won't have similar problems with your implementation of a different pattern?
- The components that you build within knowledge boundaries are able to evolve independently of the others. Anything that leaks from the boundaries become coupled with other components which can no longer evolve independently without consideration.
That is why you should strive to construct an application using components which exhibit high cohesion, such as shown by the structure in Figure 1. By constructing a separate class for each database table to handle the data and business rules for that table you should be able to change that class without having any effect on other table classes.
- Think about how you write objects, with highly coupled implementation on the inside, but only a limited set of public methods available from the outside. Component design works the same way. You can have dozens or hundreds of class types in a component, but the public "api" for that component is a much smaller set of events, class interfaces, http requests, etc.
It is not true that objects must have tightly coupled implementations on the inside. Loose coupling can be achieved anywhere by reducing the number of data elements in each method signature. I have found it to be extremely beneficial to stop using separate data elements as parameters in my methods and to use a single $fieldarray parameter instead, as discussed in A single property for all table data. This enables me to inherit vast amounts of code from an abstract class and to implement the Template Method Pattern on all calls from a Controller to a Model or from a View to a Model.
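To illustrate what a single array argument does to a method signature, here is a minimal sketch. Only the name $fieldarray comes from the text; the class name, the insertRecord() method and the column names are hypothetical, and the method body is reduced to simply holding the data.

```php
<?php
// Hypothetical sketch: the class, method and column names are illustrative only.

class PersonTable
{
    /** All data for the current row travels in one array keyed by column name. */
    protected array $fieldarray = [];

    public function insertRecord(array $fieldarray): array
    {
        // Nothing in this signature or body names an individual column.
        $this->fieldarray = $fieldarray;
        return $this->fieldarray;
    }
}

$person = new PersonTable();

// The caller can hand over a whole form submission (simulated here) in one argument.
$post = ['person_id' => 'P1', 'first_name' => 'Ann', 'last_name' => 'Lee'];
$result = $person->insertRecord($post);

// A new column needs no new setter and no new parameter.
$post['nickname'] = 'Annie';
$result = $person->insertRecord($post);
```

Since the signature never mentions a column, the same call would work against any other table class that follows this convention.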
- When objects maintain their own consistency it results in less defensive programming. Fewer arbitrary checks need to be placed around the system. Logic related to domain concepts finds a home within classes that represent those concepts.
Agreed. That is why I have a separate class for each table to handle the CRUD operations and business rules for that table. The idea of using a composite class to handle a group of tables seems totally wrong to me as it violates both encapsulation and the Single Responsibility Principle (SRP).
- Use-case implementations should be relatively simple and flat in structure. Achieve this by using entirely different object models for different use-cases so that those objects don't need to serve multiple masters. A component that generates payment CSVs and a component that processes payments with banks should not be using the same "Payment" models. They have different concerns. Use bounded contexts as component boundaries to ensure that separated concerns do not melt together into one.
In the RADICORE framework each task (use case) results in combining a Controller with a Model and a View by using a component script. Each Model supports a variety of different methods, and each Controller calls a different combination of those methods, and then calls a particular View to display the results to the user. It is important to note that the component which generates CSVs is a View while the component which records payments is a Model. I do not have a separate component which generates CSVs from a particular model simply because every View can perform its particular service with any Model.
- Avoid primitive obsession. Classes like InvoiceId, Money, Description, or Status can be type-hinted and can have their own comparison and formatting logic.
The preference for value objects is a moot point when the programming language does not support such things. It is perfectly possible to achieve all the necessary functionality using code which manipulates primitives, as I have been doing for 40 years. I do not see any savings in using value objects, so I see no point in using them.
- Avoid inheritance and prefer composition. Class inheritance results in systems so tightly coupled that they can no longer be evolved. Instead, use interfaces for polymorphism. Use composition for code-reuse. Avoid inheritance and traits entirely whenever possible.
This is bad advice as it does not explain what the problem with inheritance actually is. Without any sort of explanation I chose to ignore this "advice", and that turned out to be a good decision. I eventually discovered an article called Object Composition vs. Inheritance in which it stated "Most designers overuse inheritance, resulting in large inheritance hierarchies that can become hard to deal with". In another place it says "One way around this problem is to only inherit from abstract classes". When I began to use inheritance at the turn of the century it seemed natural to start with an abstract class, so that is what I did. By using an abstract class I later found it very easy to implement the Template Method Pattern. So as I was using inheritance properly I had no need to employ a substitute, especially as that substitute would have greatly reduced the amount of reusable code at my disposal.
Class inheritance does not create tight coupling, it is used for sharing code, not for making method calls. Problems with inheritance do not appear if you only inherit from an abstract class. By using inheritance properly you do not need to resort to object composition.
I never use object interfaces as they do not provide anywhere near as much reusability as abstract classes. As I have over 400 concrete classes which inherit from the same abstract class it means that all the methods in that abstract class are available for polymorphism. Abstract classes also open the door to implementing the Template Method Pattern which provides a collection of invariant and variable "hook" methods.
- Back your repositories using SQL. Modern IDEs have plenty of tooling to support autocomplete, optimization, and more. Automated tooling such as static analysis can parse and comprehend SQL. Make optimizing and simplifying queries an important part of development and code review.
Avoid the use of an ORM and learn how to write SQL queries yourself. Take the code which generates and executes those queries and place it into a separate Data Access Object (DAO). You should find that you need only one for each DBMS, not one for each table.
- Use domain models to create system state changes. Persist those objects in the database using repositories to centralize the logic for a single entity or aggregate.
According to Martin Fowler the Active Record pattern should only be used for simple CRUD operations while the Domain Model is for business logic which is more complex, and may involve a web of interconnected objects. I do not construct different types of object depending on whether the logic is simple or complex, nor do I construct objects to deal with aggregations, I construct objects using the same basic pattern (a separate class for each table) and add as much business logic into each class as is necessary. As every concrete table class inherits from the same abstract class which implements the Template Method Pattern for each public method, it means that all standard code is provided in the invariant methods while all custom code is added to "hook" methods as and when required.
Each task (use case) is comprised of a Controller and a Model. The Model contains a number of methods which are inherited from the abstract class. I have 45 reusable Controllers, one for each Transaction Pattern, each of which calls a different subset of those methods, and I can access the same Model through as many Controllers as I want. Where I have a particular task which performs some specialist and complex processing all I need do is create a subclass of a concrete class and insert different logic into the relevant "hook" methods. For example in the DICT subsystem I have the following class files:
In this way the class still supports the basic CRUD operations but includes a different set of business logic which may or may not include communicating with other Models.
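The arrangement described above, with invariant methods in an abstract class, empty "hook" methods, and a task-specific subclass of a concrete class, can be sketched as follows. Every class and method name here is hypothetical and chosen for illustration, and a public array stands in for the database.

```php
<?php
// Hypothetical sketch: all class and method names are illustrative only.

abstract class SketchTable
{
    protected array $errors = [];
    public array $stored = []; // stands in for the database in this sketch

    // Invariant template method: the sequence of steps is fixed here.
    public function insertRecord(array $fieldarray): array
    {
        $this->errors = [];
        $fieldarray = $this->preInsert($fieldarray);      // hook
        if (empty($this->errors)) {
            $this->stored[] = $fieldarray;                // standard step
            $fieldarray = $this->postInsert($fieldarray); // hook
        }
        return $fieldarray;
    }

    public function getErrors(): array
    {
        return $this->errors;
    }

    // Hooks do nothing unless a subclass overrides them.
    protected function preInsert(array $fieldarray): array
    {
        return $fieldarray;
    }

    protected function postInsert(array $fieldarray): array
    {
        return $fieldarray;
    }
}

// Concrete table class: inherits all standard behaviour, adds nothing.
class DemoProduct extends SketchTable
{
}

// Task-specific subclass of the concrete class with extra rules in a hook.
class DemoProductImport extends DemoProduct
{
    protected function preInsert(array $fieldarray): array
    {
        if (($fieldarray['price'] ?? 0) <= 0) {
            $this->errors['price'] = 'Price must be greater than zero';
        }
        $fieldarray['name'] = trim($fieldarray['name'] ?? '');
        return $fieldarray;
    }
}

$plain = new DemoProduct();
$plain->insertRecord(['name' => ' Widget ', 'price' => 0]);

$import = new DemoProductImport();
$import->insertRecord(['name' => ' Widget ', 'price' => 0]);
```

In this sketch the plain class stores the row, while the subclass rejects it. The calling code is the same in both cases; only the class being instantiated changes.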
- Dispatch events to communicate important state changes to your system. Components listen to these events and project local state into a format specific to the consumption use-case. This defers the burden of database computation to write-time instead of read-time and improves database performance by multiple orders of magnitude.
What on earth is a "consumption use-case"?
If a value needs a complex calculation before it can be displayed then it is good design to perform that calculation and store the result in the database so that it can be retrieved and displayed as many times as you like without having to repeat the calculation.
- Create use-case read models in order to service consumption patterns. If you're not familiar with doing this, now is a good time to start researching.
What on earth are "consumption patterns"? This sounds like a continuation of the previous point, so it has nothing to do with the Active Record pattern.
This section contains a large number of comments which I shall not duplicate here. It talks about problems which many companies face when they discover that the software cannot change fast enough to support the business strategies. As this goes far beyond the scope of discussing the merits of the Active Record pattern I shall say no more.
In a system that will continually grow, Active Record's pattern of high coupling / low cohesion will inevitably become harmful. With sufficient growth, without the discipline to create rigid communication boundaries, system development is guaranteed to slow to a crawl. This negative consequence is built-in to the entire purpose of Active Record, which is to leverage extreme coupling between application code and the database schema.
You are repeating the assertion that the AR pattern automatically produces tight coupling and low cohesion which I have discussed and dismissed in previous sections.
It's difficult to mitigate this extreme coupling. As the application grows, development slows and becomes prohibitively expensive. Mitigating the coupling is so difficult that most engineers intuit that it's not worth doing.
Tight coupling is not a problem caused by the AR pattern, it is caused by a faulty implementation. I started off with a framework that contained 4 subsystems, then added 6 application subsystems for my first ERP application. This required adding database tables, creating tasks to maintain the contents of those tables, some with very simple business logic but some with more complex logic. I can change the structure of any database table without having to change any properties or methods in their associated classes. This application has now grown to over 20 subsystems with 400+ tables, 1,200+ relationships and 4,000+ tasks. While the number of components has grown, the speed at which I can add new subsystems, new tables and new tasks has not diminished over the years. Each piece of logic exists in the correct component, and when a task is activated it only accesses those components which are necessary for that task.
Refactoring is much more expensive. What would have been code-only changes usually requires database schema changes and data migrations. This process is so difficult that organizations rarely do it.
How well your code can absorb database schema changes is down to how well it was designed and built in the first place. The RADICORE framework was specifically designed to produce database applications from database schemas, which is why I have a separate class for each table. This provides the following advantages:
Instead of modifying the behavior of the existing system to match changes to the business, they opt to develop new features by working around the behavior of the existing system. The applications become hacks built upon hacks built upon hacks.
This is a bad practice which should be discouraged. Years before I switched to PHP I learned that every database application must start with a properly normalised database, and to organise my software structure around that database structure. If a change in business requirements results in changes to the database then I change the database, then change the software to be synchronised with that change. To do anything else, such as inserting a quick and dirty fix as doing it "properly" would take too long, always results in pain, and the more you delay it the more pain you will eventually suffer. The trick is to design your application in such a way so that you can change your database structure with minimal impact on your code. I achieved this by using a single property for all table data and multiple properties for each table's metadata. This means that I can create or modify application components with relative ease, and because it is so easy I always have the time to do it properly instead of by a quick and dirty hack. I never have to work around the behaviour of the system as the components within the system always work in a predictable and consistent way - as much common code as possible is provided by the framework while special business rules for specific tables are isolated in the subclasses for those tables.
One can argue that the programmers are to blame, but given the high cost of refactoring these systems due to the database coupling, I find it easy to sympathize with their plight.
While it is an inescapable fact that you must have a separate class for each database table to deal with the business rules for that table, you can eliminate any tight coupling by not having a separate class property for each table column. This in turn means that you can change the structure of a table without the need to also change any class properties or method signatures. A change that does not have a ripple effect is an easy change to implement.
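One way to picture a table class with no property per column is to hold the column definitions as metadata and drive validation from that. In the sketch below the property name $fieldspec, its keys and the validation rules are all assumptions made for illustration. Adding a column to the table then means adding one entry to the metadata, with no change to any method signature.

```php
<?php
// Hypothetical sketch: $fieldspec, its keys and the rules are illustrative assumptions.

class CustomerTable
{
    // Column metadata: altering the table means editing this array only.
    public array $fieldspec = [
        'customer_id' => ['type' => 'integer', 'required' => true],
        'name'        => ['type' => 'string',  'required' => true, 'size' => 40],
    ];

    /** Returns error messages keyed by column name; empty when the row is valid. */
    public function validate(array $fieldarray): array
    {
        $errors = [];
        foreach ($this->fieldspec as $column => $spec) {
            $value = $fieldarray[$column] ?? null;
            if ($value === null || $value === '') {
                if (!empty($spec['required'])) {
                    $errors[$column] = 'Value is required';
                }
                continue;
            }
            if ($spec['type'] === 'integer' && filter_var($value, FILTER_VALIDATE_INT) === false) {
                $errors[$column] = 'Value must be an integer';
            }
            if (isset($spec['size']) && strlen((string) $value) > $spec['size']) {
                $errors[$column] = 'Value is too long';
            }
        }
        return $errors;
    }
}

$customer = new CustomerTable();

// Simulate adding a column to the table: only the metadata changes.
$customer->fieldspec['email'] = ['type' => 'string', 'required' => false, 'size' => 20];

$errors = $customer->validate([
    'customer_id' => '12',
    'name'        => 'Acme',
    'email'       => 'a-very-long-address@example.com',
]);
```

The validate() signature is untouched by the new column, so no caller has to change.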
The fault lies with the selection of tooling.
Perhaps it could be down to not using those tools correctly, or even by not using the right tools in the first place.
The art of designing with Active Record shifts focus away from encoding domain concepts and processes into code and toward implementing these processes through discrete database manipulations, leaving future engineers to try to reverse-engineer the intended business ideas from the manipulations.
My experience in writing database applications and recognising patterns of behaviour has taught me one inescapable fact - every use case in such an application always performs one or more CRUD operations on one or more tables. By having a separate class for each database table you immediately isolate the business rules for each table. The only difference between a simple use case and a complex one is the addition of business rules. By having the common behaviour of moving data between the user interface and the database taken care of by framework components, which can easily be learned and remembered by the development team, they can concentrate their efforts on the uncommon business rules. By aiming for a system with high cohesion you should end up with a properly organised structure which is easy to maintain instead of a disorganised mess. In other words "a place for everything and everything in its place". If a problem arises in one area of an application, and that area is covered by a small number of components, then you only have a small number of components to examine in order to identify and fix the problem.
Due to the inside-out nature of Active Record entities, modeling object consistency (easy in an object-oriented paradigm) becomes more difficult, and potentially impossible. Because they directly expose their guts and because entities can be created, modified, and stored in isolation despite being inconsistent with entity aggregate rules, it is impossible to model aggregate consistency.
I have been using my version of this pattern for 20 years and I have encountered no such problem. If you have a problem with aggregate consistency then perhaps it is your method of dealing with aggregates which needs to be questioned. As I have stated in Modeling Consistency is Impossible and Aggregates I do not have code within any Model to deal with any associations or relationships, I merely have metadata which identifies what relationships exist. I then have different components in the framework to deal with each relationship.
Aggregate consistency can only be achieved through convention and ubiquitous awareness and discipline from its engineers. It is an optimistic gamble.
This may be true, but where is this convention defined? I have never heard of it, which is why I created my own convention and built it into my framework.
The penalty for not wanting to write a handful of simple, easy to read queries, is that we end up with 1 generalized data object for each and every normalized database table used to construct an entity aggregate.
That is why I never construct a single object to handle an aggregation of several tables. Each table is a stand-alone object, but when I have to deal with a pair of tables in a parent-child relationship in a particular way, then I create a task using a Controller which handles that particular parent-child relationship.
Each of these generalized data objects can be individually manipulated against the consistency of the whole. The very idea that these are equivalent peers of one another is a side effect of the normalized database schema leak, not something that we would consider implementing in object-oriented programming.
If you look at the aggregate in Figure 3 you will see 11 objects and 10 parent-child relationships. I do not have a separate component to manage the consistency of the entire aggregate, instead I have a separate component to manage the consistency of each parent-child relationship. By having this as a reusable component supplied by the framework I have less code to write and less code to get wrong.
When the alternative is writing code that is easier to work with, I cannot advise Active Record be used for any project that doesn't have a short lifespan or limited chance for growth.
My experience has been the total opposite. I started building my framework around this pattern in 2003, and I used it to build an ERP application which has grown to 20 subsystems, 400+ database tables, 1,200+ relationships and 4,000+ use cases. Perhaps my success has been down to the fact that I followed a different set of rules when I implemented this pattern, and I also made it just one part of a cohesive structure, as shown in Figure 1. You cannot create an application from just one pattern, you have to have a group of different patterns working in unison to satisfy the different parts of each use case.
Here are some statements which most people will recognise as being simple truths, but some people will regard as being unforgivable heresy for which the punishment is being burnt at the stake:
Anyone who believes the opposite is narrow minded, a fanatic, a dogmatist who thinks that following a set of rules is more important than achieving the best result. Designing and building software is an art, not a science, so it is not something which can be taught to someone who does not have the raw talent to begin with. If you don't have that talent then you will never become more than a simple hacker, a cargo cult programmer, a monkey instead of an organ grinder. I myself prefer a more pragmatic approach where I concentrate on using my skills to achieve the most cost-effective result instead of being hindered by the lesser skills of others. When somebody tells me "You should be doing it this way" I always ask the question "Why?" Unless I am given absolute proof that their method is better than mine then I will stick with a method which, in my experience, has already proved its worth.
In this article I have identified several of Shawn McCool's complaints regarding his implementation of the Active Record pattern and shown that he could have avoided those mistakes by using a different implementation. Why did I not encounter the same mistakes in my own implementation? Because I was not taught to make those mistakes. I was not taught that OO programming required a totally different way of thinking. I was not taught that I had to use encapsulation, inheritance and polymorphism in particular ways. I was not taught that I had to litter my code with design patterns. I was not taught that each design pattern also had a "preferred" implementation. When I read these "rules" after completing my framework I concluded that they must have been written by academics who had little or no experience with commercial database applications, especially an ERP application with hundreds of tables and thousands of use cases. Their area of "expertise" was not the same as mine, therefore their "advice" did not seem to be relevant or even practical.
When I began OO programming with PHP I already had 20 years of programming experience, I had already built libraries of reusable code, and I had built a framework in two different languages. I did not go on any courses to learn PHP as the online manual, coupled with online tutorials and various books which I bought, provided all the material that I needed. I read how to build classes with properties and methods, I read how to use inheritance, so I used that basic knowledge as my starting point. I saw several different ways in which things could be done, but I never regarded them as the only way in which things should be done. I already knew what I wanted to achieve, so I experimented with different ways in order to find the one that worked best for me, meaning that it had to be both simple and reliable. I was aided by the fact that instead of a basic text editor I had managed to find a proper IDE with a symbolic debugger so that I could step through the running code line by line to see exactly what was happening if a problem appeared. I had already used an IDE with a debugger in my previous language, so I knew how valuable it could be.
There were only two design decisions which I made before building my framework:
Everything else I made up as I went along. I did certain things in ways which seemed logical to me using simple code which worked. Anything which at first glance seemed very complicated I broke down into a series of little steps, then I dealt with each step one at a time. When I solved the last step the entire problem was solved.
getData($where) to deal with SELECT queries.
load(), validate() and store() as I learned decades ago that when a group of functions must always be called in the same sequence then it is more efficient to insert those calls into a wrapper method so that you can replace that sequence of calls with a single call. It then becomes very easy to make adjustments to that wrapper and have those adjustments instantly available to all users of that wrapper.
"Active Record uses the most obvious approach, putting data access logic in the domain object" I took this as a suggestion and not a requirement. I had already decided on implementing the 3-Tier Architecture so that I could easily switch from one DBMS to another, and I wasn't going to change that for anybody.
Some people seem to think that the AR pattern can only be used in simple applications which perform nothing but basic CRUD operations, but as I had never heard of this limitation I did not build it into my implementation. By using inheritance properly I found that I could put all standard logic which can be applied to any database table into invariant methods in an abstract class, which enabled me later to add "hook" methods which made it very easy to add custom logic into any table's subclass. I have used this architecture to build a large ERP application which contains some very complex logic, so as far as I am concerned this is not a limitation of the pattern, it is down to a lack of imagination.
By using a single property for all table data alongside a series of standard methods to deal with the CRUD operations which are common to every database table I produced a collection of components which are as loosely coupled as it is possible to be. This means that instead of having Controllers which have hard-coded references to particular Models I have been able to create a library of pre-built Controllers which can, using the power of Dependency Injection, be used with any Model.
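A pre-built Controller that works with whichever Model it is given might look roughly like the sketch below. Only the method name getData($where) appears in the text; the class names are hypothetical, $where is simplified to an array of column values, constructor injection is used as one possible form of dependency injection, and an in-memory array stands in for the database.

```php
<?php
// Hypothetical sketch: class names are illustrative, and $where is simplified
// to an array of column => value pairs with an in-memory array as the "table".

abstract class BaseTable
{
    protected array $rows = [];

    // Same method name and signature on every Model, so any Controller can call it.
    public function getData(array $where = []): array
    {
        $matches = function (array $row) use ($where): bool {
            foreach ($where as $column => $value) {
                if (($row[$column] ?? null) !== $value) {
                    return false;
                }
            }
            return true;
        };
        return array_values(array_filter($this->rows, $matches));
    }
}

class CountryTable extends BaseTable
{
    protected array $rows = [
        ['code' => 'GB', 'name' => 'United Kingdom'],
        ['code' => 'FR', 'name' => 'France'],
    ];
}

class CurrencyTable extends BaseTable
{
    protected array $rows = [
        ['code' => 'GBP', 'name' => 'Pound sterling'],
    ];
}

// One reusable "list" Controller: the Model is injected, never hard-coded.
class ListController
{
    private BaseTable $model;

    public function __construct(BaseTable $model)
    {
        $this->model = $model;
    }

    public function run(array $where = []): array
    {
        return $this->model->getData($where);
    }
}

$countries  = (new ListController(new CountryTable()))->run(['code' => 'FR']);
$currencies = (new ListController(new CurrencyTable()))->run();
```

The Controller contains no table name and no column name, which is what allows a single class to serve every Model that extends the same base.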
Here endeth the lesson. Don't applaud, just throw money.
10 Jul 2024 | Amended What is the Active Record (AR) pattern? to include the description from the PoEAA book. |