Tony Marston's Blog About software development, PHP and OOP

Object Associations are EVIL

Posted on 2nd December 2023 by Tony Marston
Introduction
Databases do not have "associations"
Different types of relationship
Viewing a relationship
Dealing with foreign keys
Referential Integrity
Object Aggregations
References
Comments

Introduction

It was not until many years after completing my framework that I was informed that I was not following "best practices" when it came to the use of object associations. When I searched for this term on the internet I found the following:

In object-oriented programming, association defines a relationship between classes of objects that allows one object instance to cause another to perform an action on its behalf. This relationship is structural, because it specifies that objects of one kind are connected to objects of another and does not represent behaviour.

In generic terms, the causation is usually called "sending a message", "invoking a method" or "calling a member function" to the controlled object. Concrete implementation usually requires the requesting object to invoke a method or member function using a reference or pointer to the memory location of the controlled object.

https://en.wikipedia.org/wiki/Association_(object-oriented_programming)

https://en.wikipedia.org/wiki/Object_composition#UML_modeling_technique

In database design, object-oriented programming and design, has-a (has_a or has a) is a composition relationship where one object (often called the constituted object, or part/constituent/member object) "belongs to" (is part or member of) another object (called the composite type), and behaves according to the rules of ownership. In simple words, has-a relationship in an object is called a member field of an object. Multiple has-a relationships will combine to form a possessive hierarchy.

https://en.wikipedia.org/wiki/Has-a

The phrase allows one object instance to cause another to perform an action on its behalf is interpreted as meaning that if you have a group of related objects (known as an aggregate) then there is code in one object (the container) which calls methods on the other (contained) objects to perform whatever actions are required to maintain that relationship. The more objects which are in this container then the more complicated is the code. It also means that in the container class each contained object is defined as a property, just like the table's columns.

Databases do not have "associations"

In a database every relationship is between a parent table and a child table where the child has a foreign key which points to the primary key in the parent.

Databases do not have "associations", they have relationships. A relationship is between 2 tables (relations) where one is regarded as the parent and the other is regarded as the child. A relationship exists when the child table has a foreign key whose columns have a logical link to corresponding columns in the primary key of the parent table. This is also known as a "one-to-many" relationship because the parent can have many related rows on the child table, but the child can only link back to a single row on the parent table. In an Entity-Relationship Diagram (ERD) this is often depicted as shown in Figure 1:

Figure 1 - One-to-Many relationship using arrowhead notation

database-design-04 (1K)

Note that the column names used in the child's foreign key need not be the same as the names used in the parent's primary key, but the types and sizes of each column in the foreign key must be the same as the corresponding column in the primary key. Note also that a table can be related to any number of child tables and also to any number of parent tables.

Notice that the wikipedia definition shown in the Introduction states that the "correct" way to deal with associations is to have code in the controlling object (the parent) which calls methods within each controlled object (the child). This implies that you have to "go through the parent object in order to access a child object" which is interpreted as "you must access the child object from within the parent object". This may be the only method which is taught, but it is not the only method which exists, and it is certainly not the best method.

You should first examine the requirements of how to deal with a relationship before you decide on the best way to meet those requirements. These requirements can be boiled down to the following steps:

  1. Extract the primary key from the parent table
  2. Convert it into the foreign key for the child table
  3. Access the child table using this foreign key

Some people seem to think that all this activity must be performed from within the object for the parent table, but as I had never heard of this rule I chose a simpler and more logical solution: instead of having all this code within the class for the parent table I have a special Controller, which is based on the LIST2 pattern, which accesses two Models, first the parent and then the child. This means that all the necessary processing is performed within the Controller when it calls the getForeignKeyValues() function and not within any related table class. This function is given both the parent and child objects, and uses the contents of the $parent_relations array for the child to locate the entry which matches the parent. This then provides the mapping between the column names of the parent's primary key with the corresponding column names of the child's foreign key. This information is then used to construct the WHERE string which will be used to access the child table. This is explained in more detail in Dealing with foreign keys below.

You should be able to see here that the act of "going through the parent object in order to access the child object" does not mean that all this processing MUST be performed within the parent object. It is code within the LIST2 pattern which first accesses the parent object, converts the primary key to the foreign key, and then passes the foreign key to the child object. This is achieved using the standard methods which exist in every concrete table (Model) class and not by special methods which have to be added to any Model class. This technique is therefore standard, pre-written and reusable and is obviously superior to any technique which is not.

Different types of relationship

The following types of relationship are possible:

Different types/flavours of database relationship
One-to-Many This is where the child table has a primary key and a separate foreign key. Each parent in this type of relationship can have zero or more children, and the child can have no more than one parent.
One-to-One This is where the foreign key on the child table is exactly the same as its primary key. Each parent in this type of relationship can have no more than one child, and the child can have no more than one parent.
Many-to-Many This is often shown using the image to the right, meaning that "many of entity A can be related to many of entity B". This arrangement is not valid in a database. many-to-many-01 (1K)
Instead it has to be implemented as a pair of one-to-many relationships using an additional intersection table as shown as entity "X" in the image to the right. This intersection table then requires a separate foreign key for each of the parent tables, and a primary key which is comprised of both foreign keys in order to prevent the same combination of foreign keys from being added more than once.

Further thoughts on this type of relationship can be found at How to handle a Many-to-Many relationship - standard.

many-to-many-03 (1K)
Multiple This is where a child table has more than one foreign key which pointing to the same parent table. It has two variations:
  • Recursive - where an entry on table 'X' (the child) indicates a senior-to-junior relationship between two entries on table 'A' (the parent), thus forming a hierarchy of infinite depth. In this case table 'X' acts as an intersection table. The entire structure can be viewed with a can be viewed in its entirety using a Tree View.
  • Non-recursive - where table 'X' (the child) is an entity, not an intersection table. Each foreign key indicates that the linked entry on table 'A' (the parent) has a different purpose.
many-to-many-04a (1K)
Self-referencing This is where a table is related to itself. In this case the name(s) of the column(s) in the foreign key must be different from the name(s) of the corresponding column(s) in the primary key. No row should be related to itself.
many-to-many-04b (1K)
Optional This is where a row in the child table can exist without a reference to an entry in the parent table. This is done by designating each column in the foreign key as NULLable instead of NOT NULL. The relation_type on the DICT_RELATIOINSHIP table should also be set to NULLABLE so that when an entry on the parent table is deleted the foreign key on all related child entries is set to NULL.

Regardless of how each of these different types of relationship will be handled in the application, in the database they require nothing more than a link between a foreign key on the Many/Child table and the primary key on the One/Parent table.

Regardless of how each of these different types of relationship will be handled in the application, in the database they require nothing more than a link between a foreign key on the Many/Child table and the primary key on the One/Parent table. All the necessary processing is carried out by the framework by means of code in the Controller, the View and the abstract table class which is inherited by every Model (table subclass).

Note that it is possible for a foreign key to exist without a foreign key constraint, in which case all referential integrity must be carried out within the program code.

Prior to switching to PHP I had developed many applications and had dealt with hundreds of tables and relationships, so I knew what had to be done and how to do it. When I saw the code samples written by OO "experts" what immediately struck me was that their solutions were totally different, more convoluted and more complicated than mine. It became quite obvious to me that these people had no prior experience of database applications, had no experience of dealing with different kinds of relationships, but had come up with theories of how it could be done in a OO way without understanding how it had actually been done in non-OO languages. This lack of understanding led to a totally different approach:

In the RADICORE framework every relationship, regardless of its flavour, is defined in exactly the same way:

  1. Each table has its details imported from the database schema into the framework's Data Dictionary.
  2. Each relationship is defined manually using the screen provided.
  3. When the details for a table are exported from the Data Dictionary they will be written to two files in the file system - the table class file and the table structure file. The relationship details will appear in two places:

Viewing a relationship

There are just two ways in which a Parent-Child relationship can be viewed - Parent followed by Child, or Child followed by Parent

 

There are two ways in which the two tables in a parent-child relationship can be viewed, as shown in Figure 2 and Figure 3:

Figure 2 - Parent first, then Child

oop-for-heretics-07 (1K)

In this view, used by the LIST 2 pattern, the two tables have separate zones in the screen, and each zone is subject to its own set of method calls. A row from the Parent table is read first, and the primary key is extracted and converted into the equivalent foreign key for the Child table using the getForeignKeyValues() function which is called from within the Controller. This is then used as the $where string to read associated rows from the Child table. Note that with this pattern it is impossible to access entries on the Child table without first going through an entry on the Parent table.

Figure 3 - Child first, then Parent

oop-for-heretics-08 (1K)

In this view, which is common to all Transaction Patterns, there not a separate zone for the Parent table as the SELECT statement which is generated for the Child table will be customised to include one or more columns from the Parent table by means of an SQL JOIN. This can either be done manually by inserting code into the _cm_pre_getData() method, or you can get the framework to do this for you using the mechanism described in Using Parent Relations to construct sql JOINs. This means that all the data from both tables can be retrieved using a single call to the getData() method on the Child table.

It is precisely because I use a single $fieldarray variable to hold the table's data that I can include any number of columns from any number of tables. This avoids the restriction of only being able to display columns from a single table if I were to define each column as a separate variable with its own getter and setter.

Dealing with foreign keys

In the RADICORE framework each Model does nothing more than identify any tables which are related. How they are handled is the responsibility of the Controller and how they are displayed is the responsibility of the View.

Whenever a user transaction (use case) is executed it does not involve code in a single module, it uses several modules which work together in harmony, like those shown in Figure 2. While the Model contains a number of different methods it is the Controller which controls which methods are called in which sequence and with what context. This means that some of the logic for a user transaction is contained within the Controller instead of being completely within the Model. It is only after having worked on thousands of user transactions that I have been able to notice patterns of behaviour that have been repeated for different database tables, and I have managed to abstract out these patterns into a set of reusable controllers which are contained within my library of Transaction Patterns.

In my previous language, called UNIFACE, there was no separate Controller and View as these were both combined into a single component in the Presentation layer which communicated with one or more components in the Business layer. There was a separate component in the Business layer for each entity (table) in the Application Model. While each of these entities identified the table structure and the business rules they did not contain any code to deal with relationships as this was handled exclusively in the Presentation layer. If two tables were linked in a parent-child relationship then those two tables were painted in the screen, one inside the other, so that at runtime the UNIFACE software would first read the outer entity, then use the relationship details in the Application Model to convert the outer's primary key into the inner's foreign key so that it could then read the associated rows from the inner entity. This behaviour was logical and simple, so I duplicated it in my PHP code by putting the necessary code in my Controllers where it could be shared with any number of related entities instead of having to insert specific code inside each entity.

I do not have any methods inside any class to deal with any relationships, I simply have properties which identify those relationships which exist.

In the RADICORE framework each table has its own class, but none of these classes contains either properties or methods to deal with any relationship. Instead the existence of each relationship is identified in either the $parent_relations property or the $child_relations property of the two tables which are involved in that relationship. This information is then used by different components within the framework to deal with that relationship in the appropriate manner. Typically this involves creating a user transaction from a Transaction Pattern which has been designed specifically for that flavour of relationship. While a large number of tables can be maintained using the family of transactions shown in Form Families, others may require a different set of patterns. For example, in those cases where a child table requires the existence of a row in a parent table, because it contains one or more foreign keys, there are two possible approaches:

The code to deal with relationships is provided by the framework, with a different Transaction Pattern for each type of relationship

Note that there are several different patterns which may be used to deal with many-to-many relationships.

Referential Integrity

Referential integrity checks the validity of the link between the foreign key and the associated primary key in order to ensure that data integrity is maintained. In the RADICORE framework's Data Dictionary each relationship has a type column which specifies how the relationship is to be treated when deleting entries from the parent/senior table. This has the following options:

If a foreign key constraint has been defined within the DBMS then the framework will do nothing and allow the DBMS to take the necessary action.

While foreign key constraints are processed by the DBMS during insert, update and delete operations, they are totally ignored when performing a SELECT query. However, the RADICORE framework can utilise the contents of the $parent_relations array to automatically retrieve columns from a foreign/parent table during a getData() operation. This is described in Using Parent Relations to construct sql JOINs.

Object Aggregations

Martin Fowler defines an aggregate as follows:

Aggregate is a pattern in Domain-Driven Design. A DDD aggregate is a cluster of domain objects that can be treated as a single unit. An example may be an order and its line-items, these will be separate objects, but it's useful to treat the order (together with its line items) as a single aggregate.

An aggregate will have one of its component objects be the aggregate root. Any references from outside the aggregate should only go to the aggregate root. The root can thus ensure the integrity of the aggregate as a whole.

Aggregates are the basic element of transfer of data storage - you request to load or save whole aggregates. Transactions should not cross aggregate boundaries.

While I agree that the components of an aggregate are separate objects, just like those shown in Figure 4 and Figure 5, I do not agree that the components of the aggregate should be accessed through an aggregate root. This concept does not exist in the database, and has never existed in any software which I have worked on in the last 40 years. No table in a database has any special operations to deal with related tables, so I do not see any reason to put any special methods in any table class to deal with those relationships. It is an alien and artificial concept which does not exist in my universe. I cannot see any advantages of going through an aggregate root, only disadvantages. The way that I deal with relationships is through standard code which is built into components in my framework.

OO theorists like to over-complicate matters with the following distinctions:

In relational theory it is much simpler than that. A relationship between two tables is signified by one table having a foreign key which points to the primary key of the other table. All accessing is performed using the standard CRUD operations. A composition is achieved by setting all the foreign key fields to NOT NULL, in which case the child row must always contain a reference to a row that exists on the parent table. By setting the type in Referential Integrity to CASCADE all the child records will be deleted when the parent is deleted. An aggregation is achieved by setting all the foreign key fields to NULLable, in which case the child row either contains a reference to a row on the parent table or it does not contain a reference at all. By setting the type in Referential Integrity to NULLIFY all the child records will be updated when the parent is deleted.

In my experience this thing called an object aggregation is nothing more than a collection of parent-child relationships which can be arranged into a hierarchy which could be several levels deep, such as parent-child-grandchild-greatgrandchild-whatever. Two types are supported in the RADICORE framework:


Figure 4 - an aggregate ORDER object (a fixed hierarchy)

order-object (2K)

An aggregation is simply several pairs of tables which are joined together in parent-child relationships, and each relationship can be handled separately and in the same manner.

This collection of tables is joined together to form a fixed hierarchical structure. An inexperienced person would look at this collection and immediately think that it is so unique that it requires a special non-repeatable solution. However, a more experienced person, one who has been trained to look for repeating patterns which can be turned into reusable code, should to able to see something which is quite obvious - this collection contains ten pairs of tables which are joined in a one-to-many/parent-child relationship, and every such relationship will always be handled in exactly the same way. No row can exist in the child table unless it contains a foreign key which contains the primary key of a row in the parent table, and the RADICORE framework has a standard method for dealing with foreign keys. This means that I can deal with this collection of tables by creating 66 tasks which use the following Transaction Patterns:

It is the use of the ADD2 pattern which ensures that no child record can be created without a reference to its parent record.

The only time I would want to read all the data from all of these tables would be if I wanted to produce a printable copy of the entire order, in which case I would construct a task based on the OUTPUT3 pattern.

I do not have to construct an aggregate root in order to access any component within an aggregation as this is controlled by a hierarchy of navigation buttons

The idea that I should be forced to go through the aggregate root in order to access any component within the aggregation is also handled differently. Instead of creating a class to handle the responsibilities of the aggregate root I can achieve the same effect by only allowing the LIST1 task for the root table, which is this example is ORDER-HEADER, to be accessible from a menu button. All the LIST2 tasks for each child table are only accessible from a navigation button on the parent task. This means, for example, that you would have to go through both the ORDER-HEADER and ORDER-ITEM tasks before you can access any ORDER-ITEM-FEATURE entries.

Some OO afficionados might spot that this arrangement, where the ADD1 task for the ORDER_HEADER table is totally separate from the ADD2 task for the ORDER-ITEM table, allows me to create an ORDER_HEADER record without any corresponding ORDER_ITEM records, which would technically be invalid. My logic for doing it this way is that it would be far too cumbersome for the user to enter data for the entire order using multiple screens before pressing the SUBMIT button, so I separate the data into one screen at a time so that the order can be built up incrementally. When the ORDER-HEADER record is first created it has an order_status which is set to "Pending", and while it has this status the user can make whatever additions, deletions and corrections to any part of the order as is necessary. Once the user is satisfied that all the details have been entered correctly he can change the order_status to "Complete", but this will not be allowed if there aren't any entries on the ORDER-ITEM table. Once the order comes out of the "Pending" status no further amendments will be allowed except to advance the status to the next value.

Note that in this particular hierarchy the only child table in any relationship which is required is the ORDER_ITEM table. All others are entirely optional. If an entry on a Parent table is deleted then all related entries on the Child table will also be deleted.

What is not shown in Figure 4 is that the ORDER-HEADER table has an additional foreign key to the CUSTOMER table, and the ORDER-ITEM table has an additional foreign key to the PRODUCT table. These are handled using a POPUP button.

Notice that in my implementation I only ever read data from a component table when the user actually requests a task which operates on that table. This seems sensible to me, but there are others out there who seem to think that when dealing with an aggregation every member is a property of the aggregate root and should be instantiated and loaded with data whenever that root object is created. I remember reading a newsgroup post several years ago from someone who had written an application for his school. In his database he had a group of related tables called SCHOOL, TEACHER, STUDENT, ROOM, SUBJECT and LESSON, but he was complaining that his application was taking too long to load. It turned out that when he instantiated the SCHOOL class he was also instantiating all the other classes and loading in all their data in a single operation. No competent database programmer would ever do it this way. Nobody would ever load that much data into a single object as it would never be displayed to the user in a single screen. He needed to stop loading all his data into a single object and concentrate on building separate tasks to display the contents of each table when it was actually required, and then only reading from the database that data which can fit into a single screen. This is precisely what I had done in a similar application called a Prototype Classroom Scheduling Application which is available in the download of my RADICORE framework.


Figure 5 - an aggregate BILL-OF-MATERIALS (BOM) object (a recursive hierarchy)

bill-of-materials (2K)

This is a pair of tables which form a many-to-many relationship where both foreign keys on the intersection (child) table refer back to the same parent table. This produces a recursive hierarchy which can extend to an unknown number of levels as each parent can have a number of children, and each of those children can also be a parent to its own collection of children, and so-on and so-on. This produces what is commonly known as a Bill Of Materials.

With this arrangement an entry on the PRODUCT table can exist without any entries on the PRODUCT-COMPONENT table, but the reverse is not true. You cannot insert an entry into the PRODUCT-COMPONENT table without specifying the identities of two different rows in the PRODUCT table. There is no logic in the PRODUCT class which deals with the contents of the PRODUCT-COMPONENT table, just two entries in the $child_relations array. Similarly there is no logic in the PRODUCT-COMPONENT class which deals with the contents of the PRODUCT table, just two entries in the $parent_relations array.

Note that in this particular hierarchy although the effect is to relate one PRODUCT to another there is no direct relationship between the PRODUCT table and itself, instead there is an indirect relationship through the PRODUCT_COMPONENT table which is known as an intersection/link table. An entry cannot exist on this Child table without corresponding entries on the Parent table. If an entry on this Child table is deleted it has no effect on the related entries in the Parent table.

In this example the PRODUCT table contains a primary key called product_id while the PRODUCT-COMPONENT table has the following structure:

FieldTypeDescription
product_id_snr string Identifies the parent (senior) product in this relationship. Links to an entry on the PRODUCT table.
product_id_jnr string Identifies the child (junior) product in this relationship. Links to an entry on the PRODUCT table.
quantity number Identifies how many of this product are required in the parent product.

I have seen numerous examples on the interweb where a BOM which is described as "a Car has an Engine, an Engine has Pistons and Valves" is implemented by having a separate class for each of those parts and where each class contains built-in references to its immediate set of components. This means that when the Car class is instantiated it also instantiates the Engine class which, in its turn, instantiates both the Piston and Valve classes. In a database application this is absolutely, emphatically, totally wrong. None of the different products has its own class, it has its own row in the PRODUCT table, and each row in a table shares/inherits the same structure and behaviour as every other row in that table. There is nothing within the product class which identifies a row as being either a container or within a container - to find out that information you activate a separate task with looks at the PRODUCT-COMPONENT table.

Note that product_id_snr and product_id_jnr are separate foreign keys which both link back to the PRODUCT table. They are also combined in the primary key to ensure that the same combination is not used more than once. This forms a recursive hierarchy as it can contain more than the two levels which are indicated by the two tables.

Note also that products can be added or removed from the PRODUCT-COMPONENT table without affecting the contents of the PRODUCT table. While the PRODUCT table can be maintained with a forms family starting with a LIST1 pattern, the PRODUCT-COMPONENT table would be maintained by a forms family starting with the LIST2 pattern. This would show as its parent entity the product that was selected in the PRODUCT table's LIST1 screen, and below it would appear that product's immediate children. To see the entire hierarchy in a single screen you would create a task using the TREE2 pattern, or you could export it to a spreadsheet using the OUTPUT6 pattern.

This shows that the two tables can be handled independently of each other. The fact they they are related is built into the database structure which is then copied into the $child_relations and $parent_relations arrays of each table class. The rule that says that an entry on the PRODUCT table cannot be deleted if it has any entries on the PRODUCT-COMPONENT table is enforced by the framework using the settings in the $child_relations array. The rule that an entry cannot be added to the PRODUCT-COMPONENT table without supplying valid values for two entries from the PRODUCT table is enforced by the ADD2 task where the identity of product_id_snr is passed down from the parent entity in the LIST2 task and the identity of product_id_jnr is selected from a POPUP task.

Here endeth the lesson. Don't applaud, just throw money.


References


counter