The common computer application comes in two parts: program code (typically object oriented) which allows users to view and maintain different sets of data, and a database (typically relational) which allows the data to persist between executions of that program code. Both of these components should have some sort of structure (if they don't then you are in BIG trouble), but although they deal with exactly the same data, each of them is designed and constructed using totally different principles: the software by Object Oriented Design (OOD), the database by Database Normalisation.
As a consequence of these two design techniques being totally different, the structures produced from them are also totally different, so much so that they are totally incompatible. This incompatibility is known as the Object-Relational impedance mismatch, and is why the Object-Relational Mapper (ORM) was invented. An ORM is a component which sits between an in-memory object (the computer program being executed) and a relational database in order to convert the structure of one into the structure of the other in all communication between the two. However, building an ORM is not an easy process. Martin Fowler says the following in his article OrmHate:
The object/relational mapping problem is hard. Essentially what you are doing is synchronizing between two quite different representations of data, one in the relational database, and the other in-memory.
In solving the problem of the incompatible structures the introduction of an ORM produces problematic side effects of its own. Because it is an additional component it involves the following:
Eventually the effort in maintaining this particular "solution" can be just as large as, if not larger than, the original problem it was meant to solve. This leads to a fundamental question:
If a solution has side effects which are just as significant as the problem it was meant to solve, then is it the right solution?
Is there a different solution which produces fewer side effects? How about an obvious one - if the difference between the two structures causes a problem, then why can't the two structures be made less different? This will remove the need for an ORM as well as any problems caused by using an ORM. This is not a new idea. Some will say that it has already been tried, and failed, therefore it should not be considered as a possible solution at all. Some say that the differences between relational theory and OOD are so great that it is impossible to combine them without sacrificing some of the fundamental features that OO provides, therefore there is no choice but to employ an ORM. Let us examine this in more detail.
As I mentioned earlier a computer application has two parts, software and a database, which both work with and manipulate exactly the same data. The only difference is that one does it in memory while the other does it to disk. However, the design of the software (Object Oriented Design) and the design of the database (Database Normalisation) follow different rules, and the result of applying these different rules is a different data structure. Why on earth should these two sets of rules produce different and incompatible structures when applied to exactly the same data? Surely this would indicate that one of these sets of rules is broken and in need of serious repair? Let us examine these rules in more detail.
A well-structured relational database is designed by applying a process called Database Normalisation, which starts with First Normal Form (1NF) and progresses all the way up to Sixth Normal Form (6NF). The principles of normalisation are simple, common sense ideas that are easy to apply. Each can be defined in a single sentence, with practical "before" and "after" examples showing how they can be applied. A design cannot be considered for the Nth Normal Form until it has first passed through the N-1 Normal Form. Although the higher levels of normalisation are optional, and a designer may deliberately step back from them (this is known as de-normalisation), even a designer who stops at 3NF must have progressed through 1NF and 2NF to get to that point.
Object Oriented Design (OOD), combined with its physical implementation in Object Oriented Programming (OOP), does not have a clear and concise set of rules or processes. There is no "step 1" to "step 6". All it has is a basic set of "features" which must exist in a language in order to support a method of programming using objects. These features can be defined as follows:
Object Oriented Programming | Writing programs which are oriented around objects. Such programs can take advantage of Encapsulation, Inheritance and Polymorphism to increase code reuse and decrease code maintenance. Note that the effectiveness of your implementation can be measured by the amount of reusable code that you produce. The more reusable code you have at your disposal, the less code you need to write to get the job done, the less time it will take and the more productive you will be. |
Class | A class is a blueprint, or prototype, that defines the variables and the methods common to all objects of a certain kind. |
Object | An instance of a class. A class must be instantiated into an object before it can be used in the software. More than one instance of the same class can be in existence at any one time. |
Encapsulation | The act of placing data and the operations that perform on that data in the same class. The class then becomes the 'capsule' or container for the data and operations. This binds together the data and the functions that manipulate the data. More details can be found in OOP for heretics. |
Inheritance | The reuse of base classes (superclasses) to form derived classes (subclasses). Methods and properties defined in the superclass are automatically shared by any subclass. A subclass may override any of the methods in the superclass, or may introduce new methods of its own. Inheritance is used for "is-a" relationships. More details can be found in OOP for heretics and Using "IS-A" to identify class hierarchies. |
Object Composition | A way to combine simple objects or data types into more complex ones. Object Composition is used for "has-a" relationships. More details can be found in Using "HAS-A" to identify composite objects and Use inheritance instead of object composition. |
Polymorphism | Same interface, different implementation. The ability to substitute one class for another. By the word "interface" I do not mean object interface but method signature. This means that different classes may contain the same method signature, but the result which is returned by calling that method on a different object will be different as the code behind that method (the implementation) is different in each object. More details can be found in OOP for heretics. |
There are other "features" such as multiple inheritance, visibility (private/protected/public), interfaces, exceptions, et cetera, but these are all later add-ons and therefore not fundamental to OO.
The major problem with these features is that there is no simple progression from "not object oriented" to "object oriented" as there is from "not normalised" to "normalised". Some of these features are in fact optional and not mandatory - it is possible to write a class which does not have inheritance, or which does not share any methods with other classes (polymorphism). It is even possible to split an entity's data and/or operations across more than one class, thus breaking encapsulation. The bare minimum then is to have a class from which you can instantiate one or more objects.
Because there is no simple and verifiable step-by-step progression from "not object oriented" to "object oriented", just a set of features (some of which are optional), it is left up to the individual to decide how to implement these features. Unfortunately this leads to the situation where 100 different programmers, when given the same problem, will produce 100 different implementations. There is no single, universally-accepted opinion on what OOP is and what it is not, or what constitutes "good OOP" or "bad OOP". Because there is so much interpretation involved, this inevitably leads to a great deal of mis-interpretation. As an example, before you can design a class you are supposed to go through a process called "abstraction", but what does abstraction actually mean? Unfortunately the dictionary provides two different definitions:
The result of one is a summary of essential points, the result of the other is unreal and difficult to understand. If the result of this abstraction process is wrong, it surely follows that every step taken from that point is a step in the wrong direction, even more so when every step taken does not have to follow an easily-verifiable formula.
The failure of some (most?) OO programmers to understand what the term "abstraction" really means causes them to reach conclusions and make decisions which are, IMHO, fundamentally wrong, such as:
Abstract concepts are classes, their instances are objects. Classes are supposed to represent abstract concepts. The concept of a table is abstract. A given SQL table is not, it's an object in the world. Having a separate *class* for each table is therefore bad OO.
I have to disagree. This is a prime example of someone totally misunderstanding the terms "abstract", "concept" and "real". When I see the terms "the concept of an SQL table" and "a given SQL table" I read them as follows:
An SQL table is not an object, it is merely the blueprint for a type of object, and is therefore a class. It is not until you create a record in that table that you have an actual instance of that blueprint. Thus a table definition is a class while a table row is an instance of that class. The terms "concept" and "real" can be implemented as follows:
While the abstract class may be quite large as it needs to contain code for every possible SQL query, each concrete subclass is very small as it only identifies the barest of details for a single specific database table. When instantiated into an object this combines all the possibilities of the superclass with the actualities of the subclass.
It would appear, then, that having a separate class for each table is not so bad after all. In fact, if you examine my critic's statement you will see that it is his interpretation of that statement which is questionable:
The concept of a table is abstract.
This is why I have an abstract table class which identifies every operation which can be performed on any (as yet unspecified) database table.
A given SQL table is not, it's an object in the world.
This is why I have a concrete table class for each database table, which inherits from the abstract table class. Objects are instantiated from a concrete class, not an abstract class. It really is that simple, and definitely not as complicated as some OO proponents would lead you to believe.
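The split between an abstract table class and a concrete class per table can be illustrated with a minimal sketch. All names here (`AbstractTable`, `Dog`, `$fieldSpec`) are assumptions made for illustration, not the framework's actual classes, and an in-memory array stands in for the physical table.

```php
<?php
// All names here (AbstractTable, Dog, $fieldSpec) are illustrative assumptions.
// An in-memory array stands in for the physical database table.
abstract class AbstractTable
{
    protected string $tableName = '';   // supplied by each concrete subclass
    protected array $fieldSpec = [];    // column names for that table
    private array $rows = [];           // stand-in for the table's contents

    // Generic operations live once, in the abstract class.
    public function insertRecord(array $row): array
    {
        $this->rows[] = $row;
        return $row;
    }

    public function getData(): array
    {
        return $this->rows;
    }

    public function getTableName(): string
    {
        return $this->tableName;
    }
}

// The concrete class only identifies one specific table.
class Dog extends AbstractTable
{
    protected string $tableName = 'dog';
    protected array $fieldSpec = ['dog_id', 'dog_name', 'dog_type'];
}

// Objects are instantiated from the concrete class, never the abstract one.
$dog = new Dog();
$dog->insertRecord(['dog_id' => 1, 'dog_name' => 'Rex', 'dog_type' => 'collie']);
$rows = $dog->getData();
```

The concrete class carries only the table's name and columns; everything it can do is inherited.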
While there are undeniable differences between relational and OO theory, too many of today's OO programmers spend far too much time in exaggerating those differences and complaining that they are totally incompatible. I am sure that they only do this in a feeble attempt to justify their perceived need for an ORM to act as an intermediary between the two. If you actually examine these so-called differences in greater detail you will see that it is actually possible to diminish their scale - in other words, to make molehills out of mountains. Take a look at the list of "differences":
"But what about the methods?" I hear you say. Each table definition does not need to define the methods that can be performed on that table for the simple reason that the same basic methods - create, read, update and delete - are universal across all tables. You may point out that a class can have many more methods than these, but I would point out that ANY method, regardless of its complexity, is nothing more than a variation of one of these four.
In a lot of tutorials on OO I see examples of class hierarchies created just because something IS-A type of something else. For example, "dog" is a class, but because "alsatian", "beagle" and "collie" are regarded as types of "dog" they are automatically represented as subclasses. This results in a structure similar to that shown in Figure 1:
Figure 1 - hierarchy of "dog" classes
With this approach you cannot introduce a new type (breed or variety) of dog without creating a new subclass which uses the "extends" keyword to inherit from the superclass.
This is not how it is handled in a database. The DOG entity would have its own table, and in my software each table, because it has its own business rules, would have its own class. Each table can handle multiple rows, so its class should do so as well. In a database the idea of being able to split the contents of the DOG table into different types, breeds or varieties would not involve separate tables, it would simply require an extra column called DOG-TYPE which would be just one of the attributes or properties that would be recorded for each dog. If there is no need for a separate table for each DOG-TYPE I can see no reason to have a separate subclass for each DOG-TYPE.
If there were additional attributes to go with each DOG-TYPE then I would create a separate DOG-TYPE table to record these attributes, and make the DOG-TYPE column of the DOG table a foreign key which points to the DOG-TYPE column of the DOG-TYPE table, which would be its primary key. This would produce the structure shown in Figure 2:
Figure 2 - structure of "dog" tables
With this design all the attributes of a particular type/breed of dog are stored on the DOG-TYPE table, so instead of a separate subclass for each DOG-TYPE I would have a separate row on the DOG-TYPE table. When reading from the DOG table you can include a JOIN in the SQL query so that the result combines the data from both tables. This is how you can "inherit" attributes in a database. The introduction of a new type of dog requires no more effort than adding a record to the DOG-TYPE table. There are no changes required to the software, no new classes, no new screens, no new database tables, no nothing. From a programmer's point of view this simple 2-table structure is far easier to deal with than an unknown number of subclasses.
There may be cases where the number of different "types" is fixed, but the differences between them are quite significant and therefore require different table structures, in which case I would use a structure similar to what is shown in Figure 3:
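The two-table structure and its JOIN can be sketched as follows. This is only an illustration: it assumes the `pdo_sqlite` extension is available, uses underscores in place of the hyphens in DOG-TYPE, and the `temperament` column is an invented example of a per-breed attribute.

```php
<?php
// Assumes the pdo_sqlite extension; table and column names are illustrative.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// DOG-TYPE holds the attributes shared by every dog of that breed.
$pdo->exec('CREATE TABLE dog_type (
    dog_type    TEXT PRIMARY KEY,
    temperament TEXT
)');

// DOG.dog_type is a foreign key pointing at the primary key of DOG-TYPE.
$pdo->exec('CREATE TABLE dog (
    dog_id   INTEGER PRIMARY KEY,
    dog_name TEXT,
    dog_type TEXT REFERENCES dog_type (dog_type)
)');

// A new breed is just a new row, not a new subclass.
$pdo->exec("INSERT INTO dog_type VALUES ('collie', 'energetic'), ('beagle', 'curious')");
$pdo->exec("INSERT INTO dog VALUES (1, 'Rex', 'collie'), (2, 'Snoopy', 'beagle')");

// The JOIN is what lets each dog "inherit" the attributes of its type.
$sql = 'SELECT dog.dog_id, dog.dog_name, dog.dog_type, dog_type.temperament
          FROM dog
          LEFT JOIN dog_type ON (dog_type.dog_type = dog.dog_type)
         ORDER BY dog.dog_id';
$dogs = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
```

Adding an "alsatian" would mean one more `INSERT` into `dog_type`; the query and the code that runs it stay as they are.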
Figure 3 - hierarchy of tables (1)
Here a PARTY can either be an ORGANISATION or a PERSON. The PARTY table holds the data which is common to both, while the other tables hold the data which is specific to that type. Now, if both ORGANISATION and PERSON can be broken down into different types I would use the structure shown in Figure 4:
Figure 4 - hierarchy of tables (2)
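The PARTY, ORGANISATION and PERSON arrangement described above can be sketched in the same way. The figures are not reproduced here, so every column name below is an assumption, as is the use of a `party_type` column to record which subtype table holds the rest of the data. It again assumes the `pdo_sqlite` extension.

```php
<?php
// Assumes the pdo_sqlite extension; all column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// PARTY holds the data common to both organisations and people.
$pdo->exec("CREATE TABLE party (
    party_id   INTEGER PRIMARY KEY,
    party_type TEXT NOT NULL CHECK (party_type IN ('ORG', 'PER')),
    party_name TEXT
)");

// Each subtype table shares the primary key of PARTY and adds its own columns.
$pdo->exec('CREATE TABLE person (
    party_id      INTEGER PRIMARY KEY REFERENCES party (party_id),
    date_of_birth TEXT
)');
$pdo->exec('CREATE TABLE organisation (
    party_id       INTEGER PRIMARY KEY REFERENCES party (party_id),
    company_number TEXT
)');

$pdo->exec("INSERT INTO party VALUES (1, 'PER', 'Jane Doe'), (2, 'ORG', 'Acme Ltd')");
$pdo->exec("INSERT INTO person VALUES (1, '1980-01-31')");
$pdo->exec("INSERT INTO organisation VALUES (2, '01234567')");

// Reading a person combines the common data with the person-specific data.
$people = $pdo->query(
    'SELECT party.party_id, party.party_name, person.date_of_birth
       FROM party
       JOIN person ON (person.party_id = party.party_id)'
)->fetchAll(PDO::FETCH_ASSOC);
```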
Many OO programmers just haven't a clue about relational databases and the universal SQL language, so they do not see the benefit of making their object structure as close as possible to the database structure. Instead they have this nasty habit of designing structures which are so obtuse, so off the wall, so far removed from the more sensible, normalised structure of the database, that it is virtually impossible to make the two communicate with each other without the intervention of a translation mechanism or ORM. To them their class structure comes first, and the database structure is left till last as a mere "implementation detail", an afterthought. In my humble opinion a good database design is the foundation for a good application, and anyone with more than two brain cells to rub together will tell you that you always start any construction with a solid foundation. Anything else is a disaster waiting to happen.
It appears that I am not alone in this opinion. The following book title was found on www.oreillymaker.com:
Figure 5 - Book for ORM fanatics
Instead of exaggerating the differences between the two design methodologies and making the use of an ORM virtually mandatory, my personal approach is to minimise the differences, or preferably eliminate them altogether, and try to get them to work with each other as closely as possible, thus making the use of an ORM totally redundant. I achieve this with one very simple technique - every table in the database has its own class. One table, one class, no exceptions. My critics (of whom there are many) are quick to come up with arguments such as:
To use a relational model in memory basically means programming in terms of relations, right the way through your application. [...] Some problems are well suited for this approach, so if you can do this, you should.
Writing programs which are oriented around objects. Such programs can take advantage of Encapsulation, Inheritance and Polymorphism to increase code reuse and decrease code maintenance.
As the software which I have written clearly contains classes, objects, encapsulation, inheritance and polymorphism, and it has high levels of reusability and thus low maintenance, I have clearly satisfied all the necessary criteria.
Speaking as a programmer who has built several systems which involve OO software communicating with a relational database I can state quite categorically that it can be done for the simple reason that I have done it. Perhaps my approach is more successful because I had over 20 years of experience in software development before moving into object oriented programming, and this experience made it easier for me to get to grips with how best to implement the OO paradigm. Contrast this with a lot of today's newbie programmers who have zero experience and are taught utter rubbish by clueless dunderheads who have the nerve to call themselves "experts". Instead of being allowed to experiment with various approaches to find out which one works best they are told "there is only one way", and so they follow like sheep and are never allowed to learn anything better. Those people only know what they have been taught, whereas I know what I have learned through experience. Believe me, there is a BIG difference between the two.
Unlike so many others I was not a clueless newbie when I jumped into the quagmire of object oriented programming. I had decades of experience behind me, and I used this experience to separate the wheat from the chaff. I started off my programming career using COBOL, that well known procedural language, using indexed files, hierarchical and network databases, and then some 16 years later I moved to UNIFACE, a model-driven and event-driven language using relational databases. During all this time I learnt the following valuable lessons:
The majority of this experience was with software houses where the job involved designing and building an application for one customer before moving on to another application for a different customer. This was a high pressure environment which involved bidding against other software houses for the contract, and then having to complete that project to budget and within timescale. No room there, then, for woolly-headed theories which did not cut the mustard.
In 2002 I decided to teach myself PHP so that I could move into web development, and because it had OO capabilities I decided to learn about OOP as well. From reading various books and online tutorials I discovered that the basic principles of OOP are encapsulation, inheritance and polymorphism, so I tried to combine my decades of previous experience with these new principles in order to write software. My starting point was to rewrite a development framework which I had originally designed and written in COBOL (using a single tier architecture) in 1985, then rewrote in UNIFACE in the 1990s, firstly using a 2-tier architecture then again using a 3-tier architecture when that capability was introduced into the language. My original COBOL framework was successful in reducing developer effort as it removed a lot of boring, repetitive coding and provided a lot of features "out of the box". My first 2-tier rewrite in UNIFACE was better, and my 3-tier rewrite better still, so I wanted to see if PHP+OOP could continue this trend. I'm happy to say that I was not disappointed. I went through the following sequence of events:
Later on I added other methods such as insertMultiple(), updateMultiple() and deleteMultiple() which could deal with any number of records at a time instead of being limited to just one.
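One plausible shape for a multi-record method is a loop over the existing single-record method. The sketch below is an assumption about how such a method could be built, not a description of the framework's code; `TableSketch` is a hypothetical class and an array stands in for the database table.

```php
<?php
// Hypothetical sketch: the array property stands in for the database table.
class TableSketch
{
    public array $errors = [];
    private array $rows = [];

    // Handles exactly one record.
    public function insertRecord(array $row): array
    {
        if (empty($row)) {
            $this->errors[] = 'Cannot insert an empty record';
            return $row;
        }
        $this->rows[] = $row;
        return $row;
    }

    // Handles any number of records by reusing the single-record method.
    public function insertMultiple(array $rows): array
    {
        foreach ($rows as $index => $row) {
            $rows[$index] = $this->insertRecord($row);
        }
        return $rows;
    }

    public function getData(): array
    {
        return $this->rows;
    }
}

$table = new TableSketch();
$table->insertMultiple([
    ['dog_name' => 'Rex'],
    ['dog_name' => 'Snoopy'],
]);
```

Because the multi-record method delegates to `insertRecord()`, any validation placed there applies to every row without being written twice.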
To summarise, in order to define the classes which an application needs I must first identify all the different entities and their properties, and the operations which can be performed on them. If I am developing a database application then the entities and their properties have already been defined in the database structure, where each entity has its own table and its own set of properties. Because they are database tables the only operations that can be performed on them are SELECT, INSERT, UPDATE and DELETE. It makes sense to me to use the database structure as my software structure instead of going through a separate process which produces a different set of entities, properties and methods. This has two distinct advantages:
It seems that this concept is too simple for some people.
The standard answer from OO purists would be to create a new class which contains elements from all the relevant tables, but this type of solution simply is not in my repertoire. My solution incorporates any one of the following options:
Figure 6 - parent and child data in different zones
There is no rule which says that a controller may only communicate with a single model (database object), so I have built my controllers to access a separate database object for each zone. In this example it will call the getData() method on the PARENT object using whatever selection criteria have been passed down to it. Only one record will be displayed, but if more than one is retrieved the screen will contain hyperlinks to scroll back and forth between them. The primary key will be extracted from the current PARENT record and used in the getData() method on the CHILD table. The number of CHILD records actually displayed on each page will be determined by the page size, which can be varied by the user. If more records are available than can fit on a single page then hyperlinks will be available to scroll back and forth between them.
This controller can be used for any two tables which exist in a parent-child relationship as the logic is exactly the same, only the table names are different.
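The parent-child flow just described can be sketched as a single function that works with any two table objects. This is a simplified illustration under several assumptions: `ArrayTable` and `parentChildController()` are hypothetical names, `getData()` here takes an array of column values where the text describes selection criteria, only the first parent record is shown, and in-memory arrays replace the database.

```php
<?php
// Hypothetical stand-in for a table class: filters an in-memory array.
class ArrayTable
{
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Returns every row whose columns match all of the given values.
    public function getData(array $where = []): array
    {
        return array_values(array_filter($this->rows, function (array $row) use ($where) {
            foreach ($where as $column => $value) {
                if (($row[$column] ?? null) !== $value) {
                    return false;
                }
            }
            return true;
        }));
    }
}

// One controller, reusable for any parent-child pair: only the objects differ.
function parentChildController(
    ArrayTable $parent,
    ArrayTable $child,
    array $where,
    string $keyName,
    int $page = 1,
    int $pageSize = 10
): array {
    $parents = $parent->getData($where);
    if (count($parents) === 0) {
        return ['parent' => null, 'children' => [], 'pages' => 0];
    }

    // Use the key of the current parent record to select its children.
    $current  = $parents[0];
    $children = $child->getData([$keyName => $current[$keyName]]);

    return [
        'parent'   => $current,
        'children' => array_slice($children, ($page - 1) * $pageSize, $pageSize),
        'pages'    => (int) ceil(count($children) / $pageSize),
    ];
}

$orders = new ArrayTable([['order_id' => 1, 'customer' => 'Acme']]);
$items  = new ArrayTable([
    ['order_id' => 1, 'item' => 'bolt'],
    ['order_id' => 1, 'item' => 'nut'],
    ['order_id' => 1, 'item' => 'washer'],
    ['order_id' => 2, 'item' => 'screw'],
]);
$screen = parentChildController($orders, $items, ['order_id' => 1], 'order_id', 1, 2);
```

Nothing in the function names a specific table, which is why the same logic serves every parent-child relationship.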
Figure 7 - parent and child data in the same zone
The most efficient way of combining data from more than one table in the same result set is to use an SQL JOIN. It is a feature of my framework that it is possible to have the JOIN statements constructed automatically based on relationship information which is obtained from the Data Dictionary. There is no rule that says that the data which is extracted from a database object must be obtained from a column within that table, so it is possible to construct a data array that contains columns from any number of different sources. It is therefore possible to construct SQL queries which are as complicated as you like as the result set which is produced is extracted wholesale, converted into XML, then transformed using an XSL stylesheet. The XSL stylesheet does not care where the data came from, as the fact that it exists within the XML document is good enough.
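Building JOIN clauses from relationship metadata can be sketched as below. The format of the relationship array is an assumption made for this example, since the structure exported from the Data Dictionary is not shown here, and `buildSelectWithJoins()` is a hypothetical helper.

```php
<?php
// Hypothetical relationship format: each entry names a parent table, the key
// columns that link it to the child, and the parent columns to include.
function buildSelectWithJoins(string $table, array $parents): string
{
    $select = ["{$table}.*"];
    $joins  = [];

    foreach ($parents as $rel) {
        foreach ($rel['columns'] as $column) {
            $select[] = "{$rel['table']}.{$column}";
        }
        $joins[] = "LEFT JOIN {$rel['table']} ON "
                 . "({$rel['table']}.{$rel['parent_key']} = {$table}.{$rel['child_key']})";
    }

    // rtrim() removes the trailing space left when there are no joins.
    return rtrim('SELECT ' . implode(', ', $select) . " FROM {$table} " . implode(' ', $joins));
}

$sql = buildSelectWithJoins('dog', [[
    'table'      => 'dog_type',
    'parent_key' => 'dog_type',
    'child_key'  => 'dog_type',
    'columns'    => ['temperament'],
]]);
```

The calling code never writes a JOIN by hand; adding a relationship to the metadata is enough to widen the result set.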
function _cm_getForeignData ($fieldarray)
// Retrieve data from foreign (parent) tables.
{
    if (!empty($fieldarray['foreign_key'])) {
        $dbobject = RDCsingleton::getInstance('foreign_table');
        $data = $dbobject->getData("primary_key='{$fieldarray['foreign_key']}'");
        if (count($data) > 0) {
            $fieldarray = array_merge($fieldarray, $data[0]);
        } // if
    } // if

    return $fieldarray;

} // _cm_getForeignData
The following options are available:
$dbobject = new table1;
$array = $dbobject->insertRecord($_POST);
Although this array contains data for fields which do not exist in "table1" no action needs to be taken as the object will only validate what it is told to validate - everything else will be ignored. When the array is passed to the DAO for the construction of the SQL query only those fields which exist in the table definition exported from the Data Dictionary will be included in that query.
function _cm_post_insertRecord ($rowdata)
// perform custom processing after database record has been inserted.
{
    $dbobject = RDCsingleton::getInstance('table2');
    $data = $dbobject->insertRecord($rowdata);
    if ($dbobject->errors) {
        $this->errors = array_merge($this->errors, $dbobject->errors);
    } // if

    return $rowdata;

} // _cm_post_insertRecord
Note again that even though the whole data array is passed to the object for "table2" no action needs to be taken as anything which does not belong in that table will be ignored.
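The reason a whole data array can be handed to more than one table object is that each object keeps only the fields named in its own table definition. A minimal sketch of that filtering follows. The function names and the shape of `$fieldSpec` are assumptions, and the use of `?` placeholders is a choice made for this example, not something stated in the text.

```php
<?php
// Hypothetical helpers; $fieldSpec is assumed to be keyed by column name.
function filterToTableColumns(array $data, array $fieldSpec): array
{
    // Keep only the entries whose key is a known column of this table.
    return array_intersect_key($data, $fieldSpec);
}

function buildInsert(string $table, array $data, array $fieldSpec): array
{
    $row     = filterToTableColumns($data, $fieldSpec);
    $columns = implode(', ', array_keys($row));
    $marks   = implode(', ', array_fill(0, count($row), '?'));

    // Returns the SQL text and the values to bind to its placeholders.
    return ["INSERT INTO {$table} ({$columns}) VALUES ({$marks})", array_values($row)];
}

// One input array containing fields for two different tables.
$post = ['dog_name' => 'Rex', 'dog_type' => 'collie', 'temperament' => 'energetic'];

[$sqlDog, $paramsDog] = buildInsert('dog', $post, [
    'dog_name' => 'string',
    'dog_type' => 'string',
]);
[$sqlType, $paramsType] = buildInsert('dog_type', $post, [
    'dog_type'    => 'string',
    'temperament' => 'string',
]);
```

Each call sees the same `$post` array but produces a query containing only its own table's columns.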
It may not be rocket science, but it works, and thus adheres to the KISS principle.
Martin Fowler, in his article OrmHate wrote the following:
In-memory data structures offer much more flexibility than relational models, so to program effectively most people want to use the more varied in-memory structures and thus are faced with mapping that back to relations for the database.
I totally disagree. I have been building database applications for decades. I have built frameworks for building database applications in three different languages, and each of these frameworks has specifically targeted the database structure and not some airy-fairy, arty-farty "real world" conceptual representation which is as divorced from reality as it is possible to get. My latest web application framework has the following characteristics:
As a result of this approach I now have a Rapid Application Development Toolkit for building Administrative Web Applications. Building a new web application is now an easy process:
Using this framework it is therefore possible to generate a web application to maintain the contents of a number of database tables without having to write any HTML or SQL. Indeed, the initial maintenance screens do not require the writing of any code at all. The only time that it is necessary to write any code is to customise the screen layout, or to modify a database table class to include any business rules or to override the default behaviour.
All this and not an ORM anywhere in sight, so don't tell me that it cannot be done.