Monday, February 23, 2009

Things to consider when using Zend_Db_Table_Row

Before you start using Zend_Db_Table_Row, there's a few things you need to be aware of. By default, I would say Zend_Db_Table_Row isn't very performant. Everytime you use it to insert or update a row, the row will then be fetched automatically from the dabatase to update it's content. Why? Because it's possible some external process (like a trigger) or default values did assign values to some fields after your insert and your update. It will ensure that you keep on working on valid, up to date data if you keep on using the Zend_Db_Table_Row after an operation.

Imagine you're on a site where you're using a Zend_Db_Table_Row to insert IP addresses in some ip_address table. If you get hundreds of visits per minute, you'll not only have hundreds of insert statements, but hundreds of select statements too.

From my point of view, there are two solutions. You could simply *not* use Zend_Db_Table_Row to insert or update data in tables under heavy loads (you could use Zend_Db_Table directly) or you could subclass Zend_Db_Table_Row with some extra property that will allow the class *not* to update itself after an insert or update. This is what I did at work, and I pretty much always choose not to update the row because 99.99% of the time, once the insert or update is done, I don't need the row instance anymore.

Sunday, February 15, 2009

Statement-based versus row-based replication

I'm no database expert and actually only recently became aware of the challenges in programming in a replicated environment. As I stated in another post, you need to insert data on the master, read from the slaves... and deal with some exceptions to this.

Anyhow, MySQL has been using up until recently a replication method called Statement-based which actually gives the whole SQL statement to the slave so that the slave can execute it to keep in sync with the master. It works well... in most cases. There are a couple of problems. Let's say you have an auto-increment table, and you're sending a bunch of insert statements to the slaves... there is *no* garantee that the statements will be executed in order on the slave. It could be multi-CPU where different inserts are done in different threads, or many other reasons... and you'll end up with the wrong incremented IDs. Also, let's say you'd like to use the NOW() function to get the current date... you might end up with a different date on all slaves for the same record. MySQL is trying to help you as much as it can: if you assign a random value to a field using the rand() function, it will also send the seed information to the slaves to that the random number ends up being the same.

We were going to implement some Oracle-style sequences to deal with auto-increment, but some hero at work made this amazing discovery: MySQL now supports row-based replication since version 5.1. What is row-based replication? Well, the idea is that the master will simply give the slave *data* information, not a statement. You will propragate the content of your modifications. This is so very cool. Most other databases work that way, but MySQL has been lagging in implementing this. Why is row-based so cool? Well, first of all, it fixes all problems mentioned above. Auto-increment table will be allright, using NOW() is OK... and also, if you have a bunch of triggers on the master, there's no need to replicate them on the slave because no trigger will get triggered since you're not executing statements anymore. Row-based will also create less table lockings and replicate faster.

I've always said: to any technique that offers loads of advantages, there's often at least one disadvantage that comes with it. Row-based has one of them: when you use a statement that will mass delete or mass insert records (insert into ... select from) it will will actually fill the replication log with one entry for each row that has been changed. So if you clear a table that has one million records... you're going to get massive replication going on there. At our company, we decided that this was an acceptable drawback, as in our system mass deletes, inserts or updates are very rare.

Row-based replication is relatively new to MySQL, you can expect many changes related to this mode in the few next releases but so far with the limited tests we've been conducting it seems to work very nicely.

Friday, February 13, 2009

Friday the 13th: Protected variables can be evil.

The following post on Nabble this week brought up an important 'issue':

Read the post

Remember, private variables and functions are visible only to the actual class you're in, protected will be visible also to its descendants. The use of protected variables isn't evil in itself. It can be good. It's good if it's usage is well documented and has a real reason to exist.

When you make a class that is meant to be derived, you are in fact providing other programmers a base class that can be used by other programmers to do the things they need. You are providing a service, and it is your duty to make sure the service you are providing works well. When you create a protected variable, programmers that use your base class will be able to use this variable. If you then decide to change the functionality/content of this protected variable, you can actually break the code of your descendant. This is unacceptable in real-world applications.

What can you do? There are two possibilities. You can use a protected variable and document it and tell the users of your class they can safely use it, and *never* change the functionality of this varilable. It *is* possible! Look at Zend_Db_Table_Abstract. It contains a protected variable called $_metadata. This variable is *meant* to be used, as stated in the Zend Framework documentation.
To take metadata caching a step further, you can also choose to hardcode metadata.
They then go on with an example on how to setup the variable with metadata information. This is a good example of a protected variable used the right way.

Now let's take a look at the $_cols protected variable. There is also a $_getCols() method. Why? Why is $_cols protected, and not private? There is a protected getter here, why make $_cols also protected? Here, programmers might start using $_cols directly, and when it's functionality changes, it will break their code. If you force the programmers to use $_getCols(), you can hide the internal functionality of the $_cols variable and provide a return value for $_getCols() any way you want.

Bottom line, create protected variables only when it makes sense, if not, make your variable private, and provide protected getters and setters so that your descendants can access the variable in a safer way.

Don't forget, you can't blame your class users for using a variable that was not meant to be used. It's your duty to make it private if you don't want them to use it. Imagine if you had a new TV with a big, red button, with no sign around it, that destroys your TV when you press it. Do you think the TV maker could blame *you* in court for pushing a button that was not documented in the manual? I don't think so. It's the same with OOP. Design well, and don't blame your class users for your own mistakes.

Thursday, February 12, 2009

MySQL Replication and Zend Framework

We recently switched to a replicated database system using MySQL and I had to modify our own framework (that is built on top of Zend Framework) to adapt to this new model.

In such model, you have one or many masters, which will be used for all write operations (insert, delete, update) and many slaves, which will be use for read-only operations (select).

The idea is to have a database adapter, or any class, that will handle this for us. We didn't want to have to use two adapters all the time, and use the 'master' adapter for writes, 'slave' adapter for reads, etc, on the fly manually. Maybe in your project such technique would work, but we have a more complicated framework, with business objects (which are derived from Zend_Db_Table, a Table Data Gateway pattern) which require a db adapter as parameter and will work on this adapter for database operations.

Since we wanted to make master/slave transparent for our classes that inherit from Zend_Db_Table, we needed a class that would make all the branching for the programmer and underlying Zend Framework classes.

At first, my idea was to make a class that would instantiate two adapters internally, one for the master, one for the slave. This class wouldn't derive from anything, it would be some kind of MyCompany_Replicated_Db_Manager with methods similar to Zend_Db_Adapter. But nope, this solution is impossible. Zend Framework sadly doesn't use interfaces intensively, and a class like Zend_Db_Table does validate the db adapter you're passing it against Zend_Db_Adapter class, not some Zend_Db_Adapter_Interface. I need to derive from Zend_Db_Adapter, there's no other way around with Zend Framework implementation of Zend_Db_Table.

Allright, I then jumped to solution #2. I created a MyCompany_Db_Adapter_Pdo_Mysql which, when used for write operations, uses the master connection, which is the main connection of the adapter... but dispatches all read calls to a db adapter I create internally which is mapped to the slave.

To create this adapter, you can pass it an extra 'slave' parameter which will give connection information for the slave server. If no 'slave' paramater is passed, this adapter will work like the normal one, everything will be done on your 'master' which is the default connection.

So basically, you get something like:

$myDbAdapter->insert(); // will insert automatically on the master
$myDbAdapter->fetchAll(); // will get data automatically from the slave

Sometimes, you need to make sure all read operations are also made on the master. Let's say a new user registers to your site, and the next page, you select the newly inserted user data from the database and display it to the user. By default, the fetch will be done on the slave... if the replication wasnt fast enough, you won't have the user information ready in time! You simply need to 'force' the adapter to stay on the master.


Once you're done and want the adapter to act normally, you reset the forced adapter:


There's more to it, but hopefully you get the main idea. By the way, this adapter doesn't support multiple slaves or masters. It's not its job to 'pool' connections. If you want to pool many servers, slaves and /or masters, I suggest making a class that does only that: pools different connections and gives you one randomly (or any other algo you might want to use). It's often a mistake newcomers to OOP will make: they try to make a class do much more than the thing it's supposed to do. An adapter is a single connection, it's not a pool. If you make an adapter become a pool, you have a problem with your design.

I have to admit I'm walking a fine line myself. In theory, an adapter should represent one connection, not two. But I will blame Zend Framework for this forced "OOP prostitution" ;-) Give me some Table Data Gateway interface and I'll make a connection manager, not a derived adapter...

I've seen other solutions on the internet that involved making a change to Zend_Db_Table, and not Zend_Db_Adapter. The problem with this is that it will only work when you use Zend_Db_Table. But sometimes you can't use this class, when the query is too complicated and/or when the query involves more than one table and doesn't fit in a Table Data Gateway pattern. When you extend Zend_Db_Adapter, you get your master/slave functionality at the root/top, not at the bottom.


I have been using Zend Framework since it's beta version 0.6 and have recently come across many problems and found a couple of solutions I thought could be helpful. There's a few people that suggested to me I put this information in a blog for future reference. I hope my posts will be useful to some people... if not, well, it will hopefully make me a look a bit smarter than I am.