Some PHP statistics

If you ever took a C++ course, you know something about object-oriented programming, memory allocation and the like. But years have passed and those words don't carry the same meaning anymore. Of course, nowadays, thanks to languages like PHP, you don't have to worry about data types or about year-long hunts for memory-leak bugs. Everything is simple nowadays, but that simplicity comes at a cost.

Of course nowadays memory is so cheap that you can throw in as much as you like and practically get paid by your memory supplier for it. Of course nowadays you don't have to worry about processing power, because everything scales across an unlimited number of cores, or even in the ethereal, almighty cloud, where all apps get eternal happiness just for being there.

But every once in a while you get to write an application where memory consumption and processing power do matter. If you're one of the "nowadays people" you may not even know what that means, but for people like me it's still important to write efficient applications.

PHP does not help much here. Once you get past the modern upgrades like interfaces, traits, generators and of course the magic methods (can't get any more magic than that, can you?), you will find that they waste a lot of your time if you really want to put them to work. Eventually you'll miss some of the good old days' simplicity: multiple inheritance, or a plain malloc() where you can write binary data in whatever form you choose. Well… with PHP you'll have some trouble there.

Now suppose you just want to store some data in a reasonable amount of RAM and access it conveniently. Perhaps you need an array, a list or a hash table. It sounds simple, but it's not so simple if you care about your resources in PHP. So I devised a little benchmark to see how different data structures compare and how useful they could be. I just wanted the simplest thing: to store many ints. Here are the results I got:

 

Data Structure           Bytes/Item   Insert Time (µs)   Access Time (µs)
QuickHashIntHash             20            0.87               1.37
SPLFixedArray                48            0.09               0.6
QuickHashStringIntHash       52            1.46               0.9
SPLDoublyLinkedList          96            0.32               N/A
array                       149            0.24               0.49

As you can see, I ordered the table by memory consumption. And the winner is… QuickHashIntHash, with an astonishing 20 bytes per item. Not bad, not bad at all, but wait a minute… I just want to store some ints, which take 8 bytes on my 64-bit computer. So in order to store 8 bytes I have to use 20, but at least they're indexed, which in the end is quite an achievement even if I waste 60% of my memory just on indexing. The timings are not bad at all compared to the other competitors, and if you look at how those competitors perform on memory consumption, QuickHashIntHash is quite a winner.

If you never cared about memory efficiency because you have a lot of it, take a look at the most convenient and most used structure ever: array. It takes 149 bytes to store 8 bytes, which means you waste more than 94% of your memory. RAM may be cheap, but if you bought, say, 32 GB, your useful data is actually about 1.7 GB. So you could have skipped the RAM upgrade altogether with a memory-efficient app. And this is somehow called progress – buying a 30 GB RAM upgrade so you can store the same amount of data as yesterday. It's good that RAM is so cheap nowadays after all.
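For the record, the arithmetic above works out like this (numbers taken from the table):

```php
<?php
// Overhead arithmetic from the table: a PHP array uses 149 bytes to hold
// an 8-byte integer, so the useful fraction of a 32 GB machine is tiny.
$bytesPerItem = 149;   // measured cost of one int stored in a PHP array
$payload      = 8;     // a 64-bit integer
$ramGB        = 32;

$wasted   = 1 - $payload / $bytesPerItem;        // fraction of RAM wasted
$usefulGB = $ramGB * $payload / $bytesPerItem;   // effective capacity

printf("Wasted: %.1f%%, useful data: %.2f GB\n", $wasted * 100, $usefulGB);
// Wasted: 94.6%, useful data: 1.72 GB
```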

If you're curious about the program I used to build the table, here it is:


$n = 1e7;                       // number of ints to store
$a = new QuickHashIntHash($n);  // swap in another structure to test it

$mstart = memory_get_usage();
$tstart = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $a[$i] = rand();
}
$tstop = microtime(true);
$mstop = memory_get_usage();

echo "\nMem usage: ";
echo ($mstop - $mstart) / $i;        // bytes per item
echo "\nInsert time: ";
echo ($tstop - $tstart) * 1e6 / $i;  // microseconds per insert

$tstart = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $x = $a[rand(0, $n - 1)];        // random reads over the whole key range
}
$tstop = microtime(true);
echo "\nAccess time: ";
echo ($tstop - $tstart) * 1e6 / $i;  // microseconds per access

 

Of course there are a few things worth noting:

  • The program generates just one line of the table. To generate the other lines I changed the data structure and re-ran the program.
  • For QuickHashStringIntHash I had to use string keys, which I generated with md5(rand()).
  • I measured how long a plain rand() assignment to a variable takes (and likewise md5(rand()) for the string keys) and subtracted those baseline values from the ones obtained with the little benchmark program. Thus the results in the table don't include rand() or md5() times, nor even simple variable assignment times, so they are close to reality.
  • Given that, you could argue there's no need for an "insert time" column, since the assignment itself is not taken into account. The column actually matters when you compare it to access time, to see how access time depends on the size of the structure – if it depends at all.
  • For SPLDoublyLinkedList the N/A value means that after about 15 minutes I got bored with the whole run, because the structure was obviously quite useless in this department. For all the other data types the entire benchmark completed well below one minute.
  • I do have frequency scaling on my CPU so the results are influenced by this although I do believe that not significantly. You probably have this as well even if you disabled it everywhere. There’s always some obscure BIOS setting that silently overrides everything – trust me on this one.
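The baseline subtraction described in the notes above can be sketched like this (the loop size and variable names are illustrative):

```php
<?php
// Illustrative sketch of the baseline measurement: time a loop that does
// only the rand() call and a plain variable assignment, then subtract the
// resulting per-iteration cost from the measured insert/access times.
$n = 1e6;
$tstart = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $x = rand(); // same work as the benchmark loop, minus the data structure
}
$baseline = (microtime(true) - $tstart) * 1e6 / $n; // microseconds/iteration

// The corrected figure for the table would then be something like:
// $insertTime = $rawInsertTime - $baseline;
echo "Baseline: $baseline us/iteration\n";
```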

 
One final word: if you ever need really efficient memory usage in PHP and you don't mind getting your hands dirty, use the php://memory stream wrapper and write your own indexing/hashing function as needed.
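A minimal sketch of that approach might look like this (the helper names are mine, and the 'q' pack format assumes a 64-bit PHP build, where each int costs exactly 8 bytes of stream storage):

```php
<?php
// Store 64-bit ints in a php://memory stream at 8 bytes each:
// index i lives at byte offset i * 8, encoded with pack()/unpack().
$h = fopen('php://memory', 'r+b');

function int_write($h, int $index, int $value): void {
    fseek($h, $index * 8);
    fwrite($h, pack('q', $value)); // 'q' = signed 64-bit, machine byte order
}

function int_read($h, int $index): int {
    fseek($h, $index * 8);
    return unpack('q', fread($h, 8))[1]; // unpack() returns a 1-indexed array
}

int_write($h, 0, 42);
int_write($h, 1, -7);
echo int_read($h, 0), "\n"; // 42
echo int_read($h, 1), "\n"; // -7
fclose($h);
```

A hash table on top of this is just a hash function mapping your key to an offset, plus whatever collision handling you care to write – the point is that the payload itself costs 8 bytes, not 149.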
