Hi guys,
recently I've discovered one interesting thing in the PHP engine: memory consumption of returned values is optimized for user space code, but is not optimized for internal code. This deeply affects Phalcon, because, passing high amount of data through it, may become quite ineffective.
The issue is found in any internal method/function (in any PHP extension), which returns a value. If you look at the mechanism, how 'return_value' variable is passed in and out of internal PHP function in C, then you realize, that you need to fully copy all the returned data into this variable. There is no ability just to increase a 'refcount' of an already existing zval and just return it. This means, that it is not possible to engage PHP's memory optimization mechanism (refcounting for "copy-on-write").
At the same time, returning values from a user space method/function is fully optimized by PHP - "copy-on-write" technique is engaged there.
I've composed and verified the issue in the following script: https://pastebin.com/MZ0AZmGA . And you can see the results of running it here: https://pastebin.com/uhgfpLUn
In the test I took the most simple object in Phalcon and tested, how the memory changes, when passing data in and out non-modified. Essentially, that is what tested:
$string = str_repeat('a', 5000000);
$rawValue = new Phalcon\Db\RawValue($string);
$tmp = $rawValue->getValue();
The test confirms, that memory consumption increases by 5Mb, when returning value from Phalcon object. Same implementation of the object, made in a normal user space PHP (see links above - the test has it), doesn't suffer from such an issue. Upon returning data, no additional memory is consumed, as the returned value is just a refcounted reference to the original data.
One good note: the issue is important for string and array zvals only; it doesn't affect internal methods/functions that either return objects, or return value by reference. One bad note: this issue is not obvious for a PHP developer, because an experienced programmer knows about reference counting in PHP. So, while heavily relying on this mechanism in an everyday work, a programmer becomes deceived in his expectations of an application performance.
Overall, this all brings a challenge to Phalcon, because internal functions/methods cannot compete with user space code under scenarios, where high amount of data is traveling across the system mostly for read-only purposes. Not only Phalcon consumes much more memory under those conditions, it also spends much computer time on performing unnecessary duplication, which is high for complex structures like associative arrays.
So I'm just wondering, guys, what do you think about this issue?