I'm doing scientific research, processing through millions of combinations of multi-megabyte arrays.
For you to be capable of answering this question you will need to have knowledge/experience of all of the following
I need to process the entire arrays, so it's a lot of data to be loaded and processed. (millions of requests within minutes on a LAN). The faster I can complete requests the quicker I can complete my work. If HHVM has to do work loading this data on each request, it accounts for a significant fraction of the time to complete the request (sometimes more than half, it depends on the complexity of the analysis I'm doing at the time).
I have found a method that has allowed me to keep these data structures cached in RAM (no loading from files, interpreting code, pushing to the array hundreds of thousands of times for no reason, no pointless repetitive unserialize etc), and thus I have eliminated this massive measurable delay.
I have 3 questions regarding how I can make this even faster:
Keep in mind that anything that prevents HHVM from caching the data structure in RAM between requests should be regarded as unacceptable.
Lookuptable35543.php
<?php
$data = [
["uuid (20 chars)", 5336, 7373],
["uuid (20 chars)", 5336, 7373],
#more lines as above
];
?>
Some of these files are many MB in size and there are a lot of them Main.php
<?php
function main() {
require /path/to/Lookuptable35543.php;
#(Do stuff with $data)
}
?>
This is working quite well, as Main.php gets thousands of requests, in a short period of time, HHVM keeps Lookuptable.php's data structure in memory. Avoiding pointless processing and IO, as it just sits in RAM, ready for use. (I have more than enough RAM)
Unfortunately, the only way I know how to make HHVM hold the lookup table in RAM is, I set $data in the global scope inside my lookup####.php file (then require the lookup file into a function in the data processing file: Main.php)? This way HHVM doesnt bother loading the file or re executing the code to create $data, because it can see that $data can be determined at compile time, and it will not ever change during runtime. This works but I dont know if there is a penalty from having the $data exist in the lookup###.php file's global scope. (Or maybe its not global at all because it is require
d into main.php's function?)
What if I return $data from a function inside Lookup.php and call that function from Main.php like this Main.php
Would the HHVM JIT the result of getData() in RAM? Somehow I associate functions with unpredictability... but maybe HHVM is clever enough to know that the functions result can be determined at compile time, and never changes?
I can't put the lookup table inside Main.php because I require
different lookup tables based on the type of request.
I'd prefer not to use OO or create a class. How can I declare types, procedural style? If a class is absolutely necessary can you please give example code suitable for my requirements above?
If you're not familiar with this HHVM caching technique please do not waste mutual time suggesting a database, redis, APC, unserialize, etc. The fastest is for HHVM to just keep my various $data variables in RAM. Even unserializing $data from a ramdisk file is slow, because then the entire data structure must be parsed as a string and converted into a data structure in memory for every request. APC has the same problem as far as i know. I dont want to even have to copy $data. The lookup tables are immutable, read only. They must just stay fully structured in RAM. My current caching solution (at the top of this question) has already given me huge gains, but as per my 3 questions I think there may be more gains to be had?
Incase you're wondering, I have measured the latency of various data loading or caching methods. Now I basically want to keep the caching situation I have, but give the HHVM JIT maximum confidence about how to type my data, so it can save time not running type or even bound (array size) checks.
Edit Ok so nobody has been able to give me any code examples yet, so I'm just trying stuff out.
Here's what I've found out so far.
const arrays don't work yet in HHVM. const foo = ['uuid1',43,43];
throws an error about HHVM only supporting constants with scalar values.
Vector with Array values: I don't know how it will perform yet... I expect it will be better than a normal array. This is valid HH code.
This is progress, because HHVM should be able to cache this in the same way, HHVM knows this whole structure is constant, and HHVM knows the indexes are all integers.
What I'm still not entirely happy about with this structure is this: Consider this code
for ($n=0;$n<count($iv);++$n) if ($x > $iv[$n][1]) dosomething();
Will HHVM perform a type check on $if[$n][2]
on every loop iteration?
In my definition of $iv
above, there is nothing that says the 2nd element of the inner array will be an integer.
How can I improve on this?
Can disabling the type checker be of any use? Does this only hide errors from the external type checker, or does it prevent HHVM from constantly doing type checks? (I'm thinking it's the first thing)
Perhaps if I could make my own user-defined type that would solve the problem?
<?hh
#I don't know what mechanisms for UDT's exist, so this code is made-up
CreateUDT foo = <string,int,int>;
$iv = ImmVector<foo> {
['uuid1',425,244],
['uuid2',658,836]
};
print_r($iv);
I found a reference to this at Hack Collections Literal Syntax Vector<Foo> unfortunately it might not be available to use yet.
Okay I've solved my first question.
I don't have any global scope issues. My require is being done from inside function main(), so it's as if the code from lookuptable####.php is being inserted into function main(). HHVM docs: "If the include occurs inside a function..." Basically if you were to open lookuptable####.php it looks like the code is in global scope, but that's not the file that is being requested from hhvm. main.php is the one being requested, thus there is no code in global scope.
I think I've answered my 2nd question, it's currently at the bottom of my question. I'm not 100% convinced, but I'm pretty happy to move ahead and test it.