
Originally Posted by Syranide
Sorry everyone, I got really busy around the time I posted that patch and haven't really had any time since then. There has not been any further progress due to time and motivation. So I'm sad to say that it's currently on ice. It's mostly in a working state, but there are a couple of new features that have to be added before it's safe for the public, one being benchmarking all functions/operators, which I'm not even quite sure how to do reliably at the moment.
I may have just the answer for you. A couple of months ago I was playing with loops in single player to see the maximum number of Ops I could get with reasonable framerates. I realized that beyond a certain Ops level my framerate would be less than 10 so I was sticking below that number. Then for fun I went even higher to see if there was any benefit. It turns out that beyond a certain work threshold nothing improves but the game clock actually runs slower. There's also a way to measure how much slower, and this can reveal things about E2 operator efficiencies.
I suspect that the game engine allocates a certain portion of each tick to physics calculations, a certain portion to other work (such as running Lua and E2 code), and gives the rest to rendering. If you use up more and more of its time with E2 code, there's an equal drop in the allocated time it uses to render the game. A linear increase in work produces a linear decrease in FPS, down to the apparent limit of 10 FPS where the game has exactly as much work as it can handle. If you give the game twice as much work as it can handle, it will stretch out its ticks to accommodate it and "net_graph 1" will still report 10 FPS, but those 10 frames actually took 2 seconds.
For example, say you start a single player game and use arbitrarily high quotas (1,000,000 seems good). You code an E2 to do 1000 loops of a certain block of code, some basic math or whatnot. This code generates 50 Ops on its own, so your E2 is now doing ~50,000 Ops per tick. You set runOnTick(1) and, using curtime() as your clock, time it to run for 1 second. It has now run for 1 second at 50,000 Ops with a reasonable FPS hit.
If you increase the loops per tick to the point where your game FPS hits exactly 10 (say that's 2000 loops) you've reached the Ops limit for your computer. Your E2 will run for 1 second and do 2000 loops per tick (~100,000 Ops) and your FPS will be 10.
If you further increase the loops per tick to 4000, what should happen? Your E2 will say it's doing ~200,000 Ops and your net_graph framerate will continue to report as 10, but now your expression finishes in 2 seconds instead of 1, and the game actually feels even slower than when you had it at 2000. If you compare the start and finish times you got using curtime() it will say the difference in time is 1 second, but if you compare the difference using realtime() it will report a difference of 2. The game has stretched out that second into 2 seconds to accommodate the work.
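To make the numbers in this example concrete, here is a rough toy model of the stretching, written in Python rather than E2 so it can be run anywhere. It is only a sketch of my guess about what the engine does, not engine code. The names `OPS_PER_LOOP`, `OPS_LIMIT`, `ops_per_tick` and `real_seconds` are made up for illustration, and the two constants are just the example figures used above.

```python
# Toy model of the tick stretching described above.
# This is an assumed model for illustration, not how the engine is implemented.
OPS_PER_LOOP = 50      # example figure: Ops generated by one pass of the loop body
OPS_LIMIT = 100_000    # example figure: Ops per tick at which FPS bottoms out at 10

def ops_per_tick(loops):
    """Ops the E2 reports for a given number of loops per tick."""
    return loops * OPS_PER_LOOP

def real_seconds(loops, game_seconds=1.0):
    """Wall-clock seconds needed for `game_seconds` of curtime() to pass."""
    # Below the limit all work fits in the tick, so nothing is stretched.
    stretch = max(1.0, ops_per_tick(loops) / OPS_LIMIT)
    return game_seconds * stretch

for loops in (1000, 2000, 4000):
    print(loops, ops_per_tick(loops), real_seconds(loops))
```

With these example figures the model gives 1 real second for both 1000 and 2000 loops and 2 real seconds for 4000 loops, which matches the scenario described above.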
An E2 can use this knowledge to calculate the real CPU cost of a certain block of code by calculating the burden it places on the game. The burden is the measured time it actually takes to complete divided by the reported time it takes (delta realtime() over delta curtime()). In the example above, we could calculate the burden of the 4000 loop expression to be 2.0 (2 seconds/1 second). The burden of the 2000 loop expression would be 1.0 because it is using exactly as many Ops as are available (the Ops limit). The burden of the 1000 loop expression should be 0.5, but since smaller loops generate less work and all work fits within the tick, there's no time discrepancy and no way to determine what the burden is when it's less than 1. A burden of greater than 1 is necessary for accurate measurements to be made.
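The burden calculation itself is just a ratio of two time differences. A minimal sketch in Python (again not E2; the function name `burden` and its parameters are my own and only stand in for the realtime() and curtime() readings taken at the start and end of a run):

```python
def burden(real_start, real_end, cur_start, cur_end):
    """Measured wall-clock time divided by reported game-clock time."""
    cur_delta = cur_end - cur_start
    if cur_delta <= 0:
        # Guard for the sketch: a ratio is meaningless if the game clock stood still.
        raise ValueError("game clock did not advance")
    return (real_end - real_start) / cur_delta

# The 4000 loop example: 2 real seconds pass while curtime() advances by 1.
print(burden(10.0, 12.0, 5.0, 6.0))

# The 1000 loop example: both clocks agree, so the ratio reads 1.0 even though
# the true load is about half of the limit.
print(burden(10.0, 11.0, 5.0, 6.0))
```

The second call shows why a burden below 1 cannot be measured this way: any load that fits inside the tick produces the same ratio.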
Here is the E2 I came up with: It executes a loop of code 1000 times per tick for 2 seconds. It keeps multiplying the number of loops by 1.5 every 2 seconds until the burden becomes measurable. Once the burden surpasses 2.5 it scales the number of loops back to give a burden of around 2 for consistency. After this burden is reached, it reports the maximum number of loops that would be needed for a burden of 1.0 as well as the maximum number of Ops based on the current Opcount. If the costs for all operations are accurate, this Max Ops should be the same for any type of operation that you're testing. In reality, the pre-patch numbers are pretty close.
You will need to increase the singleplayer quotas beyond their defaults to about 1 million. Add this to your GMod shortcut and then start and exit the game a few times for it to take effect:
+wire_expression2_quotatick 1000000 +wire_expression2_quotahard 1000000 +wire_expression2_quotasoft 1000000 +map gm_flatgrass
I chose gm_flatgrass because it has a low graphical impact so the numbers should be more accurate and repeatable.
Code:
@name Loadtest
@inputs
@outputs Ops Burden Loops Rtime Max MaxOps
@persist Rstart Cstart Loops Run
@trigger all
if(first()|duped()) {
    Rstart=realtime()
    Cstart=curtime()
    Loops=1000
    Run=1
}
if(Run) {
    for(Loop=1,Loops) {
        #The block of code to stress test
        FOO=randint(4,100)
        Bar=cos(FOO)
    }
    Rtime = realtime()-Rstart
    if(Rtime > 2) {
        Ops = opcounter()
        Ctime = curtime()-Cstart
        Burden = Rtime/Ctime
        if(Burden > 2.5) { Loops = Loops*2/Burden } #Obtain a burden of 2.0 for consistency
        elseif(inrange(Burden, 1.9, 2.1)) {
            Max=round(Loops/Burden)
            MaxOps=round(Ops/Burden)
            print(" ")
            print("Max loops for a burden of 1: "+Max)
            print("Max Ops for a burden of 1: "+MaxOps)
            Run=0
        }
        else { Loops*=1.5 }
        Rstart=realtime()
        Cstart=curtime()
    }
}
runOnTick(Run)
On my 4.0 GHz i7-860 this code reports a Max Loops of ~8600 and a Max Ops of ~110,000. I suspect these numbers are specific to my computer. Please post your own and then feel free to modify the code to test different operations.
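If you want to see how the loop count search in the E2 settles without starting the game, the same decision logic can be dry-run against a pretend machine. This is a Python sketch under a big assumption: that burden is simply loops divided by some fixed capacity and can never read below 1. The names `simulated_burden`, `find_max_loops` and `capacity` are made up for this sketch, and the 8600 is only my own reported figure plugged in as a stand-in.

```python
def simulated_burden(loops, capacity):
    # Assumed model: load that fits inside the tick always reads as a burden of 1.0.
    return max(1.0, loops / capacity)

def find_max_loops(capacity, start_loops=1000.0, max_rounds=100):
    """Repeat the E2's 2-second measurement rounds until the burden is near 2."""
    loops = start_loops
    for _ in range(max_rounds):
        b = simulated_burden(loops, capacity)
        if b > 2.5:
            loops = loops * 2 / b      # overshoot: scale back to a burden of about 2
        elif 1.9 <= b <= 2.1:
            return round(loops / b)    # loops that would give a burden of 1.0
        else:
            loops *= 1.5               # burden too low to trust: add more work
    raise RuntimeError("search did not settle")

print(find_max_loops(8600))
```

In this idealized model the search always recovers the capacity it was given. In the real game the readings are noisy, so expect the reported Max Loops to wander a bit between runs.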
The way that this can be used for benchmarking is to compare the Max Loops you get with a loop of just a single operator type or function. If the numbers improve with the patch, then it can be assumed the performance of that operation has improved by the same amount.
Also, comparing different operators post-patch should give a reasonable estimate of the Cost each operation type ought to have, based on actual CPU time used. Costs can be determined relative to each other by dividing a large constant by the Max Loops for each operation.
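As a sketch of that bookkeeping (Python, not E2), the two comparisons come down to a couple of divisions. Everything here is hypothetical: the function names, the operation names, the constant of 100,000 and the Max Loops readings are invented purely to show the arithmetic and are not measurements.

```python
def speedup(max_loops_before, max_loops_after):
    """Improvement factor for one operation, pre-patch versus post-patch."""
    return max_loops_after / max_loops_before

def relative_costs(max_loops_by_op, constant=100_000):
    """Relative Cost per operation: a large constant divided by its Max Loops."""
    return {op: constant / max_loops for op, max_loops in max_loops_by_op.items()}

# Made-up Max Loops readings, for illustration only.
readings = {"op_a": 20_000, "op_b": 10_000, "op_c": 5_000}

print(speedup(8_000, 12_000))
print(relative_costs(readings))
```

An operation that manages half as many loops before hitting a burden of 1 comes out with twice the relative Cost, which is the behaviour you would want from a CPU-time-based Cost table.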
Hope this helps!
Note: The attachment system is dumb and added an extra period to the file name. Please remove it if you download the .txt file.