So I've been working on adding drag-and-drop rule rearrangement to my Mac-native client, and it's presenting a problem. Refreshing the rule positions after a drag operation would require re-fetching the access-rulebase to confirm the new ordering on the management server. On the 2200 I'm using as my development target, that can be a pretty time-consuming operation.
I decided to run a bunch of queries with varying parameters to see just how it performs. I picked 'show application-sites' to start, because a clean management server ships with well over 500 of them. I tried with details-level full and details-level uid (labeled UUID in the results below), and with limit 500 and limit 1, and I ran each version of the call 1000 times to get good data.
Code:
For 500 full: 10.993s min, 14.9846s mean, 2.33993s std dev, 32.776s max
For 500 UUID: 4.662s min, 6.6676s mean, 2.56184s std dev, 35.852s max
For 1 full: 2.927s min, 4.9519s mean, 2.98319s std dev, 22.434s max
For 1 UUID: 2.867s min, 4.9269s mean, 2.55149s std dev, 21.436s max
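For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing loop. It is not the script that produced the numbers above. The helper names (time_command, summarize, build_argv, run_benchmark) are made up for illustration, the mgmt_cli invocation assumes a local root login with -r true, and the standard deviation here is the sample standard deviation, which may not match how the figures above were computed.

```python
import statistics
import subprocess
import time


def time_command(argv, runs):
    """Run argv `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, mean, sample standard deviation, and max of the durations."""
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        # stdev needs at least two samples; report 0.0 for a single run.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "max": max(durations),
    }


def build_argv(limit, details_level):
    """Assumed invocation: run locally on the management server as root."""
    return ["mgmt_cli", "-r", "true", "show", "application-sites",
            "limit", str(limit), "details-level", details_level,
            "--format", "json"]


def run_benchmark(runs=1000):
    """Time all four variants and print one summary line per variant."""
    for limit in (500, 1):
        for level in ("full", "uid"):
            stats = summarize(time_command(build_argv(limit, level), runs))
            print(f"For {limit} {level}: {stats['min']:.3f}s min, "
                  f"{stats['mean']:.5f}s mean, {stats['stdev']:.5f}s std dev, "
                  f"{stats['max']:.3f}s max")
```

Calling run_benchmark() on the management server would run each of the four variants 1000 times in sequence. Note that this times the whole mgmt_cli process, including login and process startup, not just the API call itself.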
Then on a whim, I built a SmartCenter VM on my development machine (a MacPro4,1 with two Xeon X5675 processors). I started with 4 cores and 16 GB of RAM and got figures which blew me away, so I stepped it down to 2 cores and 8 GB of RAM to match my 2200, though the VM's cores are still faster and support more instructions. Still unbelievably better performance:
Code:
For 500 full: 3.145s min, 3.96025s mean, 0.816168s std dev, 16.131s max
For 500 UUID: 1.288s min, 1.75462s mean, 0.464386s std dev, 4.292s max
For 1 full: 1.015s min, 1.22274s mean, 0.224978s std dev, 3.046s max
For 1 UUID: 0.983s min, 1.20113s mean, 0.235541s std dev, 2.980s max
That's right, the max time the VM ever took to get 500 UUIDs of a given type of object (4.292s) was less than the mean time it took the 2200 to get the UUID for one object (4.9269s). What really surprised me was how much more consistent the performance is in the VM: the worst standard deviation in the VM (0.816s) is just over 1/3 the best standard deviation in the 2200 (2.340s)! The VM is using a sparse-allocated disk image on a midrange consumer-level SSD, while my 2200 has an Intel S3700 in it. Clearly API performance is not disk-bound.
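To put rough numbers on the gap, the sketch below works only from the mean times in the two result blocks. It assumes a simple linear model (a fixed per-call cost plus a constant cost per returned object), which is an assumption for illustration and not something the data proves. Under that model the VM comes out roughly 3.8 to 4.1 times faster on every variant, and each extra object at details-level full costs about 20 ms on the 2200 versus about 5.5 ms in the VM. The names MEANS, speedup, and per_object_ms are made up for this example.

```python
# Mean times in seconds, copied from the two result blocks above.
MEANS = {
    "2200": {("full", 500): 14.9846, ("uid", 500): 6.6676,
             ("full", 1): 4.9519, ("uid", 1): 4.9269},
    "vm": {("full", 500): 3.96025, ("uid", 500): 1.75462,
           ("full", 1): 1.22274, ("uid", 1): 1.20113},
}


def speedup(level, limit):
    """How many times faster the VM's mean is than the 2200's mean."""
    return MEANS["2200"][(level, limit)] / MEANS["vm"][(level, limit)]


def per_object_ms(host, level):
    """Extra mean time per additional object, in milliseconds.

    Assumes cost grows linearly between limit 1 and limit 500,
    so the difference is spread over the 499 extra objects.
    """
    means = MEANS[host]
    return (means[(level, 500)] - means[(level, 1)]) / 499 * 1000


def report():
    """Print the speedup and per-object cost for each variant."""
    for level in ("full", "uid"):
        for limit in (500, 1):
            print(f"{limit} {level}: VM is {speedup(level, limit):.2f}x faster")
        for host in ("2200", "vm"):
            print(f"{host} {level}: {per_object_ms(host, level):.1f} ms per extra object")
```

The limit-1 means (about 4.9s on the 2200, about 1.2s in the VM) are a reasonable stand-in for the fixed per-call cost, and that fixed cost is what a rulebase refresh after every drag would pay at minimum.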
The 2200 has an Atom D525, which is 1.8 GHz and has two cores. The Xeon X5675 in my workstation runs at up to 3.3 GHz. I'm running the comparison again with only one core to better even the field, but the numbers I have so far are still very much in favor of the VM. At a guess, it looks a lot like the API makes heavy use of some instruction the X5675 has in hardware which the D525 lacks, forcing it onto a slower software path. They both have MMX, SSE, SSE2, SSE3, and SSSE3, but the Xeon also has SSE4.1, SSE4.2, and AES-NI. I'm running the test via mgmt_cli locally on the system, so I wouldn't expect TLS to be involved; still, it might be.
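One way to confirm which of those instruction sets each system actually exposes is to compare the flags line in /proc/cpuinfo on both machines. The sketch below assumes a Linux x86 system and uses the kernel's flag names (sse4_1, sse4_2, aes). The function names are made up for illustration, and it only shows what the CPU advertises, not what the API actually uses.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("sse4_1", "sse4_2", "aes")):
    """Return the wanted flags that the CPU does not advertise, in order."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]


def check_local_cpu(path="/proc/cpuinfo"):
    """Read the local cpuinfo file and report which wanted flags are absent."""
    with open(path) as handle:
        return missing_flags(handle.read())
```

Running check_local_cpu() on both the 2200 and the VM would show whether the guest really sees the extra instruction sets, since a hypervisor can mask some of them.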
Still collecting data, but thought my findings so far might be interesting to others.