CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



Results 1 to 7 of 7

Thread: Management API performance

  1. #1
    Join Date
    2007-03-30
    Location
    DFW, TX
    Posts
    406
    Rep Power
    15

    Default Management API performance

So I've been working on adding drag-and-drop rule rearrangement to my Mac-native client, and it's presenting a problem. Refreshing the rule positions after a drag operation requires re-fetching the access-rulebase to confirm the new ordering on the management server. On the 2200 I'm using as my development target, that can be a pretty time-consuming operation.

    I decided to run a bunch of queries with varying parameters to see just how it performs. I picked 'show application-sites' to start, because there are well over 500 included with a clean management server. I tried with details-level full and details-level UID, and with limit 500 and limit 1, and I ran each version of the call 1000 times to get good data.

    Code:
    For 500 full: 10.993s min, 14.9846s mean, 2.33993s std dev, 32.776s max
    For 500 UUID:  4.662s min,  6.6676s mean, 2.56184s std dev, 35.852s max
    For   1 full:  2.927s min,  4.9519s mean, 2.98319s std dev, 22.434s max
    For   1 UUID:  2.867s min,  4.9269s mean, 2.55149s std dev, 21.436s max
    Then on a whim, I built a SmartCenter VM on my development machine (a macpro4,1 with two Xeon X5675 processors). I started with 4 cores and 16 GB of RAM and got figures which blew me away, so I stepped it down to 2 cores and 8 GB of RAM to match my 2200, but with faster cores which support more instructions. Still unbelievably better performance:
    Code:
    For 500 full: 3.145s min, 3.96025s mean, 0.816168s std dev, 16.131s max
    For 500 UUID: 1.288s min, 1.75462s mean, 0.464386s std dev,  4.292s max
    For   1 full: 1.015s min, 1.22274s mean, 0.224978s std dev,  3.046s max
    For   1 UUID: 0.983s min, 1.20113s mean, 0.235541s std dev,  2.980s max
    That's right, the max time the VM ever took to get 500 UUIDs of a given type of object was less than the mean time it took the 2200 to get the UUID for one object. What really surprised me was how much more consistent the performance is in the VM (the worst standard deviation in the VM is just over 1/3 the best standard deviation in the 2200!). It's using a sparse-allocated disk image on a midrange consumer-level SSD, while my 2200 has an Intel S3700 in it. Clearly API performance is not disk-bound.

    The 2200 has an Atom D525, which is 1.8 GHz and has two cores. The Xeon X5675 in my workstation runs at up to 3.3 GHz. I'm running the comparison again with only one core to better even the field, but the numbers I have so far are still very much in favor of the VM. At a guess, it looks a lot like the API makes heavy use of some instruction the X5675 has in hardware which the D525 has to emulate in software. They both have MMX, SSE, SSE2, SSE3, and SSSE3, but the Xeon also has SSE4.1, SSE4.2, and AES-NI (I'm running the test via mgmt_cli locally on the system, so I wouldn't expect TLS to be involved; still, it might be).
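    If anyone wants to run the same comparison on their own hardware, here's a minimal sketch of how I'd diff the flag lists. The two flag strings below are illustrative stand-ins, not dumps from my actual machines; on a real box you'd pull each one with grep -m1 '^flags' /proc/cpuinfo and compare across systems:
    Code:
    ```shell
    #!/bin/sh
    # Compare instruction-set support between two CPUs.
    # These flag lists are illustrative stand-ins; on a real system, grab
    # the genuine list with: grep -m1 '^flags' /proc/cpuinfo
    atom_flags="fpu tsc mmx sse sse2 sse3 ssse3"
    xeon_flags="fpu tsc mmx sse sse2 sse3 ssse3 sse4_1 sse4_2 aes"

    # One flag per line, sorted, so comm can diff them.
    printf '%s\n' $atom_flags | sort > /tmp/atom.flags
    printf '%s\n' $xeon_flags | sort > /tmp/xeon.flags

    # Lines unique to the second file: flags the Xeon has that the Atom lacks.
    comm -13 /tmp/atom.flags /tmp/xeon.flags
    ```
    With the sample lists above, that prints aes, sse4_1, and sse4_2, which are exactly the extensions I'm suspicious of.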



    Still collecting data, but thought my findings so far might be interesting to others.

  2. #2
    Join Date
    2014-09-02
    Posts
    377
    Rep Power
    10

    Default Re: Management API performance

    Definitely "interesting". Just to clarify, you're running management on the 2200?

    -E

  3. #3
    Join Date
    2007-03-30
    Location
    DFW, TX
    Posts
    406
    Rep Power
    15

    Default Re: Management API performance

    I am indeed. A while ago, I found out how to modify config_system to let me set it up as a standalone. The firewall part has one rule: any, any, any, accept.

    This performance is surely why management is no longer supported on the 2200, even though the firewall role still was, last I checked.

  4. #4
    Join Date
    2014-09-02
    Posts
    377
    Rep Power
    10

    Default Re: Management API performance

    I love trying to squeeze proverbial water from a stone, but getting R8x management to run well on an Atom is even more frustrating.

    I'd messed around a bit in the past with attempts to optimize, especially when we were still trying to help support older Smart-1 appliances. I'd usually just focused on RAM to try to improve things, and usually gave up on the Atom-based devices early, attributing most of the slowdown to slow, scarce cores.

    Cool to have some deeper insight as to why they bogged down. I'd love to know for sure what instruction set they're taking advantage of, but only as a curiosity.

    Maybe next you could try porting it to a RPi? ;)

    -E

  5. #5
    Join Date
    2007-03-30
    Location
    DFW, TX
    Posts
    406
    Rep Power
    15

    Default Re: Management API performance

    It may just be down to having more thermal headroom. The Atom was originally a reimplementation of the core x86 instructions without power-hungry features like branch prediction and speculative execution (which, interestingly, means the 2200 and other Atom-powered devices are not subject to Spectre-class issues). For every feature added, the performance gain had to outweigh the increase to the power budget.

    I know some SSE4 instructions can be emulated with earlier instructions with only a 2x-3x performance hit (seriously, not bad for instruction emulation). That's why I'm suspecting they may play a part.

    Ran the API call stat collection overnight with one core and 8 GB. It's currently running with one core and 4 GB. Even that cut down, I'm getting 4.39336s mean, 1.6944s std dev on 500 full-detail app/site objects. I'm going to have to look into VirtualBox's fractional core settings. Maybe the Xeon will match the 2200 with only 1/4 of a core.
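    For reference, the "fractional core" knob is VirtualBox's per-VM CPU execution cap, which limits the guest to a percentage of one host core. A sketch (the VM name here is just a placeholder for whatever the SmartCenter VM is called):
    Code:
    ```shell
    # Cap the VM at 25% of one host core. Run modifyvm while the VM is
    # powered off; use the controlvm form to adjust a running VM.
    # "SmartCenter-VM" is a placeholder name.
    VBoxManage modifyvm "SmartCenter-VM" --cpuexecutioncap 25
    VBoxManage controlvm "SmartCenter-VM" cpuexecutioncap 25
    ```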

  6. #6
    Join Date
    2007-03-30
    Location
    DFW, TX
    Posts
    406
    Rep Power
    15

    Default Re: Management API performance

    Here's the script I've been using with VMs:
    Code:
    #!/usr/bin/env bash
    TIMEFORMAT='%R'
    filePrefix="vm$(egrep "^processor\s" /proc/cpuinfo | wc -l)$(grep MemTotal /proc/meminfo | awk '{GB = $2/1000000} END {printf "%02.0f\n", GB}')"
    limits=(500 400 300 200 100 1)
    details=("Full" "UID")
    for callLimit in "${limits[@]}"; do
    	for callDetails in "${details[@]}"; do
    		{
    		for iteration in $(seq 1 1000); do
    			sleep 2
    			time mgmt_cli -r true --format json show application-sites limit "${callLimit}" details-level "${callDetails}" > /dev/null
    		done
    		} 2> "${filePrefix}AppSites${callLimit}${callDetails}.txt"
    	done
    done
    Important note: Both of the single-core VMs had API crashing problems. I added the 'sleep 2' between calls so the runs could finish the test.

    And here are some results so far:
    Code:
    ➜  CPAPI Stats for FILE in $(ls -1 *500Full.txt);do echo $FILE;echo "min = $(sort -n $FILE | head -n 1)";awk '{sum += $1} END {print "mean = " sum/NR}' $FILE;awk '{sum+=$1; sumsq+=$1*$1} END {print "stdev = " sqrt(sumsq/NR - (sum/NR)**2)}' $FILE;echo "max = $(sort -n $FILE | tail -n 1)";echo "";done
    2200AppSites500Full.txt
    min = 10.993
    mean = 14.9846
    stdev = 2.33993
    max = 32.776
    
    vm104AppSites500Full.txt
    min = 3.278
    mean = 4.39336
    stdev = 1.6944
    max = 24.743
    
    vm108AppSites500Full.txt
    min = 3.199
    mean = 4.00702
    stdev = 1.57602
    max = 25.957
    
    vm208AppSites500Full.txt
    min = 3.145
    mean = 3.96025
    stdev = 0.816168
    max = 16.131
    
    vm408AppSites500Full.txt
    min = 3.061
    mean = 3.39654
    stdev = 0.332331
    max = 5.579
    
    vm416AppSites500Full.txt
    min = 3.008
    mean = 3.29321
    stdev = 0.37936
    max = 7.772
    Each file is prefixed to indicate where it came from. For example, vm416 is a VM with four cores and 16 GB of RAM. The file prefix code doesn't handle fractional cores, so I'm not sure if I'm going to continue cutting down on resources.

    All of the systems under test had 1024 MB CPM heap, 256 MB API heap. The profile string changed with the amount of RAM, but that's it.

    Standard deviation of the response times goes up when I cut cores. Mean response time is mildly sensitive to core count and to RAM. I'm running the API calls serially, so I expect running several in parallel would be more likely to care about multiple cores.
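    If anyone wants to test the parallel case, here's a sketch of how I'd fan the loop out with xargs -P. The echo is a placeholder command so the skeleton runs anywhere; swap in the real mgmt_cli invocation shown in the comment:
    Code:
    ```shell
    #!/bin/sh
    # Fire off 8 calls with at most 4 in flight at once.
    # Placeholder command; the real call would be something like:
    # api_call='mgmt_cli -r true --format json show application-sites limit 1 details-level uid'
    api_call='echo placeholder'

    # xargs -P runs up to 4 invocations concurrently, one per input line.
    seq 1 8 | xargs -P 4 -I@ sh -c "$api_call" > parallel_out.txt

    # One output line per completed call, so this prints 8.
    wc -l < parallel_out.txt
    ```
    My hunch is the parallel version would show the extra cores paying off much more clearly than the serial numbers above do.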
    Last edited by Bob_Zimmerman; 2021-06-05 at 15:10. Reason: Fixed an issue with the core counting in the stat collection script.

  7. #7
    Join Date
    2007-03-30
    Location
    DFW, TX
    Posts
    406
    Rep Power
    15

    Default Re: Management API performance

    I've collected enough data for what I care about. It's posted here:

    https://github.com/Bob-Zimmerman/CPAPI-Stats

    There's an Excel spreadsheet with a tab for each configuration and a column for each test in that config. I'm not terribly good with stats in Excel, so I've just added basic box-and-whisker plots.
