Hello ShadowPeak
This is what I saw in sk98767:
"The SmartEvent Sizing Tool is not suitable for R80."
"The above Sizing Table applies to R80 version as well (the numbers were received based on estimations and testing in the lab)."
Another reason I think the LogInvestigator tool cannot be used for R80: when I ran it on my Mgmt server against the SK you mentioned, it said I need a Smart-1 205 appliance, which I don't think is enough to deal with 75GB of logs each day (logging + indexing). I could be wrong about this.
Just want to know your thoughts on this.
Thanks.
Personally I wouldn't want to do R80+ management on anything lower than a Smart-1 225, which has 4 cores and 16GB of RAM. It will certainly work on a Smart-1 205 or 210, but the performance will not be good, as those boxes have only 2 cores and 4GB of RAM (205) or 8GB of RAM (210). I suppose you could try loading up a 205/210 with extra RAM to help out, but the processor will still be a major bottleneck. Not recommended.
The Smart-1 205 was definitely not designed for that scale of logs (75GB/day), even in R77.30.
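If you are not sure what a given box actually has, a quick way to check cores and memory from Expert mode is something like this (generic Linux commands plus the cpuinfo wrapper that appears later in this thread; just a sketch, adjust as needed):

grep -c ^processor /proc/cpuinfo   # logical CPU count (threads if SMT is enabled)
free -g                            # total and used memory in GB
/sbin/cpuinfo                      # Check Point wrapper; also reports the HyperThreading state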
http://phoneboy.org
Unless otherwise noted, views expressed are my own
Hello Team
Can someone shed some light on my situation?
I have an open server running Check Point SMS + logging; here are the stats of the current environment:
Number of clusters: 7
Logs per second: 5500-7000
Indexed logs/sec: 500-700
Log size per day: 60GB - 75GB max (45GB - 55GB real-time)
LEA Connections: 1 (Splunk)
SSH connections: 1 (Algosec)
Specs of Open Server:
8-core Intel(R) Xeon(R) CPU E5-2637 v3 @ 3.50GHz
128GB memory
OS drive = RAID 1 (mirrored)
Data drive = RAID 5 (3 drives)
We are experiencing performance issues with the management server while using SmartConsole (response time is very slow at times), and some admin sessions get disconnected. Looking at "top" output, I can see that java and log_indexer are the top CPU consumers. So we decided to offload either logging or policy management from the open server and buy a 3050 appliance (not that this will solve all the issues, but we want to offload some of the load).
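If anyone wants the raw numbers behind that, this is roughly the kind of batch-mode snapshot I can paste (just a sketch; the head count is whatever is convenient):

top -b -n 1 | head -25   # one non-interactive top sample, suitable for pasting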
Specs of 3050: CPAP-NGSM3050
Indexed logs/sec: 26,000 / 16,000 for single domain
Event logs/sec: 3,000
Events per day: 4,000,000
Log size per day: 40GB
RAID type: 5, 10
My question: Is it preferred to run logging on the open server and policy management on the 3050, or the other way around? Will the load of the "java" process be split between the open server and the 3050 once I separate logging and policy management?
As the CPU of the open server is a little better than the 3050's, I am leaning toward keeping logging on the open server, since we also do log indexing.
Hello, here is the output of the commands (not taken while access is slow; I will try to get that later next week):
[Expert@xxxx]# free -m
total used free shared buffers cached
Mem: 128729 126102 2627 0 1719 71654
-/+ buffers/cache: 52728 76001
Swap: 8189 0 8188
[Expert@xxxx]#
[Expert@xxxx]#
[Expert@xxxx]# mpstat 2 5
Linux 2.6.18-92cpx86_64 (xxxx) 12/08/17
21:10:53 CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
21:10:55 all 23.72 44.44 3.43 0.06 0.06 0.62 0.00 27.65 2468.34
21:10:57 all 21.31 49.62 3.12 0.06 0.00 0.31 0.00 25.56 1467.50
21:10:59 all 22.92 43.66 3.06 0.19 0.00 0.44 0.00 29.73 2891.50
21:11:01 all 18.95 42.09 2.56 0.00 0.06 0.25 0.00 36.09 1486.07
21:11:03 all 23.61 38.23 12.18 0.06 0.00 0.81 0.00 25.11 2088.50
Average: all 22.10 43.61 4.87 0.07 0.02 0.49 0.00 28.83 2079.40
[Expert@xxxx]#
[Expert@xxxx]#
[Expert@xxxx]# iostat 2 5
Linux 2.6.18-92cpx86_64 (xxxx) 12/08/17
avg-cpu: %user %nice %system %iowait %steal %idle
17.44 34.60 4.30 0.15 0.00 43.52
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 54.93 28.79 7689.83 385513966 102962977947
cciss/c0d1 288.83 590.88 36687.55 7911562490 491227713104
dm-0 158.82 6.31 1269.50 84456346 16998006000
dm-1 6.37 611.60 43107.67 8188972530 577189925768
sda 0.00 0.00 0.00 41820 0
avg-cpu: %user %nice %system %iowait %steal %idle
34.69 41.69 2.50 0.00 0.00 21.12
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 5.53 8.04 168.84 16 336
cciss/c0d1 13.07 1672.36 0.00 3328 0
dm-0 21.61 8.04 168.84 16 336
dm-1 13.07 1672.36 0.00 3328 0
sda 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
31.31 37.56 4.69 0.12 0.00 26.31
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 216.50 0.00 5488.00 0 10976
cciss/c0d1 939.50 1540.00 151604.00 3080 303208
dm-0 70.00 0.00 560.00 0 1120
dm-1 18.50 1540.00 156532.00 3080 313064
sda 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
29.48 39.16 2.87 0.06 0.00 28.42
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 7.04 0.00 329.65 0 656
cciss/c0d1 12.06 1543.72 0.00 3072 0
dm-0 41.21 0.00 329.65 0 656
dm-1 12.06 1543.72 0.00 3072 0
sda 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
32.46 40.76 2.43 0.00 0.00 24.34
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 7.46 0.00 175.12 0 352
cciss/c0d1 12.94 1655.72 0.00 3328 0
dm-0 21.89 0.00 175.12 0 352
dm-1 12.94 1655.72 0.00 3328 0
sda 0.00 0.00 0.00 0 0
[Expert@xxxx]# /sbin/cpuinfo
HyperThreading=disabled
Thanks.
Plenty of RAM, no swap space usage. This assumes of course that the server has not been rebooted since the last slow period(s).
A total of 43.61% of CPU time is nice'd (running at a lower priority) in process space, which is SOLR doing log indexing. No excessive waiting for I/O.
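If you want to double-check which processes those nice'd cycles belong to (my assumption is the SOLR/java indexer, but the NI column will show it either way), something like this is enough:

ps -eo pid,ni,pcpu,pmem,comm --sort=-pcpu | head -15   # anything with NI > 0 is running at reduced priority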
These statistics were not taken during a slow period, yet the total CPU is only about 28% idle. The box just looks very busy from a CPU perspective; my guess is that during the slow periods the CPU is running at 100%. Even if quite a bit of process-space time is nice'd, there can still be contention on the hard drives as well. I guess we will find out when you capture some stats during a slow period.
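If it helps, a crude way to collect those snapshots during a slow window is a timestamped loop like the one below, left running until the slowness hits (the output path is only an example; put it wherever you have room):

# example only: append a timestamped free/mpstat/iostat sample every 5 minutes
while true; do date; free -m; mpstat 2 1; iostat 2 1; sleep 300; done >> /var/tmp/slow_period_stats.txt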
Please provide the output of "raidconfig status" to make sure your RAID setup is healthy, although based on the very low iowait percentage I doubt it is degraded.
Finally, you can run cpview in history mode with the -t option and step minute-by-minute through a period where access was known to be slow. I'd recommend looking at its CPU screen for a past slow period; see sk101878 for how to invoke cpview in history mode, and the rough sequence below.
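Roughly what I have in mind (treat this as a sketch and check sk101878 for the exact syntax on your build):

raidconfig status    # confirm no degraded or rebuilding arrays
cpview history stat  # check whether history collection is enabled (exact subcommands per sk101878)
cpview -t            # open cpview in history mode and step back to the slow window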
Hello ShadowPeak
Looks like history mode is not available for SMS versions R77 and above: "history mode is not supported" (found this in the Supported Deployments part of the SK). Our SMS is R80 Take 76.
Any thoughts?
Thanks.
Hello
Here are the stats during slow times:
[Expert@xxx]# free -m
total used free shared buffers cached
Mem: 128729 127648 1081 0 2489 59586
-/+ buffers/cache: 65572 63157
Swap: 8189 0 8188
[Expert@xxx]#
[Expert@xxx]#
[Expert@xxx]# mpstat 2 5
Linux 2.6.18-92cpx86_64 (xxx) 12/18/17
15:08:55 CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
15:08:57 all 11.12 59.12 3.62 0.00 0.06 0.50 0.00 25.56 2503.48
15:08:59 all 16.49 58.84 4.06 0.25 0.00 0.44 0.00 19.93 2522.39
15:09:01 all 7.87 62.23 3.12 0.06 0.00 0.12 0.00 26.59 1909.09
15:09:03 all 10.69 61.50 5.81 0.00 0.00 0.69 0.00 21.31 3176.12
15:09:05 all 8.99 69.08 4.06 0.00 0.00 0.81 0.00 17.05 1858.29
Average: all 11.03 62.16 4.14 0.06 0.01 0.51 0.00 22.09 2396.40
[Expert@xxx]#
[Expert@xxx]#
[Expert@xxx]# iostat 2 5
Linux 2.6.18-92cpx86_64 (xxx) 12/18/17
avg-cpu: %user %nice %system %iowait %steal %idle
17.29 34.93 4.43 0.15 0.00 43.19
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 55.55 27.34 7729.58 389102390 110004375971
cciss/c0d1 291.19 586.85 37088.98 8351799834 527835848344
dm-0 159.87 6.16 1277.90 87698570 18186559392
dm-1 6.31 606.35 43540.47 8629309394 619650905040
sda 0.00 0.00 0.00 44340 0
avg-cpu: %user %nice %system %iowait %steal %idle
7.81 65.38 3.81 0.00 0.00 23.00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 5.50 8.00 796.00 16 1592
cciss/c0d1 1565.00 1276.00 95480.00 2552 190960
dm-0 13.00 8.00 100.00 16 200
dm-1 10.50 1276.00 96176.00 2552 192352
sda 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
9.69 67.00 3.06 0.00 0.00 20.25
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 6.50 0.00 200.00 0 400
cciss/c0d1 17.50 2180.00 0.00 4360 0
dm-0 25.00 0.00 200.00 0 400
dm-1 20.50 2180.00 0.00 4360 0
sda 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
9.24 68.52 4.62 0.00 0.00 17.61
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 3.50 0.00 120.00 0 240
cciss/c0d1 14.50 1856.00 0.00 3712 0
dm-0 15.00 0.00 120.00 0 240
dm-1 14.50 1856.00 0.00 3712 0
sda 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
8.86 67.85 4.56 0.06 0.00 18.66
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 8.50 0.00 880.00 0 1760
cciss/c0d1 2078.00 2176.00 132344.00 4352 264688
dm-0 24.50 0.00 196.00 0 392
dm-1 17.00 2176.00 133028.00 4352 266056
sda 0.00 0.00 0.00 0 0
We have 3 LEA connections and one Algosec connection from the same SMS.
Thanks.
Quick Update:
So we have purchased a 3050 appliance, and we are now doing policy management + logging (indexing) + LEA connections on the 3050 (just transferred from the open server to the appliance), and I see that java CPU consumption has dropped drastically.
One thing that struck me: my open server was configured as RAID 5, but the appliance is RAID 10. I am guessing this has a huge impact on logging operations, but I'm not sure if that's the culprit.
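Back-of-the-envelope, and purely a guess on my part (assuming classic parity write penalties, roughly 150 random-write IOPS per spindle, and a 4-disk RAID 10 in the appliance, which may not match its real layout), the difference in random-write capacity alone is large:

# rough sketch only; real numbers depend on controller cache, drive type, stripe size, etc.
SPINDLE_IOPS=150
echo "RAID 5  (3 drives):          ~$(( 3 * SPINDLE_IOPS / 4 )) random-write IOPS"   # ~4 back-end I/Os per write
echo "RAID 10 (4 drives, assumed): ~$(( 4 * SPINDLE_IOPS / 2 )) random-write IOPS"   # 2 back-end I/Os per write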
Note: my open server's specs are better than the appliance's (CPU- and memory-wise).
Thanks.