CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



Thread: Periodic High latency problems on a single VS

  1. #1
    Join Date
    2007-11-18
    Posts
    34
    Rep Power
    0

    Default Periodic High latency problems on a single VS

    I have a VSX cluster running on open-software on 12 core, 256GB RAM HP DL380p Gen 8 boxes.
    The cluster has just 3 virtual firewalls on it.

    2 of these VS's have no problems whatsoever; one of them has 10K+ connections 24 hours a day.
    My 3rd, which has maybe 25 to 30 workstations behind it and on average fewer than 1000 concurrent connections, keeps having problems.
    About every 9 days or so, users behind the firewall complain the network is slow.
    I've put a ping and jitter monitor on one of the networks behind the firewall.
    All of a sudden ping times and jitter start to rise. Over a period of an hour, the ping times to a device on the other side of the firewall (it's an internal firewall) rise from <1ms to between 800ms and 1000ms.
    The only fix I've found to bring the ping times down is to run "vsx_util vsls" and move all the VS's to the other node. The ping times on the affected firewall drop, and I can redistribute the load again.
    I have tried leaving it on the single node, but ping times still rise.
    I have even rebuilt the VS from scratch, and it went 20 days before we had the incident again.
    Support have not found an answer yet (it's been several months so far).
    Has anyone come across this or have any ideas?
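
For anyone wanting to reproduce the ping and jitter monitoring described above, here is a minimal sketch of how a window of round-trip times could be scored. It assumes jitter is defined as the mean absolute difference between consecutive samples; the function names, the limits and the sample values are all hypothetical, not taken from the monitor actually used on this cluster.

```python
from statistics import mean


def jitter_ms(rtts):
    """Mean absolute difference between consecutive RTT samples (ms)."""
    if len(rtts) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(rtts, rtts[1:]))


def is_degraded(rtts, rtt_limit_ms=100.0, jitter_limit_ms=50.0):
    """True when the window's average RTT or its jitter exceeds a limit.

    Both limits are hypothetical; tune them against the healthy baseline
    (under 1 ms in this thread).
    """
    if not rtts:
        return False
    return mean(rtts) > rtt_limit_ms or jitter_ms(rtts) > jitter_limit_ms


# Illustrative samples only, not measurements from this cluster.
normal = [0.4, 0.6, 0.5, 0.7, 0.5]
event = [120.0, 340.0, 610.0, 820.0, 990.0]

print(is_degraded(normal))
print(is_degraded(event))
```

Alerting on a rising trend rather than waiting for user complaints would give more time to capture diagnostics while the event is still building.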

    Blades running on the VS are
    Firewall
    IPS
    Identity Awareness
    Url Filtering
    Application Control

    The cluster is running R77.20 no hot fixes.
    CoreXL is on.


    So what's been done:

    Cluster was upgraded from R77 to R77.20.
    CoreXL for some reason was not on; it has been turned on.
    The VS having the problem has been removed and rebuilt from scratch.
    This morning, after the issue recurred over the weekend during the early morning when there was little traffic through the firewall, I changed the IPS profile from my custom profile to the default profile to see if that helps.

  2. #2
    Join Date
    2014-10-10
    Posts
    250
    Rep Power
    5

    Default Re: Periodic High latency problems on a single VS

    I would start by applying the R77.20 jumbo hotfix. What does 'top' show when the issue reoccurs? Did you watch 'netstat -ni'?

  3. #3
    Join Date
    2007-11-18
    Posts
    34
    Rep Power
    0

    Default Re: Periodic High latency problems on a single VS

    Thanks for the suggestions.
    So I checked the max connections and it's way under the set limit.

    fw vsx stat -vs 32

    Connections number: 946
    Connections peak: 5635
    Connections limit: 44900
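
To put those three counters in proportion, a small sketch like the following could turn them into percentages of the limit. It only parses the "Connections ..." lines quoted in this post; the function names are hypothetical, and the rest of the `fw vsx stat` output is not assumed to have any particular shape.

```python
import re


def parse_connections(text):
    """Extract the 'Connections <field>: <int>' lines into a dict."""
    stats = {}
    for field, value in re.findall(r"Connections (\w+):\s*(\d+)", text):
        stats[field] = int(value)
    return stats


def utilisation(stats, key):
    """Return stats[key] as a percentage of the configured limit."""
    return 100.0 * stats[key] / stats["limit"]


# The three lines quoted in the post above.
sample = """
Connections number: 946
Connections peak: 5635
Connections limit: 44900
"""

stats = parse_connections(sample)
print(f"current: {utilisation(stats, 'number'):.1f}% of limit")
print(f"peak:    {utilisation(stats, 'peak'):.1f}% of limit")
```

Both figures are far below the limit, which supports the point that the connection table is not what is being exhausted.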


    When I ran "top" for support using -H to show threads, it showed fwk32_0 hitting 100%; it's normally around 6 to 9%. Other than that, nothing stood out, and wait was fine.
    So something caused VS 32 to go to 100% CPU. This was before CoreXL was turned on, and I have yet to try it since.

    I'll have to wait for the event to happen again before I can double check.

    As for netstat -ni, I've not run it during the event but will add it to my list if it happens again.

    Running netstat -n a few moments ago showed no errors on any of the interfaces.

    The hotfixes may help, but I've already had to get approval for an outage to upgrade to R77.20, which support said should fix the problem. Convincing management to apply the hotfixes and take even a 30 second outage (which is what we had when upgrading to R77.20) without definite proof that they will help is going to be a hard sell.

  4. #4
    Join Date
    2014-10-10
    Posts
    250
    Rep Power
    5

    Default Re: Periodic High latency problems on a single VS

    Applying R77_20_jumbo_hf on a cluster doesn't cause any outage. It fixed my monitord and confd processes consuming 100% CPU; the symptom is "monitord[]: time shift detected !!!" appearing repeatedly in the /var/log/messages file. Refer to sk102988. Looks like yours is different... also, 'netstat -ni' doesn't show a lot. Do you see a lot of 'swap' used? Also, you can quickly do 'ips off' when it happens again.
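
A quick way to rule the monitord symptom in or out is to count how often that message appears in the log. The sketch below assumes only the message text quoted above; the sample lines, including their timestamps, hostname and PID, are invented for illustration and the real syslog layout may differ.

```python
def count_time_shift_warnings(lines):
    """Count log lines where monitord reports a detected time shift."""
    needle = "time shift detected"
    return sum(1 for line in lines if "monitord" in line and needle in line)


# Invented sample lines; only the message text comes from the post above.
sample = [
    "Jan  1 00:00:01 gw monitord[1234]: time shift detected !!!",
    "Jan  1 00:00:02 gw kernel: some unrelated message",
    "Jan  1 00:00:03 gw monitord[1234]: time shift detected !!!",
]

print(count_time_shift_warnings(sample))
```

On a gateway, the same function could be fed an open file handle for /var/log/messages, since it only needs something that yields lines.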

  5. #5
    Join Date
    2007-11-18
    Posts
    34
    Rep Power
    0

    Default Re: Periodic High latency problems on a single VS

    OK, so the event happened again yesterday and I was able to get some top -H info.
    I'm planning to apply hotfix take 99 tonight. I hope it will fix the problem, but I'm not holding my breath at this point.

    As can be seen below, fwk32_0 is hitting 99%; normal operation is between 22 and 52%.
    I checked the interfaces with netstat -ni and found no errors. I also checked the Cisco switch that the interface trunks connect to and found zero errors there too.

    The connection counts show we have not come close to our max:

    Connections number: 2273
    Connections peak: 24958
    Connections limit: 44900

    This is the second busiest firewall on the cluster; the busiest is not affected in any way. The busiest one (fwk9_0) has nearly half again as many connections and double the rules too.

    All our firewalls (3 of them) on the cluster are running the same blades: IA, IPS, App Awareness.

    During event

    Context is set to Virtual Device VSX-FWHQ-NODE1_INT-VSX-DEPT (ID 32).
    [Expert@VSX-FWHQ-NODE1:32]# top -H
    top - 21:14:38 up 55 days, 16:24, 1 user, load average: 2.18, 2.30, 1.94
    Tasks: 890 total, 4 running, 871 sleeping, 0 stopped, 15 zombie
    Cpu(s): 5.4%us, 0.9%sy, 0.0%ni, 93.3%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st
    Mem: 131924208k total, 16115500k used, 115808708k free, 537728k buffers
    Swap: 67103496k total, 0k used, 67103496k free, 7533828k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    23486 admin RT -20 1380m 845m 39m R 99 0.7 3368:00 fwk32_0
    24362 admin 0 -20 654m 118m 28m S 24 0.1 18977:05 fwk9_0
    16984 admin 16 0 41200 23m 9944 S 10 0.0 4:09.05 rad
    20169 admin 18 0 156m 99m 5140 S 4 0.1 2253:53 DAService
    25509 admin 15 0 205m 55m 23m S 2 0.0 460:20.40 cpd
    19353 admin 15 0 2644 1576 828 R 1 0.0 0:00.21 top
    23487 admin RT -20 1380m 845m 39m S 1 0.7 23:31.81 fwk32_hp
    26436 admin 15 0 429m 53m 23m S 1 0.0 438:54.76 fw_full
    20 admin RT -5 0 0 0 S 0 0.0 2:03.56 migration/6
    1765 admin 15 0 0 0 0 S 0 0.0 16:06.78 pdflush
    18036 admin 0 -20 600m 57m 17m S 0 0.0 41:31.76 fwk14_dev
    18181 admin 0 -20 599m 57m 17m S 0 0.0 41:32.76 fwk16_dev
    18718 admin 0 -20 600m 60m 20m S 0 0.0 40:54.15 fwk5_dev
    19925 admin 0 -20 600m 60m 20m S 0 0.0 192:37.15 fwk5_0


    Normal operation

    Context is set to Virtual Device VSX-FWHQ-NODE1_INT-VSX-DEPT (ID 32).
    [Expert@VSX-FWHQ-NODE1:32]# top -H
    top - 09:21:08 up 56 days, 4:31, 1 user, load average: 1.25, 1.22, 1.46
    Tasks: 878 total, 1 running, 862 sleeping, 0 stopped, 15 zombie
    Cpu(s): 1.9%us, 1.2%sy, 0.0%ni, 95.9%id, 0.0%wa, 0.1%hi, 0.9%si, 0.0%st
    Mem: 131924208k total, 15372772k used, 116551436k free, 539388k buffers
    Swap: 67103496k total, 0k used, 67103496k free, 7527036k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    23486 admin 0 -20 670m 134m 38m S 27 0.1 3521:50 fwk32_0
    24362 admin 0 -20 653m 116m 28m S 26 0.1 19134:31 fwk9_0
    20169 admin 18 0 158m 100m 5140 S 5 0.1 2285:02 DAService
    30565 admin 15 0 2640 1584 828 R 2 0.0 0:00.59 top
    4348 admin 15 0 224m 59m 20m S 1 0.0 49:57.53 pdpd
    7595 admin 0 -20 867m 325m 59m S 1 0.3 171:52.23 fwk0_dev
    9099 admin 15 0 233m 71m 28m S 1 0.1 80:28.04 cpd
    19925 admin 0 -20 600m 62m 21m S 1 0.0 194:36.27 fwk5_0
    21245 nobody 16 0 7576 2268 1768 S 1 0.0 0:09.87 wmic
    23148 admin 0 -20 653m 116m 28m S 1 0.1 469:41.53 fwk9_dev
    23481 admin 0 -20 670m 134m 38m S 1 0.1 127:13.89 fwk32_dev
    26436 admin 15 0 429m 53m 23m S 1 0.0 442:04.87 fw_full
    4475 admin 15 0 224m 59m 20m S 0 0.0 83:51.61 pdpd
    4350 admin 15 0 201m 25m 11m S 0 0.0 4:25.27 usrchkd
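
To compare captures like the two above without reading them by eye, a short script can pick out firewall kernel threads above a CPU threshold. This is a sketch that assumes the 12-column `top -H` layout pasted in this post; the function name, the 90% threshold and the "fwk" prefix filter are hypothetical choices.

```python
def busy_threads(top_output, threshold=90.0, prefix="fwk"):
    """Return (command, cpu) pairs for threads at or above the threshold.

    Assumes the 12-column layout shown in this thread:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    hits = []
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) != 12 or not fields[0].isdigit():
            continue  # skip summary lines, the column header and blanks
        cpu, command = float(fields[8]), fields[11]
        if command.startswith(prefix) and cpu >= threshold:
            hits.append((command, cpu))
    return hits


# Rows copied from the two captures in this post.
during_event = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23486 admin RT -20 1380m 845m 39m R 99 0.7 3368:00 fwk32_0
24362 admin 0 -20 654m 118m 28m S 24 0.1 18977:05 fwk9_0
16984 admin 16 0 41200 23m 9944 S 10 0.0 4:09.05 rad
"""
normal = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23486 admin 0 -20 670m 134m 38m S 27 0.1 3521:50 fwk32_0
24362 admin 0 -20 653m 116m 28m S 26 0.1 19134:31 fwk9_0
"""

print(busy_threads(during_event))
print(busy_threads(normal))
```

Only fwk32_0 is flagged in the event capture and nothing is flagged in the normal one, which matches the observation that a single VS thread is saturated while the box as a whole is about 93% idle.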


    Netstat

    [Expert@VSX-FWHQ-NODE1:32]# netstat -ni
    Kernel Interface table
    Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
    lo32 16436 0 77289974 0 0 0 77289974 0 0 0 LRU
    wrp2048 1500 0 1204672 0 0 0 16812 0 0 0 BMRU
    wrp2049 1500 0 1206134 0 0 0 16821 0 0 0 BMRU
    wrp2050 1500 0 3098313 0 0 0 117446 0 0 0 BMRU
    wrp2051 1500 0 1892792 0 0 0 608461 0 0 0 BMRU
    wrp2052 1500 0 1325410 0 0 0 45340 0 0 0 BMRU
    wrp2053 1500 0 1305743 0 0 0 1555924 0 0 0 BMRU
    wrp2054 1500 0 2000087 0 0 0 687372 0 0 0 BMRU
    wrp2055 1500 0 1204652 0 0 0 16780 0 0 0 BMRU
    wrp2056 1500 0 1204665 0 0 0 16829 0 0 0 BMRU
    wrp2057 1500 0 1245386 0 0 0 157531 0 0 0 BMRU
    wrp2058 1500 0 1663050 0 0 0 353476 0 0 0 BMRU
    wrp2059 1500 0 44072284 0 0 0 172329828 0 0 0 BMRU
    wrp2060 1500 0 1282327 0 0 0 108023 0 0 0 BMRU
    wrp2061 1500 0 1204614 0 0 0 16811 0 0 0 BMRU
    wrp2062 1500 0 1489650 0 0 0 307941 0 0 0 BMRU
    wrp2063 1500 0 1589369 0 0 0 327157 0 0 0 BMRU
    wrp2064 1500 0 3596540 0 0 0 3078273 0 0 0 BMRU
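
With this many interfaces it is easy to miss a non-zero counter, so the check can be scripted. The sketch below assumes the column header shown in the `netstat -ni` output above; the function name is hypothetical, and the "ethX" row is invented purely to show what a hit looks like.

```python
COUNTERS = ("RX-ERR", "RX-DRP", "RX-OVR", "TX-ERR", "TX-DRP", "TX-OVR")


def interfaces_with_errors(netstat_output):
    """Map interface name -> {counter: value} for every non-zero counter."""
    rows = [l.split() for l in netstat_output.splitlines() if l.strip()]
    header_idx = next((i for i, f in enumerate(rows) if f[0] == "Iface"), None)
    if header_idx is None:
        return {}  # no recognisable header, nothing to report
    header = rows[header_idx]
    problems = {}
    for fields in rows[header_idx + 1:]:
        if len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        row = dict(zip(header, fields))
        bad = {c: int(row[c]) for c in COUNTERS if int(row[c]) != 0}
        if bad:
            problems[row["Iface"]] = bad
    return problems


# Two rows copied from the output above.
sample = """\
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
lo32 16436 0 77289974 0 0 0 77289974 0 0 0 LRU
wrp2059 1500 0 44072284 0 0 0 172329828 0 0 0 BMRU
"""
# Invented row with drops, only to demonstrate a positive result.
hypothetical = sample + "ethX 1500 0 1000 0 12 0 900 0 0 0 BMRU\n"

print(interfaces_with_errors(sample))
print(interfaces_with_errors(hypothetical))
```

Against the real output in this post the result is empty, which agrees with the finding that no interface shows errors, drops or overruns.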

  6. #6
    Join Date
    2014-10-10
    Posts
    250
    Rep Power
    5

    Default Re: Periodic High latency problems on a single VS

    Good luck. Did you try 'ips off'? I would look in Tracker at what kind of traffic is passing at the time it recurs (vuln scan, etc.).

    By the way, do you have the 'Balancing CoreXL and SecureXL' PDF from mendrizzi at midpointtech.com? It might be useful to review, as might the recent book from http://www.maxpowerfirewalls.com/.

  7. #7
    Join Date
    2007-07-18
    Posts
    15
    Rep Power
    0

    Default Re: Periodic High latency problems on a single VS

    We have 6 VSX clusters running on 4800s and a 13800 pair, R77.20 with HFA 135, and from time to time we also run into the same issue. Support doesn't have a fix yet. We fail over to the standby member and everything works great. Then we reboot the problematic node and it's back to normal. Annoying to say the least... No clue why it happens. Connections table is fine and so is top.

  8. #8
    Join Date
    2017-11-16
    Posts
    1
    Rep Power
    0

    Default Re: Periodic High latency problems on a single VS

    I have this problem with a VSX customer. Any solution for this problem?

  9. #9
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    1,030
    Rep Power
    15

    Default Re: Periodic High latency problems on a single VS

    Look here: 23486 admin RT -20 1380m 845m 39m R 99 0.7 3368:00 fwk32_0

    Your VS 32 is running at 100% CPU. Why is a good question, but this is clearly a CPU utilisation issue. It may be caused by a million things. With one of my customers, a similar symptom came and went with very quick port scans from the internet. The rogue packets were checked against about 400 rules and eventually dropped, but matching every one of them took a lot of CPU effort.
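
The cost argument above can be illustrated with a toy model of top-down, first-match rule evaluation. This is not how the gateway's rule matching is actually implemented, and real gateways use optimisations that this ignores; the Rule class, the port-only matching and the 400-rule layout are hypothetical, chosen only to show why packets that match nothing are the most expensive ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    dst_port: int
    action: str  # "accept" or "drop"


def first_match(rules, dst_port):
    """Return (action, rules_examined) for a top-down first-match lookup."""
    for examined, rule in enumerate(rules, start=1):
        if rule.dst_port == dst_port:
            return rule.action, examined
    return "drop", len(rules)  # nothing matched: every rule was examined


# Hypothetical rule base: 400 accept rules for ports 1000..1399.
rules = [Rule(dst_port=1000 + i, action="accept") for i in range(400)]

# Legitimate traffic hitting the first rule costs one comparison.
action, cost = first_match(rules, 1000)

# A scan of ports 1..999 matches nothing, so each probe walks all 400 rules.
scan_cost = sum(first_match(rules, port)[1] for port in range(1, 1000))

print(action, cost)
print(scan_cost)
```

In this model a burst of probes that are all dropped at the end of the rule base costs hundreds of times more per packet than traffic matched near the top, which is the kind of load pattern worth looking for in the logs at the time of an event.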
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/
