CPUG: The Check Point User Group

Resources for the Check Point Community, by the Check Point Community.



Thread: MDS failed to start after mds_backup in R77.30 with JHFA 205

  1. #1
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default MDS failed to start after mds_backup in R77.30 with JHFA 205

    All,
    I've run into this issue three times in the past two weeks, and I am trying to find out why:

    Scenario: A provider-1 R77.30 with JHFA 205 MDS manager and container with a single CMA.

    Every Sunday, Wednesday and Friday morning, I perform an mds_backup via script. This has worked well for me in R75.47.

    With R77.30, for the past two weeks, the MDS has failed to start up after the mds_backup, even though I have the mdsstart in the script. It failed with this message:

    Packing up Backup Folder

    Starting MDS

    Starting cpWatchDog
    Failed to start CPWD. Aborting.


    I have to go in and manually restart the MDS with mdsstart.

    The issue has happened about 50% of the time in the past two weeks.

    Any ideas, thoughts?

    Below is my mds_backup script:

    ----

    #!/bin/sh -x
    . /etc/profile.d/CP.sh

    DATE="`/bin/date +%b_%d_%Y_%Hh_%Mm`"
    FILENAME="mds_backup_`uname -n`_$DATE"
    LOG=/var/tmp/backup_log_$DATE
    BACKUPDIR=/var/backup/tmp
    LOCALSTORAGE=/var/backup/storage


    ### First, do no harm by entering the /var/tmp directory
    cd /var/tmp

    ### Set the mds environment with the mdsenv command

    mdsenv

    ### create /var/backup/tmp if one does not exist


    if ! [ -d /var/backup ]
    then
    echo "Backup folder missing, creating" >> $LOG
    mkdir /var/backup
    fi
    if ! [ -d /var/backup/tmp ]
    then
    echo "Backup tmp folder missing, creating" >> $LOG
    mkdir /var/backup/tmp
    fi
    if ! [ -d /var/backup/storage ]
    then
    echo "Backup storage folder missing, creating" >> $LOG
    mkdir /var/backup/storage
    fi
    if ! [ -d /var/backup/log ]
    then
    echo "Backup log folder missing, creating" >> $LOG
    mkdir /var/backup/log
    fi


    ### Enter $BACKUPDIR directory
    cd $BACKUPDIR

    ### Remove EVERYTHING inside $BACKUPDIR directory
    echo -e "Cleaning /var/backup/tmp" >> $LOG
    rm -rf /var/backup/tmp/*

    ### Create today mds_backups with today time directory
    echo -e "Creating $BACKUPDIR/$FILENAME directory\n" >> $LOG
    mkdir $BACKUPDIR/$FILENAME

    ### Enter $BACKUPDIR/$FILENAME

    cd $BACKUPDIR/$FILENAME


    ### Gather important system information

    /bin/clish -c 'show configuration' >> $BACKUPDIR/$FILENAME/configuration.txt

    # STOP MDS
    echo -e "==========\nStopping MDS\n==========\n" >> $LOG
    $MDSDIR/scripts/mdsstat >> $LOG
    $MDSDIR/scripts/mdsstop >> $LOG
    sleep 5

    # Pass 2
    $MDSDIR/scripts/mdsstat >> $LOG
    $MDSDIR/scripts/mdsstop >> $LOG
    sleep 5


    ## Perform backup
    echo -e "\n==================\nBeginning mdsbackup\n==================\n" >> $LOG
    echo y | $MDSDIR/scripts/mds_backup -b -d $BACKUPDIR/$FILENAME 2>> $LOG >> /dev/null
    echo -e "\n==================\nCompleted mdsbackup\n==================\n" >> $LOG

    ### Pack up
    echo -e "Packing up Backup Folder" >> $LOG
    md5sum $BACKUPDIR/$FILENAME/* > $BACKUPDIR/$FILENAME/md5sum.txt
    tar -cvf $BACKUPDIR/$FILENAME.tar $BACKUPDIR/$FILENAME
    mv $BACKUPDIR/$FILENAME.tar $LOCALSTORAGE/

    echo -e "\nStarting MDS\n" >> $LOG
    $MDSDIR/scripts/mdsstart >> $LOG
    #sleep 60
    sleep 30
    $MDSDIR/scripts/mdsstat >> $LOG

    echo -e "Off-loading backup files to network" >> $LOG
    scp -i /etc/scripts/.ssh/id_rsa $LOCALSTORAGE/$FILENAME.tar backup@192.168.1.250:/data/p1-mc >> $LOG
    scp -i /etc/scripts/.ssh/id_rsa $LOCALSTORAGE/$FILENAME.tar backup@192.168.1.251:/data/p1-mc >> $LOG
    scp -i /etc/scripts/.ssh/id_rsa $LOCALSTORAGE/$FILENAME.tar backup@192.168.1.252:/data/config/p1-mc >> $LOG


    ################################
    #### Compress Rule Hit Table ###
    ################################

    #echo -e "\n================\nCompressing Rule Hit Table\n================\n" >> $LOG

    mdsenv 192.168.1.1
    mcd conf
    rm $FWDIR/conf/hit_count_rules_table.sqlite.backup

    cp $FWDIR/conf/hit_count_rules_table.sqlite $FWDIR/conf/hit_count_rules_table.sqlite.backup
    ls -l $FWDIR/conf/hit_count_rules_table.sql* >> $LOG

    $FWDIR/conf/hit_count_unification_tool.sh >> $LOG
    ls -l $FWDIR/conf/hit_count_rules_table.sql* >> $LOG
    sleep 5
    $FWDIR/conf/hit_count_unification_tool.sh >> $LOG
    ls -l $FWDIR/conf/hit_count_rules_table.sql* >> $LOG
    sleep 5
    $FWDIR/conf/hit_count_unification_tool.sh >> $LOG
    ls -l $FWDIR/conf/hit_count_rules_table.sql* >> $LOG
    sleep 5
    $FWDIR/conf/hit_count_unification_tool.sh >> $LOG
    ls -l $FWDIR/conf/hit_count_rules_table.sql* >> $LOG
    sleep 5
    $FWDIR/conf/hit_count_unification_tool.sh >> $LOG
    ls -l $FWDIR/conf/hit_count_rules_table.sql* >> $LOG

    echo -e "\n================\nCompleted Compressing Rule Hit Table\n================\n" >> $LOG

    mv $LOG /var/backup/log

    #echo "REBOOT Phase" >> $LOG
    #/sbin/shutdown -r now

  2. #2
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Has anyone run into this issue? It has happened to me twice in the past two weeks. Every time it does, I have to log into the MDS and restart it manually.

  3. #3
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by cciesec2006 View Post
    Has anyone run into this issue? It has happened to me twice in the past two weeks. Every time it does, I have to log into the MDS and restart it manually.
    Add a manual mdsstop (and a pause) to the script before the backup, and an mdsstart afterwards.

    In essence, mds_backup tries to stop the MDS and all its processes, but it may hang sometimes, especially when someone has forgotten to log out of a GUI.
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  4. #4
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by varera View Post
    Add a manual mdsstop (and a pause) to the script before the backup, and an mdsstart afterwards.

    In essence, mds_backup tries to stop the MDS and all its processes, but it may hang sometimes, especially when someone has forgotten to log out of a GUI.
    Thank you for the advice; however, I am not sure you read my script. I run "mdsstop" and pause twice prior to performing the mds_backup.

    However, by the time "mdsstart" runs, ALL mds processes are stopped, so why did it fail?

    Any more thoughts?

  5. #5
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    You are right, I did not. However, sleeping 5 seconds after the mdsstop is not enough. How many CMAs do you have? mdsstop stops CMAs in batches of five, waiting a while before moving on to the next batch.

    If you want to do this in the most controlled manner, either stop the CMAs one by one, waiting in between and finishing with mdsstop -m, or do not run mdsstop twice but instead check after a while that all processes are down before going into the next phase.

    It seems CPWD is still up, although you have stopped everything else. As a trial, check whether all MDS processes are down during the script run. It may give you a better idea of what's going on.
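
One way to "check that all processes are down" instead of sleeping for a fixed time is to poll for the process and only continue once it has gone. The sketch below is an illustration, not part of the original script: `wait_until_gone` is a hypothetical helper, it assumes `pgrep` is available on the management server, and it assumes the WatchDog shows up in the process table under the name `cpwd`.

```shell
#!/bin/bash
# Hypothetical helper: poll until no process with the given name remains,
# or give up after a timeout.
# Usage: wait_until_gone <process-name> [timeout-seconds] [poll-interval-seconds]
# Assumes pgrep exists; if it is missing, the loop exits at once and the
# function wrongly reports success.
wait_until_gone() {
    local name="$1"
    local timeout="${2:-300}"
    local interval="${3:-5}"
    local waited=0

    # pgrep -x matches the process name exactly and exits 0 while a match exists.
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$name still running after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

In the backup script this could replace the fixed `sleep 5` after mdsstop, for example `wait_until_gone cpwd 300 5 || echo "cpwd did not stop" >> $LOG`, so the log records whether the WatchDog really exited before mds_backup and mdsstart run.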
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  6. #6
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by varera View Post
    You are right, I did not. However, sleeping 5 seconds after the mdsstop is not enough. How many CMAs do you have? mdsstop stops CMAs in batches of five, waiting a while before moving on to the next batch.

    If you want to do this in the most controlled manner, either stop the CMAs one by one, waiting in between and finishing with mdsstop -m, or do not run mdsstop twice but instead check after a while that all processes are down before going into the next phase.

    It seems CPWD is still up, although you have stopped everything else. As a trial, check whether all MDS processes are down during the script run. It may give you a better idea of what's going on.
    I only have a single CMA in the MDS. I've also increased the sleep to 40 seconds, but it does not do any good.

    My point is that after the mds_backup has completed, all mds processes are down at that point, right? Pausing for another 60 seconds after the mds_backup has finished (which I did) just does not make any sense.

    Btw, I have a single CMA on this MDS (manager + container). The server is a Dell R720 with 4 quad-core CPUs (16 cores total), 128GB of RAM and 1TB of RAID-5 15K RPM disk drives. That pretty much rules out a hardware issue.

    Any more suggestions?

    Thank you.

  7. #7
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by cciesec2006 View Post
    I only have a single CMA in the MDS. I've also increased the sleep to 40 seconds, but it does not do any good.

    My point is that after the mds_backup has completed, all mds processes are down at that point, right? Pausing for another 60 seconds after the mds_backup has finished (which I did) just does not make any sense.

    Btw, I have a single CMA on this MDS (manager + container). The server is a Dell R720 with 4 quad-core CPUs (16 cores total), 128GB of RAM and 1TB of RAID-5 15K RPM disk drives. That pretty much rules out a hardware issue.

    any more suggestions?

    Thank you.
    CPWD is only stopped after every other process is down. It might hang sometimes, which is why I suggested taking a look.

    You can also try running mdsstart twice, although that is a bit dumb.
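
If retrying the start is worth trying as a stopgap, a small retry wrapper keeps it from being a blind second call. This is only a sketch: `retry` is a hypothetical helper, and it assumes mdsstart returns a non-zero exit status when it aborts, which nothing in this thread confirms.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to <attempts> times, pausing
# <delay> seconds between attempts; succeed as soon as one attempt succeeds.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    local attempts="$1"
    local delay="$2"
    shift 2
    local n=1

    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1    # every attempt failed
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

In the backup script the start step would then read something like `retry 2 60 $MDSDIR/scripts/mdsstart >> $LOG`. This hides the symptom rather than explaining it, so it is no substitute for finding out why CPWD fails to start.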
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  8. #8
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by varera View Post
    CPWD is only stopped after every other process is down. It might hang sometimes, which is why I suggested taking a look.
    I am not following what you're saying for the following reasons:

    1- mds_backup will NOT run unless all mds processes and checkpoint processes are stopped, to my knowledge,

    2- the mdsstart was performed after the mds_backup had completed. At that point, no mds or Check Point processes should even be running, because if they were, mds_backup would not have completed in the first place.

    Any more thoughts?

  9. #9
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    mds_backup does not monitor status of cpwd. mdsstart, however, does.
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  10. #10
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by varera View Post
    mds_backup does not monitor status of cpwd. mdsstart, however, does.
    This is what I have in my backup script:

    1- perform mdsstop
    2- sleep 10
    3- perform mdsstop
    4- sleep 10
    5- perform mdsstop
    6- sleep 1
    7- perform mds_backup
    8- sleep 30
    9- perform mdsstart

    You would think that by the 2nd and 3rd mdsstop all mds processes should be down, right? Furthermore, if I go back in and perform a manual mds_backup, then it works. Shouldn't that be failing as well?

  11. #11
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by cciesec2006 View Post
    This is what I have in my backup script:

    1- perform mdsstop
    2- sleep 10
    3- perform mdsstop
    4- sleep 10
    5- perform mdsstop
    6- sleep 1
    7- perform mds_backup
    8- sleep 30
    9- perform mdsstart

    You would think that by the 2nd and 3rd mdsstop all mds processes should be down, right? Furthermore, if I go back in and perform a manual mds_backup, then it works. Shouldn't that be failing as well?
    I think running mdsstop three times is unnecessary, but a sleep timer of 10 seconds is too short for all the background work to finish. I would put at least 5 minutes there. Also, if I have understood you correctly, it is not mds_backup that fails but the mdsstart afterwards.

    Please let me know if this assumption is incorrect.
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  12. #12
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by varera View Post
    I think running mdsstop three times is unnecessary, but a sleep timer of 10 seconds is too short for all the background work to finish. I would put at least 5 minutes there. Also, if I have understood you correctly, it is not mds_backup that fails but the mdsstart afterwards.

    Please let me know if this assumption is incorrect.
    Your assumption is correct. It is the mdsstart that failed afterwards.

    However, that being said, if I do not run the backup script and instead manually run the following commands, each within 2 seconds of the previous one completing, I experience no such issue:

    1- mdsstop
    2- mds_backup
    3- mdsstart

    Wouldn't that disprove your theory that 10 seconds is too low?

  13. #13
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Good. So we have the following facts:

    1. mdsstop, mds_backup, mdsstart commands run just fine if done manually one after another.
    2. The scripted sequence of these commands fails on mdsstart, complaining about CPWD being up.


    The only logical explanation is that it is the script which is faulty, right? If you do not want to pursue "sleep" times (and you may be right), consider whether the environment is incorrect.

    Try using at the beginning:
    -----
    #!/bin/bash
    source /opt/CPshrd-R77/tmp/.CPprofile.sh
    -----

    Also, does it fail only when you run it from cron, or also when it is run manually?
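
To test the environment theory directly, one option is to capture the environment in each context and compare the two. The sketch below is illustrative only: `snapshot_env` and `compare_env` are hypothetical helpers, and the file names in the usage note are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: write a sorted snapshot of the current environment
# to the given file.
snapshot_env() {
    env | sort > "$1"
}

# Hypothetical helper: print the lines that differ between two snapshots.
# diff exits 0 when the snapshots are identical and 1 when they differ.
compare_env() {
    diff "$1" "$2"
}
```

Calling `snapshot_env /var/tmp/env_cron.txt` near the top of the script when it runs from cron, `snapshot_env /var/tmp/env_manual.txt` from an interactive session, and then `compare_env /var/tmp/env_cron.txt /var/tmp/env_manual.txt` would show whether variables such as PATH or the Check Point directory variables differ between the two runs.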
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  14. #14
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by varera View Post
    Good. So we have the following facts:

    1. mdsstop, mds_backup, mdsstart commands run just fine if done manually one after another.
    2. The scripted sequence of these commands fails on mdsstart, complaining about CPWD being up.


    The only logical explanation is that it is the script which is faulty, right? If you do not want to pursue "sleep" times (and you may be right), consider whether the environment is incorrect.

    Try using at the beginning:
    -----
    #!/bin/bash
    source /opt/CPshrd-R77/tmp/.CPprofile.sh

    I am not sure you read my original post from beginning to end. It has the following lines at the beginning:


    #!/bin/sh -x
    . /etc/profile.d/CP.sh

    cat /etc/profile.d/CP.sh
    if [ -r /opt/CPshrd-R77/tmp/.CPprofile.sh ]; then
    . /opt/CPshrd-R77/tmp/.CPprofile.sh
    fi

    Isn't that the same thing as what you proposed?

    Quote Originally Posted by varera View Post
    Also, does it fail only when you run it from cron, or also when it is run manually?
    When I run the script manually, it fails sometimes as well.

    - when I run the script manually, the failure rate is about 5%,
    - when it runs from cron, the failure rate is about 33%,
    - when I manually run mdsstop, mds_backup, mdsstart, the failure rate is 0%
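
Failure rates like these are easier to compare when they come from a counted series of runs. A rough sketch of such a harness follows; `count_failures` is a hypothetical helper, and it assumes the command under test exits non-zero when it fails.

```shell
#!/bin/bash
# Hypothetical helper: run a command a fixed number of times and report
# how many runs exited non-zero.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    local runs="$1"
    shift
    local fails=0
    local i=0

    while [ "$i" -lt "$runs" ]; do
        # Discard the command's own output; only its exit status matters here.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails/$runs runs failed"
}
```

For example, `count_failures 20 /path/to/backup_script.sh` (the path is a placeholder) would only be meaningful if the script itself ends with a status that reflects whether mdsstart succeeded, which the script posted above does not currently do.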

  15. #15
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    I did read it, many times. You are doing:

    #!/bin/sh -x
    . /etc/profile.d/CP.sh


    I am suggesting

    #!/bin/bash
    source /opt/CPshrd-R77/tmp/.CPprofile.sh



    It might be that mdsstop fails to kill CPWD if run from a different shell. You could check that by watching the processes while the script runs from cron. As said before, one needs to see why cpwd is not going down. I suggest creating a separate script with just "mdsstop, sleep, mdsstart" and watching from a separate session to see what's going on. I think cpwd is getting jammed by "too fast to follow" sh commands.
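
A stripped-down test script of that kind might look like the sketch below. It is an assumption-laden illustration: `stop_start_cycle`, `STOP_CMD` and `START_CMD` are hypothetical names, `$MDSDIR` is assumed to be set by the Check Point profile as in the original script, and the exit codes are logged only because it is not known here what mdsstop and mdsstart return on failure.

```shell
#!/bin/bash
# Commands default to the MDS scripts used in the thread; override
# STOP_CMD and START_CMD for a dry run on a machine without Check Point.
STOP_CMD="${STOP_CMD:-$MDSDIR/scripts/mdsstop}"
START_CMD="${START_CMD:-$MDSDIR/scripts/mdsstart}"

# Hypothetical helper: stop, pause, start, logging a timestamp and the
# exit code of each step. Returns the exit code of the start command.
# Usage: stop_start_cycle <pause-seconds> <logfile>
stop_start_cycle() {
    local pause="$1"
    local log="$2"
    local rc

    echo "$(date '+%F %T') running mdsstop" >> "$log"
    "$STOP_CMD" >> "$log" 2>&1
    rc=$?
    echo "$(date '+%F %T') mdsstop exit code: $rc" >> "$log"

    sleep "$pause"

    echo "$(date '+%F %T') running mdsstart" >> "$log"
    "$START_CMD" >> "$log" 2>&1
    rc=$?
    echo "$(date '+%F %T') mdsstart exit code: $rc" >> "$log"
    return "$rc"
}
```

Running `stop_start_cycle 60 /var/tmp/stop_start.log` from cron while watching the process list in a second session would show whether the failure can be reproduced without mds_backup in the sequence.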
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  16. #16
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by varera View Post
    I did read it, many times. You are doing:

    #!/bin/sh -x
    . /etc/profile.d/CP.sh


    I am suggesting

    #!/bin/bash
    source /opt/CPshrd-R77/tmp/.CPprofile.sh



    It might be that mdsstop fails to kill CPWD if run from a different shell. You could check that by watching the processes while the script runs from cron. As said before, one needs to see why cpwd is not going down. I suggest creating a separate script with just "mdsstop, sleep, mdsstart" and watching from a separate session to see what's going on. I think cpwd is getting jammed by "too fast to follow" sh commands.

    Update: I modified my script to comment out the "mds_backup" command. After that, I re-ran the script 100 times with a 100% success rate. This is what I commented out in the script:

    ## Perform backup
    #echo -e "\n==================\nBeginning mdsbackup\n==================\n" >> $LOG
    #echo y | $MDSDIR/scripts/mds_backup -b -d $BACKUPDIR/$FILENAME 2>> $LOG >> /dev/null
    #echo -e "\n==================\nCompleted mdsbackup\n==================\n" >> $LOG

    So I guess the issue is with mds_backup? Btw, I used the exact same script on Provider-1 R75.47 prior to upgrading to R77.30 with JHFA 205. On R75.47 I had a 100% success rate.

    So my conclusion is: another "bug" from Check Point?

    Thoughts?

  17. #17
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Okay, so mds_backup itself seems to be problematic.

    I have taken a look at it; it uses yet another shell:

    #! /bin/csh -f

    Also, why are you doing mdsstop and mdsstart? mds_backup has an option to perform both as part of the procedure.
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  18. #18
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Basically, I suggest changing the shell in your script to be in line with mds_backup's internal settings, and using the "-s" option to stop and start the MDS processes from inside the mds_backup script.

    If you are concerned about logged-in admins, you can also use the "-L all" option to log them off before running anything else.

    Let us know if this works for you.
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/

  19. #19
    Join Date
    2006-09-26
    Posts
    2,958
    Rep Power
    13

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Quote Originally Posted by varera View Post
    Also, why are you doing mdsstop and mdsstart? mds_backup has an option to perform both as part of the procedure.
    Why not? By doing an mdsstop 3 times, I want to make sure that all mds processes are stopped before I go ahead with the mds_backup. Isn't that the logical thing to do?

    So the script works perfectly in R71.30 and R75.47 and becomes an issue in R77.30. Wouldn't that suggest a "bug"?

    Btw, I also tested this in a Provider-1 R80 environment and the script works flawlessly there too: after running it 100 times on P-1 R80, a 100% success rate.

    My next step is to upgrade to JHFA 216 and see what happens. I hate that, because the minute I go to JHFA 216, I am sure that while it might solve this problem, it will cause other issues as well.

  20. #20
    Join Date
    2006-03-08
    Location
    Lausanne
    Posts
    938
    Rep Power
    12

    Default Re: MDS failed to start after mds_backup in R77.30 with JHFA 205

    Okay, if the whole purpose of the exercise was to say "it is a bug", I will not interfere. I had the impression, erroneous as I see now, that you wanted the script to work.

    All the best
    -------------

    Valeri Loukine
    CCMA, CCSM, CCSI
    http://checkpoint-master-architect.blogspot.com/
