Re: System Crashed when using Mondo Backup

Author: JD Austin
Date:  
To: plug-discuss
Subject: Re: System Crashed when using Mondo Backup
Bill Wesson wrote:

>Bill Wesson wrote:
>
>
>
>>I run a nightly backup at 2AM of our Fedora Core 1 server using Mondo.
>>About once every two weeks the system will freeze up. It won't process
>>email, present web pages, and I can't log into it using SSH. The
>>console screen is blank. We have to cold start the server.
>>
>>The easiest suggestion is to stop using Mondo, but I like the fast
>>recovery Mondo offers.
>>
>>So we're trying to figure out how to diagnose this problem and where
>>to start. Memory, Hard Drive (is new), IDE controller, etc?
>>
>>Does anyone have any ideas where to start?
>>
>>Thanks,
>>
>>--Bill Wesson
>>
>>This is my script to run Mondo daily at 2AM:
>>
>>mkdir -p /home/mondo/`date +%A`
>>
>>mondoarchive -Oi -d /home/mondo/`date +%A` -E "/home/mondo"
>>
>>Log files don't provide any clues.
>>
>>CRON log for Sunday-2AM
>>
>>Oct 24 01:50:00 payson CROND[24793]: (root) CMD (/usr/local/bin/weblogs)
>>
>>Oct 24 02:00:00 payson CROND[25052]: (root) CMD (run-parts /etc/cron.daily-2am)
>>
>>Oct 24 02:00:00 payson CROND[25056]: (root) CMD (nice --adjustment=15 /usr/local/sbin/update_site_summary_cache)
>>
>>Oct 24 02:00:00 payson CROND[25054]: (root) CMD (/usr/local/bin/weblogs)
>>
>>Oct 24 02:01:00 payson CROND[25961]: (root) CMD (run-parts /etc/cron.hourly)
>>
>>Oct 24 02:10:00 payson CROND[7866]: (root) CMD (/usr/local/bin/weblogs)
>>
>>CRON log for Monday-2AM
>>
>>Oct 25 01:50:00 payson CROND[29959]: (root) CMD (/usr/local/bin/weblogs)
>>
>>Oct 25 02:00:00 payson CROND[30059]: (root) CMD (run-parts /etc/cron.daily-2am)
>>
>>Oct 25 02:00:00 payson CROND[30063]: (root) CMD (nice --adjustment=15 /usr/local/sbin/update_site_summary_cache)
>>
>>Oct 25 02:00:00 payson CROND[30061]: (root) CMD (/usr/local/bin/weblogs)
>>
>>Oct 25 02:01:00 payson CROND[30953]: (root) CMD (run-parts /etc/cron.hourly)
>>
>>Oct 25 07:28:59 payson crond[2982]: (CRON) STARTUP (fork ok)
>>
>>MESSAGES log for Monday-2AM
>>
>>Oct 25 01:50:08 payson logger: weblogs: (29959) done.
>>
>>Oct 25 02:00:00 payson logger: weblogs: (30061) starting.
>>
>>Oct 25 02:00:01 payson autofs: automount shutdown succeeded
>>
>>Oct 25 02:00:09 payson logger: weblogs: (30061) done.
>>
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 68.157.222.155#53
>>
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 207.65.0.25#53
>>
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 68.157.222.155#53
>>
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 207.65.0.25#53
>>
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 68.157.222.155#53
>>
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 207.65.0.25#53
>>
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 68.157.222.155#53
>>
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 207.65.0.25#53
>>
>>Oct 25 07:27:44 payson syslogd 1.4.1: restart.
>>
>>1AM - Monday
>>
>>             total       used       free     shared    buffers     cached
>>Mem:        773208     713100      60108          0      66540     522432
>>-/+ buffers/cache:     124128     649080
>>Swap:      2048248      90048    1958200
>>
>>2AM - Monday
>>
>>             total       used       free     shared    buffers     cached
>>Mem:        773208     768768       4440          0     102908     491276
>>-/+ buffers/cache:     174584     598624
>>Swap:      2048248      90048    1958200
>>
>>Thanks,
>>
>>Bill Wesson, Network Administrator
>>
>>*Vision Engraving Systems*
>>
>>http://www.visionengravers.com
>>
>>17621 N. Black Canyon Hwy
>>
>>Phoenix, AZ 85023
>>
>>602-439-0600
>>
>>
>>
>I've seen that sort of thing happen when a machine goes I/O or CPU bound.
>I would suggest actually logging in via ssh and having a few xterms
>going with top, free, etc so that you can see what is going on.
>Are things actively running when you're backing it up? Contention for
>the same resource can cause issues like this.
>Look for memory leaks (processes that grow and grow over time) too.
>
>JD
>
>
>
>
>

Set up cron to take a top measurement every 5 minutes or so while the
backup is running.
0,5,10,15,20,25,30,35,40,45,50,55 2,3,4,5 * * * top -b -n 1 >> /var/log/backup.log

You should be able to figure out what is going on.
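
If a bare top snapshot turns out to be too thin, a slightly fuller logger could look like the sketch below. It is only an illustration: the function name snapshot, the head cut-offs, and the ps column list are assumptions, not anything taken from Mondo or from Bill's setup. The default log path is the one from the cron line above. It calls sync after each write so the last entry is more likely to be on disk if the box has to be cold started.

```shell
#!/bin/sh
# Sketch only: append one timestamped resource snapshot to a log file.
# The function name and the default log path are illustrative assumptions.
snapshot() {
    log=${1:-/var/log/backup.log}
    {
        printf '===== %s =====\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        # Batch mode, single iteration, as in the cron entry; keep only the
        # summary header and the first few processes.
        if command -v top >/dev/null 2>&1; then
            top -b -n 1 | head -n 20
        fi
        # Memory and swap totals; a failed probe should not stop the logger.
        if command -v free >/dev/null 2>&1; then
            free || true
        fi
        # Largest processes by resident size, to spot ones that keep growing.
        if command -v ps >/dev/null 2>&1; then
            ps -eo pid,rss,vsz,comm --sort=-rss | head -n 11
        fi
    } >> "$log" 2>&1
    # Flush to disk so the last entry is more likely to survive a hard freeze.
    sync
}
```

Saved as, say, /usr/local/sbin/snapshot.sh (a hypothetical path) with a final line that calls snapshot, it could be run from the same cron entry in place of the bare top command. Comparing the last few entries before a freeze should show whether memory, swap, or one particular process was climbing.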

JD

--
JD Austin
Twin Geckos Technology Services LLC
email:
http://www.twingeckos.com
phone/fax: 480.344.2640
