Bill Wesson wrote:

>Bill Wesson wrote:
>
>>I run a nightly backup at 2AM of our Fedora Core 1 server using Mondo.
>>About once every two weeks the system will freeze up. It won't process
>>email, present web pages, and I can't log into it using SSH. The
>>console screen is blank. We have to cold start the server.
>>
>>The easiest suggestion is to stop using Mondo, but I like the fast
>>recovery Mondo offers.
>>
>>So we're trying to figure out how to diagnose this problem and where
>>to start. Memory, Hard Drive (is new), IDE controller, etc?
>>
>>Does anyone have any ideas where to start?
>>
>>Thanks,
>>
>>--Bill Wesson
>>
>>This is my script to run Mondo daily at 2AM:
>>
>>mkdir -p /home/mondo/`date +%A`
>>mondoarchive -Oi -d /home/mondo/`date +%A` -E "/home/mondo"
>>
>>Log files don't provide any clues.
>>
>>CRON log for Sunday-2AM
>>
>>Oct 24 01:50:00 payson CROND[24793]: (root) CMD (/usr/local/bin/weblogs)
>>Oct 24 02:00:00 payson CROND[25052]: (root) CMD (run-parts /etc/cron.daily-2am)
>>Oct 24 02:00:00 payson CROND[25056]: (root) CMD (nice --adjustment=15 /usr/local/sbin/update_site_summary_cache)
>>Oct 24 02:00:00 payson CROND[25054]: (root) CMD (/usr/local/bin/weblogs)
>>Oct 24 02:01:00 payson CROND[25961]: (root) CMD (run-parts /etc/cron.hourly)
>>Oct 24 02:10:00 payson CROND[7866]: (root) CMD (/usr/local/bin/weblogs)
>>
>>CRON log for Monday-2AM
>>
>>Oct 25 01:50:00 payson CROND[29959]: (root) CMD (/usr/local/bin/weblogs)
>>Oct 25 02:00:00 payson CROND[30059]: (root) CMD (run-parts /etc/cron.daily-2am)
>>Oct 25 02:00:00 payson CROND[30063]: (root) CMD (nice --adjustment=15 /usr/local/sbin/update_site_summary_cache)
>>Oct 25 02:00:00 payson CROND[30061]: (root) CMD (/usr/local/bin/weblogs)
>>Oct 25 02:01:00 payson CROND[30953]: (root) CMD (run-parts /etc/cron.hourly)
>>Oct 25 07:28:59 payson crond[2982]: (CRON) STARTUP (fork ok)
>>
>>MESSAGES log for Monday-2AM
>>
>>Oct 25 01:50:08 payson logger: weblogs: (29959) done.
>>Oct 25 02:00:00 payson logger: weblogs: (30061) starting.
>>Oct 25 02:00:01 payson autofs: automount shutdown succeeded
>>Oct 25 02:00:09 payson logger: weblogs: (30061) done.
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 68.157.222.155#53
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 207.65.0.25#53
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 68.157.222.155#53
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 207.65.0.25#53
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 68.157.222.155#53
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 207.65.0.25#53
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 68.157.222.155#53
>>Oct 25 02:04:03 payson named[1736]: lame server resolving 'dogg.com' (in 'dogg.com'?): 207.65.0.25#53
>>Oct 25 07:27:44 payson syslogd 1.4.1: restart.
>>
>>1AM - Monday
>>
>>             total       used       free     shared    buffers     cached
>>Mem:        773208     713100      60108          0      66540     522432
>>-/+ buffers/cache:     124128     649080
>>Swap:      2048248      90048    1958200
>>
>>2AM - Monday
>>
>>             total       used       free     shared    buffers     cached
>>Mem:        773208     768768       4440          0     102908     491276
>>-/+ buffers/cache:     174584     598624
>>Swap:      2048248      90048    1958200
>>
>>Thanks,
>>
>>Bill Wesson, Network Administrator
>>*Vision Engraving Systems*
>>http://www.visionengravers.com
>>17621 N. Black Canyon Hwy
>>Phoenix, AZ 85023
>>602-439-0600
>>
>I've seen that sort of thing happen when a machine goes I/O or CPU bound.
>I would suggest actually logging in via ssh and having a few xterms
>going with top, free, etc so that you can see what is going on.
>Are things actively running when you're backing it up? Contention for
>the same resource can cause issues like this.
>Look for memory leaks (processes that grow and grow over time) too.
>
>JD
>

Set up cron to take a top measurement every 5 minutes or so while the
backup is running.

0,5,10,15,20,25,30,35,40,45,50,55 2,3,4,5 * * * top -b -n 1 >> /var/log/backup.log

You should be able to figure out what is going on.

JD

--
JD Austin
Twin Geckos Technology Services LLC
email: jd@twingeckos.com
http://www.twingeckos.com
phone/fax: 480.344.2640

---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
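
A possible variation on the cron suggestion in this thread, offered only as a rough sketch: wrapping the top call in a small script adds a timestamp to every sample, so the last entry written before a freeze can be matched against the 2AM backup window. The function name snapshot and the script path in the usage note are hypothetical; only the top -b -n 1 call and the /var/log/backup.log path come from the thread. Adding free output is an assumption based on the earlier advice to watch memory.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): append one timestamped
# sample of top and free to a log file.
snapshot() {
    # Log path defaults to the one used in the thread's cron line.
    log="${1:-/var/log/backup.log}"
    {
        # Header line so each sample can be tied to a clock time.
        echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
        # Batch mode, one iteration; keep only the summary and top processes.
        top -b -n 1 | head -n 20
        # Memory and swap totals; tolerate systems without free.
        free || echo "free not available"
        echo
    } >> "$log" 2>&1
}

# When run with a log path as argument (for example from cron), take one sample.
if [ "$#" -gt 0 ]; then
    snapshot "$1"
fi
```

Saved under a hypothetical path such as /usr/local/sbin/backup-snapshot.sh, it could replace the top command in the cron line above, keeping the same minute and hour fields and passing /var/log/backup.log as the argument.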