On Thu, Sep 2, 2010 at 8:03 PM, Simon Chatfield wrote:
>
> Ok, I've got a doozy of an issue which has happened twice this week and is
> absolutely crushing to my clients who are in busy season right about now.
> Here's the issue...
>
> I have a beefy linux database server which runs both postgres and mysql. We
> just recently loaded mysql and are putting it under significant load.
>
> Apparently at random, twice this week (Monday and this evening), it appears to
> take the network down save for a single machine which we are still able to
> ssh into. There are 6 other boxes which we cannot ssh into when this occurs.
> Link light activity does appear to still be active on the network. The
> method for solving the problem has been to hard reboot this specific server
> and as soon as it goes down, we can access the other boxes via ssh and they
> start working again. When the box comes back up, we can then ssh into that
> machine and everything is good (until it happens again, that is). After the
> reboot, there isn't much in the logs, but I see the log entry for the tech
> unplugging and plugging in the computer from the switch PRIOR to the reboot,
> so the network link was detected and logged even though it was not
> responding to ssh.
>
> These machines are hosted down at i/o, so a hard reboot costs us
> significant time while we get a tech to handle it.
>
> Has anyone ever heard of a single linux box bringing down 'most' of a
> network? Then reboot and the other boxes are accessible again?
>
> My client is at his wits' end, and I don't blame him. However, I'm not even
> sure what kind of problem this is. Hardware on that box? System
> configuration? A bad switch?
>
> Looking for ideas at least, and if someone has time and ability, I'd love to
> have someone on-site to help debug and fix this issue...
>
> Thanks everyone!
>
> --
> Simon Chatfield
>

Are the 6 machines actually crashing, or are they losing their routing
tables? If the systems just got un-networked, see if/how the routing tables
change. OTOH, do you have avahi running? Look for 169.254/16 IP addresses in
the wrong place.

---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
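
The 169.254/16 check suggested in the reply could be scripted roughly as follows. This is only a sketch, not something from the thread: it assumes the one-line-per-address output of `ip -4 -o addr show` from iproute2, and the function names `find_link_local` and `check_this_host` are made up for illustration.

```python
import ipaddress
import re
import subprocess

# IPv4 link-local range that avahi/zeroconf autoconfiguration hands out.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def find_link_local(ip_addr_output):
    """Return (interface, address) pairs inside 169.254.0.0/16.

    Expects text in the format printed by `ip -4 -o addr show`, e.g.
    "2: eth0    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0".
    """
    hits = []
    for line in ip_addr_output.splitlines():
        m = re.match(r"\d+:\s+(\S+)\s+inet\s+(\d+\.\d+\.\d+\.\d+)/\d+", line)
        if m and ipaddress.ip_address(m.group(2)) in LINK_LOCAL:
            hits.append((m.group(1), m.group(2)))
    return hits


def check_this_host():
    """Hypothetical helper: run `ip` locally and report link-local addresses."""
    out = subprocess.run(
        ["ip", "-4", "-o", "addr", "show"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_link_local(out)
```

Running `check_this_host()` on each box while the network is healthy and again during an outage (from the one machine that still accepts ssh, or from a console) would show whether any interface has picked up a link-local address. Saving `ip route show` output at the same two moments and comparing them would cover the routing-table half of the question.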