ubuntu + bind slave = nutty

Wed Aug 26 21:15:04 MST 2009

Hi Lisa,

First, they're internal only, so I'm not worried about unusual hacking on
them.  The wife and I are the only users on the network.  No changes
whatsoever across my chroot - I validated nothing got deleted, though I
didn't run CRCs since I literally just duplicated the vmdk to another
guest system.  In doing so, I cloned the disk, redid hostname, hosts,
and networking to rehost the server, got bind fully functional, cloned
again, and rebooted both instances.  Both failed in exactly the same
way.  This also happened to the original clone, after setup, following a
normal reboot of the guest.

No weird config files showing up. 

RNDC worked prior to my reboots; I was using it to force a zone xfer as
a test.

There's no forwarding, and recursion is allowed from my internal subnets
only.

The file definitely exists, set 664 for appropriate user/group.  I gave
bind a shell and su'd to it and perused all the directories to make sure
I could read/write to the right portions of the fs.
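
One thing worth double-checking here: named's -t option chroots before
the configuration file is read, so the '/etc/bind/named.conf' in the log
would be looked up underneath /var/lib/bind rather than at the real
/etc/bind.  A rough sketch of that check follows.  The helper names
in_chroot and check_conf are made up for illustration, and the defaults
are simply the values from the named command line in this thread.

```shell
#!/bin/sh
# Sketch: resolve a config path the way a chrooted named would see it.
# CHROOT and CONF default to the values from the command line in this thread.
CHROOT="${CHROOT:-/var/lib/bind}"
CONF="${CONF:-/etc/bind/named.conf}"

# in_chroot is a hypothetical helper, not part of BIND.
# $1 = chroot directory, $2 = absolute path as named reports it.
in_chroot() {
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

# check_conf (also hypothetical) reports whether the file is readable at
# the resolved location for the user running this script.
check_conf() {
    target=$(in_chroot "$1" "$2")
    if [ -r "$target" ]; then
        echo "ok: $target"
    else
        echo "missing or unreadable: $target"
        return 1
    fi
}

check_conf "$CHROOT" "$CONF" || true
```

Running it as the bind user (for example via sudo -u bind sh) tests
readability for that account rather than for your own login.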

As far as I know, bind should launch regardless of whether there's an
RNDC key; it's mostly for external control of the daemon.  I remember
having it running at one point when the key was gone by mistake, and I
just couldn't rndc reload the zones.

It's really quite odd that I only have this issue with the slave, and
not the master.  Only the slave directory where it creates the xfer'd
zones and the relevant devs are writable; the rest is entirely readable
or owned by bind.  I spent some time messing with the apparmor profile,
as the stock ones don't accommodate chroots by default, but I can't find
anything else in strace that it's trying to reference and cannot.  The
trace is fairly clean up to the point where it pukes, and by then it
only shows what the binary already prints when run non-forked.
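
Since strace comes up clean, the kernel log is another place to look: an
AppArmor denial would be one possible explanation for the "socket()
failed: Permission denied" lines.  A minimal sketch of that search
follows.  The log path is an assumption (it could be syslog or an audit
log depending on the setup), the exact denial format varies between
AppArmor versions, and named_denials is a made-up helper name.

```shell
#!/bin/sh
# Sketch: list log lines that look like AppArmor denials involving named.
# LOG is an assumption; adjust it to wherever kernel/audit messages land.
LOG="${LOG:-/var/log/kern.log}"

# named_denials is a hypothetical helper name.
# Match case-insensitively on "denied", then keep lines mentioning named.
named_denials() {
    grep -i 'denied' "$1" | grep 'named'
}

if [ -r "$LOG" ]; then
    named_denials "$LOG" || echo "no matching denials in $LOG"
else
    echo "cannot read $LOG (try running with sudo)"
fi
```

The match is deliberately loose, so expect some noise; the point is only
to see whether anything named-related is being refused at all.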

I really had enough today, just short of throwing the monitor out the
window, so I'm going to pick it up tomorrow and look at
purging/reinstalling the binaries to validate them, and might just
remove apparmor altogether.  I have a sneaking suspicion it's doing
something it shouldn't.  This was working prior to my rebuilding them as
Ibex servers vs. an old Gutsy or even Feisty, prior to apparmor
inclusion.  It's the only big change I can see that might be screwing
with it.
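
A less drastic test than removing apparmor outright would be to put only
the named profile into complain mode and retry the foreground start.
Here is a rough sketch, assuming the apparmor-utils package (which
provides aa-complain) is installed and that the profile lives at the
usual Ubuntu path.  The run wrapper is made up for illustration and
only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: switch the named profile to complain mode instead of removing
# AppArmor. PROFILE is the usual Ubuntu location; treat it as an assumption.
PROFILE="${PROFILE:-/etc/apparmor.d/usr.sbin.named}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands, do not execute

# run is a hypothetical wrapper: echo the command when DRY_RUN=1,
# otherwise execute it as given.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run sudo aa-complain "$PROFILE"
# Same foreground invocation as in the original report.
run sudo named -u bind -t /var/lib/bind -g
```

If named starts cleanly in complain mode, that points at the profile
rather than at the binaries or the chroot contents.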

Thanks for the input, I'll look at it some more from your perspective
and see what I can see in the morning.

-mb

On Wed, 2009-08-26 at 16:46 -0700, Lisa Kachold wrote:
> Hi Michael,
> 
> I have seen a good many hacked bind servers and various known things
> happen to them:
> 
> 1) something strange changes chroot?
> 2) configuration files mysteriously appear with ALT255 ascii characters
> in front of localhost entries, etc.
> 3) rndc key permissions are opened so anyone can control the server,
> when not completely firewalled.
> 4) when recursion and forwarding are misconfigured, cache poisoning is
> rampant.
> 
> In any case YOUR bind error is describing FIRST inability to find the
> /etc/bind/named.conf file.  Does it exist?
> 
> The bind to socket() issues that follow are due to the failure to load
> a perfectly acceptable named.conf file that calls the rndc key, etc., I
> believe?
> 
> But run a crc check against the binary, blow away the package and reinstall it.
> 
> Be sure your configuration files (not using a db?) are intact...
> 
> On Wed, Aug 26, 2009 at 2:23 PM, Michael Butash<michael at butash.net> wrote:
> > I'm curious if anyone's seen anything nutty like this before...
> >
> > So I'm migrating my dns instances between boxes when I noticed my
> > secondary dns server isn't starting bind anymore.  Primary still works
> > fine, no issues.  Debugging gets me this error:
> >
> > user at dns03:~$ sudo named -u bind -t /var/lib/bind -g
> > 26-Aug-2009 21:01:33.568 starting BIND 9.5.0-P2 -u bind -t /var/lib/bind -g
> > 26-Aug-2009 21:01:33.569 found 1 CPU, using 1 worker thread
> > 26-Aug-2009 21:01:33.575 loading configuration from '/etc/bind/named.conf'
> > 26-Aug-2009 21:01:33.575 none:0: open: /etc/bind/named.conf: file not found
> > 26-Aug-2009 21:01:33.587 net.c:80: unexpected error:
> > 26-Aug-2009 21:01:33.587 socket() failed: Permission denied
> > 26-Aug-2009 21:01:33.588 net.c:80: unexpected error:
> > 26-Aug-2009 21:01:33.588 socket() failed: Permission denied
> > 26-Aug-2009 21:01:33.588 loading configuration: file not found
> > 26-Aug-2009 21:01:33.589 exiting (due to fatal error)
> >
> > After futzing with this for several hours, I give up, clone the primary,
> > migrate the slave config files, and get it working again.  Happy it's
> > working, I reboot it, migrate the instance again, and I get the same
> > damn errors.  I can find _nothing_ related to an error like this
> > anywhere on google, and even strace-ing it shows me nothing other than
> > for some awful reason it now doesn't seem to think an ethernet interface
> > exists.
> >
> > Any ideas why a functional slave bind server would "lose" its
> > capability of binding to an ethernet interface after a reboot?  As far
> > as I can tell, this is the only thing that seems to be holding it up.
> > This is the most frustrating and asinine thing I've seen ubuntu do in a
> > while, pretty much killing my entire day thus far...
> >
> > I've checked apparmor, permissions (all files readable fine by user),
> > named.conf allowing "any" bind interfaces, and again, it was working
> > perfectly before a reboot.  This is entirely reproducible as well, as
> > apparently I just flipping did.  Ugh.
> >
> > I do know about djbdns and rdns being "better", I'd just rather not have
> > to waste a few days when bind has always suited my needs just
> > fine.
> >
> > -mb
> >
> > ---------------------------------------------------
> > PLUG-discuss mailing list - PLUG-discuss at lists.plug.phoenix.az.us
> > To subscribe, unsubscribe, or to change your mail settings:
> > http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
> >
> 
> 
>