Discussion:
Jail stuck in dying
Kristof Provost
2017-03-09 02:01:48 UTC
Hi,

On a current box (r314933) I can’t seem to stop my jails.

It’s started like this:

jail -c name=test0 host.hostname=test vnet persist \
    vnet.interface=epair0b \
    path=/usr/jails/jail1 exec.start="/bin/sh /etc/rc"

I terminate it with `jail -R test0`, yet the jail stays stuck in dying
state:

$ jls -a
 JID  IP Address      Hostname                      Path
   1                  test                          /usr/jails/jail1

$ jls -na
devfs_ruleset=0 dying enforce_statfs=2 host=new ip4=inherit ip6=inherit
jid=1 name=test0 osreldate=1200023 osrelease=12.0-CURRENT parent=0
path=/usr/jails/jail1 nopersist securelevel=-1 sysvmsg=disable
sysvsem=disable sysvshm=disable vnet=new allow.nochflags allow.nomount
allow.mount.nodevfs allow.mount.nofdescfs allow.mount.nolinprocfs
allow.mount.nolinsysfs allow.mount.nonullfs allow.mount.noprocfs
allow.mount.notmpfs allow.mount.nozfs allow.noquotas allow.noraw_sockets
allow.set_hostname allow.nosocket_af allow.nosysvipc children.cur=0
children.max=0 cpuset.id=2 host.domainname="" host.hostid=0
host.hostname=test host.hostuuid=00000000-0000-0000-0000-000000000000
ip4.addr= ip4.saddrsel ip6.addr= ip6.saddrsel

I’ve tried debugging this, but the most I can say is that there
appears to be something wrong with the reference counting.
prison_deref() returns because pr->pr_ref is 3. There are no more jailed
processes running, so I have no idea why this happens.

The problem doesn’t appear to be related to vnet. I can reproduce it
without setting the vnet flag when creating the jail as well.

Regards,
Kristof
Dewayne Geraghty
2017-03-09 03:21:52 UTC
Kristof, would you share your devfs.rules settings for the jail that
remains in a dying state?

I have a similar situation and similar jail.conf settings, in that I use:
allow.sysvipc; allow.socket_af; allow.raw_sockets;

However I have other jails that stop and can be restarted with the same
attributes; but different device rules.
So what's different? For the jail that gets stuck in state "dying"
devfs.rules:
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path 'bpf' unhide
add path 'bpf0' unhide

As a side note: in my jail.conf I also have jid=3, i.e.
b3 { jid=3; ip4.addr = "10.0.7.91,10.0.5.91,127.1.5.91"; devfs_ruleset =
"7"; allow.sysvipc; allow.socket_af; allow.raw_sockets; }
(Yes, it's an application-testing jail.)

Regards, Dewayne.
Kristof Provost
2017-03-09 04:37:33 UTC
Post by Dewayne Geraghty
Kristof, would you share your devfs.rules settings for the jail that
remains in a dying state.
My devfs.rules is entirely standard.
http://people.freebsd.org/~kp/devfs.rules

Regards,
Kristof
James Gritton
2017-03-09 14:51:58 UTC
Post by Kristof Provost
Hi,
On a current box (r314933) I can’t seem to stop my jails.
[...]
I’ve tried debugging this, but the most I can say is that there
appears to be something wrong with the reference counting.
prison_deref() returns because pr->pr_ref is 3. There are no more
jailed processes running, so I have no idea why this happens.
The problem doesn’t appear to be related to vnet. I can reproduce it
without setting the vnet flag when creating the jail as well.
It's never about processes: dying jails are those that have no
processes but still have other references to them in the kernel. Those
"other references" are almost always through the cred system: crcopy()
calls prison_hold() to increase the associated prison's pr_ref, and
crfree() calls prison_free().

The hard part is tracking down just what might be holding such a
credential; there is nothing that points from a prison to its associated
creds. But it's almost always associated with the network stack. In
normal operation, a jail may stick around for a little while in the
dying state until its just-closed TCP connections time out. I've heard
reports of NFS mounts associated with jails sometimes causing such
references that don't go away. There are of course many other places
that use creds, but most of them are in some way associated with
processes.

As long as the problem only manifests as jails in the dying list, the
easiest solution is to ignore it. Provided you don't have hard-coded
JIDs in jail.conf, you can re-create the jail and it will get a
different JID. The need for a particular JID isn't what it used to be,
and chances are you can get away with dynamically numbered jails. If
you want to track down what's holding the jail half-alive, it becomes
a matter of finding out exactly which (probably network) resources the
jail is using.

- Jamie
