/srv/irclogs.ubuntu.com/2008/07/24/#ubuntu-installer.txt

davmor2Guys, ubiquity is freezing at 96% removing gparted10:48
tjaaltonwhat could have broken hardy netboot a week ago? Now I get "No root file system defined" error, but the preseed files haven't been changed..10:49
tjaaltonlast successful installation was on Mon 14th10:49
tjaaltonpartman says "no matching physical volumes found"10:50
tjaaltonand then "no volume groups found"10:51
cjwatsondavmor2: likely bug 25122310:52
cjwatsontjaalton: the log bits you've quoted so far are irrelevant; it would be much better to get the full logs10:52
davmor2checks10:53
cjwatsondavmor2: can you try with union=unionfs on the command line?10:53
tjaaltoncjwatson: ok, a sec10:53
davmor2cjwatson: What sorry?10:55
tjaaltoncjwatson: http://users.tkk.fi/~tjaalton/foo/syslog10:55
tjaaltoncjwatson: and /partman10:55
tjaaltonoops10:55
cjwatsondavmor2: was that unclear?10:55
tjaaltonpermissions fixed10:56
davmor2I just type union=unionfs in the command line or add it to the ubiquity command or what?10:56
cjwatson*kernel* command line, sorry10:57
cjwatsonas in, press F6 at the CD boot loader and add union=unionfs to the end10:57
davmor2cjwatson: okay no probs 2 ticks10:57
cjwatsonif that works, I am going to point and laugh at all the people who told me vociferously that aufs was the bee's knees10:57
cjwatsontjaalton: hmm, shame I can't see what it's 404ing on10:58
tjaaltoncjwatson: those should be harmless, visible on a valid installation too10:58
cjwatsongiven that there are no other relevant errors ...10:59
cjwatsoncan I have the preseed file too?10:59
cjwatsonblink, where did nic-restricted-firmware-2.6.24-19-generic-di go?10:59
tjaaltonhttp://pastebin.ubuntu.com/29907/10:59
tjaaltonthat's from an old installation11:00
cjwatsonah, never mind, there it is11:00
tjaaltonhttp://users.tkk.fi/~tjaalton/foo/preseed-log11:02
davmor2cjwatson: drops into initramfs busybox11:03
davmor2cjwatson: would it help to have a ubiquity --debug report?11:03
tjaaltoncjwatson: is there a way to see what packages have been accepted to -updates after Mon 14th?11:04
cjwatsondavmor2: no, it isn't a ubiquity problem11:05
cjwatsondavmor2: the kernel is crashing11:06
davmor2:(11:06
cjwatsondavmor2: mdz already got debug logs - I can't guarantee it's at the same point but it sounds like the same thing11:06
cjwatsondavmor2: standard advice: if the whole system freezes, it ain't ubiquity's fault11:06
cjwatsontjaalton: err, not easily11:06
davmor2it ain't ubiquity then :)11:06
cjwatsontjaalton: give me a while, I'm doing several things at once here and my attention is fractured11:07
tjaaltoncjwatson: ok :)11:07
cjwatsontjaalton: a log with DEBCONF_DEBUG=developer on the kernel command line would probably help11:07
tjaaltoncjwatson: ok, trying that11:08
cjwatsontjaalton: partman.recipe.hardy would be useful too11:08
tjaaltoncjwatson: added11:09
tjaaltoncjwatson: http://users.tkk.fi/~tjaalton/foo/syslog-debug11:19
cjwatsontjaalton: one thing I notice is that lvmok{ } should be $lvmok{ }11:35
tjaaltoncjwatson: ok, I'll try changing that11:36
cjwatsonargh, I think somebody did what I told them not to do with hardy-proposed11:37
cjwatsonoh, hm, no, -updates. WTF?11:37
tjaaltoncjwatson: adding $ didn't help :/11:39
cjwatsonyeah, I think what's happened is that linux got removed from hardy-updates for some reason, and is only in -security11:39
tjaaltonah...11:39
cjwatsontheoretically that ought not to break things, but ...11:39
tjaaltonhmm, I'll check our mirror11:41
cjwatsonit's not the fault of your mirror11:41
cjwatson(I don't think)11:41
tjaaltonmaybe because we didn't mirror hardy-security main/debian-installer because it failed to mirror before (using apt-mirror)11:43
mons88Please I could use some help, I've finally succeeded in getting my local net install with a preseed option, using hardy-proposed installer with the alternate CD(i386 btw).11:43
mons88Unfortunately after reboot the keyboard comes up with diamonds and I can't login.11:43
tjaaltonhmm, mirroring those didn't help11:45
cjwatsonmons88: ok, I need to see your preseed file11:47
cjwatsonmons88: (remove any passwords from it)11:47
cjwatson(sorry, I think we had this conversation before but as you could probably tell I wasn't very awake then)11:48
mdzcjwatson: I was poised to test union=unionfs when I saw your note that it's gone :-/11:48
mdzcjwatson: had words with the kernel team yet?11:49
mons88sure, hope you're feeling better, just a few moments while I connect remotely and I'll paste the file online11:49
cjwatsonmdz: I haven't been able to raise them11:49
cjwatsonmdz: ogasawara and slangasek brought up the bug last night; the only response as far as I could see was rtg saying that BenC should look at it11:50
mdzcjwatson: I made sure pgraner was aware of it yesterday as well11:50
* cjwatson has not typically had good experiences with blocking kernel bugs discovered the day before a scheduled release11:52
tjaaltoncjwatson: so, do you have an idea what's going on with the archive, and do you need anything from me?-)11:56
cjwatsonI think I've fixed it11:57
cjwatsonbut you won't be able to tell for at least an hour (assuming you're using the master archive not a mirror)11:58
tjaaltoncjwatson: ok, I've also noticed that I didn't mirror the amd64 udebs from -security, so I'll try once again11:59
tjaaltonjust to be sure11:59
cjwatsonthat certainly wouldn't help12:01
cjwatsonbut anyway, I've copied linux back to -updates (I believe it was removed "temporarily" to hack around an overrides problem in -security about a week ago, which fits), so give it an hour12:01
tjaaltonright.. mirroring those bits did help12:01
tjaaltonso, maybe have stub files for debian-installer also in -security in the future?12:02
cjwatsonthey're already there!12:03
cjwatsonno need for them to be stubs, they have actual content12:03
tjaaltonnow they have, but not when I started mirroring hardy ;)12:04
tjaaltonif I'd mirror intrepid now (with apt-mirror), it would fail because intrepid-security does not have main/debian-installer12:05
cjwatsonhmm, yeah, there are placeholders for main/binary-i386 but not main/debian-installer/binary-i38612:06
tjaaltonanyway, thanks for tracking it down! I can go back on vacation..12:06
cjwatsonfeel free to file a Soyuz bug12:06
cjwatson(I don't control that stuff directly)12:06
tjaaltonok, I'll do12:06
mons88cjwatson: posted my preseed @ http://pastebin.com/m4f4bece8, cheers12:08
tjaaltoncjwatson: bug 25145412:10
tjaaltonhm, no ubotu12:11
mons88is that the same as or similar to bug 188492?12:11
tjaaltonmons88: mine? no: https://bugs.edge.launchpad.net/soyuz/+bug/25145412:12
cjwatsonno12:12
cjwatson#12:13
cjwatsond-i partman-auto/disk string /dev/?da12:13
cjwatsonWTF12:13
cjwatsonif that works it's by dumb luck12:13
cjwatsond-i preseed/early_command string anna-install common-console12:13
cjwatsonremove that; you've spelled it wrong (it would be console-common) and even if that worked you don't want it12:13
cjwatsond-i debian-installer/consoledisplay string kbd=lat0-sun16(utf8)12:14
cjwatsonI would advise removing that, though I don't know if it will have negative effects12:14
cjwatsonmons88: can you boot in recovery mode from the grub menu, and fish out /etc/default/console-setup for me?12:14
mons88okay but i had the issue before adding common-console, string kbd=lat0-sun16(utf8) comes from http://www.ubuntu-forum.net/showthread.php?p=5120670 which apparently fixes it12:16
mons88I'll try recovery mode now12:16
cjwatsonyeeeeees, sort of cargo-cult debugging going on in that forums post12:19
cjwatsonnamely, applying solutions from four releases back12:19
mons88root password at maintenance prompt isn't accepted :(12:22
cjwatsonrecovery mode shouldn't involve a maintenance prompt, IIRC12:22
cjwatsonwell, failing that, select your normal boot prompt, press 'e', add 'init=/bin/sh' to the end, and boot12:23
cjwatsonhmm, yeah, recovery mode of course *does* involve a maintenance prompt if you set a root password, sorry12:23
mons88still seeing diamonds12:27
mons88looks like going to a later version of hardy-proposed isn't a good idea, Bug 251344 !12:34
mons88would it be an idea to try a later, or much earlier version of the alternate iso?12:53
cjwatsonmons88: please be patient - I'm investigating an intrepid alpha 3 blocker at the moment12:55
cjwatsonmons88: I will get back to you later on12:55
cjwatsonbut I can't do everything at once12:55
mons88cool, npbs12:56
cjwatsonmons88: you mention hardy-proposed - does that mean you're using it?12:56
=== davmor2 is now known as davmor2_lunch
mons88yep, kept getting an error about downloading a file from the mirror with the normal hardy net install stuff, so hardy-proposed worked beautifully12:57
cjwatsonmons88: that should have been fixed some time ago12:57
cjwatsonmons88: grab the new net install image from /dists/hardy/ (we copied it over) and stop using -proposed12:57
cjwatsonworth a try anyway12:58
mons88okay will do12:58
cjwatsonmdz: ok, you were right, if I tell ubiquity not to remove gparted then the same thing just happens in libntfs10.postrm13:02
mdzcjwatson: is that the very next one?13:03
cjwatsonnot quite, it got through jfsutils and ntfsprogs, both of which I think were later13:04
mdzthose sound familiar, I think they might have been first13:05
cjwatsonI'd have expected libntfs10 to be removed right after ntfsprogs, I think13:07
cjwatsonlibntfs10 has a postrm, the others don't (nor does lupin-casper), but casper does13:09
cjwatsonso it seems to be consistently the second postrm, so far13:09
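A minimal sketch of how one might check which of the packages being removed ship a postrm under the target's dpkg info directory. The helper name `packages_with_postrm` is an assumption for illustration; it only looks for `<package>.postrm` under `var/lib/dpkg/info` and is not part of ubiquity or dpkg.

```python
import os
import tempfile


def packages_with_postrm(root, packages):
    """Return the packages (in the given order) that have a postrm script."""
    info_dir = os.path.join(root, "var/lib/dpkg/info")
    return [
        pkg for pkg in packages
        if os.path.isfile(os.path.join(info_dir, pkg + ".postrm"))
    ]


if __name__ == "__main__":
    # Build a throwaway tree that mirrors what the log reports:
    # gparted, libntfs10 and casper have a postrm; the others do not.
    root = tempfile.mkdtemp()
    info_dir = os.path.join(root, "var/lib/dpkg/info")
    os.makedirs(info_dir)
    for pkg in ("gparted", "libntfs10", "casper"):
        open(os.path.join(info_dir, pkg + ".postrm"), "w").close()

    removal_order = ["gparted", "jfsutils", "ntfsprogs",
                     "libntfs10", "lupin-casper", "casper"]
    print(packages_with_postrm(root, removal_order))
```

On a real installation the first argument would be the mount point of the target system, for example /target.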
mdz(how many times does hw-detect need to be run?)13:15
cjwatsonI thought it was only run once13:16
mdzcjwatson: I've been stepping ubiquity along and it seems to be run more than once13:31
mdznot worrying about it right now though13:31
* cjwatson tries commenting out remove_extras13:34
=== davmor2_lunch is now known as davmor2
cjwatsonmdz: not that I'm one to talk, but your screen is filthy13:40
cjwatsonhttp://people.ubuntu.com/~mdz/251223/P7230016.JPG13:40
cjwatsonnone of the photos look anything like root causes though13:42
cjwatsonaha, you have a call trace that I don't, I missed that13:42
cjwatsonor at least one with symbols13:42
cjwatsonI'm very confused as to why mtd is involved13:44
mdzcjwatson: it looks fine in normal lighting; the flash reflects off the dust13:44
mdzcjwatson: me too13:44
cjwatsonmaybe that is a red herring13:44
cjwatsonit's almost as if there's a superblock refcounting bug in aufs13:45
cjwatsonsys_getcwd is something I can imagine the shell doing randomly13:45
mdzcjwatson: it's not hw-detect that gets run multiple times, it's update-dev13:47
mdzcalled by both clock-setup and hw-detect13:47
mdzand I think other places, judging by how much hal spam I see in syslog13:47
mdzcjwatson: there is a *lot* of mounting and unmounting which happens under /target13:47
cjwatsonupdate-dev is needed as a checkpoint in a few places to wait for devices to appear13:48
cjwatson/proc and /sys are mounted while doing certain things, so that's expected13:48
cjwatsonit's much simpler to just mount/umount as needed rather than try to keep track of when they're needed13:48
mdzcjwatson: yes, I'm just pointing out that the bug could be tickled by all that mount churn13:48
cjwatson(IIRC, there was some udeb code that expected those filesystems to be unmounted)13:49
cjwatsoncommenting out remove_extras avoids the problem (though of course creates others)13:50
cjwatsonmdz: just out of interest, how large is your disk image?13:51
cjwatsonI just noticed mine is full13:51
mdzcjwatson: 6GB13:51
cjwatsonah, mine's 2.5, so not that13:51
cjwatson(I'm low on disk at the moment)13:52
mdz2.3G used13:52
mdzcjwatson: I have an strace -f of the dpkg process13:52
mdzhttp://people.ubuntu.com/~mdz/251223/strace.gz13:52
cjwatsondoesn't get very far into the postrm, does it?13:53
cjwatsonto put it mildly13:54
mdzcjwatson: oh, dpkg is running outside of the chroot.  I didn't realize that13:54
cjwatsonyes, with --root13:54
cjwatsonmdz: where is the sleep 36000 || ... bit coming from in casper.postrm? did you add that?13:57
mdzcjwatson: yes13:57
mdzeasier than trying to catch it at the right time13:57
mdzpgraner: hi13:57
pgranermdz: hey13:58
mdzpgraner: I've updated the bug in LP with the latest14:00
pgranermdz: ok, just got benc he will be here shortly.14:01
cjwatsonI have to confess that I'm not getting very far, beyond the observation that if I stop ubiquity from removing packages from /target then the bug goes away14:01
BenCIf I build a unionfs module for someone, will they be able to test this (just woke up, no test rig)?14:04
cjwatsonyes14:04
cjwatsonif I get the source and some instructions on what I need to build-dep on, I could even put it in casper temporarily ;-)14:04
cjwatsonif that turns out to fix it, I'm going to point and laugh so hard at the people who were going ooh-shiny at aufs14:05
evandheh14:06
BenCcjwatson: And I'm gonna smack the people that told me it worked14:06
BenCcjwatson: 32 or 64-bit?14:07
mdzBenC: and what do we do with the people who took the old-working solution away before the new-shiny one was tested? ;-)14:07
cjwatsonBenC: 3214:07
cjwatson* unionfs is now known as Agent K14:08
BenCmdz: my fear is I took the old solution away because it didn't compile any more and didn't seem worth the effort :(14:08
mdzoh dear14:09
mdzcjwatson: I'm able to reproduce the bug by booting the live CD, mounting /target and running dpkg --root=...14:16
mdzcjwatson: chrooting dpkg instead seems to run fine14:16
cjwatsonblink, how is that different as far as the kernel is concerned?14:16
mdzcjwatson: one chroot() call instead of a bunch of forks and chroots?14:17
cjwatsonubiquity doesn't actually have much choice here; it needs to use python-apt, which can only really sanely be done in-process, and apt doesn't seem to have an option to chroot the whole of dpkg14:17
cjwatsonerr, sure, but none of those should have a stateful effect on the fs layer14:18
mdzI think unionfs is probably our best hope for a workaround; I'm trying to root cause14:19
BenCOne thing I'm worried about is that this may be caused by the apparmor stuff14:20
BenCanyone tried booting with apparmor disabled?14:20
cjwatsonBenC: give me the runes14:20
BenC(not really apparmor, but the compatibility parts for the VFS)14:21
mdzBenC: I saw one trace which had aa_* in it, but it was after the initial BUG()14:21
cjwatsonBenC: my hypothesis above was that it was a refcounting bug in a filesystem somewhere14:21
BenCmdz: apparmor=014:21
cjwatsonBenC: I can't think of any other reason why kill_sb would be called from a sys_getcwd path14:21
cjwatsonI'm trying apparmor=0 now14:22
BenCcjwatson: right, but that may be a bug in the parts that apparmor had to add in order to pass around the mount-point (changes we had to make to unionfs and other filesystems)14:22
cjwatsonah, interesting, ok14:22
BenCcjwatson: yeah, that does sound like a put() without a proper get() before it14:22
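A toy model of the hypothesis being discussed here, a put() without a matching get(). This is not kernel code: the `Superblock` class and its `killed` flag are illustrative assumptions, with `killed` standing in for the point where kill_sb would run.

```python
class Superblock:
    """Toy reference-counted object; not the kernel's struct super_block."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # reference held by the mount itself
        self.killed = False  # stands in for kill_sb() having been called

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.killed = True


def balanced_user(sb):
    # Correct usage: every put() is paired with an earlier get().
    sb.get()
    sb.put()


def buggy_user(sb):
    # The suspected bug: a put() with no get() before it.
    sb.put()


if __name__ == "__main__":
    sb = Superblock("target")
    balanced_user(sb)
    print(sb.refs, sb.killed)
    buggy_user(sb)
    print(sb.refs, sb.killed)
```

In the toy, the unbalanced put() drops the count that belongs to the mount, so the object is torn down while it is still in use.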
cjwatsonwhere would I get the apparmor patch to look at while I'm waiting?14:23
BenCcjwatson: in git, there's one commit for the plumbing to add apparmor14:23
cjwatsone103a4e81552fc5fea7c21a1d34cabc23bc938cc?14:24
cjwatson    UBUNTU: SAUCE: [AppArmor] aufs patches14:24
cjwatsonno, maybe not14:25
BenCah, I think that was the only part in lum :)14:25
BenCbut there is one that touches the rest of the VFS14:25
cjwatsonlooks like 2ec8408decf6d72c346247e0c437080e9f628fe414:25
cjwatson9000+ lines, yum14:25
BenCOH!14:29
BenCI bet I know what is causing this14:29
mdz?14:29
BenCiget() was removed from upstream, and I had to fix up aufs to compile with the new API14:30
BenCI bet I did it wrong...there were limited examples of how the other filesystems coped with the change14:30
BenCstill trying to get unionfs to compile14:30
cjwatsonBenC: apparmor=0 doesn't seem to make any difference14:47
cjwatsonBenC: anything else I can try yet14:51
cjwatson?14:51
mdz"double fault"14:55
mdzthat's a new one14:55
BenCcjwatson: Nothing I can think of off-hand14:55
BenCunionfs is proving to be non-trivial to compile against 2.6.2614:55
cjwatsonis 30d866ec206ff71b5f2b626ed3442bd2547da524 the iget thing you were talking about? that's squashfs rather than aufs though14:56
mdzmy test case is down to "chroot /target /var/lib/dpkg/info/gparted.postrm remove"14:58
mdzchroot /target bash14:59
mdzthis keeps getting weirder15:00
mdzcjwatson: I'm pretty sure a whole crapload of stuff has run in the chroot before this point in the installation15:00
mdzbut even trying to start a shell blows up15:00
cjwatsonwhen are you running your test case?15:00
mdzcjwatson: single-user mode on the live cd15:01
cjwatsongosh15:01
cjwatsonchroot /target dash?15:01
mdzmkdir /target && mount /target && chroot /target bash15:01
mdzblows up15:01
mdzgetting there15:01
cjwatsonbearing in mind that bash isn't the default shell15:01
mdzcjwatson: chroot /target /bin/true crashes15:01
cjwatsonthis suggests that that filesystem has been subtly buggered by something just beforehand15:03
cjwatsonI can't reproduce this15:06
cjwatsonwell, not from a desktop on the live CD anyway15:06
mdzcjwatson: I created a fresh ext3 filesystem, rsync'd the contents of /rofs into it, shut down, rebooted, mounted it, and reproduced15:07
cjwatsonah, I was working from the previous crashed fs15:07
mdzhttps://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/251223/comments/715:07
mdzBenC: ^^ ever seen one like that?15:07
cjwatsonI'll try to duplicate, by way of the scientific method15:07
mdzcjwatson: you can have my 2.3GB qemu image if you think it would help, but I'm pretty sure it's clean15:09
sorenHas this been seen on real hardware as well?15:09
cjwatsonI'm sure it won't take long to set up a duplicate15:09
* soren crosses fingers15:09
cjwatsonalmost certainly quicker than downloading 2.3GB15:09
cjwatsondavmor2: are you using real hardware or a kvm?15:09
mdzsoren: yes, that's where I first saw it15:10
sorenmdz: Oh, "good".15:10
sorenThere have been a few people who claimed to have seen data corruption of some sort with kvm 70, but no one could provide any detail and kvm upstream has been unable to reproduce it. I'm glad that's not it.15:11
mdzcjwatson: interestingly, chroot /rofs /bin/true works15:11
cjwatsonsuggests an ext3 bug, which is a bit terrifying15:11
BenCmdz: IIRC, double fault means a fault in a fault15:12
BenClooks like register corruption kept it from printing the first oops properly15:12
BenCand caused another fault15:12
sorenYes, that's indeed what a double fault is.15:12
* BenC loves re-entrant crashing15:12
mdzwhat's a statically-linked binary in the base system?15:14
BenC"The system was unable to crash properly. Please reboot soon to avoid losing any other crashes"15:14
mdzah, ld_static15:14
cjwatsonld_static -> not base system (though it might be there anyway)15:14
mdzaha15:15
mdzchroot /target /bin/ld_static -> OK15:15
mdzchroot /target /bin/true -> crash15:15
cjwatsonchroot /target /sbin/ldconfig ?15:16
cjwatsonin fact, perhaps chroot /target /sbin/ldconfig.real15:16
mdzrebooting15:16
ckingmdz: Hi, just got your Email. What is the problem?15:18
* cjwatson goes to stretch back after too long hunched over the laptop15:18
cjwatsoncking: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/251223 is the problem15:19
mdzcking: bug 25122315:19
cjwatsonyou might want to check scrollback of this channel on irclogs.ubuntu.com15:19
BenCI've got one more file to forward port in unionfs15:19
mdzcking: this bug is blocking alpha 315:20
BenCcking: so we can attack this from both sides, could you take this bug from the perspective of finding out what is broken in aufs?15:20
mdzcjwatson: ldconfig causes a crash15:20
BenCcking: most likely it has to do with the apparmor patches to aufs, but there's an outside chance it is just busted15:20
BenCcjwatson: was aufs not being used in Alpha 2?15:21
mdzcking: if you can give me a kernel with that patch reverted, I'll test15:21
BenCmdz: we can't revert the patch15:23
BenCit won't compile without them15:23
mdzBenC: what won't?15:23
BenCunless you get a kernel that has no apparmor patch as well, but then we wouldn't know if it was aufs or apparmor15:23
BenCmdz: aufs15:23
mdzBenC: right, revert apparmor as well15:23
mdzcjwatson: ldconfig.real runs with no problem15:24
mdzI wonder what's different between this ext3 mount point and the squashfs mount point which has exactly the same files in it15:24
* cking is trying to extricate himself from a meeting and find somewhere to park himself15:25
mdzchroot /target true breaks, chroot /rofs true works15:25
mdzcking: if you're unavailable, don't sweat it, I only nagged you because I thought you were the only one around15:26
mdzI had just seen your trip report and thought you might have some time to help15:26
ckingmdz: I'm in the middle of a power savings talk - it's packed here15:27
cjwatsonaufs/unionfs> mdz's test case exercises neither of those, surely?15:27
cjwatsonmdz: ldconfig.real> and then does anything work *after* that? I was wondering if building the ld cache would be enough to fix dynamically linked binaries15:28
mdzcjwatson: aufs is involved15:29
ckingThis is going to be difficult to sort out from here, the network connection is slow and my home box is turned off at the moment.15:29
mdzmy root filesystem is aufs15:29
mdzbecause I'm booted from the cd15:29
cjwatsontrue15:29
cjwatsonseems ever more tenuous though15:29
mdzmounting a tmpfs, copying a minimal chroot into it, and chrooting true doesn't break15:31
cjwatsonmdz: I can't make a similar setup crash for me15:34
mdzcjwatson: how did you create the fs?15:34
cjwatsonbooted kvm with textonly as a kernel argument and a blank -hda (dd), parted to create a disklabel and partition, sudo mke2fs -j /dev/sda1, sudo mkdir /target, sudo mount /dev/sda1 /target, sudo rsync -av /rofs/ /target/15:35
mdzcjwatson: except for creating the partition table with fdisk, that's exactly what I did15:35
BenCI almost have unionfs done15:39
BenCall the 2.6.26 VFS changes plus apparmor...it's a wonder there aren't more bugs15:39
cr3BenC: by the way, what happened to replacing unionfs by aufs? did that happen?15:39
stgrabercr3: yes15:40
BenCcjwatson: http://kernel.ubuntu.com/~bcollins/unionfs.ko15:42
BenCcjwatson: that's 32-bit 2.6.26-4-generic15:42
BenCcjwatson: if it works, I have the source, and can tell you how to build-dep/compile it15:43
BenCsoren: as a side note, I have a -virtual package for you to test15:44
sorenWoo!15:44
BenCsoren: 9.1M .deb, 34Meg unpacked15:45
BenCsoren: I had to mess with your list a little, and module dep's pulled in some extras15:45
cjwatsoncr3: we conjecture that that may be the problem :-P15:45
sorenBenC: Yes, I kind of expected that. That's fine.15:46
cjwatsonBenC: 40415:46
sorenBenC: Did you see my question over in #ubuntu-kernel, by the way? About the virtio modules?15:46
BenCcjwatson: try now15:46
* BenC needed to hit enter still15:46
cjwatsonBenC: sorry, crashes at boot15:52
cjwatsonI'll try to get you a trace after this call15:52
BenCdamnit15:52
cjwatsonit's a null deref in unionfs_interpose15:53
BenCthat helps...15:53
cjwatson+0x16f15:53
BenCI did an API change there...let me check it15:53
cjwatsonerr, +0x16f/0x42015:53
cjwatsonclaims to be in the mount process15:54
BenCcjwatson: I see what happened...I didn't do all the right changes to drop read_inode()16:00
BenCgive me a couple minutes16:00
cjwatsonok16:04
BenCcjwatson: new module ready (same place)16:12
cjwatsonBenC: not quite sure, but it seems to have hung16:17
* BenC isn't quite sure what that means16:18
BenCAh!...found one more thing16:19
cjwatsonwell, nor am I, all I know is it ain't doing anything16:19
BenCcjwatson: new module up16:23
cjwatsonit's at least booting now16:26
* BenC crosses fingers16:26
BenCcjwatson: to be honest, I looked at the aufs code, and I can't see how anyone thinks it's somehow better than unionfs16:27
BenCbut then, I'm looking at unionfs 1.4 code, not the unionfs 2.3.x-need-to-patch-half-the-vfs patches16:27
BenCplus unionfs being in mm says a lot more about it :)16:27
cjwatsoninstalling16:42
BenCcjwatson: did the test case pass?16:43
cjwatsonI can't reproduce mdz's little test case, but I'm trying the big one (full installation)16:43
cjwatsonit's at 64%16:43
cjwatsonfailed around 96% before16:43
BenCok....good to know that the test passed at least16:43
mdzcjwatson: I didn't know about textonly and used single; I'm baffled as to why it didn't happen for you16:43
BenCmeans we're probably on the right track16:43
cjwatsonBenC: err, what test? :-)16:44
cjwatsonno test has passed yet, short of booting16:44
BenCnever mind, I misunderstood16:44
cjwatsonBenC: same problem with unionfs16:52
cjwatson(unionfs:generic_shutdown_super is in the call trace so I'm sure)16:53
BenCcjwatson: can I see a full trace?16:55
BenCat least with unionfs I can do a better job of tracing the code and finding out what's wrong16:55
BenCAnd I know it at least worked at some point (in hardy)16:56
mdzcjwatson: it turns out I had inadvertently used the i386 squashfs rather than the amd64 one16:56
mdzit is still completely bizarre that it caused those crashes though16:56
cjwatsonBenC: erm, if you have really good eyesight, http://people.ubuntu.com/~cjwatson/tmp/251223.png16:57
cjwatsonBenC: sorry for the font but I was trying to guarantee that the dump plus context would fit on the screen16:57
BenCcjwatson: I think I see the problem...but let me check the trace16:57
BenC    sysfs: Use kill_anon_super17:01
BenC    17:01
BenC    Since sysfs no longer stores fs directory information in the dcache17:01
BenC    on a permanent basis kill_litter_super it is inappropriate and actively17:01
BenC    wrong.  It will decrement the count on all dentries left in the17:01
BenC    dcache before trying to free them.17:01
BenC    17:01
BenC    At the moment this is not biting us only because we never unmount sysfs.17:01
BenCthat sounds just like our unionfs problem17:02
cjwatsonyes, it does rather17:02
BenCgoing to try that for unionfs17:02
cjwatsonis that patch in our tree17:02
cjwatson?17:02
cjwatsonbecause note that we *do* unmount sysfs17:02
BenCit is in intrepid17:02
BenCit's from Aug 20, 200717:03
BenCso was in hardy too17:03
BenCand probably gutsy17:03
cjwatson(not bringing its total mount count down to zero, but nevertheless we mount /target/sys using 'chroot /target mount -t sysfs sysfs sys' rather than as a bind-mount, and then unmount it)17:03
BenCcjwatson: new module ready17:03
cjwatsonok17:03
BenCcjwatson: that patch probably coincided with sysfs not using dentry cache anymore, so never affected us17:04
BenCor maybe it only affected something like 2.6.2x where X is an odd number that we didn't use :)17:04
cjwatsonor it broke mysteriously but we never got to the bottom of the problem :)17:05
* cjwatson <- cynical17:05
cjwatsoninstalling17:07
BenCcjwatson: Not taking the mount count to zero probably meant it didn't affect us either...since kill_sb() wouldn't be called in that case17:08
cjwatsonok, I wasn't sure whether repeated mounts were equivalent to bind-mounts for that purposes17:08
cjwatsons/s$//17:08
BenCcjwatson: so far so good?17:24
cjwatsonsorry dude, exact same thing17:24
BenCfuck17:25
BenCit can't be the same backtrace though17:26
BenCcjwatson: can you repost the trace?17:26
cjwatsonBenC: http://people.ubuntu.com/~cjwatson/tmp/251223-2.png17:28
mdzcjwatson: so with a less weird test rig, I am still able to reproduce the bug with dpkg --root=/target --purge gparted17:28
mdzthough not with running random programs as with the 32-bit chroot17:29
mdzperhaps that's a different bug17:29
cjwatsonI think I'm going to assume different17:29
BenCcjwatson: hmm...this appears more and more like it's in fuse module, since that's the last module before generic_shutdown_super()17:31
cjwatsonI can't think of any fuse filesystems that would be mounted17:33
cjwatsonoh, there's $HOME/.gvfs I suppose17:33
BenCcjwatson: I think that has to be unmounted prior to rootfs17:34
BenCso it may be a unionfs bug that it allows unmounting when there's another fs still mounted under it, but we can work around that in scripting17:34
cjwatsonbut nothing is being unmounted here17:34
cjwatsonthis is getting triggered inside gparted.postrm, which does nothing except update-menus17:35
cjwatsonthe fact that it's hitting unmount paths at all is the bug17:35
BenCthe crash says "unmount of rootfs rootfs"17:35
cjwatsonsorry, I should have said "nothing should be being unmounted here"17:35
cjwatsonthis is why I was talking about refcounting bugs17:35
cjwatsonit's trying to unmount stuff unasked17:35
BenCI don't think refcounting causes unmounting17:35
cjwatsoninitially I thought that it wasn't really gparted.postrm causing it, but we have traces that confirm17:36
cjwatsonyou can see dpkg reporting that gparted.postrm segfaulted in that last trace17:36
BenCok, let me look deeper17:37
cjwatsonit definitely shouldn't have got as far as unmounting /target yet, much less rootfs17:37
mdzBenC: yes, I was wondering where that message comes from17:45
mdzcjwatson: can you see whether you get that "unmount of rootfs rootfs" in your test?17:45
cjwatsonI do17:45
mdzit scrolls off too fast for me17:45
cjwatsonyou can see it in the screenshots I've posted17:45
cjwatsonit's been entirely consistent17:46
BenCso what is the rootfs in this case?17:50
BenCI mean, what filesystem?17:50
BenCis it squashfs?17:50
cjwatsondoes rootfs mean absolute root (initramfs), or whatever we pivoted to, or the current process' root?17:50
cjwatsonI thought rootfs generally meant the initramfs17:51
BenChard to say...what are the possibilities at the point of failure?17:51
BenCit wouldn't be initramfs anymore, since we've pivoted by now, right?17:51
cjwatsonif you look at /proc/mounts, the top item is still labelled rootfs and no others are, and that top item is the initramfs17:52
cjwatsonthe other possibilities are the root that practically everything else thinks we're using, which is a unionfs composed of squashfs+ext3; and the process that dies happens to be chrooted into an ext3 filesystem17:52
cjwatsonI'd be surprised if the kernel referred to anything other than the absolute root initramfs as "rootfs", though - anything else I'd've thought would be "/" at most17:53
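A small sketch of the /proc/mounts check described above: list the entries whose filesystem type is rootfs. The function name `rootfs_entries` and the sample text are assumptions for illustration, not output captured from the live CD.

```python
def rootfs_entries(mounts_text):
    """Return (device, mountpoint) pairs whose filesystem type is rootfs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == "rootfs":
            entries.append((fields[0], fields[1]))
    return entries


# Hypothetical sample in /proc/mounts format, used only when the real
# file cannot be read.
SAMPLE = """rootfs / rootfs rw 0 0
none /sys sysfs rw 0 0
/dev/sda1 /target ext3 rw 0 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    for device, mountpoint in rootfs_entries(text):
        print(device, mountpoint)
```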
BenCI wonder if it's squashfs17:53
BenCwith so many filesystems layered, and the fact that this is a bubble up call to destroy the sb, It's hard to say where it happens17:54
BenCindirection might make a problem anywhere in the layering cause this17:54
cjwatsonit would probably be a good idea if you got yourself a test rig :)17:54
cjwatsonI'm going to be having myself an evening in the not too distant future17:54
BenCwhat's the most simple test case at this point?17:55
BenCcjwatson: can we just not remove gparted and see if that lets things finish? :)17:57
cjwatsonBenC: I tried telling ubiquity not to dpkg --remove gparted, and it failed on a different package instead17:57
BenCit's surprising to me that it's gparted and nothing else (other things surely have to be calling getcwd(), right?)17:57
BenCah17:58
cjwatsonlibntfs10.postrm17:58
cjwatsongparted is just the unlucky first one17:58
cjwatsonwe *could* have ubiquity not remove any packages at all, but that would probably be bad17:58
cjwatsonit would leave stuff like casper and ubiquity on the target system17:58
cjwatsonsimplest in the sense of simplest-to-construct (not quickest) is to do an installation with all the defaults17:58
cjwatsona kvm and use-the-whole-disk is sufficient17:59
mdzI have a more minimal test case now17:59
cjwatsonmdz reckons that creating a filesystem mounted on /target, then rsync -a /rofs/ /target/, then dpkg --root=/target --remove gparted will do the job17:59
cjwatsonwith a reboot in between the rsync and the dpkg?18:00
mdzI was always running dpkg on a fresh boot18:00
mdzBenC: I have a test case which only requires a minimal chroot18:00
cjwatsoncan you reproduce it *outside* the live CD?18:00
mdzcjwatson: haven't tried, I doubt it18:01
mdzit's just chroot+exec18:01
mdza few times18:01
cjwatsonso what's the current minimal test case and I'll brave it on my laptop?18:01
BenCcjwatson: is the postrm doing anything important? Maybe we could delete the postrm files of the packages being removed?18:03
BenCquick link to an ISO?18:03
mdzcjwatson: http://people.ubuntu.com/~mdz/251223/test.c18:03
cjwatsonhttp://cdimage.ubuntu.com/daily-live/current/intrepid-desktop-i386.iso18:03
cjwatsonBenC: not much in the cases I've looked at so far, but I'm not very comfortable with that18:04
BenCmdz: are you running this on the livecd?18:05
mdzBenC: yes18:05
mdzcontinuing to try to minimize it18:05
mdzBenC: /mnt needs to have a chroot with a working shell in it18:06
BenCgoing to take me an hour to download this ISO (should have started earlier18:06
BenC)18:06
mdzexecing /bin/sh -c '' triggers the bug, /bin/true doesn't18:07
cjwatsonmdz: that crashed my laptop, although only after a lot of iterations18:08
BenCbin/sh would do a getcwd18:08
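The chroot+exec test described above can be sketched roughly as follows. This is not mdz's test.c (its contents are not shown in the log); the function name `chroot_exec_sh`, the exit codes 126/127, and the iteration parameter are assumptions made for illustration. It needs root, a chroot containing a working /bin/sh, and should only be run on a disposable machine, since on an affected kernel it may crash the system.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, chroot into `dir` in the child WITHOUT
 * calling chdir(), then exec `/bin/sh -c ''`. Repeats up to `iterations`
 * times. Returns the exit status of the last child, or -1 on fork/wait
 * failure or abnormal termination.
 *   126 = chroot() failed in the child (assumed convention)
 *   127 = exec failed in the child (assumed convention)
 */
int chroot_exec_sh(const char *dir, int iterations)
{
    int last = -1;

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;

        if (pid == 0) {
            if (chroot(dir) != 0)
                _exit(126);
            /* Deliberately no chdir("/") here: that is the point. */
            execl("/bin/sh", "sh", "-c", "", (char *)NULL);
            _exit(127);
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (!WIFEXITED(status))
            return -1;

        last = WEXITSTATUS(status);
        if (last == 126 || last == 127)
            break; /* setup problem, no point in repeating */
    }
    return last;
}
```

Calling `chroot_exec_sh("/mnt", 3)` as root would mirror the "does only 3" variant mentioned in the discussion, assuming /mnt holds the minimal chroot.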
cjwatsonseveral hundred18:08
mdzcjwatson: the current version at the same URL does only 318:09
mdzcjwatson: and that crashes 100% reliably for me18:09
mdzcjwatson: inside or outside of the live cd environment?18:09
cjwatsonin a regular intrepid system18:09
mdzoohhh18:09
BenCcjwatson: really?18:09
BenCthat's interesting18:09
cjwatsonI just mounted some random ext3 fs I had lying around18:09
cjwatson(I didn't want to introduce further variables with a loop-mount)18:09
mdzthat will speed up my test cycles tremendously18:10
* cjwatson sticks a counter in18:10
cjwatsonthis time round it hung after 8 iterations18:13
mdzcjwatson: if I add a chdir("/") after the chroot, it stops crashing18:14
mdznow we're getting somewhere18:14
BenCcjwatson: did you do a /proc or /sys mount (or any mounts) on the ext3 fs?18:15
cjwatsonBenC: no18:15
cjwatsonmdz: could you put http://people.ubuntu.com/~mdz/251223/strace.gz back up?18:16
mdzcjwatson: it's still there18:16
cjwatsonBenC: basically the same trace except no call trace in syslog18:17
cjwatsonmdz: oops, I transposed two digits18:17
cjwatsonand I note that dpkg does not chdir()18:17
cjwatsonyes, it survives 10000+ iterations for me with a chdir()18:18
mdzok18:18
mdzI have a test case now which works without a chroot18:19
cjwatsonblink18:19
mdzer, with a chroot(2)18:19
cjwatsonchdir rmdir?18:19
mdzbut with an empty chroot18:19
cjwatsonah18:19
BenCyay...I crashed18:19
mdzso doesn't require any setup18:19
mdzcjwatson: can you try the latest test.c?18:19
BenC[71309.482997]  [<ffffffff802ed461>] path_put+0x31/0x4018:19
BenC[71309.483019]  [<ffffffff802f9161>] sys_getcwd+0x101/0x16018:19
BenCnow we're getting somewhere18:20
BenCthat is with 2.6.27+ubuntu too18:20
BenCwonder if I can try a stock kernel and make sure it's not an upstream bug18:20
mdzthis one works without even mounting anything on the directory18:21
cjwatsonmdz: yeah, fails18:22
mdzcjwatson: does it fail in 3 iterations for you, or do you need more like you did before?18:22
BenCweird...if I read the dmesg right18:22
cjwatsonI'm not sure - the system didn't crash immediately so I thought it needed more, but then later on vi segfaulted and then my terminal ...18:22
BenCthis failed on a regular chroot18:22
BenCnot even a mounted fs18:22
cjwatsonBenC: right, either works18:22
BenCinteresting18:22
mdzBenC: <mdz> this one works without even mounting anything on the directory18:23
mdzan empty dir will do18:23
cjwatsonsuggesting that it's just the fact of calling chroot() exec*() without an intervening chdir()18:23
mdzcjwatson: my latest test case doesn't even call exec()18:23
mdzjust getcwd()18:23
cjwatsonoh, heh18:23
cjwatsonso getcwd likes to have a cwd18:24
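A minimal sketch of the reduced case discussed here, assuming it boils down to chroot() followed directly by getcwd() with no chdir() in between. The helper name `getcwd_after_chroot` is hypothetical and this is not the test.c from the log. It requires root for the chroot() to succeed; on a kernel without the bug, getcwd() may simply fail or report an unreachable path, so the result is only counted, not interpreted.

```c
#define _DEFAULT_SOURCE
#include <limits.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: in a child process, chroot into `dir` (an empty
 * directory is enough according to the discussion) and call getcwd()
 * `iterations` times without ever calling chdir(). The child is used so
 * the caller's own root directory is left untouched.
 * Returns 0 if the child ran all iterations and exited cleanly,
 * -1 otherwise (including when chroot() itself fails).
 */
int getcwd_after_chroot(const char *dir, int iterations)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char buf[PATH_MAX];
        int failures = 0;

        if (chroot(dir) != 0)
            _exit(1);

        /* The cwd still points outside the new root at this point. */
        for (int i = 0; i < iterations; i++) {
            if (getcwd(buf, sizeof buf) == NULL)
                failures++; /* an error return is fine; a kernel BUG is not */
        }
        (void)failures;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```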
mdzcjwatson: so we might get away with a workaround after all...18:24
BenCyay18:24
mdzdpkg would be more correct if it chdir()d18:24
cjwatsonmdz: fancy doing your yearly upload? :)18:24
BenClet me try this on a stock kernel before we beat ourselves up18:24
mdzcjwatson: ;-)18:24
BenChehe18:24
mdzI'll work up a dpkg patch and see18:24
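The workaround being proposed is the usual chroot idiom: change directory into the new root immediately after chroot(). The sketch below shows only that idiom; it is not the actual dpkg patch (the debdiff is linked later in the log), and `enter_chroot` is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <unistd.h>

/*
 * Hypothetical helper: enter `dir` as the new root and make sure the
 * current working directory lies inside it. Returns 0 on success,
 * -1 on failure with errno set by chroot() or chdir().
 */
int enter_chroot(const char *dir)
{
    if (chroot(dir) != 0)
        return -1;
    /* Without this, the cwd keeps pointing outside the new root. */
    if (chdir("/") != 0)
        return -1;
    return 0;
}
```

A caller would typically fork first, call `enter_chroot()` in the child, and then exec the maintainer script, so that any getcwd() in the script sees a directory inside the chroot.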
cjwatsonI suppose maintainer scripts hardly ever rely on relative paths18:25
mdzif they relied on a relative path outside of the chroot, they'd be darn buggy :-)18:26
cjwatsonthey can't rely on relative paths *at all* or they'd break here18:27
cjwatsonbut of course nothing relies on relative paths because dpkg doesn't chdir :)18:27
cjwatsonnot for running maintscripts anyway18:28
cjwatsonI think I'll stop crashing my laptop now18:28
mdzpatched dpkg works18:30
cjwatsonwoo18:30
BenCmdz: sweet18:30
BenCI'll look at this from the kernel side still18:31
BenCit's definitely a bug no matter how we work around it18:31
cjwatsonindeed18:31
cjwatsonso looks like all the fs stuff was a red herring :)18:31
mdztested with --root=/mnt --purge gparted jfsutils lupin-casper ubiquity-casper ubiquity casper18:31
BenCnow that alpha3 isn't blocking on the kernel, I can fix some coffee :)18:31
BenCcjwatson: yeah...yay for indirection and abstraction18:32
cjwatsonBenC: is it relevant that the out: path of sys_getcwd does the puts in FIFO order rather than LIFO?18:32
cjwatsonpath_get(&pwd); path_get(&root); ... out: path_put(&pwd); path_put(&root);18:32
mdzcjwatson: http://people.ubuntu.com/~mdz/251223/dpkg.debdiff18:34
cjwatsonmdz: looks good, nice touch for reusing an existing string18:35
BenCI'll investigate that side as well18:37
mdzcjwatson: will you do a build and confirm that it fixes things for you?18:37
mdzI'll fire up ubiquity and do a full test18:37
cjwatsonwell, I was going to confirm it by firing up ubiquity and doing a full test :)18:38
mdzhow funny. X won't start in kvm anymore18:41
evandamd64 or i386?18:42
cjwatsonmdz: double-check that you remembered -m? :)18:42
evandheh18:42
mdzcjwatson: that's what I thought, too, but -m 512 still doesn't work18:44
mdzevand: amd6418:44
mdzwhich init script is it which switches usplash back to PULSATE at some random point in the boot sequence?18:46
BenCmdz: I noticed that at some point too...went from progress to pulsate a couple times during boot up18:48
BenCPretty sure it only happened on livecd18:48
evandah, perhaps you're running into this: https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/251480 - though I can't imagine why it would suddenly trigger when it was working before.18:48
BenCIIRC, it was a hard 8.04.1 CD18:48
BenC*hardy18:48
mdzevand: except that i386 guests break as well now18:51
cjwatsontest install with new dpkg running18:52
evandah18:55
mdzI'm leaving it to you, kvm hates me now18:55
mdzit survived only long enough to test dpkg manually18:55
cjwatson51%18:56
BenCcjwatson: using aufs?19:02
cjwatsonBenC: yes19:02
cjwatsonevand: could you commit the 1.9.7 release?19:02
cjwatson(ubiquity)19:02
evandupload it?  sure.19:05
cjwatsonI thought you had already uploaded it19:05
cjwatsonactually, I meant push, not commit19:05
evandnope, not yet19:05
evandshould I?19:06
cjwatsonevand: oh, I'm sorry, I thought I'd seen a release commit, but I hadn't, so ignore me19:06
evandok19:06
cjwatsonI thought it was just one of those committed-but-not-pushed things19:06
evandyou did for 1.9.6 :), I was in a rush to fix a build failure.19:06
cjwatsonare you on intrepid now then? :)19:07
cjwatsonI'd been holding off updating the autotools since you weren't19:07
BenCI'm really hoping this kernel bug exists upstream19:07
evandindeed I am19:08
cjwatsonmdz: success19:16
cjwatsonSHIP IT19:16
evandhahaha19:17
mdzcjwatson: uploaded19:19
BenCmdz: excellent work on getting that workaround19:20
BenCI'm booting a stock kernel now to see if I can reproduce it19:20
BenCcjwatson: a few things changed in sys_getcwd() since hardy, but not the ordering of the put's19:21
BenCbut it may just be one of those things that has gotten exposed through other changes19:21
BenCso I'll still give it a try19:21
BenCthe VFS did have some locking changes, and quite a few API changes since hardy, so I suspect this is an upstream issue19:22
mdzBenC: thanks, felt good to get my hands a little dirty19:24
CIA-12oem-config: cjwatson * r492 oem-config/ (d-i/manifest debian/changelog):19:25
CIA-12oem-config: Automatic update of included source packages: console-setup 1.25ubuntu2,19:25
CIA-12oem-config: localechooser 2.03ubuntu1, user-setup 1.20ubuntu3.19:25
BenCmdz: you won't be the next to leave management for engineering, will you? ;)19:28
CIA-12oem-config: cjwatson * r493 oem-config/debian/changelog: releasing version 1.4319:40
pgraner9820:00
mario_limonciellevand, just wanted to see how long are you sticking around for? both tue and wed?20:00
evandmario_limonciell: just tried calling you back ;) .  Correct, my flight back is the night of the 30th.20:01
mario_limonciellevand, yeah i've got poor reception in doors here, but also have a hard time getting to IRC lately20:01
mario_limonciellevand, okay.  i'll make sure you get a CC of the agenda we have together.20:01
mario_limonciellevand, do you have anything particularly you would like to bring up20:02
mario_limoncielland fit into a time slot for me to throw onto the agenda?20:02
evandWhere you stand with respect to automated installations, and anything you need from us there.20:03
mario_limonciellokay.  that should probably be fine clumped into the installer timeslot as it stands20:03
evandindeed20:03
mario_limonciellokay thx20:03
CIA-12oem-config: cjwatson * r494 oem-config/debian/ (changelog oem-config.dirs):20:09
CIA-12oem-config: Create /var/lib/localechooser directory, otherwise localechooser20:09
CIA-12oem-config: completely breaks.20:09
CIA-12oem-config: cjwatson * r495 oem-config/ (configure configure.ac): bump to 1.4420:10
BenCthat sucks...it's not an upstream problem...stock kernel doesn't BUG() out even after letting this run a few thousand times20:11
CIA-12oem-config: cjwatson * r496 oem-config/debian/changelog: releasing version 1.4420:11
mdzBenC: any luck tracking down the bug?21:18
BenCmdz: it's definitely apparmor changes to the VFS21:19
BenCthere's some patches to d_path (and thusly sys_getcwd's usage of it)21:19
mdzBenC: have you sent it to apparmor upstream?21:19
BenCand they even talk about lazy unmounts21:19
BenCmdz: not yet, tracking it down a bit more21:20
BenCaha!21:26
BenCI think I found the bug21:26
BenCThey had the same patch in hardy, but slightly different...in hardy sys_getcwd passed fail_on_deleted to __d_path() but in the intrepid patches it doesn't21:30
BenCand that's from upstream svn (not something we goofed up)21:30
BenCHopefully this recompile shows it fixes the bug21:30
kirklandis this the best channel for a usplash question?21:45
cjwatsonkirkland: not really, -devel is fine22:10
uditI have a question regarding apt-ftparchive.....so the way I understand it, it takes a pool as input and generates the packages index file in the dist folder. But the pool is only divided along components (main, universe etc.)... how does apt-ftparchive know the distribution (hardy, feisty etc.) to generate the packages file in their respective subloders under dist ?22:37
udit*subloders =subfolders22:37
mdzcjwatson: confirmed successful installation using dpkg from the archive22:46
andare_devoi have dvdshrink installed but i don't know how to use it cuz it's installed by the terminal.  can anyone help me out?22:50