[12:31] <kylem> BenC, feel free to git pull hera.kernel.org:/pub/scm/linux/kernel/git/kyle/ubuntu-hppa.git branch master.
[12:31] <kylem> i've reverted the async scsi stuff which should help.
[02:09] <Keybuk> BenC: around?
[03:06] <zul> BenC: http://70.29.57.2/ubuntu/xen/xen-naming.patch it only touches the xen bits
[03:13] <BenC> Keybuk: yeah
[03:13] <BenC> zul: ok
[03:14] <zul> sweeeet...ill upload a new one tomorrow
[03:16] <BenC> zul: I'll upload kernel-package
[03:16] <BenC> I need my changes in it
[03:16] <zul> ok sounds good
[03:17] <zul> im just rewriting the rules file so it uses kernel-package
[04:00] <Keybuk> BenC: can we get linux/linkage.h back? :)
[04:00] <Keybuk> http://librarian.launchpad.net/3507234/buildlog_ubuntu-edgy-i386.sysklogd_1.4.1-18ubuntu2_FAILEDTOBUILD.txt.gz
[04:23] <fabbione> morning
[04:47] <fabbione> BenC: ping?
[05:55] <BenC> Keybuk: ok
[05:55] <BenC> fabbione: hey
[06:15] <fabbione> BenC: yo
[06:15] <fabbione> BenC: do you feel like looking into the cluster stuff or is it too late for you?
[07:27] <BenC> fabbione: sorry, too much going on for the past hour or so, and I'm beat
[07:29] <fabbione> BenC: no problem.. 
[12:35] <pmjdebruijn> http://paste.ubuntu-nl.org/18511
[12:36] <pmjdebruijn> do I have to be root?
[02:07] <fabbione> BenC: when you wake up can you please pull from my edgy branch?
[02:07] <fabbione> BenC: i have 2/3 fixes for GFS and GFS2
[02:07] <fabbione> we are only missing one from upstream now that i have no idea what to do about
[02:07] <fabbione> and if you can manage to upload before you leave for holidays i can get the rh-c-s binaries to build too
[02:13] <zul> hey
[02:15] <zul> BenC: so when are you going to upload a new kernel-package?
[02:16] <fabbione> i hope he is going to do everything today before he goes in holidays
[02:16] <fabbione> otherwise sparc is borked for headers and so are other arches :)
[02:16] <fabbione> and packages
[02:16] <fabbione> and ...
[02:16] <fabbione> a..
[02:16] <fabbione> ..
[02:16] <fabbione> .
[02:16] <zul> when does his holidays start?
[02:17] <fabbione> end of his friday i guess
[02:19] <zul> oh..
[04:45] <BenC> fabbione: actually, I'll be around till Sunday afternoon
[04:45] <BenC> but I am planning on an upload tonight
[04:45] <fabbione> BenC: well it's weekend :)
[04:45] <zul> no it isnt :)
[04:46] <BenC> kernel-package + grub + linux
[04:46] <fabbione> BenC: btw the OOM killer doesn't work. i just found out by mistake
[04:46] <BenC> not for me yet :)
[04:59] <Keybuk> just trying to find some documentation on the suspend format
[04:59] <Kamion> so, the swsusp format appears to be:
[05:00] <Kamion> PAGE_SIZE with 10 bytes at the end reading "S1SUSPEND\0"
[05:00] <Kamion> then a struct swsusp_info
[05:00] <Keybuk> it replaces the entire swap partition?
[05:00] <Kamion> think so, S1SUSPEND goes in the same place as SWAP-SPACE or SWAPSPACE2 AFAICT
[05:01] <Kamion> see power/swsusp.c:mark_swapfiles
[05:01] <Kamion> now, a struct swsusp_info is { struct new_utsname; u32; unsigned long; int; unsigned long; unsigned long }
[05:01] <fabbione> BenC, Keybuk: well something is stored in md sb as i pasted on -devel
[05:02] <Keybuk> fabbione: can we not have this conversation right now, please
[05:02] <Kamion> the corresponding bit of a swap partition is { unsigned int version; unsigned int last_page; unsigned int nr_badpages; unsigned char uuid[16]; char volume_name[16]; ... }
[05:03] <Keybuk> right
[05:03] <Keybuk> the swap partition is at the start of the page
[05:03] <Keybuk> uh
[05:03] <Keybuk> swap partition header
[05:03] <Keybuk> and the SWAPSPACE2 bit is at the end
[05:04] <Kamion> the start of a swap partition is a char[1024] to allow room for boot images
[05:04] <Kamion> the SWAPSPACE2 is right at the end of that
[05:04] <Keybuk> right
[05:04] <Kamion> and the other fields like uuid and stuff immediately follow
[05:04] <Keybuk> uh, no
[05:04] <Keybuk> wrong
[05:04] <Keybuk> the start of a swap partition is a char[1024] 
[05:04] <Keybuk> THEN the header struct
[05:05] <Keybuk> the SWAPSPACE2 is definitely at the very end of the page
[05:05] <Keybuk> it reads 0+PAGE_SIZE_10
[05:05] <Keybuk> 0+PAGE_SIZE-10
[05:05] <Keybuk> even
[05:05] <Kamion> oh, sorry, you're right
[05:05] <Kamion> page size confusion on my part
[05:05] <Keybuk> volume_name appears to be the "label"
[05:06] <Keybuk> and uuid is obviously the UUID
[05:06] <Kamion> yes, it is
[05:06] <Keybuk> (though mkswap doesn't have an option to set the UUID)
[05:06] <Kamion> it generates it automatically
[05:06] <Keybuk> right
[05:06] <Keybuk> how come the installer didn't do that?  custom mkswap?
[05:06] <Kamion> busybox mkswap, yes
[05:06] <Kamion> I'm pulling the uuid changes into that
[05:06] <Keybuk> d-i is more NIH than I am :)
[05:06] <Kamion> :)
[05:06] <thom> Keybuk: keep dreaming
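The swap header layout described above can be sketched as a pair of shell helpers. This is only an illustration, not code from the discussion: the function names are made up, and the offsets assume the three leading header fields (version, last_page, nr_badpages) are 4 bytes each, which puts uuid at 1036 and volume_name at 1052.

```shell
# Assumed layout: char[1024] boot area, then three 4-byte fields,
# then uuid[16] and volume_name[16].
SWAP_UUID_OFFSET=1036    # 1024 + 3 * 4
SWAP_LABEL_OFFSET=1052   # 1036 + 16

# Print the 16 raw UUID bytes as 8-4-4-4-12 lowercase hex.
swap_read_uuid() {
    dd if="$1" bs=1 skip="$SWAP_UUID_OFFSET" count=16 2>/dev/null |
        od -An -v -tx1 | tr -d ' \n' |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/'
    echo
}

# Print the label, dropping its NUL padding.
swap_read_label() {
    dd if="$1" bs=1 skip="$SWAP_LABEL_OFFSET" count=16 2>/dev/null | tr -d '\000'
    echo
}
```

Both helpers only read, so they can be pointed at a partition or at a scratch copy of its first page.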
[05:07] <Keybuk> ok
[05:07] <Keybuk> now the swsusp header goes at the END of the page too, by the looks of it
[05:07] <fabbione> Keybuk: sure we can wait after stuff is broken and data lost.. no problem at all
[05:07] <Keybuk> it has a reserved block on the front of PAGE_SIZE - 20 - sizeof(swp_entry_t)
[05:08] <Keybuk> fabbione: dude, if you're not going to be helpful and go find out whether the RAID stuff actually does hardcode sd* names in the block somewhere, please be quiet
[05:08] <fabbione> Keybuk: i told you before what is hardcoded. major and minors
[05:08] <Kamion> he said on #ubuntu-devel that it hardcodes major numbers in the block
[05:09] <Keybuk> fabbione: ok, then it's broken
[05:09] <Kamion> still, it might be helpful to tackle one problem at a time
[05:09] <Keybuk> and probably broken today, in fact
[05:09] <Keybuk> explains a few bugs
[05:09] <Keybuk> back to swap
[05:09] <fabbione> Keybuk: well then the transition need to take that into account. You can't just break stuff like this
[05:10] <Keybuk> fabbione: dude, it's already broken
[05:10] <Keybuk> dapper won't guarantee ordering of multiple scsi devices
[05:10] <Keybuk> neither did breezy
[05:10] <Keybuk> this is us trying to FIX that
[05:11] <fabbione> ok don't break my raid.. i don't care about others.
[05:12] <zul> heh
[05:14] <Keybuk> static struct swsusp_header {
[05:14] <Keybuk>         char reserved[PAGE_SIZE - 20 - sizeof(swp_entry_t)];
[05:14] <Keybuk>         swp_entry_t image;
[05:14] <Keybuk>         char    orig_sig[10];
[05:14] <Keybuk>         char    sig[10];
[05:14] <Keybuk> } __attribute__((packed, aligned(PAGE_SIZE))) swsusp_header;
[05:14] <Keybuk> Kamion: ^ so it appears that the swsusp header contains a copy of the original swp_entry_t
[05:15] <Keybuk> so a swsusp first page looks like "Blah blah blah ... swp_entry_t SWAPSPACE2 S1SUSPEND"
[05:15] <Kamion> oh yes, good point, I hadn't got round to figuring out what orig_sig was
[05:15] <Kamion> yeah
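Going by the struct above, the last 10 bytes of the first page hold sig and the 10 bytes before that hold orig_sig. A rough sketch of telling a plain swap partition from a suspend image on that basis follows. The function names are hypothetical, and PAGE_SIZE defaults to 4096 as an assumption that holds on i386 and amd64 but not everywhere.

```shell
PAGE_SIZE=${PAGE_SIZE:-4096}   # assumption: 4 KiB pages

# Print the signature stored in the last 10 bytes of the first page.
first_page_sig() {
    dd if="$1" bs=1 skip=$((PAGE_SIZE - 10)) count=10 2>/dev/null | tr -d '\000'
    echo
}

# For a suspend image, print the saved swap signature (orig_sig),
# which sits in the 10 bytes just before sig.
saved_swap_sig() {
    dd if="$1" bs=1 skip=$((PAGE_SIZE - 20)) count=10 2>/dev/null | tr -d '\000'
    echo
}

# Print "swap", "suspend" or "other" depending on the signature found.
classify_first_page() {
    case "$(first_page_sig "$1")" in
        SWAPSPACE2|SWAP-SPACE) echo swap ;;
        S1SUSPEND)             echo suspend ;;
        *)                     echo other ;;
    esac
}
```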
[05:15] <BenC> ok, messing with swap on a running system turns out to be a bad idea
[05:16] <Keybuk> I'm trying to find the resume code
[05:16] <Keybuk> to see how it puts it back
[05:18] <Kamion> BenC: how ba?
[05:18] <Kamion> d
[05:19] <BenC> Kamion: crash with a fsck bad
[05:19] <Kamion> yum
[05:19] <Keybuk> BenC: what did you do?
[05:20] <Keybuk> Kamion: dunno about you, but I haven't found yet where it restores the original swap header
[05:20] <mjg59> Kamion: swsusp_check
[05:20] <mjg59> Oh, sorry, no, that's only in the failure case
[05:21] <mjg59> No, wait, that's right
[05:21] <mjg59>                if (!memcmp(SWSUSP_SIG, swsusp_header.sig, 10)) { memcpy(swsusp_header.sig, swsusp_header.orig_sig, 10); /* Reset swap signature now */ error = bio_write_page(0, &swsusp_header);
[05:21] <mjg59> Grah, irssi hate
[05:21] <Kamion> unintuitive naming, but right
[05:21] <mjg59> The original signature is in the swsusp header, so it just copies it back
[05:22] <Kamion> "Check for swsusp signature in the resume device <small>and munge it</small>"
[05:22] <Keybuk> this is confusing me ... these two bits of code seem to put the swp_entry_t in different places
[05:23] <Keybuk> oh, grr, swp_entry_t != swap_header
[05:23] <Keybuk> right
[05:23] <Keybuk> so swsusp does blat the header of the swap partition
[05:23] <Keybuk> GO SWSUSP!
[05:25] <Kamion> are you sure? it reads the existing contents first
[05:25] <Keybuk> it does?
[05:25] <Kamion> I think experimentation might be better than reading the code here
[05:25] <Keybuk> tbh, I think we just need to test this
[05:25] <Kamion>                 if ((error = bio_read_page(0, &swsusp_header)))
[05:25] <Kamion>                         return error;
[05:25] <Kamion>                 if (!memcmp(SWSUSP_SIG, swsusp_header.sig, 10)) {
[05:25] <Keybuk> mjg59: care to guinea pig for us?  you're the one man in the universe whose laptop can suspend and resume <g>
[05:26] <Kamion> can I suspend/resume in vmware, I wonder?
[05:26] <mjg59> Wait, what's the problem you're working on here?
[05:26] <BenC> Kamion: that was just "mkswap -L `uuidgen`", which probably isn't what we want anyway
[05:26] <Keybuk> mjg59: finding swap devices by UUID
[05:26] <Keybuk> in particular, making sure that the UUID in the swap partition isn't destroyed by suspending
[05:27] <Kamion> BenC: oh you completely rewrote the swap then - that will have zeroed the existing contents and stuff
[05:27] <Kamion> I was thinking more of just dd'ing in a UUID
[05:27] <mjg59> Ah, right
[05:27] <mjg59> Can't right now. Or even this weekend, really...
[05:27] <Kamion> I'm doing a fresh install in vmware at the moment; if swsusp will work there then I'll try it
[05:28] <BenC> Kamion: from what I remember, it works
[05:28] <mjg59> There's no reason for swsusp to blat the uuid, so yeah, fixing it so it doesn't should be fine
[05:28] <Kamion> mjg59: however, if it does in the current kernel, we're already screwed
[05:29] <Kamion> since this is for dapper->edgy upgrades
[05:29] <Kamion> we're pretty much stuck with whatever the dapper kernel does
[05:30] <fabbione> well you can still mount / by uuid and write these info somewhere on the filesystem (for the swap) perhaps?
[05:30] <fabbione> something like:
[05:30] <fabbione> hash the last 4K of the swap
[05:30] <fabbione> reboot
[05:30] <fabbione> check for swap partitions
[05:30] <fabbione> rehash to figure where the partition is gone
[05:30] <fabbione> take action?
[05:31] <mjg59> Oh, the risk that they'll upgrade then suspend and lose the uuid?
[05:32] <fabbione> wouldn't that happen even booting on another linux distro that does mkswap at boot?
[05:32] <Kamion> fabbione: yeah, sounds possible, but if we can do it all directly on upgrade then that's obviously better
[05:32] <Kamion> fabbione: do those exist?
[05:34] <fabbione> Kamion: i don't know.. i haven't used any distro != Debian || Ubuntu for the last 4 years
[05:34] <mjg59> Sharing swap isn't a supported configuration, is it?
[05:34] <Kamion> I don't think there's much we can do about that case
[05:34] <mjg59> Given that we suspend into it
[05:34] <fabbione> i think we could use partition roundings
[05:34] <fabbione> there is always some space in between partitions
[05:34] <Kamion> mjg59: there exist people who don't suspend to disk :-)
[05:35] <Kamion> for them, it will work apart from this theoretical problem
[05:35] <fabbione> we might be able to reuse it
[05:35] <fabbione> like the block immediately after the end of the swap
[05:35] <fabbione> i doubt it's the start of the next partition (needs to be checked)
[05:35] <Kamion> I'm much more comfortable sticking to the UUID if we can
[05:35] <fabbione> if that space is really empty, we can make use of it
[05:35] <Kamion> it's simpler and supported by existing tools
[05:36] <Kamion> and the failure mode with swap is not as bad as the failure mode with other partitions
[05:36] <mjg59> Kamion: Sure, but it's enabled by default. The supported install configuration is incompatible with shared swap.
[05:36] <fabbione> Kamion: yes so am I but i am considering alternatives
[05:36] <Kamion> if it fails, people don't have swap, big deal, they can fix it once they've booted
[05:36] <Kamion> mjg59: only if you *actually suspend*
[05:36] <Kamion> which desktop users will to a good first approximation never do
[05:37] <Kamion> mjg59: yes, I agree that for laptop users it can't be supported
[05:37] <Kamion> but we've never explicitly advertised it as unsupported either, and for desktop users it will always have worked well
[05:37] <Keybuk> echo -n "LABEL" | dd conv=notrunc of=SWAP obs=1 seek=1052
[05:37] <Keybuk> for the adventurous :p
[05:37] <Kamion> label, not uuid?
[05:38] <Keybuk> uuid is harder to encode for "testing whether one can mangle an active swap partition" purposes
[05:38] <Keybuk> if you want to write the uuid, just write to 1036 instead
[05:41] <mjg59> Kamion: I think if there's a feature that's enabled by default that will destroy data in certain configurations, either (a) that feature shouldn't be enabled or (b) that configuration shouldn't be supported
[05:41] <Kamion> what does "be supported" mean though?
[05:41] <mjg59> "If you do this and stuff breaks, it's your problem not ours"
[05:41] <Kamion> in practice, since we've never advertised it as unsupported, it appears to mean "we will feel free to fuck your system without prior warning if you didn't realise you couldn't do that"
[05:41] <Kamion> which is a bit harsh
[05:42] <mjg59> Well, in this case the system fucking would be limited to your swap potentially vanishing
[05:42] <Kamion> right, so it's not too bad - but I'm just very very uncomfortable with dismissing problems as "unsupported" when we never warned people about them
[05:43] <mjg59> I think when we're faced with a choice of "effectively impossible" or "may irritate someone who ought to be smart enough to fix it", we're effectively obliged to choose the latter and document it
[05:43] <mjg59> Doing impossible things is outside our remit, no matter how happy it may make users :)
[05:46] <Kamion> right, just saying that the "and document it" is not optional, and ideally needs to be in-their-face during the upgrade
[05:47] <Keybuk> uuidgen | perl -ne 's/-//g;print "%c",hex %1 while(/(..)/g)' | dd conv=notrunc of=/dev/hda5 obs=1 seek=1036
[05:48] <Kamion> s/print/printf
[05:48] <Keybuk> yes, sorry, mis-typed into IRC
[05:48] <Kamion> and s/%1/\$1/
[05:49] <Keybuk> that appears to survive a swapoff/swapon
[05:53] <Keybuk> and it survives a reboot
[05:54] <Keybuk> ok, so where we have an active (or inactive) swap partition without a UUID, we can modify it to include one without disturbing the running system
[05:54] <Keybuk> that check should include an "is a swap partition" check, obviously
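A minimal sketch of that in-place write, with the "is a swap partition" check folded in. This is not the migration code itself: the function names and return codes are invented for illustration, PAGE_SIZE is assumed to be 4096, and the sketch additionally refuses to touch a header whose UUID bytes are not all zero.

```shell
PAGE_SIZE=${PAGE_SIZE:-4096}   # assumption: 4 KiB pages
SWAP_UUID_OFFSET=1036

# True if the first page ends with the v1 swap signature.
is_v1_swap() {
    sig=$(dd if="$1" bs=1 skip=$((PAGE_SIZE - 10)) count=10 2>/dev/null | tr -d '\000')
    [ "$sig" = "SWAPSPACE2" ]
}

# Print the 16 UUID bytes as 32 hex digits.
uuid_hex() {
    dd if="$1" bs=1 skip="$SWAP_UUID_OFFSET" count=16 2>/dev/null |
        od -An -v -tx1 | tr -d ' \n'
}

# swap_write_uuid TARGET UUID
# Returns 1 if TARGET is not v1 swap, 2 if UUID is malformed,
# 3 if a UUID is already present.
swap_write_uuid() {
    target=$1
    hex=$(printf '%s' "$2" | tr -d '-' | tr 'A-F' 'a-f')
    case "$hex" in
        *[!0-9a-f]*) return 2 ;;
    esac
    [ "${#hex}" -eq 32 ] || return 2
    is_v1_swap "$target" || return 1
    [ "$(uuid_hex "$target")" = "00000000000000000000000000000000" ] || return 3
    # Turn each hex pair into an octal escape that printf expands to one byte.
    fmt=
    while [ -n "$hex" ]; do
        rest=${hex#??}
        byte=${hex%"$rest"}
        hex=$rest
        fmt="$fmt\\$(printf '%03o' "0x$byte")"
    done
    # notrunc leaves the rest of the partition untouched.
    printf "$fmt" | dd of="$target" bs=1 seek="$SWAP_UUID_OFFSET" conv=notrunc 2>/dev/null
}
```

Trying it on a scratch file before pointing it at something like /dev/hda5 is the safer order of operations.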
[05:54] <Keybuk> that leaves us with
[05:54] <Keybuk> - v0 swap partitions  (no UUID)
[05:54] <Keybuk> - partitions with a resume image
[05:55] <Keybuk> what do we do about those?
[05:55] <mjg59> They haven't been supported since 2.2, have they?
[05:56] <mjg59> v0 swap partitions, that is
[05:57] <Keybuk> dunno, the documentation just implies that they were the only version supported before 2.2
[05:57] <Keybuk> right
[05:57] <Keybuk> they're not supported ;)
[05:57] <Keybuk> you can't swapon a v0 partition
[05:58] <Keybuk> ok, so that just leaves us with partitions with a resume image
[05:58] <Keybuk> (and just thought, how do we find the resume image? :p)
[05:58] <mjg59> Is there space in the header for a UUID?
[05:58] <mjg59> Ha. That's a fun one.
[05:58] <Keybuk> mjg59: which header?
[05:58] <mjg59> The swsusp header.
[05:58] <Keybuk> if swsusp includes the original swap header, we could just give resume= the smarts to find that
[05:59] <mjg59> We need to deal with the situation where resume fails and swsusp doesn't redo the header
[05:59] <Keybuk> btw, any particular reason resume= is necessary?  couldn't one just iterate /proc/partitions and look for a S1SUSPEND partition?
[05:59] <mjg59> No, because you might have multiple OSs installed
[05:59] <Keybuk> does swsusp do that often?
[05:59] <Keybuk> that'd leave them with dead swap anyway, no?
[05:59] <mjg59> swap wouldn't necessarily be shared
[06:00] <mjg59> If you don't mount filesystems between them, that's a perfectly reasonable configuration
[06:00] <Keybuk> so the choice is what to do when we find them
[06:02] <mjg59> Check whether they have the correct uuid and resume them?
[06:05] <Keybuk> right, I mean at upgrade time
[06:06] <Keybuk> someone upgrades from dapper to edgy, and we find that something listed in /etc/fstab as a swap partition to be mounted on boot is not active and contains a suspend image
[06:06] <Keybuk> I guess this is easy, we write a UUID into the right bit of the image
[06:06] <mjg59> They're fucked anyway, presumably?
[06:06] <Keybuk> and adjust the resume= so whenever that does get resumed, it will work
[06:06] <mjg59> Or do we support booting edgy with a dapper kernel?
[06:07] <Keybuk> we don't "support" it
[06:07] <Keybuk> I don't see any reason it wouldn't boot though
[06:07] <mjg59> Ok
[06:07] <Keybuk> dapper->edgy isn't that large a jump
[06:07] <Keybuk> Kamion: do you concur?
[06:07] <mjg59> dapper kernel won't write a uuid into the suspended image
[06:07] <Keybuk> mjg59: ?
[06:08] <mjg59> I don't quite understand the situation you're suggesting
[06:08] <mjg59> They're on dapper. They upgrade to edgy. They suspend?
[06:08] <mjg59> Also, we don't have a resume=
[06:08] <mjg59> That information lives in the initramfs
[06:10] <Keybuk> no
[06:10] <Keybuk> they're on dapper
[06:10] <Keybuk> they have a swap partition listed in /etc/fstab
[06:10] <Keybuk> that swap partition is not active, and contains an S1SUSPEND image
[06:10] <Keybuk> they upgrade to edgy
[06:10] <Keybuk> during the upgrade, while still in dapper
[06:10] <mjg59> Whose suspended image is it?
[06:11] <mjg59> Does it belong to the dapper system?
[06:11] <Keybuk> can we tell that?
[06:11] <mjg59> I don't /think/ you can ever get into that situation
[06:11] <BenC> if it's in /etc/fstab, IMO, it belongs to dapper
[06:11] <Keybuk> mjg59: failed resume?
[06:12] <mjg59> If it's a dapper image (a) it should have been mkswapped during resume, and (b) if it's resumed now it'll cause massive filesystem corruption
[06:12] <BenC> Keybuk: if so, then the failure is too late for recovery
[06:12] <Keybuk> mjg59: mkswapped during resume?
[06:12] <mjg59> Keybuk: The failed resume, sorry
[06:12] <Keybuk> do you mean "mkswap run" or do you mean "the original swap header restored" ?
[06:12] <mjg59> Though it's possible that that bug never got fixed
[06:12] <mjg59> Failed resume should result in the suspended image being turned back into swap
[06:13] <Keybuk> right, but _NOT_ using mkswap, right?
[06:13] <mjg59> We may use mkswap at the moment
[06:13] <Keybuk> do you know where I can find that out?
[06:13] <mjg59> Though I think the kernel actually fixes it
[06:13] <mjg59> Actually, yes
[06:13] <mjg59> swsusp_check will rewrite it
[06:13] <Keybuk> ok, I can't find mkswap anywhere in the initramfs or boot scripts
[06:13] <mjg59> So original swap header restored, I believe
[06:14] <Keybuk> ok
[06:14] <Keybuk> so we found a swap partition in /etc/fstab that contains a resume image
[06:14] <Keybuk> what do we do?
[06:14] <BenC> mkswap
[06:14] <mjg59> Yes
[06:15] <BenC> once they boot ignoring that resume image, the resume image is useless
[06:15] <mjg59> Allowing that image to be restored would be actively dangerous
[06:15] <BenC> exactly
[06:16] <Keybuk> ok, that's fine
[06:16] <Keybuk> now, we do need to deal with resume=
[06:16] <mjg59> We don't use resume=
[06:16] <Keybuk> we don't?  how do we resume from hibernate?
[06:16] <Keybuk> context change, btw
[06:16] <Keybuk> we've upgraded the system, it's now running edgy
[06:16] <Keybuk> they want to suspend and resume
[06:17] <Keybuk> resume=/dev/hda5 ain't gonna work <g>
[06:17] <mjg59> We have a RESUME= statement in /etc/initramfs-tools/conf.d/resume
[06:17] <mjg59> Initramfs generates a major and minor from that and echoes them into /sys/power/resume
[06:17] <Keybuk> right
[06:17] <Keybuk> so we need to extend initramfs to support RESUME/resume=UUID=...
[06:18] <Keybuk> and have it iterate /proc/partitions, look for a hibernate image on a swap partition with the given UUID
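The lookup just described could look roughly like the following. It is a sketch under a loud assumption: that the UUID bytes at offset 1036 are still present once swsusp has replaced the signature with S1SUSPEND, which had not been tested at this point in the discussion. The function names are hypothetical, PAGE_SIZE is assumed to be 4096, and the table and device directory are parameters so the logic can be tried against scratch files.

```shell
PAGE_SIZE=${PAGE_SIZE:-4096}   # assumption: 4 KiB pages

# List candidate device paths from a /proc/partitions-style table.
# $1: table file (default /proc/partitions), $2: device directory (default /dev)
list_partitions() {
    awk -v dir="${2:-/dev}" 'NR > 1 && NF == 4 { print dir "/" $4 }' "${1:-/proc/partitions}"
}

# True if the first page ends with the swsusp signature.
has_suspend_image() {
    sig=$(dd if="$1" bs=1 skip=$((PAGE_SIZE - 10)) count=10 2>/dev/null | tr -d '\000')
    [ "$sig" = "S1SUSPEND" ]
}

# Print the 16 bytes at the swap UUID offset as 32 hex digits.
uuid_hex_at_1036() {
    dd if="$1" bs=1 skip=1036 count=16 2>/dev/null | od -An -v -tx1 | tr -d ' \n'
}

# find_resume_device UUID [TABLE] [DEVDIR]
# Print the first partition holding a suspend image with the given UUID.
find_resume_device() {
    want=$(printf '%s' "$1" | tr -d '-' | tr 'A-F' 'a-f')
    for dev in $(list_partitions "$2" "$3"); do
        [ -r "$dev" ] || continue
        if has_suspend_image "$dev" && [ "$(uuid_hex_at_1036 "$dev")" = "$want" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}
```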
[06:18] <mjg59> Please don't use "rescue=", that's a separate codepath
[06:18] <Keybuk> eh?
[06:18] <mjg59> Uh, "resume="
[06:18] <mjg59> Mentioning it confuses the situation
[06:18] <Keybuk> resume= looks like the same code path to me
[06:18] <Keybuk> export resume=${RESUME}
[06:18] <mjg59> Indirectly
[06:18] <Keybuk>         resume=*)
[06:18] <Keybuk>                 resume=${x#resume=}
[06:18] <Keybuk>                 ;;
[06:19] <Keybuk> RESUME is otherwise entirely unmentioned in the initramfs code
[06:19] <mjg59> The traditional semantics for "resume=" is that it's parsed by the kernel
[06:19] <Keybuk> is it still parsed by the kernel?
[06:19] <mjg59> In our case, possibly not, but we never write any configuration that uses it
[06:19] <Keybuk> right
[06:20] <Keybuk> does RESUME=UUID=... sound insane?
[06:20] <mjg59> No, that's fine
[06:20] <mjg59> We change /etc/initramfs-tools/conf.d/resume
[06:20] <mjg59> Ideally before and after point to the same partition
[06:20] <mjg59> Then it just needs a small amount of work in initramfs-tools
[06:20] <Keybuk> "before and after point" ?
[06:21] <mjg59> The partition pointed at before the rewrite should be the same as the one pointed at afterwards
[06:21] <Keybuk> right
[06:21] <BenC> that's ideally what we're shooting for :)
[06:21] <Keybuk> the difference is before it'd be /dev/?d?? but after would be UUID=...
[06:21] <mjg59> Yes
[06:22] <Keybuk> with the advantage that after, it'll work even if the disk jumped from hda5 to sdb5 :p
[06:22] <BenC> you wouldn't want to just do /dev/disk/by-uuid/* ?
[06:22] <BenC> less code
[06:22] <mjg59> Keybuk: Not quite
[06:22] <Keybuk> BenC: /dev/disk/by-uuid doesn't (yet) know about extracting a UUID from a suspended image
[06:22] <BenC> actually, no code changes if you do that
[06:23] <Keybuk> BenC: we use UUID=* elsewhere for consistency
[06:23] <BenC> Keybuk: it's different than a swap image?
[06:23] <Keybuk> BenC: dunno yet
[06:23] <Keybuk> still waiting for Kamion to come back from vmware
[06:23] <mjg59> Keybuk: It won't handle that if the change happens over a suspend/resume cycle
[06:23] <BenC> ok
[06:23] <mjg59> But that's a massively pathological case, so
[06:23] <Keybuk> mjg59: can the change happen over a suspend/resume cycle?
[06:23] <mjg59> If they move disks around, yes
[06:24] <mjg59> It'll deal fine with the drivers/ide -> libata conversion
[06:24] <BenC> if they resumed before upgrade, chances are they did it from a dapper kernel
[06:24] <Keybuk> uh, doesn't this change mean they *CAN* move disks around between a suspend and resume, and have everything just work?
[06:24] <BenC> if they suspend again before rebooting to edgy, then that would break
[06:24] <BenC> I think
[06:24] <Keybuk> because things are mounted by UUID, the physical location or kernel-assigned name of the disk won't matter
[06:24] <mjg59> Keybuk: No, because kernel restores with old hardware knowledge
[06:25] <BenC> is there any way we can disable suspend functionality until they reboot after this change?
[06:25] <mjg59> So you'll resume, and the kernel will think "/ is on the first disk connected to this controller" when in fact it's now on the second
[06:25] <Keybuk> mjg59: have you had any cases of that so far?
[06:25] <mjg59> Keybuk: Right now it won't even attempt to resume. With this change it'll attempt to resume and then blow up
[06:26] <Keybuk> depends how much they changed
[06:26] <Keybuk> tbh, I suspect this is a "WELL DON'T DO THAT THEN!"
[06:26] <mjg59> You can't change hardware config and expect a resume to work
[06:26] <mjg59> (sadly)
[06:26] <Keybuk> though people could get heavily bitten if the resume boot is subtly different to the suspend boot
[06:26] <Keybuk> ie. your raid controller takes an extra second to start up
[06:26] <mjg59> There's a standard for a BIOS flag that gets set if the BIOS detects a changed config
[06:26] <mjg59> We should probably check that
[06:26] <Keybuk> so your scsi controller wins
[06:27] <mjg59> Oh, that's fine
[06:27] <Keybuk> but I suspect that's fine, actually
[06:27] <Keybuk> because your "first disk in this controller" info is still valid
[06:27] <mjg59> As long as the hardware is the same, enumeration order is unimportant
[06:27] <Keybuk> right
[06:27] <Keybuk> this is, in fact, only if you add new duplicate controllers of a given type
[06:27] <mjg59> The entire running kernel state gets overwritten by the old one
[06:27] <Keybuk> or swap the drives around on a particular controller
[06:27] <mjg59> Or move PCI slots
[06:27] <Keybuk> I guess
[06:28] <Keybuk> but that's stupid
[06:28] <mjg59> If the device ID changes, things explode
[06:28] <Keybuk> I'm trying to make sure that sunspots can't break it
[06:28] <mjg59> But PCI device enumeration isn't dependent on startup speed - disk enumeration order may be
[06:28] <Keybuk> ie. udev loading modules in a different order
[06:28] <BenC> I think changing hardware configuration in the middle of an upgrade should just be strictly forbidden :)
[06:28] <mjg59> Keybuk: Anyway, I need to pack
[06:28] <Keybuk> pack?
[06:28] <Keybuk> for?
[06:29] <mjg59> Keybuk: Are you coming up this evening, or will I see you tomorrow?
[06:29] <Keybuk> I may be up this evening
[06:29] <Keybuk> depends when David turns up
[06:29] <mjg59> Ok
[06:29] <mjg59> We're leaving in about an hour
[06:29] <Keybuk> cool
[06:29] <Keybuk> BenC: does vmware work on an amd64?
[06:30] <BenC> does on mine
[06:30] <Keybuk> ok, let me install it for testing too
[06:30] <Keybuk> about time I did
[06:31] <Keybuk> I'm bored of screwing up chroots <g>
[06:31] <kylem> BenC, http://www.kernel.org/git/?p=linux/kernel/git/kyle/ubuntu-hppa.git;a=summary <- please look to make sure i cleaned up tulip properly. 
[06:31] <BenC> kylem: ok
[06:31] <Keybuk> BenC: right, if I modify vol_id as necessary to support extracting the UUID out of a swap partition containing a S1SUSPEND image ...
[06:31] <BenC> kylem: btw, I think I narrowed a500 boot failure down to something with CONFIG_GSC being enabled
[06:32] <Keybuk> (this is a patch upstream will accept with a "sick... but YES!")
[06:32] <Keybuk> then we can treat UUID=* as an alias for /dev/disk/by-uuid/*
[06:32] <Keybuk> and could do RESUME=LABEL=SWAP with the same code <g>
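The aliasing idea is small enough to sketch directly. The function name is made up for illustration and this is not the initramfs-tools code; it only maps the spec to a path and leaves it to udev to have populated /dev/disk/by-uuid and /dev/disk/by-label.

```shell
# Map a RESUME-style spec to a device path.
#   UUID=xxx  -> /dev/disk/by-uuid/xxx
#   LABEL=xxx -> /dev/disk/by-label/xxx
#   anything else is passed through unchanged (e.g. /dev/hda5)
resolve_dev_spec() {
    case "$1" in
        UUID=*)  printf '%s\n' "/dev/disk/by-uuid/${1#UUID=}" ;;
        LABEL=*) printf '%s\n' "/dev/disk/by-label/${1#LABEL=}" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}
```

For example, `resolve_dev_spec LABEL=SWAP` prints `/dev/disk/by-label/SWAP`, and a plain `/dev/hda5` comes back untouched, so existing configurations keep working.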
[06:32] <kylem> BenC, ooh. neat.
[06:32] <kylem> BenC, i'll be able to work sunday, still at OLS this week.
[06:33] <kylem> sounds like it could be that CONFIG_GSC is trying to register a console first.
[06:33] <BenC> which console should a500 be using, PDC, right?
[06:34] <BenC> with GSC, it was using SERIAL_MUX or something like that
[06:35] <Keybuk> ouch
[06:35] <Keybuk> LILO!
[06:36] <Keybuk> yes, it can
[06:46] <Keybuk> I'm still unsure where the hell this migration code should go
[06:48] <Keybuk> it may be true that udev is the right place for it :-/
[06:49] <Keybuk> we almost need a sane-linux-system package
[06:49] <Keybuk> that the kernel depends on
[06:58] <BenC> linux-image-kdump_2.6.17-5.15_amd64.deb
[06:58] <BenC> yummy
[07:03] <zul> sweet
[07:20] <zul> BenC: thanks for the kernel-package upload
[07:20] <BenC> np
[07:21] <zul> now i can finish my rewrite
[07:28] <Kamion> Keybuk: right, sorry, I was away for a while
[07:28] <Kamion> I concur with the above plan, I think, as long as tests work
[07:29] <Kamion> Keybuk: I have the partman-target change (it's tiny), but am holding off on it a while until I have new d-i with the fixed busybox mkswap
[07:29] <Kamion> I'm writing out just "# /dev/hda1" or whatever above each UUID= line
[07:47] <Keybuk> http://people.ubuntu.com/~scott/convert.txt
[07:47] <Keybuk> Kamion: ^ could you take a read through that and check I'm not insane
[07:58] <BenC> Keybuk: you don't want to make /dev/* be /dev/[hs]d[a-z]*?
[07:59] <Kamion> Keybuk: will have to be tomorrow I think, sorry
[07:59] <BenC> Keybuk: there are some off block storage devices majors that might match, and you probably don't want to mess with them
[07:59] <Kamion> there's stuff like /dev/scd0 which is shoved in by default
[07:59] <Kamion> if you have a scsi cdrom during installation
[07:59] <Kamion> anyway, guests, gone
[08:00] <BenC> probably just /dev/[hs]d* would be good
[08:00] <BenC> there's one other issue I'm not sure if it was covered
[08:00] <BenC> the issue of a whole block device being used as a partition
[08:01] <BenC> does by-uuid stuff actually list those?
[08:13] <fabbione> Keybuk: i think you want to be careful about ocfs2|gfs|gfs2
[08:13] <fabbione> it is a common misconception that they are network fs
[08:13] <fabbione> they are indeed local fs
[08:13] <fabbione> but they require net to do cluster operations
[08:13] <fabbione> uhe
[08:13] <fabbione> ops
[08:13] <fabbione> skip "uhe"
[08:13] <fabbione> :)
[08:14] <fabbione> i have stuff like /dev/sdXY mounted as ocfs2/gfs2/gfs
[08:14] <fabbione> or hda for the matter
[08:18] <Keybuk> BenC: should do
[08:19] <Keybuk> fabbione: we can work those out as we go ... safer not to migrate than migrate incorrectly
[08:20] <fabbione> Keybuk: yes agreed. *IF* i am no father by monday we can try them together and see what's needed
[08:20] <BenC> Keybuk: would something like "/dev/* matches, and vol_id says it has a UUID, then convert" be safe?
[08:21] <Keybuk> BenC: that's what that script does, no?
[08:21] <Keybuk> I figured that if vol_id thinks it has a uuid, it means we can mount using UUID=, so it's not going to break anything by converting it
[08:21] <BenC> Keybuk: I meant without all the catches for fstypes and such
[08:22] <Keybuk> BenC: possibly ... I didn't want to accidentally find the UUID of a CD in the drive, and hard-code that ;)
[08:22] <Keybuk> all the tmpfs/network fs stuff is probably not necessary
[08:22] <BenC> cd's are noauto, right?
[08:22] <BenC> and so are fd's?
[08:22] <Keybuk> actually ... let's change that -e to a -b
[08:23] <Keybuk> and then just check for auto/noauto
[08:23] <BenC> one last build/boot test, and the kernel is getting uploaded
[08:32] <Keybuk> http://people.ubuntu.com/~scott/convert.txt
[08:32] <Keybuk> ^ ok, so that doesn't check filesystem types
[08:32] <Keybuk> instead it's converted if it's a _block device_ in /dev, it's not auto fstype or noauto options, and it has a uuid (or we can generate one)
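The rule as stated can be sketched without seeing convert.txt itself, so the following is a guess at its shape rather than the script's contents. The function names are hypothetical. The block-device test (`[ -b "$dev" ]`) and the UUID lookup are left to the caller, since they need real devices. The output format follows the "# /dev/hda1" comment convention mentioned earlier for the installer.

```shell
# Decide from the fstab fields alone whether an entry is a candidate.
# The caller is still expected to check [ -b "$dev" ] and find a UUID.
fstab_entry_is_candidate() {
    dev=$1
    fstype=$2
    opts=$3
    case "$dev" in
        /dev/*) ;;
        *) return 1 ;;
    esac
    [ "$fstype" != "auto" ] || return 1
    # Wrap in commas so "noauto" only matches as a whole option.
    case ",$opts," in
        *,noauto,*) return 1 ;;
    esac
    return 0
}

# emit_converted_entry DEV UUID MOUNTPOINT FSTYPE OPTS DUMP PASS
# Print a comment naming the old device, then the UUID= line.
emit_converted_entry() {
    dev=$1
    uuid=$2
    shift 2
    printf '# %s\n' "$dev"
    printf 'UUID=%s' "$uuid"
    printf ' %s' "$@"
    printf '\n'
}
```

Skipping an entry is always the safe outcome here: an unconverted line keeps behaving exactly as it did before the upgrade.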
[08:38] <BenC> sweet
[08:40] <BenC> there's one other issue....some people have already been bitten by hd->sd problems with AHCI drivers...
[08:40] <BenC> if rootfs conversion fails, can it be cross-referenced with /proc/mounts and updated?
[08:41] <Keybuk> how do you mean?
[08:41] <BenC> the ahci drivers have caused some ppl to already get sdX instead of the hdX listed in /etc/fstab and grub
[08:42] <BenC> so if you check fstab and find /dev/hda1, and it doesn't exist (or worse, is pointing to something else like a CD on another IDE controller), then this gets messed up
[08:43] <BenC> so far it's only causing a few ppl to get long boot times waiting for the rootfs
[08:43] <BenC> and it's not something I can revert because then other people cannot boot at all (driver fails to attach to IDE controller)
[08:43] <BenC> this is the whole ata_ahci and ide ahci driver problem
[08:44] <Keybuk> if someone is messed up by this now, they can't be automatically converted
[08:47] <Keybuk> because by definition, we don't know how their filesystems used to be arranged
[08:48] <Keybuk> if /proc/mounts is right, it means the user has already gone through and fixed their fstab :p
[08:48] <BenC> not true
[08:48] <Keybuk> why not true?
[08:48] <BenC> initramfs will find their /dev/sda1 and mount it as the rootfs
[08:48] <Keybuk> no it won't
[08:48] <BenC> even though /etc/fstab says /dev/hda1
[08:48] <Keybuk> initramfs will be looking for their /dev/hda1
[08:48] <Keybuk> so they'll get a PANIC
[08:49] <BenC> from what I understand it will eventually mount the /dev/sda1
[08:49] <Keybuk> it will eventually mount UUID=... after the conversion
[08:49] <BenC> but there wont be a UUID= if fstab says /dev/hda1 when one doesn't exist
[08:49] <Keybuk> if somebody in dapper is using an ahci, and runs this conversion during the upgrade to edgy, and then reboots ... all will be fine
[08:49] <BenC> dapper is the problem though
[08:50] <Keybuk> if dapper has /dev/sdX instead of /dev/hdX, their machine won't boot
[08:50] <Keybuk> not without them manually changing their grub options and fstab to /dev/sdX
[08:50] <Keybuk> initramfs doesn't magically try /dev/sda1 if /dev/hda1 doesn't work
[08:51] <BenC> maybe I'm misunderstanding the problem I saw then
[08:51] <BenC> could be the user eventually updated the needed files
[08:51] <Keybuk> yeah
[08:51] <Keybuk> I suspect you have a user failing to mention they already changed stuff in the bug report <g>
[08:52] <Keybuk> we have a few situations in dapper where we would have liked to have had mount-by-uuid
[09:46] <kylem> BenC, i think you found the bug.
[10:10] <makx> Keybuk: your convert script doesn't take care of /dev/cciss and /dev/ida
[10:17] <Keybuk> makx: that's deliberate ... do you know whether either of those have UUIDs? :)