[09:10] smoser: ping. Need some ephemeral image debugging help when you get in please. For some reason it's not configuring DNS right even though IP-Config lists correct info (can't resolve ports.ubuntu.com later). I can't reproduce by booting the machine into a normal installation. Is there a way to get a shell onto the system via the console and/or stop it shutting down?
[10:25] Daviey: got a moment to talk about how we run dhcpd?
[10:31] jtv: Hmm, i can try.. kinda knee deep atm
[10:31] Thanks. It's just that /etc/default/isc-dhcp-server seems to be the only place where we can set which interfaces it should serve on.
[10:31] rbasak: are you using an apt proxy?
[10:32] And so my only option to set that _if I stay within MAAS code_ is to rewrite that file, which is all sorts of ugly.
[10:32] Daviey: no. I've got a bit further though. Looks like cloud-init is not writing a resolv.conf at all. Some conflict with resolvconf and the nature of my image perhaps.
[10:32] jtv: Well, that is the proper way... but you can supply your own upstart job with a config, if that makes more sense
[10:32] rbasak: resolvconf awesomeness !
[10:34] Daviey: I'm not sure it'd be good to have two upstart jobs for the same daemon side by side. Otherwise, tempting.
[10:35] Obviously I want to avoid just appending endless lines... is there a standard reusable way to rewrite a single line with some kind of I-wrote-this marker comment in front?
[10:37] augeas is the standard answer. Not sure how well adopted that is in Ubuntu. And it's probably overkill
[10:37] Don't know what that is, but the name certainly carries an air of overkill!
[10:37] It's a generic mechanism to understand different file formats and change them programmatically and losslessly
[10:37] rbasak: yeah, we never really quite got into the augeas vibe
[10:38] Something else then?
[10:38] It's a relatively simple problem, but obviously not unique.
[10:38] I would use a comment as a marker and combine that with sed
[10:38] I think that's the usually done thing
[10:38] jtv: Well... having /etc/default/isc-dhcp-server - ENABLED=False.. then writing a /etc/maas/isc-dhcp-server is another option, no?
[10:38] Not _particularly_ nice to have the comment on the same line, but...
[10:39] But then we're back to writing that file. :)
[10:39] Usually it's the immediately preceding line
[10:39] Which notes that it's an automatic thing and not to mess with the line below
[10:39] * jtv has not mastered inter-line editing in sed
[10:39] well, ## Begin MAAS entries\n .. ### End MAAS entries ?
[10:39] A comment like that is exactly what I had in mind though.
[10:40] I thought sed got horribly involved when you had to edit across lines.
[10:40] It's not too bad.
[10:40] smoser would fight for awk here :)
[10:40] If you use Daviey's start and end markers, then:
[10:40] hmm, maybe too much to do in irc
[10:40] You may have a point
[10:40] * jtv thought so :)
[10:41] It's not _very_ much code but given how evil corruption in /etc would be, I'd much prefer something that's already in widespread use.
[10:42] Failing that, I can just write some python myself.
[10:42] It's against debian policy for packages to change other packages' conffiles
[10:42] isn't it?
[10:42] How would that work for maas packaging then?
[10:43] I think the separate upstart job would be cleaner and more policy-compliant
[10:43] But if it requires editing /etc/default/isc-dhcp-server to disable the original upstart job, it doesn't make things much better.
[10:44] Luckily the dhcp server installs disabled
[10:44] Ideally we'd have some kind of conf.d hook here.
[10:44] Oh, yes, of course it does!
[10:44] *facepalm*
[10:44] Oh dear. But that's in the config, innit?
[10:44] And we write to the config.
[10:44] Have an entirely separate config. /etc/maas/dhcp
[10:45] Yeah, that's what springs to mind but it's a lot of weight to carry around.
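The marker-comment approach discussed above (a "Begin MAAS entries" line, the managed lines, an "End MAAS entries" line) can be sketched in Python, which jtv mentions as the fallback. This is a minimal illustration only: the marker strings and the function names `replace_marked_block` and `rewrite_file` are hypothetical, not MAAS code. Because corruption in /etc is the main worry raised above, the sketch writes to a temporary file in the same directory and renames it over the original. It also refuses to touch a file that has a begin marker without a matching end marker.

```python
import os
import shutil
import tempfile

# Hypothetical marker lines, modelled on the "Begin/End MAAS entries" idea.
BEGIN_MARKER = "## Begin MAAS entries"
END_MARKER = "## End MAAS entries"


def replace_marked_block(text, new_lines):
    """Return `text` with the marker-delimited block set to `new_lines`.

    The block is appended on first use and replaced in place afterwards,
    so repeated runs never pile up duplicate lines.
    """
    lines = text.splitlines()
    block = [BEGIN_MARKER, *new_lines, END_MARKER]
    if BEGIN_MARKER not in lines:
        return "\n".join(lines + block) + "\n"
    start = lines.index(BEGIN_MARKER)
    try:
        end = lines.index(END_MARKER, start + 1)
    except ValueError:
        # A dangling begin marker means someone edited the file by hand;
        # refuse to guess rather than risk eating their lines.
        raise ValueError("begin marker without matching end marker") from None
    return "\n".join(lines[:start] + block + lines[end + 1:]) + "\n"


def rewrite_file(path, new_lines):
    """Apply replace_marked_block to `path` via a temp file and atomic rename."""
    with open(path) as f:
        updated = replace_marked_block(f.read(), new_lines)
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(updated)
        shutil.copymode(path, tmp_path)  # mkstemp creates 0600; keep original mode
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Since /etc/default/isc-dhcp-server is sourced as shell, an `INTERFACES=` line inside the block would override an earlier one from the package. That is why appending the block once and then replacing it in place is enough. This does not address the conffile policy concern raised above.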
[10:45] The maas dhcp upstart job would fire up the daemon to use that, with separate pidfiles, lockfiles, leases files and everything
[10:45] It is
[10:46] Unfortunately there's no accepted mechanism for packages to stack up automatically in the way that maas needs to sit higher in the stack and manipulate dhcp
[10:46] So right now that's the only clean way to do it in packaging I think
[10:46] But who am I to say this anyway. I'm neither a Debian nor an Ubuntu anything
[10:47] And yet you cite debian policy.
[10:47] At least it's giving me a much better view of the problem, so I thank you for that.
[10:48] You're right about the conf.d hooks btw. That's the normal way of packages getting hooks into other packages
[10:50] I suppose /etc/default/isc-dhcp-server could have something like [ -d /etc/default/isc-dhcp-server.d ] && for f in $(run-parts --list /etc/default/isc-dhcp-server.d); do . $f; done
[10:51] That could go in as an Ubuntu delta to the dhcp package, and then maas would just need to manipulate /etc/default/isc-dhcp-server.d/maas
[10:51] I think that would be clean and policy compliant, but it would be unusual
[10:54] ^ that seems better actually
[10:56] But it'd have to be in Precise as well.
[10:58] Daviey: any chance of getting anything like that into Precise?
[10:58] As well as Quantal?
[10:59] jtv: well quantal is easy.. precise is so messed up with this massive change, what is more pain. :)
[10:59] *cough*
[10:59] Who might be able to do such a thing?
[11:02] jtv: I honestly can't see the massive change going into precise at the moment...
[11:02] that run-parts style thing, is /possible/
[11:02] Wait... which is the massive change then? You mean writing our own config is the impossible one?
[11:02] no, the massive change.. as in, pulling everything back :)
[11:03] I don't understand that expression.
[11:04] jtv: The plan to SRU trunk maas back to precise.. I have reservations about it being possible.
[11:04] Oh! That one.
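The run-parts hook proposed at 10:50 would source every fragment in /etc/default/isc-dhcp-server.d, in the order `run-parts --list` prints them, inside the same shell. Later assignments would therefore win. Below is a rough Python simulation of that behaviour, assuming the hook line sits at the end of the default file. The directory and the `maas` fragment name come from the log. The helper names are hypothetical. The parser only understands simple `VAR=value` lines, so it is an illustration rather than a shell interpreter.

```python
import os
import re
import shlex

# Default name filter of Debian's run-parts when neither --lsbsysinit nor
# --regex is given: ASCII letters, digits, underscores and hyphens only.
# This is why a leftover such as "maas.dpkg-old" would be skipped.
RUN_PARTS_NAME = re.compile(r"[A-Za-z0-9_-]+")


def run_parts_list(directory):
    """Approximate `run-parts --list DIR`: matching files in sorted order."""
    if not os.path.isdir(directory):
        return []  # mirrors the `[ -d ... ] &&` guard in the proposed hook
    names = sorted(
        name
        for name in os.listdir(directory)
        if RUN_PARTS_NAME.fullmatch(name)
        and os.path.isfile(os.path.join(directory, name))
    )
    return [os.path.join(directory, name) for name in names]


def effective_settings(base_file, fragment_dir):
    """Simulate sourcing `base_file` and then every fragment in one shell.

    Only simple VAR=value lines are understood; the real hook executes
    arbitrary shell, so this is an illustration, not a parser.
    """
    settings = {}
    for path in [base_file, *run_parts_list(fragment_dir)]:
        with open(path) as f:
            for raw in f:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                words = shlex.split(value)  # strips quotes: '"eth1"' -> ['eth1']
                # Later assignments win, as with repeated `. $f` in one shell.
                settings[key] = words[0] if words else ""
    return settings
```

With this layout, MAAS would own only its own fragment file. It would never need to edit the package's conffile.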
[11:04] Sorry, I had no idea you were talking about that.
[11:05] not saying it can
[11:05] Well, one thing at a time then: do you know how we might achieve a runparts extension on /etc/default/isc-dhcp-server?
[11:05] not saying it can't/won't be done.. i just have reservations
[11:06] jtv: Well, we won't achieve it until it's required.. does that make sense?
[11:06] ie, we won't update the precise version.. on the off-chance it might be needed in a few months.
[11:06] Does that make sense?
[11:06] Of course.
[11:08] The only other solution that's come up is to write our own upstart job and our own dhcpd config.
[11:12] It's a lot of weight to carry around, basically just to work around this relatively standard piece of config machinery not being there.
[11:22] jtv: right
[11:29] rvba: you wrote the dhcp config files, right?
[11:30] jtv: no, Julian did iirc.
[11:30] Ah
[11:31] Oh, maybe the problem you had was with the dns config.
[11:31] Can I help with something?
[11:31] The problem where you had to write into the config file.
[11:31] Yeah, that's with the DNS config file.
[11:31] Well, there's only one place where we can specify what interfaces dhcpd should service. And it's in /etc/default/, so we're not really supposed to change it.
[11:31] Alternatives include:
[11:32] * Doing it anyway.
[11:32] (that's what we did with the DNS stuff)
[11:32] * Getting a "runparts" into the packaged version of that file.
[11:32] * Creating our own upstart job, and using our own config that isn't in the usual /etc/dhcpd or wherever it normally is.
[11:33] (That might be a problem if you add AppArmor into the mix)
[11:33] Oh dear yes, there's that too.
[11:34] Let's hangout briefly if that's ok with you, the decisions we made for the DNS stuff are still fresh in my memory so maybe the two of us can figure out the best way to do this.
[11:34] jtv: ^
[11:34] Sure, thanks. I don't have much time, but I have some.
[11:34] I'm creating a hangout.
[11:35] rvba: inviting...
[11:35] https://plus.google.com/hangouts/_/c5f8c770e7c02a1703b226632493d84bb508ad53?authuser=0&hl=en-GB
[11:52] Daviey: looks like my only realistic option is to write the existing config file anyway. Thanks for sparring with me!
[11:52] * jtv is off
[11:53] \o/
[13:33] rbasak, what happened above?
[13:33] smoser: I got a bit further
[13:33] i'd prefer the separate upstart job myself.
[13:33] It seems that resolv.conf isn't being populated for some reason
[13:33] and i dont really think its a lot of additional things to carry. its 1 file.
[13:33] Oh
[13:33] oh. yeah, that too. the above comments were wrt dhcp
[13:33] but rbasak yeah, i suspect you have clock issues and dhcp :)
[13:34] The dhcp thing is between Daviey and jtv I guess
[13:34] or the eth0 naming bug we agreed existed.
[13:34] I'm not sure about clock. The rest of DHCP seems to work fine. It's just resolv.conf that's not being populated
[13:34] Could it be a race between eth0 and eth1?
[13:34] rbasak, where is your image?
[13:35] what does /etc/resolv.conf look like in the pristine image?
[13:35] It's a symlink
[13:35] because if you're using something older on 12.04, that was busted, and i'm not sure how it would react (although it seems to work fine on intel...)
[13:35] rbasak, ok. then i think i know.
[13:35] ah. i know.
[13:35] I can look up the target if you need
[13:35] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1031065
[13:36] Ubuntu bug 1031065 in cloud-init (Ubuntu) "cloud-init-nonet runs 'start networking' explicitly" [Medium,Confirmed]
[13:36] i suspect it is that.
[13:36] and you can verify by removing the 'start networking' line from cloud-init-nonet.
[13:36] its a race condition.
[13:36] interestingly, i came upon that bug when I was incorrectly not specifying 'ro' on the kernel command line
[13:37] Verify by removing it from where? In the image?
[13:37] so you might want to verify that you're doing that as it massages the race differently :)
[13:37] in the image, yes.
[13:37] so, you *should* boot with 'ro' on the kernel command line, and that *might* "fix" your problem here.
[13:37] What's the plan for the armhf image for precise in general?
[13:37] rbasak: can you give us a short summary of the dhcp issue?
[13:37] err
[13:37] sorry
[13:37] rvba:
[13:37] ^^
[13:38] rbasak, we'll get you one if you need one.
[13:38] I definitely need one
[13:38] i made a daily of the images available, and my plan was to get those down.
[13:38] Critical for MAAS on ARM - in time for 12.10
[13:38] but due to many blocking bugs in quantal maas, i've not been able to sufficiently test the images to move them to 'release'.
[13:38] i've verified they're good for precise usage.
[13:39] I believe MAAS trunk is essentially working now
[13:39] Not sure about the packaging
[13:39] roaksoax: the dhcp issue that Jeroen was having?
[13:44] rvba: yeah, did you guys reach consensus on how it should be fixed?
[13:44] roaksoax: we basically don't have a choice here: with support for a proper conf.d directory, the only solution is to write to the config file directly. Same as what we've done for the DNS config.
[13:45] s/with support/without support/
[13:46] roaksoax: can you think of another way to do this?
[13:46] rvba: so I asked the security team about this the other day
[13:46] rvba: and this is the response: "Perhaps it would be easier for MAAS to have a helper that spawns the
[13:46] dhcp and bind daemons itself? This would allow specifying different
[13:46] configuration files, in a location that is accessible by the maas
[13:47] rvba: which I believe is probably the best way to mess up with files, without messing up with files (if you know what i mean)
[13:47] We ruled that out because Apparmor restrict where, say, the named daemon can read its config file quite seriously.
[13:47] restricts*
[13:47] rvba: right, but we can always ship apparmor profiles
[13:47] Right, that would solve the Apparmor problem.
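The security team's suggestion, a MAAS helper that spawns dhcpd with its own configuration, boils down to starting a second dhcpd instance that points at MAAS-owned files. The log describes this as separate config, pid and leases files. Here is a minimal Python sketch. The `-4`, `-f`, `-q`, `-cf`, `-lf` and `-pf` options and the trailing interface list are standard ISC dhcpd command-line arguments. Every path below, and the helper names, are hypothetical. In practice those paths would also have to be allowed by whatever AppArmor profile ends up confining this instance.

```python
import os
import subprocess

# Hypothetical MAAS-owned locations; none of these come from the packaging.
MAAS_DHCPD_FILES = {
    "config": "/etc/maas/dhcpd.conf",
    "leases": "/var/lib/maas/dhcpd.leases",
    "pidfile": "/var/run/maas/dhcpd.pid",
}


def dhcpd_command(interfaces, files=MAAS_DHCPD_FILES):
    """Build the argv for a MAAS-private dhcpd instance (IPv4, foreground)."""
    if not interfaces:
        # With no interface arguments dhcpd serves every broadcast-capable
        # interface that is up, which is exactly what must be avoided here.
        raise ValueError("refusing to start dhcpd without explicit interfaces")
    return [
        "dhcpd", "-4", "-f", "-q",
        "-cf", files["config"],
        "-lf", files["leases"],
        "-pf", files["pidfile"],
        *interfaces,
    ]


def ensure_lease_file(path):
    """dhcpd refuses to start without an existing lease file, so create one."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "a").close()


def run_maas_dhcpd(interfaces, files=MAAS_DHCPD_FILES):
    """Start the private instance; -f keeps it in the foreground for supervision."""
    ensure_lease_file(files["leases"])
    return subprocess.Popen(dhcpd_command(interfaces, files))
```

An upstart job for this instance would essentially run the same argv. Keeping the interface list explicit is the whole point of the exercise, which is why the sketch refuses an empty one.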
[13:48] But this means adding conf.d support for the apparmor profiles.
[13:48] rvba: not really no
[13:48] Which, IIRC, isn't available yet.
[13:48] ah?
[13:48] rvba: it is the same as what happened with cobbler/maas-provision
[13:49] rvba: we just ship an apparmor profile which is also the same as the original one
[13:49] rvba: but we add the differences to it
[13:49] rvba: so we can just copy the dhcp apparmor profile and those two would live there
[13:50] ie. /etc/apparmor.d/usr.sbin.dhcpd and /etc/apparmor.d/usr.sbin.dhcpd-maas
[13:50] or similar
[13:50] and the -maas profile includes the files we are touching
[13:52] I can't say I know Apparmor very well but I suppose this means that the profiles are "cumulative"? I mean that we would have to provide a /etc/apparmor.d/usr.sbin.named-maas which would sort of "extend" the default /etc/apparmor.d/usr.sbin.named
[13:52] Is that right?
[13:55] rvba: I don't know actually if it would work that way
[13:55] rvba: we should ask jdstrand
[13:55] rvba: cause otherwise we ship the same exact profile, plus the location of the maas config files
[13:55] rvba: we will simply have 2 different profiles installed
[13:58] smoser: definitely booting with ro. I'll try dropping the "start networking".
[14:02] roaksoax: if the apparmor profile trick works, this would also mean duplicating all the startup scripts to control our custom dns/dhcp services. Is this really something we want to do?
[14:02] rvba: what do you mean by duplicating all the startup scripts? Isn't it just the upstart job?
[14:03] roaksoax: yeah, the upstart jobs.
[14:03] rvba: yeah, so only 2 upstart jobs
[14:03] 1 for DHCP and 1 for DNS
[14:04] but for now we can just try this out with DHCP
[14:10] rvba, i +1 roaksoax's solution. this is the right way especially for dhcp. we can probably find some way to get around bind, though.
[14:10] using groups i think.
[14:10] but that might be more trouble than its worth also
[15:45] roaksoax: Is python-txtftp published to a PPA somewhere?
[15:46] Or, where's it coming from for precise?
[15:46] allenap: for precise its on maas-trunk
[15:46] allenap: are you planning any fixes?
[15:46] roaksoax: Where's the maas-trunk PPA?
[15:48] allenap: err there's no precise package for python-txtftp
[15:48] we dont want it
[15:48] because we wont be able to have it in the archives
[15:49] allenap: python-txtftp is installed as part of maas
[15:51] roaksoax: debian/control says that python-django-maas depends on python-txtftp. How will that work on precise then?
[15:51] allenap: there's a different precise branch
[15:51] Ah, right.
[15:51] allenap: the precise packaging branch is stacked on top of the quantal one
[15:54] smoser: ping
[15:55] smoser: I've tried the workaround you suggested, but now cloud-init seems to fail completely. It doesn't even attempt apt-get update
[15:55] cloud-init-nonet gave up waiting for a network device.
[15:55] Then it lists eth0 as configured correctly, and eth1 as unconfigured this time
[15:56] What is cloud-init-nonet doing here, anyway?
[15:57] rbasak, cloud-init-nonet is ensuring that network comes up before cloud-init.conf runs
[15:57] * rbasak doesn't understand why this isn't called cloud-init-net
[15:58] rbasak, i can help if you can point me at something i can look at.
[15:58] but i'm almost certain you're seeing the bug that i pointed you at.
[15:58] smoser: I've tried the workaround you suggested - I removed "start networking"
[15:58] combined with the other bug we discussed (which does not have a launchpad bug) of eth0 not being the device that was pxe booted from.
[15:58] Behaviour has changed now
[15:58] It definitely has booted off eth0
[16:00] smoser: is it significant that it lists eth1 _before_ eth0?
[16:01] rbasak, well are the macs right?
[16:01] did it boot off of eth0 ?
[16:01] Yes
[16:01] then its not significant
[16:02] smoser: am I right in thinking that this bug isn't arch specific, and not specific to having two interfaces either then?
[16:03] i dont think its specific to 2 nics.
[16:03] i dont think its arch specific
[16:03] if it is arch specific, it is not really because of arch, but rather because of some config somewhere that makes an assumption
[16:04] or because of a race condition
[16:04] I think this is release critical for maas
[16:04] that just happens because arm has different bottlenecks.
[16:04] well, yes, its clearly critical
[16:04] OK, where do we need to go from here?
[16:06] rbasak, other things stopped me from poking further at bug 1031065 after i found that the bug we were working around by adding 'start networking' was not fixed otherwise (ie, we still needed 'start networking' to correctly boot under lxc).
[16:06] Launchpad bug 1031065 in cloud-init (Ubuntu) "cloud-init-nonet runs 'start networking' explicitly" [Medium,Confirmed] https://launchpad.net/bugs/1031065
[16:06] i ran out of time the day i was poking at it and haven't been back.
[16:06] it is possible that if we debug that further, the root of the problem would also be the root of your issue.
[16:06] its also possible they're unrelated.
[16:07] I understand that you have too much to do in too little time. I didn't mean to imply that you were slacking! I'm just not sure how to proceed right now and this issue completely blocks me
[16:08] rbasak, i'm not complaining.
[16:08] rbasak, so there are 3 issues here.
[16:09] a.) 1031065 documents the fact that cloud-init should not have 'start networking' as it does. but removal of that breaks booting under lxc. we need to fix that.
[16:09] b.) your issue, which seems unrelated to me, but may be the core cause of a.
[16:10] c.) nothing in the ephemeral images is going to force 'eth0' to be "pxe booted interface". however we assume that.
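For issue (c), the ephemeral image could identify the PXE-booted NIC by MAC address instead of assuming eth0. This works whenever the kernel command line carries a `BOOTIF=` argument. pxelinux's IPAPPEND 2 adds one in the form `BOOTIF=01-aa-bb-cc-dd-ee-ff`, and MAAS could in principle add the same thing itself where the bootloader does not. Below is a minimal Python sketch that matches that MAC against `/sys/class/net/<iface>/address`. The function names are hypothetical. Keeping eth0 as the fallback just mirrors today's assumption; passing `fallback=None` gives the "do nothing" behaviour instead.

```python
import os
import re

# pxelinux's IPAPPEND 2 appends e.g. "BOOTIF=01-aa-bb-cc-dd-ee-ff":
# "01" is the ARP hardware type (Ethernet), followed by the MAC.
BOOTIF_RE = re.compile(r"^01-((?:[0-9a-f]{2}-){5}[0-9a-f]{2})$", re.IGNORECASE)


def bootif_mac(cmdline):
    """Return the MAC from a BOOTIF= kernel argument, or None if absent."""
    for arg in cmdline.split():
        if arg.startswith("BOOTIF="):
            match = BOOTIF_RE.match(arg[len("BOOTIF="):])
            if match:
                return match.group(1).replace("-", ":").lower()
    return None


def interface_for_mac(mac, sysfs_net="/sys/class/net"):
    """Find the interface whose hardware address is `mac`, if any."""
    for name in sorted(os.listdir(sysfs_net)):
        try:
            with open(os.path.join(sysfs_net, name, "address")) as f:
                if f.read().strip().lower() == mac:
                    return name
        except OSError:
            continue  # entries without a readable address file are skipped
    return None


def boot_interface(cmdline, sysfs_net="/sys/class/net", fallback="eth0"):
    """Pick the PXE-booted NIC, falling back to the old eth0 assumption."""
    mac = bootif_mac(cmdline)
    name = interface_for_mac(mac, sysfs_net) if mac else None
    return name or fallback
```

On a real node this would be called as `boot_interface(open("/proc/cmdline").read())`.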
[16:10] i dont have a bug opened for 'c', but i'd call that critical too
[16:10] As an aside, there's no support for ipappend in U-Boot right now
[16:11] rbasak, right. so i'm not sure how we'd solve that for arm, but the solution is possible for intel
[16:11] I want to propose that rather than getting that through, we instead have the MAAS dynamic TFTP server just supply the MAC address of the node it is responding for in the kernel command line, if that will work
[16:11] rbasak, the tftp server is an IP application, no?
[16:12] would it necessarily (without arp hackery) know the mac of the client?
[16:12] and if it could figure that out, i suspect it would still break any case where the client was not on the same network
[16:12] I think arp hackery may be needed to make that work, but I think that would be cleaner than ipappend
[16:12] I'd like to assume that the tftp server is on the same network if I am permitted to do that
[16:13] It seems unclean to me for the node to boot and re-dhcp and then assume that the pxelinux supplied IP is the same as the one on the correct interface that it dhcp'd
[16:13] Oh
[16:14] The TFTP server currently does know the mac of the client
[16:14] it's in the pxelinux.cfg/01-
[16:14] that it tried to fetch
[16:14] Only catch is that if I have it fall back to default for arch detection then that will break
[16:14] (without keeping some state which is horrible)
[16:16] tftp does work generically over ip
[16:16] so mac cannot actually be assumed i dont think. unless its part of the tftp protocol
[16:16] (ie, inside the packet)
[16:16] Oh wait - you're using ipappend 2?
[16:16] for the mac directly?
[16:17] rbasak, i believe we use http://www.syslinux.org/wiki/index.php/SYSLINUX#IPAPPEND_flag_val_.5BPXELINUX_only.5D
[16:17] at least previously cobbler used that, and the installer knows how to handle that.
[16:18] OK, just checked. It's ipappend 2 which adds bootif=
[16:19] so yeah, 2
[16:19] So a workaround for U-Boot would be to supply bootif= from maas if it knows it
[16:19] rbasak, yes. that would work.
[16:19] except where it is not known.
[16:19] yeah
[16:19] but, as you say, that might not be a requirement.
[16:19] like enlistment, without storing state from the previous miss :-/
[16:19] and, i'm not certain that its *not* in a tftp request
[16:20] although my argument about it being IP breaks that too
[16:20] Just checked. TFTP doesn't include it
[16:21] but it is supplied by pxelinux in a previous pxelinux.cfg/01- (what will be a) miss
[16:22] would you mind if we define a missing bootif to mean eth0?
[16:24] rbasak, ok. i just see what you were saying about 01-MAC now.
[16:24] rbasak, well we can just make the fall through case "do nothing"
[16:25] but "eth0" is completely arbitrary. i think we're to the point in the kernel now that upgrades probably order network adapter names on the same bus consistently
[16:26] but hard coding eth0 basically implies/enforces wiring in a specific way. which sucks. but we dont have a lot of other options.
[16:26] rbasak, how does pxe work?
[16:26] that's a bit of an open question!
[16:27] it dhcp's, uses that IP to then do a tftp i guess.
[16:27] Yes
[16:27] I think the NIC it uses is hardware-defined
[16:27] The first one on the case
[16:27] (probably)
[16:28] I've never seen/heard of a real server trying to PXE off a second nic, but I don't usually PXE them so I may be wrong there
[16:28] what is our tftp server?
[16:28] It's one that's now built into maas
[16:28] (some twisted thing)
[16:28] can you g+ really quick?
[16:28] Sure
[18:48] allenap: still around?
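The U-Boot workaround agreed above has the TFTP server pull the MAC out of the per-MAC `pxelinux.cfg/01-<mac>` request and hand it back as `BOOTIF=` on the kernel command line. Here is a minimal Python sketch of the string handling only. The function names are hypothetical, and the sketch says nothing about the API of the Twisted-based MAAS TFTP server. When the node falls back to `pxelinux.cfg/default` (enlistment, for example), no MAC is available without keeping state. The sketch then leaves the parameters untouched, which matches the "do nothing" fall-through suggested above.

```python
import re

# pxelinux first requests pxelinux.cfg/01-<mac>, e.g.
# "pxelinux.cfg/01-88-99-aa-bb-cc-dd", where "01" is the ARP hardware
# type for Ethernet. Any directory prefix in front is tolerated here.
PER_MAC_REQUEST = re.compile(
    r"(?:^|/)pxelinux\.cfg/01-((?:[0-9a-f]{2}-){5}[0-9a-f]{2})$", re.IGNORECASE
)


def mac_from_tftp_path(path):
    """Return the MAC encoded in a per-MAC pxelinux.cfg request, else None."""
    match = PER_MAC_REQUEST.search(path)
    return match.group(1).lower().replace("-", ":") if match else None


def bootif_argument(mac):
    """Format a MAC the way pxelinux's IPAPPEND 2 does: BOOTIF=01-aa-bb-..."""
    return "BOOTIF=01-" + mac.lower().replace(":", "-")


def kernel_params(base_params, mac=None):
    """Append BOOTIF= when the MAC is known; the fall-through does nothing."""
    return f"{base_params} {bootif_argument(mac)}" if mac else base_params
```

For example, a request for `pxelinux.cfg/01-88-99-aa-bb-cc-dd` yields the MAC `88:99:aa:bb:cc:dd`. Passing that MAC to `kernel_params("ro", ...)` gives `ro BOOTIF=01-88-99-aa-bb-cc-dd`, the same argument pxelinux would have appended.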