[00:55] <horatio> I'm having trouble with IPMI + commissioning. I get 3 instances of "could not open device at /dev/ipmi0 .. no such file or directory", followed by initscript ipmievd action "start" failed. I added a backdoor and ran the maas_ipmi_autodetect.py script, which works. Then I ran ipmitool with the credentials the Python script returns, and that works too.  And when I check, /dev/ipmi0 exists.
[00:56] <horatio> The IPMI commissioning failures look like they're blocking the reboot at the end of the script though.
[08:55] <jtv> Who's up for a pre-imp?
[08:56] <jtv> Because I've got to get some actual product improvement done this week, as opposed to specs and meetings, or I'll go mad.
[10:12] <jgrassler> Good morning.
[10:12] <jgrassler> jtv: I found out where yesterday's problem occurs.
[10:13] <jgrassler> local_host, local_port = get("local", (None, None))   # tftp.py, line 180
[10:14] <jgrassler> This retrieves the machine's IP address from Twisted, which is quite sensible in most cases, but not in mine.
[10:14] <jgrassler> (since it amounts to what you'd get from `ifconfig eth0`)
[10:18] <jgrassler> I fixed it in a rather messy manner by setting params['local'] manually in the next line, but that's not exactly clean.
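The messy local fix jgrassler describes might look roughly like this — a sketch only, with illustrative names (`get_boot_method_params` and `configured_local_ip` are not the actual pserv/tftp.py identifiers):

```python
# Sketch of the workaround: after the tftp backend asks Twisted for the
# transport's local address, override it with a configured value.
# All names here are illustrative, not the real MAAS pserv code.
def get_boot_method_params(get, configured_local_ip="10.0.0.1"):
    # Original behaviour (tftp.py line 180): take the address the
    # request arrived on -- wrong when NAT sits between node and cluster.
    local_host, local_port = get("local", (None, None))
    # Messy fix: clobber the detected address with a value templated
    # in by configuration management (e.g. puppet).
    local_host = configured_local_ip
    return {"local": (local_host, local_port)}
```

A cleaner variant would read the override from pserv.yaml, which is essentially what jgrassler proposes upstream in the next line.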
[10:19] <jgrassler> I might cobble up a patch that allows for configuring params['local'] in pserv.yaml - would that have a chance of getting accepted upstream?
[10:29] <dimitern> hey guys, just a heads-up; as discussed on the x-team call, I've filed a couple of bugs - https://bugs.launchpad.net/maas/+bug/1390404 and https://bugs.launchpad.net/maas/+bug/1390411
[10:35] <dimitern> and I've also commented on a few static ip addresses bugs I filed (mostly questions about whether what's left will be fixed in time for 1.7.1 or 1.7.2)
[11:03] <jtv> Hi jgrassler — thanks for digging that up!   Let me just digest the whole thing...
[11:07] <jtv> allenap, did you see jgrassler's note above?  Looks like the tftp code doesn't use MAAS_URL when parameterising a boot method, but "whatever my address is."
[11:08] <allenap> jtv, jgrassler: I need to remind myself what that code is meant to do.
[11:09]  * jtv mumbles his usual rant about documenting code
[11:10] <jtv> The "local" parameter goes into the PXE config as the iscsi address for the boot image.
[11:11] <jtv> Where do we serve iscsi now?
[11:11] <jtv> If it's the cluster controller, then MAAS_URL doesn't apply of course.
[11:11] <allenap> It should be the cluster controller.
[11:11] <jtv> Blast.
[11:12] <jtv> We have an abstraction for "address where nodes can find the region controller," but no equivalent for the cluster controller.
[11:13] <allenap> Well, here we discover it from the address on which the node has contacted the cluster controller.
[11:13] <allenap> That should be accurate.
[11:13] <allenap> Unless NAT or something gets in the way.
[11:13] <jtv> Which seems to be the case here.  :(
[11:14] <jtv> Maybe this should really be the cluster interface address.
[11:14] <allenap> The cluster can have multiple interfaces, right?
[11:14] <jtv> (Which, I know, just raises more wrinkles)
[11:14] <jtv> Yes, a single cluster can manage multiple subnets...
[11:16] <roaksoax> iscsi is on the cluster
[11:16] <roaksoax> jtv: we cannot bind the cluster to one single address
[11:16] <jtv> Yeah.
[11:16] <roaksoax> because the cluster manages different networks
[11:17] <allenap> I think supporting cluster controllers behind NAT is a can of worms at the end of a rabbit hole that I really don't want to get drawn in to. jgrassler, I think it's unlikely that we'll support it.
[11:18] <roaksoax> if we do NAT, then MAAS would have to inject rules for all the networks it knows about
[11:18] <jtv> In this case MAAS is not involved in managing the NAT (and couldn't be).
[11:19] <roaksoax> jtv: that doesn't mean that we won't :)
[11:19] <roaksoax> jtv: but that's not something we will be doing anytime soon
[11:19] <jtv> I'm guessing if only we could determine an appropriate cluster interface, the cluster interface's address would be the right one here.
[11:20] <roaksoax> jtv: we can't really know what's the right cluster interface
[11:20] <roaksoax> jtv: when it comes to nodes, the right cluster interface is the interface they are being managed from
[11:20] <allenap> jtv: I think MAAS would need to know its real address and the address that outsiders know it by, and relate the two.
[11:21] <jtv> I think we want "the cluster interface's address from a given node's point of view."
[11:21] <jtv> Which, yes, is a can of worms.  :(
[11:21] <roaksoax> jtv: right, and that we *can* know
[11:21] <roaksoax> jtv: with the NIC->network matching, we can know
[11:21] <jtv> Not in this case, I think.
[11:21] <jtv> (We're talking about a very specific scenario here)
[11:22] <roaksoax> jtv: well, i think we should discuss this in Austin
[11:22] <gmb> jtv, rvba, allenap: Branch needs review: https://code.launchpad.net/~gmb/maas/check-for-overlapping-cluster-networks/+merge/241061
[11:23] <jtv> roaksoax: you mean supporting NAT between the node and the cluster controller?  I'm just mulling it over in hopes of finding an easy solution, but if we don't find one, is it a use-case we want to support?
[11:34] <roaksoax> jtv: we might want to support doing NAT when both the cluster and region are on the same machine
[11:34] <jtv> That is the case here.
[11:41] <gmb> allenap: Thanks for the review. I've replied. After your comments, I think it's safer to just disallow overlaps altogether. Sound sane to you?
[11:44] <allenap> gmb: Sounds good to me, but I'd like rvba to take a look too. rvba, can you look at Graham's last diff comment on https://code.launchpad.net/~gmb/maas/check-for-overlapping-cluster-networks/+merge/241061?
[11:59] <rvba> allenap: sure
[12:08] <jtv> Python wishlist item: have "import" propagate indirect ImportErrors as a different type, so we can tell "I'm trying to import something that doesn't exist" from "I'm trying to import a file with a broken import in it."
[12:09] <jtv> (Because I'm tired of test runners reporting that my test doesn't exist just because my test contains an import error)
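The ambiguity jtv is complaining about can be demonstrated directly: both "the module doesn't exist" and "the module exists but its own imports are broken" raise the same ImportError class. (On Python 3 the exception's `.name` attribute does at least record which module actually failed — a partial answer to the wishlist item. Module names below are invented.)

```python
import importlib
import os
import sys
import tempfile

def failing_module(name):
    """Import `name`; on failure, report which module actually failed."""
    try:
        importlib.import_module(name)
        return None
    except ImportError as e:
        # Same ImportError class either way; on Python 3, e.name at
        # least records which module was actually missing.
        return e.name

# Case 1: the requested module itself doesn't exist.
case1 = failing_module("no_such_module_abc123")

# Case 2: the module exists but contains a broken import.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "broken_mod.py"), "w") as f:
    f.write("import definitely_not_installed_xyz\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
case2 = failing_module("broken_mod")

print(case1)  # no_such_module_abc123
print(case2)  # definitely_not_installed_xyz
```

A test runner that only catches ImportError (and ignores `.name`) cannot tell the two cases apart, which is how a test file with a broken import gets reported as "test doesn't exist".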
[12:14] <jgrassler> jtv, allenap: Sorry, I missed the discussion (lunch o' clock got in the way)
[12:15] <jtv> jgrassler: it's not good news I'm afraid — I hadn't realised yesterday that there'd be NAT between the cluster controller and the nodes.
[12:16] <jgrassler> I can relate to not wanting to support the oddball scenario we've got here - I'll just fix it locally by templating the address into tftp.py with puppet
[12:16] <jgrassler> It's ugly but it'll do for now
[12:18] <jgrassler> These floating IP addresses are a bit of a nuisance, unfortunately.
[12:18] <jgrassler> It's not the first time we've run afoul of the problem :-)
[12:31] <jtv> And this is one area where even in 1.7 you won't get IPv6.  :(
[12:33] <jgrassler> That'll be another can of worms at some point in the future...
[12:38] <gmb> rvba: I think you missed what allenap was asking about… See the final comment on the *original* diff (circa line 124)
[12:38] <gmb> (https://code.launchpad.net/~gmb/maas/check-for-overlapping-cluster-networks/+merge/241061)
[12:39] <gmb> rvba: He's spotted a problem with the assumption that different clusters can define the same networks. And I think he's right.
[12:43] <rvba> gmb: when we discussed about this yesterday, we were talking about overlapping networks *in the same cluster*.  I don't think it makes sense to have a node related to many clusters (not related in DB terms, I'm talking network here)
[12:46] <gmb> rvba: Right, so the point allenap is making then — that no two interfaces *anywhere* in a region's scope should have overlapping networks - makes sense. It's not that the node is related to two clusters, it's that two independent clusters can right now define interfaces with exactly the same network settings. Which looks fine on paper — they're not the *same* network on the physical level — but once you get to layer 3 and above, they're identical.
[12:54] <roaksoax> they could be yes
[12:54] <rvba> gmb: what I mean is that I don't see why we would have to enforce that in MAAS.  The only problem we could see was the IP allocation, and it only becomes a problem if a node is connected (network) to 2 clusters.
[12:55] <roaksoax> rvba: right, but nodes should not be connected to two clusters, should they?
[12:56] <rvba> roaksoax: yeah, that's my point.
[12:56] <rvba> roaksoax: but it's not something we enforce anywhere.
[13:03] <gmb> rvba: So, my concern is that we're leaving a potential footgun lying around for people if we allow them to do stuff like this. OTOH, you could do some NATing at the cluster level, so maybe it's not a big deal and we should let them. I'm happy with either solution, TBH.
[13:03] <gmb> And we probably shouldn't be telling network admins what to do.
[13:03] <gmb> Come to think of it :)
[13:04] <rvba> gmb: yeah, I think it's the admin's job to sort out the routing.  Unless letting them configure identical networks will break something in MAAS, I think we should let them do so.
[13:05] <gmb> rvba: Okay, I'm happy with that.
[13:10] <roaksoax> rvba: right, but that's not something we recommend either
[13:11] <rvba> roaksoax: probably not.  But I don't think we should forbid this (again, unless it breaks something in MAAS itself).
[13:13] <roaksoax> rvba: yeah, if someone does that it is their own fault
[13:16] <jesk> hi
[13:17] <jesk> trying to understand maas... having problems with it :-)
[13:18] <jesk> I'm not able to get information about the boot order process
[13:19] <jesk> what I could see so far was that a) server boots via PXE, b) server reboots again and boots via PXE (no idea why twice) c) shuts off
[13:20] <jesk> when trying to install something (I only tried juju quickstart) a) server boots and installs an image b) reboots and boots again from PXE
[13:20] <jesk> do I need to deactivate PXE manually or is that handled by MAAS?
[13:22] <gmb> allenap, rvba, jtv: 'Nother branch for all y'all. https://code.launchpad.net/~gmb/maas/fix-ipmi-wording-bug-1304518/+merge/241075
[13:22] <jtv> I'll take it.
[13:23] <allenap> jtv: I've done it.
[13:23] <jtv> Grrr
[13:23] <allenap> Sorry :)
[13:24] <jtv> Bikeshed derby is ON!
[13:24] <jesk> is there any real technical doc about MAAS?
[13:24] <jesk> or just cloud-style PowerPoint information :D
[13:26] <jtv> jesk: http://maas.ubuntu.com/docs1.5/
[13:28] <jesk> jtv: those docs don't explain what happens when you really want to deploy nodes
[13:29] <jtv> It's an old version...  more recent docs on maas.ubuntu.com may help.  Did you have anything specific in mind?
[13:29] <jesk> it's more like "type this and that"
[13:29] <jesk> jtv: I don't get the overall picture of it
[13:30] <jesk> jtv: a concrete use case
[13:31] <jesk> currently I only have one MAAS node and now I want to deploy more nodes. I'm not getting past the step of "booting a server from PXE", which shuts down after the PXE boot
[13:31] <jtv> Okay, so you're clearly beyond the part covered in the Orientation section.
[13:31] <jtv> Which is good.
[13:32] <jesk> even wake on lan works
[13:32] <jesk> "start node" -> node starts, boots from pxe -> and down again
[13:32] <jtv> Ah, wake-on-LAN is awkward because it has no way to shut down a node.
[13:32] <jtv> So you already commissioned and allocated the node?
[13:32] <jesk> yeah, but I'm happy with shutting down manually via iLO for the moment
[13:33] <jesk> yeah I did that, *but* I'm not able to work out what it even means :-)
[13:33] <jtv> Right.
[13:33] <jtv> I'm sure we documented this _somewhere_, but let me be lazy first and summarise.
[13:33] <jesk> one node is currently in "allocated to root"-state and one in "ready"-state
[13:34] <jtv> OK.  The "allocated to root" one should be either being installed, or up and installed with your system and your ssh key.
[13:34] <jtv> Or, crucially, it could be waiting for you to start it.
[13:34] <jesk> "allocated to root"-state most probably because of issuing "juju quickstart", which unfortunately ended in nothing
[13:34] <jtv> (This all gets much better in the 1.7 which we're currently in the process of releasing)
[13:35] <jtv> Oh, this was done through juju?
[13:35] <jesk> I'm not 100% sure if juju started a node through MAAS
[13:35] <jesk> but I could see via the remote console that a system was installed
[13:36] <jtv> Oh, an operating system was installed on that node?
[13:36] <jtv> That's good.
[13:36] <jesk> after the automatic reboot it was booted again via PXE and juju quickstart timed out
[13:36] <jtv> When you bootstrap a juju environment, it allocates a node for itself.
[13:37] <jtv> It asks MAAS to allocate a node, and when it gets a node, it tells MAAS to start the node up.
[13:37] <jtv> As the node starts up, it netboots off the MAAS server, and boots into an install image.
[13:37] <jtv> Thus it installs an OS, and the user's SSH keys.
[13:37] <jtv> Then it reboots into the OS that it just installed.
[13:37] <jtv> At this point the user (which I guess here is juju) has a working node.
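The allocate/start/install/reboot sequence jtv walks through could be sketched as a toy script against a fake MAAS API (class and method names here are invented for illustration — they are not the real gomaasapi or MAAS endpoints):

```python
class FakeMAAS:
    """Toy stand-in for the MAAS API (illustrative names only)."""
    user_ssh_keys = ["ssh-rsa AAAA... user@host"]

    def allocate(self):
        return FakeNode()

    def start(self, node):
        node.events.append("netboot")  # node PXE-boots off the MAAS server

class FakeNode:
    def __init__(self):
        self.events = []

    def install_os(self, ssh_keys):
        # The install image writes the OS plus the user's SSH keys.
        self.events.append("install-os")
        self.ssh_keys = list(ssh_keys)

    def reboot_to_local_disk(self):
        # Reboot into the freshly installed system.
        self.events.append("reboot-local")

def bootstrap(maas):
    # juju asks MAAS to allocate a node, then start it; the node
    # netboots into an install image, installs the OS and SSH keys,
    # and finally reboots into the installed system.
    node = maas.allocate()
    maas.start(node)
    node.install_os(ssh_keys=maas.user_ssh_keys)
    node.reboot_to_local_disk()
    return node

node = bootstrap(FakeMAAS())
print(node.events)  # ['netboot', 'install-os', 'reboot-local']
```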
[13:38] <jtv> I guess your node got installed, and then shut down... and did it come back up after that?
[13:38] <jesk> in my case (i believe) it rebooted from PXE again :-)
[13:39] <jesk> so the user has to make sure that PXE is turned off on the server when it finally reboots after the OS installation?
[13:40] <jesk> or can the PXE boot image check whether the server was started for normal operation after OS installation and boot from the local disk?
[13:41] <jesk> boot order is: (1) PXE (2) CD (3) HDD
[13:41] <jtv> Once the node is deployed (as this one seems to be), it will boot off its own disk.
[13:41] <jesk> to have the flexibility to always boot from PXE for new installations, PXE has to be (1)
[13:41] <jtv> So no need to change that order.
[13:42] <jesk> if it boots from disk directly, the boot order must always be (1) HDD first
[13:42] <jesk> but then I'm not able to boot from PXE if I want to
[13:42] <jtv> If the node tries to netboot while it's deployed, the MAAS server tells it to boot from local disk.
[13:42] <jesk> and the HDD always gets booted as soon as it has a valid boot record
[13:43] <jesk> ah! so via PXE it gets told to boot from local disk
[13:43] <jtv> Yeah.  No need to change that order: just always let it netboot.
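The behaviour jtv describes — the cluster controller answering a deployed node's netboot with "boot from local disk" — could be pictured like this, assuming pxelinux-style configs (the fragments and function below are illustrative, not MAAS's real templates):

```python
# Toy sketch: when a node netboots, the cluster controller serves a
# PXE config chosen by the node's state. A deployed node gets a
# LOCALBOOT config and falls through to its own disk; anything else
# gets an installer/ephemeral image. (Illustrative only.)

LOCALBOOT_CONFIG = "DEFAULT local\nLABEL local\n  LOCALBOOT 0\n"
INSTALLER_CONFIG = "DEFAULT install\nLABEL install\n  KERNEL installer/kernel\n"

def pxe_config_for(node_state):
    if node_state == "deployed":
        return LOCALBOOT_CONFIG
    return INSTALLER_CONFIG

print(pxe_config_for("deployed").splitlines()[0])  # DEFAULT local
```

This is why the boot order can stay PXE-first: the netboot itself is what redirects a deployed node to its local disk.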
[13:43] <jesk> interesting, but unfortunately that seemed not to work
[13:43] <jtv> Any symptoms?
[13:44] <jesk> it booted from PXE
[13:44] <jesk> saw this via console
[13:44] <jesk> but maybe "juju quickstart" doesn't tell MAAS to treat that installation as persistent and leaves it as a "newly installed node"?
[13:45] <jesk> so much magic :-)
[13:45] <jesk> I'm just a dumb network engineer playing with that stuff
[13:45] <jtv> So the node that was "allocated to root" booted from PXE?  What happened then?
[13:45] <jesk> (with a bit of linux and freebsd background)
[13:46] <jesk> it shut off after that
[13:46] <jtv> (Yes, far too much complexity — there's a lot less you can count on once you cross the boundaries between machines and between reboots)
[13:46] <jtv> It shut off...
[13:46] <jtv> That normally means it's not allocated.
[13:46] <jesk> ah, those servers shouldn't shut off after PXE boot?
[13:47] <jtv> Now, the situation as I understand it is that you have two nodes: #1 was deployed by Juju itself, and #2 is in the Ready state.
[13:48] <jtv> Servers will PXE-boot rather a lot... it depends on the situation.  *During deployment* there should be one reboot, from the install image into the installed system.
[13:48] <jesk> yes, but both off
[13:48] <jtv> Is it possible that the wake-on-LAN simply didn't come through?  Again, things are much better in 1.7, but in 1.5 the server just wouldn't notice.
[13:49] <jesk> wake on lan works; I don't get how a system is installed at all. What I saw is that servers boot two times from PXE, but don't install a full-blown OS
[13:50] <jesk> and no matter what I do they don't come up with a plain OS boot
[13:50] <jesk> they always boot something from PXE, which ends in a shutdown after that
[13:50] <jtv> I wonder if maybe you don't have the boot images you need...
[13:51] <jesk> so the goal is that MAAS would install the image I gave the node via the MAAS frontend
[13:51] <jtv> Well you wouldn't have to provide an image; MAAS downloads those by itself.
[13:51] <jesk> and it would install and boot it similar to an installation from CD
[13:52] <jesk> ending in a terminal prompt
[13:52] <jtv> Well, login prompt.  :)
[13:52] <jesk> yes :)
[13:52] <jesk> however it finds out charsets, language, time zone, disk partitions...
[13:52] <jtv> Let me just summarise what phases these pxe-boots go through:
[13:53] <jtv> First you "enlist" nodes — usually simply by turning them on and letting them netboot off the MAAS server.
[13:53] <jtv> They then register their existence with MAAS.
[13:53] <jtv> Then, you tell MAAS that you want to "commission" them.
[13:53] <jtv> MAAS boots them up, but into an ephemeral image, and builds an inventory of the node's hardware.
[13:54] <jtv> After this step, a node is Ready.
[13:54] <jtv> If you got to this point, that should mean that basic things like netbooting already work.
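The lifecycle jtv outlines can be summarised as a toy state table (the state names are approximations — MAAS's real state machine is more detailed):

```python
# Node lifecycle per jtv's summary (state names are illustrative).
TRANSITIONS = {
    "new": "enlisted",        # node netboots and registers with MAAS
    "enlisted": "ready",      # commissioning inventories the hardware
    "ready": "allocated",     # a user (or juju) claims the node
    "allocated": "deployed",  # install image runs; node reboots into its OS
    "deployed": "ready",      # releasing returns the node to the pool
}

def next_state(state):
    return TRANSITIONS[state]

state = "new"
path = [state]
while state != "deployed":
    state = next_state(state)
    path.append(state)
print(" -> ".join(path))  # new -> enlisted -> ready -> allocated -> deployed
```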
[13:55] <jtv> I do believe that ILO has some IPMI quirks, but if you're using wake-on-LAN, I don't think those would affect you.
[13:55] <jesk> (the wakeonlan package isn't installed as a dependency btw, I had to do this manually)
[13:56] <jesk> ok, those steps were all done I guess; the servers registered, I see their MACs, and both were already in the ready state
[13:56] <jesk> but what after that
[13:57] <jtv> If you go to a Ready node's UI page, there are buttons to allocate and start the node.
[13:57] <jtv> (You may want to log in as a non-admin user to hide the atypical steps for now)
[13:58] <jtv> Did you upload your SSH public key?
[13:58] <jesk> yes
[13:58] <jtv> Then the Start button should boot the node into the installer.
[13:58] <jesk> when I press "start node" what will happen?
[13:58] <jesk> it boots again from PXE?
[13:58] <jtv> Yes, into an installer.
[13:58] <jesk> ah ok
[13:58] <jtv> That then installs the OS (which is always Ubuntu in 1.5).
[13:59] <jtv> (If you edit the node you can select a different release.)
[13:59] <jtv> When the installer is done, it reboots the node.
[13:59] <jtv> At that point the node should come back up, into the OS that was just installed  — with your SSH keys on it.
[13:59] <jesk> and this last step can also be managed by juju for example?
[13:59] <jtv> Yes.
[14:00] <jtv> When you ask Juju to start a unit, it allocates and starts that node.
[14:00] <jtv> (It provides some custom data to install the charm you want, of course.)
[14:01] <jtv> Once you have the node up and running, it's utterly yours.  You can reboot it, mess with the OS, etc.  Just don't disable PXE-boot or it will be hard for MAAS to manage after you release it.  :)
[14:01] <jesk> thanks so far, jtv
[14:01] <jtv> Juju has a very cloud-y view of machines, so it will tend to think of machines as things you start up once, use for as long as you need it, and then discard.
[14:01] <jtv> np.
[14:02] <jesk> i will play a little more
[14:02] <jtv> OK.  Let me know when you want to tackle the Mystery of the Phantom Server.
[14:02] <jesk> of what? :D
[14:03] <jtv> Yeah, this analogy isn't working very well.  These titles usually complain about dead people/ships/animals acting as if they're alive, not the other way around.
[14:03] <jesk> I would really like to install the whole openstack magic on one box for now
[14:03] <jesk> and have like 6 nodes for storage and computing, just for the possibilities in my lab
[14:04] <jesk> one openstack box
[14:04] <jtv> One small caveat: MAAS can manage VMs, but it doesn't create them.
[14:04] <jesk> but unfortunately the guides install like 6 openstack servers just to get 2 compute and 2 storage nodes
[14:05] <jesk> ok thanks for the hint
[14:16] <jesk> interesting... the OS was installed
[14:17] <jesk> but I can't log in, my user's SSH pubkey doesn't work
[14:17] <jesk> :D
[14:17] <jesk> wonder how juju handled that...
[14:20] <jesk> muha... my mistake sorry for the spam... user ubuntu ...
[14:25] <jtv> :)
[14:40]  * jtv steps out for a break
[16:34] <jesk> it's a bit of a shame that if you do an openstack installation like the ubuntu guides suggest, things don't work as explained at all the edges and corners
[16:35] <jesk> I would expect that from third-party howtos, but not from the distribution itself
[16:38] <lutostag> jesk: where at? I'd like to fix that if possible?
[16:40] <jesk> just take one of the manuals about installing openstack, this is really not a flame... I've been trying for a few days now to install it in all kinds of variants... without success
[16:41] <jesk> next error happened right now:
[16:41] <jesk> 2014-11-07 14:56:27 ERROR juju.cmd supercommand.go:305 gomaasapi: got error back from server: 401 OK (Expired timestamp: given 1415372187 and now 1415381494 has a greater difference than threshold 300)
[16:41] <jesk> it's a mess
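For the record, that 401 is an OAuth timestamp check: the request's timestamp differs from the server's clock by more than the 300-second threshold, which usually means the client's and server's clocks are out of sync (the usual fix is NTP). A sketch of the check, plugging in the values from the error message:

```python
# MAAS rejects OAuth-signed requests whose timestamp is too far from
# the server clock. Sketch of that check (not MAAS's actual auth code).
def timestamp_ok(given, now, threshold=300):
    return abs(now - given) <= threshold

# Values straight from jesk's error: off by 9307 s, ~2.5 hours.
print(timestamp_ok(1415372187, 1415381494))  # False
```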
[16:42] <roaksoax> rvba: ^^
[16:42] <jesk> you need a lot of knowledge of all the components; maybe then it's possible to install that stuff, but then please no big marketing about "openstack installation from canonical step by step guides"
[16:45] <roaksoax> jesk: what guides are you using?
[16:45] <jesk> I tried everything I could find :D
[16:45] <jesk> maas guides, juju guides, openstack guides
[16:45] <jesk> all from the ubuntu doc archive, and also foreign stuff
[16:46] <roaksoax> jesk: like?
[16:46] <jesk> what do you mean?
[16:47] <jesk> like that https://insights.ubuntu.com/2014/05/21/ubuntu-cloud-documentation-14-04lts/
[16:48] <jesk> or just the openstack-install package
[16:50] <roaksoax> jesk: http://insights.ubuntu.com/wp-content/uploads/UCD-latest.pdf?utm_source=Ubuntu%20Cloud%20documentation%20%E2%80%93%2014.04%20LTS&utm_medium=download+link&utm_content= that's what you need to follow
[16:50] <roaksoax> maybe you just run into a bug with juju