/srv/irclogs.ubuntu.com/2014/11/07/#maas.txt

=== CyberJacob is now known as CyberJacob|Away
horatioI'm having troubles with IPMI + Commissioning. I get 3 of "could not open device at /dev/ipmi0 .. no such file or directory", followed by initscript ipmievd action "start" failed. I added a backdoor and ran the maas_ipmi_autodetect.py script, which works. Then I ran IPMI tool with the credentials the python script returns, and that works too.  And when I check, /dev/ipmi0 exists.00:55
horatioThe IPMI commissioning failures look like they're blocking the reboot at the end of the script though.00:56
=== jfarschman is now known as MilesDenver
=== mscheel is now known as Guest43984
=== CyberJacob|Away is now known as CyberJacob
=== CyberJacob is now known as CyberJacob|Away
jtvWho's up for a pre-imp?08:55
jtvBecause I've got to get some actual product improvement done this week, as opposed to specs and meetings, or I'll go mad.08:56
jgrasslerGood morning.10:12
jgrasslerjtv: I found out where yesterday's problem occurs.10:12
jgrasslerlocal_host, local_port = get("local", (None, None))   # tftp.py, line 18010:13
jgrasslerThis retrieves the machine's IP address from twisted, which is quite sensible in most cases, but not in mine.10:14
jgrassler(since it amounts to what you'd get from `ifconfig eth0`)10:14
jgrasslerI fixed it in a rather messy manner by setting params['local'] manually in the next line, but that's not exactly clean.10:18
jgrasslerI might cobble up a patch that allows for configuring params['local'] in pserv.yaml - would that have a chance of getting accepted upstream?10:19
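The override jgrassler describes could look roughly like the sketch below. It is only an illustration: the `tftp` / `local_address` config keys and the `resolve_local_address` helper are hypothetical and are not taken from MAAS or pserv.yaml. The idea is to keep the address detected from the connection as the default and let a configured value win when one is set.

```python
def resolve_local_address(params, config):
    """Return the (host, port) pair to advertise to a booting node.

    `params` stands in for the request parameters seen in tftp.py, and
    `config` for a parsed configuration mapping. The "tftp" and
    "local_address" keys are assumptions made for this sketch.
    """
    detected_host, detected_port = params.get("local", (None, None))
    override = config.get("tftp", {}).get("local_address")
    if override:
        # A configured address wins, e.g. when NAT hides the detected one.
        return override, detected_port
    return detected_host, detected_port


# Without an override, the detected address is used unchanged.
print(resolve_local_address({"local": ("10.0.0.5", 69)}, {}))
# With an override, only the host part is replaced.
print(resolve_local_address(
    {"local": ("10.0.0.5", 69)},
    {"tftp": {"local_address": "192.0.2.10"}},
))
```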
dimiternhey guys, just a heads-up; as discussed on the x-team call, I've filed a couple of bugs - https://bugs.launchpad.net/maas/+bug/1390404 and https://bugs.launchpad.net/maas/+bug/139041110:29
ubot5Ubuntu bug 1390404 in MAAS "new API to get network interfaces information for a node" [Undecided,New]10:30
ubot5Ubuntu bug 1390411 in MAAS "add docs on how to take advantage of maas-proxy to cache custom images (e.g. for LXC/KVM containers)" [Undecided,New]10:30
dimiternand I've also commented on a few static ip addresses bugs I filed (mostly questions about whether what's left will be fixed in time for 1.7.1 or 1.7.2)10:35
jtvHi jgrassler — thanks for digging that up!   Let me just digest the whole thing...11:03
jtvallenap, did you see jgrassler's note above?  Looks like the tftp code doesn't use MAAS_URL when parameterising a boot method, but "whatever my address is."11:07
allenapjtv, jgrassler: I need to remind myself what that code is meant to do.11:08
* jtv mumbles his usual rant about documenting code11:09
jtvThe "local" parameter goes into the PXE config as the iscsi address for the boot image.11:10
jtvWhere do we serve iscsi now?11:11
jtvIf it's the cluster controller, then MAAS_URL doesn't apply of course.11:11
allenapIt should be the cluster controller.11:11
jtvBlast.11:11
jtvWe have an abstraction for "address where nodes can find the region controller," but no equivalent for the cluster controller.11:12
allenapWell, here we discover it from the address on which the node has contacted the cluster controller.11:13
allenapThat should be accurate.11:13
allenapUnless NAT or something gets in the way.11:13
jtvWhich seems to be the case here.  :(11:13
jtvMaybe this should really be the cluster interface address.11:14
allenapThe cluster can have multiple interfaces, right?11:14
jtv(Which, I know, just raises more wrinkles)11:14
jtvYes, a single cluster can manage multiple subnets...11:14
roaksoaxscsi is on the cluster11:16
roaksoaxjtv: we cannot bind the cluster to one single address11:16
jtvYeah.11:16
roaksoaxbecause the cluster manages different networks11:16
allenapI think supporting cluster controllers behind NAT is a can of worms at the end of a rabbit hole that I really don't want to get drawn in to. jgrassler, I think it's unlikely that we'll support it.11:17
roaksoaxif we do NAT, then MAAS would have to inject rules for all the networks it knows about11:18
jtvIn this case MAAS is not involved in managing the NAT (and couldn't be).11:18
roaksoaxjtv: that doesn't mean that we won't :)11:19
roaksoaxjtv: but that's not something we will be doing anytime soon11:19
jtvI'm guessing if only we could determine an appropriate cluster interface, the cluster interface's address would be the right one here.11:19
roaksoaxjtv: we can't really know what's the right cluster interface11:20
roaksoaxjtv: when it comes to know, the right cluster interface is the interface they are being managed from11:20
allenapjtv: I think MAAS would need to know its real address and the address that outsiders know it by, and relate the two.11:20
jtvI think we want "the cluster interface's address from a given node's point of view."11:21
jtvWhich, yes, is a can of worms.  :(11:21
roaksoaxjtv: right, and that we *can* know11:21
roaksoaxjtv: with the NIC->network matching, we can know11:21
jtvNot in this case, I think.11:21
jtv(We're talking about a very specific scenario here)11:21
roaksoaxjtv: well, i think we should discuss this in Austin11:22
gmbjtv, rvba, allenap: Branch needs review: https://code.launchpad.net/~gmb/maas/check-for-overlapping-cluster-networks/+merge/24106111:22
jtvroaksoax: you mean supporting NAT between the node and the cluster controller?  I'm just mulling it over in hopes of finding an easy solution, but if we don't find one, is it a use-case we want to support?11:23
roaksoaxjtv we might want to support doing nat when both the cluster and region are in the same machine11:34
jtvThat is the case here.11:34
gmballenap: Thanks for the review. I've replied. After your comments, I think it's safer to just disallow overlaps altogether. Sound sane to you?11:41
allenapgmb: Sounds good to me, but I'd like rvba to take a look too. rvba, can you look at Graham's last diff comment on https://code.launchpad.net/~gmb/maas/check-for-overlapping-cluster-networks/+merge/241061?11:44
rvbaallenap: sure11:59
jtvPython wishlist item: have "import" propagate indirect ImportErrors as a different type, so we can tell "I'm trying to import something that doesn't exist" from "I'm trying to import a file with a broken import in it."12:08
jtv(Because I'm tired of test runners reporting that my test doesn't exist just because my test contains an import error)12:09
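On Python 3.3 and later, part of jtv's wish can be approximated with the `name` attribute of `ImportError`, which records the module that could not be found. The sketch below is only one way to use it; the `import_or_none` helper is made up for illustration and is not something a test runner provides.

```python
import importlib


def import_or_none(module_name):
    """Import `module_name`, returning None only if that module itself is missing.

    If the module exists but one of its own imports fails, the
    ImportError is re-raised so the real cause stays visible.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        missing = exc.name  # module that could not be found, or None
        if missing is not None and (
            module_name == missing or module_name.startswith(missing + ".")
        ):
            # The requested module (or one of its parent packages) is absent.
            return None
        # Something imported *by* the module is broken: do not hide it.
        raise
```

A runner built this way could report "no such test module" in the first case and show the full traceback in the second.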
jgrasslerjtv, allenap: Sorry, I missed the discussion (lunch o' clock got in the way)12:14
jtvjgrassler: it's not good news I'm afraid — I hadn't realised yesterday that there'd be NAT between the cluster controller and the nodes.12:15
jgrasslerI can relate to not wanting to support the oddball scenario we've got here - I'll just fix it locally by templating the address into tftp.py with puppet12:16
jgrasslerIt's ugly but it'll do for now12:16
jgrasslerThese floating IP addresses are a bit of a nuisance, unfortunately.12:18
jgrasslerIt's not the first time we've run afoul of the problem :-)12:18
jtvAnd this is one area where even in 1.7 you won't get IPv6.  :(12:31
jgrasslerThat'll be another can of worms at some point in the future...12:33
gmbrvba: I think you missed what allenap was asking about… See the final comment on the *original* diff (circa line 124)12:38
gmb(https://code.launchpad.net/~gmb/maas/check-for-overlapping-cluster-networks/+merge/241061)12:38
gmbrvba: He's spotted a problem with the assumption that different clusters can define the same networks. And I think he's right.12:39
rvbagmb: when we discussed this yesterday, we were talking about overlapping networks *in the same cluster*.  I don't think it makes sense to have a node related to many clusters (not related in DB terms, I'm talking network here)12:43
gmbrvba: Right, so the point allenap is making then — that no two interfaces *anywhere* in a region's scope should have overlapping networks - makes sense. It's not that the node is related to two clusters, its that two independent clusters can right now define interfaces with exactly the same network settings. Which looks fine on paper — they're not the *same* network on the physical level — but once you get to layer 3 and above, they're identical, w12:46
roaksoaxthey could be yes12:54
rvbagmb: what I mean it that I don't see why we would have to enforce that in MAAS.  The only problem we could see was the IP allocation and it only becomes a pb if a node is connected (network) to 2 clusters.12:54
rvbas/I mean it/I mean is/12:54
roaksoaxrvba: right, but nodes should not be connected to two clusters, should they?12:55
rvbaroaksoax: yeah, that's my point.12:56
rvbaroaksoax: but it's not something we enforce anywhere.12:56
gmbrvba: So, my concern is that we're leaving a potential footgun lying around for people if we allow them to do stuff like this. OTOH, you could do some NATing at the cluster level, so maybe it's not a big deal and we should let them. I'm happy with either solution, TBH.13:03
gmbAnd we probably shouldn't be telling network admins what to do.13:03
gmbCome to think of it :)13:03
rvbagmb: yeah, I think it's the admin's job to sort out the routing.  Unless letting them configure identical network will break something in MAAS, I think we should let them do so.13:04
rvbanetworks*13:05
gmbrvba: Okay, I'm happy with that.13:05
roaksoaxrvba: right, but that's not something we recommend either13:10
rvbaroaksoax: probably not.  But I don't think we should forbid this (again, unless it breaks something in MAAS itself).13:11
roaksoaxrvba: yeah, if someone does that it is their own fault13:13
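The overlap check under discussion can be sketched with Python's standard `ipaddress` module. This is not the code from gmb's branch; the interface names and the `find_overlaps` helper are invented for the example, and it only reports overlaps without deciding whether they should be forbidden.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(interfaces):
    """Return pairs of interface names whose networks overlap.

    `interfaces` maps a (hypothetical) interface name to a CIDR string.
    """
    networks = {name: ip_network(cidr) for name, cidr in interfaces.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


overlaps = find_overlaps({
    "cluster-a/eth0": "192.168.1.0/24",
    "cluster-b/eth0": "192.168.1.128/25",  # sits inside cluster-a's /24
    "cluster-b/eth1": "10.0.0.0/8",
})
print(overlaps)
```

Whether such a pair is an error (gmb's concern) or merely the admin's business (rvba's view) is a policy choice outside the check itself.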
jeskhi13:16
jesktrying to understand maas... having problems it :-)13:17
jesks/it/with it/13:17
jeskI'm not able to get information about the boot order process13:18
jeskwhat I could see so far was that a) server boots via PXE, b) server reboots again and boots via PXE (why two times, I don't know) c) shuts off13:19
jeskwhen trying to install something (only tried juju quickstart) a) server boots and installs image b) reboots and boots again from PXE13:20
jeskdo I need to deactivate PXE manually or is that handled by MAAS?13:20
gmballenap, rvba, jtv: 'Nother branch for all y'all. https://code.launchpad.net/~gmb/maas/fix-ipmi-wording-bug-1304518/+merge/24107513:22
jtvI'll take it.13:22
allenapjtv: I've done it.13:23
jtvGrrr13:23
allenapSorry :)13:23
jtvBikeshed derby is ON!13:24
jeskis there any real technical doc about MAAS?13:24
jeskor just cloud-style powerpoint informations :D13:24
jtvjesk: http://maas.ubuntu.com/docs1.5/13:26
jeskjtv: those docs dont explain what happens when you really want to deploy nodes13:28
jtvIt's an old version...  more recent docs on maas.ubuntu.com may help.  Did you have anything specific in mind?13:29
jeskits more like "type this and that"13:29
jeskjtv: i dont get the overall picture of it13:29
jeskjtv: concrete use case13:30
jeskcurrently I have only one MAAS node and now I want to deploy more nodes. I can't get past the step of "booting a server from pxe", which shuts down after PXE boot13:31
jtvOkay, so you're clearly beyond the part covered in the Orientation section.13:31
jtvWhich is good.13:31
jeskeven wake on lan works13:32
jesk"start node" -> node starts, boots from pxe -> and down again13:32
jtvAh, wake-on-LAN is awkward because it has no way to shut down a node.13:32
jtvSo you already commissioned and allocated the node?13:32
jeskyeah, but I'm happy with shutting down manually via ILO for the moment13:32
jeskyeah I did that, *but* i'am not able to get the knowledge what it even means :-)13:33
jtvRight.13:33
jtvI'm sure we documented this _somewhere_, but let me be lazy first and summarise.13:33
jeskone node is currently in "allocated to root"-state and one in "ready"-state13:33
jtvOK.  The "allocated to root" one should be either being installed, or up and installed with your system and your ssh key.13:34
jtvOr, crucially, it could be waiting for you to start it.13:34
jesk"allocated to root"-state because of issueing "juju quickstart" most probably, which unfortunately ended in nothing13:34
jtv(This all gets much better in the 1.7 which we're currently in the process of releasing)13:34
jtvOh, this was done through juju?13:35
jeskI'm not 100% sure if juju started a node through MAAS13:35
jeskbut I could see with the remote console that a system was installed13:35
jtvOh, an operating system was installed on that node?13:36
jtvThat's good.13:36
jeskafter automatic reboot it was booted again via PXE and juju quickstart timeout13:36
jtvWhen you bootstrap a juju environment, it allocates a node for itself.13:36
jtvIt asks MAAS to allocate a node, and when it gets a node, it tells MAAS to start the node up.13:37
jtvAs the node starts up, it netboots off the MAAS server, and boots into an install image.13:37
jtvThus it installs an OS, and the user's SSH keys.13:37
jtvThen it reboots into the OS that it just installed.13:37
jtvAt this point the user (which I guess here is juju) has a working node.13:37
jtvI guess your node got installed, and then shut down... and did it come back up after that?13:38
jeskin my case (i believe) it rebooted from PXE again :-)13:38
jeskso the user has to make sure that PXE is turned off on the server when the server reboots finally after OS installation?13:39
jeskor can the PXE boot image check if the server was started for normal operation after OS installation and boots from local disk?13:40
jeskboot order is: (1) PXE (2) CD (3) HDD13:41
jtvOnce the node is deployed (as this one seems to be), it will boot off its own disk.13:41
jeskto have the flexibility to always boot from PXE for new installation PXE has to be (1)13:41
jtvSo no need to change that order.13:41
jeskif it boots from disk directly boot order must always be (1) HDD first13:42
jeskbut then iam not able to boot from PXE if i want to13:42
jtvIf the node tries to netboot while it's deployed, the MAAS server tells it to boot from local disk.13:42
jeskand the HDD always gets booted as soon as it has a valid boot record13:42
jeskah! so via PXE it gets told to boot from local disk13:43
jtvYeah.  No need to change that order: just always let it netboot.13:43
jeskinteresting but unfortunately seemed not work13:43
jtvAny symptoms?13:43
jeskit booted from PXE13:44
jesksaw this via console13:44
jeskbut maybe "juju quickstart" dont tell MAAS to handle that installation persistent and leave it as "new installed node"?13:44
jeskso much magic :-)13:45
jeski'am just dumb network engineer playing with that stuff13:45
jtvSo the node that was "allocated to root" booted from PXE?  What happened then?13:45
jesk(with a bit of linux and freebsd background)13:45
jeskit shut off after that13:46
jtv(Yes, far too much complexity — there's a lot less you can count on once you cross the boundaries between machines and between reboots)13:46
jtvIt shut off...13:46
jtvThat normally means it's not allocated.13:46
jeskah those server shouldnt shut off after PXE boot?13:46
jtvNow, the situation as I understand it is that you have two nodes: #1 was deployed by Juju itself, and #2 is in the Ready state.13:47
jtvServers will PXE-boot rather a lot... it depends on the situation.  *During deployment* there should be one reboot, from the install image into the installed system.13:48
jeskyes, but both off13:48
jtvIs it possible that the wake-on-LAN simply didn't come through?  Again, things are much better in 1.7, but in 1.5 the server just wouldn't notice.13:48
jeskwake on lan works, I don't get how a system is installed at all. What I saw is that servers boot two times from PXE, but don't install a full-blown OS13:49
jeskand no matter what I do they don't come up with a plain OS boot13:50
jeskthey always boot something from PXE which ends in a shutdown after that13:50
jtvI wonder if maybe you don't have the boot images you need...13:50
jeskso the goal ist that MAAS would install the image I gave the node via MAAS frontend13:51
jtvWell you wouldn't have to provide an image; MAAS downloads those by itself.13:51
jeskand it would install and boot it similar to installation from CD13:51
jeskending in a terminal prompt13:52
jtvWell, login prompt.  :)13:52
jeskyes :)13:52
jeskhowever it finds out charsets, language, time zone, disk partitions...13:52
jtvLet me just summarise what phases these pxe-boots go through:13:52
jtvFirst you "enlist" nodes — usually simply by turning them on and letting them netboot off the MAAS server.13:53
jtvThey then register their existence with MAAS.13:53
jtvThen, you tell MAAS that you want to "commission" them.13:53
jtvMAAS boots them up, but into an ephemeral image, and builds an inventory of the node's hardware.13:53
jtvAfter this step, a node is Ready.13:54
jtvIf you got to this point, that should mean that basic things like netbooting already work.13:54
jtvI do believe that ILO has some IPMI quirks, but if you're using wake-on-LAN, I don't think those would affect you.13:55
jesk(the wakeonlan package isnt installed by dependencies btw, had to manually do this)13:55
jeskok those steps were all done I guess, the server registered, I see their MACs, and both were already in the ready-state13:56
jeskbut what after that13:56
jtvIf you go to a Ready node's UI page, there are buttons to allocate and start the node.13:57
jtv(You may want to log in as a non-admin user to hide the atypical steps for now)13:57
jtvDid you upload your SSH public key?13:58
jeskyes13:58
jtvThen the Start button should boot the node into the installer.13:58
jeskwhen I press "start node" what will happen?13:58
jeskit boots again from PXE?13:58
jtvYes, into an installer.13:58
jeskah ok13:58
jtvThat then installs the OS (which is always Ubuntu in 1.5).13:58
jtv(If you edit the node you can select a different release.)13:59
jtvWhen the installer is done, it reboots the node.13:59
jtvAt that point the node should come back up, into the OS that was just installed  — with your SSH keys on it.13:59
jeskand this last step can also be managed by juju for example?13:59
jtvYes.13:59
jtvWhen you ask Juju to start a unit, it allocates and starts that node.14:00
jtv(It provides some custom data to install the charm you want, of course.)14:00
jtvOnce you have the node up and running, it's utterly yours.  You can reboot it, mess with the OS, etc.  Just don't disable PXE-boot or it will be hard for MAAS to manage after you release it.  :)14:01
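The boot behaviour jtv walks through can be summarised as a lookup from node state to what a netbooting machine is told to do. The sketch below is a simplification for orientation only: the status names and response labels are hypothetical and do not mirror MAAS's real status values or PXE templates.

```python
# Hypothetical status names and responses, for illustration only.
PXE_RESPONSES = {
    None: "enlist",                 # machine unknown to MAAS registers itself
    "commissioning": "commission",  # ephemeral image, hardware inventory
    "ready": "poweroff",            # nothing to do until someone allocates it
    "allocated": "install",         # started by its owner: boot the installer
    "deployed": "local",            # told over PXE to boot from local disk
}


def pxe_response(status):
    """Return the action a netbooting node would be given in this sketch."""
    return PXE_RESPONSES[status]


for status in (None, "commissioning", "ready", "allocated", "deployed"):
    print(status, "->", pxe_response(status))
```

This is why PXE can stay first in the BIOS boot order: in the deployed state the netboot answer itself hands control to the local disk.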
jeskthanks so far, jtv14:01
jtvJuju has a very cloud-y view of machines, so it will tend to think of machines as things you start up once, use for as long as you need it, and then discard.14:01
jtvnp.14:01
jeski will play a little more14:02
jtvOK.  Let me know when you want to tackle the Mystery of the Phantom Server.14:02
jeskof what? :D14:02
jtvYeah, this analogy isn't working very well.  These titles usually complain about dead people/ships/animals acting as if they're alive, not the other way around.14:03
jeski would really like to install the whole openstack magic on box for now14:03
jeskand have like 6 nodes for storage and computing, just for the possibilities in my lab14:03
jeskone openstack box14:04
jtvOne small caveat: MAAS can manage VMs, but it doesn't create them.14:04
jeskbut unfortunately the guides install like 6 openstack servers for having 2 compute and 2 storage nodes14:04
jeskok thanks for the hint14:05
jeskinteresting OS was installed14:16
jeskbut cant login, SSH pubkey of my user doesnt work14:17
jesk:D14:17
jeskwonder how juju handled that...14:17
jeskmuha... my mistake sorry for the spam... user ubuntu ...14:20
jtv:)14:25
=== matsubara is now known as matsubara-lunch
* jtv steps out for a break14:40
=== matsubara-lunch is now known as matsubara
jeskit's a bit of a shame that if you do an openstack installation like the ubuntu guides suggest, at every turn things don't work as explained16:34
jeskI would expect that from foreign howtos, but not from the distribution itself16:35
lutostagjesk: where at? I'd like to fix that if possible?16:38
jeskjust take one of the manuals about installing openstack, this is really not a flame... i'am now trying for few days to install it in all kinds of variants... without success16:40
jesknext error happened right now:16:41
jesk2014-11-07 14:56:27 ERROR juju.cmd supercommand.go:305 gomaasapi: got error back from server: 401 OK (Expired timestamp: given 1415372187 and now 1415381494 has a greater difference than threshold 300)16:41
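The numbers in that error can be checked by hand: the request was stamped 1415372187, the server's clock read 1415381494, and the allowed difference is 300 seconds. The sketch below just redoes that arithmetic; the `timestamp_skew` helper is invented for the example and is not part of gomaasapi or MAAS. A gap this large usually suggests that the clocks on the client and the server disagree, though the log alone does not prove which one is wrong.

```python
def timestamp_skew(given, now, threshold=300):
    """Return (skew_in_seconds, exceeds_threshold) for two Unix timestamps."""
    skew = abs(now - given)
    return skew, skew > threshold


skew, rejected = timestamp_skew(given=1415372187, now=1415381494)
hours, remainder = divmod(skew, 3600)
minutes, seconds = divmod(remainder, 60)
print(skew, rejected)           # 9307 True
print(hours, minutes, seconds)  # 2 35 7
```

So the two machines are about 2 hours 35 minutes apart, far beyond the 300-second window.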
jeskits a mess16:41
roaksoaxrvba: ^^16:42
jeskyou need a lot of clue of all components, maybe then its possible to install that stuff, but then please no big marketing about "openstack installation from canonical step by step guides"16:42
roaksoaxjesk: what guides are you using?16:45
jeskI tried all I could find :D16:45
jeskmaas guides, juju guides, openstack guides16:45
jeskall from the ubuntu doc archive, and also foreign stuff16:45
roaksoaxjesk: like?16:46
jeskwhat do you mean?16:46
jesklike that https://insights.ubuntu.com/2014/05/21/ubuntu-cloud-documentation-14-04lts/16:47
jeskor just the openstack-install package16:48
roaksoaxjesk: http://insights.ubuntu.com/wp-content/uploads/UCD-latest.pdf?utm_source=Ubuntu%20Cloud%20documentation%20%E2%80%93%2014.04%20LTS&utm_medium=download+link&utm_content= that's what you need to follow16:50
roaksoaxmaybe you just run into a bug with juju16:50
=== roadmr is now known as roadmr_afk
=== CyberJacob|Away is now known as CyberJacob
=== roadmr_afk is now known as roadmr
