[00:33] travnewmatic: so it didn't get an IP address from the MAAS?
[02:55] catbus1, it did get an ip address from maas
[02:56] but it didn't get pxe files from MAAS?
[02:57] its like the maas server gave up on it after a certain period of time
[02:57] timed out
[02:58] ERROR Wed, 18 Feb. 2015 16:55:46 Failed to power on node — Timeout after 7 tries
[02:58] it failed to power on the node
[02:59] back to power control IPMI again
[02:59] does the node respond to the manual ipmipower command?
[03:02] umm that i have not tried
[03:02] the node powers back on
[03:02] but like i said it does this bootloop thing
[03:02] it never makes it out of the bios
[03:03] and then after a while it does
[03:04] that sounds like a firmware issue
[03:04] yeah
[03:04] we dont make much use of the dracs on those models
[03:05] so keeping them up to date isnt a priority
[03:05] is it an outdated thing you suspect or a misconfiguration, or both
[03:06] outdated I think, I would update the firmware. Most of the time it fixes all the odd issues. Do you have any servers other than the 1950?
[03:08] many many other types
[03:08] but it'd require some rearranging
[03:08] all dell
[03:08] i was thinking to try another model
[03:09] but we have a ton of these older 1950's just sitting around
[03:09] so it'd be pretty cool if we could get it to work on those
[03:09] not to mention we have ram out the ass for those
[03:09] In the first phase, the discovery/enlistment phase, MAAS would set a new IPMI username/password so it can power control the node from that point on. All you need to do is commission it and it will be ready to be deployed. You don't have to manually power on/off or change any BIOS settings.
[03:09] right
[03:10] then that will be my task for tomorrow
[03:10] figuring out how to update the firmware in the 1950's
[03:10] sounds fun
[03:10] don't brick any
[03:11] :D
[03:11] is there a way to check the version?
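The "Timeout after 7 tries" error reads like a bounded retry loop: issue the power command, wait, ask the BMC for the power state, and give up after a fixed number of attempts. The sketch below is a minimal Python illustration of that pattern only; `change_power_state`, the callables it takes and the delay values are hypothetical and are not MAAS's actual implementation.

```python
import time
from typing import Callable, Sequence


class PowerTimeout(Exception):
    """Raised when the BMC never reports the requested power state."""


def change_power_state(
    power_on: Callable[[], None],
    query_state: Callable[[], str],
    delays: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Power a node on, polling its state once per entry in `delays`.

    Hypothetical helper, not MAAS code. Returns the 1-based try on which
    the node reported "on"; raises PowerTimeout once every try is used up.
    """
    for attempt, delay in enumerate(delays, start=1):
        power_on()            # (re)issue the command, e.g. via ipmipower
        sleep(delay)          # give the BMC time to act before asking
        if query_state() == "on":
            return attempt
    raise PowerTimeout(f"Timeout after {len(delays)} tries")


if __name__ == "__main__":
    # Simulated BMC that only reports "on" at the third query.
    answers = iter(["off", "off", "on"])
    print(change_power_state(lambda: None, lambda: next(answers),
                             [0] * 7, sleep=lambda s: None))  # prints 3
```

A BMC that keeps answering anything other than "on" uses up all seven tries and produces the same shape of error as the log above, so the message says more about what the power query saw than about why.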
[03:12] i think i found it
[03:14] actually nope i didnt
[03:21] so i walked out there and checked
[03:21] "Remote Access Configuration Utility Copyright 2006 Dell Inc. All Rights Reserved 1.21"
[03:22] Baseboard management controller revision 1.77
[03:22] Primary backplane firmware revision 1.05
[03:27] I am not familiar with Dell servers.
[03:29] poop
[03:30] theres also this http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=CH9TW
[03:43] actually this seems to be the most relevant http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=D8GP9
[03:43] so i'm either going to need to install windows on it
[03:44] or hope that red-hat version also works with centos and install that
[03:50] again thanks for the help, i'll see what i can figure out tomorrow
[03:50] yall have a good rest of the day
[15:09] can anyone give me a hand? i am implementing a maas server and it all works, it is just SUPER slow on loading boot-kernel. any ideas anyone?
[15:09] nashville, yes, sure
[15:10] kiko hello old friend
[15:10] nashville, very slow tftp suggests a networking problem
[15:10] hmmm
[15:10] things to look for
[15:10] ifconfig drops/errors
[15:10] switch configuration throttling udp
[15:11] the latter is VERY common if you are using a "good switch"
[15:11] everything that could mess up IP traffic could apply here
[15:11] MTU issues
[15:11] etc
[15:11] i have actually gotten it to register a node but cant get it to commission and it took forever to register
[15:11] tftp is particularly sensitive
[15:12] because it's a udp-only protocol
[15:12] tcpdump on the server side can give you some hints
[15:12] ok thats what i have been thinking it is too
[15:12] time to look at the switches
[15:12] if you can replace the switches and cables one by one
[15:12] mtu should be like 1500 im guessing
[15:12] and you'll find it
[15:12] normally on a lan it's 1500
[15:12] but some people have weirder setups
[15:12] its running on a hyper-v virtual machine
[15:13] oh!
[15:13] thats where my server is
[15:13] i was kinda wondering if i have something about the vm wrong
[15:13] http://www.altaro.com/hyper-v/troubleshooting-the-hyper-v-virtual-switch-part-1/
[15:13] https://mssecbyben.wordpress.com/2014/11/02/losing-udp-traffic-in-a-hyper-v-environment-with-nvgre-and-cant-explain-why/
[15:14] nashville, j^2 and I spent a day looking at related problems and ended up replacing the NIC on the maas host which was dropping packets like crazy
[15:15] like inside the server you replaced the actual hardware?
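Why TFTP in particular suffers: under RFC 1350 each DATA block (512 bytes unless a larger `blksize` is negotiated per RFC 2348) must be acknowledged before the next one is sent, and a lost packet stalls the transfer until a retransmission timeout fires. Below is a back-of-the-envelope Python model of that lock-step behaviour. The RTT, loss rate, timeout and kernel size are illustrative assumptions, not measurements from this network, and the loss model (independent loss per DATA/ACK exchange, one full timeout per failure) is a simplification.

```python
import math

IPV4_HEADER = 20      # bytes, assuming no IP options
UDP_HEADER = 8
TFTP_DATA_HEADER = 4  # 2-byte opcode + 2-byte block number


def max_blksize(mtu: int) -> int:
    """Largest TFTP block that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def expected_transfer_seconds(size_bytes: int, blksize: int, rtt_s: float,
                              loss: float, timeout_s: float) -> float:
    """Expected time for a lock-step TFTP transfer (simplified model).

    Every block costs one round trip; each failed exchange (probability
    `loss`) additionally costs one full retransmission timeout.
    """
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    # The transfer ends with a block shorter than blksize, possibly empty,
    # so there is always one block more than size // blksize.
    blocks = size_bytes // blksize + 1
    retries_per_block = loss / (1.0 - loss)  # mean of a geometric distribution
    return blocks * (rtt_s + retries_per_block * timeout_s)


if __name__ == "__main__":
    kernel = 5 * 1024 * 1024  # hypothetical 5 MiB kernel
    print(max_blksize(1500))
    print(expected_transfer_seconds(kernel, 512, 0.001, 0.0, 1.0))
    print(expected_transfer_seconds(kernel, 512, 0.001, 0.01, 1.0))
```

With these assumed numbers, the clean transfer takes about 10 s while 1% loss pushes it to roughly 114 s, which is the kind of slowdown a dropping virtual switch or throttled UDP would produce. `max_blksize(1500)` is 1468, the largest block that avoids IP fragmentation on a standard 1500-byte MTU.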
[15:15] yes
[15:15] ifconfig showed the way
[15:15] nashville: yep, we went through the whole 7 layers and it turns out it was a layer 1 problem
[15:16] ugh thats nasty i really hope thats not it
[15:17] ok for some reason when i do ifconfig as the maas user it gives me problems
[15:18] that doesnt matter it happens on my working server
[15:22] nashville, sudo ifconfig -a
[15:22] nashville, as your user may lack the right caps
[15:44] after some reading thanks kiko i think my problem is vmq gonna disable that and retest and ill let you guys know thanks kiko
[16:10] nashville, let me know if it works and I'll release note
[16:25] @kiko well that was it buddy disabled vmq in hyper-v (2012r2) and its BLAZING now
[16:25] you are the man/woman kiko thanks a ton
[16:25] nashville, it's those pesky windows people
[16:25] heh
[16:25] enjoy
[16:25] nashville, how many nodes?
[16:25] and why did you choose to run maas inside hyper-v?
[16:25] are you a windows shop?
[16:25] right now just ten but it is easily going to climb to the thousands
[16:26] no actually i dont know why they did it that way
[16:26] hmm
[16:26] it does make testing easier
[16:26] basically i think they bought the server and didnt know what to do with it so they put 2012 on there and hyper-v
[16:26] but not running on the bare metal can make the networking a bit more complex
[16:26] I see
[16:26] who is they, procurement?
[16:26] im just a contractor here so i dont know the whole story
[16:26] more like "engineering"
[16:27] heh
[16:27] heh
[16:28] this makes twice you have saved me during this project kiko let me know where you would like your beer sent
[16:29] nashville, I want a blog post actually :)
[16:29] i could do that... but kiko, honestly i basically followed exactly what you sent me
[16:29] specifically:
[16:30] I know! but a blog post that says "maas works great" and "here were the gotchas" is great publicity
[16:30] http://www.altaro.com/hyper-v/troubleshooting-the-hyper-v-virtual-switch-part-1/
[16:30] part two
[16:30] oh
[16:30] yeah i can def do that!!!
[16:30] i.e. something like I did for dhcp/dns here: http://kiko.ghost.io/things-i-wish-id-known-about-nsupdate-and-dynamic-dns-updates/
[16:30] when i get off tonight ill write one up and put it in the channel so you can see it
[16:32] cool thanks!
[17:08] roaksoax, https://bugs.launchpad.net/juju-core/+bug/1423626 fyi
[17:08] Launchpad bug 1423626 in maas (Ubuntu) "Inconsistent device naming depending on install method" [Undecided,New]
[17:08] roaksoax, I spent most of today helping a guy in #juju who could not get his LXC containers to get network addresses :(
[17:10] jamespage, hmm!
[17:10] kiko, my exact comment was "!!!eeek"
[17:11] jamespage, I bet roaksoax would say "d-i is going away"
[17:12] jamespage, but the actual bug if I understand your comment is with d-i, which by using biosdevname to install causes the interface to be named "incorrectly" based on the commissioning data
[17:12] I guess the easiest solution would be for d-i not to install biosdevname
[17:12] well, but is that easy?
[17:12] incorrect/inconsistent
[17:13] kiko, I'm not convinced that MAAS is doing anything wrong - maybe Juju should be using MAC address and not name to identify the primary network interface?
[17:14] jamespage, perhaps, though I'm not sure maas encodes the interface name in its own logic in places
[17:14] or not sure maas doesn't encode
[17:14] at any rate, perhaps asking if juju /could/ do that to begin with might be a good starting point
[17:15] as it's the place where it could be made more robust
[17:15] i.e. the theme of accepting inputs liberally
[17:17] there's no way to get the NIC name by MAC address at the time juju is generating cloud-init userdata for the node AFAIK
[17:18] dimitern, kiko: in the new world:
[17:18] http://fedoraproject.org/wiki/Features/SystemdPredictableNetworkInterfaceNames
[17:18] dimitern, isn't the agent running and able to issue an ifconfig -a?
[17:19] not sure whether we'll have that on by default for vivid - I'll check
[17:20] kiko, the agent is not yet running, nor is the machine - this happens at allocate/deploy time
[17:20] kiko, the agent however might do that once the machine boots
[17:20] kiko, but that's a bit nasty - there should be a maas api for it :)
[17:23] dimitern, the problem is that the interface name might have changed
[17:23] dimitern, so while we could return to you what maas thinks the interface name on the node is
[17:23] that might have changed from commissioning to install
[17:23] as in jamespage's case
[17:23] kiko, that's a fair point
[17:24] kiko, dimitern: no plans to enable that specific feature for vivid/systemd
[17:26] dimitern, well, whatever code /is/ running on the machine could figure out what the interface name is
[17:26] dimitern, what is running on the machine? an ssh session?
[17:29] kiko, juju polls the machine addresses as they become known via the cloud api and then tries to connect via ssh to all of them
[17:30] kiko, in order to do the initial bootstrap (e.g. install mongod etc.) and then starts jujud
[17:31] kiko, dimitern: ouch - I think I just re-opened old wounds in #ubuntu-devel
[17:31] apparently this was enabled by default for server d-i installs under some protest from the foundations team
[17:32] https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1347859
[17:32] Launchpad bug 1347859 in ubuntu-meta (Ubuntu) "Introduction of Predictable Network Interface Names (aka biosdevname) breaks working systems" [Undecided,New]
[17:34] "predictable" seems like the wrong term here :)
[17:36] jamespage: yes, it might indeed be a juju issue
[17:36] dimitern, 'accurate'
[17:36] the card in slot 1 port 1 will always be addressed the same
[17:36] irrespective of what is actually plugged into it
[17:38] jamespage, except most motherboards have on-board nics that are routed who knows how ;)
[17:38] jamespage, btw - I've just deployed an amd64 trusty node using d-i in my maas, and unlike before I can see via "apt-cache policy biosdevname" that it is installed, but the NICs are still called ethX
[17:38] dimitern, right, so in that ssh session could we not infer the interface name based on the MAC?
[17:40] kiko, it will be rather late then - we could do it when bootcmds are run to dump some data somewhere (maybe even the correct /etc/network/interfaces using the correct names)
[17:40] kiko, I've commented about this suggestion on the bug, but we need to investigate
[17:40] dimitern, uhh.. right. I am now beyond my understanding of the process :)
[17:40] thanks :)
[17:42] kiko, np :) thanks for the discussion - it was very useful
[17:42] thanks to you
[17:55] morning maas crew
[17:58] morning travnewmatic
[18:00] got a bit of actual work today
[18:00] then will attempt to update the ipmi firmware stuff
[18:01] i'll have some help, my coworker is onboard the maas bandwagon, so i'm optimistic
[18:08] travnewmatic, at least somebody's trying to help :) where did you leave off?
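On the interface-naming thread between jamespage, kiko and dimitern (identify the NIC by MAC rather than by name, and let code already running on the node resolve the name), here is a minimal Python sketch of that lookup. It assumes a Linux node where each interface exposes its MAC in `/sys/class/net/<name>/address`. The function name `interface_for_mac` is hypothetical, and nothing here is taken from Juju or MAAS code.

```python
import os
from typing import Optional


def interface_for_mac(mac: str, sysfs_net: str = "/sys/class/net") -> Optional[str]:
    """Return the kernel name of the interface whose MAC matches, else None.

    Hypothetical helper. Reading the MAC from sysfs means the answer is right
    whether the installer named the card eth0, em1 or p1p1.
    """
    wanted = mac.strip().lower()
    for name in sorted(os.listdir(sysfs_net)):
        path = os.path.join(sysfs_net, name, "address")
        try:
            with open(path) as f:
                if f.read().strip().lower() == wanted:
                    return name
        except OSError:
            # Entries such as bonding_masters have no address file.
            continue
    return None


if __name__ == "__main__":
    # Example MAC is made up; substitute the one MAAS recorded for the node.
    print(interface_for_mac("00:15:c5:ef:85:ed"))
```

Run from something like a bootcmd, as dimitern suggests, this resolves the name after the install has settled it. That sidesteps the commissioning-versus-install naming drift that bug 1423626 describes, at the cost of only knowing the name once the node is up.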
[18:09] to isolate the issue, are you already able to control a machine via IPMI manually?
[18:09] well maas does have some control over the hardware through ipmi
[18:09] it can turn the machine on
[18:09] and can turn the machine off
[18:09] but theres some miscommunication going on
[18:09] i mentioned the bootloops
[18:09] what is the latest symptom?
[18:10] the machine will be off after having checked in with maas
[18:10] travnewmatic, you're on 1.7.1, right?
[18:10] uuuuuh
[18:10] how to check?
[18:10] i know i'm at least 1.7
[18:10] dpkg -l | grep maas
[18:10] but you should be if you used the PPA
[18:11] 1.7.1
[18:11] great!
[18:11] woohoo! :D
[18:11] tell me about these fruit loops
[18:11] lol
[18:11] so the machine will be off after having initially checked in with the maas server
[18:11] in the maas interface i'll hit commission
[18:11] by checked in do you mean what we call enlist?
[18:11] and you've accepted it into your pool?
[18:11] yes, the stage before commission
[18:12] so i'll click commission
[18:12] the machine will turn back on
[18:12] go halfway through the bios, then turn off, and then turn back on again
[18:12] so far so good
[18:12] and it'll do that a few times
[18:12] through the bios? i.e. not even attempt to pxe boot?
[18:12] doesnt make it that far
[18:12] okay
[18:13] this could be an issue with the fact we're telling it to network boot
[18:13] do the NICs have netboot functionality available/enabled?
[18:13] mmmm i would assume so, as it has been able to pxe boot before?
[18:13] are those different things?
[18:14] it has?
[18:15] I see, enlist was able to pxe boot of course
[18:15] okay
[18:15] so if you power the machine on manually it does pxe boot
[18:15] but if maas tells it to power on it gets stuck mid-boot
[18:15] travnewmatic, would it be hard to video and dropbox the symptom?
[18:15] or illegal
[18:16] lol i dont think it'd be illegal
[18:16] i'd be happy to do that
[18:16] that makes it less fun but more effective I guess
[18:16] though i wonder if it would be better for us to invest more time into updating the firmware to be current
[18:17] right, let's do that first
[18:17] mhm
[18:17] and then if you are still stuck we should get a video and a model number so I can check with the certification guys
[18:28] for sure
[18:30] this is actually kind of cool
[18:30] to think that i could contribute to the development of a piece of software used by people all over the planet
[18:37] that's rather the essence of free software
[18:37] yeah!
[18:37] it's why practically all of us got into it, really
[18:37] mhm
[18:37] my buddy here at work contributes as well
[18:37] as we work at a data center
[18:38] we've got hardware and bandwidth out the wazoo
[18:38] so he set up a mirror
[18:38] centos in this case
[18:39] did you know maas can now provision centos?
[18:39] j^2 is working on a knife cookbook that gets it all up and running
[18:39] it's slightly unfortunate that maas is the gutter where all your hardware and networking problems run to
[18:39] so while maas itself is probably working, your hardware seems to have a mind of its own :)
[18:40] yeah, i'm not holding maas accountable for our old janky out of date servers
[18:40] YET
[18:40] hahah
[18:40] but soon :)
[18:40] right :D
[18:40] but the centos thing is kind of a requirement for us
[18:40] most of the servers we provision are either centos or windows
[18:42] well i guess we're lucky in that ubuntu seems to have a good partnership with dell
[18:45] with all the major vendors in reality
[18:45] but indeed, we should be able to get any issues resolved if the hardware is relatively well supported
[20:19] so the adventure begins
[20:19] kiko, do you have any experience with what we're about to attempt :D
[20:28] kiko, http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=XCVN0&fileId=3078071536&osCode=LNUX&productCode=poweredge-1950&languageCode=EN&categoryId=ES
[20:28] its doing its thing!
[20:29] keep your fingers crossed
[20:34] http://imgur.com/5wgYMVK
[20:36] its doing the boot looping before it gets out of the bios
[20:37] but the firmware is upgraded
[20:39] still loopin
[20:39] hmm
[20:39] but i'm watching the log and no maas error
[20:40] theres the error
[20:40] Feb 19 14:35:31 tnewman3 maas.power: [INFO] plain-desire.local: Power state has changed from unknown to on.
[20:40] Feb 19 14:39:55 tnewman3 maas.power: [ERROR] Error changing power state (on) of node: plain-desire.local (node-a20f1e7a-b876-11e4-8a8d-0015c5ef85ed)
[20:40] Feb 19 14:39:55 tnewman3 maas.node: [INFO] plain-desire.local: Stopping monitor: node-a20f1e7a-b876-11e4-8a8d-0015c5ef85ed
[20:40] Feb 19 14:39:55 tnewman3 maas.node: [ERROR] plain-desire.local: Marking node failed: Timeout after 7 tries
[20:40] gonna try an R210
[20:41] travnewmatic, could you get an actual video of a boot-up (i.e. from the time maas triggers the commission power-on) and /msg that to me?
[20:41] I need to split tonight but I'll look later
[20:41] thanks
[20:42] shore
[20:42] you have a good night!
[20:42] will post results!
[20:55] i think my troubles may have been related to spanning tree portfast not being enabled
[21:10] everything worked good on the R210
[21:10] now trying it again on the 1950 with the updated firmware and spanning tree portfast enabled on the port
[21:28] kiko, we're golden
[21:28] it appears spanning-tree portfast was the culprit
[21:50] and it also works on servers that do not have upgraded firmware
[21:51] good news
[21:52] travnewmatic: so you have 1950 commissioned successfully now?
[21:52] indeed!
[21:53] so, to do: centos, static ip's, subnets
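Why portfast matters for PXE: with classic 802.1D spanning tree, a switch port that has just seen link come up spends the forward delay (15 s by default) in listening and the same again in learning before it forwards frames. That gives roughly 30 s in which the PXE ROM's DHCP requests are silently dropped. Portfast moves an edge port straight to forwarding. The Python sketch below checks which DHCPDISCOVERs survive that window. It assumes the link comes up at t=0 and that the PXE client uses retransmission timeouts of 4, 8, 16 and 32 s. Both the timing model and the function names are illustrative assumptions.

```python
from itertools import accumulate
from typing import List, Optional, Sequence

FORWARD_DELAY_S = 15.0               # 802.1D default forward delay
STP_BLOCKED_S = 2 * FORWARD_DELAY_S  # listening + learning before forwarding
PXE_TIMEOUTS_S = (4, 8, 16, 32)      # assumed DHCPDISCOVER retransmission timeouts


def discover_send_times(timeouts: Sequence[float]) -> List[float]:
    """Seconds after link-up at which each DHCPDISCOVER is sent (hypothetical model).

    The first goes out at t=0; each later one is sent when the previous
    timeout expires without an offer. The last timeout only ends the wait.
    """
    return [0.0, *accumulate(timeouts[:-1])]


def first_forwarded(send_times: Sequence[float], blocked_s: float) -> Optional[float]:
    """Send time of the first DISCOVER that reaches the network, or None."""
    for t in send_times:
        if t >= blocked_s:  # the port has reached the forwarding state
            return t
    return None


if __name__ == "__main__":
    sends = discover_send_times(PXE_TIMEOUTS_S)
    print(sends)                                  # [0.0, 4, 12, 28]
    print(first_forwarded(sends, STP_BLOCKED_S))  # None: all land in the blocked window
    print(first_forwarded(sends, 0.0))            # 0.0 with portfast
```

On real hardware the NIC may get link during POST, which eats into that window, and every power cycle drops the link and restarts the timers. The exact numbers therefore vary. The point is that default STP timers are of the same order as a PXE client's whole DHCP retry budget.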