=== lazypower-travel is now known as lazyPower
=== vladk|offline is now known as vladk
=== vladk is now known as vladk|offline
=== CyberJacob|Away is now known as CyberJacob
[06:32] allenap: Both trunk and 1.5 CI are failing now.
[06:35] also, the lander is down
[06:36] THE SKY IS FALLING
[06:36] The day is off to a good start.
[06:36] well mine was good :)
[06:36] the end, however, is pants
[06:43] rvba: so I thought of a big problem today
[06:43] I don't think we have a way to work out on which nodegroupinterface a mac belongs
[06:44] and we need that to do the allocation
[06:44] since django turned my brain to cottage cheese this afternoon, I am hoping your freshly awoken one will have some ideas
[06:45] Okay, I need some context… hangout?
[06:47] mmm cottage cheese
[06:47] heh
[06:47] * bigjools spies a lurking lifeless
[06:47] Hi lifeless
[07:04] lurking 4 eva :)
[07:04] rvba: o/
[07:07] Hi there lifeless!
[08:09] rvba: if you make no progress with the lander machine we can kill it and deploy a new one
[08:09] bigjools: okay.
[08:11] rvba: I added cards on the board from our discussion
[08:11] k, ta
[08:14] rvba: just to check, when django deletes ipaddress, it will remove the link table entry automatically, right?
[08:15] bigjools: yeah, I think so.
[08:15] shall we just call them stips? :)
[09:05] allenap: Actually, your change seems to have fixed the immediate problem. But the tests failed later on: the import of the images failed.
[09:10] rvba: Do you have a paste of that?
[09:11] allenap: all the logs are here: http://d-jenkins.ubuntu-ci:8080/view/MAAS/job/utopic-adt-maas-manual/11/artifact/results/
[09:38] allenap: it's the reporting of the images that is failing with http://paste.ubuntu.com/7579121/
[10:24] rvba: Do you have the Apache logs that correspond to that? I still can’t get this damned VPN to work.
[10:59] jhobbs, blake_r: always run "make lint" before submitting. Ideally, run it with every commit.
[10:59] Two lint problems right now, one cosmetic, the other serious:
[10:59] 1. A bunch of JS lint. This sometimes gets masked somehow.
[11:00] You know how picky browsers are: lint could mean that you've got something that'll break on somebody else's browser.
[11:00] 2. We no longer have call_capture_and_check. I said so on the review! The tests don't notice because they patch out the call.
[11:01] The linter notices though, so it's not just there for the cosmetics.
[11:01] jtv: I was going to fix that in a later branch, as that was getting backported into 1.5
[11:04] allenap: When I tried configuring the VPN with a script, it was a bit of a nightmare, but using the UI it was pretty simple. Maybe you should try that.
[11:04] I see. It's OK to land incomplete work, but having code actually break is a bit much... Sometimes these things can hit spots where we can't easily diagnose the failure.
[11:05] jtv: also i run "make lint" on everything before I submit
[11:05] Ah good.
[11:05] jtv: I don't remember it erroring
[11:05] rvba: The UI didn’t work, and I don’t know why. From the command-line it starts okay, but there’s no indication that it sets up a different/additional DNS server.
[11:05] Hmm...
[11:05] jtv: as for the js errors, how do you get them to show?
[11:05] blake_r: maybe the problem then is that you ran the lint check before merging a fresh trunk.
[11:06] allenap: Maybe it's related to running Ubuntu inside a VM.
[11:06] jtv: speaking of js look at this, maybe I am just using it incorrectly https://bugs.launchpad.net/maas/+bug/1325927
[11:06] Ubuntu bug 1325927 in MAAS "YUI.Array.each not working as expected" [High,Confirmed]
[11:06] jtv: i bet that was it
[11:06] * jtv looks
[11:07] gmb: any idea on how to get rid of an unreachable dying instance? (I know you've done things on the landers before.)
[11:07] rvba: Unreachable in what way?
[11:08] gmb: "It's stone dead."
[11:08] As in, it rebooted and lost its network access.
[11:08] It's as good as dead.
[11:08] blake_r: that's my main dislike about JS... so easy to make a mistake that it doesn't complain about. Cruel to be kind, etc.
[11:08] Ooh, nasty.
[11:08] rvba: So you can’t do destroy-service or remove-unit or anything like that?
[11:08] jtv: am I using YUI.Array.each correctly?
[11:09] jtv: as the same method works in another piece of that file
[11:09] I don't get it either.
[11:09] jtv: oh okay
[11:09] gmb: that's what I did. Now the instance is 'dying'.
[11:09] Has been 'dying' for 30 minutes now.
[11:09] jtv: i will just use options.each
[11:09] Yeah.
[11:09] I'm afraid it's a zombie now.
[11:09] rvba: I think you’re hosed then.
[11:09] Yeah.
[11:09] Juju is waiting for the unit to report back that it’s powering off
[11:10] But it can't
[11:10] gmb: so, what do you recommend? Getting rid of the entire environment and recreating from scratch?
[11:10] rvba: That’s about your only option.
[11:10] It's a bit crazy that there isn't a better way out of this.
[11:10] blake_r: to run just the JS check, "make lint-js".
[11:10] jtv: did you fix the call_capture_and_check?
[11:10] No, I only just noticed.
[11:11] rvba: Well, there may be, but I don’t know enough juju gris-gris to know about it.
[11:11] jtv: okay
[11:15] jtv: "make lint-js" on trunk shows no issues?
[11:16] blake_r: oh dear, I've noticed that in the past, and thought it was version skew in the linter...
[11:18] It _could_ be an upstream revision that never made it into the package.
[11:19] I've got python-pocket-lint installed... Have you? If not, the Makefile will download the upstream tarball.
[11:20] jtv: Installed: 0.5.31-0ubuntu1
[11:21] allenap: I tried that. `nova list` now says it's 'DELETED' but Juju still thinks it's there.
[11:22] blake_r: same here...
[11:23] jtv: did "bzr branch lp:maas" and then "make lint-js", no issues
[11:23] Puzzled.
[11:23] jtv: https://code.launchpad.net/~blake-rouse/maas/fix-find_mac_via_arp/+merge/221864
[11:23] Thanks. Will review.
[11:23] jtv: np
[11:23] Not even getting the warnings on a clean branch...
[11:24] I mean, I *am* getting them on a clean branch.
[11:24] As well as a built branch.
[11:25] blake_r: reviewed. Thanks again.
[11:25] jtv: np
[11:25] gmb: This is the output of `juju status`: http://paste.ubuntu.com/7579736/; do you know what the two services at the bottom are?
[11:25] jtv: http://paste.ubuntu.com/7579740/
[11:25] tarmac-maas14 / tarmac-maas15
[11:26] rvba: The landers for 1.5 and 1.4
[11:26] Ah, right. Silly me.
[11:26] blake_r: I did the same thing, and got the errors. Could you maybe run the "find" command from the Makefile (in the lint-js target) manually, see if we get different results?
[11:27] gmb: allenap: Unless you guys have a better idea, I'm going to destroy the environment and re-create it.
[11:27] rvba: Go for it. Burn it.
[11:27] It can’t work worse than it is right now.
[11:27] Well, some of the landers work.
[11:28] blake_r: in my case, I get a whole bunch of JS filenames vomited into my terminal when I run "find src/maasserver/static/js -type f -print0", as expected. Same for you?
[11:28] jtv: yep
[11:28] jtv: i get them all
[11:28] gmb: isn't it possible to deploy the same service twice?
[11:28] Like under a different name or something.
[11:29] jtv: http://paste.ubuntu.com/7579761/
[11:29] rvba: Yes, you could do that.
[11:29] I think it’s just an extra positional argument on the end of the deploy command.
[11:29] jtv: i removed the print0
[11:29] jtv: just for a cleaner output
[11:29] I also find that I get the list if I keep the -print0, and pipe it through "xargs -r0 ls"
[11:30] And I get the errors when I run just "pocketlint src/maasserver/static/js/os_distro_select.js"
[11:30] (On an up-to-date trunk)
[11:31] jtv: i get nothing
[11:31] "which -a pocketlint" for me prints /usr/bin/pocketlint, must be the same for you I guess...
[11:31] jtv: yep
[11:32] Gah.
[11:32] Oh, I haven't looked into the "available" magic. Because it's, well, magic to me.
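When two machines disagree about lint output, one thing worth ruling out is that they feed different file lists to the linter. The sketch below is a rough Python stand-in for `find src/maasserver/static/js -type f`, so that both sides can diff the resulting lists. The function name `list_lint_targets` and the optional `suffix` filter are hypothetical and only for illustration; they are not part of the MAAS Makefile.

```python
import os


def list_lint_targets(root, suffix=None):
    """Return regular files under root, roughly like `find root -type f`.

    If suffix is given (for example ".js"), keep only matching names.
    The suffix filter is an illustrative assumption, not the Makefile's rule.
    """
    targets = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # `find -type f` does not match symlinks, so skip them here too.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if suffix is None or name.endswith(suffix):
                targets.append(path)
    # Sort so that two machines produce directly comparable output.
    return sorted(targets)


if __name__ == "__main__":
    for path in list_lint_targets("src/maasserver/static/js", suffix=".js"):
        print(path)
```

If the lists match on both machines, the difference is more likely to be in the linter or in whatever external tools it calls than in the set of files.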
[11:32] I'll try removing the '@' sign from that command line in the Makefile, to see what command it actually runs.
[11:32] gmb: that's what the doc says, but it doesn't seem to work. I'm using: paste.ubuntu.com/7579793/
[11:33] rvba: Do you get an error?
[11:33] error: unrecognized args: ["tarmac-maas-trunk-fixed"]
[11:34] jtv: just did "apt-get install --reinstall python-pocket-lint"
[11:34] jtv: still getting no errors
[11:34] blake_r: surprisingly, removing the @ isn't enough. I had to remove the "sources = src/..." line right above, _and_ remove the @. Then I got:
[11:34] find src/maasserver/static/js -type f -print0 | xargs -r0 pocketlint
[11:35] (I had to spell out the $(sources) value in the command line, of course)
[11:35] jtv: no errors for that command as well
[11:35] But "make lint-js" printed that same command line?
[11:36] I'm mainly curious if it might be substituting something for the "pocketlint" command.
[11:36] jtv: yep got the same command
[11:37] Even tried different locale settings... no change. :(
[11:38] gmb: I got that charm started… let's see if it works…
[11:39] Fingers crossed
[11:41] Machine is still pending… doesn't look good.
[11:41] blake_r: I also tried looking for latent pocketlint configuration on my system, but no dice. Maybe there's some linking going on that makes it ignore files? Are you using lightweight checkouts or anything like that?
[11:42] jtv: i use buildout
[11:43] In this case you're getting an installed version of pocketlint though, so I wouldn't expect it to make a difference in itself.
[11:44] jtv: i even added "which $(pocketlint)" to the makefile and got /usr/bin/pocketlint
[11:45] jtv: so it's using the installed pocketlint and the newest version as i just did a reinstall of it
[11:45] Yeah.
[11:45] I wonder if something somewhere installs config for it.
[11:46] Maybe an old version of the package, where if you had the older version, some warnings get suppressed or something...
[11:46] ...even if you'd since upgraded or reinstalled.
[11:47] gmb: It didn't work. The instance never came up.
[11:47] rvba: Then we’re back to option one: burn it.
[11:48] gmb: I'm wondering why the existing (dead) instance seems to claim it was running Trusty and in Julian's instructions, it says to deploy on Saucy.
[11:48] gmb: I'm afraid we're not going to be able to bring up the other landers :/
[11:48] rvba: Oh, that’s very weird indeed.
[11:49] When I try deploying the same charm/config on Trusty I get an error about the instance image (specified in the config, I guess) not being found.
[11:50] rvba: Is this for the trunk lander?
[11:50] Yeah
[11:51] rvba: I think that the saucy instruction is a holdover; ca
[11:51] *can you update the config and try running it again?
[11:51] Also, I'm using juju 1.19 and the bootstrap node is using 1.18.
[11:51] rvba: juju upgrade-juju should fix that, I think.
[11:52] gmb: or kill the whole env for good :)
[11:52] rvba: Well, there’s that option :D
[11:53] gmb: looking into it there is nothing in the config that says 'Saucy'.
[11:53] Ahahahaohgod.
[11:54] Maybe we're running out of instances. I know there is a limit.
[11:54] Hmm, that could be it.
[11:54] rvba: Or else Canonistack is falling over.
[11:55] I'll kill the 1.4 lander and try again…
[11:55] ok
[11:56] tarmac-maas14 is now 'dying'.
[11:56] * rvba hopes for the best…
[11:57] I'm afraid I created another zombie machine. :/
[11:58] Brainzzz
[11:59] rvba, can you think why passing an "arch" string to the node constraints filter form might be fundamentally different from passing a "name"?
[11:59] I'm sure we've seen this before, but I keep getting the arch as a string representation of a list containing my one value.
[12:00] jtv: maybe the form expects a list of values. And thus it would use data.request.getlist('arch').
[12:01] rvba: yes, but the field types for those two fields are identical... I wonder if the cleaning does something weird to it.
[12:03] jtv: the only difference I can see is the form's clean_arch() method.
[12:03] rvba: ahhhh, clean_arch does "return [value]"
[12:03] Because... what?
[12:06] Changing that breaks a different test, but it's not clear to me why.
[12:06] Ah, we filter on architecture__in=arch
[12:06] Well, the filtering code assume it's a list.
[12:06] assumes*
[12:06] Right.
[12:06] And yet it's a single-value field.
[12:07] I'll try making it a consistently single-value field.
[12:07] Ahhhh, this is for the wildcards.
[12:14] Grrr. So we have a single-item field becoming a multi-value field during cleaning.
[12:20] blake_r: pocketlint will call external linters... Maybe I've got some other linter installed which then gets called.
=== CyberJacob is now known as CyberJacob|Away
[13:41] blake_r: I'm giving up on the lint mystery for now... a web search suggests I'm seeing messages from jslint, which I don't have installed, but which pocketlint seems to call anyway!
[13:48] rvba: for the record, I didn't find any constraints-rendering code... and it's surprisingly hard to get right, what with the type changes happening in cleaning!
[13:50] allenap: gmb: jtv: Making progress here: the trunk lander is up and running. Now, it tried to land allenap's branch and failed, maybe there is a config problem.
[13:51] New failure or old failure?
[13:51] The ssh one?
[13:51] https://code.launchpad.net/~allenap/maas/database-locks-revisited-at-start-up/+merge/221799
[13:52] Oh goodie, more locking
[13:52] rvba: I suspect it needs an apt-get update.
[13:52] rvba: Ah, it’s running on saucy!
[13:53] > E: Unable to locate package python-crochet
[13:53] > E: Unable to locate package python-seamicroclient
[13:53] > make: *** [install-dependencies] Error 100
[13:53] Trunk needs Trusty.
[13:53] Right, saucy. Silly me, I followed the instructions from the google doc.
[13:55] allenap: same charm is deploying on Trusty… fingers crossed.
[13:55] rvba: I’ll mark that branch for merging again.
[13:55] allenap: already done
[13:56] I was too late.
[13:56] Only just.
[14:03] allenap: gosh, another failure.
[14:05] rvba: Guess what, our old friend Mr Rabbit.
[14:05] allenap: I did a manual 'make install-dependencies' on the machine and it went fine.
[14:30] allenap: lp:~allenap/maas/database-locks-revisited-at-start-up is merged!
[14:30] \o/
[14:30] Victory!
[14:31] allenap: it needs backporting to 1.5 btw.
[14:31] Victory!
[14:31] Guys, the lander for trunk is (finally) back up.
[14:32] Anyone up for a tiny review? https://code.launchpad.net/~rvb/maas/pkg-stop-dhcp/+merge/221851
[14:39] rvba: I can probably do it in a few.
[14:39] Thanks.
[14:40] blake_r, maybe I've found the problem: pocketlint has a tendency to crash if it finds files that it doesn't recognise, and the Makefile rule doesn't filter the filenames at all!
[14:46] rvba: reviewed.
[14:46] Ta
=== vladk|offline is now known as vladk
[14:54] Got a branch up to produce more helpful error messages when no nodes can be allocated. Anyone want to review? https://code.launchpad.net/~jtv/maas/bug-1274085/+merge/221896
[14:55] blake_r: lint branch is up for review. I restricted the Makefile rule. Probably not going to solve our output difference, but it may prevent some crashes.
[14:56] jtv: I'll have a look now
[15:05] allenap: your change (revision 2400) fixed the build.
=== vladk is now known as vladk|offline
[15:11] rvba: the eagle has landed
[15:12] rvba: You’re a good guy today for fixing the landers, even if you do want to kill most of your colleagues pets.
[15:12] EAPOSTROPE
[15:12] heh
[15:13] ESPELLING
[15:13] * allenap sense he is low on thinking fuel, goes to get tea
[15:16] allenap: if you backport your fix to 1.5 I'll leave your dog alone. At least this week.
[15:38] Thanks for the review rvba.
[15:39] Welcome.
[15:41] rvba: Agreed.
[15:52] rvba: We missed an opportunity to fix the lander’s name; it’s still “MaaS Lander” (capitalisation).
[15:53] allenap: I don't see that in the config… must be in the charm…
=== vladk|offline is now known as vladk
=== vladk is now known as vladk|offline
[17:49] can anyone tell me how to suppress netcfg in a preseed? I've tried commenting out all of the d-i netcfg commands but it continues to overwrite /etc/network/interfaces.
=== CyberJacob|Away is now known as CyberJacob
[18:07] interesting discussion going on about MAAS here: https://news.ycombinator.com/item?id=7839943
=== roadmr is now known as roadmr_afk
[19:48] blake_r, I'm getting an OAUTH error during commissioning with a difference of 6 hours. During commissioning, the node is using UTC for some reason, but my maas server is set to local time. I resolved the issue in the preseed by setting an NTP server but commissioning is failing now. Any idea how to resolve this?
[19:49] designated: is the maas server in a different timezone from the nodes you are deploying?
[19:49] designated: that's the issue, OAUTH issues with different timezones
=== roadmr_afk is now known as roadmr
[19:50] roaksoax, no, they are all in the same timezone
[19:50] that's weird then
[19:50] the only reason why that would happen is because of a mismatch between timezones
[19:50] or maybe due to them not being in UTC?
[19:50] roaksoax, i found /etc/init/hwclock.conf which is setting the node to UTC during commissioning
[19:51] designated: and is that being done by cloud init maybe?
[19:51] i ran into the same problem during OS load because maas was set to local time, I resolved that issue by setting an ntp server in the preseed
[19:53] roaksoax, yes but I don't know where. I found an old bug that explains this problem but it was supposedly fixed.
they also reference modifying /etc/init/cloud-init, which doesn't exist anymore
[19:53] designated: well smoser is not around and he would be the one to help you with this issue
[19:57] roaksoax, i found the following files /etc/init/hwclock.conf and /etc/init/hwclock-save.conf which explain it sets hwclock to UTC on boot then back to localtime on shutdown but it's not documented well enough to permit changing it.
[19:57] right
[19:58] designated: again, i do not know much about that issue, smoser would be the person you need
[19:58] roaksoax, thanks
[20:05] roaksoax, o/
[20:05] smoser: o/
[20:05] smoser: designated was having some issue with changing timezones
[20:05] smoser: on maas
[20:05] designated: ^^
[20:06] cloud-init should work around any difference with clocks.
[20:10] smoser, I'm getting oauth failure during commissioning of a node with a 6 hour time difference
[20:11] smoser, my maas server is set to local time and i resolved the issue during OS load by adding an NTP server to the preseed but now commissioning is failing with the same problem.
[20:11] Have you considered fixing your clock?
[20:11] jpds, which clock?
[20:12] Your hardware clock.
[20:12] jpds, why would i want to go through the bios of hundreds of servers to set the time? Why can't it just sync with my NTP server?
[20:13] designated, can i see a cloud-init.log ? or a console log ?
[20:13] that really should be fixed.
[20:16] smoser, i don't have access to the node being commissioned to get the cloud-init.log because it doesn't complete. I do have the following on maas under /var/log/maas/maas.log: OAuthUnauthorized: 'Expired timestamp: given 1401801931 and now 1401823500 has a greater difference than threshold 300'
[20:17] designated, do you have console logs ?
[20:17] and what version / release is this that is doing commissioning ?
[20:18] smoser, 1.5.1+bzr2269-0ubuntu0.1
[20:19] designated, what ubuntu release is running on commissioning.
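The numbers in the `Expired timestamp` message can be checked by hand. The sketch below subtracts the two Unix timestamps quoted in the log and compares the result with the 300-second threshold from the same message. The helper name `timestamp_skew` is hypothetical, and the exact comparison the server uses (strict or not) is an assumption; only the three numbers come from the log.

```python
THRESHOLD_SECONDS = 300  # taken from the log message: "threshold 300"


def timestamp_skew(given, now, threshold=THRESHOLD_SECONDS):
    """Return (skew_seconds, accepted) for two Unix timestamps.

    `accepted` assumes a request passes when the absolute difference
    does not exceed the threshold; the server's exact rule may differ.
    """
    skew = abs(now - given)
    return skew, skew <= threshold


if __name__ == "__main__":
    skew, accepted = timestamp_skew(given=1401801931, now=1401823500)
    # Prints: skew: 21569 s (5.99 h), accepted: False
    print("skew: %d s (%.2f h), accepted: %s" % (skew, skew / 3600.0, accepted))
```

A skew of just under six hours is what one would expect if one side treats the hardware clock as UTC and the other treats it as a local time six hours behind UTC.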
[20:19] http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/sources/DataSourceMAAS.py is the code (_except_cb) that fudges the oauth headers to match whatever the server sent.
[20:20] interestingly your ntp fix causes issues also for cloud-init, which makes it such that we need the monotonic timer.
[20:21] smoser, 14.04
[20:21] on console output of the system, you should see things like 'Setting oauth clockskew'
[20:21] smoser, i have limited console output because I'm using a java application through BMC
[20:21] only thing it shows right now is "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[20:21] yeah, that sucks.
[20:21] disables what message ?
[20:22] task systemd-udev:577 blocked for more than 120 seconds
[20:23] hm.. not sure that's relevant.
[20:23] it probably isn't
[20:23] I was just sharing the limited console output i see
[20:24] the point is, the time difference is 6 hours, which indicates the time is being set to UTC on the node being commissioned, whereas the time set on my maas box is MDT
[20:24] designated, but that should not be a problem.
[20:25] we ran into "bad clocks" quite a while ago and fixed that.
[20:25] cloud-init just reads the time from the server in the 403 header
[20:25] and says "ok, i'll just use that timestamp".
[20:25] smoser, i read that bug
[20:25] but if ntp is in the mix, then you can hit the timeout and never try more than once.
[20:26] let me try rebooting the node and see if i can capture console output
[20:27] smoser, the only thing that was changed to resolve the oauth issue i had during OS load was the addition of the following in the preseed.
[20:27] d-i clock-setup/utc boolean false
[20:27] d-i clock-setup/ntp boolean true
[20:27] d-i clock-setup/ntp-server string 192.168.168.1
[20:27] d-i time/zone string US/Mountain
[20:29] designated, you can also "backdoor" the image so you can get in.
[20:29] the d-i has no effect on commissioning.
[20:29] designated, the other change you can make that might get you through commissioning, but i'm not sure why, is in maasserver/compose_preseed.py
[20:29] smoser, I followed that backdoor you wrote and I couldn't get it to work.
[20:31] from here: https://maas.ubuntu.com/docs/troubleshooting.html#debugging-ephemeral-image
[20:31] http://paste.ubuntu.com/7582798/
[20:32] well, the 'sudo' line should show output, if it does, then you may have to restart tgtd, but i'm not sure you should have to.
[20:32] then it should be good.
[20:32] designated, ^ try that modification in the paste above.
[20:33] you can also add 'timeout' there like this
[20:33] http://paste.ubuntu.com/7582817/
[20:35] ok I added 'max_wait': 10 * 365 * 24 * 60 * 60, to /usr/lib/python2.7/dist-packages/maasserver/compose_preseed.py
[20:36] right.
[20:36] and you may likely have to restart maas-pserv to make that take effect
[20:44] smoser, what is the preferred method of restarting maas-pserv?
[20:44] smoser, is service maas-pserv restart acceptable?
[20:45] yeah
[20:45] should be good.
[20:45] i think it's pserv.
[20:46] but i might just restart everything in /etc/init/maas*
[20:46] just because i don't know and don't want to deal with that not being it
[20:48] so what does this change do besides increase the wait time to a large number?
[20:48] that's really it.
[20:49] you probably could have changed it to 60*60*6+30
[20:49] (6 hours and 30 seconds)
[20:49] smoser, okay
[20:50] the problem is how cloud-init waits.
[20:50] it reads clock, tries, reads clock, and determines how much time has passed.
[20:50] if the first attempt fails because of bad clock
[20:50] and then clock gets fixed (and jumps ahead 6 hours)
[20:50] then it will think the amount of time it was supposed to wait total has passed.
[20:53] designated, you really should see *something*, some kind of warning on the console. or if you get in, and collect /var/log/cloud-init* then you'll see a WARN that will direct us appropriately i think.
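The waiting problem described above can be reproduced with a small simulation. This is a simplified model, not cloud-init's actual code: `wait_for_success`, `JumpingClock`, and `simulate` are hypothetical names, and the 120-second limit is an arbitrary illustrative value. The only figures taken from the conversation are the six-hour jump and the suggested `60*60*6+30` wait.

```python
def wait_for_success(attempt, wall_clock, max_wait, sleep):
    """Retry attempt() until it succeeds or max_wait seconds appear to pass.

    Elapsed time is measured with wall_clock(), so a clock that jumps
    forward between attempts looks like a very long wait.
    """
    start = wall_clock()
    tries = 0
    while True:
        tries += 1
        if attempt():
            return True, tries
        if wall_clock() - start >= max_wait:
            return False, tries
        sleep(1)


class JumpingClock:
    """Fake wall clock that can be 'fixed' by jumping forward."""

    def __init__(self, start, jump):
        self.now = start
        self.jump = jump

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds

    def fix(self):
        self.now += self.jump


def simulate(max_wait, jump=6 * 3600):
    """First attempt fails, then the clock is corrected by `jump` seconds."""
    clock = JumpingClock(start=1401801931, jump=jump)
    calls = []

    def attempt():
        calls.append(clock.time())
        if len(calls) == 1:
            clock.fix()  # e.g. NTP corrects the clock after the first failure
            return False
        return True  # a second attempt would succeed with the fixed clock

    return wait_for_success(attempt, clock.time, max_wait, clock.sleep)


if __name__ == "__main__":
    print(simulate(max_wait=120))            # (False, 1): gives up after one try
    print(simulate(max_wait=6 * 3600 + 30))  # (True, 2): survives the jump
```

With a short limit, the six-hour jump uses up the whole budget after a single failed attempt, which matches "hit the timeout and never try more than once". Measuring elapsed time with a monotonic source such as Python's `time.monotonic()` instead of the wall clock would avoid the problem without inflating `max_wait`.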
[21:02] smoser, right now it's just sitting at initramfs
[21:02] not doing anything
[21:04] that's iscsi.
[21:04] it's just trying to get into its root.
[21:13] smoser, now it's just sitting at "(initramfs) [ 29.209404] random: nonblocking pool is initialized"
[21:14] would seem unrelated.
[21:14] it shouldn't sit that long. unless tgtd didn't come back up or something.
[21:14] i can try rebooting the maas server
[21:14] designated: you will need to run "service apache2 restart" for that preseed change
[21:15] bah. sorry.
[21:16] okay, I restarted apache, rebooting node now
[21:22] still hanging at the same point. is there a way to cancel the commissioning task for this server and restart it?
[21:37] designated: it would just restart the server
[21:37] designated: looks like an iscsi issue then
[21:37] designated: sudo service tgt restart
[21:41] i restarted the maas server, and then booted node marked for provisioning, it got past that part but right back to oauth failure.
[21:41] ERROR 2014-06-03 15:36:37,245 maasserver ################################ Exception: 'Expired timestamp: given 1401809828 and now 1401831397 has a greater difference than threshold 300' ################################
[21:42] smoser: ^
[21:42] i followed the instructions provided here: https://maas.ubuntu.com/docs/troubleshooting.html#debugging-ephemeral-image to try and add the backdoor account that Scott Moser wrote but it still does not allow me to log into the node directly so I cannot access the logs on the commissioned node.
=== CyberJacob is now known as CyberJacob|Away
[21:58] smoser, finally got into the node. this is the WARN message from /var/log/cloud-init.log
[21:58] 2014-06-03 15:37:08,873 - DataSourceMAAS.py[WARNING]: Setting oauth clockskew to 21568
[23:14] under /var/lib/maas/boot-resources/current/amd64/generic/trusty/release/ I have a root-image and a root-image.dist, both 1.4GB.
Can anyone tell me the difference between the two and which one gets used to commission nodes?
[23:33] smoser, when I log into the node that is supposed to be commissioning, the "date" command shows it's using UTC instead of local time.