[00:00] and other packaging bits
[00:00] I'll try again
[00:01] what upstart file?
[00:01] http://paste.ubuntu.com/1228509/
[00:01] you're talking about that change?
[00:02] the file is not being written at all
[00:02] that change there would have no affect on it being written.
[00:02] please wait
[00:02] (and just random bit of information,
[00:02] sh -c 'echo -n foo > /tmp/bar; read a < /tmp/bar; echo $?;'
[00:02] )
[00:03] that does return '1'.
[00:03] if you dont have a trailing new line
[00:03] its annoying.
[00:03] that would explain a lot
[00:04] sigh, it's now impossible to use mouse select on konsole when byobu is running
[00:04] its dirty magic, that
[00:04] somehow tmux actually has "mouse support"
[00:04] i was confused by this once too.
[00:05] its doing something magic with your mouse
[00:05] it seems to be when the clock ticks over each second
[00:05] the selection is reset
[00:05] ah.
[00:05] well, byobu-config
[00:05] and you can turn the clock off
[00:05] yeah
[00:05] or highlight faster
[00:06] I need more bandwidth
[00:06] interesting.
[00:06] (use canonistack)
[00:06] interesting...
[00:06] can't do that with my test hardware here
[00:06] i'm seeing the "No boot images have beeen registered yet" error on maas again.
[00:06] I suspect because the celery stuff is not working
[00:07] (although i successfully booted off of said unregistered image and enlisted
[00:07] )
[00:07] there were some major routing changes
[00:07] and there was fallot
[00:07] fallout
[00:08] bigjools, i suppose you're lamenting the ephemeral images
[00:08] when you mention bandwidth
[00:08] right?
[00:08] no, quantal updates
[00:08] ah.
[00:08] well, can't really hep you there.
[00:08] I already run a caching proxy
[00:08] but i was just thinking i should write mirro utility
[00:08] which doesn't help on the first run :)
[00:09] (the maas.ubuntu.com/ only comes over https, so squid sucks)
[00:09] urgh
[00:09] and.... pulseaudio...
ffs
[00:09] but it'd be pretty easy to right a mirror of it.
[00:09] well, you probably can't report a bug on it because glib updated yesterday and your out of date.
[00:10] :)
[00:11] ok I am going to spin up a canonistack and install from scratch
[00:11] bigjools, just as another variable to play with, there are now daily images of both quantal and precise that "should work" (they've tested as working for me).
[00:11] i'm sure you've seen that script i use
[00:11] i dont run it as a script
[00:11] but i just copy and paste bits together
[00:12] hey, i'm just laying.
[00:12] i just registered a node
[00:12] commissioned it
[00:12] now on its node page i see "start node" grayed out
[00:13] ah. needed to add a key.
[00:13] which quantal image is best?
[00:14] just use latest I guess
[00:14] bigjools: Do you happen to know what else is needed to save a setting? http://paste.ubuntu.com/1229537/
[00:14] is there 2?
[00:14] daily is what i meant.
[00:15] roaksoax: I think that should work
[00:15] bigjools: it doesn't :(
[00:15] your label should say series not release :)
[00:15] alright. i'm out for the night.
[00:15] cheers smoser
[00:16] roaksoax: ummm
[00:27] roaksoax: I'm not sure, sorry. Django is full of traps and I hate it. I'd need to ask rvb
[00:29] bigjools: yeah i was planning to ask him
[00:30] gah byobu
[00:30] can't mouse select even with its clock off
[00:31] roaksoax: I can't install maas on a clean canonistack instance, it's blowing up at the db config stage saying there's no pg clusters
[00:32] and when the debconf message appears it goes over the whole screen obscuring the error underneath >:(
[00:35] bigjools: pastebin?
[00:35] roaksoax: any ideas? http://paste.ubuntu.com/1229558
[00:36] bigjools: is postgresql installed and running?
[00:36] bigjools, i told you. its magic with your mouse.
[00:36] roaksoax: installed but not running
=== matsubara is now known as matsubara-afk
[00:36] smoser: it is :/
[00:36] http://awhan.wordpress.com/2012/04/18/tmux-copy-paste-with-mouse/
[00:36] the solution is to use the shift key.
[00:36] thanks
[00:37] (note, i wasn't just holding out, i didn't remember. i think i asked kirkland once)
[00:37] smoser: not working doing that either
[00:37] hm..
[00:44] roaksoax: so installing postgresql on its own doesn't set up a cluster and it's not running afterwards
[00:51] bigjools: to me it looks that postgresql is not running
[00:51] roaksoax: it's not - I had to manually create a cluster
[00:51] then everything works
[00:52] bigjools: ah! so maybe there's an issue with postgrsql package?
[00:52] possibly
[00:52] easy to re-create
[00:52] * bigjools debugs non-working dhcp
[00:53] bigjools: it doesn't look like postgresql actually. The last package was uploaded long time ago: https://launchpad.net/ubuntu/+source/postgresql-9.1/9.1.5-2
[00:54] dbconfig-common hasn't been updated either
[00:54] something odd is going in
[00:54] on
[00:54] so there's some other reason
[00:54] maybe fire up another instance, update, upgrade
[00:54] and try again?
[00:54] I just started a new quantal instance, sudo apt-get install maas
[00:54] and it does it
[00:55] roaksoax: there's another bug in that installing maas-dhcp doesn't terminate the existing dhcp
[00:55] or prevent it running at all
[00:56] bigjools: yeah I saw the report but haven't been able to take care of any packaging related activities today
[00:56] bigjools: btw..
we need a way to disbale dhcp from the command line, i/e maas-provision disbale-dhcp
[00:57] roaksoax: ok
[00:57] so that when maas-dhcp is recmoved, or reconfigured to not have it enabled, it does not stay up and running
[00:57] roaksoax: we're going to change it so the maas-dhcp package doesn't configure dhcp at all, just installs packages
[00:57] it's really screwing up things with the way we want to configure dhcp otherwise
[00:58] bigjools: that's fine too
[00:58] it would makes things simpler
[00:58] completely
[03:13] bigjools: i also saw that issue once
[03:13] didn't see it again though
[03:13] roaksoax: the rabbit one?
[03:13] bigjools: yeah
[03:17] hi jam
[03:17] hi jtv I mean
[03:17] Hi bigjools
[03:17] Being greeted is so much better as a way to start your day than a corrupted filesystem.
[03:17] What a shame the corrupted filesystem was here first!
[03:17] bugger
[03:18] Well, it's fixed now. But I used to get this as a recurring problem, so it doesn't bode well.
[03:21] Problem with the nvidia driver we suspect.
[03:24] roaksoax: still there?
[03:24] jtv: damn
[03:24] all ok now?
[03:24] For now, yes.
[03:26] roaksoax: in my maas-cluster-controller.postinst I am doing a db_get to get hold of the url to maas but only on configure/reconfig at the moment, but this means that in normal install/upgrade circumstances it doesn't ask for it and breaks the end config
[03:26] * roaksoax is her unfortunately
[03:26] I am hoping that I can ask for it in installation as well without violating any rules
[03:26] roaksoax: :)
[03:27] bigjools: uhmmmm
[03:27] jeeepers, Australia has bug butterflies
[03:27] big*
[03:27] * roaksoax is doing homework again :)
[03:28] let me see
[03:29] bigjools: right, so there's something wrong there
[03:29] roaksoax: what are the rules on using db_get?
[03:30] bigjools: let me look for an example
[03:30] can't ask during upgrades IIRC
[03:30] bigjools: remember, it is not an upgrade
[03:30] bigjools: it is a new install
[03:30] bigjools: so it *will* have to ask
[03:30] roaksoax: that's not true
[03:30] the new maas pulls it in
[03:31] seed cloud case
[03:31] bigjools: so if it pulls it in, what are you trying to achieve with a db_get ?
[03:31] roaksoax: in other circumstances it is a stand-alone app running somewhere else
[03:31] it has to ask
[03:32] but in the seed-cloud case I am wondering if we can just pull the URL out of maas's dbconfig
[03:32] bigjools: hold on hold on
[03:32] * bigjools holds
[03:33] bigjools: so 1. i install maas-cluster-controller and I should receibe a question asking me to input the maas-url ? (this maas-url is the region controller?)
[03:33] roaksoax: yes
[03:33] and yes
[03:33] bigjools: ok, so what's seed cloud again?
[03:33] roaksoax: the original 12.04 package
[03:34] maas becomes a virtual package and pulls in region & cluster
[03:34] bigjools: right, so the workflow is that every time maas-cluster-controller is installed for the first time, it should request the admin the URL right?
[03:34] roaksoax: yup
[03:35] roaksoax: also
[03:35] bigjools: ok so an upgrade from previous maas is the same case
[03:35] it needs to ascertain it on upgrade from old maas
[03:36] bigjools: right, so we can grab that from celeryconfig
[03:36] which is maas/default-maas-url
[03:36] bigjools: rigth, so hold on
[03:36] * bigjools holds on
[03:36] bigjools: normally.... eveyr time maas-cluster-controller is installed for the first time it should ask for the URL
[03:36] every time except when upgrading from the 12.04 maas :)
[03:37] it doesn't matter if we are upgrading from 'maas' to maas-{region,cluster}-controller
[03:37] oh?
[03:37] the package is installed for the first time
[03:38] so it *should* ask for it, don't you think?
[03:38] so when someone with maas installed on 12.04 upgrades to 12.10, I thought it was bad to ask questions
[03:38] i don't know to what extend it is bad... but TBH.. we are not free from not asking them
[03:39] I think we are though, because it will be in maas/default-maas-url
[03:39] bigjools: i mean... I recall having had to answer questions of this sort before
[03:39] bigjools you could ask debconf db for the value, sure
[03:39] so if I can work out that we're upgrading from old maas...
[03:39] I don;t know how to work that out
[03:39] bigjools: we could, I don't know whether is a sane thing to do though
[03:40] in the way of policy compliant
[03:40] as in quering debconf db for another's package db
[03:40] ok
[03:40] perhaps grep celeryconfig then
[03:40] bigjools: that's not all
[03:41] bigjools: consider that if we still all-in-one MAAS server from the CD installer, we should not ask that question
[03:41] * bigjools 's head explodes
[03:42] bigjools: so looking at celery might be agood idea, but what happens if cluster-controller gets installed *before* region-controller
[03:42] if celeryconfig is not there, we're not upgrading so we have to ask
[03:42] bigjools: right, but what if we don't know it?
[03:43] bigjools: as in the administrator doens't know it
[03:43] there's the $64m question
[03:43] bigjools: i'll have to think about that though
[03:44] bigjools: but anyways, the db_get and stuff
[03:44] bigjools: see at maas-dhcp.config that will show you how the question is asked
[03:44] bigjools: basically, the .config should ask the questions
[03:44] db_input high maas-dhcp/maas-dhcp-range || true db_go
[03:44] db_input high maas-dhcp/maas-dhcp-range || true db_go
[03:44] err
[03:44] db_input etc etc
[03:44] db_go
[03:45] and in .postinst
[03:45] db_get should be used to obtain the value
[03:45] I already do this
[03:45] but only on configure
[03:45] or reconfigure
[03:45] it doesn't seem to get called on install
[03:46] oh
[03:46] bigjools: so look at region-controller.postinst for what I do in reconfigure
[03:46] I am missing a .config!
[03:46] to set the new value
[03:47] yeah
[03:48] roaksoax: I still think this is a problem
[03:49] for dist upgrades
[03:49] indeed
[03:49] we'll have to get them tested very well
[03:49] we not only have to consider dist-upgrade,s but also CD installs
[03:49] in maas.config it avoids asking questions on install
[03:50] bigjools: that's a hack for dbconfig-common
[03:50] I don't know how to detect all these conditions
[03:50] bigjools: bigjools and to not ask for default-maas-url, becuase we were "attempting" to determine it automatically
[03:51] so maas.postinst was a special case because they didn't want dbconfig-common questions being shown on install
[03:51] nor ask for the IP MAAS used
[03:51] so we made a first attempt to determine it
[03:51] that is why often times
[03:51] we have to run a reconfigure
[03:51] ok
[03:51] to set the IP of the internal network or network that is connected to the clients
[03:52] bigjools: we cannot just do that on cluster-controller because is a completely different case
[03:52] indeed
[03:52] unfortunately :(
[03:52] look, I'll set it up now so it asks questions so I am unblocked, we
can fix it later
[03:53] It's almost in a usable state nw
[03:53] now
[03:53] awesome!
[03:53] i'll play with it tomorrow
[03:53] well it installed
[03:53] that's a start :)
[03:53] I just have this config proboem
[03:53] bigjools: hehe, well you got the easy part
[03:53] :)
[03:53] :)
[03:53] separating is much easier than packaging from scratch :)
[03:53] I've never made a package before
[03:54] i suffered hard when i first packaged maas :)
[03:54] yeah well I cobbled up the first revision of packaging too, because I just used a tool to do it :)
[03:54] heheh
[03:54] well that's how you learn
[03:54] whose name I forget
[03:54] hands on experience
[03:54] totally
[03:54] anyways
[03:54] i'm off
[03:55] good night, and thanks roaksoax
[03:55] have a good rest of the day
[03:55] yeah, lunchtime is here
[04:34] what does maas do if an api user asks for a non precise/quantal series?
[04:38] jtv: why is the nvidia driver corrupting your filesystem ?
[04:40] jam: I don't know if it is. But the last time I had regular filesystem corruption on this system, William had the same problem at the same time using the same driver. Plus we know the problems with nvidia...
[04:41] roaksoax, fwiw merged the branch, and fixed the broken test
[04:43] jtv: do you suffer machine lockups ?
[04:43] huwshimi: I'd like to chat with you quickly about the UI changes for hardware constraints. We've done most of the backend stuff, but I'm not 100% sure where to put the frontend changes.
[04:44] jam: Sure
[04:45] lifeless: no, no lockups. I suspect memory corruption just hitting the filesystem because the filesystem keeps so much data in memory.
[04:45] Lots of crashes, of course, and no unity.
[04:45] But that may be normal unrelated beta fare.
[04:45] machine crashes ?
[04:45] spontaneous reboots/OOPSes ?
[04:47] jam: Hangout/irc?
[04:48] huwshimi: well, IRC is fine for me, but we could do hangout, too.
[04:48] jam: IRC is fine.
[04:49] huwshimi: so I think the basic idea is to expose: cpu_count/mem/arch/tags to the Node page itself
[04:49] at least as a first step
[04:50] jam: Yup.
[04:50] huwshimi: so what do I actually edit on my end to make sure that stuff gets exposed for you.
[04:51] is it updating the 'views' or the 'forms' or the HTML template code? (All of the above?)
[04:52] hazmat cool thanks
[04:52] allenap: around already¿
[04:52] jam: You'll need to expose the hardware info in NodeView (src/maasserver/views/nodes.py)
[04:53] (I think that's the right place)
[04:53] jam: I can then expose them on the html side
[04:54] huwshimi: k
[04:55] jam: I'm not sure about the specific implementation, but it will be something along the lines of adding the data to the context for rendering (I'm not sure how familiar you are with Django).
[04:56] huwshimi: not very, but I'm pretty good at 'google: django XXX' :)
[04:56] hehe
[04:56] jam: You can test it by outputting the same variable names somewhere in the template (with {{ var_name }})
[04:57] roaksoax: allenap only went to bed an hour ago :)
[04:58] heh :)
[04:58] roaksoax: got the dbconf working for now
[04:58] jam: Also, somewhere there will be a place to test the view is returning the correct data.
[04:58] needs fixing for upgrade + CD install cases but it's a starty
[04:58] awesome
[04:58] indeed
[04:58] alright illntest tomorrow
[04:58] typos R us today
[04:59] jam: Ah, src/maasserver/tests/test_views_nodes.py and you can test the client response contains the vars
[05:00] huwshimi: thanks
[05:01] lifeless: not experiencing reboots, no. Individual programs crash.
[05:03] jtv: ah, how annoying
[05:03] jtv: anything in dmesg ?
[05:03] What sort of thing are you thinking of?
[05:04] (I also had to switch to KDE for the duration, so a poor comparison base)
[05:04] you sometimes get warnings on bad memory reads
[05:04] Oh
[05:04] or crashes in kernel threads
[05:04] The last time I had the FS corruption problem I did a full memory check.
Showed no problems.
[05:05] Nothing in dmesg that jumps out at me.
[05:06] I'll have a closer look on my next few crashes.
[05:10] bigjools: how about I remove the uuid option from start-cluster-controller but add the uuid generation/loading in a separate branch?
[05:10] jtv: sounds good
[05:10] ok
[05:12] sorry I missed your reply to the MP, I'm not checking email religiously ATM
[05:13] jtv: python-netifaces!!!!
[05:13] Yeah
[05:14] jtv: we should check with rbasak that it works on arm
[05:53] bigjools: mgz and I won't make it to the standup today. He is on a train going to London, and I have to go pick up my son from school (it is an early-out day).
[05:53] ok thanks for letting me know
[05:54] We are essentially polishing the Tag api a bit, I'm looking to expose the new fields for the HTML, and working up a list of things for Diogo to test.
[05:54] I'm also personally running into 'make run' doesn't work on my Quantal VM
[05:54] Diogo is going to be a busy man
[05:54] what's happening?
[05:54] Well, a lot of our testing can be automated, I think. Which is meant to be done as part of the Jenkins w/ packages, right?
[05:55] bigjools: well, something starts up, but then I get proxy errors if I try to connect to the web ui.
[05:55] I'm also wondering if it is related to my other rabbit failures.
[05:55] I can connect to port 5240, but 5243 (?) is failing to respond
[05:56] I've had this before
[05:56] 'bin/test.maas' is failing a Rabbit test, and hanging on another one.
[05:57] and 'make run' doesn't seem to be bringing up a service, but it is hard to sort through the chatter to figure out what is actually wrong.
[05:57] jam: manually starting the regular rabbit service didn't work?
[05:57] jtv: it is already running
[05:57] restarting it didn't help
[05:57] :(
[05:58] Just now I did uninstall and re-install the rabbitmq-server, but it still failed in the test suite.
[05:58] jam: did you try ctrl-C'ing the test to see the traceback?
[05:59] jtv: I can get that, (which is how I know it is a rabbit test)
[05:59] I can paste it for you.
[05:59] In 'ps' I do see a beam.smp process running as a child of the test runner.
[06:00] jtv: http://paste.ubuntu.com/1229812/
[06:02] 5243 is longpoll isn;t it?
[06:03] jtv, bigjools: It looks like I'm getting a 'logs/webapp/current' message which is lockfile.LockTimeout
[06:03] So maybe there is a stale lockfile somewhere?
[06:03] where would that be?
[06:03] in /run
[06:03] Oh, the /run/maas.*
[06:03] jtv: no files named ./run/maas
[06:03] only named exists there
[06:04] no .
[06:04] /
[06:04] /run/maas.*
[06:04] They have a tendency to stick around, don't they?
[06:04] database one sometimes does for me
[06:04] bigjools: /run is owned by root, and there is no '/run/maas' directory
[06:04] are we supposed to be doing 'sudo make run' ?
[06:04] no
[06:05] hmm
[06:05] what's going on
[06:05] Not /run/maas/, as I recall, but /run/maas.*
[06:06] jtv: ls /run/ma* no such file or directory
[06:06] /run/lock maybe?
[06:06] That's a directory.
[06:06] jtv: lots of /run/lock/maas* files
[06:06] just nuke those?
[06:06] ah that's it :)
[06:06] Better make sure their associated processes are dead.
[06:07] jtv: no python processes running that are 'maas' related.
[06:08] Doesn't have to be python IIRC.
[06:08] The files contain pids, don't they?
[06:08] however, I still get 'Service Temporarily Unavailable' on localhost 5240
[06:08] jtv: have we got a quick way of seeing if a string matches an enum value?
[06:08] bigjools: map_enum
[06:08] jtv: well those files don't exist anymore now... :)
[06:08] You turn the enum into a dict, then see if the string is in there.
[06:08] jtv: and looking at it now that they exist again, they are all 0 length files.
[06:08] no pids
[06:08] Erm. Haven't had these problems for a while so don't recall which processes they were.
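The enum check described at 06:08 (turn the enum into a dict, then look the string up) might look roughly like the sketch below. This is not MAAS's `map_enum`, whose implementation is not shown in the log; `enum_to_dict`, `is_enum_value` and the `SERIES` class are hypothetical names used only for illustration, and the enum is assumed to be a plain class with uppercase attributes.

```python
def enum_to_dict(enum_class):
    """Return {name: value} for the public, non-callable attributes of a class.

    Assumption: the "enum" is a plain class whose members are class
    attributes, which is a guess at the style, not taken from the log.
    """
    return {
        name: value
        for name, value in vars(enum_class).items()
        if not name.startswith("_") and not callable(value)
    }


class SERIES:
    """Hypothetical enum, for illustration only."""
    PRECISE = "precise"
    QUANTAL = "quantal"


def is_enum_value(enum_class, candidate):
    """True if `candidate` equals one of the enum's values."""
    return candidate in enum_to_dict(enum_class).values()


if __name__ == "__main__":
    print(is_enum_value(SERIES, "quantal"))
    print(is_enum_value(SERIES, "lucid"))
```

Checking against `.values()` answers "does this string match an enum value"; checking against the dict itself would instead test the member names.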
:(
[06:09] Try ps -ef | grep maas
[06:09] (The "maas" is often in surprising places IIRC, such as the path for the executable; you're not looking for executables called maas)
[06:10] jtv: only postgres and my editor
[06:11] See any named running?
[06:11] jtv: no, because the install recommended disabling named as part of installing bind
[06:12] (I can run it in my vm, though)
[06:14] Just checking -- it's one of those processes I found myself killing a lot when I had lock-file problems.
[06:14] jtv: now I have named running, startup still seems to be not working properly: http://paste.ubuntu.com/1229823/
[06:15] Note that the lock file that seems to be going stale is "maasserver.start_up.lock'
[06:15] I see that there *is* such a process
[06:15] Maybe it's not stale, just blocked... What were we supposed to be running on 5423?
[06:16] (because that file has the same inode as link-vm.MainThread-29249)
[06:16] which is an existing pid
[06:17] Hmm... the API I guess. I thought that'd be on 5240.
[06:18] jam: I guess apache could be implicated then.
[06:19] jtv: ./logs/webapp/current still has a timeout trying to grab the FileLock. However, I see that the file exists on disk, and if I delete it, I still get a failure.
[06:20] Sounds as if the lingering lock is a symptom, not the cause.
[06:20] In other words, maybe it's not stale, just blocked on something.
[06:22] jtv: my immediate thought is that the timeout passed to LockTimeout is being misinterpreted. The docs say it is 'seconds', but what if it is 'milliseconds'
[06:23] hmmm... the code looks to be using time.timeout, which seems like it would be fine.
[06:23] sorry, time.time() which definitely should be seconds.
[06:32] jam: any luck with that timeout? I'm guessing you're experimenting with it.
[06:33] And by the way, no idea whatsoever whether it matters here, but I believe timestamp granularity in ext3 seems to be one second. So your filesystem _might_ be one of those little things to check.
[06:34] bigjools: I'm hoping to write a cluster controller's uuid to a file in /var/lib/maas, but then we'll have to deal with distribution once we start worrying about cluster HA.
[06:35] jtv: let's cross that when we come to it
[06:35] All for that. Thanks.
[06:35] jtv: well it isn't doing a stat on the fs. what is weird is that it appears to be creating the unique name, hard linking it to the lock file it wants, and then failing with a timeout after it all was successful
[06:35] What is "it" by the way?
[06:39] jtv: lockfile.FileLock seems to be raising the LockTimeout exception, even though it managed to really acquire the lock.
[06:40] Urh
[06:40] jtv: however, that is a falsity, because nothing is actually writing to logs/webapp/current, it must be stale data.
[06:40] Weird clock skew problem?
[06:40] (I nuked the files, and started again)
[06:40] Or is it doing some post-lock check that's failing, but not showing up in strace?
[06:40] I do see an error in 'managed-keys-zone ./IN: loading from master file failed: file not found'
[06:41] jam: hunch... could you have a look at your syslog to see if apparmor is throwing a spanner in the works?
[06:43] jam: oh, on an unrelated topic -- we have a "make lint" target similar to launchpad, except it doesn't care whether changes are committed or not. Please spread the word that it should be run religiously!
[06:43] jtv: named complaining that the socket is already in use?
[06:43]
[06:43] jtv: yeah, I've been trying to do so
[06:43] jam: hnyargh... we've run into that
[06:43] This is a fresh trunk, not a PPA build, right?
[06:44] jtv: well, my branch, but yeah 'trunk-ish'
[06:44] Good enough
[06:44] jtv: stracing the 'runserver' pid is showing it blocked on recv(6,
[06:44] that sounds like it is listening
[06:45] jtv: any way to debug what filehandle '6' is?
[06:45] Any idea where that fd goes?
[06:45] lsof
[06:46] jtv: localhost:58371 which seems to be an ampq?
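The `lsof` lookup suggested at 06:45 can also be approximated by reading `/proc` directly. The sketch below is Linux-only and is not what was run in the conversation; `describe_fd` is a hypothetical helper name.

```python
import os


def describe_fd(pid, fd):
    """Return the target of a process's file descriptor, read from /proc.

    Linux-only. Regular files come back as a path; sockets come back as
    'socket:[inode]', which a tool such as lsof then maps to endpoints.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


if __name__ == "__main__":
    # Demonstrate on our own process with a descriptor we just opened.
    with open(os.devnull) as handle:
        print(describe_fd(os.getpid(), handle.fileno()))
```

Reading another user's process this way needs the same privileges `lsof` would need, so for the blocked `runserver` pid it would have to run as that user or as root.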
[06:47] TCP localhost:58371->localhost:amqp (ESTABLISHED)
[06:48] jtv: so it looks like that is waiting for ampq to tell it to do something, but it isn't getting anything
[06:48] which sounds related to 'rabbit is br0ken on this machine'
[06:49] yeah
[06:49] funny thing is, it's a synchronous exchange.
[06:50] request-wait-receive
[06:50] You'd expect that to break if there were a serious problem with rabbit.
[06:51] I saw 2 beam processes running, and service stop rabbitmq killed only one of them, and caused a traceback in the terminal
[06:52] maybe one ate the other one's message
[06:52] If rabbitmq-server is stopped
[06:53] then I get a socket.error tracebakc in 'write_full_dns_config'
[06:53] which seems to be the thing wanting to talk to rabbit
[06:57] write_full_dns_config? Isn't that one we replaced?
[06:57] * jtv checks
[06:57] ah no, I'm thinking of dhcp config.
[06:57] jtv: so from what I can tell, the service is hanging on startup trying to tell rabbit to write the dns configuration, and waiting for rabbit to respond.
[06:58] if I don't have rabbimq-server running, webapp crashes immediately.
[06:58] If I do have it running, it is hung in 'recv()'
[06:58] Trying to get a confirmation from rabbit that it's got the message, I guess
[06:59] (The synchronous exchange I described earlier was from the traceback you pasted, in the rabbit test. May not be exactly the same situation.)
[06:59] Any logs from rabbit?
[07:00] "accepting AMQP connection ..."
[07:00] but nothing else.
[07:00] though it accepts an awful lot of them.
[07:01] 'disk free space limit now exceeded' ?
[07:01] I have 800MB free, but it seems to have a warning at 1GB?
[07:11] jtv: ugh. celeryd has a traceback of 'IOError(ENOENT)' trying to read the DHCP_LEASES_FILE
[07:11] Hhnnnny
[07:11] I have the feeling the celery worker is dying trying to check if the file is current.
[07:12] so it isn't responding
[07:12] and the main thread is then just hung waiting for it to return.
[07:12] It should be trying to parse that file when it gets the right task.
[07:12] Oh!
[07:12] And we have it set to keep retrying, don't we?
[07:13] We never really decided how to deal with that sort of thing.
[07:13] I mean, we decided to set the jobs to keep retrying, but no clear rule for how/when to give up.
[07:15] jtv: from what I can tell, it is trying to load the file, and then process updates and re-write the file
[07:15] but the file doesn't exist
[07:15] so it just dies trying to load it.
[07:15] We don't rewrite the leases file as such... dhcpd does that.
[07:15] well, I also don't even have a /var/lib/maas directory
[07:15] so something isn't creating it.
[07:17] hmm.. I did not have 'maas-provisioining' installed.
[07:17] still no /var/lib/maas
[07:27] * jam off to go get my son.
[07:28] bigjools, jtv: what do I need to test works on ARM?
[07:29] can anyone run lint on quantal?
[07:29] bigjools: we had some lint earlier, but lint worked
[07:29] rbasak: python-netifaces package
[07:29] rbasak: a python package... hang on...
[07:29] ah yes that
[07:29] would make jtv happy I reckon
[07:30] jtv: I get this:
[07:30] ImportError: /home/ed/canonical/maas/sandbox/lib/python2.7/lib-dynload/parser.so: undefined symbol: _PyNode_SizeOf
[07:30] make: *** [lint] Error 123
[07:31] I think I had something like that a few days back but "make distclean build" fixed it.
[07:32] It works for me in trunk.
[07:32] distclean.... weep
[07:33] "Have you tried turning it off and on again?"
[07:34] that fixed it. Stale .so/.pyc file I suppose
=== rvba is now known as Guest78352
[08:29] bigjools, jtv: python-netifaces (out of pypi using easy_install) appears to work fine on armhf/highbank: https://pastebin.canonical.com/75431/
[08:30] rbasak: thanks! But... out of pypi? Not the package?
[08:30] I didn't see a package
[08:30] ah
[08:30] OK I see it now
[08:30] My hacked ARM MAAS install doesn't set up sources.list right :-/
[08:31] Now...where's easy_uninstall?
[08:31] :)
[08:31] * rbasak finds rm -Rf
[08:32] Execuse me while I go through 2-factor auth...
[08:32] I have a yubikey on the side of my monitor
[08:32] The hard part is the first factor because I don't know what my password is
[08:33] OK so from the package: https://pastebin.canonical.com/75432/
[09:10] _rvba: you mentioned WORKER_UUID... wouldn't CLUSTER_UUID make sense as a name for the UUID setting?
[09:11] <_rvba> jtv: yeah, CLUSTER_UUID is probably better.
[09:11] OK
[09:11] I guess I'll have to set it both in the global and in the local config, and then postinst will override the latter.
[09:12] I'm initializing it to None, not the empty string, to avoid any risk of mistaking the initial value for a proper uuid.
[09:13] <_rvba> Ok.
[09:18] _rvba: I've been wondering... do we also need to write queue definitions in the celery config?
[09:18] <_rvba> jtv: no, the queues will be auto-created by celery.
[09:18] Sweet.
[09:18] <_rvba> # Each cluster should have its own queue created automatically by Celery.
[09:18] <_rvba> CELERY_CREATE_MISSING_QUEUES = True
[09:18] <_rvba> jtv: ^
[09:19] Perfect.
=== _rvba is now known as rvba
[09:49] rvba: you did it!
[09:49] jtv: yep :)
[09:49] * jtv applauds
=== matsubara-afk is now known as matsubara
[13:19] rvba: howdy
[13:19] roaksoax: Hi.
[13:19] rvba: so I was looking to add to the Settings page the commisioning_distro_series
[13:20] rvba: so i did, but i think i'm missing something
[13:20] rvba: http://paste.ubuntu.com/1230392/
[13:20] allenap: howdy!
[13:21] allenap: there's an issue on the tftp server or so it seems...
[13:21] roaksoax: the 'initial' stuff should not be required I think, it will be picked up based on the config value.
[13:21] roaksoax: I heard that tests were failing but haven't had a chance to look any further. Is that what you're thinking about?
[13:22] rvba: that's the thing, it doesn't seem to be picking it
[13:22] allenap: nope, let me show you a log
[13:22] roaksoax: I can do that change to the Settings page for you.
[13:22] rvba: sure! I just wanted to figure out how is it that the value gets updated (the config value) based on what's selected on the UI
[13:23] roaksoax: especially if that can free up your time so that you can make progress on the packaging stuff ;)
[13:23] roaksoax: I'm sorry to insist like this, but this is really critical for us. We need it to be able to test and fix stuff.
[13:25] allenap: so the esceneario is simple. I added the squashfs image to the tftp path so that the installer can use it. However, this error appears: https://pastebin.canonical.com/75454/
[13:25] allenap: that's kinda critical btw :)
[13:26] rvba: hehe no worries, either way I'm still blocked on other stuff
[13:27] forgot to ask bigjools last night about the power settings update on commissionning
[13:27] roaksoax: ok, still, I'll do that change to the settings page.
[13:27] roaksoax: that's landed.
[13:27] roaksoax: the power settings update stuff I mean.
[13:28] rvba: nice! what's the rev?
[13:28] rvba: i see it
[13:28] 1089
[13:29] rvba: any examples of how to send the data back?
[13:30] roaksoax: sure, just one sec.
[13:30] smoser: http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/revision/1089
[13:31] roaksoax: you can have a look at test_signal_power_type_stores_params (src/metadataserver/tests/test_api.py) but it's simple: just send a jsonized version of the power parameter in the power_parameter param.
[13:35] rvba: awesome, thanks
[13:35] welcome
[13:36] roaksoax: The file is too big to be transferred over TFTP :-/
[13:39] allenap: right, but what's the size limit then?
[13:41] roaksoax: The maximum file size is (blocksize * 2**16), blocksize defaults to 512 bytes but can be negotiated up to 1428 on ethernet, so 93585408 bytes, or ~89MB.
[13:41] With negotiation it's 32MB.
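The size limit quoted at 13:41 is plain arithmetic on the block size and the 2**16 block count. A small sketch of that calculation, using only the figures given in the conversation (the function name is made up for illustration):

```python
def tftp_max_file_size(blocksize=512, block_count=2 ** 16):
    """Upper bound on a TFTP transfer: blocksize * 2**16, as quoted above."""
    return blocksize * block_count


# Default 512-byte blocks, i.e. the client did not request a larger blksize.
default_limit = tftp_max_file_size()
# Block size raised to 1428 bytes, the ethernet figure from the chat.
ethernet_limit = tftp_max_file_size(1428)

if __name__ == "__main__":
    print(default_limit, default_limit / 2 ** 20)
    print(ethernet_limit, round(ethernet_limit / 2 ** 20, 2))
```

The 512-byte default gives 33554432 bytes (32MB) and the 1428-byte block size gives the 93585408 bytes (~89MB) quoted above, so whether an image in between fits depends on whether the client asks for the larger block size.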
[13:41] s/With/Without/
[13:42] roaksoax: How big is the squashfs?
[13:42] Can it be transferred via HTTP instead?
[13:44] allenap: then something is wrong
[13:44] ubuntu@maas:/var/lib/maas/tftp/amd64/generic/quantal/install$ du filesystem.squashfs -B MB
[13:44] 64MB filesystem.squashfs
[13:45] roaksoax: It's not negotiating blksize then. That't not automatic; it's up to the client to request it.
[13:45] roaksoax: That's a *very* big file for TFTP. Is there any alternative?
[13:46] allenap: http..though we were trying to avoid using it
[13:46] dannf: ^^
[13:46] err
[13:46] sorry
[13:46] Daviey: ^^
[13:46] roaksoax: If it helps, it's trivially easy to add an HTTP server alongside the TFTP server, in the same process, so serving the same files.
[13:47] allenap: yeah I already did :)
[13:47] allenap: but we were trying to avoid having to do that
[13:48] roaksoax: I don't mean Apache or anything like that, I mean a few lines of Python to make the TFTP server also serve up the tree on HTTP. So, no extra packaging or anything.
[13:49] allenap: http://paste.ubuntu.com/1230452/
[13:49] allenap: that's what I did
[13:50] allenap: what is your idea?
[13:51] roaksoax: Right. That's part of the region controller. The TFTP server should be part of the cluster controller, though I'm not sure it is yet, because we don't have a story around mirroring /var/lib/maas/image/ to cluster controllers.
[13:52] allenap: i though all the tftp and preseeds where at the region controller
[13:52] roaksoax: My idea would be to start an HTTP service in the twistd process that the TFTP server runs in, to save on duplicating config.
[13:52] allenap: right, but that would mean we would have to use a different port
[13:52] roaksoax: Yeah, they are, but they shouldn't be. The TFTP server was always intended for the cluster controller.
[13:53] allenap: if we are running on all-in-one
[13:53] escenario
[13:53] roaksoax: Yeah, different port.
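The "few lines of Python" idea from 13:48 and 13:52, serving the same file tree over HTTP on a separate port, might be sketched as follows. Note the substitution: the chat proposes doing this inside the twistd process with Twisted, whereas this sketch uses only the standard library's `http.server`, and `serve_tree` is a hypothetical name. It illustrates the shape of the idea rather than the MAAS implementation.

```python
import functools
import http.server
import threading


def serve_tree(root, port=0, host="127.0.0.1"):
    """Serve the directory `root` read-only over HTTP in a background thread.

    port=0 asks the OS for any free port, which sidesteps clashes with
    whatever already listens on port 80. Returns the server object; the
    bound port is server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root)
    server = http.server.ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


if __name__ == "__main__":
    # The path is the TFTP tree mentioned in the chat; adjust as needed.
    server = serve_tree("/var/lib/maas/tftp")
    print("serving on port", server.server_address[1])
    server.shutdown()
    server.server_close()
```

As raised later in the conversation, a simple Python server may not match Apache for delivering large files at scale, so this is a convenience trade-off, not a performance recommendation.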
[13:53] allenap: right, but at the moment, where is it that the tftp files are? [13:55] roaksoax: care to review? https://code.launchpad.net/~rvb/maas/improv-settings/+merge/126689 [13:55] roaksoax: The TFTP server needs three things: static file tree (/var/lib/maas/tftp), templates (src/provisioningserver/pxe/config.*), and the URL for the region controller. [13:56] rvba: why are you removing the distro_series from the NodeForm? [13:56] rvba: we still need that [13:57] roaksoax: Templates are in the package, and the URL is config; the first of those, the boot files, isn't mirrored to the clusters, so we have to run on the region controllers until we do. [13:57] that's a different series from the commissioning one and the one in settings [13:57] roaksoax: ah, indeed. [13:57] allenap: and when is that going to be done? [13:58] roaksoax: No time soon, and it doesn't solve the problem of moving the squashfs over the wire. [13:59] allenap: right, so we also need to send something like: d-i live-installer/net-image string http://{{server_host}}/MAAS/static/image/{{node.distro_series}}/{{node.architecture}}/quantal-server-amd64.squashfs [13:59] on the preseed [13:59] allenap: and if it is not going to be done anytime soon [14:00] allenap: then i think we could avoid [14:00] having twisted running an extra HTTP server [14:00] roaksoax: That snippet of Apache conf and that update to the preseed sound like things that can be done fairly quickly. I don't see a blocker for those. [14:01] allenap: right, so when the cluster-controller serves the tftp itself [14:01] i think we can look into the HTTP stuff [14:01] because either way, we need to tell the preseed *where* the HTTP files are [14:01] allenap: and the preseed won't be able to determine which cluster controller they are in, right? [14:02] roaksoax: Yeah. Twisted can run it, or we can run Apache or whatever. [14:03] matsubara: hi -- I landed that new command for starting a cluster controller. 
Once you have packaging r98 and trunk r1091, maybe you could try it? I can't build packages at the moment for some reason. [14:03] roaksoax: The code that generates the preseed should know which node it's preparing it for, so it can look-up which cluster controller it should link to. [14:03] roaksoax, the signal thing looks reasonable. [14:03] Ah, sending up power settings to the metadata signal call? [14:04] sorry.. just reading above. [14:04] it would seem simpler to me to have maas run a http server [14:04] than to bother configuring apache [14:04] the only reason i'd be hesitant is that apache can deliver large files REAL FAST and at scale. i dont know how well simple python web server does for that. [14:04] (and i'd run the web server not on port 80) [14:05] jtv, will do [14:05] if i was doing it from twisted. that just avoids conflict with anything else. [14:05] roaksoax: field restored on NodeForm :) [14:05] matsubara: thanks! It should be just a matter of making sure that celery is not running, and then doing "maas-provision start-cluster-controller " [14:06] jtv, do you get any error when you try to request a build? [14:06] jtv, I just did and it seemed to work [14:07] I'm wondering if you're hitting the daily build limit [14:07] Oh, I haven't tried requesting on in Launchpad. Just locally from my own branches. [14:07] jtv, ah ok [14:08] allenap: ok, just tested downloading the image with tftp using a tftp-hpa server, and it worked like a charm [14:08] allenap: so there's an issue on the tftp maas is running [14:08] not client side [14:09] smoser: While we're restricted to running this from the region controller, let's use Apache. [14:09] And think again when that situation changes. [14:09] roaksoax: What did you download it with? [14:10] rvba: http://pastebin.ubuntu.com/1230497/ [14:10] allenap: 'tftp' [14:11] roaksoax: arg, the format-import script failed me, fixed. 
[14:12] roaksoax: Okay (there are two packages that provide /usr/bin/tftp, but that doesn't really matter). Does the download fail using tftp against maas's tftp server? [14:12] Or was the failure from pxelinux or something? [14:13] allenap: download fails agains maas tftp server, and download succeeds against tftp-hpa server [14:13] against* [14:13] allenap: the file i'm downloading is the squashfs image [14:13] in both servers [14:14] allenap: here's the file: http://cdimage.ubuntu.com/ubuntu-server/daily/current/quantal-server-amd64.squashfs [14:14] the one against the maas tftp server fails, the one against tftp-hpa succeeds [14:15] roaksoax: My guess is there's some option negotiation not happening. I'll see if I can debug it. [14:17] allenap: cool thanks [14:19] smoser: ok so how can we extend maas-signal to send the power settings? [14:20] roaksoax: Yeah, it's getting 32MB then failing. I have to go out now, but I'll be back in ~1h. [14:20] allenap: ok thanks [14:23] roaksoax, i can put that togetheri think. [14:23] can you get a json blob collected? [14:25] sure [14:27] mgz: poke [14:27] jam: wiggle [14:28] mgz: how's London? [14:28] when's the conference? [14:29] My one-hour trip to get my son turned into a 6-hour sojourn to Ikea with my wife, to pick out cabinets and curtains for the new house. [14:29] jam: not till 6:30, so, some hours yet [14:29] So I'm working late tonight. [14:29] ikea is all-devouring [14:30] mgz: I think I have 2 branches for review, but it isn't terribly critical vs making sure we have testable stuff. [14:30] On the flip side, I couldn't get 'make run' to actually work here, because rabbit or DHCP is not playing along nicely. [14:30] so, I have a few things to fix in my pending branches but otherwise we're good I think [14:30] So I wasn't able to get to the point to audit for exposing stuff to the view, or to make sure the CLI worked how I expected. [14:30] I wonder what make run does here... 
[14:31] django.db.utils.DatabaseError: relation "maasserver_config" does not exist [14:31] LINE 1: ..._config"."name", "maasserver_config"."value" FROM "maasserve... [14:31] hm... [14:31] well, apart from that it did things :D [14:31] mgz: read hacking, I think there is more stuff you need to install [14:38] smoser: http://paste.ubuntu.com/1230560/ ok so we would need to integrate that... how do you think would be the best option? [14:38] smoser: integrate it in maas-signal, use it separately? [14:38] i'd plan on having that invoke maas-signal [14:39] and we can modify maas-signal to handle "ipmi" "{json-blob}" or something [14:39] smoser: alright, one thing that the script needs is to create a temp file to avoid all of those bmc-config --commit commands [14:40] smoser: right, so get_maas_power_settings returns that "IPMI" "json-blob" [14:41] roaksoax and Daviey: please could you review my branches in https://bugs.launchpad.net/ubuntu/+source/maas-enlist/+bug/1056816 ? This needs to get SRU'd for ARM on MAAS to work. [14:41] Ubuntu bug 1056816 in maas-enlist (Ubuntu) "maas-enlist does not post subarch" [Undecided,Triaged] [14:41] smoser: now, another thing, do you think we should do this "if Network is set to static, change it back to DHCP and wait for an IP"? [14:43] not without explicit instruction i would think. [14:44] smoser: cause if it is static, we will set it to DHCP [14:45] right. i was just saying that that seems like a recipe for breaking something :) [14:45] roaksoax, the maas team is in need of packaging help too, right? [14:46] did you look at that? [14:46] smoser: i have been working with bigjools on the packaging [14:46] smoser: he did some more changes I have yet to test [14:46] ok. good. [14:48] rbasak: is the subarch stuff already in MAAS? 
[14:48] roaksoax: not yet [14:48] roaksoax: I need this in first before it can land in MAAS, since otherwise it'll break [14:48] roaksoax: by putting this in first, MAAS just ignores the subarchitecture field and doesn't break [14:48] rbasak: right, so I'd recommend this lands first in MAAS [14:49] rbasak: because the fact that it will land in MAAS doesn't mean it will be released in Quantal [14:49] roaksoax: no, that breaks maas [14:49] at least not yet [14:49] roaksoax: landing the maas-enlist change first doesn't break anything [14:50] rbasak: indeed. I will merge this with upstream maas-enlist [14:50] and upload from there [14:50] SRU will have to wait though [14:50] roaksoax: thanks! [14:51] roaksoax: SRU will have to wait on what? [14:53] rbasak: this is basically a new feature rather than a bugfix because currently, there's no MAAS in quantal that this can be tested against, hence not breaking enlistments from precise into quantal MAAS [14:54] so we cannot SRU a feature [14:54] we can SRU a bugfix [14:54] but this bugfix has no way to test it, until we have MAAS in archives [14:54] roaksoax: right, but for MAAS on ARM enablement this is required, since MAAS will be deploying precise. Unless we switch to using quantal ephemeral images? [14:54] We just need to make sure that precise MAAS isn't broken by this SRU [14:55] mgz: let me know if there is anything you need from me for review, etc [14:55] jam: maybe an eyeball on my current pending maas branch [14:55] rbasak: right, but that means currently, it doesn't fix any enlistment process because there's no new MAAS in precise [14:55] apart from that I'm okay I think [14:55] roaksoax: so what do you propose I do? [14:56] rbasak: I'll push it to quantal, and wait until we have a new MAAS to SRU [14:56] mgz: link? [14:56] rbasak: i'm presuming that the MAAS for ARM enablement is going to the SRU too right? [14:56] roaksoax: we can't have a new MAAS. 
I can't land the corresponding fix in MAAS without maas-enlist in precise updated [14:56] roaksoax: so if you want to wait we're deadlocked [14:57] rbasak: it's not about me wanting to wait, it's about that there's no test case [14:57] roaksoax: the test case for the SRU only matters for released precise MAAS, surely? [14:57] rbasak: we cannot say "install maas from archives, maas-enlist, things break" [14:57] roaksoax: what else are you worried about the SRU breaking? [14:58] rbasak: right, but SRU needs test cases proving that this *fixes* a bug in *precise* [14:58] roaksoax: SRUs for enablements are permitted too [14:58] roaksoax: I don't see any other option. You can't wait for support for this in MAAS trunk, since I can't land that since it'll break without the maas-enlist SRU in place. [14:59] roaksoax: so I don't see what you're proposing [14:59] rbasak: alright, so let's do this, I'll upload to quantal [14:59] roaksoax: apart from being deadlocked and this never landing [14:59] rbasak: and could you please prepare the SRU [14:59] and I'll sponsor your upload [14:59] roaksoax: SRU already prepared and linked in the bug [14:59] then it will be in the hands of someone over the release team [15:00] I've already built a PPA and tested on that [15:00] rbasak: https://wiki.ubuntu.com/StableReleaseUpdates?action=show&redirect=SRU [15:00] that's what I mean by preparing an SRU [15:00] I don't have an Intel machine to test with, but I can confirm that running curl in the same way that maas-enlist does with the architecture changed to i386 and subarchitecture generic does appear to enlist on precise packaged maas without any issues [15:01] roaksoax: oh, as in in the bug? [15:01] roaksoax: ok, no problem [15:01] bug description I mean [15:01] yeah :) [15:01] roaksoax: is the package OK? [15:01] jam: [15:02] rbasak: yeah it looks good. Note that archdetect corresponds to the one of the dpkg installed [15:03] mgz: ugh. 
looks like bigjools' change to subscribers means I'm not getting code review email at all now... :( [15:03] jam: I gave in and manually subscribed to the branch [15:03] which I think is working [15:03] so if dpkg is i386 on a amd64 machine, it will be detected as i386. I don't know whether that affects your case [15:04] I purposely skipped that case [15:04] mgz: I thought I was, but maybe I missed it somewhere. I'll try again. [15:04] I'm using the same detection that was there before [15:04] jam: [15:04] And assuming generic for amd64 and i386 [15:04] (rather than asking archdetect) [15:05] roaksoax: ^^ does that sound OK to you? [15:05] rbasak: right, I'm just making sure you don't jump into the same bug I did. That is enlistment with a i386 installer on a x86_64 machine always resulted in a i386 system enlistmed in MAAS because of archdetect. So In order to successfully detect amd64's i had to do somthing else [15:06] OK [15:06] mgz: most classes I've seen in Maas override __unicode__ rather than __str__ is there a reason you did __str__ ? [15:11] it's truthful [15:11] exceptions get stringified in Python 2 [15:11] if you write a __unicode__ method on an exception subclass, you're liable to break worse in edge cases [15:12] there have been at least three bugs across python versions with that [15:13] mgz: well, we only care about py2.7 for starters (there are some significant 2.6+isms in the code), and I'd rather be consistent with other exceptions in the module (if they use __str__ then go with it) [15:13] they don't use anything [15:13] mgz: fun times, all google related addresses are failing for me right now. google/gmail/youtube/googleanalytics/... [15:13] 8.8.8.8! [15:13] mgz: filtered at the ipaddress it seems [15:13] going to 74.125... is failing. [15:13] or has dubai blocked google completely? :D [15:14] I can get a DNS lookup [15:14] ehhee [15:14] and I can use 8.8.8.8 [15:14] I thought that was it, too. 
[15:14] ping 8.8.8.8 is 35ms, but ping www.google.com is timeout [15:15] mgz: and now it works again. [15:15] Somebody decided to play nice in the last 20 min. [15:15] some google search result had an offensive picture... auto block all of google! [15:16] mgz: you can get lots of stuff from https://www.google.com :) [15:16] The cached results is apparently the favorite way to bypass the UAE firewall. :) [15:16] oops, this isn't SSL IRC, I'll be see.... .gabaou aou [15:16] a [15:17] eheheh [15:21] rbasak: uploaded [15:21] roaksoax: thank you! [15:49] rvba: How do I get South to rename a column? Or do I always have to refer back to the original via db_column? [15:50] allenap: you can use something like db.rename_column() in the transition. [15:53] mgz: https://code.launchpad.net/~racb/maas/subarch-model/+merge/126712 - have I broken any constraint stuff? I haven't touched anything, but Node.architecture is now "arch/subarch". [15:55] rbasak: so, it looks like you've updated the constraint tests to require the form with "/generic" on the end [15:56] so, that's not broken, and can probably be improved in a followup branch [15:56] mgz: ah. That rings a bell, but I don't find it in the diff! [15:56] s/don't/couldn't/ [16:10] allenap: ok so it is because the tftp server does not support extended mode (whatever that means :) [16:11] roaksoax: I'll have to look that one up. [16:11] allenap: http://ww2.unime.it/flr/tftpserver/tftp.html The problem is that some network devices (Cisco IOS devices are the most common and annoying case) don't support tftp extended mode though, so don't wonder when you are trying to transfer the latest IOS 12.4 image large 56 Mb to your Cisco router and the tftp transfer fails after 32 Mb have been transferred. [16:11] so it is the same problem as those cisco IOS devices :) [16:12] roaksoax: Ah, there's a mention of block number wrap-around. I'll see if that makes a difference. 
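Block number wrap-around can be illustrated with a short sketch. TFTP block numbers are a 2-byte field, so a sender that supports wrap-around simply lets the counter roll over instead of stopping. The helper names below are hypothetical, and rolling over to 0 is an assumption; some implementations roll over to 1 instead.

```python
# Sketch of TFTP block number wrap-around, assuming roll-over to 0
# (implementations differ on 0 vs 1). Helper names are illustrative.
BLOCK_MODULUS = 2 ** 16   # block number is a 2-byte field
MAX_BLOCK_NUMBER = BLOCK_MODULUS - 1


def next_block_number(current):
    """Block number to send after `current`, wrapping at 16 bits."""
    return (current + 1) % BLOCK_MODULUS


def blocks_needed(file_size, blocksize=512):
    """Number of DATA blocks; the last one is always short or empty."""
    return file_size // blocksize + 1


def needs_wraparound(file_size, blocksize=512):
    """True if the transfer runs past the highest 16-bit block number."""
    return blocks_needed(file_size, blocksize) > MAX_BLOCK_NUMBER
```

For the 64MB squashfs mentioned earlier, needs_wraparound(64_000_000, 512) is true while needs_wraparound(64_000_000, 1428) is false, which matches a transfer that stalls at 32MB when neither a larger block size nor wrap-around is in play.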
[16:14] allenap: yeah if the server does not send a block number for the client to ack then i guess that's why it would only download the first 32mb [16:17] roaksoax: It's that the blocknum is limited to 2 bytes in headers. [16:19] allenap: http://lists.freebsd.org/pipermail/svn-src-head/2011-July/029443.html [16:21] allenap: now that fix needs to be adapted for python-txtftp [16:23] at least so it seems as a possible solution [16:26] roaksoax: I'll try and get that working. [16:26] roaksoax: Still, HTTP would be much better :) [16:28] allenap: i don't mind, Daviey prefers it to be tftp though === matsubara is now known as matsubara-lunch [16:49] roaksoax: do we also need an FFe for maas-enlist? [16:49] rbasak: not if it is a bugfix, since we are treating it as one, we don't [16:50] roaksoax: OK. Is there anything more I need to do? [16:51] rbasak: no not for now :) [16:52] OK thanks. Just checking! === Guest75974 is now known as dpb___ [17:32] allenap: any luck? === matsubara-lunch is now known as matsubara === melmoth_ is now known as melmoth [21:14] roaksoax, this may or may not be useful (or work) [21:15] lp:~smoser/+junk/mirror-query [21:15] ./mirror-query https://maas.ubuntu.com/images/ my.d -vv --checksum [21:15] that should mirror maas-ephemeral locally [21:18] hmm.. bug [21:20] smoser: cool thanks [21:21] smoser: btw.. i'm seeing something weird on the canonistack images [21:21] smoser: i'm on a root terminal [21:21] and every time I do sudo whatever [21:21] it takes forever [21:21] https://pastebin.canonical.com/75400/ [21:21] i opened an rt yesterday [21:23] you can fix the sudo issue by putting your fqdn in /etc/hosts [21:23] but it affects all lookups [21:23] smoser: cool thanks [21:23] i guess the other thing to do is change /etc/resolv.conf [21:23] to not have that search in it. [21:24] ack [21:50] echo "#######################################################" [21:50] echo "##### Starting maas-region-controller postinst ########" [21:50] err