[00:11] hi roaksoax and smoser
[00:11] bigjools, hey
[00:12] this morning would have been better had my wife not woken me up from the deepest sleep ever
[00:17] bigjools, :-(
[00:17] lp:~smoser/+junk/backdoor-image/
[00:17] has "backdoor"
[00:17] cool
[00:17] * bigjools grabs
[00:17] just point it at an image and it will backdoor it
[00:18] user 'backdoor'
[00:18] I don't normally do backdoor
[00:20] --user bigjools
[00:20] :)
[00:21] it's the ephemeral image I want in this case, right?
[00:22] right.
[00:22] /var/lib/ephemeral/.....disk.img
[00:22] its probably most proper to 'restart tgt' afterwards
[00:22] but i've never actually had to do that.
[00:23] but clearly if it had open filehandles, it could be confusing.
[00:23] yeah
[00:23] I was going to ask - is it possible to proxy tgt via the clusters? at the moment ephemerals get pulled from the region
[00:24] i dont know. i think the cluster would just want to run tgt also.
[00:24] you want that to be as close as possible
[00:24] its block level io
[00:24] if i understand you correctly
[00:24] bigjools, oh.
[00:25] i thought about one thing that might be screwing you.
[00:25] i dont think proxy settings or mirrors get sent down to commissioning
[00:25] it's just that it's a bottleneck right now - we might need to start copying ephemerals to clusters
[00:25] right. thats what i was saying. i think you should plan on copying ephemerals to clusters.
[00:25] * bigjools tries commissioning again
[00:25] i have to run
[00:26] thanks for the script
[00:26] i've tested it on /
[00:26] but not actually on an ephemeral image.
[00:26] ok :)
[00:26] i've looked at it though (when pointed at an image)
[00:26] but on / it added a user that i could ssh in as
[00:26] so it seemed ok.
[00:26] later.
[00:26] cheers
[00:27] oooo
[00:27] I saw an error flash by
[00:27] something about tty error
[00:27] * bigjools tries to log in
[00:28] tty error.
[00:28] that is strange.
[00:29] I think it mentioned stderr too
[00:29] Oct 5 00:27:16 10-0-0-100 kernel: [ 25.467047] init: cloud-config main process (899) terminated with status 1
[00:29] Oct 5 00:27:16 10-0-0-100 kernel: [ 25.885676] init: cloud-final main process (1049) terminated with status 1
[00:30] copy off /var/log/cloud-init.log
[00:30] and /var/log/cloud-init-output.log
[00:30] (you can probably sudo apt-get install pastebinit)
[00:30] and do it that way
[00:30] ProcessExecutionError: Unexpected error while running command.
[00:30] Command: ['locale-gen', 'en_US.UTF-8']
[00:30] Exit code: 1
[00:30] there
[00:30] I can scp it then paste, one sec
[00:32] smoser: no cloud-init-output file, but here's cloud-init.log: http://pastebin.ubuntu.com/1261098/
[00:33] look around line 246 onwards
[00:36] bigjools, /var/lib/cloud/instance/cloud-config.txt and /var/lib/cloud/instance/user-data.txt.i
[00:37] i really have no idea why 'locale-gen' would fail like that. but its not deadly.
[00:37] cloud-config.txt is empty
[00:38] user data has plenty
[00:39] ah. yeah, it would be user-data.
[00:39] there is no cloud-config in that.
[00:39] but user-data has that script. and that *should* get run.
[00:41] http://paste.ubuntu.com/1261109/ is its contents
[00:44] bigjools, something is stopping you from reaching runlevel 3 i think.
[00:44] so the cloud-final job is not running
[00:45] bigjools, what does
[00:45] 'runlevel'
[00:45] and sudo status rc
[00:45] show?
[00:47] N 2
[00:47] and
[00:47] rc stop/waiting
[00:48] it must be that tty error
[00:48] smoser: ^
[00:49] oh wait. that is weird.
[00:49] cloud-final ran
[00:50] try running it again
[00:50] sudo start cloud-final
[00:50] start: Job failed to start
[00:56] I can't find any log for it
[00:56] ok. try it more manually. i guess.
[00:56] sudo cloud-init modules --mode=final --debug
[00:57] need --debug before modules
[00:58] boom
[00:59] well it has a lot of tracebacks but it did complete
[00:59] machine powered off
[00:59] smoser: so something is wrong with the upstart conf perhaps?
[01:00] log: http://paste.ubuntu.com/1261129
[01:02] how much memory on the system?
[01:02] the FALLBACK stuff is not really a big deal. its operating as mostly designed.
[01:02] 2Gb
[01:03] the issue is that it logs to rsyslog, but then from within it, the userdata its running calls /sbin/poweroff
[01:03] it's an HP cube
[01:03] and rsyslog gets killed
[01:03] so logging breaks.
[01:03] but i dont understand why it wouldnt have run
[01:03] i have no idea really.
[01:03] why does it only happen when console=ttyS0 is specified?
[01:03] something failed.
[01:04] it doesn't make any sense to me.
[01:04] jeez, 29C at 11am, gonna be hot today
[01:06] I'll retry and see if I can see that error on the console, now I know when to look for it
[01:08] ah it did it on enlistment too
[01:08] stty: standard-input: input/output error
[01:09] I am going to file a critical bug
[01:10] the stty error is maybe not related.
[01:10] I am thinking that
[01:10] can you get dmesg
[01:10] and basically just tar up
[01:10] /var/log/
=== matsubara is now known as matsubara-afk
[01:11] it just doesn't make sense.
[01:11] bigjools, now that i think about it the tee /dev/tty2 is probably a bad idea.
[01:11] as maybe the tee is in this situation too.
[01:12] since we're /sbin/halting
[01:12] the tee might get killed and cause unnecessary angst.
[01:12] i like the log though.
[01:12] maybe we should have the script do
[01:12] sh -c 'sleep 10 && /bin/poweroff' &
[01:13] i can come up with something better too, but basically just make sure that cloud-init is done.
[01:14] smoser: why does "start cloud-final" fail?
[01:14] isn't that a hint?
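A minimal sketch of the delayed-poweroff idea from 01:12, written in Python rather than shell. It assumes the only goal is to give cloud-init time to finish before the machine goes down. The names `build_delayed_command` and `run_detached` are hypothetical and do not come from cloud-init or MAAS. The child's output goes to the null device so that it never writes to a console that might return an I/O error.

```python
import shlex
import subprocess


def build_delayed_command(command, delay_seconds=10):
    """Return an argv list that runs `command` (a list) after a delay.

    Hypothetical helper: mirrors `sh -c 'sleep 10 && /bin/poweroff'`.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    # Quote each argument so the shell sees the same words we were given.
    quoted = " ".join(shlex.quote(arg) for arg in command)
    return ["sh", "-c", "sleep {} && {}".format(int(delay_seconds), quoted)]


def run_detached(argv):
    """Start argv in its own session with all output discarded."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # do not die with the calling script
    )
```

In a user-data script the call would presumably look like `run_detached(build_delayed_command(["/bin/poweroff"]))`; whether ten seconds is long enough is not established in the discussion.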
[01:14] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1061977
[01:14] Ubuntu bug 1061977 in MAAS "Machine fails to commission when console=ttyS0 is present on kernel opts" [Critical,Triaged]
[01:15] can you get dmesg
[01:15] and the /var/log attached there.
[01:15] yeah
[01:18] it is /dev/console
[01:18] it completely is that
[01:18] the fact that running it through upstart died
[01:18] and then manually did not
[01:18] because its output will go to /dev/console when you do it via upstart
[01:19] what from /var/log do you want?
[01:19] maybe the kernel has a buffer and just gets fed up at some point.
[01:19] just get it all
[01:19] k
[01:19] if you dont mind.
[01:19] you didn't have any private data there :)
[01:19] haha :)
[01:21] oh. yeah.
[01:21] while you're on that machine
[01:21] can you just try
[01:21] echo HI MOM | sudo tee /dev/console
[01:21] and see what it does
[01:22] i think we're gonna get an input output error basically.
[01:22] tee: /dev/console: Input/output error
[01:22] yeah. so it makes sense.
[01:22] but i thought the kernel was supposed to be smarter.
[01:22] you do have 'console=tty1' on the cmdline also, right?
[01:22] yes
[01:26] so there is no physical serial port, right?
[01:26] just to be sure.
[01:26] there isn't - unless the BMC is providing one surreptitiously
[01:28] hm..
[01:28] interesting.
[01:28] bigjools, you see kernel output on the monitor, right?
[01:28] I do
[01:30] bigjools, the locale-gen died because it tried to write to its stdout
[01:39] bigjools, so i've got to go afk.
[01:39] but i'd like it if you reviewed my patch for https://code.launchpad.net/~smoser/maas/preserve-sources-list/+merge/127825
[01:39] smoser: ok
[01:40] particularly, i'm not happy about the test, it seems long winded, but didn't know how to make better.
[01:40] i dont really know what to do for your ttyS0 issue.
[01:40] other than not specifying console= at all
[01:40] which imo seems busted.
[01:40] smoser: we have to remove it
[01:40] as a default.
[01:40] busted, but less busted than not booting at all
[01:40] well, to be fair, you are the first person to see this.
[01:41] how much testing has it had though?
[01:41] not on VMs
[01:41] more than a few systems.
[01:41] I bet the others all have serial console
[01:41] well, yes.
[01:41] and so will the real target audience.
[01:42] so changing the default (which is hard coded in source code) to accommodate a little toy system
[01:42] doesn't make a lot of sense to me.
[01:43] well hang on
[01:43] you don't know that every system will have a serial console
[01:45] you're right.
[01:45] but i don't think that it generally fails so badly if that is the case.
[01:46] kernel bug?
[01:48] http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt
[01:48] i've always wanted to play with that.
[01:54] smoser: I'll improve your test and land your branch
[01:57] thanks. i'll poke the kernel guys tomorrow am.
[01:57] but i dont have any ideas.
=== jtv1 is now known as jtv
[02:11] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg246433.html
[02:39] howdy
[02:50] bigjools: howdy
[02:54] roaksoax: hey
[02:55] bigjools: so how do you feel about the packaging?
[02:55] roaksoax: positive
[02:55] bigjools: alright, i'm gonna try to upload to archive tomorrow so we can test upgrades from precise
[02:56] roaksoax: \o/
[02:56] and get community to find more bugs if any
[02:56] that would help us too
[02:56] I did a fix the other day to prevent questions when upgrading, fingers crossed :)
[02:57] bigjools: which one is it?
[02:58] roaksoax: config for cluster controller
[02:59] it looks at DEFAULT_MAAS_URL if it can find it
[02:59] bigjools: the autodetection of maas-url, yeah I saw
[02:59] bigjools: it works
[02:59] I did test it :)
[02:59] bigjools: yeah me too, it works :)
[02:59] are you still thinking about changing the sed stuff?
[03:00] bigjools: i don't know TBH.. doing so will require .d support for the py's
[03:01] bigjools: and would require a way to provide .d for the yaml conf's too
[03:01] I fear that is too much work at this late stage :(
[03:01] bigjools: Indeed!! the easier way right now is simply installing to /usr/share/maas and then symlink everything to /etc/maas/
[03:01] yeah
[03:02] bigjools: however, the problem with that is that user settings won't be preserved on upgrade
[03:02] bigjools: i also found another bug in upgrading but with dbconfig-common, for some reason it was not preserving the password info and was re-writing the config file.
[03:03] the latter is expected, but should have preserved the password, but anyways, i worked around it by simply sourcing that file and grabbing the password from there instead of letting dbconfig-common rewrite the config
[03:04] bigjools: now to continue on the config, I think it would be the best for now... will have to check with Daviey to see what's his appreciation on this
[03:04] bigjools: maybe we can simply send the configs in /usr/share/maas and add some code in the configs itself that source whatever is in /etc/maas
[03:05] smoser: still around?
[03:33] roaksoax: are you going to land your packaging branch?
[03:33] bigjools: yeah, i'm just testing one more thing
[03:34] ok I put the bug back to in-progress :)
[03:34] tarmac will set it when the branch lands
[03:34] cool
[04:35] roaksoax: I shall force a daily build now
[04:36] bigjools: sure
[04:37] being a build admin has its bonuses
[04:38] bigjools: indeed it is
[04:38] i envy you
[04:38] hahah
[04:38] i used to use PPA's extensively
[04:40] * roaksoax bed
[04:40] we bumped the prio permanently on the daily build ppa :)
[04:40] nn roaksoax
[04:40] night
[06:23] allenap, rvba: So it turns out that MAASClient doesn't return an object like the Django Client returns. It has a 'object.code' vs 'object.status_code' and 'object.read()' rather than 'object.content'
[06:23] so... mocking/stubbing/ bad for the real world :)
=== lifeless_ is now known as lifeless
[06:50] allenap: you lied to me... :), list data is not supported by MAASClient. When it gets down into 'make_payload' you support bytes/unicode/IOBase and callable.
[07:03] hey lifeless, any idea wtf is going on here? http://paste.ubuntu.com/1261414
[07:21] I think we have a nasty bug with omshell here :/
[07:24] good grief, I thought I'd seen bad code but this is hideous. http://ftp.fr.netbsd.org/cvsroot/src/usr.sbin/dhcp/common/Attic/parse.c,v
[07:30] bigjools: looks like a url to me
[07:30] bigjools: maybe it needs quotes ?
[07:30] lifeless: I think the base64 parser in omshell blows
[07:30] quotes don't help
[07:30] bigjools: that is very odd
[07:30] looking at the code it treats + and / specially
[07:30] oh joy.
[07:31] but the code is sooooo bad that it's taking me a while to work out what
[08:22] jam: Sorry, I didn't realise you were talking about MAASClient. The cli supports multiple values per key, and so does the server side; it's just MAASClient that needs a little love to get it there.
[08:31] allenap: yeah, I got it to work with https://code.launchpad.net/~jameinel/maas/maasclient-multipart/+merge/128176
[08:31] though it only fixes POST, GET uses urlencode() which just strs the list
[08:32] allenap: is there a good way to generate a huge number of nodes for testing? Like I want to populate the DB with 1000, 10000, 100,000 nodes.
[08:33] jam: There's a way. I'm not sure about a good way.
[08:33] allenap: well, pressing 'add node' in the web ui is pretty bad
[08:33] I can do it in SQL surgery, but I need to generate mac addresses, etc. for each.
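For the "generate mac addresses" part of populating a test database directly, a small sketch follows. It is only an illustration: `make_mac_addresses` is a hypothetical helper, not something from MAAS, and it assumes that distinct, locally administered unicast addresses are good enough for test nodes.

```python
import random


def make_mac_addresses(count, seed=None):
    """Return `count` distinct locally administered unicast MAC addresses.

    Hypothetical test helper; the seed makes runs repeatable.
    """
    rng = random.Random(seed)
    seen = set()
    while len(seen) < count:
        octets = [rng.randrange(256) for _ in range(6)]
        # Set the locally-administered bit, clear the multicast bit.
        octets[0] = (octets[0] | 0x02) & 0xFE
        seen.add(":".join("{:02x}".format(octet) for octet in octets))
    return sorted(seen)
```

Using a set guarantees uniqueness even for large counts, which matters if the target table enforces a unique constraint on the address column (an assumption here).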
[08:33] So it makes me think I should do it in python, but via the API or just direct on the db
[08:34] but how hard is it to grab maasserver.model objects and play with them
[08:34] I imagine 'settings' needs to be set correctly for me to mutate the db
[08:42] jam: How about: make syncdb && make harness; from maasserver.testing.factory import factory; factory.make_node()
[08:42] allenap: if make harness will do it, that works for me
[08:43] That was the sort of command I was looking for
[09:08] w7z: mumble?
[10:40] allenap: so what can I do to land the multipart list stuff? I don't actually have a 'mapping' I have an Iterable
[10:40] but string is an Iterable
[10:40] though apparently getattr(s, '__iter__') fails.
[10:42] there's no sane way to do it without isinstance
[10:42] jam: If it's not a string type, assume it's iterable of string types, and let things blow up if it's not?
[10:42] allenap: well there are layering issues. The part that checks if it is a string type only returns 1 payload to attach, so you need to loop at a higher level.
[10:42] or change the lower thing to always return a list of payloads
[10:42] or..
[10:43] Blast.
[10:43] (It also accepts files)
[10:43] which are iterable, and you want to upload the whole file as one payload
[10:44] jam: A file isn't a collections.Sequence... but I guess stick with list and put it in the docstring.
[10:45] allenap: I could do "if isinstance(x, collections.Sequence) and not isinstance(x, basestring)"
[10:47] jam: Yeah. Or change make_payload to gen_payloads, so allowing it to yield multiple things, then all the isSomething checks can be done there.
[10:49] jam: as I said on the MP, I don't understand ~jameinel/maas/ignore_results, we have a global celery settings for that.
[10:49] rvba: where? (my system at least was improved by nuking the mnesia schema)
[10:49] rvba: if you can point me to it, I'll happily reject/revert my patch.
[10:49] jam: CELERY_IGNORE_RESULT is set to True in etc/celeryconfig_common.py
[10:49] as it could be me killing things repeatedly.
[10:50] rvba: so right now, I have to install maas-dns in order to do 'make run' is that true?
[10:50] if I don't the service fails to startup properly
[10:50] jam: Something like http://paste.ubuntu.com/1261655/
[10:50] and the only log I see is the 'you should install maas-dns'
[10:51] * allenap looks forward to "yield from"
[10:51] jam: no, a local named instance is started when you run 'make run.
[10:51] '
[10:51] rvba: well, I can't get 'make run' to talk to me, the last error in the terminal is the 'you should install maas-dns on the region controller'
[10:52] I had this problem for quite a while this week.
[10:52] until it magically fixed itself
[10:52] jam: that message is for the packaged version, I did not bother changing that message in the dev instance.
[10:52] (my best guess at this point, is I got enough queues laying around that rabbit would say 'come back later' when asked to update dns, so it wasn't crashing the startup process.)
[11:13] allenap: I switched to the 'yield' form, care to approve?
[11:13] jam: Sure.
[11:19] jam: I'm seeing the dns error too, I'll investigate. But this is only a task failing, it should not cause trouble for the other services.
[11:21] rvba: so I installed maas-dns, which installed maas, I then uninstalled it, but I now have a twistd process running.
[11:21] jam: +1 :)
[11:21] everytime I kill it, it comes back with a new pid
[11:21] I'm guessing upstart is keeping it alive?
[11:22] ah, maas-txlongpoll
[11:22] which didn't get uninstalle
[11:22] d
[11:22] jam: as I said, you don't have to install maas-dns on a dev instance… also, you'll get into trouble if you install the maas package and try to play with a dev instance at the same time :/
[11:22] Gah, so buildout says it's installing testresources 0.2.5, but then uses the system one, 0.2.4. How completely useless.
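A sketch of the `gen_payloads` shape floated at 10:47 and the 'yield' form mentioned at 11:13. It is written for Python 3, so `str` and `bytes` stand in for `basestring` and `yield from` is available. This is an assumption about the general shape only, not the code from the merge proposal: strings, bytes, files and callables are treated as a single payload, and any other iterable is expanded item by item.

```python
import io
from collections.abc import Iterable


def gen_payloads(value):
    """Yield one payload per item; strings, bytes, files stay whole.

    Hypothetical sketch, not MAASClient's actual implementation.
    """
    if isinstance(value, (bytes, str, io.IOBase)) or callable(value):
        # Files are iterable too, but should be uploaded as one payload.
        yield value
    elif isinstance(value, Iterable):
        for item in value:
            yield from gen_payloads(item)
    else:
        raise TypeError("unsupported payload type: %r" % type(value))
```

Putting all the type checks in the generator keeps the caller to a single loop over `gen_payloads(value)`, which is the layering point raised at 10:42.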
[11:23] allenap: buildout config as it stands takes system packages over packages it installs.
[11:23] I brought it up earlier
[11:23] as being useless, too.
[11:24] (vs using system packages if there isn't a custom package installed)
[11:24] allenap: so you have to uninstall the system package to have buildout get the right one.
[11:24] and then re-install it in the future when you want to use the system package for some other project.
[11:24] jam: Ah, right, thanks. At least there's a workaround.
[11:26] rvba: uninstalling package 'maas' is triggering maas-txlongpoll to *start*
[11:26] jam: !
[11:26] (which naturally prevents uninstalling maas)
[11:28] jam: I've spotted one problem: looks like the wrong celeryconfig is loaded on a dev instance (by the region worker at least).
[11:28] That's why its trying to write to /etc/bind/maas and failing.
[11:42] rvba: I'm also getting failures in "_write_temp_file" in provisioningserver/utils.py
[11:43] (when I create a node, it triggers writing the dns config again, and I get a failure trying to write the temp file)
[11:44] rvba: so it looks like that is the N^2 behavior I was seeing. Every node added is regenerating a full DNS list.
[11:44] (it is failing, but it still is pulling together the data which is O(N) and when you allocate N nodes == O(N^2) by the end)
[11:48] jam: the DNS writing task should only be triggered with the node gets an IP address. Only then do we need to update the DNS config.
[11:48] s/with the node/when the node/
[11:48] rvba: well I'm doing factory.make_node() in a loop
[11:48] which probably gives addresses?
[11:49] jam: no, there is no lease creation in there as far as I can tell.
[11:49] rvba: maybe it knows the dns config didn't get written properly yet?
[11:50] it is definitely looping on failing to create it.
[11:50] but it may be unrelated to me creating 1k nodes.
[11:51] "Sending due task 'upload_dhcp_leases'" is in the log file an awful lot.
[11:51] That's a celerybeat task that is run every minute.
[11:58] rvba: I'm seeing it roughly every few seconds
[11:58] however, 'make_node' also creates a new nodegroup if you don't pass one in.
[11:59] jam: ah! right, that's what's triggering all the DNS stuff.
[12:00] jam: because each nodegroup gets created with a configured interface IIRC.
[12:13] jam: I've just fixed the "wrong celeryconfig" problem. A stupid mistake in the startup script.
[12:13] rvba: \o/
[12:13] so creating 1000 nodegroups takes about 15min, but creating 1000 nodes is <1min.
[12:13] that seems reasonable.
[12:14] jam: The plan is to remove the DNS config pre-population. But post 12.10 release.
[12:15] rvba: sure, but you won't be creating 100,000 nodegroups in the immediate term, so 15min for 1000 seems fine.
[12:16] jam: right.
[13:20] rvba: Do you know if the [/+] needs to be on both sides for the [/+]no[/+] bug to manifest, or either side?
[13:21] allenap: from Julian's investigation (and the mailing list message we found), I think it's on both sides.
[13:22] rvba: That's one of the weirdest bugs I've ever heard of.
[13:22] allenap: yeah :/.
[13:24] rvba: Do you know anything about the code that submits commissioning results (op=signal)?
[13:25] allenap: not really, maybe I help you with something still?
[13:25] I can*
[13:31] rvba: I'm fixing the code that sets the power parameters. Now, it was broken: it was checking for power_type.upper() in map_enum(POWER_TYPES), i.e. checking that the value of power_type given over the wire matched the Python name of the enum item.
[13:33] allenap: what's wrong exactly?
[13:34] rvba: Well, that works for IPMI, because the Python name == uppercase(enum value).
[13:34] rvba: But not for the others.
[13:35] rvba: I mean, we should be expecting the enum *values* over the wire, right?
[13:36] allenap: right!
[13:48] rvba: Okay, I'm not mad then. The reason I want to ask someone about it is to avoid breaking scripts somewhere else. tl;dr this is what tdd looks like without t.
[13:49] allenap: haha :). I think Julian mentioned IDD recently.
=== dpb_ is now known as Guest72485
[14:02] allenap: hm, in his comment in the code, Julian said "if '/' or '+' appear either side of the word 'no'"
[14:04] allenap: rarg, I've tested it and it's either side indeed, not both sides.
[14:05] rvba: Gah. Top marks for you though, for confirming.
=== matsubara-afk is now known as matsubara
[14:10] mgz: any progress on search?
[14:11] jam: getting there
[14:13] rvba: Do you have any time this afternoon to review the extra changes I had to make to https://code.launchpad.net/~allenap/maas/anon-power-setting/+merge/128127?
[14:13] allenap: sure, I'll do it right now.
[14:13] rvba: Thanks.
[14:15] rvba howdy¡¡ i pushed the branch and selected u as reviewer
[14:15] did you see it¿
[14:15] ?
[14:15] roaksoax: I did :)
[14:15] rvba cool then :)
[14:15] thanks
[14:16] roaksoax: I'll get to it in ~30 minutes.
[14:16] rvba awesome thanks
[14:17] allenap thanks for taking care of the power settings for enlistment
[14:20] mgz: so I just did a quick 'wrap process_node_tags in lsprof' and the results are: out of 107s under profiling, 7.8s is parsing and processing the XML, 67s is downloading the hardware details, 32s is uploading the system_ids results, and 2.3s is getting the system_ids list.
[14:20] I think the MAASClient handling of repeated tags is a bit slow (I think it encodes each value as a separate multipart message section).
[14:20] uploading 12,000 values is a lot, but not 32s a lot, given that we can read that many tags in 2.3s.
[14:21] roaksoax: No worries. It hasn't landed yet... when I picked up that rock I found some bugs :)
[14:22] so, it is the passing data around overhead... still seems very high
[14:22] And then getting the hardware details is *way* too expensive as well, given if it was a raw DB query, we could get the results in... checking.
[14:23] I get the first result here in about 10s.
[14:23] 32s to get everything.
[14:23] mgz: 32s just to get the content out seems a bit too expensive as well.
[14:24] time xpath_exists is still 6.8s, time in lxml on the current code is only 7.8s.
[14:25] but "select hardware_details from maasserver_node" > /dev/null is 32s.
[14:25] mgz: Is there a lot of quoting that Postgres is having to do?
[14:26] ...no, but maybe it's re-serialising? it shouldn't need to, but the 9.1 support does seem a little unpolished.
[14:28] mgz: time to select from an xml column into a new text table is 4.1s.
[14:28] there might be a magic cast that will help?
[14:28] what does adding ::text do if anything?
[14:29] mgz: time select content from alt_hardware_details >/dev/null => 32s.
[14:29] gr.
[14:29] might be sane to just ditch some of the db changes and pull from the original location :)
[14:31] mgz: interestingly, using 'bytea' instead of text is *slower*
[14:31] 1m24s
[14:33] heh, I think that's probably django
[14:36] mgz: this is in psql
[14:36] no django involved
[14:36] psql -c "select ..."
[14:36] ...that's very surprising then
[14:36] mgz: so something very strange when 'select DATA from...' is slower than 'select xpath_exists(DATA)' on the same content.
[14:37] mgz: if I alter column set storage plain, it fails because it can't fit the 24kB documents in an 8kb page.
[14:37] jam: it would make sense were it storing some custom data structure optimised for querying
[14:37] that's not actually the case though as I understand it, but it's probably doing more work than it needs to for serialisation
[14:37] mgz: maybe, but we've confirmed that lxml can take raw text and parse it with an XPath object in ~the same time.
[14:38] allenap: so the metadata server now imports the method from the maas API?
[14:38] I think the big cost is postgres turning the data into an SQL result.
[14:39] roaksoax: Yeah, and the implementation has changed quite a lot.
[14:40] mgz: select substring(content from 24000 for 100); is 3.7s
[14:40] so it is all about the number of bytes coming out of psql
[14:42] mgz: -A flag
[14:42] mgz: time psql -A -c "select content" >/dev/null is 1.5s
[14:42] mgz: --no-align :)
[14:42] :)
[14:42] so psql is iterating all the data, caching it, figuring out how to align it, before outputting it.
[14:43] roaksoax: can we remove maas-provision
[14:44] allenap: I'd like to check my sanity… can you run 'make lint' on trunk?
[14:44] mgz: and potentially psql is operating in 'fixed max mem' mode, and so is spooling to disk to do that work.
[14:45] allenap: I'm getting: src/apiclient/multipart.py:74: undefined name 'make_payload'
[14:45] but yeah, 1.6s if not aligning, and piping to 'wc -l -c' (if you leave in -w then wc slows things down to 7s looking for word chunks)
[14:45] rvba: Quite a bit of lint.
[14:46] rvba: And I see that too.
[14:46] rvba: I'm free to clean that up.
[14:47] allenap: cool. Maybe you can clean the regexp in omshell while you're at it. I know you like regular expressions :)
[14:47] rvba: As in, the before-or-after thing, or just pretty it up?
[14:47] mgz: ts = time(); [node.hardware_details for node in Node.objects.all()]; td = time()
[14:47] is 6.8s.
[14:47] allenap: the before-or-after thing :)
[14:47] Which is a fair amount slower than 1.6, but nothing like the 60s we see
[14:48] allenap: I think it just needs a conditional expression based on a backref… but you know that better than me :)
[14:49] rvba: Can we check for ([/+]no|no[/+])?
[14:49] mgz: get_hardware_details is spending 67s total, 18s is reading the response and json.loads it. However, 48.7s is 'post'
[14:49] mgz: so I think... MAASClient.post needs some serious poking.
[14:49] Daviey: from the archives?
[14:49] Daviey: i'd say we can
[14:49] and we should
[14:50] smoser: are you planning to SRU bug 978127?
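A sketch of the "either side" check proposed at 14:49, using Python's `re` module. It only detects the sequence described in the discussion ('/' or '+' immediately before or after the word 'no'). What the real code does once it finds a match is not stated in the log, so `has_suspect_sequence` is a hypothetical name and the surrounding workflow is left out.

```python
import re

# '/' or '+' directly before "no", or directly after it.
SUSPECT_SEQUENCE = re.compile(r"[/+]no|no[/+]")


def has_suspect_sequence(encoded):
    """Return True if `encoded` contains the problematic sequence."""
    return SUSPECT_SEQUENCE.search(encoded) is not None
```

A plain alternation covers both cases without the conditional backreference suggested at 14:48, which keeps the pattern easy to read.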
[14:50] Launchpad bug 978127 in cloud-init (Ubuntu) "incorrect time on node causes failed oauth" [High,Fix released] https://launchpad.net/bugs/978127
[14:50] (it is possible that some of the slowdown is the server processing our request, but 24s in dispatch to upload, 58s in 'encode_multipart')
[14:51] allenap: ^^ I'm guessing encoding wasn't tuned for handling 100,000 records, right?
[14:52] rbasak, i suppose you need it after ephemeral also.
[14:52] why can't you just get an architecture that doesn't suck, rbasak ?
[14:52] smoser: that's the reason I ask, yes. Right now we're still defaulting to a precise ephemeral, and I keep needing to fix the RTC
[14:52] precise ephemeral should be fixed
[14:53] smoser: can you add a bug task for precise, please? I don't have permission
[14:53] if you're still seeing an issue, then the problem is not solved correctly.
[14:55] smoser: seeing an issue out of today's maas daily
[14:55] rbasak, can you show me?
[14:55] i basically tested this.
[14:55] by having cloud-init's upstart job first break the clock
[14:55] smoser: half an hour please? I just worked around it for a juju test and d-i just started
[14:56] sure.
[14:56] jam: Ha, wow.
[14:58] roaksoax: right, i will remove it from the archive... but i want to make sure you have an upgrade path
[14:59] roaksoax: can you raise a bug requesting removal as it is deprecated etc.. and i'll process it
[14:59] hm, I should make fail an alias for tail -f, that was a funny tyop
[15:00] Daviey: right, so maas Conflicts/Replaces on maas-provision, which obviously causes it to be removed. However, it not being in the archives would make the transition smoother, wouldn't it? because the packages would simply be completely removed
[15:00] allenap: I think so yeah. All I know about this problem is summarized right here ;) : http://paste.ubuntu.com/1262047/
[15:01] allenap: apparently 30s of the 80s is in 'set_type'
[15:02] jam: That's... weird. Maybe it's recoding things at that point.
[15:02] allenap: well, it is under lsprof, so some things are more expensive than they are in 'real life'. I'm doing a quick test of casting the strings to bytes rather than unicode
[15:02] see if that has a big difference
[15:03] since bytes payloads don't have set_type called
[15:03] allenap: I've got a tiny branch up for review if you have time: https://code.launchpad.net/~rvb/maas/big-networks/+merge/128269
[15:03] Trunk seems to be failing tests and have lint in it. :(
[15:03] jtv: I'm fixing those now.
[15:03] Ah
[15:04] roaksoax: did you put your branch up for review? I don't see it in the "active code reviews" list.
[15:04] allenap: interestingly, rabbit can take ~1 minute from the time I put something in the queue, before it actually triggers on the provisioning worker
[15:04] rvba: I changed the status to work in progress
[15:05] roaksoax: ok, got it.
[15:05] jam: That's not good.
[15:06] allenap: (the background is, we can run xpath_exists() in the database taking ~7s to run, or we can dump the raw xml out in about 1.5s, but it takes 42s to read the data via APIs and upload the results back)
[15:06] so a 6x overhead
[15:06] and we only spend 7.8s in etree.XPath()
[15:08] allenap: as for what causes the rabbit slowdown, I'm not sure. It looks a little like it is recovering from trying to write the dns config or something.
[15:08] (waiting for celery-beat?)
[15:11] rbasak, is there anything you're aware of that you'd like in SRU not on https://bugs.launchpad.net/ubuntu/precise/+source/cloud-init
[15:13] okay, sorted apart from what to do with InvalidConstraint... does the view somehow need to catch it and do something clever... atm you just get a "we broke" error page which is not acceptable for a typo
[15:15] roaksoax: do you have everything you need for IPMI in enlistment?
[15:15] flacoste: yes, just waiting for it to land
[15:16] roaksoax: right, ok
[15:17] thanks :)
[15:17] roaksoax: state of bug 1052056 ?
[15:17] Launchpad bug 1052056 in freeipmi (Ubuntu) "[FFe] [MIR] freeipmi" [Undecided,In progress] https://launchpad.net/bugs/1052056 [15:19] Daviey: need to address the unused variable compiler warnings, but haven't really have the time to investigate how to do it since I have been pretty much swamped with the other stuff [15:19] ok, thanks [15:19] roaksoax: do you have your latest debdiff? [15:20] Daviey: yes, hold on [15:20] thanks [15:21] Daviey: http://paste.ubuntu.com/1262079/ [15:21] thanks [15:22] allenap: well, lsprof says that set_type() spends its time calling get_params set_param, __delitem__ [15:22] which appears to have to encode/decode the params [15:27] however, it is an lsprof-ism, real-world time is 42s => 41.5s using 'bytes' instead of 'unicode', but lsprof time is 109 vs 85s [15:27] so... ignore that one. [15:34] roaksoax: http://paste.ubuntu.com/1262102/. One question: why use the absolute path '/usr/sbin/ipmi-chassis-config' instead of 'ipmi-chassis-config'? I thought Scott said it was a bad thing to do… [15:34] rvba: following what was done before me [15:34] Fair enough :) [15:35] rvba: however, i think usr/sbin is not in the path, and hence there was a problem executing the scripts, I don't know if yourecall having us discuss something like that [15:35] roaksoax: yeah, it rings a bell. [15:36] smoser: I'm not so bothered about bug 1028501 any more, since MAAS doesn't need it to work any more [15:36] Launchpad bug 1028501 in cloud-init (Ubuntu Precise) "cloud-init selects wrong mirrors for arm" [High,Triaged] https://launchpad.net/bugs/1028501 [15:37] smoser: I think that means that bug 978127 is the only one I care about in cloud-init for precise. Though I'm worried that I'm missing something else. [15:37] Launchpad bug 978127 in cloud-init (Ubuntu Precise) "incorrect time on node causes failed oauth" [High,Triaged] https://launchpad.net/bugs/978127 [15:37] rvba: btw.. 
celeryconfig.py and celeryconfig_cluster.py are not meant to be modified by the user right? [15:38] roaksoax: no. Only maas_local_celeryconfig_cluster.py and maas_local_celeryconfig.py should be modified by the user. [15:38] rvba: not even celeryconfig_common right? [15:38] roaksoax: no. [15:45] rbasak, did your install go? can i see failed oauth now? [16:05] smoser: having trouble getting to the point where I can reproduce. I set the clock back to 1970, and now it can't add my PPA for maas-enlist to work around the SRU not being in yet [16:05] Changing the hardware clock is a little bit tedious [16:06] I have to install some usable OS first, since there's no "recovery disk" [16:06] boot the ephemeral image. [16:06] silly. [16:06] ...which I can't log in to [16:06] http://bazaar.launchpad.net/~smoser/+junk/backdoor-image/view/head:/backdoor-image [16:06] and yes, there are ways round it [16:06] use that to add a user 'backdoor'. [16:06] The easiest way round it is to install a usable OS [16:07] i think that script is easier. [16:07] personally [16:07] ./backdoor-image /var/lib/ephemeral/......./disk.img [16:07] then ssh in [16:07] or login on console. [16:08] What about disabling poweroff? [16:08] Another step to do [16:08] this is true. [16:09] maas needs a "rescue" environment. [16:09] I'm going to make an armhf recovery initrd when I get round to it [16:09] cirros is probably really close [16:09] to what you need there. [16:09] and if you ever do "get round to it" i'd rather you fix cirros to work for you. [16:11] I was going to base it on ubuntu core [16:11] Which I think might Just Work out of an initrd [16:11] Need to test it though [16:13] well, for your particular use case here, you can probably just boot the existing initramfs and kernel [16:13] and pass 'break' on the cmdline [16:13] if you have console access. [16:13] if not then you need ssh or the like. [16:19] roaksoax: are you planning on landing a fix for the ipmi_si module parameters today?
If not I need to file a bug. [16:19] rbasak: that's actually what i'm doing now [16:19] roaksoax: OK no problem. I'll leave it then. [16:30] smoser: OK, reproduced finally [16:30] smoser: now to get you in [16:30] smoser: which I presume will involve running your backdoor script ;-) [16:40] rvba: updated the branch, ready for final review. Thanks a lot for working on it [16:46] roaksoax: np. Branch approved. [17:07] allenap: are you gonna make changes to the anonymous power settings for enlistment or will you land it? [17:11] rvba: we don't want users to modify the maas-http.conf either right? [17:12] roaksoax: well, if they want to serve the site over HTTPS they will have to change maas-http.conf. [17:13] rvba: ok, so i'll leave it as is [17:13] rvba: btw.. https://jenkins.qa.ubuntu.com/job/maas-merger-quantal/127/console [17:13] Yeah, I've seen. It seems to be related to the changes jam checked in. [17:14] boomer [17:14] But it's really just a problem in the tests I think. [17:14] yeah but that will be holding off all MP's [17:14] Like… the tests were simply not updated. [17:15] Good point. [17:15] hm, I wonder how I could even land a fix then :) [17:15] I guess I just merge manually. [17:17] rvba: a fix won't make the jenkins job fail, so it would land the branch [17:17] ah right. [17:25] rbasak: [17:26] https://code.launchpad.net/~andreserl/maas/maas_commissioning_modprobe_params/+merge/128294 [17:28] roaksoax: approved, thanks! [17:32] allenap: re bug 1059168 [17:32] Launchpad bug 1059168 in MAAS "MAAS should tell IPMI to PXE once" [High,Triaged] https://launchpad.net/bugs/1059168 [17:33] allenap: basically, each team we tell a machine to turn on, it will tell it to PXE boot [17:33] allenap: we don't have to manually configure the BIOS and tell it to PXE boot [17:33] allenap: it will do it on demand [17:33] allenap: sabdfl's request [17:34] s/team/time [17:34] roaksoax: I'm fixing the build now. The fix should land shortly.
[17:34] rvba: awesome thanks [17:48] roaksoax: maas is using squashfs by default now? [17:49] Daviey: for quantal yes [17:50] roaksoax: and juju deploy precise uses old method, right? [17:50] Daviey: yes [17:50] Daviey: there are no squashfs images for precise, are there? if there are we could enable it too [17:51] there are not [17:51] just wanted to check it worked [17:51] alright :) [17:58] smoser: around? [17:59] here [17:59] smoser: so, matsubara just pinged me about the commissioning stuff for IPMI since he mentioned that the cards lost their static address [17:59] smoser: since we are working with the assumption [18:00] smoser: that IPMI will always DHCP by design [18:00] smoser: so i think we need to address the fact that some people don't want to DHCP their IPMI cards, and will pre-configure them [18:00] smoser: what do you think? [18:00] matsubara: ^^ [18:01] roaksoax, i thought you were working under that assumption. [18:01] i thought if the card had an IP you reported it. [18:01] right? [18:02] smoser: nope not really. The assumption I was told to work on was that if the IPMI card is set to the Static address network source, we should change it to DHCP [18:02] smoser: because it could be pre-configured [18:03] ah. [18:03] well i like your new suggestion better. [18:03] i'd say if it is set to static, and appears to be configured that you should leave it as such. [18:03] ideally such things are exposed in maas configuration [18:04] but... its a bit late for that i think [18:04] smoser: right, so I was thinking of simply passing a parameter such as IPMI_DHCP="yes" in commissioning_user_data [18:04] smoser: and if so, send --use-static regardless of what it is [18:05] to the command [18:06] rvba: another build error :( https://jenkins.qa.ubuntu.com/job/maas-merger-quantal/129/console [18:06] roaksoax: I've seen. Looks spurious to me. [18:07] roaksoax, that'd be helpful.
I'm setting up dhcp in the lab ipmi network, but it'd be easier to just use the static ip already configured [18:08] rvba: so I'm pretty sure I did land the change for '?op=' but I'm also surprised that it would have landed if tests would then be failing. I can land a fix if you haven't already [18:09] jam: doing it right now: https://code.launchpad.net/~rvb/maas/fix-broken-build/+merge/128297 [18:12] smoser: something like this: http://paste.ubuntu.com/1262416/ [18:16] smoser: or better yet, enable it by default, so if the card is set to static, just use that IP address [18:16] it looks reasonable. [18:16] alright, i'll get that done then [18:17] maybe IPMI_CHANGE_STATIC_TO_DHCP=false [18:17] you did put the ipmi header there, but the name 'use_static' could mean so many things. [18:19] smoser: right :). Will change that to something more appropriate [18:28] roaksoax: the fix has finally landed! [18:29] rvba: awesome! [18:52] allenap, is this known: [18:53] http://paste.ubuntu.com/1262473/ [18:55] http://paste.ubuntu.com/1262477/ [19:02] it was known. fixed in 1185 [19:09] matsubara: could you please configure one of your IPMI cards to static, and apply this patch to the commissioning_user_data conffile http://paste.ubuntu.com/1262508/ [19:09] matsubara: and re-enlist, and re-commission and see if it works (leaves it as DHCP and obtains the right address) [19:12] roaksoax, sure, using the latest package: maas-0.1+bzr1170+dfsg-0+1192+117~ppa0~quantal1, I take? [19:14] roaksoax, anyone know anything about "failed to upload?" [19:15] well, never mind. looks like we have one. [19:21] smoser: huh? failed to upload where? [19:21] matsubara: yeah [19:23] ok, waiting for it to be published [19:24] matsubara: ok, it doesn't really matter what version of maas as long as it is greater than bzr1170 [19:24] matsubara: so you can just patch the file [19:24] ah ok [19:32] smoser: for enlistment, do we have a metadata file we can pass same as commissioning?
[19:33] user data for enlistment? [19:33] contrib/preseeds_v2/enlist_userdata [19:38] smoser: ah right lol, but we are pretty much going to use the same script for enlistment, commissioning [19:38] smoser: so it should probably live in a common place [19:39] smoser: any thoughts? maybe in a package? [19:40] cause i was just thinking of shipping it with maas-enlist [19:41] roaksoax, thats not a bad idea. [19:43] smoser: i'll make another ipmi package [19:43] err [19:43] another binary package for maas-enlist [19:43] ? [19:43] smoser: maas-commissioning [19:43] why separate? [19:43] due to dependencies? [19:43] smoser: because maas-enlist pulls curl, archdetect-deb, libavahi-core7, libavahi-common3 [19:43] yeah [19:43] good enough. [19:45] roaksoax, enlisting now with your patch [19:46] matsubara: alright, enlistment won't be affected, only commissioning, but we need to enlist first so that commissioning changes take effect [19:46] roaksoax, of the 4 nodes I booted, 3 are static and 1 is dhcp [19:46] smoser: oh btw... if the node is enlisted, and I make changes to commissioning-user-data they don't take effect [19:46] matsubara: ok cool [19:46] matsubara: that should test it well [19:47] smoser: i have to re-enlist [19:47] roaksoax, really? [19:47] that is really bad. [19:47] smoser: yep [19:47] please open a bug for that. [19:54] roaksoax, ok. the nodes are declared [19:54] matsubara: ok, now commission them please [19:54] roaksoax, now I enter the ipmi configuration normally and commission? [19:55] matsubara: no [19:55] matsubara: don't enter IPMI, let it commission [19:55] matsubara: and see if the IPMI gets set [19:55] and not changed to DHCP [19:55] matsubara: so power them on manually once you've accepted&commission [19:56] ok. they're on [19:58] matsubara: did you file a bug for the juju issue with the CPU count? [19:58] matsubara: do you know if it's been fixed?
[20:01] yes, i filed [20:01] didn't test yet but i think so [20:01] matsubara: what's the bug number [20:02] bug 1061286 [20:02] Launchpad bug 1061286 in juju "juju bootstrap returned ERROR Invalid 'cpu_count' constraint '1.0'" [Medium,Fix committed] https://launchpad.net/bugs/1061286 [20:02] roaksoax, ok, nodes are ready [20:02] and no IPMI config was set [20:03] matsubara: jum [20:03] matsubara: check in the commissioning-user-data that modprobe ipmi_si has arguments [20:04] modprobe ipmi_si type=kcs ports=0xca2 [20:04] smoser: my bad [20:04] smoser: it does work [20:04] smoser: as expected [20:04] i wonder why i thought it didnt update [20:53] hey all [20:53] http://bazaar.launchpad.net/~smoser/maas/maas-pkg-test/view/head:/maas-ephemeral-test-quantal.txt [20:54] was mostly functional walk through of ppa test for me [20:54] i now do not have to touch the UI by hand. [20:54] whoowhoo [20:56] smoser: what do you think? http://paste.ubuntu.com/1262708/ [20:59] that looks reasonable to me, roaksoax [21:01] i'm out. [21:01] smoser: hold on [21:01] :/) [21:01] k [21:01] smoser: give me one min [21:01] holding [21:01] for you to approve [21:02] smoser: https://code.launchpad.net/~andreserl/maas/commissioning_improvements/+merge/128318 please :) [21:02] did you test it? [21:03] smoser: yes [21:04] you asked for review of "launchpad code reviewers" ? [21:04] what is doing that. [21:05] smoser: it does that automatically [21:05] smoser: julian (i think) changed it that way [21:07] well that is busted. [21:07] anyway. [21:07] i'm out. [21:07] and i reviewed. [21:07] smoser: awesome, thank you. have a good weekend [21:37] jtv: if you are making changes to tx-tftp, please make them in the ubuntu package in archive as well
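
The static-versus-DHCP behaviour agreed on between 18:00 and 18:17 (leave a pre-configured static IPMI address alone unless something like IPMI_CHANGE_STATIC_TO_DHCP is set) could be sketched roughly as below. This is only an illustration of the decision, not the script from the pastes: the function name `plan_bmc_ip_source`, its return values and the way the current source is passed in are all assumptions made for the example.

```python
def plan_bmc_ip_source(current_source: str, change_static_to_dhcp: bool = False) -> str:
    """Decide what to do with a BMC's IP address source.

    Hypothetical helper: returns "keep" or "set-dhcp". The flag mirrors the
    IPMI_CHANGE_STATIC_TO_DHCP name suggested in the log; nothing here talks
    to real IPMI hardware.
    """
    source = current_source.strip().lower()
    if source == "static" and not change_static_to_dhcp:
        # A pre-configured static address is left untouched.
        return "keep"
    if source == "dhcp":
        # Already using DHCP, nothing to change.
        return "keep"
    # Static with the override set, or any other source: switch to DHCP.
    return "set-dhcp"


if __name__ == "__main__":
    for source in ("Static", "DHCP"):
        for flag in (False, True):
            print(source, flag, plan_bmc_ip_source(source, flag))
```

Defaulting the flag to false matches the "enable it by default, so if the card is set to static, just use that IP address" suggestion at 18:16.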