[05:44] <jtv> rvba: is the user-data at the end of installation only for the fast-path installer then?
[05:49] <Valduare> how's it going guys
[05:52] <jtv> Hi
[05:54] <Valduare> any news on maas with arm devices?
[06:35] <rvba> jtv: the user-data is requested before the f-p installation happens.  Now I'm not sure it happens in the case of d-i.
[06:56] <jtv> rvba: I'm trying it out...
[07:35] <jtv> dimitern: hi there — when could you talk about networking?
[07:36] <dimitern> jtv, hey, how about tomorrow or on friday?
[07:36] <dimitern> jtv, what's a good time for you?
[07:37] <jtv> Anytime before 11:00 UTC.  Except our standup is at 08:30 UTC.
[07:38] <dimitern> jtv, so how about tomorrow @ 10 UTC ?
[07:38] <jtv> Yes, great.
[07:38] <dimitern> jtv, i'll send an invite, cheers
[07:39] <dimitern> i'll invite roaksoax and jam if they want to join
[07:39] <jtv> Sure.
[07:40] <dimitern> jtv, actually, do you mind if we move it 30m earlier - 9:30 UTC, as it will overlap with our standup :)
[07:40] <jtv> Better for me actually!
[07:43] <dimitern> great! invite sent
[07:44] <ramonskie> I have upgraded to trusty on my cluster controller and also to maas 1.5.2, but now I stumbled onto this bug https://bugs.launchpad.net/maas/+bug/1307779
[07:45] <ramonskie> it seems to be fixed in 1.6 but I can't find that package. any ideas?
[07:48] <bigjools> ramonskie: ppa:maas-maintainers/stable
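  (For reference, a sketch of picking the fixed package up from that PPA; assumes a stock apt setup on the controller:
      sudo add-apt-repository ppa:maas-maintainers/stable
      sudo apt-get update
      sudo apt-get install maas
  )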
[07:49] <ramonskie> thanks
[08:47] <ramonskie> okay so I upgraded and it finds an image now, only it's now stuck on the screen where I see route info
[09:21] <jtv> rvba: hey, how about I remove that restriction where you need at least 16 bits of netmask on a managed network?  That was for the old generated zone files.
[09:22] <rvba> jtv: yep, we don't need that restriction anymore.
[09:22] <jtv> \o/
[09:22] <jtv> Easy karma.
[09:33] <ramonskie> when enlisting nodes it hangs on the following screen https://dl.dropboxusercontent.com/u/50671970/enlist-hang.jpg
[09:34] <jtv> ramonskie: that part looks OK to me in itself... how long did you watch them hang?
[09:34] <jtv> Because if it failed there, I'd probably expect some error output.
[09:35] <ramonskie> for about 10 minutes now
[09:36] <jtv> If this is all it shows on the console, I'd give it a bit longer.
[09:37] <ramonskie> jtv: thanks I will wait
[09:52] <ramonskie> jtv: how long should I wait....
[09:52] <jtv> ramonskie: any change on the screen?
[09:52] <ramonskie> nope nothing
[09:53] <jtv> Then I guess it's time to go trawling through the logs.
[09:53] <ramonskie> which log files do I need to check?
[09:53] <jtv> I'm looking along.  It's on the maas server, in /var/log/maas.
[09:53] <jtv> To be honest, since we're not seeing any error message, I don't know what to look for in this case.
[09:54] <jtv> Oh, one thing that might also help: shift-PageUp on the node's console might show a bit more history.
[09:54] <jtv> (Might as well keep the node doing whatever it's doing for now, in case it changes its mind)
[09:55] <jtv> First thing to do is a quick scan for obvious errors:
[09:55] <jtv> /var/log/maas/apache2/error.log, /var/log/maas/celery.log, /var/log/maas/maas.log
[09:56] <jtv> If there's an error in there, chances are it'll jump right out at you.
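  (A quick scan along those lines might look like this; a sketch using the default log paths jtv lists above:
      sudo grep -niE 'error|traceback' /var/log/maas/celery.log /var/log/maas/maas.log
      sudo tail -n 50 /var/log/maas/apache2/error.log
  )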
[09:56] <ramonskie> maas.log is empty, nothing in it
[09:56] <jtv> That is really odd.
[09:57] <jtv> Even if there are no requests, it's supposed to have periodic jobs in there.  Unless... which version is this?
[09:57] <jtv> (Roughly — e.g. "the one that came with 14.04")
[09:57] <ramonskie> first I was on Ubuntu 12.04 with MAAS 1.4, then upgraded to Ubuntu 14.04 and MAAS 1.5, and now upgraded to 1.6
[09:58] <jtv> OK, that's pretty recent.  Good.
[09:58] <jtv> I'm not sure if you'll have /var/log/maas/apache2 then; but it's just a symlink to /var/log/apache2.
[09:59] <ramonskie> in celery I see some DHCP lease errors
[09:59] <jtv> Oh?
[09:59] <jtv> Can you paste one?
[09:59] <ramonskie> ERROR/MainProcess] Task provisioningserver.tasks.upload_dhcp_leases[e19c4353-92a2-499e-8f95-154b3b017950] raised unexpected: IOError()
[09:59] <ramonskie> also need the trace?
[10:00] <jtv> That'd be nice, thanks.  Maybe use paste.ubuntu.com.
[10:01] <ramonskie> wait let me clean all the logs and restart the controller server and start all over
[10:02] <ramonskie> in case it's unrelated stuff
[10:02] <jtv> OK
[10:03] <jtv> Thanks for the review rvba.
[10:05] <rvba> jtv: np.  I just put up for review https://code.launchpad.net/~rvb/maas/revert-2872/+merge/233191.  A run in the CI shows that it fixes the problem introduced recently.
[10:05]  * jtv looks
[10:06] <ramonskie> jtv: these are the errors after a restart in celery.log http://pastebin.com/ayRxD5v0
[10:06] <jtv> Thanks.
[10:06] <jtv> rvba: done.  :)
[10:06] <rvba> Ta
[10:08] <jtv> ramonskie: scant consolation but this is code that's already removed from the dev version.  :)
[10:08] <jtv> Now, what seems to be going wrong is that the cluster controller is having trouble talking to the region-controller API.
[10:09] <ramonskie> it's on the same server
[10:09] <ramonskie> I only have one
[10:09] <jtv> Yeah, it should be easy, shouldn't it?
[10:09] <ramonskie> :P
[10:09] <jtv> You may want to check that your DEFAULT_MAAS_URL is configured sensibly.
[10:10] <jtv> That's in several places.  Just grep /etc/maas for it — but as root, or you won't be able to read some of those files.
[10:10] <jtv> (There's credentials in some of them.)
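  (Concretely, a sketch of that check; root is needed because some files under /etc/maas hold credentials:
      sudo grep -r DEFAULT_MAAS_URL /etc/maas
  )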
[10:10] <ramonskie> ./maas_local_settings.py:DEFAULT_MAAS_URL = "http://172.21.42.1/MAAS" ./maas_local_settings.py.dpkg-dist:DEFAULT_MAAS_URL = "http://maas.internal.example.com/"
[10:11] <jtv> That looks sensible — assuming 172.21.42.1 is indeed your server's IP address, and the nodes will be able to reach it.
[10:12] <jtv> You could have a look to see if that same request shows up in the Apache error log, in case it did get through to the server.
[10:12] <jtv> (Or maybe even the Apache access log — but I doubt that)
[10:13] <jtv> Oh!  Just in case, you may want to search the Apache access log for "/MAAS/MAAS"
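  (A sketch of that search; per jtv above, /var/log/maas/apache2 is just a symlink to /var/log/apache2:
      sudo grep '/MAAS/MAAS' /var/log/apache2/access.log
  )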
[10:14] <ramonskie> nope nothing found
[10:15] <ramonskie> only thing I see in the apache error log is this: No such file or directory: mod_wsgi (pid=2708): Unable to change working directory to '/home/maas'
[10:15] <ramonskie> but should not be the problem
[10:15] <jtv> No, shouldn't be.
[10:16] <jtv> It's dumb, but maybe you could just try making a wget request to http://172.21.42.1/MAAS from the server itself, just to make sure that gets through?
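  (A sketch of that request, using the DEFAULT_MAAS_URL found above; -S prints the response headers so a redirect or error is visible:
      wget -S -O /dev/null http://172.21.42.1/MAAS/
  )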
[10:16] <ramonskie> strange thing is I see now that I have 2 cluster controllers under Clusters
[10:16] <jtv> Two clusters?  That's interesting.
[10:17] <jtv> When it wakes up, the cluster registers itself with the region controller, and then just keeps polling for the region controller to say "sure, yeah, you're accepted."
[10:17] <ramonskie> yeah, I think it was the upgrade
[10:17] <jtv> The region controller should identify them by UUID.
[10:18] <ramonskie> I think it's an upgrade quirk that happened in 1.5
[10:18] <ramonskie> but I gave that one another set of IPs and a DNS zone
[10:18] <jtv> New to me...  I guess they have different UUIDs?  I guess one is "master"?
[10:18] <ramonskie> yes one is master and one is called maas
[10:18] <ramonskie> in dns zone
[10:18] <jtv> If they're both running on the same server, that spells trouble.
[10:19] <jtv> Because they control a DHCP server, a DNS server, iSCSI, a TFTP server, and so on.
[10:19] <ramonskie> ahh well that explains a lot
[10:20] <jtv> I'm still not sure _how_ it would cause the failure in the log, but it's definitely closer to the source.
[10:20] <ramonskie> the only problem is if I delete the newly created cluster it pops back up in pending state
[10:20] <jtv> Yeah.
[10:20] <ramonskie> and the other cluster still has a set of working nodes in it
[10:21] <jtv> You could try stopping the cluster controller, deleting the new cluster, and then in the UI updating the old one to look like the new one.
[10:21] <jtv> Mind you, they'll still have different UUIDs...  so that may not be good enough.
[10:21] <jtv> I think this'll require some database surgery.
[10:21] <ramonskie> they have different uuids
[10:21] <jtv> Yeah.
[10:22] <jtv> So the upgrade generated a new one instead of reusing the old one.
[10:22] <ramonskie> also the old cluster has only 6 boot-images and the new one 126
[10:22] <jtv> Yeah, a lot has changed there.
[10:22] <ramonskie> so can I move the nodes from the old cluster to the new one?
[10:23] <jtv> Let's start by getting a good view of the situation... if you grep /etc/maas for "UUID", do you get consistent UUIDs from the various config files?
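  (i.e. something like:
      sudo grep -r UUID /etc/maas
  each cluster config file should report the same CLUSTER_UUID value.)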
[10:23] <ramonskie> and then delete the old cluster
[10:23] <jtv> The only way to move nodes is to delete them from one cluster and re-enlist them into the other.
[10:23] <jtv> If that is not a problem, then I think it's the easiest way out.
[10:23] <jtv> But it means that anything you've got running on those nodes is lost.
[10:24] <ramonskie> maas_local_celeryconfig_cluster.py:CLUSTER_UUID = '3d245a63-2b23-42be-8977-f36cb2218b9e'
[10:24] <ramonskie> and that's the new one
[10:24] <ramonskie> yeah I can't delete those nodes, OpenStack is running on them with a lot of VMs :(
[10:24] <jtv> Blast.
[10:25] <ramonskie> otherwise I would already have done a clean install :)
[10:25] <jtv> Well, it's going to get tricky at any rate.  First let me have a look for known bugs.
[10:26] <ramonskie> okay
[10:26] <jtv> Meanwhile, could you check that the cluster UUID in /etc/maas/maas_cluster.conf is consistent with the one you found in maas_local_celeryconfig_cluster.py?
[10:27] <ramonskie> yes they are the same
[10:27] <jtv> OK
[10:27] <ramonskie> both have the new accepted cluster's UUID
[10:28] <ramonskie> can I disable the other cluster but still let DNS work
[10:28] <jtv> ramonskie: safest thing to try I guess would be to set them to the old cluster.  But... first a look for known bugs.
[10:28] <ramonskie> that should solve it
[10:28] <jtv> Well, plus a restart.  :)  And then you'd have to delete the new one.
[10:29] <ramonskie> okay, but if I set it to the old cluster, will the new boot images also be added?
[10:31] <jtv> Should be, yes.  Because AFAICT the two actually share everything except a process.
[10:33] <jtv> It's the same files on disc, etc.  It may take a few minutes for the remaining cluster controller to inform the region controller of what it has.
[10:33] <ramonskie> you already checked known bugs?
[10:33] <ramonskie> so should I try this?
[10:34] <jtv> I checked known unfixed bugs.  Let me make one more round for ones that may have been fixed later.
[10:38] <jtv> ramonskie: I guess it's not bug 1344089, and that's the best candidate I found.
[10:38] <jtv> (I realise you hit your problem with an earlier version)
[10:39] <ramonskie> I'm on 1.6 now
[10:39] <jtv> Anyway, assuming that's not it, we'll have to make the change.  I'd stop the cluster controllers first.
[10:40] <jtv> Then update the UUID entries in the config, to use your original cluster's UUID.
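  (A hedged sketch of that edit: match on the new UUID that the grep above turned up and substitute the original cluster's UUID; OLD_UUID below is a placeholder, not a real value:
      OLD_UUID='PUT-ORIGINAL-CLUSTER-UUID-HERE'   # placeholder: your old cluster's UUID
      sudo sed -i "s/3d245a63-2b23-42be-8977-f36cb2218b9e/$OLD_UUID/" \
          /etc/maas/maas_local_celeryconfig_cluster.py /etc/maas/maas_cluster.conf
  )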
[10:41] <jtv> I'd also set the cluster interfaces to Unmanaged, just so you can re-enable the right one later.
[10:42] <ramonskie> what's the best and safest way to stop the cluster controller?
[10:42] <jtv> Then restart, accept the right cluster controller if needed (it may be automatic), enable the right cluster interface, and see if that fixes things.
[10:42] <jtv> sudo service maas-cluster-celery stop
[10:42] <jtv> sudo service maas-pserv stop
[10:42] <jtv> Then I'd run a “ps -ef | grep maas” to check for lingering processes.
[10:44] <ramonskie> wow there is still a lot running
[10:44] <jtv> Yeah it's not a small thing.
[10:45] <ramonskie> several of these: /usr/bin/python /usr/bin/celeryd --logfile=/var/log/maas/celery-region.log --schedule=/var/lib/maas/celerybeat-region-schedule --loglevel=INFO --beat --queues=celery,master
[10:45] <ramonskie> should I kill them?
[10:45] <jtv> No, those are the region controller's celery.
[10:45] <ramonskie> and there should be 10 of them?
[10:46] <jtv> Probably not.
[10:46] <ramonskie> lol
[10:46] <jtv> But I don't know what might cause there to be more...  I do hope you don't have two region controllers as well!
[10:48] <ramonskie> I certainly hope not
[10:48] <ramonskie> the only thing I did was what I thought was a simple upgrade
[10:49] <jtv> Yeah.  This clearly shouldn't have happened.
[10:49] <ramonskie> okay edited both files maas_local_celeryconfig_cluster.py and maas_cluster.conf
[10:50] <ramonskie> with the old UUID
[10:50] <jtv> OK.
[10:50] <jtv> And you've set the cluster interfaces to Unmanaged in the UI?
[10:50] <ramonskie> the new cluster?
[10:51] <ramonskie> done for the new cluster
[10:51] <jtv> OK.  I'd do the old one as well.
[10:52] <jtv> (The only drawback is your DHCP server will be down briefly — let's keep it short)
[10:52] <ramonskie> no dhcp entries will be deleted?
[10:53] <jtv> Not as such, though there may be more confusion that will only become clear later.
[10:53] <ramonskie> backed it up just in case
[10:54] <jtv> Good.
[10:55] <jtv> And then we get to restart.  A reboot would be the most comprehensive.
[10:55] <ramonskie> reboot it is
[10:57] <ramonskie> okay rebooted
[10:57] <ramonskie> I removed the new cluster now
[10:59] <ramonskie> do I need to set managed DHCP on again?
[11:00] <ramonskie> what are the best next steps to take?
[11:00] <jtv> First: is the old cluster now Accepted?
[11:01] <jtv> If it is, then yes, re-enable DHCP management (and DNS management I guess — you mentioned using that)
[11:01] <ramonskie> yes, the old cluster is accepted, and it now has 126 boot images instead of 6
[11:02] <jtv> Excellent!
[11:02] <jtv> Want to try that node again?
[11:02] <ramonskie> yup
[11:03] <ramonskie> let me first check if everything is okay
[11:03] <ramonskie> and that the IP addresses haven't changed :P
[11:03] <jtv> Yeah.  Anything you can check is a plus at this point.  :)
[11:03] <jtv> If you feel up to it, maybe a fresh look at those logs in /var/log/maas.
[11:05] <ramonskie> okay, check seems okay, no errors for now
[11:05] <ramonskie> will try a node now
[11:06]  * jtv bates breath
[11:08] <ramonskie> whooopppdidoooh
[11:08] <ramonskie> it works
[11:08] <ramonskie> muchos kudos to you!!!
[11:09] <jtv> Phew.
[11:09] <ramonskie> thanks for helping mate
[11:09] <jtv> Glad I could help — and glad it didn't come crashing down on us.  :-)
[11:10] <ramonskie> yes, I'm really glad I don't need to start over. this saved so much work
[11:10] <ramonskie> and finally the auto-discovery of IPMI is working :D
[11:10] <jtv> I'll have to go now, but I would really appreciate if you could file a bug about this — especially the part where you upgraded and got two cluster controllers.  That might still be in the packaging somewhere.
[11:12] <ramonskie> okay, will do. thanks for all the help
[11:12] <ramonskie> where do you want me to file the bug?
[11:13] <jtv> https://bugs.launchpad.net/maas
[11:13] <jtv> (You have a Launchpad account, right?)
[11:14] <ramonskie> yup
[11:14] <ramonskie> okay creating one now
[11:14] <jtv> Thanks.  If we can prevent this from happening to someone else, that's wonderful.
[11:14]  * jtv runs now
[11:14] <jtv> Good night!
[11:22] <ramonskie> I'm out too, bye
[11:22] <ramonskie> bug created https://bugs.launchpad.net/maas/+bug/1364903
[11:31] <rvba> blake_r: Hi Blake.  I had to revert 2872.  See https://code.launchpad.net/~rvb/maas/revert-2872/+merge/233191 for details.
[12:10] <rvba> blake_r: Now I'm thinking that revision 2871 also introduced a problem (but a different one): one CI run failed with the nodes failing to get the images they need to boot.  This looks like the problem you diagnosed yesterday and said you were working on.
[12:11] <rvba> blake_r: I'd say this is a race condition as the CI test passed a couple of times.
[12:12] <rvba> blake_r: since you'll be up in less than an hour I'll refrain from reverting this one as well.  Let's talk when you come online.
[13:05] <blake_r> rvba: yes it is possible it will pass
[13:05] <blake_r> rvba: the issue is that RPC is used for the API call but not for the image selection when a node is booting
[13:05] <blake_r> rvba: so pxeconfig will fail
[13:06] <blake_r> rvba: I have a branch that fixes pxeconfig
[13:07] <rvba> blake_r: 2871 causes the images not to be present from time to time.  2872 (which I reverted) was causing the nodes to fail to enlist (see my paste on the revert MP).
[13:15] <blake_r> rvba: it's not that the images are not present, it's that the images are not present in the BootImage model, which is going away
[13:16] <rvba> blake_r: right, what I meant was that, as far as the node can see (and this involves the BootImage model), the images are not there.
[13:17] <blake_r> rvba: yes, correct. I was going to get that branch ready and landed today; looks like I will have to do all of that again :(
[13:17] <rvba> blake_r: I just reverted 2872 (which was causing failures all the time), not 2871.
[13:18] <blake_r> rvba: okay
[13:18] <blake_r> rvba: will look at the mp in a moment, getting through email this morning
[13:18] <rvba> blake_r: I'm sorry but I was in the middle of a QA and having trunk broken like that means a lot of time wasted for me.
[13:21] <blake_r> rvba: oh I see the reason
[13:21] <blake_r> rvba: yeah, it's reporting the available architectures wrong, I will work on a fix
[13:21] <rvba> blake_r: cool, ta.
[13:24] <rvba> blake_r: if you can fix the breakage introduced by 2871 first that would be great.  Because 2871 is still checked in.
[13:24] <blake_r> rvba: okay
[13:24] <rvba> Thanks.
[14:25] <newell> I am getting a 500 error when I go to acquire a commissioned node with latest trunk.  Here is the stacktrace: http://paste.ubuntu.com/8223903/
[14:26] <newell> Anyone seen this before?
[14:27] <rvba> newell: let me have a look at this stacktrace…
[14:30] <rvba> newell: looks like a bug in gen_dynamic_ip_addresses_with_host_maps: it should skip the ngi with no static_ip_range_low/high
[14:30] <rvba> newell: can you file a critical bug about this?
[14:30] <newell> yeah
[14:35] <newell> https://bugs.launchpad.net/maas/+bug/1364993
[14:37] <rvba> newell: I changed the title for this bug.  We try to explain what the problem is in the title/description.  Suggestions on the possible cause or ideas on how to fix it should be put in the comments.
[14:38] <rvba> newell: this helps triaging and lets people come up with alternative solutions.
[14:39] <newell> ha, I changed the title as well just before I read this
[14:39] <newell> wonder where it stands now
[14:39] <newell> https://bugs.launchpad.net/maas/+bug/1364993
[14:39] <newell> Is that better?
[14:40] <rvba> Yep, it describes the problem.
[14:40] <newell> k, we were both in the middle of changing it and my page wasn't refreshed, that is why I didn't see that you had modified it
[14:41] <rvba> I figured :)
[16:24] <newell> rvba, still around?
[16:26] <rvba> newell: yep
[16:26] <newell> rvba, do we need a test for that bug, or is it trivial enough to just push the change you mentioned?
[16:28] <rvba> newell: as with any non-trivial change, it's worth a test
[16:30] <rvba> newell: now I'm not so sure my solution is the right one as test__treats_undefined_static_range_as_zero_size_network seems to test the case where ngi has no static range.
[16:31] <newell> yeah I was looking at that too
[16:31] <newell> your change doesn't break any tests though
[16:36] <rvba> newell: there is a bug in the test :)
[16:37] <newell> ha
[16:37] <rvba> newell: can you spot it?
[16:37] <newell> let me take a look
[16:43] <rvba> newell: My fix is up for review.  https://code.launchpad.net/~rvb/maas/bug-1364993/+merge/233244.  And I need to step out now. ttyl.
[16:43] <newell> sounds good, I will review it. sorry, my wife was asking me questions and I got pulled away
[16:44] <newell> ha, just needed to save it
[17:09] <dpb2> Hi all -- I'm starting a server, but I don't see any log message for the power on attempt in celery log (just periodic dhcp refreshes, etc).  What is up?
[17:09] <dpb2> (this install had been working fine)
[17:11] <dpb2> roaksoax: ^ any ideas?
[17:15] <roaksoax> dpb2: check whether maas-pserv is running
[17:15] <roaksoax> dpb2: are you importing images?
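  (A sketch of that running-services check, assuming the upstart-managed services on 14.04:
      sudo service maas-pserv status
      sudo service maas-cluster-celery status
  )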
[17:16] <dpb2> roaksoax: all maas services are reported as running
[17:16] <dpb2> roaksoax: let me check on the images
[17:16] <dpb2> roaksoax: actually, I'm not sure how to check that. :)
[17:17] <roaksoax> dpb2: what MAAS version are you using?
[17:17] <dpb2> 1.6.1+bzr2550-0ubuntu1~ppa2
[17:18] <roaksoax> dpb2: uhmmm
[17:18] <roaksoax> blake_r: ^^ any thoughts?
[17:18] <dpb2> roaksoax: I *could* restart all the services, but I didn't want to mask an issue
[17:19] <roaksoax> dpb2: do please restart the services. 1.7 will completely change in that area
[17:19] <roaksoax> dpb2: because of celery being silly and causing issues like this
[17:19] <roaksoax> dpb2: (we are getting rid of celery)
[17:20] <roaksoax> dpb2: can you see logs?
[17:20] <roaksoax> dpb2: maas.log celery.log
[17:20] <dpb2> hm
[17:20] <dpb2> roaksoax: I'm seeing the logs now
[17:20] <dpb2> (just now)
[17:20] <dpb2> roaksoax: so...
[17:20] <dpb2> roaksoax: if boot images are importing, does that block power-up attempts?
[17:21] <roaksoax> dpb2: yes it can... celery blocks any other jobs if a bigger job is in progress
[17:21] <dpb2> yikes
[17:21] <dpb2> ok
[17:22] <dpb2> well, I think it's working now.  the old "try it again" fixed it
[17:22] <dpb2> thanks
[17:22] <roaksoax> dpb2: np!
[17:39] <Valduare> how's it going guys
[17:39] <Valduare> any news on maas with arm devices?
[17:50] <newell> Valduare, there is currently some ARM support (e.g. arm64/armhf, etc.)
[17:51] <Valduare> I have a few mk808 devices here; it would be fun to be able to spin them up, etc.