[05:15] <jam>  /wave
[05:15] <bigjools> jam: so we have someone who can't get constraints working on precise
[05:15] <bigjools> they are using pyjuju with latest maas
[05:16] <bigjools> apparently not even mem= works
[05:16] <bigjools> and given that I realised I don't know much about the work you did I figured I'd better catch up :)
[05:17] <bigjools> so jtv was telling me that he thinks there's a job that runs after every tag change on a node, is that right?
[05:18] <jtv> Not _quite_ what I said.  :)
[05:18] <jtv> A job that _needs_ to run before new tags can be taken into use.
[05:18] <bigjools> meh
[05:19] <jtv> It matters because I'm not sure it wasn't a manual step.
[05:19] <bigjools> ok
[05:20] <jam> bigjools: if you do a change on tags, it fires off a job that gets run by all of the cluster workers
[05:20] <bigjools> jam: I thought you ran the tag jobs on the region worker?
[05:20] <jam> to evaluate the hardware characteristics.
[05:20] <jam> bigjools: my MaaS terminology is a bit out of date, I imagine.
[05:20] <jam> There was 1 central place, and N subset workers, right?
[05:20] <bigjools> yes
[05:21] <jam> this is done on the N
[05:21] <bigjools> 1 region, many clusters
[05:21] <bigjools> ah ok
[05:21] <bigjools> so was anything you added done in the region?
[05:21] <jam> so that when you have 200,000 machines, each cluster only does 4k (or whatever) hardware jobs.
[05:21] <bigjools> indeed
[05:21] <jam> bigjools: I believe when adding a single machine it is evaluated by the region against all existing tags.
[05:22] <bigjools> ok
[05:22] <jam> however, if mem= isn't working, my guess is that our rules for pulling memory out of lshw characteristics are failing.
[05:22] <jam> we had that problem at least once before
[05:22] <jam> on arm, I think
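(For context, a rough sketch of the kind of lshw parsing being discussed here. The real XPath lives in maasserver/models/node.py, linked later in the conversation; the element and attribute names below are assumptions for illustration, mirroring the units='bytes' detail that comes up further down.)

    from lxml import etree

    # Toy lshw XML dump; real lshw output is much larger.
    lshw_xml = b"""
    <node id='memory'>
      <size units='bytes'>34359738368</size>
    </node>
    """

    doc = etree.fromstring(lshw_xml)
    # Sum every memory bank that reports its size in bytes.
    sizes = doc.xpath("//node[@id='memory']/size[@units='bytes']/text()")
    total_bytes = sum(int(s) for s in sizes)
    print(total_bytes // (1024 * 1024), "MB")  # -> 32768 MB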
[05:22] <bigjools> so what is the best way of debugging this:
[05:23] <bigjools> someone is defining a new tag to match certain machines. querying the tag in the API returns the machines correctly, but juju deploy with the "maas-tag" constraint doesn't work
[05:23] <bigjools> but then they also said the mem= tag was not matching correctly, so something is really hosed
[05:24] <bigjools> see https://bugs.launchpad.net/ubuntu/+source/maas/+bug/1214172
[05:25] <jam> bigjools: so from what I saw with "maas-cli api tags nodes ram512" it looks like the tags themselves are getting applied to nodes.
[05:26] <bigjools> yep
[05:26] <bigjools> when they deploy with mem=34359738368
[05:26] <jam> so I can still see "mem=" being broken for other reasons (digging into the code now to remind myself where things are defined)
[05:26] <bigjools> juju shows mem=34359738368.0
[05:26] <bigjools> which is suspicious
[05:27] <bigjools> (that's from get-constraints)
[05:29]  * bigjools considers running tcpdump to see the actual requests
[05:30] <kurt_> bigjools: while working with juju today, we discovered the squid proxy on the cluster controller is missing one of the repositories.  Is that a known problem?
[05:30] <kurt_> roaksoax figured it out
[05:31] <bigjools> the clusters don't have proxies, only the region IIRC
[05:31] <kurt_> ok, fair enough, let me restate - the region controller :)
[05:31] <bigjools> that's a packaging thing, so it's roaksoax's domain :)
[05:31] <kurt_> i c
[05:31] <kurt_> ok
[05:32]  * bigjools volleys back to roaksoax.  15 love.
[05:32] <bigjools> well, put it like this: there's no squid set up in maas upstream
[05:33] <bigjools> did you manage to fix it locally?
[05:33] <kurt_> lol
[05:33] <kurt_> we did, but I do think it needs to be fixed in whatever package the installer uses. 403 errors abound
[05:34] <bigjools> might just be the squid-deb-proxy package
[05:34] <bigjools> which release?
[05:34] <kurt_> and I thought it was a skillful punt :D
[05:34] <kurt_> precise
[05:34] <bigjools> artful :)
[05:34] <bigjools> and you're downloading packages for precise or...?
[05:35] <kurt_> actually, my platform is quantal, but the images are precise
[05:35] <jam> bigjools: so the xpath to determine how much memory we have is: http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/view/head:/src/maasserver/models/node.py#L389
[05:37] <jam> bigjools: and for mem=34359738368.0 it shouldn't actually be a problem, given: http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/view/head:/src/maasserver/models/node_constraint_filter.py#L32
[05:37] <jam> we cast to float, and then ceil and cast back to int.
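(A minimal sketch of the conversion jam describes, assuming a plain value string as input; it mirrors the float/ceil/int dance rather than copying node_constraint_filter.py verbatim:)

    import math

    def parse_mem_constraint(value):
        # Cast to float first so "34359738368.0" parses cleanly,
        # round up, then back to int for comparison.
        return int(math.ceil(float(value)))

    assert parse_mem_constraint("34359738368.0") == 34359738368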
[05:38] <jam> bigjools: so to detect what mem we are setting for the nodes, you could do a DB query, or maybe that is exposed on the Node object (in the views)
[05:39] <jam> bigjools: yeah, the node_view.html template has "node.memory" MB there
[05:39] <jam> hmmm....
[05:39] <jam> looks like we are saving the memory in MB into the DB
[05:39] <jam> so integer MB
[05:39] <jam> so the query would need to be: juju deploy --mem=512 for 512MB of memory?
[05:39] <jam> bigjools: ^^ ?
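(One hedged way to do that check from a Django shell on the region controller; the model import path and field names here follow the node.py link and the node_view.html template mentioned above, and are assumptions beyond that:)

    # Run inside the MAAS Django environment, e.g. a manage.py shell.
    from maasserver.models import Node

    for node in Node.objects.all():
        # node.memory is stored as integer MB, per the template above.
        print(node.hostname, node.memory, "MB")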
[05:40] <jam> That doesn't explain why MaaS tag isn't working
[05:40] <jam> though
[05:46] <jtv> jam: I had a faint memory of an upgrade breaking all this once...  Probably of some third-party dependency.  Can't find a bug for it though.
[05:47] <bigjools> jam: sorry distracted by the gardener
[05:48] <bigjools> jam: good point on the mem constraint
[05:48] <jam> bigjools: I don't need to know :)
[05:48] <bigjools> jam: haha :)
[05:49] <bigjools> I have someone in top dressing.  does that make it any muddier? :)
[05:54] <bigjools> jam: so if lshw's mem output is not getting parsed correctly, that self-tag thing they did wouldn't even work, would it?
[05:54] <bigjools> so the parsing is probably ok
[05:55] <jam> bigjools: so the easy check is for them to go to the Node page and see what it says the node has for memory. The only difference I can see between their tag constraint and our mem parser is that we enforce "units=bytes"
[05:56] <jam> so say their LSHW didn't have a units field at all
[05:56] <bigjools> ah
[05:56] <jam> and then some other random stuff about bank, etc. But I think that was just to add more matches, rather than more filtering.
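(To make the units point concrete, a toy check: the same XPath with a units='bytes' predicate matches the first snippet but not the second, so a machine whose lshw output omits the units field would appear to have no RAM at all. Element names are assumptions, as in the earlier sketch:)

    from lxml import etree

    with_units = etree.fromstring(
        b"<node id='memory'><size units='bytes'>34359738368</size></node>")
    without_units = etree.fromstring(
        b"<node id='memory'><size>34359738368</size></node>")

    xpath = "//node[@id='memory']/size[@units='bytes']/text()"
    print(with_units.xpath(xpath))     # ['34359738368']
    print(without_units.xpath(xpath))  # [] -- no match, no memory recorded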
[05:57] <jtv> I don't know how pyjuju does it, but I think juju-core has a back-off path for when no available image can match the constraints.
[05:58] <jtv> Any chance that it might be saying "no way I can get 34359738368 MB of memory, but let's try again and be less picky"?
[05:58] <jtv> (Tying into the difference in memory units there)
[06:00] <jtv> Although if that were the case, I don't suppose it'd affect custom tags...
[06:00] <jam> jtv: I would have thought pyjuju would just refuse to deploy a machine until it finally found one that matched. (aka never)
[06:00] <jam> because it was just picky like that
[06:00] <jam> (hence why we had a lot of old problems with maas-name, IIRC)
[06:01] <jtv> I'm looking it up in the code, just in case.
[06:02] <bigjools> ok
[06:05] <jtv> It does look a lot as if ServiceUnitState.assign_to_unused_machine won't notice if no machines satisfy constraints, and just return the last candidate it looked at.  But that may still go through another filtering pass later.
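(A minimal sketch of the failure mode jtv is describing, not pyjuju's actual code: the loop variable keeps the last candidate examined, so an impossible constraint still "finds" a machine:)

    def assign_machine(machines, wanted_mem):
        candidate = None
        for machine in machines:
            candidate = machine       # bug: remembers the last one seen
            if machine["mem"] >= wanted_mem:
                break                 # only correct when a match exists
        return candidate

    machines = [{"name": "a", "mem": 2048}, {"name": "b", "mem": 4096}]
    print(assign_machine(machines, 34359738368))  # -> machine "b", wrongly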
[06:10] <bigjools> I replied to that email asking if they were using matching units
[06:11] <bigjools> also advised that mem= needs MB!
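(For reference, the arithmetic behind that advice: the byte count from the bug report works out to 32768 when expressed as MB, i.e. a 32 GiB machine:)

    print(34359738368 // (1024 * 1024))  # -> 32768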
[06:11] <bigjools> let's see how that works out
[06:11] <bigjools> juju needs to document this better, I suspect
[06:11] <bigjools> our docs all round are weak :(
[06:12] <jtv> Yeah.  AFAICT, if you asked pyjuju for an outrageous amount of memory that no machine actually had, it'd just give you an arbitrary available machine.
[06:12] <jtv> (As you might well do if you thought you were specifying memory in bytes but Juju interpreted the number as megabytes)
[06:13] <jtv> It's something I ranted about in a review very recently, actually: failure to extract the search/filter function.
[06:26] <bigjools> that, err, sucks :/
[06:28] <jtv> Rather.  It's an implicit outcome of the loop, not an explicitly visible possibility in the code.  One of the reasons I recommend extracting search/filter functions.
[06:29] <jtv> Sometimes people deal with it by writing the "oh I've found it, now do my business and return" part *in the body of the search loop*, which avoids the implicit bad result but makes the code worse in every other way.
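(The extraction jtv recommends, roughly: pull the search into its own function so "no match" becomes an explicit result rather than an accident of the loop. A sketch using the same toy data as above, not pyjuju code:)

    def find_matching_machine(machines, wanted_mem):
        """Return the first machine with enough memory, or None."""
        for machine in machines:
            if machine["mem"] >= wanted_mem:
                return machine
        return None

    machines = [{"name": "a", "mem": 2048}, {"name": "b", "mem": 4096}]
    if find_matching_machine(machines, 34359738368) is None:
        raise LookupError("no machine satisfies the constraints")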
[14:33] <roaksoax> rvba: ping
[14:33] <roaksoax> rvba: has maas' apache2 config changed in any way?