[05:15] /wave
[05:15] jam: so we have someone who can't get constraints working on precise
[05:15] they are using pyjuju with latest maas
[05:16] apparently not even mem= works
[05:16] and given that I realised I don't know much about the work you did I figured I'd better catch up :)
[05:17] so jtv was telling me that he thinks there's a job that runs after every tag change on a node, is that right?
[05:18] Not _quite_ what I said. :)
[05:18] A job that _needs_ to run before new tags can be taken into use.
[05:18] meh
[05:19] It matters because I'm not sure it wasn't a manual step.
[05:19] ok
[05:20] bigjools: if you do a change on tags, it fires off a job that gets run by all of the cluster workers
[05:20] jam: I thought you ran the tag jobs on the region worker?
[05:20] to evaluate the hardware characteristics.
[05:20] bigjools: my MaaS terminology is a bit out of date, I imagine.
[05:20] There was 1 central place, and N subset workers, right?
[05:20] yes
[05:21] this is done on the N
[05:21] 1 region, many clusters
[05:21] ah ok
[05:21] so was anything you added done in the region?
[05:21] so that when you have 200,000 machines, each cluster only does 4k (or whatever) hardware jobs.
[05:21] indeed
[05:21] bigjools: I believe when adding a single machine it is evaluated by the region against all existing tags.
[05:22] ok
[05:22] however, if mem= isn't working, my guess is that our rules for pulling memory out of lshw characteristics are failing.
[05:22] we had that problem at least 1 time before
[05:22] on arm, I think
[05:22] so what is the best way of debugging this:
[05:23] someone is defining a new tag to match certain machines. querying the tag in the api returns the machines correctly, but juju deploy with the "maas-tag" constraint doesn't work
[05:23] but then they also said the mem= tag was not matching correctly, so something is really hosed
[05:24] see https://bugs.launchpad.net/ubuntu/+source/maas/+bug/1214172
[05:24] Launchpad bug 1214172 in maas (Ubuntu) "juju/MAAS Tag constraints do not work in Precise" [Critical,New]
[05:25] bigjools: so from what I saw with "maas-cli api tags nodes ram512" it looks like the tags themselves are getting applied to nodes.
[05:26] yep
[05:26] when they deploy with mem=34359738368
[05:26] so I can still see "mem=" being broken for other reasons (digging into the code now to remind myself where things are defined)
[05:26] juju shows mem=34359738368.0
[05:26] which is suspicious
[05:27] (that's from get-constraints)
[05:29] * bigjools considers running tcpdump to see the actual requests
[05:30] bigjools: while working with juju today, we discovered the squid proxy on the cluster controller is missing one of the repositories. Is that a known problem?
[05:30] roaksoax figured it out
[05:31] the clusters don't have proxies, only the region IIRC
[05:31] ok, fair enough, let me restate - the region controller :)
[05:31] that's a packaging thing, so it's roaksoax's domain :)
[05:31] i c
[05:31] ok
[05:32] * bigjools volleys back to roaksoax. 15 love.
[05:32] well put it like this, there's no squid set up in maas upstream
[05:33] did you manage to fix it locally?
[05:33] lol
[05:33] we did, but I do think it needs to be updated in whatever packages for the installer. 403 errors abound
[05:34] might just be the squid-deb-proxy package
[05:34] which release?
[05:34] and I thought it was a skillful punt :D
[05:34] precise
[05:34] artful :)
[05:34] and you're downloading packages for precise or...?
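
A rough sketch, in Python, of the tag-evaluation step jam describes above (a tag definition is an XPath expression matched against a node's lshw output); the function name and the example definition are assumptions for illustration, not the actual MAAS code:

    from lxml import etree

    def node_matches_tag(lshw_xml, tag_definition):
        # Parse the node's stored lshw XML and evaluate the tag's XPath
        # definition against it; any truthy result counts as a match.
        doc = etree.fromstring(lshw_xml)
        return bool(doc.xpath(tag_definition))

    # A hypothetical "ram512"-style definition might look something like:
    #   //node[@id="memory"]/size[@units="bytes" and . >= 536870912]
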
[05:35] actually, my platform is quantal, but the images are precise
[05:35] bigjools: so the xpath to determine how much memory we have is: http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/view/head:/src/maasserver/models/node.py#L389
[05:37] bigjools: and for mem=34359738368.0 it shouldn't actually be a problem, given: http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/view/head:/src/maasserver/models/node_constraint_filter.py#L32
[05:37] we cast to float, and then ceil and cast back to int.
[05:38] bigjools: so to detect what mem we are setting for the nodes, you could do a DB query, or maybe that is exposed on the Node object (in the views)
[05:39] bigjools: yeah, the node_view.html template has "node.memory" MB there
[05:39] hmmm....
[05:39] looks like we are saving the memory in MB into the DB
[05:39] so integer MB
[05:39] so the query would need to be: juju deploy --mem=512 for 512MB of memory?
[05:39] bigjools: ^^ ?
[05:40] That doesn't explain why the MaaS tag isn't working
[05:40] though
[05:46] jam: I had a faint memory of an upgrade breaking all this once... Probably of some third-party dependency. Can't find a bug for it though.
[05:47] jam: sorry, distracted by the gardener
[05:48] jam: good point on the mem constraint
[05:48] bigjools: I don't need to know :)
[05:48] jam: haha :)
[05:49] I have someone in top dressing. does that make it any muddier? :)
[05:54] jam: so if the lshw mem output is not getting parsed correctly, that self-tag thing they did wouldn't even work, would it?
[05:54] so the parsing is probably ok
[05:55] bigjools: so the easy check is for them to go to the Node page and see what it says the node has for memory. The only difference I can see between their tag constraint and our mem parser is that we enforce "units=bytes"
[05:56] so say their lshw didn't have a units field at all
[05:56] ah
[05:56] and then some other random stuff about bank, etc. But I think that was just to add more matches, rather than more filtering.
[05:57] I don't know how pyjuju does it, but I think juju-core has a back-off path for when no available image can match the constraints.
[05:58] Any chance that it might be saying "no way I can get 34359738368 MB of memory, but let's try again and be less picky"?
[05:58] (Tying into the difference in memory units there)
[06:00] Although if that were the case, I don't suppose it'd affect custom tags...
[06:00] jtv: I would have thought pyjuju would just refuse to deploy a machine until it finally found one that matched. (aka never)
[06:00] because it was just picky like that
[06:00] (hence why we had a lot of old problems with maas-name, IIRC)
[06:01] I'm looking it up in the code, just in case.
[06:02] ok
[06:05] It does look a lot as if ServiceUnitState.assign_to_unused_machine won't notice if no machines satisfy constraints, and just return the last candidate it looked at. But that may still go through another filtering pass later.
[06:10] I replied to that email asking if they were using matching units
[06:11] also advised that mem= needs MB!
[06:11] let's see how that works out
[06:11] juju needs to document this better I suspect
[06:11] our docs all round are weak :(
[06:12] Yeah. AFAICT, if you asked pyjuju for an outrageous amount of memory that no machine actually had, it'd just give you an arbitrary available machine.
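
A minimal sketch of the float-then-ceil parsing jam describes, together with the MB-versus-bytes mismatch under discussion; the function name and the node memory value are illustrative assumptions, not the actual MAAS or pyjuju code:

    import math

    def parse_mem_constraint(value):
        # Hypothetical helper mirroring the description above: the mem= value
        # is cast to float, rounded up, and cast back to int.
        return int(math.ceil(float(value)))

    # MAAS stores node memory as integer megabytes, so a value intended as
    # bytes can never be satisfied:
    requested = parse_mem_constraint("34359738368.0")    # caller meant 32 GiB in bytes
    node_memory_mb = 32768                               # assumed value for a 32 GiB node

    print(requested <= node_memory_mb)                   # False: no node will ever match
    print(parse_mem_constraint("512") <= node_memory_mb) # True: mem=512 means 512 MB
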
[06:12] (As you might well do if you thought you were specifying memory in bytes but Juju interpreted the number as megabytes)
[06:13] It's something I ranted about in a review very recently actually: failure to extract a search/filter function.
[06:26] that, err, sucks :/
[06:28] Rather. It's an implicit outcome of the loop, not an explicitly visible possibility in the code. One of those reasons why I recommend extracting search/filter functions.
[06:29] Sometimes people deal with it by writing the "oh I've found it, now do my business and return" part *in the body of the search loop*, which avoids the implicit bad result but makes the code worse in every other way.
[14:33] rvba: ping
[14:33] rvba: has maas' apache2 config changed in any way?
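
To make the point at [06:28] concrete, here is an illustrative contrast between the implicit-loop outcome and an extracted search function; this is hypothetical code, not the actual ServiceUnitState.assign_to_unused_machine implementation:

    def assign_machine_implicit(machines, satisfies):
        # Anti-pattern: the "no match" case is only an implicit outcome of the loop.
        candidate = None
        for machine in machines:
            candidate = machine
            if satisfies(machine):
                break
        # If nothing satisfied the constraints, candidate is simply the last
        # machine examined, and the caller deploys to an arbitrary machine.
        return candidate

    def find_matching_machine(machines, satisfies):
        # Extracted search function: the "no match" case is explicit and visible.
        for machine in machines:
            if satisfies(machine):
                return machine
        raise LookupError("no machine satisfies the requested constraints")
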