jam | /wave | 05:15 |
---|---|---|
bigjools | jam: so we have someone who can't get constraints working on precise | 05:15 |
bigjools | they are using pyjuju with latest maas | 05:15 |
bigjools | apparently not even mem= works | 05:16 |
bigjools | and given that I realised I don't know much about the work you did I figured I'd better catch up :) | 05:16 |
bigjools | so jtv was telling me that he thinks there's a job that runs after every tag change on a node, is that right? | 05:17 |
jtv | Not _quite_ what I said. :) | 05:18 |
jtv | A job that _needs_ to run before new tags can be taken into use. | 05:18 |
bigjools | meh | 05:18 |
jtv | It matters because I'm not sure it wasn't a manual step. | 05:19 |
bigjools | ok | 05:19 |
jam | bigjools: if you do a change on tags, it fires off a job that gets run by all of the cluster workers | 05:20 |
bigjools | jam: I thought you ran the tag jobs on the region worker? | 05:20 |
jam | to evaluate the hardware characteristics. | 05:20 |
jam | bigjools: my MaaS terminology is a bit out of date, I imagine. | 05:20 |
jam | There was 1 central place, and N subset workers, right? | 05:20 |
bigjools | yes | 05:20 |
jam | this is done on the N | 05:21 |
bigjools | 1 region, many clusters | 05:21 |
bigjools | ah ok | 05:21 |
bigjools | so was anything you added done in the region? | 05:21 |
jam | so that when you have 200,000 machines, each cluster only does 4k (or whatever) hardware jobs. | 05:21 |
bigjools | indeed | 05:21 |
jam | bigjools: I believe when adding a single machine it is evaluated by the region against all existing tags. | 05:21 |
bigjools | ok | 05:22 |
jam | however, if mem= isn't working, my guess is that our rules for pulling memory out of the lshw characteristics are failing. | 05:22 |
jam | we had that problem at least 1 time before | 05:22 |
jam | on arm, I think | 05:22 |
bigjools | so what is the best way of debugging this: | 05:22 |
bigjools | someone is defining a new tag to match certain machines. querying the tag in the api returns the machines correctly, but juju deploy with the "maas-tag" constraint doesn't work | 05:23 |
bigjools | but then they also said the mem= tag was not matching correctly, so something is really hosed | 05:23 |
bigjools | see https://bugs.launchpad.net/ubuntu/+source/maas/+bug/1214172 | 05:24 |
ubot5 | Launchpad bug 1214172 in maas (Ubuntu) "juju/MAAS Tag constraints do not work in Precise" [Critical,New] | 05:24 |
jam | bigjools: so from what I saw with "maas-cli api tags nodes ram512" it looks like the tags themselves are getting applied to nodes. | 05:25 |
bigjools | yep | 05:26 |
bigjools | when they deploy with mem=34359738368 | 05:26 |
jam | so I can still see "mem=" being broken for other reasons (digging into the code now to remind myself where things are defined) | 05:26 |
bigjools | juju shows mem=34359738368.0 | 05:26 |
bigjools | which is suspicious | 05:26 |
bigjools | (that's from get-constraints) | 05:27 |
* bigjools considers running tcpdump to see the actual requests | 05:29 | |
kurt_ | bigjools: while working with juju today, we discovered the squid proxy on the cluster controller is missing one of the repositories. Is that a known problem? | 05:30 |
kurt_ | roaksoax figured it out | 05:30 |
bigjools | the clusters don't have proxies, only the region IIRC | 05:31 |
kurt_ | ok, fair enough, let me restate - the region controller :) | 05:31 |
bigjools | that's a packaging thing, so it's roaksoax's domain :) | 05:31 |
kurt_ | i c | 05:31 |
kurt_ | ok | 05:31 |
* bigjools volleys back to roaksoax. 15 love. | 05:32 | |
bigjools | well put it like this, there's no squid set up in maas upstream | 05:32 |
bigjools | did you manage to fix it locally? | 05:33 |
kurt_ | lol | 05:33 |
kurt_ | we did, but I do think it needs to be updated in whatever packages for the installer. 403 errors abound | 05:33 |
bigjools | might just be the squid-deb-proxy package | 05:34 |
bigjools | which release? | 05:34 |
kurt_ | and I thought it was a skillful punt :D | 05:34 |
kurt_ | precise | 05:34 |
bigjools | artful :) | 05:34 |
bigjools | and you're downloading packages for precise or...? | 05:34 |
kurt_ | actually, my platform is quantal, but the images are precise | 05:35 |
jam | bigjools: so the xpath to determine how much memory we have is: http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/view/head:/src/maasserver/models/node.py#L389 | 05:35 |
jam | bigjools: and for mem=34359738368.0 it shouldn't actually be a problem, given: http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/view/head:/src/maasserver/models/node_constraint_filter.py#L32 | 05:37 |
jam | we cast to float, then ceil and cast back to int. | 05:37 |
jam | bigjools: so to detect what mem we are setting for the nodes, you could do a DB query, or maybe that is exposed on the Node object (in the views) | 05:38 |
jam | bigjools: yeah, the node_view.html template has "node.memory" MB there | 05:39 |
jam | hmmm.... | 05:39 |
jam | looks like we are saving the memory in MB into the DB | 05:39 |
jam | so integer MB | 05:39 |
jam | so the query would need to be: juju deploy --mem=512 for 512MB of memory? | 05:39 |
jam | bigjools: ^^ ? | 05:39 |
jam | That doesn't explain why MaaS tag isn't working | 05:40 |
jam | though | 05:40 |
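(A rough sketch of the unit mismatch being discussed here: juju and MAAS treat the mem constraint as integer megabytes, so a value that was meant as bytes asks for petabytes of RAM and can never match any node. The helper name below is illustrative, not actual MAAS code.)

```python
import math

def parse_mem_constraint(value):
    # The cast jam describes: float, then ceil, then back to int.
    # The resulting number is interpreted as *megabytes*, not bytes.
    return int(math.ceil(float(value)))

# 32 GiB expressed in bytes, as the user passed it...
requested_mb = parse_mem_constraint("34359738368.0")
# ...is read as roughly 32 *petabytes* when interpreted as MB,
# so no real node can ever satisfy it.
node_memory_mb = 32768  # an actual 32 GiB node, stored as integer MB

assert requested_mb == 34359738368
assert node_memory_mb < requested_mb

# What the user should have asked for (512 MB would be mem=512):
assert parse_mem_constraint("32768") == node_memory_mb
```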
jtv | jam: I had a faint memory of an upgrade breaking all this once... Probably of some third-party dependency. Can't find a bug for it though. | 05:46 |
bigjools | jam: sorry distracted by the gardener | 05:47 |
bigjools | jam: good point on the mem constraint | 05:48 |
jam | bigjools: I don't need to know :) | 05:48 |
bigjools | jam: haha :) | 05:48 |
bigjools | I have someone in top dressing. does that make it any muddier? :) | 05:49 |
bigjools | jam: so if the lshw's mem output is not getting parsed correctly, that self-tag thing they did wouldn't even work would it? | 05:54 |
bigjools | so the parsing is probably ok | 05:54 |
jam | bigjools: so the easy check is for them to go to the Node page and see what it says the node has for memory. The only difference I can see between their tag constraint and our mem parser is that we enforce "units=bytes" | 05:55 |
jam | so say their LSHW didn't have a units field at all | 05:56 |
bigjools | ah | 05:56 |
jam | and then some other random stuff about bank, etc. But I think that was just to add more matches, rather than more filtering. | 05:56 |
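(A toy illustration of the units="bytes" filter jam is pointing at: if the lshw output's size element lacks a units attribute, the XPath matches nothing and the node's memory silently comes out as zero. The element layout here is simplified; the real XPath is in the node.py link above.)

```python
import xml.etree.ElementTree as ET

# Simplified lshw fragments: one with units="bytes", one without.
lshw_ok = '<node id="memory"><size units="bytes">34359738368</size></node>'
lshw_no_units = '<node id="memory"><size>34359738368</size></node>'

def total_memory_mb(xml_text):
    # Only size elements that declare units="bytes" are counted,
    # mirroring the enforcement jam describes.
    root = ET.fromstring(xml_text)
    sizes = root.findall(".//size[@units='bytes']")
    return sum(int(s.text) for s in sizes) // (1024 * 1024)

assert total_memory_mb(lshw_ok) == 32768      # 32 GiB parsed correctly
assert total_memory_mb(lshw_no_units) == 0    # missing attribute: zero memory
```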
jtv | I don't know how pyjuju does it, but I think juju-core has a back-off path for when no available image can match the constraints. | 05:57 |
jtv | Any chance that it might be saying "no way I can get 34359738368 MB of memory, but let's try again and be less picky"? | 05:58 |
jtv | (Tying into the difference in memory units there) | 05:58 |
jtv | Although if that were the case, I don't suppose it'd affect custom tags... | 06:00 |
jam | jtv: I would have thought pyjuju would just refuse to deploy a machine until it finally found one that matched. (aka never) | 06:00 |
jam | because it was just picky like that | 06:00 |
jam | (hence why we had a lot of old problems with maas-name, IIRC) | 06:00 |
jtv | I'm looking it up in the code, just in case. | 06:01 |
bigjools | ok | 06:02 |
jtv | It does look a lot as if ServiceUnitState.assign_to_unused_machine won't notice if no machines satisfy constraints, and just return the last candidate it looked at. But that may still go through another filtering pass later. | 06:05 |
bigjools | I replied to that email asking if they were using matching units | 06:10 |
bigjools | also advised that mem= needs MB! | 06:11 |
bigjools | let's see how that works out | 06:11 |
bigjools | juju needs to document this better I suspect | 06:11 |
bigjools | our docs all round are weak :( | 06:11 |
jtv | Yeah. AFAICT, if you asked pyjuju for an outrageous amount of memory that no machine actually had, it'd just give you an arbitrary available machine. | 06:12 |
jtv | (As you might well do if you thought you were specifying memory in bytes but Juju interpreted the number as megabytes) | 06:12 |
jtv | It's something I ranted about in a review very recently actually: failure to extract a search/filter function. | 06:13 |
bigjools | that, err, sucks :/ | 06:26 |
jtv | Rather. It's an implicit outcome of the loop, not an explicitly visible possibility in the code. One of those reasons why I recommend extracting search/filter functions. | 06:28 |
jtv | Sometimes people deal with it by writing the "oh I've found it, now do my business and return" part *in the body of the search loop*, which avoids the implicit bad result but makes the code worse in every other way. | 06:29 |
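(The loop shape being criticised, sketched in illustrative Python rather than actual pyjuju code: the first version silently returns the last candidate it examined when nothing matches, while extracting the search into its own function makes the no-match case explicit.)

```python
def assign_buggy(machines, min_mem_mb):
    # Implicit fallthrough: if no machine satisfies the constraint,
    # the loop variable still holds the last candidate examined.
    for machine in machines:
        if machine["mem_mb"] >= min_mem_mb:
            break
    return machine

def find_machine(machines, min_mem_mb):
    # Extracted search/filter function: "no match" is an explicit None.
    return next((m for m in machines if m["mem_mb"] >= min_mem_mb), None)

machines = [{"name": "a", "mem_mb": 2048}, {"name": "b", "mem_mb": 4096}]

assert assign_buggy(machines, 10**9)["name"] == "b"  # wrong machine, silently
assert find_machine(machines, 10**9) is None         # failure is visible
assert find_machine(machines, 2048)["name"] == "a"   # first satisfying match
```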
=== robbiew1 is now known as robbiew | ||
roaksoax | rvba: ping | 14:33 |
roaksoax | rvba: has maas' apache2 config changed in any way? | 14:33 |
=== freeflying is now known as freeflying_away | ||
=== freeflying_away is now known as freeflying | ||
=== kentb is now known as kentb-out | ||
=== freeflying is now known as freeflying_away | ||
=== freeflying_away is now known as freeflying |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!