=== matsubara-afk is now known as matsubara | ||
roaksoax | bigjools: waking up early lately :) | 00:44 |
---|---|---|
bigjools | roaksoax: it's actually late for me - bad night of twins | 00:46 |
* bigjools otp | 00:46 | |
roaksoax | aw :( | 00:46 |
bigjools | roaksoax: where are you guys with the ipmi config? | 00:48 |
roaksoax | bigjools autodetect? needs integration will be done by EOW | 00:50 |
roaksoax | bigjools is trunk broken? i cannot make run | 00:50 |
* bigjools has not tried make run for a while | 00:50 | |
roaksoax | lol | 00:52 |
bigjools | packaging baby, packaging | 00:53 |
* roaksoax bbl | 00:53 | |
roaksoax | bigjools: around? | 02:36 |
bigjools | in body, yes | 02:36 |
roaksoax | bigjools: you prefer to leave the discussion about the packaging for tomorrow ? | 02:36 |
bigjools | roaksoax: yeah if you don't mind - I have a million other packaging bugs to fix | 02:37 |
bigjools | I might be more compos mentis tomorrow | 02:37 |
roaksoax | bigjools: cool, anything I can help with? | 02:38 |
bigjools | roaksoax: always! :_ | 02:38 |
bigjools | :) | 02:38 |
bigjools | roaksoax: bug 1059556, bug 1059459 | 02:38 |
ubot5 | Launchpad bug 1059556 in MAAS "/etc/init/maas-celery.conf not removed on upgrade" [Critical,Triaged] https://launchpad.net/bugs/1059556 | 02:38 |
ubot5 | Launchpad bug 1059459 in MAAS "Existing DHCP server not stopped" [Critical,Triaged] https://launchpad.net/bugs/1059459 | 02:38 |
bigjools | bug 1059416 | 02:39 |
ubot5 | Launchpad bug 1059416 in MAAS "When upgrading from the maas package, the cluster controller package doesn't detect MAAS_URL but it could" [Critical,Triaged] https://launchpad.net/bugs/1059416 | 02:39 |
bigjools | bug 1059569 | 02:39 |
ubot5 | Launchpad bug 1059569 in MAAS "Impossible to start cluster controller as maas user" [Critical,Triaged] https://launchpad.net/bugs/1059569 | 02:39 |
bigjools | take your pick | 02:39 |
bigjools | I need to fix all these today | 02:39 |
bigjools | oh and bug 1059453 | 02:39 |
ubot5 | Launchpad bug 1059453 in maas (Ubuntu) "The celery cluster worker is not properly stopped" [Critical,Triaged] https://launchpad.net/bugs/1059453 | 02:39 |
roaksoax | bigjools: I think this would solve the first one: http://paste.ubuntu.com/1255220/ | 02:41 |
bigjools | yeah we talked about that, thanks, I'll stick it in | 02:41 |
roaksoax | bigjools: i'll test it | 02:42 |
roaksoax | bigjools: i'll grab the latest from PPA and start pushing my fixes there, and if they get fixed I will propose a branch | 02:42 |
bigjools | cool | 02:43 |
roaksoax | i wanted to fix this stuff this morning but got stuck with something else | 02:44 |
bigjools | brb | 02:45 |
roaksoax | bigjools: bug #1059569 is weird, because installing the sudoers file should have solved the issue. I think i actually ran into that error before too | 02:48 |
ubot5 | Launchpad bug 1059569 in MAAS "Impossible to start cluster controller as maas user" [Critical,Triaged] https://launchpad.net/bugs/1059569 | 02:48 |
roaksoax | gonna check the logs | 02:48 |
bigjools | roaksoax: it's not a sudoers problem | 02:49 |
roaksoax | oh i see, what does the maas-provision command require as root? | 02:49 |
bigjools | roaksoax: we either need to not enforce that maas-provision runs as root, or make it start the celeryd as a different user. the latter is problematic as we'd have to put more packaging smarts through | 02:50 |
bigjools | I don't know why it is required to be root | 02:50 |
roaksoax | bigjools: i'm pretty sure the old maas-celery didn't have that problem | 02:51 |
roaksoax | and was running as non-root | 02:52 |
roaksoax | so should be the same thing | 02:52 |
bigjools | roaksoax: no - the process is a wrapper around celery now | 02:52 |
roaksoax | bigjools: right, so that shouldn't have changed anything | 02:52 |
roaksoax | bigjools: so maybe you guys are storing something on a location which is non maas owned? | 02:52 |
roaksoax | bigjools: are logfiles being pre-created with correct permissions? | 02:52 |
roaksoax | maybe that's it | 02:53 |
bigjools | roaksoax: you don't understand, let me explain again | 02:53 |
bigjools | maas-provision now starts the cluster celeryd. maas-provision needs to run as root, but celery needs to run as maas | 02:53 |
bigjools | the upstart conf starts the maas-provision wrapper, not celeryd | 02:54 |
bigjools | if conf is set to run it as maas user, maas-provision bails out as it checks to see if it's root | 02:54 |
bigjools | I don't know why that check is there | 02:54 |
roaksoax | bigjools: right, I see the point now | 02:55 |
roaksoax | bigjools: so that's a design flaw then | 02:55 |
roaksoax | bigjools: maas-provisiond should be starting the daemon | 02:55 |
roaksoax | or something | 02:55 |
roaksoax | bigjools: the problem is that we are using the wrapper both as a config/admin tool to do certain stuff (such as install pxe images), and you are also using it to start a daemon | 02:56 |
roaksoax | and that should not be done that way | 02:56 |
roaksoax | IMHO | 02:56 |
bigjools | I agree | 02:56 |
roaksoax | a daemon is different from a command line tool | 02:56 |
bigjools | it's not a daemon | 02:57 |
roaksoax | bigjools: hence, maas-provision wrapper makes sure we run it as root to do the necessary operations | 02:57 |
roaksoax | bigjools: maybe a good idea would be to have another wrapper that starts the daemon | 02:57 |
roaksoax | without the check | 02:57 |
roaksoax | but either way, celery should be started differently IMHO | 02:58 |
bigjools | I might just remove the check. it's pointless | 02:58 |
roaksoax | bigjools: it is not | 02:58 |
roaksoax | bigjools: it displays nasty errors when run as user | 02:58 |
bigjools | if it needs root, it'll fail anyway | 02:58 |
roaksoax | bigjools: it needs root to be able to write directories | 02:58 |
roaksoax | bigjools: if you do maas-provision -install-pxe-image as normal user it will display a nasty error that should not be displayed | 02:59 |
roaksoax | then, the root check should be done within provisioningserver code | 02:59 |
bigjools | the check should be done elsewhere then | 02:59 |
roaksoax | the wrapper simply checks that the user has appropriate permissions to execute those operations | 02:59 |
bigjools | in fact the code should catch the nasty errors | 03:00 |
bigjools | and display a nice one | 03:00 |
bigjools | wrapping the whole script in a root check is wrong IMO | 03:00 |
roaksoax | k, either way i don't agree with having a command line tool starting a daemon | 03:00 |
roaksoax | bigjools: IMO, it is not, because the wrapper runs code that is (or was) meant to be run as root only | 03:01 |
roaksoax | so it should make the check | 03:01 |
roaksoax | but yes, the check should go in the python code now | 03:01 |
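
A minimal Python sketch of where this part of the discussion seems to be heading: the root check lives next to the operations that need it, and a small dispatcher turns the failure into a readable message. The names `NotPrivilegedError`, `require_root`, `install_pxe_image`, `start_cluster_controller` (as written here) and `run` are hypothetical stand-ins for illustration, not MAAS's actual code.

```python
import os
import sys


class NotPrivilegedError(Exception):
    """Raised when a privileged operation is attempted without root."""


def require_root(operation, euid=None):
    """Fail early, with a readable message, if we are not root.

    `euid` is an assumption added so the check can be exercised
    without actually being root; by default the real euid is used.
    """
    if euid is None:
        euid = os.geteuid()
    if euid != 0:
        raise NotPrivilegedError(
            "%s needs root privileges; try running it with sudo." % operation)


def install_pxe_image(euid=None):
    # Hypothetical operation that writes to system directories.
    require_root("install-pxe-image", euid)
    return "installed"


def start_cluster_controller():
    # Hypothetical operation with no root check, so it can run as 'maas'.
    return "started"


def run(operation, *args, **kwargs):
    """Run one operation and map privilege failures to an exit code."""
    try:
        operation(*args, **kwargs)
    except NotPrivilegedError as error:
        sys.stderr.write("error: %s\n" % error)
        return 1
    return 0
```

With the check per operation, `run(start_cluster_controller)` succeeds for any user, while `run(install_pxe_image)` prints one clean error line for an unprivileged caller instead of a traceback.
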
roaksoax | that a command line tool starts a daemon | 03:02 |
roaksoax | the daemon should be independent from a command line tool | 03:02 |
bigjools | "meant to be run as root only" - I disagree :) | 03:02 |
roaksoax | bigjools: the wrapper belongs to /usr/sbin | 03:03 |
roaksoax | why? because it messes up with the system | 03:03 |
roaksoax | it does not only mess with the environment of a user | 03:03 |
roaksoax | for that reason, it is meant to be run as a privileged user | 03:03 |
bigjools | maas-provision is a command utility. we never intended it to be turned into a root only tool | 03:05 |
roaksoax | bigjools: right, but look, maas-provision is a command line utility that interacts with MAAS operations that require privileged user | 03:05 |
bigjools | yes | 03:05 |
roaksoax | hence, it is a tool that belongs in usr/sbin | 03:05 |
roaksoax | which is meant to be run as root | 03:05 |
bigjools | but not exclusively as rot | 03:06 |
bigjools | root | 03:06 |
roaksoax | bigjools: ok, let me rephrase, privileged user (this means users under sudo). | 03:07 |
roaksoax | bigjools: the check done, is for root user, and sudo users | 03:07 |
roaksoax | bigjools: for example, apachectl can only be run as privileged user | 03:08 |
roaksoax | maas-provision is exactly the same thing | 03:08 |
roaksoax | the check it has, is to check it is a privileged user | 03:09 |
bigjools | basically we have friction between upstream's design and requirements, and Ubuntu's policies | 03:09 |
roaksoax | bigjools: I dont agree :). These are not Ubuntu policies. These are Operating System policies | 03:10 |
roaksoax | daemons and interaction with daemons are meant to be done by privileged user | 03:10 |
bigjools | the fact that you want it root-only and in sbin is not ubuntu policy? | 03:11 |
roaksoax | bigjools: maybe adding parameters for user/group to run the celery daemon is needed for maas-provision | 03:11 |
roaksoax | bigjools: that's OS principles, not only ubuntu's | 03:11 |
roaksoax | fedora, redhat, debian, etc etc etc | 03:11 |
bigjools | that was one of my potential solutions yeah | 03:11 |
bigjools | well look, we'll do a separate command | 03:12 |
roaksoax | ok, longterm though, it should support the arguments for user/group I think | 03:15 |
roaksoax | because it is better to tell the daemon under what user to run | 03:15 |
roaksoax | that way | 03:15 |
bigjools | that's upstart's job isn't it? :) | 03:16 |
roaksoax | bigjools: not completely no | 03:16 |
roaksoax | bigjools: twistd allows passing the user/group that you would like the daemon to run as | 03:17 |
roaksoax | celery doesn't | 03:17 |
roaksoax | but since celery is now run by maas-provision, then maas-provision should probably allow passing user/group to run the daemon as | 03:17 |
bigjools | yeah | 03:19 |
bigjools | perhaps that's a better solution here | 03:19 |
roaksoax | indeed | 03:19 |
bigjools | easier, I mean. jtv ^ ? | 03:19 |
roaksoax | bigjools: and as you mentioned, the other operations should check they are run as root, and display an error accordingly | 03:19 |
roaksoax | bigjools: so that we ditch that check in the wrapper | 03:19 |
jtv | I don't quite understand: if the command to start a cluster will run celery as another user than it itself runs under, that makes it a privileged command. What's the argument for making it a command separate from maas-provision? | 03:22 |
roaksoax | jtv: right, it makes it a privileged command, that the upstart job will run as root, and will tell the daemon "run as the user" | 03:23 |
bigjools | jtv: maas-provision is root-only according to roaksoax, so either it runs celery as a different user or another command run by maas has to start it | 03:23 |
roaksoax | jtv: so for example, the upstart job for txlongpoll is invoked as root, but we are telling it that the daemon (twistd) should run as the 'maas' user group | 03:23 |
roaksoax | bigjools: not just according to me. The way I see it, it is a design principle. If a utility messes up with the system, it is a privileged utility | 03:24 |
bigjools | roaksoax: ok you and others :) | 03:25 |
roaksoax | bigjools: hehe isn't it fun to be in between different worlds ? :) | 03:25 |
bigjools | jtv, if it's easy to become a different user in Python, taking a -uid and -gid cmd line option seems ok to me | 03:25 |
jtv | But for maas-provision, or for a separate command? | 03:27 |
bigjools | jtv: for maas-provision's start_cluster_controller | 03:28 |
roaksoax | jtv: the "separate command" was simply a wrapper to start_cluster_controller | 03:28 |
jtv | OK, so keep start_cluster_controller but make it startable as another user. And obviously that means it'll have to fork() as well. | 03:29 |
bigjools | jtv: indeed | 03:29 |
bigjools | jtv: but fork() is good because it means I can start tracking it properly in upstart | 03:29 |
bigjools | since at the moment it fails to do so | 03:29 |
jtv | Well I say "obviously" -- I don't actually know there's no other way if you're root. :) | 03:30 |
jtv | Yeah, I just hate to build on two variables. | 03:30 |
jtv | It'd be bloody annoying to find that the solution for changing the user makes the upstart problem impossible to solve! | 03:30 |
bigjools | jtv: fork(), change user, exec(). Sorted. | 03:31 |
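
The "fork(), change user, exec()" sequence can be sketched in Python with the standard `os` module. This is only an illustration under assumptions: `spawn_as_user` is a hypothetical helper name, the uid/gid are taken as numbers (as the proposed -uid/-gid options would supply), and nothing here is taken from the actual start_cluster_controller code.

```python
import os


def spawn_as_user(argv, uid, gid):
    """Fork, switch the child to uid/gid, then exec argv.

    Returns the child's pid in the parent. The child never returns.
    """
    pid = os.fork()
    if pid != 0:
        # Parent: report the pid of the process that will become argv[0].
        return pid
    try:
        if os.geteuid() == 0:
            # Only root may change the supplementary group list.
            os.setgroups([])
        # Change the group first; after setuid() we could no longer do so.
        os.setgid(gid)
        os.setuid(uid)
        os.execvp(argv[0], argv)
    finally:
        # Reached only if one of the calls above failed. Exit the child
        # immediately so it never continues running the parent's code.
        os._exit(127)
```

A caller would run something like `spawn_as_user(["celeryd", "--loglevel=INFO"], uid, gid)`, with the real command line and ids coming from configuration. The order matters: dropping the group after the user would fail, because an unprivileged process cannot change its gid.
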
bigjools | roaksoax: are you landing that Conflicts: change or shall I? | 03:32 |
roaksoax | bigjools: i'm about to test it | 03:32 |
bigjools | roaksoax: excellent, thanks | 03:32 |
bigjools | roaksoax: what is the best way of stopping/disabling the existing dhcp server when installing maas-dhcp? | 03:39 |
jtv | Added note: have to do it synchronously, I think, or it may keep ours off its port. | 03:40 |
bigjools | yeah | 03:40 |
bigjools | easy | 03:40 |
bigjools | roaksoax: I presume invoke-rc followed by update-rc | 03:41 |
roaksoax | bigjools: yes, if it is using sysvinit script | 03:41 |
bigjools | it is | 03:42 |
bigjools | iirc | 03:42 |
roaksoax | bigjools: this should probably be done on preinst | 03:42 |
bigjools | roaksoax: oh actually it's an upstart conf | 03:44 |
bigjools | how do you disable those? | 03:44 |
roaksoax | bigjools: probably by an override, let me check | 03:46 |
roaksoax | bigjools: http://upstart.ubuntu.com/cookbook/#disabling-a-job-from-automatically-starting | 03:47 |
bigjools | ah thanks | 03:47 |
bigjools | roaksoax: should that be a package-installed file? | 03:48 |
roaksoax | bigjools: not necessary as long as it gets removed on postrm | 03:49 |
bigjools | ok | 03:49 |
bigjools | so preinst and postrm | 03:49 |
* bigjools hacks | 03:49 | |
* bigjools eats first in fact | 03:50 | |
jtv | roaksoax: do we still need a pid file, with upstart watching our celery process? | 04:19 |
roaksoax | jtv: i think we do | 04:20 |
roaksoax | jtv: for twistd | 04:21 |
roaksoax | but not for celery | 04:21 |
* roaksoax checks | 04:21 | |
jtv | Oh, I meant specifically about celery. | 04:21 |
roaksoax | no we dont | 04:21 |
roaksoax | old celery was exec /usr/bin/celeryd --logfile=/var/log/maas/celery.log --loglevel=INFO --beat --schedule=/var/lib/maas/celerybeat-schedule | 04:21 |
roaksoax | so no pid | 04:21 |
jtv | Yeah, I'm asking whether that's right though. Because we had trouble tracking it with upstart. | 04:22 |
roaksoax | jtv: you can specify it if you like | 04:23 |
jtv | I don't like, I'm just not sure whether it's needed! | 04:23 |
roaksoax | jtv: i don't think it is mandatory | 04:24 |
roaksoax | we have been without a pidfile | 04:24 |
roaksoax | so i don't think we need it | 04:24 |
jtv | But it hasn't been working properly, which is why I'm asking. | 04:24 |
roaksoax | jtv: idk TBH. :( | 04:25 |
roaksoax | bigjools: uhm conflicts/replaces didn't seem to work | 04:25 |
jtv | Because it's going to get more complicated to maintain a pidfile with the setuid. | 04:25 |
bigjools | roaksoax: damn :( | 04:25 |
roaksoax | bigjools: maybe just conflicts! i'll give it a try | 04:26 |
bigjools | jtv: it's not working because the wrapper execs in a way that doesn't use fork but somehow changes its pid | 04:26 |
jtv | So we know for sure it's not just lack of pidfile? Good. | 04:26 |
roaksoax | yes I | 04:27 |
roaksoax | i'd agree with bigjools | 04:28 |
roaksoax | s/i'd/i | 04:28 |
bigjools | jtv: once you use fork we can tell upstart about that | 04:29 |
bigjools | and it'll DTRT | 04:29 |
jtv | Yes, that's the hypothesis. I just don't know enough about what's going on to be confident, which is why I ask around. | 04:30 |
jtv | bigjools: well, it turns out one of my hypotheses yesterday was right. We do fork something else before we fork off celery. | 04:59 |
jtv | It's ifconfig. | 04:59 |
=== matsubara is now known as matsubara-afk | ||
bigjools | jtv: gnargh! | 05:06 |
jtv | Yup. | 05:06 |
jtv | So this might be a good time to consider netifaces. | 05:06 |
bigjools | yes | 05:07 |
bigjools | JFDI | 05:07 |
bigjools | and it's in main, unbelievable | 05:07 |
jtv | I was wondering why changing fork-exec to fork-setuid-exec would fix anything :) | 05:07 |
jtv | In good news, it was the tests that brought this to light. | 05:08 |
jtv | What needs doing to add the netifaces dependency? | 05:08 |
bigjools | 2 places: required-dependencies and packaging dependencies | 05:11 |
bigjools | I'm doing loads of packaging changes, I'll do it there for you | 05:11 |
bigjools | just in cluster controller right? | 05:12 |
jam | jam | 05:19 |
jam | morning all | 05:19 |
bigjools | hey jam | 05:24 |
jtv | Hi jam | 05:31 |
jtv | bigjools: yes, just cluster controller. | 05:31 |
jtv | It's lunchtime. I'm going to try to get some equipment. Almost bought a nice little lightweight machine, but found that unity probably won't support its AMD graphics chip. :( | 05:32 |
jtv | Oh crap here comes the rain. Perfect timing as always. | 05:32 |
bigjools | heh | 05:34 |
jam | bigjools: question for you about using celery | 05:52 |
bigjools | yup | 05:53 |
jam | for the Tag table, we know that when we create a tag, it can take 10s for 10,000 nodes to get checked. | 05:53 |
jam | So we want to push that out into an async job | 05:53 |
jam | (cron or rabbit comes to mind) | 05:53 |
jam | bigjools: how do we integrate that with the system? | 05:53 |
bigjools | jam: pretty easy, I'll take you through the steps: | 05:53 |
bigjools | 1. Edit src/provisioningserver/tasks.py and define the async job | 05:54 |
jam | k | 05:54 |
jam | bigjools: just as a function with @task on it? | 05:54 |
bigjools | it's a function decorated with @task | 05:54 |
jam | k | 05:54 |
bigjools | 2. keep the function small - just call out to something in provisioning server so that the tasks file stays small | 05:55 |
jam | (and jinx, I guess :) | 05:55 |
jam | sure, I expect the logic to stay on the Tag object. | 05:55 |
bigjools | 3. from the appserver code, call it with funcname.apply_async(queue=nodegroup.uuid) | 05:55 |
bigjools | and it'll run on that nodegroup's worker | 05:55 |
jam | bigjools: so we have to loop over N nodegroups? | 05:56 |
bigjools | jam: if your nodes are across multiple groups, yes | 05:56 |
jam | (and the nodegroups talk directly to the DB, or we need another bit to send a message back) | 05:56 |
bigjools | to talk back you need an API method defined that the nodegroup can call | 05:56 |
jam | k | 05:57 |
bigjools | s/nodegroup/celeryd/ | 05:57 |
bigjools | the celeryd caches credentials for the API | 05:57 |
bigjools | see report_boot_images() for an example | 05:57 |
jam | bigjools: so what data do the individual nodegroups have access to? (they don't have a local DB do they?) | 05:58 |
bigjools | they don't | 05:58 |
bigjools | they can only see what is passed in the task call | 05:58 |
jam | do they have access to the main? or we need 2 apis, one to scrape the data out of the db, and another to put it back in? | 05:58 |
bigjools | what are you doing, exactly? | 05:58 |
jam | bigjools: we need to take the hardware details, filter it by the xpath statement, and determine a list of Nodes that match particular Tags | 05:59 |
bigjools | ah ok then let's do this in another way | 05:59 |
jam | the long-term idea is that the hardware details would sit on the cluster controller | 05:59 |
bigjools | we have a worker local to the region for this stuff | 05:59 |
jam | however, they don't have anywhere to put it *today* | 05:59 |
jam | bigjools: sounds like the worker we'd like to talk to for the initial implementation, then. | 06:00 |
jam | bigjools: do you set that by the 'queue=...' stuff? | 06:00 |
bigjools | jam: yes | 06:00 |
bigjools | so if you look in etc/celeryconfig_common.py | 06:00 |
bigjools | you'll see WORKER_QUEUE_DNS = 'celery' and WORKER_QUEUE_BOOT_IMAGES = 'celery' | 06:01 |
bigjools | Add another one, WORKER_QUEUE_<something> | 06:01 |
bigjools | set to "celery", which is the name of the region's worker | 06:01 |
bigjools | then when you define the task you can decorate it like this: | 06:01 |
bigjools | @task(queue=celery_config.WORKER_QUEUE_<thing>) | 06:02 |
bigjools | which ensures it always gets sent to the same worker | 06:02 |
jam | so I'm guessing we should have a similar config as WORKER_QUEUE_TAG_??? and then also point it at 'celery' | 06:02 |
jam | right | 06:02 |
bigjools | and you change the apply_async() to a delay() | 06:02 |
bigjools | so function.delay() | 06:02 |
bigjools | and pass any args in the delay params | 06:02 |
jam | bigjools: ah, ok | 06:03 |
bigjools | it sounds like you'll need some way of querying the API for the data you need | 06:03 |
bigjools | and to store it back | 06:03 |
jam | bigjools: so we still need api calls, but it is just run locally. | 06:03 |
jam | NP | 06:03 |
bigjools | right | 06:03 |
bigjools | make sure the api calls run *quick* then the job can take a bit longer | 06:04 |
jam | bigjools: are there plans in say 13.04 to add some sort of local db/storage on the cluster controllers? | 06:04 |
jam | or is the goal for them to be fully stateless? | 06:04 |
bigjools | possibly. Mark wants to do that. | 06:04 |
bigjools | we might thrash that out a bit more in COP | 06:04 |
jam | bigjools: so how fast is quick? <1s, <100ms, <10ms? | 06:05 |
bigjools | <1s is probably ok | 06:05 |
jam | (shoot for 100ms assuming that sometimes load will cause it to be 1s?) | 06:05 |
bigjools | what I mean is - optimise the DB queries :) | 06:05 |
bigjools | otherwise you're negating the effects of offloading jobs from the appserver | 06:06 |
jam | bigjools: so we have 2 options, we can compute the XPATH content in the DB (postgres has native support for it), or we can compute it in the local process using LXML. It is slightly faster to do it in the DB, because we don't have to read out the XML content, but obviously it adds more load in the DB | 06:07 |
jam | either way we can probably do it in batches | 06:07 |
jam | so we update, say 1000 nodes at a time. | 06:07 |
bigjools | jam: yeah batching is important here for 1000s of nodes | 06:07 |
bigjools | not sure if we have a batching mechanism in Piston | 06:07 |
bigjools | jam: if the xpath runs quick enough on the DB, there's no problem with that | 06:08 |
jam | bigjools: 'quick enough' is 6s for 10,000 nodes. | 06:08 |
jam | so batching at 1,000 nodes would be 600ms, as a ballpark sort of thing. | 06:08 |
bigjools | mmmm might be pushing it | 06:08 |
bigjools | yeah | 06:08 |
jam | but you still spend 6 total seconds doing the work | 06:08 |
bigjools | you'll probably have to manually batch | 06:08 |
jam | or in the 100k node space, you are talking 60s total CPU time. | 06:09 |
bigjools | this is fine to dump on the celery worker | 06:09 |
bigjools | since most of the time it is not doing much ATM | 06:09 |
bigjools | the DB is potentially more precious resources-wise | 06:09 |
jam | bigjools: and the long term goal is to dump down to the individual regions | 06:09 |
bigjools | right | 06:10 |
jam | I'm wondering if it should be architected as: 1) grab the list of nodes that need touching right away, 2) farm out to each nodegroup for their respective nodes | 06:10 |
jam | 3) pass just the node ids | 06:10 |
jam | 4) the provisioning_servers then batch requests for XML content, and parse it. | 06:10 |
jam | 5) and poke the results back into the DB as they go. | 06:10 |
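
Steps 3 to 5 can be sketched as a batching loop: take node ids, fetch the XML for one batch at a time, evaluate the tag expression, and store results as each batch completes. This is a rough illustration, not the planned implementation. `fetch_details` and `store_matches` are hypothetical callbacks standing in for the API calls, and the standard library's ElementTree (which supports only a subset of XPath) is swapped in for the lxml mentioned in the conversation.

```python
import xml.etree.ElementTree as ET


def batches(node_ids, size):
    """Yield consecutive slices of at most `size` node ids."""
    for start in range(0, len(node_ids), size):
        yield node_ids[start:start + size]


def node_matches(hardware_xml, xpath):
    """True if the expression selects at least one element."""
    root = ET.fromstring(hardware_xml)
    return root.find(xpath) is not None


def evaluate_tag(node_ids, xpath, fetch_details, store_matches,
                 batch_size=1000):
    """Evaluate one tag over many nodes, one batch at a time.

    fetch_details(batch) -> {node_id: xml_string}   (hypothetical API call)
    store_matches(batch, matched_ids)               (hypothetical API call)
    """
    for batch in batches(node_ids, batch_size):
        details = fetch_details(batch)
        matched = [
            node_id for node_id in batch
            if node_matches(details[node_id], xpath)]
        # Report per batch so each API call stays short.
        store_matches(batch, matched)
```

With the figures quoted above (about 6s per 10,000 nodes), a batch of 1,000 would cost roughly 600ms of evaluation, which is the reason for keeping `batch_size` adjustable.
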
jam | The main trick with all this, is knowing when everyone is "done" | 06:11 |
jam | but I think that gets us a good CPU story | 06:11 |
jam | because the processing is properly farmed out across the 'scaling' portion of the system. | 06:11 |
jam | We have a small bandwidth issue tod | 06:11 |
bigjools | you can do it like that if you like - I'm just saying that the region celeryd is fairly idle | 06:11 |
jam | today, because the data is in the central DB | 06:11 |
jam | but it is where we want it to be done in 13.04 or whatever. | 06:12 |
bigjools | but scaling out is a good plan for the future | 06:12 |
jam | bigjools: from what I can tell, lshw -xml is about 24kB * 10,000 nodes is 240MB being downloaded in these requests. | 06:13 |
jam | Is that too much load? | 06:13 |
jam | on the DB | 06:13 |
bigjools | sounds ok | 06:13 |
jam | (note that they probably compress fantastically well, if that is possible in the API) | 06:13 |
jam | though adding that may be premature optimization at this point. | 06:13 |
jtv | The rain's letting up. I'm going to make my shopping run. | 06:30 |
jtv | bigjools: any chance of a review? https://code.launchpad.net/~jtv/maas/netifaces/+merge/127420 | 06:33 |
jtv | While I'm out? | 06:33 |
bigjools | one sec | 06:33 |
bigjools | yes, OTP | 06:33 |
jtv | Wow, that's fast! | 06:33 |
jtv | :P | 06:33 |
Fajkowsky | hello guys, I have a problem with my maas server; I described it here - http://www.tinyurl.pl?QSsXP6oX - it's "no instance data found in start local" | 08:17 |
bigjools | Fajkowsky: your link is invalid | 08:29 |
Fajkowsky | http://askubuntu.com/questions/195115/nodes-cant-connect-to-server-after-bootstrap | 08:30 |
Fajkowsky | I'll try installing maas again and adding the nodes; maybe it works this time. | 09:19 |
bigjools | Fajkowsky: I am just adding an answer | 09:20 |
Fajkowsky | ok | 09:21 |
allenap | rvba: Sorry, I just switched your branch back from Approved. | 10:03 |
rvba | allenap: a) I don't agree with what you say in your comment. b) we've got many other place where it's done this way. If we're going to change that behavior, then we better do it everywhere. | 10:06 |
rvba | places* | 10:06 |
allenap | rvba: Well, start here then. That we've done it wrong elsewhere doesn't make it right. | 10:08 |
rvba | allenap: indeed. But I'm really not sure it's right :) | 10:08 |
rvba | allenap: ok you might be right about that idempotent stuff. But, if you don't mind me saying that, your method is wrong here. You should let me land that branch, file a bug about the problem, and *then* someone will pick up that branch and change all the delete methods. | 10:14 |
rvba | allenap: because that bug is marked critical and changing the behavior of all the 'delete' methods is not. | 10:15 |
rvba | It's 'high' at best. | 10:15 |
allenap | rvba: Okay, fair enough; I just wanted to save you an extra branch and proposal for what seems like a simple change. | 10:21 |
rvba | allenap: I just wanted to land a quick fix to unblock Diogo. And also, who's tell you that I'm going to be the one doing that extra branch ;). No, seriously, I've got to focus on my UI stuff right now. | 10:22 |
rvba | s/tell/telling/ | 10:23 |
allenap | rvba: I wasn't suggesting you fix everything. I was just commenting on this one proposal. I changed it back to Needs Review to stop Tarmac from landing it, so that we could talk about amending it, saving the effort of filing a bug, proposing a merge, making a card, etc. I didn't realise it was a bigger problem. I'm sorry that I caused such distress! :) | 10:25 |
rvba | allenap: no distress, really, but if your goal was to save time for the both of us, then that's a fail :). We definitely will have to file a bug for the other delete() methods so fixing up this one and have it half-done is not the way to go I think. "Ni fait ni à faire", here is another nice french expression :). | 10:29 |
jtv | Ah, is the "one more small change" worm rearing its ugly head? | 10:32 |
jtv | Meanwhile, I wonder why one of our postinst scripts uses '[a-z]\{0,\}' instead of '[a-z]*' | 10:33 |
jtv | Actually, it's pretty sick to "sed" for an entry in a multi-line dict. What if some other dict contains the same key? I think I'd rather define a variable, and have the dict refer to that. The dict will not need any patching, there'll be no leading whitespace, etc. | 10:36 |
jtv | allenap: maybe you can help me out with this question. What is the relationship between the patch we have in packaging that sets the db password to 'maas', and the postinst code that sets a proper password? Why do we have both? | 10:49 |
allenap | jtv: Eugh, I don't know. Intriguing. | 10:52 |
jtv | Or wtf dbc_dbpass comes from... it seems to be coming from thin air. | 10:52 |
jtv | Bit annoying when you want to verify that it really consists only of ASCII letters and digits. | 10:52 |
jtv | (I don't see why the regex needs to check for exact contents of the string: '[^']*' is both easier and afaict, more appropriate) | 10:53 |
jtv | Review needed! Spot the stupid mistake that I keep repeating... https://code.launchpad.net/~jtv/maas/pkg-bug-1060095/+merge/127451 | 10:55 |
allenap | jtv: I'll do it; I haven't a clue about dbc_wibble so I'll keep my head down reviewing. | 10:56 |
jtv | That's good too. :) Thanks. | 10:56 |
rvba | jtv: Andres will probably know the answers to these packaging questions :). | 10:57 |
jtv | Yeah. Wonder if he's here yet... I need to leave soon. | 10:57 |
jtv | roaksoax, are you here? | 10:57 |
allenap | jtv: It's definitely maas_local_settings not maas_local_settings.py, right? | 10:57 |
jtv | allenap: what is? I think the module is maas_local_settings and the file is maas_local_settings.py. | 10:58 |
jtv | The latter is what we mostly refer to. | 10:58 |
allenap | jtv: It's chmod'ing /etc/maas/maas_local_settings <-- no .py | 10:58 |
jtv | Ooo! | 10:58 |
jtv | Fixed. | 11:00 |
jtv | Good thing you spotted that. | 11:02 |
allenap | jtv, rvba, anyone: I have to stop work early today to collect Robin from school, at 1340 UTC, but I'll be back around this evening, after 1900 UTC. | 11:10 |
jtv | I'll be away for the rest of the night as well. | 11:10 |
jtv | I'll email roaksoax with my questions. | 11:10 |
mgz | allenap: as worded, I don't think bug 1060114 is true, or in need of fixing. | 11:21 |
ubot5 | Launchpad bug 1060114 in MAAS "DELETE operations are not idempotent" [Medium,Triaged] https://launchpad.net/bugs/1060114 | 11:21 |
allenap | mgz: Fancy rewording it? ;) | 11:24 |
allenap | mgz: Ah, I've just seen your comment on the proposal. Interesting point. | 11:25 |
allenap | mgz: I like your explanation. Can you add it to the bug and mark it Invalid? | 11:26 |
mgz | allenap: sure | 11:27 |
allenap | mgz: You don't happen to be in town today? | 11:28 |
allenap | I'm trying to find an excuse to go out for lunch. | 11:28 |
mgz | allenap: alas :) | 11:28 |
mgz | you're making lunch on thursday though, right? | 11:32 |
mgz | well, as it, getting there, kat will be making it :D | 11:32 |
mgz | *as in | 11:32 |
mgz | bah, I can't type for toffee | 11:33 |
allenap | mgz: Yeah, I'll be there, probably at about 1200, because Chantal and I are collecting a puppy before then. | 11:35 |
mgz | bring the puppy to lunch! :D | 11:36 |
mgz | ...how house trained is it going to be? | 11:38 |
allenap | mgz: Not very, and it has to stay at home until it's fully inoculated so I'm told, so a few weeks. | 11:40 |
mgz | ;_; | 11:40 |
allenap | mgz: We can't even take it out for walks at first. It gets to shit in the back garden, that's it :) | 11:40 |
jtv | Okay folks, I'm off for tonight. | 11:41 |
allenap | Cheerio jtv. | 11:41 |
jtv | nn | 11:41 |
jtv | When Raphers comes up to breathe, tell him I wish him good luck and God speed with his UI branches. :) | 11:41 |
jam | jelmer: how's it going? Want to skype some more? | 11:48 |
jam | mgz: how are things looking for you? | 11:48 |
jam | allenap: I have some questions about how the celery stuff works. are you knowledgeable or should I chat with bigjools tomorrow? | 11:49 |
allenap | jam: rvba is the man on that front, but I might be able to help. | 11:49 |
jelmer | wb jam | 11:49 |
jelmer | jam: making some progress, trying to get some tests going | 11:50 |
jam | the big question is that the workers need someone to call 'record_secrets' before they can do any work | 11:53 |
jam | but the examples we have seem to just drop the request on the floor if they don't have the secrets yet | 11:53 |
jam | but how do we make sure that the work is always done | 11:53 |
jam | do we need to put the work in the DB as 'todo' | 11:53 |
jam | and then have it marked 'done' by the callback? | 11:53 |
jam | rvba, allenap^^ | 11:53 |
mgz | jam: landed some tag sample stuff | 11:56 |
=== matsubara-afk is now known as matsubara | ||
jam | and if the work is still pending who retries it? | 11:58 |
rbasak | smoser: ping, for when you get in | 12:03 |
rvba | jam: (was out having lunch) Celery can handle that for you I think, you just need to get the task retried instead of dropping it on the floor if the secrets are not there. | 12:04 |
jam | rvba: how does one signal that? | 12:05 |
rvba | jam: see rndc_command in src/provisioningserver/tasks.py. | 12:05 |
jam | (and not have it go into a death spiral trying to retry the job) | 12:05 |
rvba | http://docs.celeryproject.org/en/latest/userguide/tasks.html#retrying | 12:05 |
jam | rvba: ah the function itself calls func.retry | 12:07 |
rvba | Yeah. | 12:07 |
mgz | okay, now I feel a lot less clever about not using real xpaths... | 12:15 |
jam | mgz: we have code that asserts they are valid | 12:16 |
jam | like when allocating a new node | 12:16 |
jam | mgz: quick poll for you | 12:17 |
mgz | I'll fake that up here | 12:17 |
jam | we know that after updating a tag | 12:17 |
jam | there will be some time where the node <=> tag mapping is inconsistent. | 12:17 |
jam | Is it better to drop all mapping, and then add them slowly? | 12:18 |
jam | or is it better to slowly update the nodes to make it consistent? | 12:18 |
mgz | ah, interesting | 12:18 |
jam | A user fixes a tag definition, and then goes to start nodes based on that tag. | 12:18 |
jam | is it better to not match anything, and get tried again later | 12:18 |
jam | or better to match something that it used to match, on the premise that a small update is unlikely to actually change the node set dramatically. | 12:18 |
jam | I'm tending towards the former, on the premise that 'juju deploy' will keep trying to fulfill your request | 12:19 |
jam | and then you won't accidentally deploy on machines that no longer match the new tag vaule. | 12:19 |
jam | value. | 12:19 |
mgz | I feel a common "update the same name" case might be to slightly tweak what gets selected | 12:19 |
mgz | so, having an interval where the stuff that used to be selected and will still be selected is not, is probably the most surprising | 12:20 |
jam | mgz: right 'has_nvidia' and you realize that sometimes it is NVIDIA and sometimes nvidia | 12:20 |
jam | so you want to make it case insensitive | 12:20 |
jam | mgz: the flip side is 'big >= 2GB' and you update it to 'big >= 4GB' and you deploy, and it picks a 2GB node. | 12:21 |
mgz | that seems less harmful, having an interval where the old rules apply | 12:21 |
jam | mgz: my argument for 'not selected for a while' is that it will eventually be selected and retrying the query will get you the right value. | 12:21 |
mgz | this is true, if we stick with only using tags as positives | 12:22 |
jam | mgz: so I think jelmer's preference is to do delta updates | 12:22 |
mgz | the other option... | 12:22 |
jelmer | jam: if it's retrying it might be that the tag is still only half up-to-date when the set of nodes is non-empty | 12:22 |
mgz | is to set an update time | 12:22 |
mgz | and if asked to acquire something with an old tag set, defer until the tags are fresh | 12:22 |
jam | jelmer: it may be only half up-to-date, but everything that is tagged *definitely* matches the new value. | 12:22 |
jam | so a 'big' node will always have 4GB after changing the definition. | 12:23 |
mgz | acquire is expected to be slow | 12:23 |
jam | mgz: we do have an updated field already (it is part of the model) | 12:23 |
mgz | adding 6s (currently) or per-cluster update delay, would not be too bad | 12:23 |
jam | and we can do other work to detect if all nodegroups have given their responses. | 12:23 |
mgz | if a cluster doesn't do any acquire till after it's done tag updates, you'd still get some responses fast | 12:24 |
jam | (essentially some sort of db entry that indicates whether nodegroup X has responded for tag Y) | 12:24 |
mgz | (slow/big clusters would tend not to be used straight after tag updates, but that's not terrible) | 12:24 |
jam | mgz: in the end, this sounds like something to bring up at the standup, and we can move forward with what we have until then. | 12:25 |
jam | I have the small feeling that a coin flip may end up involved somewhere. | 12:25 |
mgz | yup, all options are reasonable, and changing strategy after we have running code is fine. | 12:25 |
jam | allenap: also, for the 'nodegroup' changes, are all nodes going to have a nodegroup? | 12:26 |
jam | mgz: it does change the api, 'add_nodes' vs 'update_nodes', etc. | 12:26 |
allenap | jam: They should do already, but let me check. | 12:26 |
rvba | jam: it is already the case. | 12:26 |
jam | rvba: I see that it says 'this should be not null, but we can't do that yet' | 12:26 |
smoser | rbasak, here now. | 12:27 |
allenap | rvba: Node.nodegroup is null=True, blank=False -- what does that mean on a foreign key? | 12:27 |
allenap | rvba: Ah, I've read the comment now ;) | 12:27 |
rvba | allenap: :) | 12:28 |
smoser | hm... did we get a bug for my issue with daily ppa not installable? | 12:28 |
mgz | bah, pants: TypeError: 'Tag' instance expected, got <Tag: big> | 12:30 |
mgz | south wants to make my life miserable | 12:30 |
mgz | can I refactor that bit out... | 12:30 |
jam | mgz: I think you can do add(Tag.id) but I might be wrong. | 12:31 |
mgz | jam: I'm trying to share code with the migration... when they're really using different model classes | 12:32 |
rbasak | smoser: good morning! | 12:32 |
mgz | passing it in might be th lease stupid option... | 12:32 |
rbasak | smoser: the precise daily ephemeral image seems to work. I didn't add BOOTIF_DEFAULT and it works. | 12:33 |
mgz | *the least | 12:33 |
rbasak | smoser: I had a question though. Looking at making maas-import-ephemerals work for ARM without hackery. It should be a pretty minor change, but I can't remember what we said about adding multiple subarchs in ephemeral images | 12:33 |
rvba | mgz: I don't think you should try to share code with the migration because the migration runs with the models being in a special state (when not all of the migrations have been run yet) so if you change that code later, it might break the migration. | 12:34 |
rbasak | smoser: is the plan to have one image for all of armhf? In which case, how does that work with install_tftp_image? | 12:34 |
smoser | rbasak, it only works for you because you get a dhcp response on both interfaces. | 12:34 |
smoser | s/both/all/ | 12:34 |
mgz | rvba: this is all basically a trigger | 12:35 |
mgz | when hardware_details field changes, update these other fields | 12:35 |
smoser | rbasak, for multiple sub-arches, the plan would just be to change the format of the tar file so that there were multiple kernels pulled out. | 12:35 |
rbasak | smoser: OK. I want to break it before I fix it in order to test the fix | 12:35 |
smoser | rbasak, if you want more than "highbank" then ephemeral images and import scripts probably need work. | 12:36 |
mgz | I don't know how to do the migration correctly if setting that field does not also do the (db contents specific) updates to the rest of the stuff | 12:36 |
rbasak | smoser: OK, but does that mean that the maas-provision install-pxe-image interface will need to be changed? Currently it expects an entirely different image directory per subarch, which would be a waste of space I think | 12:36 |
mgz | and it's complex, writing it in two places, one of which is not tested, does not appeal to me. | 12:36 |
rvba | mgz: ok, I don't know what particular problem you're facing but it's just that we've been bitten by that once :) | 12:37 |
mgz | I could just copy the current code though | 12:37 |
smoser | one could just complain that "subarch" should never have been invented :) | 12:37 |
rbasak | :-) | 12:37 |
rbasak | One day we'll have device tree and it'll all go away | 12:37 |
smoser | i'd have to look at it, rbasak, but yeah, our goal would be to have one ephemeral image | 12:37 |
smoser | and multiple tftp'd kernels | 12:37 |
rbasak | smoser: ok. So I think it's not practical to get multiple subarch support into maas-import-ephemerals right now, so I think I'll add some kind of ugly hack for highbank and leave it at that for 12.10. Is this OK with you? | 12:39 |
smoser | what is it that you're concerned about? | 12:41 |
rbasak | smoser: right now it doesn't import highbank at all, since "generic" is hardcoded. I need to have this fixed by 12.10, that's all. | 12:41 |
smoser | and it doesn't fail? | 12:42 |
rbasak | smoser: I've been patching it by hand up to now | 12:42 |
rbasak | smoser: and now I'm at the point where I'd like to get it working in trunk and in the package | 12:42 |
rbasak | smoser: and have it import all three arches by default | 12:42 |
sanderj | How come MaaS has a bug where udev is not included in the initrd script? So it doesn't work to boot up remote vms? | 12:53 |
sanderj | scripts/init-bottom/udev should look like this: http://paste.ubuntu.com/679222/ | 12:54 |
rbasak | sanderj: what exactly is the problem? | 13:03 |
sanderj | rbasak, when booting up a vm from spx with maas.. I get the error: cannot find "bnx2-mips-09-6.2.1a.fw" | 13:04 |
rbasak | what's spx? | 13:05 |
sanderj | when I unpacked the initrd and added two lines to the above script, then it works. | 13:05 |
sanderj | pxeboot | 13:05 |
sanderj | I mean | 13:05 |
sanderj | Sorry | 13:06 |
rbasak | which lines did you add? | 13:06 |
rbasak | any idea what bnx2-mips is? | 13:06 |
sanderj | network card driver | 13:06 |
sanderj | . /scripts/functions | 13:07 |
sanderj | wait_for_udev | 13:07 |
sanderj | Those two lines I added | 13:07 |
sanderj | into scripts/init-bottom/udev | 13:07 |
rbasak | which initrd did you modify? | 13:07 |
sanderj | The initrd for the kernel maas uses to boot up remote machines. | 13:08 |
rbasak | There are a few | 13:08 |
rbasak | What was the path? | 13:08 |
rvba | allenap: I'm reviewing your branch: https://code.launchpad.net/~allenap/maas/query-strings-and-request-bodies/+merge/127479 | 13:10 |
allenap | rvba: Thanks. | 13:10 |
rvba | allenap: I think it will fix many bugs in one go :) | 13:10 |
allenap | rvba: Yeah, I hope so. What *was* I thinking before? | 13:10 |
=== flacoste changed the topic of #maas to: 1 week until Final Freeze | Discussion of upstream development of Ubuntu's Metal as a Service (MAAS) tool | MAAS jenkins: https://jenkins.qa.ubuntu.com/job/maas-trunk/ | ||
sanderj | rbasak, /var/lib/maas/ephemeral/precise/ephemeral/amd64/20120424 | 13:12 |
sanderj | rbasak, /var/lib/maas/ephemeral/precise/ephemeral/amd64/20120424/initrd | 13:13 |
rbasak | sanderj: so I think that's not a maas specific issue, although perhaps maas is the only thing to exhibit it. I think the file you modified is coming from initramfs-tools or some package like that | 13:13 |
rbasak | sanderj: can you check for existing bugs and if you can't find one, then please file a bug report with as much detail as you can? | 13:14 |
sanderj | rbasak, where do I find maas bugs? | 13:15 |
sanderj | rbasak, I think the bug is corrected with ubuntu, but not in maas. | 13:15 |
rbasak | sanderj: https://bugs.launchpad.net/maas/ | 13:15 |
sanderj | No bugs reported when searching for udev there. | 13:16 |
jam | mgz, jelmer: https://code.launchpad.net/~jameinel/maas/get-nodes-for-group/+merge/127484 | 13:16 |
mgz | jam: looking | 13:27 |
sanderj | rbasak, ok, now it's reported. Let's hope it helps. | 13:28 |
rvba | matsubara: the fix for 1060079 should land any minute now. | 13:41 |
matsubara | rvba, great | 13:41 |
roaksoax | rvba: howdy!! Is make run broken? | 14:05 |
roaksoax | err upstream trunk borken? | 14:05 |
rvba | roaksoax: jenkins seems happy, let me check | 14:05 |
rvba | roaksoax: everything seems fine, what error are you seeing? | 14:07 |
roaksoax | rvba: an import error. give me a sec an i'll show you | 14:10 |
rvba | roaksoax: apt-get install python-netiface maybe? | 14:11 |
roaksoax | rvba: django.db.utils.DatabaseError: relation "maasserver_config" does not exist | 14:11 |
roaksoax | LINE 1: ..._config"."name", "maasserver_config"."value" FROM "maasserve... | 14:11 |
roaksoax | rvba: also this: it is not being cleaned: setlock: fatal: unable to lock /run/lock/maas.dev.cluster-worker: temporary failure | 14:13 |
rvba | roaksoax: the first error makes me think that the database is simply not there because "maasserver_config" is an old table that was created months ago. | 14:15 |
rvba | roaksoax: the second one: sometimes celery gets stuck, just killall the celery processes and remove that file (/run/lock/maas.dev.cluster-worker). | 14:15 |
roaksoax | rvba: ImportError: No module named netifaces --> even though it is installed | 14:15 |
roaksoax | there's really something weird going on | 14:16 |
roaksoax | rvba: what do you think might be causing that in my system? | 14:23 |
rvba | roaksoax: difficult to say remotely, can you make sure first that all the processes have been killed? | 14:24 |
roaksoax | rvba: yeah they were, I had just rebooted my machine. I'm trying again now | 14:25 |
mgz | yeay, working migration | 14:43 |
mgz | now that wasn't at all painful or owt | 14:43 |
mgz | ...which still falls over if run from scratch... | 14:45 |
mgz | hm, the maasserver gets migrated before anything at all is done with metadataserver? that's fun. | 14:47 |
mgz | well, should be easy to fix (ho ho ho) | 14:49 |
roaksoax | rvba: http://pastebin.ubuntu.com/1256069/ | 14:50 |
roaksoax | rvba: that's weird postgresql is running | 14:50 |
rvba | roaksoax: you can try to remove db/.s.PGSQL.5432 | 14:52 |
roaksoax | rvba: the file doesn't exist | 14:54 |
rvba | roaksoax: not even db/.s.PGSQL.5432.lock ? | 14:54 |
roaksoax | nope | 14:54 |
rvba | roaksoax: can you wipe out the db or are there things in there you'd like to keep? | 14:56 |
roaksoax | rvba: so i rm -rf db/ and make run again and same issues | 14:57 |
rvba | roaksoax: does 'make sampledata' work? | 14:58 |
roaksoax | it did | 14:58 |
roaksoax | i'm re-making the whole environment | 14:58 |
rbasak | smoser: I can't get the precise daily armhf image to fail. I tried disabling dhcp on eth1 and it still works | 14:59 |
rbasak | smoser: but whichever way, please could you promote it to a release? Then I can have maas-import-ephemerals import armhf by default without breaking anything | 14:59 |
smoser | rbasak, can you send me a console log ? | 15:00 |
rbasak | smoser: ...of it working? | 15:00 |
smoser | because i dont like that i dont think it should work | 15:00 |
rbasak | OK | 15:00 |
rbasak | smoser: err | 15:01 |
rbasak | smoser: BOOTIF seems to have arrived | 15:01 |
rbasak | Well this is embarrassing | 15:01 |
rbasak | smoser: I'll check after this test but it seems that IPAPPEND support might have appeared in the latest highbank U-Boot update | 15:01 |
smoser | well that'd be neat. | 15:02 |
Daviey | lol | 15:03 |
roaksoax | rvba: make sampledata works | 15:07 |
smoser | roaksoax, ping | 15:19 |
rbasak | smoser: yeah IPAPPEND now works! | 15:20 |
smoser | well that is nice indeed. | 15:20 |
rbasak | smoser: one note though. With DHCP disabled on eth1, everything worked all the way through except after the installation cloud-init hung | 15:20 |
rbasak | smoser: and at that stage it's a local boot so no BOOTIF expected | 15:20 |
rbasak | smoser: thoughts? | 15:20 |
smoser | it looks like little intel-jr is growing up. | 15:20 |
rbasak | :-)_ | 15:21 |
smoser | precise | 15:21 |
smoser | ? | 15:21 |
smoser | quantal should work | 15:21 |
rbasak | Yes | 15:21 |
rbasak | I'll check quantal | 15:21 |
smoser | you need a couple fixes SRU'd to precise | 15:22 |
smoser | (they happen to be in that maas-ephemeral ppa , so you could try just adding that ppa and seeing if that makes it magic) | 15:22 |
rbasak | smoser: which ppa please? | 15:23 |
rbasak | smoser: ephermal-fixes? | 15:24 |
smoser | https://launchpad.net/~maas-maintainers/+archive/maas-ephemeral-images | 15:24 |
rbasak | thanks! | 15:24 |
roaksoax | smoser: pong | 15:26 |
rbasak | smoser: is sources.list expected to be wrong in precise still? | 15:26 |
smoser | rbasak, yeah | 15:27 |
smoser | roaksoax, so ... we are to fix ipmi today | 15:27 |
rbasak | smoser: the PPA seems to have fixed it | 15:27 |
smoser | you did an install that quickly? | 15:28 |
rbasak | I just updated | 15:28 |
roaksoax | smoser: ok | 15:28 |
rbasak | smoser: unless first boot is expected to be different for some reason? | 15:29 |
roaksoax | rbasak: quick question... if we upgrade from precise to quantal, how is the change in arch from i386 to i386/generic handled? | 15:31 |
rbasak | roaksoax: the db migration just slaps /generic on the end of all existing nodes' architectures | 15:31 |
roaksoax | rbasak: cool! | 15:31 |
roaksoax | allenap: aroung? | 15:41 |
roaksoax | around* | 15:41 |
rvba | roaksoax: allenap will be back around 1900 utc. | 15:41 |
roaksoax | rvba: alright. So you might help then :). So for the power related stuff, IPMI specifically, we need to ship a special config file that will be used every time an IPMI command is executed | 15:42 |
roaksoax | rvba: where do you think the file should live, and how should it be referenced | 15:42 |
roaksoax | i was thinking it should live with the templates | 15:44 |
roaksoax | rvba: oh wait, you were the one who did the power stuff right? | 15:44 |
rvba | yeah | 15:44 |
roaksoax | rvba: hehe alright so it is you then :) | 15:44 |
rvba | roaksoax: "live with the templates"… which templates? :) | 15:45 |
roaksoax | rvba: http://paste.ubuntu.com/1256181/ | 15:46 |
roaksoax | rvba: live with the templates, as in this file should be placed in the power template directory (where ipmi.template is) | 15:46 |
rbasak | roaksoax: mind that workaround stays in the right place in that patch | 15:47 |
rvba | roaksoax: src/provisioningserver/power/config/ seems like a good place to me | 15:47 |
rbasak | roaksoax: also ipmi-chassis-config might need it too. Probably worth testing with </dev/null | 15:47 |
roaksoax | rbasak: the workaround is not being affected | 15:47 |
rbasak | roaksoax: your patch puts it on the wrong line | 15:48 |
roaksoax | rbasak: ack! | 15:48 |
roaksoax | rbasak: k | 15:48 |
roaksoax | rbasak: i see now | 15:48 |
roaksoax | :) | 15:49 |
rbasak | :) | 15:50 |
roaksoax | rvba: do you know anything about jtv's changes on running maas-cluster-celery under user/pass? | 16:14 |
roaksoax | user/group? | 16:14 |
rvba | roaksoax: yeah, I think he has landed that branch. | 16:15 |
rvba | roaksoax: is there a problem with that change? | 16:17 |
rvba | roaksoax: btw, did you manage to get rid of that weird problem you had? | 16:17 |
roaksoax | rvba: yeah i did manage to get rid of the problem i had | 16:17 |
roaksoax | rvba: and i think there is, i saw maas-cluster-celery being unable to start | 16:18 |
roaksoax | but now it starts :/ | 16:18 |
rvba | Be aware of the fact that it does not start celery instantly, it first needs to get the credentials from the region controller. | 16:18 |
roaksoax | rvba: yeah so it seems to do that | 16:19 |
roaksoax | but still | 16:19 |
roaksoax | let me check again | 16:19 |
rvba | roaksoax: arg, it seems the packaging has not been cleaned up: debian/maas-cluster-controller.maas-cluster-celery.upstart still contains setuid maas/setgid maas | 16:21 |
rvba | roaksoax: so apparently, he made the upstream change, but not the related packaging change. | 16:21 |
rvba | roaksoax: and /usr/sbin/maas-provision will refuse to do anything if not run as root. | 16:22 |
rvba | roaksoax: so now I wonder, how can you see it running… ? | 16:22 |
roaksoax | rvba: i didn't; the problem was that it failed to start on an upgrade | 16:22 |
roaksoax | hence leaving the package unconfigured | 16:23 |
roaksoax | rvba: i'll upload a fix | 16:23 |
rvba | Ok, thanks. | 16:23 |
rbasak | smoser: just finished testing quantal daily ephemeral for quantal install. It works all the way through without problems. | 16:42 |
rbasak | smoser: can we get the precise armhf daily converted to a release soon now please? Is there anything blocking this? Then I can land a change for maas-import-ephemerals to import armhf by default. | 16:43 |
smoser | i can do that now. | 16:44 |
rbasak | thanks! | 16:44 |
rbasak | Although you changed 'cloud-init boot finished' to 'Cloud-init v. 0.7 finished' so my expect script didn't match for success :-P | 16:44 |
mgz | rbasak: yeah, smoser likes doing small cloud-init changes to break your scripts :) | 16:45 |
mgz | I do now work with lucid->quantal though | 16:46 |
smoser | recent changes can be more blamed on harlowja | 16:46 |
smoser | sorry about that. | 16:47 |
mgz | smoser: the main annoyance is needing to support all versions, the new improved is very nice but having to work with lucid still makes it painful... | 16:48 |
mgz | I'd like to use the file injection stuff now josh implemented it, but having two versions loses the simplifications... | 16:50 |
roaksoax | rvba: do you think it is possible to ship maas_local_settings.py in /usr/share/maas and have that source something in /etc/maas/local_settings.py or similar? | 16:55 |
roaksoax | rvba: or have a proper conffile? | 16:56 |
mercsniper_ | is the cloud init package still out of date for 12.04? | 16:57 |
melmoth | mercsniper_, i do not know, but last time i used maas (2 weeks ago) i did not experience problem with cloud-init | 16:58 |
mercsniper_ | k | 16:58 |
rvba | roaksoax: that's possible but that would simply add one additional level of complexity. | 16:58 |
rvba | roaksoax: what would it give us? | 16:59 |
roaksoax | rvba: ok so the problem is that we can no longer modify /etc/maas/maas_local_settings.py in packaging | 16:59 |
roaksoax | rvba: so it should only be modified by the user | 16:59 |
roaksoax | not the package | 16:59 |
roaksoax | so if the user makes changes, on upgrade it gets prompted | 16:59 |
roaksoax | rvba: it would be like adding .d support | 16:59 |
roaksoax | rvba: cause if it is not done upstream, i'm gonna have to patch maas up and do it myself | 17:00 |
roaksoax | Daviey: what was the package with the .d support for cobbler? | 17:00 |
Daviey | roaksoax: it was a custom thing i did | 17:04 |
Daviey | never hit the archive, still in my PPA | 17:04 |
rvba | roaksoax: that's definitely doable, we've got a tiny utility method to do that so the change should be simple. Could you please file a bug with the details. I'm probably not gonna be able to do it right now but Gavin might be able to do it later today. | 17:04 |
mercsniper_ | Is there a way to remove a node if the status is commissioning? | 17:05 |
roaksoax | rvba: cool, that way, we can simply edit /usr/share/maas/maas_local_settings.py or whatever in packaging | 17:06 |
roaksoax | rvba: and if the user wants to override something he can do it | 17:06 |
mercsniper_ | melmoth: are you using 1204 or 12.10 for your maas | 17:22 |
melmoth | 12.04 | 17:22 |
mercsniper_ | still getting commissioning... | 17:22 |
melmoth | mercsniper_, did the machine restart ? did you try to reboot it ? | 17:24 |
melmoth | i did not understand exactly how the power management thing worked, so some times i just rebooted nodes. | 17:25 |
mercsniper_ | I tried rebooting, I get cloud-init-nonet killed(300) | 17:25 |
melmoth | some times they rebooted on their own (i m still puzzled as to why :-) ) | 17:25 |
mercsniper_ | while it boots i get a landscape-client is not configured | 17:26 |
melmoth | i think i have seen that but did not looked like a real problem. | 17:26 |
melmoth | but i did not see a cloud-init-nonet killed | 17:26 |
mercsniper_ | init: cloud-init-nonet main process (269) killed by TERM signal | 17:27 |
melmoth | https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1015223 | 17:28 |
ubot5 | Ubuntu bug 1015223 in cloud-init (Ubuntu) "cloud-init-nonet main process killed by TERM signal" [Low,Triaged] | 17:28 |
melmoth | dont panic i think it says :) | 17:28 |
melmoth | hmm, but there's a link to https://bugs.launchpad.net/ubuntu/+source/maas/+bug/992075 that is worrying | 17:30 |
ubot5 | Ubuntu bug 992075 in maas (Ubuntu) "Commissioning status persists with cloud-init 0.6.3-0ubuntu1" [Undecided,Confirmed] | 17:30 |
melmoth | mercsniper_, is the date and time the same on the maas server and the machine you try to comission ? | 17:32 |
mercsniper_ | hm.... i would imagine | 17:32 |
mercsniper_ | is there a standard login for nodes? | 17:32 |
melmoth | not until they got the public juju ssh key injected | 17:33 |
melmoth | if you want a password login you need to make your own image (i dont know how, i saw once a doc telling how to) | 17:33 |
mercsniper_ | ah | 17:33 |
melmoth | i m asking about the clock thing because it hit me several time with juju stuff and because of comment 2 in https://bugs.launchpad.net/ubuntu/+source/maas/+bug/992075 | 17:34 |
ubot5 | Ubuntu bug 992075 in maas (Ubuntu) "Commissioning status persists with cloud-init 0.6.3-0ubuntu1" [Undecided,Confirmed] | 17:34 |
melmoth | mercsniper_, see comment 12 https://answers.launchpad.net/maas/+question/196791 | 17:36 |
melmoth | (and what about booting on a live cd and running ntpdate so the local clock is roughly on the correct date and time on next reboot ? ) | 17:37 |
mercsniper_ | true | 17:40 |
mercsniper_ | trying to bootstrap juju, i get an unexpected http 500 code....mean anything to anyone? | 17:52 |
mercsniper_ | this directory didnt exist /var/lib/maas/media/storage | 17:59 |
mercsniper_ | how long does commissioning take? | 18:35 |
melmoth | couple of minutes | 18:35 |
mercsniper_ | then it must not be commissioning properly | 18:35 |
melmoth | the only real long stuff is when you deploy a service, then it's a real install | 18:35 |
mercsniper_ | gonna try virtual box instead of vmworkstation | 18:43 |
=== mercsniper__ is now known as mercsniper | ||
roaksoax | smoser: please :) https://code.launchpad.net/~andreserl/maas/packaging_updateS_bzr1134/+merge/127570 | 18:54 |
smoser | roaksoax, you copied bug https://bugs.launchpad.net/maas/+bug/1039513 to import-squashfs | 19:03 |
ubot5 | Error: ubuntu bug 1039513 not found | 19:03 |
roaksoax | smoser: ack | 19:05 |
roaksoax | yeah we need to do verifications | 19:05 |
rbasak | allenap: thanks for the review! | 19:07 |
allenap | rbasak: Welcome :) | 19:08 |
smoser | https://code.launchpad.net/~smoser/maas/trunk-remove-hostname-kludge/+merge/127571 | 19:08 |
smoser | someone can take that too | 19:08 |
rbasak | allenap: there's https://code.launchpad.net/~racb/maas/arch-detect/+merge/127458 too, and then I'm done :-P | 19:09 |
allenap | Okay, I'll try to look at those both. | 19:09 |
rbasak | thank you! | 19:09 |
rbasak | After that the daily PPA should in theory work for ARM | 19:10 |
allenap | smoser: Is there any way to make set -e work for broken command substitution? | 19:25 |
smoser | what does that mean? | 19:25 |
allenap | wrt. bug 1060411 | 19:25 |
ubot5 | Launchpad bug 1060411 in MAAS "maas-import-pxe-files does not catch failure of compose_installer_download_files" [Undecided,New] https://launchpad.net/bugs/1060411 | 19:25 |
smoser | oh. | 19:25 |
smoser | that's not the issue. | 19:25 |
smoser | commented in bug. | 19:26 |
smoser | the issue is just 'local' as the declaration succeeds, and that is what is checked. | 19:26 |
smoser | so you just do those on 2 separate lines. | 19:26 |
allenap | smoser: If I do: | 19:27 |
allenap | set -e; echo $(does_not_exist); echo $? | 19:28 |
allenap | I get 0. | 19:28 |
allenap | Ah! But if I do variable=$(something) it will break. | 19:28 |
allenap | Okay, got it. | 19:28 |
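Editorial sketch of the point smoser makes above, assuming a POSIX-style shell that supports `local` (bash or dash). `failing_helper` is a hypothetical stand-in for the failing function in the bug report, not code from maas-import-pxe-files. It shows that `local var=$(cmd)` reports the status of `local`, while a separate assignment reports the status of the command substitution.

```shell
#!/bin/sh
# Hypothetical helper that prints something and then fails.
failing_helper() {
    echo "partial output"
    return 1
}

# Declaration and assignment on one line: the status seen afterwards
# is that of `local`, which succeeds, so the failure is masked.
masked() {
    local value="$(failing_helper)"
    echo "masked: local returned $?"
}

# Declaration and assignment on two lines: the assignment's status
# is that of the command substitution, so the failure is visible.
unmasked() {
    local value status=0
    value=$(failing_helper) || status=$?
    echo "unmasked: assignment returned $status"
}

masked_out=$(masked)
unmasked_out=$(unmasked)
echo "$masked_out"
echo "$unmasked_out"
```

Under `set -e`, the one-line form would carry on past the failure, whereas a bare `value=$(failing_helper)` on its own line would stop the script.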
roaksoax | allenap: https://code.launchpad.net/~andreserl/maas/use_squashfs_filesystem_2/+merge/127577 --> I addressed 1, the rest is adding tests | 19:30 |
smoser | this is one reason i prefer: | 19:30 |
roaksoax | allenap: if you could take care of that, I'd deeply appreciate it | 19:30 |
smoser | myfunc && myvar=$_RET || return 1 | 19:30 |
smoser | to | 19:30 |
smoser | myvar=$(myfunc) || return 1 | 19:30 |
smoser | in addition to the fact that the second incurs a fork | 19:30 |
smoser | you're welcome to make fun of my hatred of forks. but you'll see why i hate them next time you dist-upgrade. | 19:31 |
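Editorial sketch of the `_RET` convention smoser describes, assuming bash or dash. The function names and the `amd64/generic` value are hypothetical, chosen only for illustration. The function hands its result back in a global variable and signals failure through its exit status, so the caller needs no command substitution.

```shell
#!/bin/sh
# Hypothetical function: stores its result in the global _RET
# instead of printing it; returns non-zero when given no input.
get_arch() {
    [ -n "$1" ] || return 1
    _RET="$1/generic"
}

# Caller in the style quoted above: take the result only on success,
# otherwise propagate the failure.
caller() {
    local myvar
    get_arch "$1" && myvar=$_RET || return 1
    echo "$myvar"
}

ok_status=0
caller amd64 || ok_status=$?

fail_status=0
caller "" || fail_status=$?

echo "ok status: $ok_status, failure status: $fail_status"
```

The trade-off is that `_RET` is shared global state, so each caller has to copy it out before invoking another function that uses the same convention.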
melmoth | mercsniper, you install things in virtual machines ? | 19:31 |
roaksoax | smoser: uhmm enlistment doesn't seem to be working : | 19:32 |
roaksoax | :s | 19:32 |
allenap | smoser: Are you sure it incurs a fork, when calling a shell builtin? Try: echo $$ $(echo $$) | 19:32 |
smoser | i'm positive. | 19:32 |
mercsniper | Mel: I am doing this work as a learning experience on my work laptop | 19:33 |
melmoth | how much ram ? | 19:33 |
roaksoax | smoser: did you change the console to which it is displaying the output? | 19:33 |
mercsniper | vmworkstation lets you switch between machines | 19:34 |
smoser | allenap, compare: | 19:34 |
smoser | $ time sh -c 'for i in "$@"; do echo $(echo $i); done' -- $(seq 1 1000) >/dev/null | 19:34 |
smoser | to | 19:34 |
melmoth | i m using kvm to play with things and learn maas here | 19:34 |
smoser | time sh -c 'for i in "$@"; do echo $i; done' -- $(seq 1 1000) | 19:34 |
melmoth | mercsniper, http://bazaar.launchpad.net/~pierre-amadio/+junk/c6100-jumpstart-maas/view/head:/README.txt | 19:36 |
roaksoax | smoser: nevermind :) | 19:36 |
melmoth | if you want to give kvm a try instead of virtualbox... Should just work "out of the box" | 19:36 |
allenap | smoser: Yeah, I can see it now; another way to demonstrate it is looking at the process list when running: echo $(read) | 19:36 |
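Editorial sketch, assuming bash or dash, of another way to observe that `$(...)` runs in a separate subshell environment (which these shells implement with a fork): a variable changed inside the substitution does not change in the parent. The `bump` function and `counter` variable are hypothetical names used only for this illustration.

```shell
#!/bin/sh
counter=0

# Increment the counter and print its new value.
bump() {
    counter=$((counter + 1))
    echo "$counter"
}

# Called inside $(...): the increment happens in a subshell and is lost.
seen=$(bump)
after_substitution=$counter

# Called directly: the increment happens in the current shell.
bump >/dev/null
after_direct=$counter

echo "value printed inside substitution: $seen"
echo "parent counter after substitution: $after_substitution"
echo "parent counter after direct call: $after_direct"
```

This does not measure the cost of the fork; the `time sh -c ...` comparison quoted in the conversation is the way to see that.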
mercsniper_ | unfortunately, im on a windows host | 19:40 |
melmoth | hey, good reason to install something new on your laptop :) | 19:40 |
mercsniper_ | laptop needs to stay windows per company policy | 19:41 |
mercsniper_ | thats why its all virtual | 19:41 |
melmoth | i do it in kvm because it s easier to get one machine with lots and lots of ram, than 10 little ones with switch and wire and stuff | 19:42 |
roaksoax | smoser: updated | 19:44 |
* roaksoax hates chromium crashing | 19:45 | |
smoser | regarding wrap-and-sort, i was only really complaining about the python-netifaces | 19:46 |
roaksoax | allenap: what extra tests did you have in mind for the squashfs | 19:46 |
smoser | and it turns out i was wrong there anyway | 19:46 |
smoser | :) | 19:46 |
smoser | (i thought that would sort after the ${misc:Depends}) | 19:46 |
smoser | i'll ack this because wrap-and-sort is generally a good thing. | 19:46 |
smoser | ah. but you approved already. | 19:47 |
smoser | :) | 19:47 |
roaksoax | smoser: yeah :) | 19:48 |
roaksoax | thanks | 19:48 |
smoser | oh... roak! | 19:50 |
smoser | set -e ? | 19:50 |
smoser | er... | 19:50 |
smoser | set -x ? | 19:50 |
smoser | really? | 19:50 |
roaksoax | smoser: in maas-import-squashfs? | 19:50 |
roaksoax | smoser: yeah I forgot | 19:51 |
roaksoax | smoser: a branch is ready for review that allenap needs to review | 19:51 |
roaksoax | that completes that | 19:51 |
roaksoax | and fixes it | 19:51 |
roaksoax | completes the support and fixes that | 19:51 |
smoser | roaksoax, install of maas-dhcp from experimental results in no /etc/maas/dhcpd.conf | 19:53 |
roaksoax | smoser: tbh i have not been following up on what they've been doing with maas-dhcp | 19:53 |
roaksoax | smoser: but i'll audit | 19:53 |
smoser | ok. well i'm looking for a way to have a maas functional | 19:55 |
smoser | you suggested experimental | 19:55 |
smoser | that didn't work | 19:55 |
roaksoax | smoser: yeah so that means that's broken somehow | 19:58 |
roaksoax | smoser: i don't think the dhcp server is still functional | 19:58 |
roaksoax | clear | 20:04 |
smoser | matsubara, did you install maas-dhcp recently? | 20:08 |
matsubara | smoser, yes | 20:09 |
matsubara | well | 20:09 |
matsubara | found a bug with the package this morning | 20:09 |
matsubara | https://bugs.launchpad.net/ubuntu/+source/maas/+bug/1060237 | 20:09 |
ubot5 | Ubuntu bug 1060237 in maas (Ubuntu) "apt-get install maas maas-dhcp maas-dns fails" [Undecided,New] | 20:09 |
smoser | matsubara, roak has a fix for that https://code.launchpad.net/~andreserl/maas/packaging_updateS_bzr1134/+merge/127570 | 20:10 |
smoser | matsubara, do you have notes available on how you install and configure? | 20:11 |
matsubara | smoser, I follow the checkbox tests and have a local note like this: https://pastebin.canonical.com/75751/ | 20:13 |
smoser | where are the checkbox tests? | 20:14 |
smoser | allenap, https://code.launchpad.net/~smoser/maas/trunk-remove-hostname-kludge/+merge/127571 | 20:19 |
roaksoax | smoser: you want me to integrate it inside maas-signal right? | 20:22 |
smoser | well, we want maas signal to call it. | 20:23 |
smoser | err.. | 20:23 |
smoser | the scripts there to call yours, and then post back the results | 20:23 |
roaksoax | smoser: ok | 20:23 |
roaksoax | smoser: i'm doing this too: http://paste.ubuntu.com/1256796/ | 20:23 |
smoser | well config probably shouldn't be executable | 20:26 |
smoser | but other than that i think it looks reasonable | 20:27 |
roaksoax | cool | 20:30 |
smoser | matsubara, how do i enable dhcp? | 20:35 |
matsubara | $ maas-cli api maas node-group-interfaces new master ip=192.168.21.1 interface=eth0 management=2 subnet_mask=255.255.255.0 broadcast_ip=192.168.21.255 router_ip=192.168.21.1 ip_range_low=192.168.21.10 ip_range_high=192.168.21.50 | 20:36 |
matsubara | smoser, ^ | 20:36 |
smoser | so where do you have those notes? | 20:36 |
smoser | ie, is this part of the "checkbox install" that you mentioned? | 20:36 |
matsubara | smoser, checkbox tests are linked in this doc: https://docs.google.com/a/canonical.com/document/d/1GNrJCL8EyfSw7ypCCYjH0BuIgIEDP2E6Y9Xbb7Gx8rs/edit | 20:37 |
matsubara | and my notes are in the pastebin | 20:37 |
matsubara | and I rely a lot on the shell history as well :-) | 20:37 |
smoser | this *really* needs to not be a private google doc | 20:37 |
smoser | matsubara, and how do you set up a maas user and such ? | 20:38 |
smoser | it seems like you probably have done a lot of things that i want to do | 20:39 |
smoser | and i'm just trying to avoid us both doing them. | 20:39 |
matsubara | sudo maas createadmin --username=admin --password=test --email=example@canonical.com | 20:39 |
roaksoax | smoser: any thoughts "1349210267.806 72 192.168.123.101 TCP_DENIED/403 3728 GET http://192.168.123.2/MAAS/static/images/amd64/generic/quantal/filesystem/filesystem.squashfs - NONE/- text/html" | 20:39 |
roaksoax | ? | 20:39 |
smoser | i would say you are being denied access to that | 20:39 |
smoser | :) | 20:39 |
smoser | check /var/log/apache/*.log | 20:40 |
smoser | (including error) | 20:40 |
roaksoax | smoser: lol yeah I mean, squid-deb-proxy doesn't allow the installer to download the squashfs image | 20:40 |
smoser | i suspect you're being expected to oauth | 20:40 |
roaksoax | smoser: any thoughts on how we can fix it? | 20:40 |
roaksoax | smoser: i was thinking of telling squid-deb-proxy to allow access to the maas server in question | 20:40 |
roaksoax | by hacking on the packaging | 20:41 |
roaksoax | but maybe you know of a better way | 20:41 |
smoser | well why is that going through the proxy | 20:41 |
smoser | squid-deb-proxy is explicitly a *deb* proxy | 20:41 |
roaksoax | smoser: because we are telling the installer to use the proxy | 20:41 |
smoser | hm.. | 20:41 |
smoser | well that would seem like a bug one way or another | 20:42 |
smoser | either in that we're telling it there is a generic proxy | 20:42 |
smoser | or in that it is assuming what we said it should use for an archive proxy it can use for other things | 20:42 |
smoser | but yeah, to fix that i guess you will probably have to have it allow proxying of /MAAS/static/images/* | 20:43 |
roaksoax | maybe squid itself is blocking it | 20:43 |
smoser | roaksoax, i thought that is what you were saying | 20:47 |
smoser | i'm confused now. | 20:47 |
smoser | were you thinking maas was saying that? | 20:47 |
roaksoax | smoser: i meant squid3 itself (not the instance squid-deb-proxy spawns) | 20:47 |
roaksoax | smoser: but it is squid-deb-proxy | 20:48 |
roaksoax | if I add the IP address of the MAAS server facing that network, it allows it | 20:48 |
smoser | roaksoax, i'll be back in later tonight. (probably 3+ hours from now) | 21:05 |
allenap | roaksoax: I'm looking at use_squashfs_filesystem_2 now. | 21:07 |
roaksoax | smoser: alright, i'll be here later too | 21:08 |
roaksoax | allenap: awesome thank you! | 21:08 |
roaksoax | allenap: are you gonna be at UDS btw? | 21:25 |
allenap | roaksoax: Yeah, you? | 21:25 |
roaksoax | allenap: yeah!! is the rest of the team gonna be there? | 21:26 |
roaksoax | s/team/squad | 21:26 |
allenap | roaksoax: Yeah, I think we're all going. smoser, you at UDS? | 21:27 |
roaksoax | allenap: yeah all of our team is gonna be there | 21:27 |
allenap | Cool :) | 21:28 |
roaksoax | allenap: alright, so I hope you guys don't run away from Peruvian Pisco :P | 21:28 |
allenap | roaksoax: Oh god, I was given Pisco by Nicolas at the Barcelona UDS. I haven't been able to drink spirits since then. | 21:33 |
allenap | I'll give it a go though :) | 21:34 |
roaksoax | lol | 21:34 |
allenap | roaksoax: I've changed is_squashfs_image_present in lp:~allenap/maas/use_squashfs_filesystem_2, and updated the tests. It doesn't test the expansion of the templates, but I need to go and sleep now ;) | 21:35 |
roaksoax | allenap: is there any example? | 21:36 |
roaksoax | on how to do it? | 21:37 |
allenap | roaksoax: There are general tests for expansion, but not for the specific templates in contrib/... | 21:38 |
allenap | roaksoax: Don't worry about it. It's something we ought to address after 12.10. The confusion about inheritance means that we should revisit this stuff anyway. | 21:39 |
roaksoax | allenap: alright, cool | 21:39 |
roaksoax | thanks for helping out! | 21:39 |
allenap | roaksoax: Pull my branch (it directly follows on from yours), push it up, and I'll +1 that mp. | 21:39 |
allenap | roaksoax: So, sorry I didn't get to this yesterday. | 21:44 |
roaksoax | allenap: no worries :) | 21:44 |
roaksoax | allenap: thank you for helping tho | 21:44 |
roaksoax | allenap: pushed the changes to the MP!! thanks a lot again! and have a good night! | 21:47 |
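
The `node-group-interfaces new` command that matsubara pasted at 20:36 passes the interface address, subnet mask, broadcast address, router and dynamic range by hand. A minimal sketch of how those values could be cross-checked before running the command, using only Python's standard `ipaddress` module. The function name `check_dhcp_params` is hypothetical and is not part of MAAS or `maas-cli`:

```python
import ipaddress


def check_dhcp_params(ip, subnet_mask, broadcast_ip, router_ip,
                      ip_range_low, ip_range_high):
    """Return a list of problems found in hand-written DHCP values."""
    # Derive the network from the interface address and mask.
    # strict=False because `ip` is a host address, not the network address.
    network = ipaddress.ip_network(f"{ip}/{subnet_mask}", strict=False)
    problems = []
    if ipaddress.ip_address(broadcast_ip) != network.broadcast_address:
        problems.append(f"broadcast_ip should be {network.broadcast_address}")
    if ipaddress.ip_address(router_ip) not in network:
        problems.append("router_ip is outside the subnet")
    low = ipaddress.ip_address(ip_range_low)
    high = ipaddress.ip_address(ip_range_high)
    if low not in network or high not in network:
        problems.append("dynamic range is outside the subnet")
    if low > high:
        problems.append("ip_range_low is above ip_range_high")
    return problems


# Values copied from the command in the log.
params = dict(
    ip="192.168.21.1",
    subnet_mask="255.255.255.0",
    broadcast_ip="192.168.21.255",
    router_ip="192.168.21.1",
    ip_range_low="192.168.21.10",
    ip_range_high="192.168.21.50",
)
print(check_dhcp_params(**params))  # prints [] because the values are consistent
```

An empty list means the values agree with each other; it says nothing about whether MAAS itself accepts them.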
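
The line roaksoax pasted at 20:39 looks like an entry in squid's native access log format, which would explain why the 403 came from the proxy rather than from Apache on the MAAS server. A minimal sketch of splitting such a line into named fields, assuming the default ten-field native layout; `SquidEntry` and `parse_squid_line` are hypothetical names chosen for illustration:

```python
from typing import NamedTuple


class SquidEntry(NamedTuple):
    timestamp: float   # seconds since the epoch
    elapsed_ms: int    # time spent handling the request
    client: str        # address of the requesting machine
    result: str        # squid result code, e.g. TCP_DENIED
    status: int        # HTTP status returned to the client
    size: int          # bytes sent to the client
    method: str
    url: str
    user: str          # "-" when no identity is known
    hierarchy: str     # how the request was forwarded, e.g. NONE/-
    content_type: str


def parse_squid_line(line):
    """Parse one line of squid's native access log (assumed 10 fields)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    ts, elapsed, client, code, size, method, url, user, hier, ctype = fields
    # The fourth field combines the squid result and the HTTP status.
    result, _, status = code.partition("/")
    return SquidEntry(float(ts), int(elapsed), client, result, int(status),
                      int(size), method, url, user, hier, ctype)


# The line quoted in the log.
line = ("1349210267.806 72 192.168.123.101 TCP_DENIED/403 3728 GET "
        "http://192.168.123.2/MAAS/static/images/amd64/generic/quantal/"
        "filesystem/filesystem.squashfs - NONE/- text/html")
entry = parse_squid_line(line)
print(entry.result, entry.status, entry.hierarchy)
```

A `TCP_DENIED` result together with a `NONE/-` hierarchy field indicates the proxy refused the request itself and never forwarded it, which matches roaksoax's later finding that squid-deb-proxy was the component blocking the squashfs download.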