[01:02] hello [10:55] on [10:55] hello [10:56] hi [10:56] can you tell me what's going on here [10:56] i don't find kim0 [10:57] I don't find him either, maybe it's the timing [10:57] as you see he is not online yet [10:58] ohh...this was supposed to start at 4 right [11:00] I think the event is three hours from now [11:01] info [11:01] help [15:02] hi guys [15:06] hi [15:06] Hi [15:39] is there a session on GlusterFS happening right now? [15:40] i didn't get the timezone conversion right, so i'm coming late to the party :/ [15:41] natea: what timezone are you in? [15:41] EvilPhoenix: EST [15:41] if you read the schedule, it starts at 4PM [15:41] oh wait [15:41] that's GMT [15:41] * EvilPhoenix does the math [15:41] UTC... -0400... [15:42] oh [15:42] 12PMish [15:42] oh, it looks like it's not until 13:00 [15:42] according to http://www.timezoneconverter.com/ [15:42] "17.00 UTC Scaling shared-storage web apps in the cloud with Ubuntu & GlusterFS — semiosis" [15:43] 17.00 UTC -> 13.00 EST [15:43] grah my system isn't displaying times right [15:43] * EvilPhoenix shoots his system [15:43] 13:00 is about 1PM [15:43] * EvilPhoenix shall return after destroying his system [16:00] Howdy [16:01] Hello everyone, welcome to the very first Ubuntu Cloud Days [16:01] yay! === ChanServ changed the topic of #ubuntu-classroom to: Welcome to the Ubuntu Classroom - https://wiki.ubuntu.com/Classroom || Support in #ubuntu || Upcoming Schedule: http://is.gd/8rtIi || Questions in #ubuntu-classroom-chat || Event: Ubuntu Cloud Days - Current Session: Cloud Computing 101, Ask your questions - Instructors: kim0 [16:01] * ttx attends two conferences at once. [16:01] Logs for this session will be available at http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html following the conclusion of the session. [16:02] So again, good morning, good afternoon and good evening wherever you are [16:02] Please be sure you're joined to this channel plus [16:02] #ubuntu-classroom-chat : For Questions [16:03] In case you would like to ask a question [16:03] please start it with "QUESTION:" [16:04] and write it down in the #ubuntu-classroom-chat channel [16:04] This session is mostly about taking questions and making sure everyone is well seated :) [16:04] Seems like I have a question already [16:05] EvilPhoenix asked: I think this could be the start of it. Could you give a brief explanation of what "Cloud Computing" is defined as? [16:05] Hi EvilPhoenix .. Good question indeed [16:06] Trying to answer your question .. I will begin by saying [16:06] "Cloud" has so many different definitions already :) [16:06] Almost every company has bent it to mean whatever product they're selling [16:06] the term has really been abused [16:07] There are also various definitions by institutions like NIST and others [16:07] since there is no one single true definition .. I'll lay down some properties [16:07] that almost everyone agrees should be present in a "cloud" [16:08] 1- Pay per use .. Clouds are online resources that can be characterized by "pay per use" [16:08] you only pay for the resources that you need ..
the storage you consume [16:08] the CPU/Memory compute capacity that you are using ..etc [16:09] You never really (or should never) pay in advance .. (just in case you need that resource) [16:09] 2- Instant scalability: Cloud solutions should be instantly scalable [16:09] that is .. with one api call (that's one command, or a click of a button for non-programmers) [16:09] you should be able to allocate more resources [16:10] Clouds convey the feeling of infinite scale .. of course in reality it's not truly infinite .. but it's large enough [16:11] 3- API programmability .. Most cloud solutions are going to have an API .. an API is a programmatic way to control your resources [16:11] Taking a prime example .. The largest commercial compute and storage cloud today is Amazon's AWS cloud [16:11] With Amazon's cloud, with an api call (or running a command) [16:12] you can instantly allocate "servers" [16:12] so it's got an API interface [16:12] it's scalable .. since you can always add more servers (or S3 storage) should you want to [16:12] and you only pay for the consumed CPU hours .. or gigabytes of storage [16:13] Clouds are usually split up by their type as well [16:13] IaaS, PaaS and SaaS [16:13] let me quickly comment on those types [16:13] IaaS : Infrastructure as a Service [16:14] This basically means you get "infrastructure" components (that is servers, storage space, networking ..etc) as a service .. [16:14] You use those to build your own cloud or application [16:14] PaaS : Moves a little up the value stack [16:14] It provides a complete development environment as a service [16:15] so you basically upload some code .. and without needing to worry about servers or networks/switches or storage ..etc [16:15] your application just runs on the "cloud" .. is scalable, is redundant [16:15] someone else (the PaaS provider) did that work for you [16:16] Examples of PaaS would be Google's AppEngine .. salesforce.com or others [16:16] The last type is SaaS : Software as a Service [16:16] This basically means providing a complete application that you are directly using in the cloud [16:16] examples of that would be facebook, gmail, twitter ..etc [16:17] Those are "applications" if you come to think of it .. more so than the notion of webpages [16:17] * kim0 checks if he has more questions [16:17] BluesKaj asked: ok then what is Ubuntu Cloud about? [16:18] Hi BluesKaj [16:18] Very good question as well [16:19] So Amazon's cloud is a very popular IaaS cloud. However, some people are not totally happy with the fact that they'd upload their data to Amazon's datacenters [16:20] some enterprises or ISPs .. would like to utilize the improved economics of the cloud model [16:20] while still keeping their data and servers in-house (whatever that means to them) [16:20] In order to build a cloud that competes with Amazon's cloud [16:20] you need various software components [16:21] Ubuntu packages, integrates and makes available the best-of-breed open-source software [16:21] that enables you to build and operate your own cloud should you want to [16:21] In the upcoming 11.04 natty release [16:21] Ubuntu packages two complete open-source cloud stacks [16:21] those would be [16:22] - Ubuntu Enterprise Cloud : an Ubuntu-integrated and polished cloud stack based on the popular Eucalyptus stack [16:22] - OpenStack : a new open-source cloud stack that's gaining a lot of popularity [16:22] Actually we have dedicated sessions for each of those cloud stacks!
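As a concrete instance of the "one API call" point above, here is a minimal hedged sketch using euca2ools (the AMI id, instance id and key name are placeholders, not values from the session):

    # allocate a server with one command, list it, and give it back when done
    euca-run-instances ami-xxxxxxxx -t t1.micro -k mykey   # start paying for one small instance
    euca-describe-instances                                # see what you are currently running (and paying for)
    euca-terminate-instances i-xxxxxxxx                    # release it, and the per-hour charges stop

The pay-per-use and API-programmability properties show up in each step: resources appear on demand and stop costing anything the moment you stop asking for them.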
[16:23] An interesting fact is that UEC and OpenStack both allow you to expose an API that is the equivalent of Amazon's API [16:23] that means you can use the same management tools to control both the public (Amazon's) cloud and your own private one! [16:24] This is also great for providers wanting to run their own clouds [16:24] so that was an overview of the cloud stacks available to enable [16:24] you to build your own cloud environment [16:25] Other than that .. and to fully answer the question of "What is Ubuntu Cloud" .. I need to add a few more points [16:25] Ubuntu makes available official Ubuntu images that run on the Amazon cloud as well [16:25] You can check them out (as they're regularly updated) on http://cloud.ubuntu.com/ami/ [16:26] you basically search for what you want, like (maverick 64 us-east) pick the ami-id [16:26] and launch that [16:27] Also Canonical makes available Landscape, a cloud management tool .. you can check it out at https://landscape.canonical.com/ [16:27] Also, Ubuntu is soon unleashing a cloud management and orchestration tool called "ensemble" [16:28] that is going to revolutionize cloud deployments and management .. it's still in early tech-preview stage [16:28] however we're having an ensemble session and demo today [16:28] I think that mostly covers a broad definition of Ubuntu and cloud [16:28] Kruptein asked: so dropbox isn't cloud related? as you don't have to pay for it (basic) [16:29] Hi Kruptein [16:29] Well .. dropbox is cloud storage indeed [16:29] I meant that with cloud .. when you want to grow you pay for what you use/need [16:30] as opposed to buying a 1TB disk that lies on your desk so that when you need the capacity it'll be available for you [16:30] with dropbox you pay for what you use .. although I believe they only allow payment in coarse packages [16:30] as opposed to Amazon's S3 which charges you per GB of storage per month [16:31] which is a more fine-grained model [16:31] BluesKaj asked: ok then what is Ubuntu Cloud about? [16:31] So I believe we covered that [16:31] To quickly recap [16:31] - Building your own private cloud : UEC/Eucalyptus or OpenStack [16:32] - Running over the Public Amazon Cloud : Official Ubuntu Server images http://cloud.ubuntu.com/ami/ [16:32] - Systems Management tools : https://landscape.canonical.com/ [16:32] - Infrastructure automation : Ensemble (tech-preview) [16:33] Again all of those tools and technologies (except for landscape) have their own sessions that you'll enjoy :) [16:34] Let me not forget as well about "Ubuntu ONE" [16:34] a personal storage cloud (very similar to dropbox) [16:34] Check it out at https://one.ubuntu.com/ [16:34] popey asked: Should your average end-user care about Ubuntu cloud? If so, why? If not, what do we say to end users when they see all this promotion of Ubuntu cloud stuff? [16:34] Hi popey [16:35] Great question [16:35] It really depends on your point of view [16:36] The usual suspects to care about "cloud" stuff are going to be sys-admins, devops, IT professionals .. people who care about server environments and such .. However! [16:36] If you ask me, yes non-IT pros should care as well [16:37] because the computing model is quickly shifting to a cloud model [16:37] that is .. instead of you buying a pc, loading it with your personal applications and settings [16:37] and being a sysadmin for yourself .. handling backups ..
troubleshooting, software upgrades ..etc [16:38] the world is shifting into an ipad/iphone/thin-client/mobile devices world [16:38] where your data lives on a cloud [16:38] is accessible by a wide variety of tools [16:38] and all tools sync up together [16:39] obviously the point of interest is going to be different, however it remains that the cloud touches all of us [16:39] cdbs asked: The cloud world is buzzing about OpenStack. Natty will include support for OpenStack along with Eucalyptus. Once OpenStack Nova becomes stable enough (should happen soon, by May), will Ubuntu begin recommending OpenStack for its cloud offerings? [16:40] Hi cdbs [16:40] Seems you're on top of things hehe .. I can't really claim to foresee the future. Ubuntu has always aimed at providing best-of-class open-source cloud technologies and software [16:41] As it stands, the UEC product is based on Eucalyptus because it is a mature product [16:41] however since openstack is rapidly maturing, it has been packaged and made available as well [16:42] I am confident Ubuntu will continue to make available all mature choices of best-of-breed software [16:42] Yuvi_ asked: can you differentiate between public cloud and private cloud? [16:42] Hi Yuvi_ [16:42] Well, yeah I guess [16:43] Public clouds are clouds operated by an entity you don't control [16:43] and that provide services to multiple other tenants [16:43] examples would be Amazon cloud, rackspace, go-grid, terremark ...etc [16:43] A private cloud is a cloud that probably runs behind your firewall on your own servers [16:44] and that you can control, i.e. is operated by IT people you have direct influence upon [16:44] at141am asked: Is the demo open to all for ensemble, if so when and where? [16:44] Hi at141am [16:44] Yes absolutely! [16:45] The Ensemble session is today in less than a couple of hours [16:45] right here in this same channel [16:46] The session leader is probably going to be copy/pasting text so that you can follow the demo [16:46] I'm not really sure how it would go .. but I'm sure it's gonna be loads of fun [16:46] marenostrum asked: What does "Ubuntu One" have to do with the "cloud" concept? [16:46] Hi marenostrum [16:47] Ubuntu ONE is a personal cloud service [16:47] It is designed for end-users .. that is non-IT pros [16:47] It provides services to sync your files and folders to the cloud [16:47] sharing them with other people [16:47] not only that .. but also [16:47] syncs your "notes" across multiple machines [16:47] your music [16:47] Bookmarks [16:48] I think soon it might sync application settings and the apps installed [16:48] so that when you get a new Ubuntu machine .. it installs all your applications, applies all settings, syncs your data/notes/bookmarks ..etc [16:48] that would be lovely indeed .. I'm not sure if it can do all that just yet though [16:49] sveiss asked: do the official Ubuntu EC2 images receive updates? Specifically kernel updates, which are a bit of a pain to deal with via apt-get on boot. [16:49] * kim0 trying to answer questions quickly :) [16:49] Hi sveiss [16:49] The answer is absolutely YES [16:49] they do receive regular updates [16:49] of course you can always apt-get upgrade them anyway [16:50] the one potential pain point .. is the one you have mentioned "kernel upgrades" [16:50] for that .. I've some good news [16:50] Newer AMIs are designed to use pv-grub [16:50] which is a method exposed by Amazon to load the kernel from inside the image [16:51] which means you can now apt-get upgrade your kernel .. and very simply reboot into it
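To make that last point concrete, a minimal hedged sketch of a kernel update on one of these pv-grub based AMIs (nothing here is specific to the official images beyond what was just described):

    sudo apt-get update
    sudo apt-get dist-upgrade   # pulls in the new kernel package along with everything else
    sudo reboot                 # pv-grub boots the freshly installed kernel from inside the image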
[16:51] There are 10 minutes remaining in the current session. [16:51] if you need to know which exact version switched to pvgrub .. check in at #ubuntu-cloud [16:51] IdleOne asked: Repost for AndrewMC: What would be the benefits of using the "cloud" instead of, say, a dedicated server? [16:52] Hi IdleOne [16:52] the main benefits are really [16:52] - Pay per use .. I might need ten servers today .. but only one tomorrow .. cloud allows that .. dedicated servers don't (you'd have to buy 10 servers all the time) [16:53] - flexibility .. If my web application gets slashdotted .. and the load is too high .. within a few seconds .. I can spin up 20 extra cloud servers to handle the load [16:53] - Also .. since almost all clouds provide an extensive API [16:54] it really helps with IT automation .. spin up servers, assign them IPs, attach storage to them, mount a load balancer on top [16:54] all by running a script .. not by running around connecting cables :) [16:54] Yuvi_ asked: What is hybrid cloud? Under which scenario can we use that? [16:54] A hybrid cloud is a mix of public + private [16:55] a typical use case would be [16:55] you prefer running everything on a private cloud that you own and operate [16:55] *however* should the incoming load be too high [16:55] like your application was slashdotted [16:56] you would dynamically "expand" to using a public cloud like amazon/rackspace [16:56] to take some heat for you .. to lessen the load on your servers [16:56] There are 5 minutes remaining in the current session. [16:56] You can pull off something like that today with UEC and some smart scripts [16:56] chadadavis asked: what advantage does a private cloud provide, vs a traditional server cluster, assuming that the sysadmin work is not outsourced? [16:56] running out of time .. [16:57] trying to quickly answer [16:57] well basically it's the same concept as the public cloud [16:57] Benefits would be [16:57] - Complete infrastructure automation [16:57] - Enabling "teams" to handle their own needs .. a team would spin up/down servers according to their needs [16:57] lessening the load on IT staff [16:58] also .. "pooling" of IT servers into one private cloud [16:58] means providing a better service to everyone [16:58] since everyone can use some of the resources when they need it [16:58] so in short .. pooling, self service, low overhead, spin up/down [16:59] Great [16:59] Seems like I did manage to bust all questions :) [16:59] If anyone would like to get a hold of me afterwards [16:59] I am always hanging out in #ubuntu-cloud [17:00] you can ping me any time and I will get back to you as soon as I can [17:00] The next session is by semiosis [17:00] o/ [17:00] Using gluster to scale .. very interesting stuff! [17:00] I love scalable file systems :) [17:00] Thanks kim0 [17:00] Hello everyone [17:01] This Ubuntu Cloud Days session is about scaling legacy web applications with shared-storage requirements in the cloud. [17:01] I should mention up front that I'm neither an official representative nor an expert; I don't work for Amazon/AWS, Canonical, Gluster, Puppet Labs, or any other software company. [17:01] I'm just a linux sysadmin who appreciates their work and wanted to give back to the community.
=== ChanServ changed the topic of #ubuntu-classroom to: Welcome to the Ubuntu Classroom - https://wiki.ubuntu.com/Classroom || Support in #ubuntu || Upcoming Schedule: http://is.gd/8rtIi || Questions in #ubuntu-classroom-chat || Event: Ubuntu Cloud Days - Current Session: Scaling shared-storage web apps in the cloud with Ubuntu & GlusterFS - Instructors: semiosis [17:01] Logs for this session will be available at http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html following the conclusion of the session. [17:01] My interest is in rapidly developing a custom application hosting platform in the cloud. I'd like to avoid issues of application design by assuming that one is already running and can't be overhauled to take advantage of web storage services. [17:02] I'll follow the example of migrating a web site powered by several web servers and a common NFS server from a dedicated hosting environment to the cloud. In fact this is something I've been working on lately, as I think others are as well. [17:02] I invite you to ask questions throughout the session. I had a lot of questions when I began working on this problem, but finding answers was very time-consuming and sometimes impossible. [17:02] My background is in Linux system administration in dedicated servers & network appliances, and I just started using EC2 six months ago. I'll try to keep my introduction at a high level, and assume some familiarity with standard Linux command line tools and basic shell scripting & networking concepts, and the AWS Console. [17:02] Some of the advanced operations will also require euca2ools or AWS command line tools (or the API) because they're not available in the AWS Console. [17:03] Cloud infrastructure and configuration automation are powerful tools, and recent developments have brought them within reach of a much wider audience. It is easier than ever for Linux admins who are not software developers to get started running applications in the cloud. [17:03] I've standardized my platform on Ubuntu 10.10 in Amazon EC2, using GlusterFS to replace a dedicated NFS server, and CloudInit & Puppet to automate system provisioning and maintenance. [17:04] GlusterFS has been around for a few years, and its major recent development (released in 3.1) is the Elastic Volume Manager, a command-line management console for the storage cluster. This utility controls the entire storage cluster, taking care of server setup and volume configuration management on servers & clients. [17:04] Before the EVM a sysadmin would need to tightly manage the inner details of configuration files on all nodes, now that burden has been lifted enabling management of large clusters without requiring complex configuration management tools. Another noteworthy recent development in GlusterFS is the ability to add storage capacity and performance (independently if necessary) while the cluster is online and in use. [17:04] I'll spend the rest of the session talking about providing reliable shared-storage service on EC2 with GlusterFS, and identifying key issues that I've encountered so far. I'd also be happy to take questions generally about using Ubuntu, CloudInit, and Puppet in EC2. Let's begin. [17:05] There are two types of storage in EC2, ephemeral (instance-store) and EBS. There are many benefits to EBS: durability, portability (within an AZ), easy snapshot & restore, and 1TB volumes; the drawback of EBS is occasionally high latency. 
[17:05] Ephemeral storage doesn't have those features, but it does provide more consistent latency, so it's better suited to certain workloads. [17:05] I use EBS for archival and instance-store for temporary file storage. And I can't recommend enough the importance of high-level application performance testing to determine which is best suited for your application. [17:05] GlusterFS is an open source scale-out filesystem. It's developed primarily by Gluster and has a large and diverse user community. I use GlusterFS on Ubuntu in EC2 to power a web service. [17:06] What I want to talk about today is my experience setting up and maintaining GlusterFS in this context. [17:06] First I'll introduce glusterfs architecture and terminology. Second we'll go through some typical cloud deployments, using instance-store and EBS for backend storage, and considering performance and reliability characteristics along the way. [17:06] I'll end the discussion then with some details about performance and reliability testing and take your questions. [17:07] I think some platform details are in order before we begin. [17:07] I use the Ubuntu 10.10 EC2 AMIs for both 32-bit and 64-bit EC2 instances that were released in January 2011. You can find these AMIs at the Ubuntu Cloud Portal AMI locator, http://cloud.ubuntu.com/ami/. [17:07] I configure my instances by providing user-data that cloud-init uses to bootstrap puppet, which handles the rest of the installation. Puppet configures my whole software stack on every system except for the glusterfs server daemon, which I manage with the Elastic Volume Manager (gluster command.) [17:07] I've deployed and tested several iterations of my platform using this two-stage process and would be happy to take questions on any of these technologies. [17:07] Unfortunately the latest version of glusterfs, 3.1.3, is not available in the Ubuntu repositories. There is a 3.0 series package but I would recommend against using it. [17:08] I use a custom package from my PPA which is derived from the Debian Sid source package, with some metadata changes that enable the new features in 3.1, my Launchpad PPA's location is ppa:semiosis/ppa. [17:08] Gluster also provides a binary deb package for Ubuntu, which has been more rigorously tested than mine. You can find the official downloads here: http://download.gluster.com/pub/gluster/glusterfs/LATEST/ [17:08] You can also download and compile the latest source code yourself from Github here: https://github.com/gluster/glusterfs [17:08] Now I'd like to begin with a quick introduction to GlusterFS 3.1 architecture and terminology. [17:09] EvilPhoenix asked: repost for marktma: any consideration for using Chef instead of Puppet? [17:09] i chose puppet because it seemed to be best integrated with cloud-init, it's mature, and has a large user community [17:10] kim0 asked: Could you please mention a little intro about cloud-init [17:11] CloudInit bootstraps and can also configure cloud instances. This enables a sysadmin to use the standard AMI for different purposes, without having to build a custom AMI or rebundle to make changes. [17:11] CloudInit takes care of setting the system hostname, installing the master SSH key and evaluating the userdata from EC2 metadata. That last part, evaluating the userdata, is the most interesting. 
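A minimal hedged sketch of what that user-data can look like when it bootstraps a Puppet agent as described here (the AMI id, key name and puppet master hostname are placeholders, and the euca2ools user-data-file flag is an assumption rather than something shown in the session):

    cat > user-data.txt <<'EOF'
    #cloud-config
    puppet:
      conf:
        agent:
          server: "puppet.example.com"   # hypothetical puppet master; cloud-init starts the agent pointed here
    EOF
    euca-run-instances ami-xxxxxxxx -t m1.large -k mykey -f user-data.txt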
[17:11] It allows the sysadmin to supply a brief configuration file (called cloud-config), shell script, upstart job, python code, or a set of files or URLs containing those, which will be evaluated on first boot to customize the system. [17:12] CloudInit even has built-in support for bootstrapping Puppet agents, which as I mentioned was a major deciding factor for me [17:13] Now getting back to glusterfs terminology and architecture... [17:13] Of course there are servers and there are clients. With version 3.1 there came the option to use NFS clients to connect to glusterfs servers in addition to the native glusterfs client based on FUSE. [17:13] Most of this discussion will be about using native glusterfs clients, but we'll revisit NFS clients briefly at the end if there's time. I haven't used the NFS capability myself because I think that the FUSE client's "client-side" replication is better suited to my application [17:13] Servers are set up in glusterfs 3.1 using the Elastic Volume Manager, or gluster command. It offers an interactive shell as well as a single-executable command line interface. [17:14] In glusterfs, servers are called peers, and peers are joined into (trusted storage) pools. Peers have bricks, which are just directories local to the server. Ideally each brick is its own dedicated filesystem, usually mounted under /bricks. [17:14] natea asked: Given the occasional high latency of EBS, do you recommend it for storing database files, for instance PostgreSQL? [17:15] my focus is hosting files for web, not database backend storage. people do use glusterfs for both, but I haven't evaluated it in the context of database-type workloads, YMMV. [17:15] as for performance, I'll try to get to that in the examples coming up [17:16] natea asked: Can you briefly explain the differences between GlusterFS and NFS and why I would choose one over the other? [17:17] simply put, NFS is limited to single-server capacity, performance and reliability, while glusterfs is a scale-out filesystem able to exceed the performance and/or capacity of a single server (independently) and also provides server-level redundancy [17:18] there are some advanced features NFS has that glusterfs does not yet support (UID mapping, quotas, etc.) so please consider that when evaluating your options [17:18] Glusterfs uses a modular architecture, in which “translators” are stacked in the server to export bricks over the network, and in clients to connect the mount point to bricks over the network. These translators are automatically stacked and configured by the Elastic Volume Manager when creating volumes (under /etc/glusterd/vols). [17:18] A client translator stack is also created and distributed to the peers, which clients retrieve at mount-time. These translator stacks, called Volume Files (volfiles), are replicated between all peers in the pool. [17:19] A client can retrieve any volume file from any peer, which it then uses to connect directly to that volume's bricks. Every peer can manage its own and every other peer's volumes; it doesn't even need to export any bricks. [17:19] There are two translators of primary importance: Distribute and Replicate. These are used to create distributed, replicated, or distributed-replicated volumes. [17:19] In the glusterfs 3.1 native architecture, servers export bricks to clients, and clients handle all file replication and distribution across the bricks.
[17:19] All volumes can be considered distributed, even those with only one brick, because the distribution factor can be increased at any time without interrupting access (through the add-brick command). [17:19] The replication factor, however, cannot be changed (data needs to be copied into a new volume). [17:19] In general, glusterfs volumes can be visualized as a table of bricks, with replication between columns, and distribution over rows. [17:20] So a volume with replication factor N would have N columns, and bricks must be added in sets (rows) of N at a time. [17:20] For example, when a file is written, the client first figures out which replication set the file should be distributed to (using the Elastic Hash Algorithm) then writes the file to all bricks in that set. [17:20] Some final introductory notes... First, as a rule nothing should ever touch the bricks directly, all access should go through the client mount point. [17:20] Second, all bricks should be the same size, which is easy when using dedicated instance-store or EBS bricks. [17:20] Third, files are stored whole on a brick, so not only can't volumes store files larger than a brick, but bricks should be orders of magnitude larger than files in order to get good distribution. [17:21] Now I'd like to talk for a minute about compiling glusterfs from source on Ubuntu. This is necessary if one wants to use glusterfs on a 32-bit system, since Gluster only provides official packages for 64-bit. [17:21] (as a side note, the packages in my PPA are built for 32-bit, but they are largely untested; I only began testing the 32-bit builds myself yesterday, and although it's going well so far, YMMV) [17:22] Compiling glusterfs is made very easy by the use of standard tools. [17:22] First, some required packages need to be installed; these are: gnulib, flex, byacc, gawk, libattr1-dev, libreadline-dev, libfuse-dev, and libibverbs-dev. [17:22] After installing these packages you can untar the source tarball and run the usual “./configure; make; make install” sequence to build & install the program. [17:22] By default, this will install most of the files under /usr/local, with the notable exceptions of the initscript placed in /etc/init.d/glusterd, the client mount script placed in /sbin/mount.glusterfs, and the glusterd configuration file /etc/glusterfs/glusterd.vol. [17:23] (that's a static config file which you'll never need to edit, btw) [17:23] If you wish to install to another location (using for example ./configure --prefix=/opt/glusterfs) make sure those three files are in their required locations. [17:23] Once installed, either from source or from a binary package, the server can be started with “service glusterd start”. This starts the glusterd management daemon, which is controlled by the gluster command. [17:23] The glusterd management daemon takes care of associating servers, generating volume configurations (for servers & clients), and managing the brick export daemon (glusterfsd) processes. Clients that only want to mount glusterfs volumes do not need the glusterd service running. [17:24] Another packaging note... the official deb package from Gluster is a single binary package that installs the full client & server, but the packages in my PPA are derived from the Debian Sid packages, which provide separate binary packages for server, client, libs, devel, etc., allowing for a client-only installation
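Collected into one place, that build-and-install sequence looks roughly like this (a hedged sketch; the tarball name matches the 3.1.3 release mentioned earlier, and build-essential is an assumed extra on a fresh system):

    sudo apt-get install build-essential gnulib flex byacc gawk libattr1-dev libreadline-dev libfuse-dev libibverbs-dev
    tar xzf glusterfs-3.1.3.tar.gz && cd glusterfs-3.1.3
    ./configure                  # add --prefix=/opt/glusterfs if desired, keeping the three files above in their required locations
    make && sudo make install
    sudo service glusterd start  # only needed on servers; clients just need the mount script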
[17:25] Now, getting back to glusterfs architecture, and setting up a trusted storage pool... [17:25] Setting up a trusted storage pool is also very straightforward. I recommend using hostnames or FQDNs, rather than IP addresses, to identify the servers. [17:26] FQDNs are probably the best choice, since they can be updated in one place (the zone authority) and DNS takes care of distributing the update to all servers & clients in the cluster, whereas with hostnames, /etc/hosts would need to be updated on all machines [17:26] Servers are added to pools using the 'gluster peer probe <hostname>' command. A server can only be a member of one pool, so attempting to probe a server that is already in a pool will result in an error. [17:26] To add a server to a pool the probe must be sent from an existing server to the new server, not the other way around. When initially creating a trusted storage pool, it's easiest to use one server to send out probes to all of the others. [17:26] remib asked: Would you recommend using separate glusterfs servers, or using the webservers as both glusterfs server and client? [17:28] excellent question! there are benefits to both approaches. Without going into too much detail, read-only can be done locally but there are some reasons to do writes from separate clients if those clients are going to be writing to the same file (or locking on the same file) [17:30] there's a slight chance for coherency problems if the client-servers lose connectivity to each other, and writes go to the same file on both... that file will probably not be automatically repaired, but that's an edge case that may never happen in your application. testing is very important [17:30] that's called a split-brain in glusterfs terminology [17:31] writes can go to different files under that partition condition just fine, it's only an issue if the two server-clients update the same file and they're not synchronized [17:31] and I don't even know if network partitions are likely in EC2, it's just a theoretical concern for me at this point, so go forth and experiment! [17:32] When initially creating a trusted storage pool, it's easiest to use one server to send out probes to all of the others. [17:32] As each additional server joins the pool its hostname (and other information) is propagated to all of the previously existing servers. [17:32] One cautionary note: when sending out the initial probes, the recipients of the probes will only know the sender by its IP address. [17:32] To correct this, send a probe from just one of the additional servers back to the initial server; this will not change the structure of the pool but it will propagate an IP-address-to-hostname update to all of the peers. [17:32] From that point on any new peers added to the pool will get the full hostname of every existing peer, including the peer sending the probe. [17:33] kim0 asked: What's your overall impression of glusterfs robustness and ability to recover from split-brains or node failures [17:34] it depends heavily on your application's workload, for my application it's great, but Your Mileage May Vary. this is the biggest concern with database-type workloads, where you would have multiple DB servers wanting to lock on a single file [17:34] but for regular file storage i've found it to be great [17:34] and of course it depends also a great deal on the cloud-provider's network, not just glusterfs... [17:35] resolving a split-brain issue is relatively painless... just determine which replica has the "correct" version of the file, and delete the "bad" version from the other replica(s), and glusterfs will replace the deleted bad copies with the good copy and all future access will be synchronized, so it's usually not a big deal
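Putting the probe steps above together, a hedged sketch for a pool of three servers (hostnames follow the session's naming style):

    # on fileserver1, probe the other members into the pool:
    gluster peer probe fileserver2
    gluster peer probe fileserver3
    # on any one of the new members, probe back once so the pool learns fileserver1 by hostname instead of IP:
    gluster peer probe fileserver1
    gluster peer status   # run on any peer to confirm the others are connected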
[17:36] natea asked: Is the performance of GlusterFS storage comparable to local storage? What are the downsides? [17:37] that sounds like a low-level component performance question, and I recommend concentrating on high-level aggregate application throughput. [17:37] i'll get to that shortly when talking about the different types of volumes [17:37] Once peers have been added to the pool, volumes can be created. But before creating the volumes it's important to have set up the backend filesystems that will be used for bricks. [17:38] In EC2 (and compatible) cloud environments this is done by attaching a block device to the instance, then formatting and mounting the block device filesystem. [17:38] Block devices can be added at instance creation time using the EC2 command ec2-run-instances with the -b option. [17:38] EBS volumes are specified for example with -b /dev/sdd=:20 where /dev/sdd is the device name to use, and :20 is the size (in GB) of the volume to create. [17:38] Gluster recommends using ext4 filesystems for bricks since ext4 has good performance and is well tested. [17:38] As I mentioned earlier, the two translators of primary importance are Distribute and Replicate. All volumes are Distributed, and optionally also Replicated. [17:39] Since volumes can have many bricks, and servers can have bricks in different volumes, a common convention is to mount brick filesystems at /bricks/volumeN. I'll follow that convention in the few common volume configurations that follow. [17:39] The first and most basic volume type is a distributed volume on one server. This is essentially unifying the brick filesystems to make a larger filesystem. [17:39] Remember though that files are stored whole on bricks, so no file can exceed the size of a brick. Also please remember that it is a best practice to use bricks of equal size. So, let's consider creating a volume of 3TB called “bigstorage”. [17:39] We could just as easily use 3 EBS bricks of 1TB each, 6 EBS bricks of 500GB each, or 10 EBS bricks of 300GB each. Which layout to use depends on the specifics of your application, but in general spreading files out over more bricks will achieve better aggregate throughput. [17:40] so even though the performance of a single brick is not as good as a local filesystem, spreading over several bricks can achieve comparable aggregate throughput [17:40] Assuming the server's hostname is 'fileserver', the volume creation command for this would be simply “gluster volume create bigstorage fileserver:/bricks/bigstorage1 fileserver:/bricks/bigstorage2 … fileserver:/bricks/bigstorageN”. [17:40] This trivial volume which just unifies bricks on a single server has limited performance scalability. In EC2 the network interface is usually the limiting factor, and although in theory a larger instance will have a chance at a larger slice of the network interface bandwidth, in practice I have found that the theoretical gain usually exceeds the bandwidth actually available on the network. [17:40] And by this I mean what I've found is that larger instances do not get much more bandwidth to EBS or other instances (going beyond a Large instance anyway; i'm sure smaller instances could do worse but I haven't really evaluated them.)
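To ground the brick-preparation and volume-creation steps just described, a hedged sketch of the 3x1TB “bigstorage” layout on a single server (device names are illustrative, and the EBS volumes are assumed to be already attached, e.g. via ec2-run-instances -b as above):

    # format and mount one ext4 brick filesystem per EBS device, following the /bricks/volumeN convention
    sudo mkfs.ext4 /dev/sdd && sudo mkfs.ext4 /dev/sde && sudo mkfs.ext4 /dev/sdf
    sudo mkdir -p /bricks/bigstorage1 /bricks/bigstorage2 /bricks/bigstorage3
    sudo mount /dev/sdd /bricks/bigstorage1
    sudo mount /dev/sde /bricks/bigstorage2
    sudo mount /dev/sdf /bricks/bigstorage3
    # unify the three bricks into one distributed volume and start it
    sudo gluster volume create bigstorage fileserver:/bricks/bigstorage1 fileserver:/bricks/bigstorage2 fileserver:/bricks/bigstorage3
    sudo gluster volume start bigstorage
    # a client then mounts the whole 3TB namespace through the native FUSE client:
    sudo mount -t glusterfs fileserver:/bigstorage /mnt/bigstorage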
[17:41] Glusterfs is known as a scale-out filesystem, and this means that performance and capacity can be scaled by adding more nodes to the cluster, rather than increasing the size of individual nodes. [17:41] neti asked: Is GlusterFS using local caching in memory? [17:42] yes it does do read-caching and write-behind caching, but I leave their configuration at the default, please check out the docs at gluster.org for details, specifically http://www.gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options [17:43] Glusterfs is known as a scale-out filesystem, and this means that performance and capacity can be scaled by adding more nodes to the cluster, rather than increasing the size of individual nodes. [17:43] So the next example volume after 'bigstorage' should be 'faststorage'. With this volume we'll combine EBS bricks in the same way but using two servers. [17:43] First of course a trusted storage pool must be created by probing from one server (fileserver1) to the other (fileserver2) by running the command 'gluster peer probe fileserver2' on fileserver1, then updating the IP address of fileserver1 to its hostname by running 'gluster peer probe fileserver1' on fileserver2. [17:43] After that, the volume creation command can be run: 'gluster volume create faststorage fileserver1:/bricks/faststorage1 fileserver2:/bricks/faststorage2 fileserver1:/bricks/faststorage3 fileserver2:/bricks/faststorage4 ...' where fileserver1 gets the odd-numbered bricks and fileserver2 gets the even-numbered bricks. [17:43] In this example there can be an arbitrary number of bricks. Because files are distributed evenly across bricks, this has the advantage of combining the network performance of the two servers. [17:44] (interleaving the brick names is just my convention, it's not required and you're free to use any convention you'd like) [17:44] kim0 asked: Since you have redundancy through replication, why not use instance-store instead of EBS [17:46] ah I was just about to get into replication, great timing. in short, you can, and I do! instance-store has consistent latency going for it, but EBS volumes can be larger, can be snapshotted & restored, and can be moved between instances (within an availability zone) so that makes managing your data much easier [17:46] Now I'd like to shift gears and talk about reliability. [17:46] In glusterfs clients connect directly to bricks, so if one brick goes away its files become inaccessible, but the rest of the bricks should still be available. Similarly if one whole server goes down, only the files on the bricks it exports will be unavailable. [17:46] This is in contrast to RAID striping where if one device goes down, the whole array becomes unavailable. This brings us to the next type of volume, distributed-replicated. In a distributed-replicated volume, as I mentioned earlier, files are distributed over replica sets. [17:46] Since EBS volumes are already replicated in the EC2 infrastructure it should not be necessary to replicate bricks on the same server. [17:47] In EC2 replication is best suited to guard against instance failure, so it's best to replicate bricks between servers. [17:47] The most straightforward replicated volume would be one with two bricks on two servers.
[17:47] By convention these bricks should be named the same, so for a volume called safestorage the volume create command would look like this: “gluster volume create safestorage replica 2 fileserver1:/bricks/safestorage1 fileserver2:/bricks/safestorage1 fileserver1:/bricks/safestorage2 fileserver2:/bricks/safestorage2 ...” [17:47] Bricks must be added in sets of size equal to the replica count, so for replica 2, bricks must be added in pairs. [17:47] Scaling performance on a distributed-replicated volume is similarly straightforward; as with adding bricks, servers should also be added in sets of size equal to the replica count. [17:47] So, to add performance capacity to a replica 2 volume, two more servers should be added to the pool, and the volume creation command would look like this: “gluster volume create safestorage replica 2 fileserver1:/bricks/safestorage1 fileserver2:/bricks/safestorage1 fileserver3:/bricks/safestorage2 fileserver4:/bricks/safestorage2 fileserver1:/bricks/safestorage3 fileserver2:/bricks/safestorage3 fileserver3:/bricks/safestorage4 fileserver4:/bricks/safestorage4 ...” [17:48] Up to this point all of the examples involve creating a volume, but volumes can also be expanded while online. This is done with the add-brick command, which takes parameters just like the volume create command. [17:48] Bricks still need to be added in sets of size equal to the replica count though. [17:49] also note, the "add-brick" operation requires a "rebalance" to spread existing files out over the new bricks; this is a very costly operation in terms of CPU & network bandwidth so you should try to avoid it. [17:50] A similar but less costly operation is "replace-brick" which can be used to move an existing brick to a new server, for example to add performance with the addition of new servers without adding capacity [17:51] There are 10 minutes remaining in the current session. [17:51] another scaling option is to use EBS bricks smaller than 1TB, and restore from snapshots to 1TB bricks. this is an advanced technique requiring the ec2 commands ec2-create-volume & ec2-attach-volume [17:52] Well looks like my time is running out, so I'll try to wrap things up. please ask any questions you've been holding back! [17:53] Getting started with glusterfs is very easy, and with a bit of experimentation & performance testing you can have a large, high-throughput file storage service running in the cloud. Best of all in my opinion is the ability to snapshot EBS bricks with the ec2-create-image API call/command, which is also available in the AWS console [17:53] kim0 asked: Did you evaluate ceph as well [17:54] I am keeping an eye on ceph, but it seemed to me that glusterfs is already well tested & used widely in production, even if not yet used widely in the cloud... it sure will be soon [17:54] neti asked: Does GlusterFS support file locking? [17:55] yes glusterfs supports full POSIX semantics including file locking [17:56] one last note about snapshotting EBS bricks... since bricks are regular ext4 filesystems, they can be restored from snapshot & read just like any other EBS volume, no hassling with mdadm or lvm to reassemble volumes like with RAID [17:56] remib asked: Does GlusterFS support quotas? [17:56] There are 5 minutes remaining in the current session. [17:57] no quota support in 3.1 [17:58] Thank you all so much for the great questions. I hope you have fun experimenting with glusterfs, I think it's a very exciting technology.
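For reference, the online-expansion operations covered in this session look roughly like this on the two-server “safestorage” example (a hedged sketch; hostnames and brick paths follow the session's conventions):

    # grow capacity online: bricks are added in sets equal to the replica count (pairs for replica 2)
    gluster volume add-brick safestorage fileserver1:/bricks/safestorage3 fileserver2:/bricks/safestorage3
    gluster volume rebalance safestorage start      # costly: spreads existing files onto the new bricks
    # or move an existing brick to a freshly added server, gaining network capacity without growing the volume
    gluster volume replace-brick safestorage fileserver2:/bricks/safestorage2 fileserver3:/bricks/safestorage2 start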
One final note for those of you who may be interested in commercial support... [17:59] Gluster Inc. has recently released paid AMIs for Amazon EC2 and Vmware that are fully supported by the company. I've not used these, but they are there for your consideration. [18:00] The glusterfs community is large and active. I usually hang out in #gluster which is where I've learned the most about glusterfs. There's a lot of friendly and knowledgeable people there, as well as on the mailing list, who enjoy helping out beginners [18:00] thanks again! === ChanServ changed the topic of #ubuntu-classroom to: Welcome to the Ubuntu Classroom - https://wiki.ubuntu.com/Classroom || Support in #ubuntu || Upcoming Schedule: http://is.gd/8rtIi || Questions in #ubuntu-classroom-chat || Event: Ubuntu Cloud Days - Current Session: What is Ensemble? - Presentation and Demo - Instructors: SpamapS [18:01] Logs for this session will be available at http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html following the conclusion of the session. [18:02] So, I have prepared a short set of slides to try and explain what Ensemble is here: http://spamaps.org/files/Ensemble%20Presentation.pdf [18:02] I will elaborate here in channel. [18:03] Ensemble is an implementation of Service Management [18:03] up until now this has also been called "Orchestration", and the term is not all that inaccurate, though I feel that Service Management is more appropriate [18:03] "What is Service Management?" [18:04] Service Management is focused on the things that servers do that end users consume [18:04] Users connect to websites, dns servers, or (at a lower level) databases, cache services, etc [18:04] Ensemble models how services relate to one another. [18:05] Web applications need to connect to a number of remote resources. Load balancers need to connect to web application servers.. monitoring services need to connect to services and test that they're working. [18:05] Ensemble models all of these in what we call "formulas" (more on this later) [18:06] If this starts to sound like Configuration Management, you won't be the first to make that mistake. [18:06] However, this sits at a higher level than configuration management. [18:06] "Contrast With Configuration Management" [18:07] Configuration management grew from the time when we had a few servers that were expensive to buy/lease/provision, and lived a long time. [18:07] Because of this, system administrators modeled system configuration very deeply. Puppet, chef, etc., first and foremost, model how to configure *a server* [18:08] As the networks grew and became more dependent on one another, the config management systems have grown the ability to share data about servers. [18:08] However the model is still focused on "how do I get my server configured" [18:09] Ensemble seeks to configure the service. [18:09] With the cloud, we have the ability to rapidly provision and de-provision servers. So service management is tightly coupled with provisioning. [18:10] Chef, in particular, from the config management world, has done a good job of adding this in with their management tools. [18:10] However, where we start to see a lot of duplication of work in configmanagement, is in sharing of the knowledge of service configuration. [18:11] Puppet and Chef both have the ability to share their "recipes" or "cookbooks" [18:11] However, most of these are filled with comments and variables "change this for your site" [18:11] The knowledge of how and when and why is hard to encode in these systems. 
[18:12] Ensemble doesn't compete directly with them on this level. Ensemble can actually utilize configuration management to do service management. [18:12] The comparison is similar to what we all used to do 15+ years ago with open source software [18:13] download tarball, extract, cd, ./configure --with-dep1=/blah && make && sudo make install [18:13] This would be an iterative process where we would figure out how to make the software work for our particular server every time. [18:13] Then distributions came along and created packaging, and repositories, to relieve us from the burden of doing this for *most* low level dependencies. [18:14] So ensemble seeks to give us, in the cloud, what we have on the single server.. something like 'apt-get install' [18:14] "Terms" [18:15] "service unit" - for the most part this means "a server", but it really just means one member of a service deployment. If you have 3 identical web app servers, these are 3 service units, in one web app service deployment. [18:16] "formula" - this is the sharable ".deb" for the cloud. It encodes the relationships and runtime environment required to configure a service [18:17] "environment" - in ensemble, this defines the machine provider and settings for deploying services together. Right now this means your ec2 credentials and what instance type. But it could mean a whole host of things. [18:17] "bootstrap" - ensemble's first job in any deployment is to "bootstrap" itself. You run the CLI tool to bootstrap it [18:18] that means it starts a machine that runs the top level ensemble agent that you will communicate with going forward [18:18] "Basic Workflow" [18:19] This is how we see people using ensemble, though we have to imagine the details of this will change as ensemble grows, since it hasn't even been "released" yet. [18:19] (though, as a side note, it is working perfectly well, and available for lucid at https://launchpad.net/~ensemble/+archive/ppa) [18:20] 0. (left this off the slide) - configure your environment. This means establish AWS credentials, and record them in .ensemble/environment.yaml [18:20] 1. Bootstrap (ensemble bootstrap) - this connects to your machine provider (EC2 right now) and spawns an instance, and seeds it using cloud-init to install ensemble and its dependencies [18:21] 2. Deploy Services (ensemble deploy mysql wiki-db; ensemble deploy mediawiki demo-wiki) [18:21] This actually spawns nodes with the machine provider, and runs the ensemble agent on them, telling them what service they're a part of and running the service "install" hooks to get them ready to participate in the service [18:22] 3. Relate Services (ensemble add-relation demo-wiki:db wiki-db:db) [18:23] This part won't always be necessary. Automatic relationship resolution is being worked on right now. But sometimes you will want to be explicit, or do a relation that is optional. [18:23] In the example above, this tells demo-wiki and wiki-db about each other.
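Collected in one place, that basic workflow reads like a short script (the commands are the instructor's own from this session and the demo that follows; only the ordering comments are added):

    # 0. record AWS credentials in .ensemble/environment.yaml, then:
    ensemble bootstrap                                  # 1. start the machine running the top-level ensemble agent
    ensemble deploy mysql wiki-db                       # 2. deploy services ...
    ensemble deploy mediawiki demo-wiki
    ensemble add-relation demo-wiki:db wiki-db:db       # 3. relate them
    ensemble status                                     # watch the machines and services come up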
I will pastebin a formula example to clear this up. [18:24] http://paste.ubuntu.com/584424/ [18:24] This is the metadata portion of the mediawiki formula, which I created recently as part of the "Principia" project, which is a collection of formulas for ensemble: https://launchpad.net/principia [18:25] If you look there, you see that it 'requires:' a relationship called 'db' [18:25] the interface for that relationship is "mysql" [18:25] These interface names are used to ensure you don't relate two things which have different interfaces [18:26] (almost done, will take questions shortly) [18:26] http://paste.ubuntu.com/584425/ [18:26] This is the corresponding metadata for mysql.. [18:27] as you see, it provides a relationship called 'db' as well, which uses the interface 'mysql' [18:27] What this means is that the 'requires' side of the formula can expect certain data to be passed to it when it joins this relationship [18:28] and likewise, the provides side knows that its consumers will need certain bits [18:28] When this relationship is added, "hooks" are fired [18:29] These are just scripts that are run at certain events in the relationship lifecycle [18:29] These scripts use helper programs from ensemble to read and write data over the two-way communication channel. [18:30] In the case of mysql, whenever a service unit joins a relationship, it creates a database for the service if it doesn't exist, and then creates a username/password/etc. and sends that to the consumer [18:31] and the mediawiki hook for the relationship will configure mediawiki to use that database [18:31] The code for all of this is in lp:principia if you are curious. [18:32] the final slide is just an overview of ensemble's architecture under the hood. [18:32] I will take questions now... [18:34] marktma: GREAT question. Definitely. One of the goals is to make it easy to write new "machine providers". By doing EC2 first though, we should have a reasonable chance at working with UEC/Eucalyptus and maybe even OpenStack out of the box. [18:35] marktma asked: is there any chance ensemble will be used for private clouds as well? [18:35] hah, ok, see answer ^^ [18:35] kim0 asked: What does the interface: mysql .. actually mean [18:35] I think I may have answered that already in the ensuing description.. [18:36] but essentially it's a loose contract between providers/requirers/peers on what will be passed through the communication channel [18:37] EvilPhoenix asked: (for kim0): that contract .. is it defined somewhere [18:38] It is only defined via the formulas. It is intentionally kept as a loose coupling to make formulas flexible. I could see it being strengthened a bit in the future. [18:39] Now, I wanted to stream my desktop to demo ensemble in action.. [18:39] but that has proven difficult given the 20 minutes I had to attempt to set that up. [18:39] So I will pastebin the terminal output of an ensemble run... [18:40] I have set up a lucid instance for this, and the only commands not seen here are: sudo add-apt-repository ppa:ensemble/ppa ; apt-get update ; apt-get install ensemble ; bzr branch lp:principia ; cat > aws.sh [18:40] the last bit is to store my aws credentials [18:41] http://paste.ubuntu.com/584430/ [18:41] this is the bootstrap phase [18:42] I now need to wait for EC2 to start an instance [18:42] ubuntu@ip-10-203-81-87:~$ ensemble status [18:42] 2011-03-23 18:42:02,263 INFO Connecting to environment. [18:42] 2011-03-23 18:42:18,586 INFO Environment still initializing. Will wait.
[18:43] And now it has spawned my bootstrap [18:43] there will be live DNS names here, so hopefully my security groups will keep your prying eyes out.. [18:43] machines: 0: {dns-name: ec2-50-17-142-155.compute-1.amazonaws.com, instance-id: i-10f63f7f} [18:43] services: {} [18:43] 2011-03-23 18:42:50,216 INFO 'status' command finished successfully [18:43] TeTeT asked: what would a system administrators task with ensemble be - write formulas or just deploy them or a mix? [18:44] I'd imagine sysadmins would write the formulas for an organization's own services which consume existing services. [18:44] The common scenario is a LAMP application which takes advantage of memcached, mysql, and has a load balancer [18:45] The lamp app needs to have its config files written with the db, cache servers, etc., so the sysadmin would write the relation hooks for mysql and memcached. OR a developer could write these. The devops paradigm kind of suggests that they work together on this. [18:46] Ok now I'll run my "demo.sh" script which builds a full mediawiki stack [18:47] While this is going, I will stress that this is *unreleased* alpha software, though the dev team has been very dilligent and the code is of a very high quality (written in python with twisted, and available at lp:ensemble [18:47] http://paste.ubuntu.com/584433/ [18:47] Now we'll need to wait a few minutes while all of those nodes spawn [18:49] Now, I'm using t1.micro, so these provision *fast* .. we can watch their hooks run w/ debug-log... [18:49] However they may already be done.. [18:50] Ideally, we'll have a wiki accessible at the address of 'wiki-balancer' .. lets see [18:51] There are 10 minutes remaining in the current session. [18:51] TeTeT asked: is the deployment through ensemble itself or via cloud-init or puppet or other config tools? [18:52] http://paste.ubuntu.com/584437/ [18:52] While you guys try to decipher that I'll answer TeTeT [18:52] TeTeT: the nodes are configured via cloud-init to run ensemble's agent. After that, ensemble is in control running hooks. The formulas are pushed into S3, and then downloaded by the agent once it starts. [18:53] So unfortunately, our load balancer has failed.. it is "machine 4" http://ec2-50-17-47-115.compute-1.amazonaws.com/ ... but the individual mediawiki nodes *are* working.. [18:53] http://ec2-204-236-202-35.compute-1.amazonaws.com/mediawiki/index.php/Main_Page [18:55] Ahh, there was a bug in my demo.sh :) [18:55] $ENSEMBLE add-relation wiki-balancer demo-wiki:reverseproxy [18:55] mediawiki has no relation named reverseproxy [18:55] 2011-03-23 18:46:37,900 INFO Connecting to environment. [18:55] No matching endpoints [18:55] 2011-03-23 18:46:38,473 ERROR No matching endpoints [18:55] 2011-03-23 18:46:38,865 INFO Connecting to environment. [18:56] We actually had that error but missed it. ;) [18:56] lets relate the load balancer now [18:56] There are 5 minutes remaining in the current session. [18:56] ubuntu@ip-10-203-81-87:~$ ensemble add-relation wiki-balancer:reverseproxy demo-wiki:website [18:56] 2011-03-23 18:56:28,059 INFO Connecting to environment. [18:56] 2011-03-23 18:56:28,691 INFO Added http relation to all service units. [18:56] kim0 asked: Can't a cache and a wiki service-units share the same ec2 instance [18:56] 2011-03-23 18:56:28,691 INFO 'add_relation' command finished successfully [18:57] kim0: the idea is that in that instance, its simpler to use something like LXC containers to make it easier to write formulas. 
However, in the case of purely non-conflicting formulas, there should be a way in the future to do that, yes [18:57] http://ec2-50-17-47-115.compute-1.amazonaws.com/mediawiki/index.php/Main_Page [18:57] And there you have a working mediawiki [18:58] TeTeT asked: will ensemble also provide service monitoring, or is that better left to munin/nagios and the like [18:59] TeTeT: The latter. nagios/munin/etc are just services in themselves. And they speak the same protocols as consuming services. If a formula wants to explicitly expose *more* over a monitoring interface it certainly can [18:59] I think that's about all the time we have [19:00] Thanks so much for taking the time to listen. https://launchpad.net/ensemble has more information! === ChanServ changed the topic of #ubuntu-classroom to: Welcome to the Ubuntu Classroom - https://wiki.ubuntu.com/Classroom || Support in #ubuntu || Upcoming Schedule: http://is.gd/8rtIi || Questions in #ubuntu-classroom-chat || Event: Ubuntu Cloud Days - Current Session: Using Linux Containers in Natty - Instructors: hallyn [19:01] Logs for this session will be available at http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html following the conclusion of the session. [19:02] Ok, hey all [19:03] I'm going to talk about containers on natty. [19:03] In the past (up until lucid, definitely) there were some constraints which made containers more painful to administer - [19:03] i.e. you couldn't safely upgrade udev [19:03] that's now gone! [19:04] but, let me start at the start [19:04] containers, for anyone really new, are a way to run what appear to be different VMs, but without the overhead of an OS for each VM, and without any hardware emulation [19:04] so you can fit a lot of containers on old hardware with little overhead [19:04] they are similar to openvz and vserver - they're not competition, though. [19:05] rather, they're the ongoing work to upstream the functionality from vserver and openvz [19:05] Containers are a userspace fiction built on top of some nifty kernel functionality. [19:05] There are two popular implementations right now: [19:05] the libvirt lxc driver, and liblxc (or just 'lxc') from lxc.sf.net [19:06] Here, I'm talking about lxc.sf.net [19:06] All right, in order to demo some lxc functionality, I set up a stock natty VM on amazon. You can get to it as: [19:06] ssh ec2-50-17-73-23.compute-1.amazonaws.com -l guest [19:06] password is 'none' [19:06] that should get you into a read-only screen session. To get out, hit '~.' to kill ssh. [19:07] One of the kernel pieces used by containers is namespaces. [19:07] You can use just the namespaces (for fun) using 'lxc-unshare' [19:07] it's not a very user-friendly command, though. [19:07] because it's rarely used... [19:08] what I just did there on the demo is to unshare my mount, pid, and utsname (hostname) namespaces [19:08] using "lxc-unshare -s 'MOUNT|PID|UTSNAME' /bin/bash" [19:08] lxc-unshare doesn't remount /proc for you, so I had to do that. Once I've done that, ps only shows tasks in my pid namespace [19:08] also, I can change my hostname without changing the hostname on the rest of the system [19:09] When I exited the namespace, I was brought back to a shell with the old hostname [19:10] all right, another thing used by containers is bind mounts. Not much to say about them, let me just do a quick demo of playing with them: [19:10] ToyKeeper asked: Will there be a log available for this screen session? [19:11] yes, [19:11] oh, no.
sorry [19:11] didn't think to set that up [19:11] hm, [19:11] ok, I'm logging it as of now. I'll decide where to put it later. thanks. [19:12] nothing fancy, just bind-mounting filesystems [19:13] which is a way of saving a lot of space, if you share /usr and /lib amongst a lot of containers [19:13] anyway, moving on to actual usage [19:14] Typically there are 3 ways that I might set up networking for a container [19:14] Often, if I'm lazy or already have it set up, I'll re-use the libvirt bridge, virbr0, to bind container NICs to [19:16] well, at least apt-get worked :) [19:16] If I'm on a laptop using wireless, I'll usually go that route, because you can't directly bridge a wireless NIC. [19:16] And otherwise I'd have to set up my own iptables rules to do the forwarding from the containers' bridge to the host NIC [19:17] If I'm on a 'real' host, I'll bridge the host's NIC and use that for containers. [19:17] that's what lxc-veth.conf does [19:17] So first you have to set up /etc/network/interfaces to have br0 be a bridge, [19:18] have eth0 not have an address, and make eth0 a bridge-port on br0 [19:18] as seen on the demo [19:18] Since that's set up, I can create a bridged container just using: [19:18] 'lxc-create -f /etc/lxc-veth.conf -n nattya -t natty' [19:18] nattya is the name of the container, [19:18] natty is the template I'm using [19:18] and /etc/lxc-veth.conf is the config file to specify how to network [19:19] ruh roh [19:20] so lxc-create is how you create a new container [19:20] The rootfs and config files for each container are in /var/lib/lxc [19:20] you see there are three containers there - natty1, which I created before this session, and nattya and nattyz which I just created [19:21] The config file under /var/lib/lxc/natty1 shows some extra information, [19:21] including how many ttys to set up, [19:21] and which devices to allow access to [19:21] the first devices line, 'lxc.cgroup.devices.deny = a' means 'by default, don't allow any access to any device.' [19:21] from there any other entries are whitelist entries [19:22] kim0 asked: Can I run a completely different system like centos under lxc on ubuntu ? [19:22] yes, you can, and many people do. [19:22] The main problem, usually, is actually setting up a working container with that distro in the first place [19:23] You can't 100% use a stock ISO install and have it boot as a container [19:23] It used to be there was a lot of work you had to do to make that work, [19:23] but now we're down to very few things. In fact, for ubuntu natty, we have a package called 'lxcguest' [19:23] if you take a stock ubuntu natty image, [19:23] and install 'lxcguest', then it will allow that image to boot as a container [19:24] It actually only does two things now: [19:24] 1. it detects that it is in a container (based on a boot argument provided by lxc-start), [19:24] uh, that wasn't supposed to be 1 :), [19:24] and based on that, if it is in a container, it [19:24] 1. starts a console on /dev/console, so that 'lxc-start' itself gets a console (like you see when I start a container) [19:25] 2. it changes /lib/init/fstab to one with fewer filesystems, [19:25] because there are some which you cannot or should not mount in a container. [19:25] now, lxc ships with some 'templates'. [19:25] these are under /usr/lib/lxc/templates [19:26] some of those templates, however, don't quite work right.
So the next work item we want to tackle is to make those all work better, and add more [19:26] let's take a look at the lxc-natty one: [19:27] it takes a MIRROR option, which I always use at home, which lets me point it at an apt-cacher-ng instance [19:28] it starts by doing a debootstrap of a stock natty image into /var/cache/lxc/natty/ [19:28] so then, every time you create another container with the natty template, it will rsync that image into place [19:29] then it configures it, setting hostname, setting up interfaces, [19:29] shuts up udev, [19:29] since the template by default creates 4 ttys, we get rid of /etc/init/tty5 and 6 [19:30] since we're not installing lxcguest, we just empty out /lib/init/fstab, [19:30] actually, that may be a problem [19:30] upstart upgrades may overwrite that [19:30] so we should instead have the lxc-natty template always install the lxcguest package [19:30] (note to self) [19:30] and finally, it installs the lxc configuration, which is that config file we looked at before with device access etc [19:31] ok, I've been rambling, let me look for and address any/all questions [19:31] kapil asked: What's the status of using lxc via libvirt? [19:31] good question, zul has actually been working on that. [19:32] libvirt-lxc in natty is fixed so that when you log out from the console, you don't kill the container any more [19:32] secondly, you can use the same lxcguest package I mentioned before in libvirt-lxc, [19:32] so you can pretty easily debootstrap an image, chroot to it to install lxcguest, and then use it in libvirt [19:33] we still may end up writing a new libvirt lxc driver, as an alternative to the current one, which just calls out to liblxc, so that libvirt and liblxc can be used to manipulate the same containers [19:33] but still haven't gotten to that [19:34] kim0 asked: can I live migrate an lxc container [19:34] nope [19:34] for that, we'll first need checkpoint/restart. [19:34] I have a PPA with some kernel and userspace pieces - basically packaging the current upstream efforts. But nothing upstream, nothing in natty, not very promising short-term [19:35] ToyKeeper asked: Why would you want regular ttys in a container? Can't the host get a guest shell similar to openvz's "vzctl enter $guestID" ? [19:35] nope, [19:35] if the container is set up right, then you can of course ssh into it; [19:35] or you can run lxc-start in a screen session so you can get back to it like that, [19:36] what the regular 'lxc.tty = 4' gives you is the ability to do 'lxc-console' to log in [19:36] as follows: [19:36] I start the container with '-d' to not give me a console on my current tty [19:36] then lxc-console -n natty1 connects me to the tty... [19:37] ctrl-a q exits it [19:37] now, the other way you might *want* to enter a container, which I think vzctl enter does, [19:37] is to actually move your current task into the container [19:37] That currently is not possible [19:37] there is a kernel patch, being driven now by dlezcano, to make that possible, and a patch to lxc to use it using the 'lxc-attach' command. [19:38] but the kernel patch is not yet accepted upstream [19:38] so you cannot 'enter' a container === niemeyer_bbl is now known as niemeyer [19:38] rye asked: Are there any specific settings for cgroup mount for the host? [19:38] Currently I just mount all cgroups.
[19:39] Using fstab in the demo machine, or just 'mount -t cgroup cgroup /cgroup' [19:39] the ns cgroup is going away soon, [19:39] so when you don't have the ns cgroup mounted, then you'll need cgroup.clone_children to be 1 [19:40] however, you don't need that in natty. In n+1 you probably will. [19:40] kim0 asked: How safe is it to let random strangers ssh into containers as root ? how safe is it to run random software inside containers .. can they break out [19:40] not safe at all [19:40] If you google for 'lxc lsm' you can find some suggestions for using selinux or smack to clamp down [19:41] and, over the next year or two, I'm hoping to keep working on, and finally complete, the 'user namespaces' === Jackson is now known as Guest46715 [19:41] with user namespaces, you, as user 'kim0' and without privilege, would create a container. root in that container would have full privilege over things which you yourself own [19:41] So any files owned by kim0 on the host; anything private to your namespaces, like your own hostname; [19:41] BUT, [19:42] even when that is done, there is another consideration: nothing is sitting between your users and the kernel [19:42] so any syscalls which have vulnerabilities - and there are always some - can be exploited [19:42] now, [19:43] the fact is of course that similar concerns should keep you vigilant over other virtualization - kvm/vmware/etc - as well. The video driver, for instance, may allow the guest user to break out. [19:43] kim0 asked: Can one enforce cpu/memory/network limits (cgroups?) on containers [19:44] you can lock a container into one or several cpus, [19:44] you can limit its memory, [19:44] you can, it appears (this is new to me), throttle block IO (which has been in the works for years :) [19:45] the net_cls.classid has to do with some filtering based on packet labels. I've looked at it in the past, but never seen evidence of anyone using it [19:46] for documentation on cgroups, I would look at Documentation/cgroups in the kernel source [19:46] oh yes, and of course you can control access to devices [19:47] you remove device access by writing to /cgroup//devices.deny, an entry of the form [19:47] major:minor rwm [19:47] where r=read,w=write,m=mknod [19:47] oh, I lied, [19:48] first is 'a' for any, 'c' for char, or 'b' for block, [19:48] then major:minor, then rwm [19:48] you can see the current settings for the cgroup in /cgroup/devices.list [19:48] and allow access by writing to devices.allow [19:48] sveiss asked: is there any resource control support integrated with containers? Limiting CPU, memory/swap, etc... I'm thinking along the lines of the features provided by Solaris, if you're familiar with those [19:49] you can pin a container to a cpu, and you can track its usage, but you cannot (last I knew) limit % cpu [19:49] oh, there is one more cgroup I've not mentioned, 'freezer', which as the name suggests lets you freeze a task. [19:50] so I can start up the natty1 guest and then freeze it like so [19:50] lxc-freeze just does 'echo "FROZEN" > /cgroup/$container/freezer.state' for me [19:50] lxc-unfreeze thaws it [19:50] can't get a console when it's frozen :) [19:51] There are 10 minutes remaining in the current session.
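Pulling the device-whitelist and freezer answers above together, here is a small shell sketch of those cgroup interfaces as described in the session. It assumes the cgroup hierarchy is mounted at /cgroup, as on the demo machine, and a container named natty1; the paths and container name will differ on other hosts.

    # deny access to all devices by default, then whitelist /dev/null (char 1:3)
    echo a           > /cgroup/natty1/devices.deny
    echo "c 1:3 rwm" > /cgroup/natty1/devices.allow

    # inspect the current whitelist for the container
    cat /cgroup/natty1/devices.list

    # freeze and thaw the container; this is what lxc-freeze/lxc-unfreeze do
    echo FROZEN > /cgroup/natty1/freezer.state
    echo THAWED > /cgroup/natty1/freezer.state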
[19:51] there are a few other lxc-* commands to help administration [19:52] lxc-ls lists the available containers in the first line, [19:52] and the active ones in the second [19:52] lxc-info just shows its state [19:52] lxc-ps shows tasks in the container, but you have to treat it just right [19:53] lxc-ps just does 'ps' and shows you if any tasks in your bash session are in a container :) [19:53] lxc-ps --name natty1 shows me the processes in container natty1 [19:53] and lxc-ps -ef shows me all tasks, prepended by the container any task is in [19:53] lxc-ps --name natty1 --forest is the prettiest :) [19:54] now, I didn't get a chance to try this in advance so I will probably fail, but [19:54] hm [19:56] There are 5 minutes remaining in the current session. [19:56] there is the /lib/init/fstab which the lxcguest package will use [19:57] ok, what I did there, [19:57] was I had debootstrapped a stock image into 'demo1', I just installed lxcguest, [19:57] and fired it up as a container [19:57] only problem is I don't know the password :) [19:57] kim0 asked: Any way to update the base natty template that gets rsync'ed to create new guests [19:58] sure, chroot to /var/cache/lxc/natty1 and apt-get update :) [19:58] ok, thanks everyone [19:59] Thanks a lot .. It's been a great deep dive session [19:59] Next up is the OpenStack intro session [19:59] o/ [20:00] kim0: How does it work? Do you copy questions from somewhere else or do I need to do that myself? [20:00] Or do people just ask here? [20:00] soren: you "/msg ClassBot !q" then !y on every question [20:01] soren: please join #ubuntu-classroom-chat as well [20:01] This is complicated :) === ChanServ changed the topic of #ubuntu-classroom to: Welcome to the Ubuntu Classroom - https://wiki.ubuntu.com/Classroom || Support in #ubuntu || Upcoming Schedule: http://is.gd/8rtIi || Questions in #ubuntu-classroom-chat || Event: Ubuntu Cloud Days - Current Session: OpenStack Introduction - Instructors: soren [20:01] Hello, everyone! [20:01] Logs for this session will be available at http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html following the conclusion of the session. [20:02] I'm Soren, I'm one of the core openstack developers. [20:02] OpenStack consists of two major components and a couple of smaller ones. [20:02] The major ones are OpenStack Compute, codenamed Nova. [20:02] ...and OpenStack Storage, codenamed Swift. [20:03] Swift is what drives Rackspace Cloud Files, which is a service very much like Amazon S3. [20:03] It's *massively* scalable, and is used to store petabytes of data today. [20:03] I work on Nova, though, so that's what I'll spend most time talking about today. [20:03] Nova is a project that started at NASA. [20:04] Apart from sending stuff into space, NASA also does a bunch of other research things for the US government. [20:04] Among them: "Look into this cloud computing thing" [20:04] This is what turned into the NASA Nebula project. [20:05] If you google it (I forgot to do so in advance), you'll find images of big containers that say Nebula on the side. [20:05] They're building blocks for NASA's cloud. [20:05] Anyways, they started out running this on Eucalyptus. [20:05] The same stuff that drives UEC. [20:06] This got.. uh... "old" eventually, and they decided to throw it out and write their own thing. [20:06] ..so they did, and they open sourced it. [20:07] Rackspace had plans for open sourcing their cloud platform, too, so they called NASA and said "wanna play?" (paraphrasing a little bit), and they were up for it.
[20:07] So Rackspace had Swift, NASA had Nova. We put it together and called it OpenStack. [20:08] If you go to look at them, and they don't look like two pieces of the same puzzle, this is why. They share no ancestry, really. [20:08] They now work happily together, though. [20:08] * soren attempts to work that questions thing [20:09] EvilPhoenix asked: What exactly IS OpenStack? [20:09] I guess that one is answered.. [20:09] medberry asked: Can you briefly differentiate openstack from eucalyptus [20:09] Yes. Yes, I can. [20:10] So, Eucalyptus corresponds to Nova. [20:10] They both focus on the compute side of things, while providing a *very* simple object store. Neither tries to do any sort of large scale stuff. [20:10] Err.. [20:10] For storage, I mean. [20:11] For the compute part, the architectures are *very* dissimilar. [20:11] So, last I looked (admittedly 1½ years ago, but I'm told this is still true), Eucalyptus is strictly hierarchical. [20:12] There's one "cloud controller" at the top. [20:12] There's a number of cluster controllers beneath this one cloud controller. [20:12] ...and there's a number of "node controllers" beneath the cluster controllers. [20:13] Eucalyptus is written in Java, and uses XML and web services for all its communication. [20:13] It polls from the top down. [20:13] Never the other way around. [20:14] Nova uses message queues. [20:14] Nova is written in Python. [20:14] We have no specific structure that must be followed. [20:14] There are a number of components: compute, network, scheduler, objectstore, api, and volume. [20:14] There can be any number of each of them. [20:15] So Nova itself has no single points of failure. [20:15] Oh, Eucalyptus's cluster and node controllers are written in C, by the way. I forgot. [20:16] All of Nova is Python. [20:16] AFAIK, Eucalyptus supports KVM and Xen. [20:16] We support KVM, Xen, Hyper-V, user-mode-linux, LXC (if not now, then *very* soon), VMware vSphere.. [20:17] Err.. [20:17] Yeah, I think that's all. [20:17] We also support a number of different storage backends (for EBS-like stuff): iSCSI, sheepdog, Ceph, AoE.. [20:17] And one more, which I forget. [20:17] We're very, very modular in this way. [20:18] Last I checked, Eucalyptus supported AoE. They may or may not support more now. I'm not sure. [20:18] kim0 asked: I understand openstack focuses on large scale deployments .. How suitable is it for openstack to be deployed in a small setting (5 servers?) [20:18] I'm glad you asked. [20:19] The Ubuntu packages I made of Nova work out-of-the-box on a single machine. [20:20] Scaling it out to 5 servers shouldn't be much work. There are some networking things that need to be set up, you need to point it at a shared database (so far; we're working towards a completely distributed data store) and a rabbitmq server. [20:20] We're suffering a bit from our flexibility, really. [20:21] We can make very few assumptions about people's setups, so there might be a number of things that need to be set up correctly (e.g. which IP to use to reach an API server (or a load balancer in front of them), which server to use for this, which server to use for that). [20:22] It's pretty obvious pretty quickly, though, if something isn't pointed the right way. [20:22] We're "blessed" with a team of people in Europe and in most US timezones, so if you run into trouble #openstack (irc channel) is open almost 24/7 :) [20:23] kim0 asked: Is nova deployed at rackspace in production yet ?
did you guys go with xen or kvm, and why ? [20:23] Nova is not in production at Rackspace yet, no. [20:23] Rackspace has an existing platform with which we've not completely hit feature parity. [20:23] ...and apparently, it's not ok to make Rackspace's customers suffer because we want to run a different platform :) [20:23] Rackspace will be using XenServer. [20:24] Oh, I forgot to list that as a supported hypervisor. It is. [20:24] That's what they're used to, and that's what they can get support for when running Windows and stuff. [20:24] markoma asked: Gluster was mentioned in a previous discussion. Is Swift the right way to go, or Gluster? [20:25] They do very different things. [20:25] Gluster aims to provide a POSIX compliant filesystem. [20:25] Swift is an object store. [20:25] You address full objects. You cannot seek back and forth, replace parts of objects, etc. [20:26] Very much like Amazon S3. [20:26] Gluster recently announced they want to contribute to Swift. I don't know exactly how, but something's afoot :) [20:26] jrisch asked: I think it's still unclear from the documentation, but it mentions something about a cloudpipe vm, but doesn't clarify its role nor its usage. Can you elaborate on that? [20:26] Ah, yes. [20:27] Cloudpipe is something NASA uses. [20:27] I don't think anyone else does, and perhaps no one else ever will. [20:27] Each project has its own private subnet assigned. [20:27] Typically in the 10.0.0.0/8 range. [20:27] It's not reachable from the internet. [20:27] Cloudpipe images are images with an openvpn server in them. [20:28] Each project has such an instance running. They can connect to it using openvpn, and they can then reach their instances. [20:28] It's not required at all. [20:28] I've never used it. [20:28] topper1 asked: is rabbitmq a SPOF since its clustering doesn't replicate queues? [20:29] In a sense. [20:29] From Nova's point of view, it's a bit of a black box. [20:30] We speak to something that speaks AMQP. We expect it to behave. [20:30] Just like we use an SQL database of some sort and expect it to behave. [20:31] RabbitMQ is way more stable than what we could have hacked up in the time it took to run "apt-get install rabbitmq-server". [20:31] *way* more stable. [20:32] There's work in progress to build a queueing service for OpenStack, but in general, we try to use existing components. [20:32] n1md4 asked: There seem to be install guides for CentOS, RHEL, and Ubuntu, is there nothing specifically for Debian? [20:32] Not right now, I don't think. [20:33] I'd be *thrilled* if a DD stepped up and put OpenStack into Debian. [20:33] ...and sorted out all the dependencies. [20:33] It's silly not to, really. [20:33] It's just that no one has done it yet. [20:33] markoma asked: do you, would you, use Ensemble to manage services for OpenStack? [20:34] I've no clue about what Ensemble does at the moment, so I can't really answer that. [20:34] jrisch asked: If cloudpipe isn't required, how do you set up access to the VMs, IP mappings and stuff. Does the physical node act as a pipe/NAT device? [20:34] I tend to use floating IPs. [20:34] They're public IPs that you can dynamically assign to instances. [20:35] Alternatively, you can just use one of the other network managers and use a subnet that's routable. [20:35] jrisch asked: So if you speak AMQP to the message queue, could one use ActiveMQ instead? (it supports clustering as far as I know). [20:35] AFAIK, we don't do anything that requires RabbitMQ. [20:36] So I guess ActiveMQ would work, if it speaks AMQP.
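To make the "whole objects only" point about Swift concrete, here is a hedged curl sketch against a Swift / Cloud Files style REST endpoint. The storage URL and auth token are placeholders you would obtain from your provider's authentication step, and the container and object names are made up for illustration; this is not taken from the session's demo.

    # placeholders -- substitute the values returned by your auth request
    STORAGE_URL="https://storage.example.com/v1/AUTH_myaccount"
    AUTH="X-Auth-Token: <token>"

    # create a container, then upload and download an object in one piece;
    # there is no call for rewriting just part of a stored object in place
    curl -X PUT -H "$AUTH" "$STORAGE_URL/backups"
    curl -X PUT -H "$AUTH" -T wiki-dump.tar.gz "$STORAGE_URL/backups/wiki-dump.tar.gz"
    curl -H "$AUTH" -o wiki-dump-copy.tar.gz "$STORAGE_URL/backups/wiki-dump.tar.gz"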
[20:36] topper1 asked: Is there work afoot to create API documentation (REST API) for Swift... right now it requires 'you read the python' [20:36] Uh, there's plenty of docs. [20:36] Hang on. [20:37] http://www.rackspace.com/cloud/cloud_hosting_products/files/api/ [20:37] Same thing. [20:37] I don't know where the ones labeled "openstack" are, but it's the same thing. [20:38] Ah, question queue is empty.. [20:38] Where was I? :) [20:38] * soren scrolls up [20:38] Nowhere, apparently. [20:38] Ok, process.. [20:38] We do time-based releases. [20:39] Just like Ubuntu. [20:39] Except we have 3-month cycles, rather than 6-month ones. [20:39] We align with Ubuntu so that every other OpenStack release should almost coincide with an Ubuntu release. [20:40] We have feature freezes, beta freezes, RC freezes and final freezes just like Ubuntu. [20:40] This is no coincidence :) [20:40] Ubuntu is our reference platform. [20:41] I'm a core dev of Ubuntu, too, so if we have a problem with a component outside Nova, we can fix it and get it into our reference platform quite easily. [20:41] This holistic view of the distribution has served us very well, I think. [20:42] Nova can be way cool, but if there are bugs in libvirt, we're going to suffer, too, for instance. [20:42] Ok, so say you wanted to work on something in Nova (or other parts of OpenStack). [20:43] You can branch the code from Launchpad (which we use for everything: blueprints, bugs, code, answers) using "bzr branch lp:nova" [20:43] Hack on it, upload a branch to Launchpad, and click the "propose for merge" button. [20:43] Within a couple of days someone should have looked at it and reviewed it. [20:44] If it's good, it gets approved. If it's less good, we (try to) give constructive feedback so that you can fix it. [20:44] Once it's good, it's approved. [20:44] Once approved, a component called Tarmac takes over. [20:44] Tarmac is run from our Jenkins instance: http://jenkins.openstack.org/ [20:45] It looks for approved branches on Launchpad, merges them, and runs our test suite. [20:45] We have around 75% code coverage, I think. [20:45] Far from ideal, but it catches quite a few things. [20:45] If the tests pass, your branch is merged. [20:45] And that's it. [20:46] If the tests fail, your branch gets set back to "needs review" and you can go and fix it again. [20:46] This is fine. It happens all the time. Don't sweat it. [20:46] We're also doing some integration tests. [20:46] Oh, one other thing: [20:47] When a patch gets merged, it triggers a package build. [20:48] This means that if Launchpad doesn't have a huge backlog, less than 20 minutes after your branch has been reviewed, you can "apt-get upgrade" and get a fresh version of Nova with your patch in it. [20:48] So we continuously test that our packages build. [20:48] I have a Jenkins instance that checks the PPA for updates. [20:48] If there are updates, it installs the updates and runs a bunch of integration tests. [20:49] So within... I dunno, 35 minutes or so, probably, your patch has gone through unit tests, package builds, and integration tests. [20:49] I think that's pretty cool. [20:50] We're working on expanding these tests. [20:50] So that we test more combinations of stuff. [20:51] I currently test KVM with the EC2 API using iSCSI volumes on Lucid, Maverick, and Natty. [20:51] We provide backported versions of stuff that is needed to run OpenStack on Lucid, which we do support. [20:51] There are 10 minutes remaining in the current session.
[20:51] ...as well as Maverick and Natty. [20:51] Well, there's nothing backported for Natty, because we put that directly into Ubuntu. [20:52] kim0 asked: Can you talk a bit about nova's roadmap [20:52] Sort of. [20:52] There are some things on the roadmap already. [20:53] ...but we have a design summit coming up, where we'll be talking much more about the roadmap. [20:53] It's an open event in Santa Clara in about a month, if anyone wants to come. [20:53] Should be fun. [20:53] Things that I do know on the roadmap already: [20:54] * soren looks desperately for the list. [20:55] https://blueprints.launchpad.net/nova [20:55] Well, this is the list of everything. [20:55] Cactus is the release we're working on now. [20:55] Bexar is the previous one. [20:55] Diablo is the next one. [20:56] Lots of different companies work on OpenStack. They have their own priorities. [20:56] Whatever they want to work on, they can. [20:56] There are 5 minutes remaining in the current session. [20:56] So in that respect, it's hard to say what's going to land at any given time. It depends on what people feel like working on. [20:57] We're going to split out some stuff from Nova (volume and network services), though. [20:57] That seems pretty certain right now. [20:57] And add support for the EC2 SOAP API. [20:57] People keep telling me no one uses it, but... meh. I want to add it. [20:58] Man, I can't really remember more stuff right now :( [20:58] jrisch asked: I know that Swift is in production in several places (other than Rackspace) - do you know of any companies that are using Nova (besides NASA)...? [20:58] Not at the moment, no. [20:58] This current dev cycle has been one focused on stability and deployability. [20:59] The goal has been to get Nova to a point where people could actually use it in production. [20:59] I've blogged a bit about some of the stuff I've done on that. [20:59] ..but lots of others have worked on it, too. [20:59] I guess that's it? [21:00] I hope it's been useful. [21:00] Thanks soren [21:00] This has been great [21:00] Thanks everyone .. [21:00] Hope you enjoyed the sessions [21:01] See you tomorrow for the second day [21:01] Logs for this session will be available at http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html === ChanServ changed the topic of #ubuntu-classroom to: Welcome to the Ubuntu Classroom - https://wiki.ubuntu.com/Classroom || Support in #ubuntu || Upcoming Schedule: http://is.gd/8rtIi || Questions in #ubuntu-classroom-chat || [21:09] * DigitalFlux Missed today's Cloud day :( [21:09] http://irclogs.ubuntu.com/2011/03/23/%23ubuntu-classroom.html [21:10] Meths: Cool Thanks, maybe tomorrow I can catch up === neversfelde_ is now known as neversfelde === sre-su_ is now known as sre-su === niemeyer is now known as niemeyer_dinner