Randy Bias' Cloudscaling
Leaders in the Cloud: OpenStack
“People don’t realize that lock-in actually occurs at the architectural level, not at the API- or hypervisor-level”
Randy Bias discusses the impact of OpenStack on the cloud computing ecosystem with sandhill.com’s Kamesh Pemmaraju.
Kamesh outlines the current state of cloud adoption and OpenStack, and looks ahead.
Interacting with your Swift install
Once you have swift services running, you are going to want to interact with them. (You can have them running in minutes if you want: swift-solo)
Most of the currently available tools that interact with Rackspace’s Cloudfiles are hardcoded to that API endpoint, so even though the APIs are virtually identical, most of the current releases don’t work with swift yet. Since we needed something for testing and demonstrations, we’ve been patching some of the third party projects to store and retrieve files with swift. Here are a couple of quick examples using Cyberduck, a GUI program, and Fog, a Ruby gem, from the command line.
Setting up Swift
All of the tools expect interaction with SSL enabled servers, so we will need to install and enable that within swift. In the swift-solo repo, edit chef/cookbooks/swift/attributes/swift.rb and enable ssl:
default[:swift][:proxy_server][:use_ssl] = true
default[:swift][:auth_server][:use_ssl] = true
default[:swift][:repository][:url] = "http://github.com/cloudscaling/swift.git"
default[:swift][:repository][:tag] = "cyberduck"
ubuntu@host:~/swift-solo$ sudo swift-auth-create-account account user password
https://example.com:8080/v1/338b6b2d-5137-40b4-9b95-9106a0d4db52
ubuntu@host: st -A https://127.0.0.1:11000/v1.0 -U account:user -K password stat
Account: 338b6b2d-5137-40b4-9b95-9106a0d4db52
Containers: 0
Objects: 0
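For anyone patching a client, it helps to see what the st flags above turn into on the wire. As far as we understand the v1.0 auth scheme that Cloudfiles and swift share, the client sends a GET to the auth URL with X-Auth-User and X-Auth-Key headers and reads X-Storage-Url and X-Auth-Token from the response. Here is a minimal sketch that builds such a request without sending it. The helper name build_auth_request is hypothetical, and the values are the placeholders from the commands above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a v1.0-style auth request.
# Assumes the auth server accepts X-Auth-User / X-Auth-Key headers.
def build_auth_request(auth_url, account, user, key)
  uri = URI.parse(auth_url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['X-Auth-User'] = "#{account}:#{user}"
  request['X-Auth-Key'] = key
  [uri, request]
end

uri, request = build_auth_request('https://127.0.0.1:11000/v1.0',
                                  'account', 'user', 'password')

# Print the request line and headers a client would send.
puts "GET #{uri}"
request.each_header { |name, value| puts "#{name}: #{value}" }
```

A tool hardcoded to the Rackspace endpoint only needs the auth URL made configurable; the headers stay the same.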
:default:
  :rackspace_api_key: password
  :rackspace_username: account:user
  :rackspace_auth_url: example.com:11000
username@host$ fog
Welcome to fog interactive!
>> f = Fog::Rackspace::Files.new(Fog.credentials)
>> f.put_container("test_container")
>> f.get_container("test_container")
>> f.directories
<Fog::Rackspace::Files::Directories
[
<Fog::Rackspace::Files::Directory
key="testcontainer",
bytes=10,
count=1
>
]
>
UPDATE:
The latest builds of Cyberduck now support OpenStack.
OpenStack: Swift Barn Raising
At Cloudscaling, we’re getting closer to putting OpenStack into production services. This is the first installment chronicling that journey and opening some of the tools that we are developing along the way.
OpenStack is a cloud computing framework for building infrastructure as a service, and Swift is a subproject which provides a scalable distributed object store. We have already expressed opinions about what a community of service providers all running an open platform would do to the cloud computing ecosystem (think Linux kernel of cloud). To support our own work, we try to make it as simple and easy as possible to build and configure clusters. We want to make it simple and easy for everyone else too. Today we are releasing ‘swift-solo’ on GitHub, which is essentially a Chef-automated install of the instructions for setting up an Ubuntu 10 VM for swift development. (We took inspiration from Opscode’s OpenStack Swift cookbook, but you’ll see a lot of things have been split out and extended. Thanks to Adam Jacob for those cookbooks and chef.)
The way it works now, after installing git and chef, one command will deploy and start swift on a single machine. The chef recipe can be configured to sync code from a git or bzr repository so you can quickly deploy changes in your development environment. There are also scripts to set up testing (basically install python-cloudfiles) and make sure the service is working (store an object and retrieve it).
Swift is actually 5 separate services: object, container, account, auth and proxy. In theory, each of those can be scaled separately, but in practice the documentation says Rackspace runs the object, container and account services together and breaks out the proxy and auth.
Object, container and account servers are members of partitioned consistent hash rings mapping names to physical locations. Object servers store the actual blobs, container servers index the objects and the account servers index the containers.
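To give a feel for what a consistent hash ring does, here is a toy sketch in Ruby. It is not Swift's ring (which is partitioned and replica-aware); the class name ToyRing, the node names and the points_per_node setting are all made up for illustration. Each node gets many points on a circle of hash values, and a name maps to the first node point at or after its own hash.

```ruby
require 'digest/md5'

# Toy consistent hash ring (illustrative only, not Swift's implementation).
class ToyRing
  def initialize(nodes, points_per_node: 100)
    # Give each node many points on the ring so names spread out evenly.
    points = nodes.flat_map do |node|
      (0...points_per_node).map { |i| [hash_of("#{node}-#{i}"), node] }
    end
    @ring = points.sort_by(&:first)
  end

  # Walk clockwise from the name's hash to the first node point,
  # wrapping around to the start of the ring if needed.
  def node_for(name)
    h = hash_of(name)
    entry = @ring.bsearch { |point, _| point >= h } || @ring.first
    entry.last
  end

  private

  # Use the first 32 bits of the MD5 digest as the ring position.
  def hash_of(key)
    Digest::MD5.hexdigest(key)[0, 8].to_i(16)
  end
end

ring = ToyRing.new(%w[storage1 storage2 storage3])
puts ring.node_for('account/test_container/photo.jpg')
```

The useful property is that removing a node only moves the names that lived on that node; everything else stays where it was.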
Auth servers provide authentication and the proxy servers are the API gateway that coordinates requests and responses with the rest of the services.
There are some other moving parts to do things like handle replication, audit the integrity of objects and rebalance the rings.
The Swift documentation has a lot more information about the architecture, operational considerations and the code. Dig in.
Cloudscaling is interested in creating an open cloud ecosystem and we’ll keep sharing our work and trying to lower the barrier to participation. In the future, we plan to explain and open some of the work we are doing to handle authentication, automating the installation/configuration across clusters and other operational lessons we learn along the way.
Your first running Swift service is only a git clone away.
We would love to hear your feedback, suggestions and issues.
Finally, a shout out and thanks to the Scalers working on OpenStack/Swift: Big ups to Caleb Tennis for working on these chef recipes, Blake Barnett, and Joe Arnold.
Want to hack on OpenStack or Chef? We’re looking for good people to build open source automation tools and cloud infrastructure. Join us.
Join us for “State of the Cloud” webinar Thursday
Please join us for a one hour webinar this Thursday with Randy Bias (CEO, Cloudscaling), Michael Crandell (CEO, RightScale), John Engates (CTO, Rackspace), Sheng Liang (CEO, Cloud.com) and Erik Troan (CTO, rPath) in a lively discussion on the state of the cloud, the barriers and obstacles to enterprise adoption, the state of the ecosystem, and the current state of the union for public, private, and hybrid clouds.
The discussion will be moderated by Dana Gardner, principal analyst at Interarbor Solutions and prolific ZDNet blogger, and is being hosted by our friends over at rPath.
Join us this upcoming Thursday, August 19 at 11:00 AM PT / 2:00 PM ET for “State of the Cloud II: Enterprise Patterns Emerge”
We also invite you to submit questions before and during the event via twitter using the hashtag #SOTCII.
Register now to reserve your spot for what will surely be a vibrant debate!
Does OpenStack Change the Cloud Game?
This week Rackspace Cloud, in conjunction with the NASA Nebula project, open sourced some of their Infrastructure-as-a-Service (IaaS) cloud software. This initiative, dubbed ‘OpenStack’, should have a dramatic impact on the current dynamics for building cloud computing infrastructure. Previously there have been two major camps: Amazon API- and architecture-compatible clouds, and VMware’s vCloud. Now there is a third camp that could be not only a viable alternative to these two approaches but, more importantly, a fantastic option for service providers and telecommunications companies that face unique challenges.
Let’s dive in and I’ll explain.
Cloud Stack Evolution & ‘Camps’
Amazon Web Services (AWS) spawned a huge ecosystem of knock-offs, management systems, tools, and vendors. They include, but aren’t limited to:
- AWS API compatible ‘cloud stacks’ including Eucalyptus, OpenNebula, and others
- Cloud management systems for the AWS APIs and services such as RightScale and enStratus
- Cloud services layered on top of AWS services such as Jungle Disk (S3), Heroku (S3, EBS, EC2), and more
Previously, I wouldn’t have called the AWS ecosystem a ‘camp’ per se, but if you read our most recent article on Google’s foray into cloud storage, you know that it seems likely they will provide a 100% compatible version of S3 and EC2 this year. Imagine the impact of Google Compute & Storage with Amazon Web Services compatible APIs. Already the Google Storage API is nearly 100% compatible with S3.
Together, as a bloc, Amazon and Google could create a de facto duopoly for infrastructure clouds, which isn’t good for anyone. We need competition and more than two major players.
Up against the Amazon camp is VMware. In my article on Amazon vs. VMware last year I highlighted how these two businesses were on a collision course. Nothing has changed and competition is mounting between them. The reason is that telcos and service providers are under increasing threat from Amazon and soon Google. They need viable solutions and VMware is attempting to provide a competitive ecosystem.
The VMware cloud initiative, vCloud, is designed to arm enterprises and service providers to be competitive, but has not quite delivered yet. VMware has had a number of problems providing a full cloud stack. The software, now in beta and codenamed ‘Redwood’, has had significant delays in getting to market. Their strategy for cloud infrastructure does not appear unified outside of delivering compute virtualization.
VMware, as a business, understands they need to make their customers competitive. They have made a number of strategic open source acquisitions such as SpringSource, RabbitMQ, and Redis. There are also murmurings that they have some special projects inside that are ‘up the stack’ from their virtualization offerings. In total this shows that VMware ‘gets it’ in that they want to create a competitive ecosystem. While each of these is currently a point solution, there is yet to be a coherent story here. Can VMW build a consistent story and strategy around these disparate pieces? Only time will tell…
Besides these two camps, there is a long tail of clouds running various frameworks vying to establish themselves, such as Cloud.com’s CloudStack, 3Tera, Hexagrid, Abiquo, OpenNebula, etc. John Treadway recently posted a roundup describing all of the various cloud stacks out there.
OpenStack is stepping into the ring as a viable third camp. In particular, the OpenStack Storage solution is a clear contender to Amazon S3 & Google Storage. Many service providers and telcos have struggled to find a viable solution using commodity hardware that was price competitive. Suddenly, there is a viable proven solution.
Yet this is only storage. How can it create an effective ‘third camp’ alternative to Amazon and VMware for an entire cloud?
Lock-in, Architecture, Standards and The Truth about Interoperability
Interoperability for infrastructure clouds is poorly understood. Most believe that the problem lies in the on-disk image format (e.g. VMDK vs. VHD vs. qcow) or in the ‘hypervisor’ (although people don’t really understand what this means). The truth is that lock-in has little or nothing to do with disk formats or the hypervisor. Most on-disk image formats are simply representations of block storage (i.e. disk drives). That means converting between a VMware VMDK and a Citrix XenServer/Hyper-V VHD is relatively trivial.
What about booting the converted disk image up on a new hypervisor? Guess what, since most hypervisors now rely on hardware virtualization (HVM) [1] using Intel-VT/AMD-V, most will work with unmodified operating systems out of the box. No changes needed. The only downside is that the resulting performance is usually poor. Fixing that requires new paravirtualization (PV) drivers in the converted image. What does that mean? After converting the image from one format to another, you simply have to install the PV drivers for the correct OS, a process that requires being methodical but is in no way technically challenging.
Where is the lock-in then? If it’s not the hypervisor, what makes moving from one cloud to another so difficult? Simply put, it’s architectural differences. Every cloud chooses to do storage and networking differently.
For example, if you wanted to move a virtual machine from GoGrid to Amazon, converting the GoGrid image to an AMI is not difficult. Unfortunately, GoGrid uses two networks, a ‘frontend’ and a ‘backend’, with your cloud storage system connected via the backend network. Every Amazon virtual server has only a single network interface. If your application assumes a separate backend network then what happens when it moves to a cloud without one? Or vice versa? Similar architectural incompatibilities exist between Rackspace Cloud, Savvis, Terremark, Hosting.com, Joyent, and all of the others.
The problem here, to be a bit more succinct, is that we need reference architectures for how infrastructure clouds are built. Amazon is one such reference. VMware’s vCloud is potentially another. Now there could be a truly open option with the gravity to gather community support.
More on The Third Camp
OpenStack’s potential to build a real community and a set of reference architectures drives towards greater standardization and interoperability. Perhaps more important than a cloud storage alternative is the possibility that a true OpenStack community reaches critical mass, with a similar number of developers contributing to it as to Amazon’s or VMware’s ecosystems. Then commercial and alternative offerings, such as Cloud.com, Hexagrid, and OpenNebula, can match their APIs and architectures to this set of reference architectures.
Will it happen? It’s hard to say, but the opportunity is there. Rackspace and others are putting serious weight behind this initiative.
What This Means for Telcos and Service Providers
For Telcos and SPs this means an alternative to VMware’s vCloud for commodity service offerings. A way to compete and operate at scale like Amazon and at a similar price point. Standardization through a similar reference architecture means greater compatibility between service provider clouds, which means greater benefit for customers and less lock-in, making them more desirable than the walled gardens.
You don’t want to differentiate on the basic compute, storage, and network offering. You want this to be as standard and interoperable as possible, just like 3G networks, TCP/IP, and similar service provider technologies. By creating a common open platform that everyone uses there is a better opportunity to facilitate wider adoption and create a competitive infrastructure service marketplace where providers work on differentiating in areas where they have an inherent advantage:
- Service and support
- Network & datacenter tie-ins (e.g. MPLS, hosting/co-lo)
- Bundled service offerings
- Differentiated value-added cloud services (VACS)
This is a game that all telcos and service providers understand. They have been playing it for the past 15+ years.
Conclusion
OpenStack, with a strong community behind it, should be an important tool for service providers and large telcos to compete at scale with the Amazons and Googles of this world.
We believe OpenStack and the reference architecture(s) associated with it will allow service providers (SP) to get their undifferentiated cloud offerings up and running early. For this reason, Cloudscaling will put real resources into supporting this effort. Getting basic cloud offerings up early then means providers can focus on support, services, bundling, and differentiated services as soon as possible, while embracing as large a customer base as possible. This is just as they compete on top of basic TCP/IP services today.
[1] Clearly, the market leader, Amazon, does not use HVM. They use PVM, a fully paravirtualized mode of Xen. However, even they seem to understand that HVM is the future. Their latest offering, designed for HPC, which is performance sensitive, uses HVM and supports unmodified operating systems. The reality is that the Intel-VT and AMD-V capabilities on the latest round of processors are incredibly fast and will only get faster. The battle is over. HVM and silicon won in this case.
Rumor Mill: Google EC2 Competitor Coming in 2010?
I’ve heard from a somewhat reliable source that Google is working on their Amazon EC2 competitor. Yes, some kind of on-demand virtual servers. I would have been the last person to guess that Google would take this direction[1], but you have to admit it makes a certain sense from their perspective. Consider:
- Amazon’s EC2 is clearly generating Real Revenue (TM) and could be at $500-750M in revenue for 2010
- Google has a massive global footprint and is north of one million servers
- The support structure for these servers includes a huge investment in datacenters, networking, and related
- The Googleplex houses an extremely large number of talented engineers in relevant areas: networking, storage, Linux kernel, server automation, etc.
- Google Storage recently went into BETA and is accepting developer signups
This last is perhaps one of the more telling signs. As you may be aware, Amazon’s Simple Storage Service (S3) pre-dates Amazon’s Elastic Compute Cloud (EC2). When Amazon launched in Europe they first deployed S3 followed by EC2. The same happened with their Asia/Pac deployment.
Amazon has built AWS in such a way that all of the services are synergistic, but in particular, EC2 is dependent on S3 as a persistent storage system of record. EC2 AMIs originate from and are stored in S3, it’s the long term backing store for Elastic Block Storage (EBS) and EBS snapshots, and it’s safe to assume that many other kinds of critical data that AWS relies on are stored there.
Would Google take a different approach? It’s doubtful. Amazon’s S3 is built to be a highly scalable storage platform[2]. Google’s own GoogleFS and BigTable serve similar purposes. It’s certain that Google would use related design principles and hence we could see Google Storage as a prelude to a Google on-demand virtual server service (Google Servers???).
Combined with the rumor I heard from a reasonably informed source I think we can look forward to an EC2 competitor, hopefully this year.
What I want to bring to folks’ attention here is that if another credible heavyweight enters into this market it will have a tremendous impact in further driving the utilitization of cloud services. In the medium term it will also threaten hosting providers and ‘enterprise clouds’.
Why? I think what many hosting providers fail to understand is that Amazon and Google, particularly if fueled by direct competition, must grow up into the enterprise space. Just as in the Innovator’s Dilemma, they will eventually provide most of the features of any ‘enterprise’ cloud, which means that if you aren’t building to be competitive with Amazon and Google, you aren’t in the public cloud game.
Much more detail on this in a future posting.
[1] My bet would have been that Google put more weight behind PaaS solutions like Google App Engine (GAE) and related, which are more ‘google-y’.
[2] See the whitepaper (PDF) on their Dynamo technology behind S3. Also check out Riak from Basho that is designed around the same techniques.
Building A Commodity Cloud with EMC?
Just a quick post to note a recent blog post by Chuck Hollis (@chuckhollis) that discusses some of the issues related to using EMC for commodity clouds. The posting centers on a conversation I have been having with Chuck, trying to understand the EMC product line better and seeing if there is a fit for businesses building cost-effective clouds.
Chuck’s blog posting covers the discussion fairly well and he was very helpful. My final takeaway is that I think there can be a place for EMC, traditionally a ‘premium’ vendor, in even low cost commodity clouds. The challenge, however, as he rightfully identifies, is that EMC, the business, has a hard time understanding these requirements. The local EMC sales team I’ve been dealing with doesn’t really understand the inherent assumptions I’m making. There is a lot of push to simply purchase the ‘bigger box’.
A strongly recommended read for folks trying to understand both EMC’s potential value proposition in the cloud and how to build ‘scale out’ commodity clouds.
Cloud: Change Management & Cloud Operations
Our own Andrew Shafer killed it today at the Velocity Conference. His presentation is a must read for webops, devops, and those aspiring to build 100% uptime cloud services.
It’s hard for folks to internalize how things are changing in Internet-land, but I think you’ll get closer through this presentation. It’s not the same-old, same-old any more. Cloud computing is the biggest change to how IT functions since the 1980s and the advent of the personal computer (and hence the rise of client-server/enterprise computing).
Enjoy … (and outstanding job, Andrew!)
Change Management Velocity2010 (presentation by Andrew Shafer)
Interview with Cloudscaling CEO on Cloud in the Mid-market
A recent interview I did with Alex Bewley of Uptime Software is finally available. Although the podcast is nominally about cloud computing for mid-tier enterprises, we actually cover much broader ground. Alex’s blog posting lists the core topics as:
- what kinds of businesses are using cloud
- how you should go about evaluating it
- how to avoid being outsourced as an IT department
- what are the barriers to adoption; monitoring in the cloud (near and dear to our hearts)
- designing applications for failure awareness
- where he thinks the cloud is going
More important, for me personally, is that I think this is one of my better podcasts. The audio is clear, my responses, while long, are reasonably crisp, and you can tell that the general thinking around here has evolved a lot. Some key messages come through loud and clear, which I think aren’t well understood still:
- Cloud computing isn’t about virtualization
- This is a disruptive sea change, be the disrupter, not the disrupted
- Whole new areas of opportunity, applications, etc. are opening up that didn’t exist before
I really think it’s worth a listen. It’s a little less than 20 minutes and moves pretty quickly. Please enjoy and a big thanks to Alex who did a great job with the interview. Head over to the original blog post to listen to the podcast with Flash in your browser or you can download the MP3 directly if you are using a non-flash capable system.
Getting Velocity – Economy of Motion
Last time, I implied that scale alone doesn’t always lead to operational efficiency in the datacenter.
There is no class someone can take that will teach them everything they need to know to run a datacenter, and the applications it ostensibly exists for, efficiently. Training exists for some of the tools, for some of the hardware, but gaps are still left to fill in with smart people, hopefully working together. (and sometimes getting the people aligned IS the hard part)
If there were training covering every technical aspect of this, it would be expensive, take months, probably years, and each class would be out of date by the time it was over.
Two weeks from today, you can get a glimpse of what different parts of that training might look like.
Velocity is the preeminent cloud operations conference. (Let’s face it, a cloud is a big web app. If you can’t do web ops efficiently at scale, keeping a cloud running effectively will be a chance for you to learn these lessons the hard way. These lessons will probably cost you on both sides, as you throw bodies at the problem while your customers leave after every outage.)
This is ‘the’ ops conference for building and managing cloud services at and on every level of whatever ‘cloud’ means to you.
I’ve been to every Velocity so far. Starting at the first Velocity, there were two recurring themes: first, automating dynamic infrastructures, and second, developers and operations working together as both paramount and a differentiator. The change is only accelerating, no pun intended.
I’m leaving out half of the story because Velocity also gathers the planet’s expertise focused on client side performance, which is provably important. It’s not my focus at the moment, but we need each other to deliver the highest value.
Many lessons are best learned by doing. Come learn from and connect with the people who have experience building some of the most impressive infrastructures and ops teams on the planet (and the scars to prove it…). You’re at an even greater disadvantage if you don’t, because we’ll be learning and sharing with each other.
Cloudscaling will be there.
Register now for 20% off with this code: vel10s2d
More Economies of Scale: Efficiency, Head Count and TCO
James Hamilton’s presentation at Mix10 illuminated cloud computing economics that few others have direct experience with, but I also believe that this presentation raises interesting questions that didn’t get addressed. (If you haven’t seen James Hamilton’s Mix10 presentation, go watch it now. You should probably also go through Randy’s follow up, and then watch James’ talk again… I’ve watched it 3 times now.)
This is the first post that will refer to aspects of James’ talk (and I plan at least one more about business models) and in case I haven’t stressed this enough, if you have any interest in understanding the economics of cloud computing, take the time to watch one of the best in this business.
Central to James’ presentation is breaking down the total cost of ownership of computational infrastructure. His breakdown is based on his own data running web scale services and he provides us with a great analysis on the inevitability and sustainability of cloud computing business models. One of the key points James makes is that the only variable cost in the chart is the cost of power.
Cost Breakdown from James Hamilton
The thing I want to focus on here is the missing cost of personnel. James touches on this at different points discussing administration and automation. He gives out the number ‘as low as 3% for services’, so I’m assuming he is burying this as a negligible cost. I would argue that this cost is actually highly variable, and, while correlated with scale, is also a function of the types of services an organization provides and how those services relate to the core business. Additionally, automation investments can be scaled down effectively, but that’s what I’ve been working on for a couple years so this is likely a reflection of my bias.
Based on James’ biases (which he is straightforward about), that the cost of personnel can be driven down to almost nothing is essentially taken for granted. I contend, based on personal experience and observation, this is still a significant operational cost for many organizations. I would take this a step farther and posit that the level of efficiency James refers to only comes from the crucible of running web services at scale with considerable economic pressure for little or no downtime. Furthermore, this level of efficiency and cost reduction will never materialize in organizations who view IT as a cost center. Efficiency doesn’t just come for free, at scale or otherwise.
To keep the numbers easy, let’s assume an admin is paid $100,000 per year. Then, neglecting the aspects of networking and storage, that admin can manage some number of machines. If that number is 100, then managing each machine costs $1000 per year or ~$83 per month. If that number is 1000 machines, then each machine costs $8.33 per month. If those are $3000 servers and the servers are roughly 54% of the total cost, then if I understand correctly the $8.33 is around 5% of the monthly cost when amortized over 3 years. James gave us price or efficiency ratios for storage, networking and admins. For a large service he listed ‘over 1000 servers/admin’. He did not give us a ratio or a price point per server, but in order to get down to ~3%, the admins need to manage significantly more than 1000, the server cost is significantly higher than $3000, or the admins get paid significantly less than $100,000. (This also assumes salary is the only cost and nothing is paid for any management tools…)
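If you want to replay that arithmetic with your own numbers, here is a minimal Ruby sketch. The method names are made up, and I am assuming the 54% figure means servers are 54% of the monthly cost before any admin cost is added on top; that reading is mine, not something James stated.

```ruby
# Monthly admin cost per server, given a salary and a servers-per-admin ratio.
def admin_cost_per_server_month(salary_per_year, servers_per_admin)
  salary_per_year / servers_per_admin.to_f / 12
end

# Admin share of the total monthly cost per server.
# Assumption: the amortized server price is `server_share` of all
# non-admin monthly cost, and admin cost is added on top of that.
def admin_share(salary_per_year, servers_per_admin, server_price,
                server_share: 0.54, amortization_months: 36)
  admin = admin_cost_per_server_month(salary_per_year, servers_per_admin)
  server_monthly = server_price / amortization_months.to_f
  other_total = server_monthly / server_share
  admin / (other_total + admin)
end

# Compare a few servers-per-admin ratios at $100,000 salary and $3000 servers.
[100, 1000, 2000].each do |ratio|
  cost = admin_cost_per_server_month(100_000, ratio)
  share = admin_share(100_000, ratio, 3000)
  puts format('%5d servers/admin: $%.2f per server-month, %.1f%% of total',
              ratio, cost, share * 100)
end
```

Under these assumptions the 1000 servers/admin case lands close to the 5% estimate above, and the ratio has to roughly double before the share drops under 3%.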
What do you pay your admins? What do you pay for servers? What is the ratio between them?
Which one do you have the most control over?
(hint: the way to optimize the ratio is not to hire fewer admins, unless your customers like downtime…)
Watching the evolution of the cloud computing landscape, in the rush to bring new services to market or transition away from apparently disrupted business models, I believe many organizations may unnecessarily learn this lesson the hard way. The proper care and feeding of the infrastructure better be a core competency for those who intend to compete ‘as a Service’ at any level. The operational differentiators have as much to do with process and culture as they do with technology, but doing them well could be the difference between business success and failure. I believe this is difficult to retrofit, especially at scale.
So what should you do? Start by trying to understand what your costs look like today, and I’ll follow up with perspectives and resources that might help with operational efficiency, at any scale. Operations can be a competitive advantage, but only for those organizations who have made the investments in both the people and the infrastructure.
