fallenpegasus: (Default)
Six months ago, right around the O'Reilly MySQL Conference, my previous employer, Gear6, suffered an "unfortunate cash flow event". That is, they ran out of money faster than their sales grew. Which is too bad; it was a good company with good and useful products, and it was staffed with good people. I appreciate the honest and ethical dealings of the board and the executive staff, who kept us, the staff, "in the light" as the situation developed, and did things like paying out accumulated vacation time. No bounced paychecks, unpaid expense reports, or surprise locked doors.

I spent the time working on personal projects, preparing for and going to Burning Man, studying up more on open source community management, digging more into cloud computing, and interviewing at a number of interesting companies.

And now, as of November 1st, I have a new gig. I am the Community Manager for the open source company Eucalyptus Systems.

Eucalyptus is based in Santa Barbara. I will remain based in Seattle, will travel down to the offices regularly, and will be travelling to conferences.

My first conference in my new corporate livery will be the second OpenStack Design Conference, next week in San Antonio.
fallenpegasus: (Default)
Once someone starts using memcached, they quickly tend to find themselves in this state: "my database servers overload and my site goes down if memcached stops working." This isn't really surprising; quite often memcached was added to the stack precisely because the database servers were melting under the load of the growing site.

But then they face an issue that is, as mathematicians and programmers like to call it, "interesting".

"How do I add more capacity without an outage?"

At first, most people just live with that outage. Most systems have regularly scheduled downtimes, and during them the memcached cluster can be shut down, more storage nodes added, and everything restarted, with the distributed hash recomputed for the new number of nodes.
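To see why that restart is so disruptive, here is a minimal Python sketch, assuming the common simple scheme in which a client picks a node with `hash(key) % number_of_nodes` (the post does not say which placement scheme any particular client uses). Under that assumption, going from n to n+1 nodes changes the owner of about n/(n+1) of all keys, so nearly the whole cache is effectively cold afterwards. The key names are hypothetical.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    # Naive modulo placement: hash the key, then take the remainder.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    # Share of keys whose owning node differs between the two cluster sizes.
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(50_000)]  # hypothetical key names
for n in (4, 10, 20):
    print(f"{n} -> {n + 1} nodes: {fraction_moved(keys, n, n + 1):.1%} of keys move")
```

The printed fractions should land near 80%, 91%, and 95%, i.e. close to n/(n+1) for each step.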

Ironically, the more successful the site is and the more it grows, the more costly that outage becomes. Nor does the cost grow linearly with the site; it grows more like the square of that growth, until the site literally cannot afford the outage at all. As the cache gets bigger, it takes longer to rewarm, from minutes to hours to days. And as the user base grows, more people suffer the poor experience of a cache that is still warming up. The product of those two values is the "cost" of the outage. This is bad for user satisfaction, and thus bad for retention, conversion, and ultimately revenue.
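The "square of the growth" claim follows from a simple model. As a rough sketch, assuming both the rewarm time and the number of users who hit the cold cache scale linearly with a growth factor g, the cost is (T0·g)·(U0·g) = T0·U0·g². The baseline figures below are arbitrary placeholders, not measurements.

```python
def outage_cost(growth: float, base_rewarm_hours: float = 1.0, base_users: float = 1_000.0) -> float:
    """Toy model: outage cost = rewarm time x users exposed to the cold cache."""
    # Assumption: rewarm time grows linearly with cache size, which grows with the site.
    rewarm_hours = base_rewarm_hours * growth
    # Assumption: the number of affected users also grows linearly with the site.
    affected_users = base_users * growth
    # Result is in "user-hours" of degraded service.
    return rewarm_hours * affected_users


for g in (1, 2, 4, 8):
    print(f"growth x{g}: cost {outage_cost(g):,.0f} user-hours")
```

Doubling the site quadruples the cost in this model; real rewarm curves are messier, but the two factors still multiply.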

This can be especially frustrating in a cloud environment. In a physical datacenter, because you have to actually buy, configure, and install the hardware for a node, it somehow feels easier to justify needing an outage to add it to the cluster. But in a cloud, you can start a new node with the click of a button, without getting a purchase approval or filing a change plan. And in cloud environments we have all been evangelizing "dynamic growth", "loosely coupled components", and "design for 100% uptime in the face of change". And yet here is this very basic component, the DHT KVS cluster, that doesn't easily work that way.

There are ways to resize a DHT cluster while it is live, but doing so is an intricate and brittle operation that requires the co-operation of all the clients, and there are no really good open source tools that make it easier. You have to develop your own bespoke operational processes and tools to do it, and they are likely to miss surprising edge cases that are learned only through painful experience and very careful analysis. Which means that the first couple of times you try to resize your memcached cluster without a scheduled outage, you will probably have an unscheduled outage instead. Ouch.

One commonly proposed variety of solution is to make the memcached cluster nodes themselves more aware of each other and of the distributed hash table. Then you can add a new node (or remove a failed one), and the other nodes will tell each other about the change, and they all work together to recompute the new DHT, flow items back and forth to each other, put items into the new node, and try to keep this all more or less transparent to the memcached clients with some handwaving magic of proxying for each other.
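Here is a minimal sketch of what "recompute the new DHT" and "flow items back and forth" can mean, assuming a consistent-hash ring with virtual nodes. This is a common technique used for illustration; the post does not say this is how Dynamic Services places items. On such a ring, a new node only takes over the segments just before its own points, so roughly 1/(N+1) of keys move, and every moved key goes to the new node. The node and key names are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # 64-bit integer taken from an MD5 digest; enough spread for a sketch.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (point_hash, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Owner is the first point at or after the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


def migration_plan(keys, old_ring: HashRing, new_ring: HashRing) -> dict:
    """Map each key whose owner changes to (source_node, destination_node)."""
    plan = {}
    for key in keys:
        src, dst = old_ring.node_for(key), new_ring.node_for(key)
        if src != dst:
            plan[key] = (src, dst)
    return plan


old = HashRing(["cache-a", "cache-b", "cache-c"])
new = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
keys = [f"user:{i}" for i in range(10_000)]
plan = migration_plan(keys, old, new)
print(f"{len(plan) / len(keys):.1%} of keys move")
# Every moved key lands on the new node; nothing shuffles between old nodes.
print(sorted(set(dst for _, dst in plan.values())))
```

Because the old ring's points are a subset of the new ring's, nodes only need to hand items to the newcomer, which is what makes a live resize tractable compared with modulo placement.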

And that, more or less, is what Gear6 has just done, under the name "Dynamic Services". We have released it first in our Cloud Cache distribution, initially on Amazon AWS EC2 and then on other cloud infrastructure systems. Next, it will be in our software and appliance distributions.

This is especially useful and neat in a cloud environment, because the very act of requisitioning and starting a new node is something the underlying infrastructure can provide. So you can go to the Gear6 Cloud Cache Web UI and ask it to expand the memcached cluster. That management system will interface with the EC2 API, spin up more Gear6 memcached AMIs, and, once they are running, add them to the cluster and then rehash the DHT. All while the cluster is serving live data.
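The expand-while-serving flow described above might look roughly like the following Python sketch. It is not Gear6's Dynamic Services code, and everything in it is hypothetical: `launch_node` stands in for whatever provisioning call starts a new instance (for example, through the EC2 API), each node is a plain in-memory dict, and keys are placed with a small consistent-hash ring. While items are being moved, a read that misses on a key's new owner falls back to its previous owner, one simple way to approximate nodes "proxying for each other".

```python
import bisect
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

Ring = List[Tuple[int, str]]  # sorted (point_hash, node_name) pairs


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(names: List[str], vnodes: int = 100) -> Ring:
    # Each node owns many virtual points on the ring to spread load evenly.
    return sorted((_hash(f"{n}#{i}"), n) for n in names for i in range(vnodes))


def owner(ring: Ring, key: str) -> str:
    # First ring point at or after the key's hash, wrapping around the end.
    idx = bisect.bisect(ring, (_hash(key), ""))
    return ring[idx % len(ring)][1]


class Cluster:
    """Toy in-memory cache cluster that can grow without a restart."""

    def __init__(self, names: List[str]):
        self.stores: Dict[str, Dict[str, str]] = {n: {} for n in names}
        self.ring: Ring = build_ring(names)
        self.old_ring: Optional[Ring] = None  # set only while a resize is in progress

    def set(self, key: str, value: str) -> None:
        self.stores[owner(self.ring, key)][key] = value

    def get(self, key: str) -> Optional[str]:
        value = self.stores[owner(self.ring, key)].get(key)
        if value is None and self.old_ring is not None:
            # Mid-resize: the item may not have moved off its previous owner yet.
            value = self.stores[owner(self.old_ring, key)].get(key)
        return value

    def expand(self, count: int, launch_node: Callable[[int], str]) -> None:
        # 1. Provision new nodes (launch_node is a hypothetical provisioning hook).
        for i in range(count):
            self.stores[launch_node(i)] = {}
        # 2. Recompute the ring, remembering the old one for fallback reads.
        self.old_ring, self.ring = self.ring, build_ring(list(self.stores))
        # 3. Move each item whose owner changed; a get() issued between moves
        #    would still find the item through the old_ring fallback.
        for node, store in list(self.stores.items()):
            for key in list(store):
                dest = owner(self.ring, key)
                if dest != node:
                    self.stores[dest][key] = store.pop(key)
        # 4. Migration done: stop consulting the old ring.
        self.old_ring = None


cluster = Cluster(["cache-a", "cache-b"])
for i in range(1_000):
    cluster.set(f"k{i}", f"v{i}")
cluster.expand(2, lambda i: f"cache-new-{i}")
print(all(cluster.get(f"k{i}") == f"v{i}" for i in range(1_000)))
```

The usage lines fill two nodes, grow to four, and check that every item is still readable. A real system would also have to deal with concurrent writes, failures partway through a move, and clients still holding the old node list, which is where the edge cases mentioned earlier bite.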


(This entry was originally posted at my Gear6 corporate blog. Please comment there.)
fallenpegasus: (Default)
When I first started working with Gear6, I pushed for bundling a version of our memcache appliance as an AWS EC2 AMI. We had already done much of the work, both by making a trial version available as a VM image and through our current effort on a "Universal Distro", which can run on any qualified piece of server hardware.

Because of that existing effort, and because of the nimbleness and skill of my coworkers here at Gear6, we were able to move from proposal to first release in only a few weeks.

We are releasing our first version of the Gear6 Web Cache for the Cloud today on Amazon EC2, at ami-2411f34d for 32-bit and ami-2611f34f for 64-bit. The 32-bit version is "free": you pay only Amazon's EC2 charge. The 64-bit version (which can cache much more, return results faster, and handle more client connections) is linked to Amazon DevPay, so you will also pay some money to Gear6. But it works out that the "gigabyte hour" cost of the new "high memory" EC2 instance types is actually less than that of the smaller, cheaper, "free" 32-bit size.
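The "gigabyte hour" comparison is simple arithmetic: divide the total hourly price of an instance by how many gigabytes of cache it actually provides. A minimal sketch follows. The prices and sizes in the example are hypothetical placeholders, not actual EC2 or DevPay rates, and `usable_cache_gb` is an assumed input (memory left for cache after OS and server overhead).

```python
def cost_per_gb_hour(ec2_hourly: float, usable_cache_gb: float, devpay_hourly: float = 0.0) -> float:
    """Total hourly price divided by gigabytes of cache the instance provides."""
    if usable_cache_gb <= 0:
        raise ValueError("usable_cache_gb must be positive")
    # For the 64-bit AMI, the DevPay charge is paid on top of Amazon's EC2 charge.
    return (ec2_hourly + devpay_hourly) / usable_cache_gb


# Hypothetical placeholder numbers, NOT real EC2 or DevPay prices.
small_32bit = cost_per_gb_hour(ec2_hourly=0.10, usable_cache_gb=1.0)
large_64bit = cost_per_gb_hour(ec2_hourly=1.00, usable_cache_gb=30.0, devpay_hourly=0.20)
print(f"small: ${small_32bit:.3f}/GB-hour, large: ${large_64bit:.3f}/GB-hour")
```

With numbers shaped like these, the larger instance costs more per hour but less per gigabyte-hour, which is the effect described above.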

Jeff Barr at Amazon AWS just blogged about it. The press releases and tech press articles come out today.

It's been a learning experience for me. I got to deal more with the tech press, and learned more about how publicity and press releases work.

I also got to learn a lot more, the hard way, about Gear6's internal development and build processes. And I broke the build, several times, on integration day. I need to get better at testing.

But I'm excited, and I hope this goes well.

Mark Atwood

Page generated Jun. 19th, 2013 01:59 am
Powered by Dreamwidth Studios