fallenpegasus: (Default)
I've ported my AWS S3 storage engine to Drizzle.

The source is at bzr branch lp:~fallenpegasus/drizzle/awss3. Pull it and build it like you would the main Drizzle. The engine is built in; there is no need to load it as a plugin.

Example use:

   CREATE TABLE colors (
        nam VARCHAR(255) NOT NULL PRIMARY KEY,  -- column details illustrative
        val VARCHAR(255))
        ENGINE=AWSS3
        CONNECTION='awss3 bucket_name aws_id aws_secret';
   SELECT val FROM colors WHERE nam='BlueViolet';

I will try to keep it tracking the main Drizzle dev tree.
fallenpegasus: (Default)
I've just checked in a small update to the s3-tools

I've removed the dependency on the Perl package XML::Simple, and replaced it with calls to XML::LibXML, which will have already been loaded because Net::Amazon::S3 depends on it.

I really dislike doing what should be a minor update via CPAN, and having a cascading set of added dependencies cause CPAN to pull in the whole world. So I shall swim upstream, and remove unneeded dependencies.


The tarball can be had at http://fallenpegasus.com/code/s3-tools

The Mercurial repo is at http://hg.fallenpegasus.com/s3-tools
fallenpegasus: (Default)
In a recent TechCrunch article about Amazon Web Services, it's revealed that "the biggest customers in both number and amount of computing resources consumed are divisions of banks, pharmaceuticals companies and other large corporations who try AWS once for a temporary project, and then get hooked."

This is not a big surprise to me. Sometime last year, at some random geeky event, I was explaining why I thought AWS was so cool. One of the people I was explaining it to worked for a local insurance company, and he got very excited. Apparently, at this company, they had huge stacks of compute servers that lay fallow 28 days out of the month, but when it came time to run the monthly math to recalculate various risk/reward models, they would max out their capacity and were constantly begging for more. They would very happily pay a premium to have those compute instances on demand, by the drink, and not have to pay for them the rest of the month.

I wouldn't be surprised if they are now one of those AWS customers.
fallenpegasus: (Default)

It's not much of an SLA, but it's there. It reaches only to Amazon's network edge, and you have to actually ask for it when you are owed a reimbursement.
fallenpegasus: (Default)
I know that someone has already written a distributed multimountable filesystem for S3. But it's commercial and closed source.

I've not looked at how it works. But I've been thinking...

There exist already filesystems that are based on preallocated extents, and filesystems that are based on immutable extents. One can combine the two, and build a filesystem that builds on S3, like so...

Each inode structure is an S3 item. Also, each extent is likewise an S3 item. Actually, they are a sequence of S3 items, because they will be versioned. Every time an inode is changed, or an extent is modified, what actually happens is a new one gets written to S3, and the item names for them have a delimited suffix with the version number.

This allows multiple hosts to mount the filesystem read-write without losing coherence, and without needing a "live" distributed lock manager. If a host has it mounted and is reading from some extent, and some other host writes to that extent, the first host will keep reading from the old version.

On a regular basis, such as on sync, a host will issue a list request against all the extents and inodes it is using. It will thus discover any updated ones, and act accordingly.
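The version-suffix naming and the sync-time discovery pass described above can be sketched like so. This is only an illustration of the scheme; the name format ("kind/number.version") and the helper names are my invention, not a spec.

```python
# Hypothetical sketch of the versioned item naming described above.
# Item names like "inode/17.000042" carry a delimited version suffix;
# a host discovers updates by listing the prefix and taking the
# highest version number present.

def item_name(kind, number, version):
    """Build a versioned S3 item name, e.g. 'inode/17.000042'."""
    return "%s/%d.%06d" % (kind, number, version)

def latest_version(listing, kind, number):
    """Given a bucket listing (a list of item names), find the newest
    version of one inode or extent, as a sync pass would."""
    prefix = "%s/%d." % (kind, number)
    versions = [int(name[len(prefix):]) for name in listing
                if name.startswith(prefix)]
    return max(versions) if versions else None
```

A host that cached version 1 of inode 17 would, on sync, list the `inode/17.` prefix, see a higher version, and reread that inode.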

Also, each host will write a "ping" item, probably at every such sync. Something can monitor the bucket, and delete all extent and inode versions that are older than the newest ping of the farthest-behind mounting host.

If instead old extents are not deleted after they are obsoleted, it would in fact be possible to mount the filesystem read-only as it appeared at time X, for any arbitrary time between "now" and "just ahead of the reaper process".
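The ping/reaper logic above can be sketched as follows. This is a minimal model, assuming pings and item writes share one version counter; the function names are hypothetical.

```python
# Toy model of the reaper described above: the "floor" is the newest
# ping of the farthest-behind mounting host, so no version written
# before the floor can still be in use by any mounter.

def reaper_floor(pings):
    """pings: {host: [versions of that host's ping items]}."""
    return min(max(versions) for versions in pings.values())

def reapable(item_versions, floor):
    """Of one item's versions, everything at or below the floor is
    obsolete EXCEPT the newest such version, which some slow host
    may still be reading."""
    safe = [v for v in item_versions if v <= floor]
    keep = max(safe) if safe else None
    return [v for v in safe if v != keep]
```

Keeping (rather than deleting) the reapable versions is exactly what enables the "mount as of time X" trick in the last paragraph.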
fallenpegasus: (Default)
Things are always harder and take longer.

I have s3getacl working and documented.

$ ./s3getacl example
# bucket: example
# owner: 5a1568e09392dad4b4ceb54f29f0a64d651a531350d6f720fbd2367eed995f08
$ ./s3getacl example/thingee
# bucket: example
# item: thingee
# owner: 5a1568e09392dad4b4ceb54f29f0a64d651a531350d6f720fbd2367eed995f08

It uses Perl's XML::Simple. I'm thinking I'm going to need a more sophisticated Perl XML module to write the next step, s3setacl. If I do, I'll probably go back and change s3getacl to use it too.
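For flavor, here is roughly what the ACL parsing step does, sketched in Python with the standard library's ElementTree instead of Perl's XML modules. The S3 `AccessControlPolicy` document carries an owner ID and a list of grants; the function name here is my own.

```python
# A rough sketch of what s3getacl's XML handling does: pull the owner
# ID and the grants out of an S3 AccessControlPolicy document.
import xml.etree.ElementTree as ET

NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

def parse_acl(xml_text):
    """Return (owner_id, [(grantee, permission), ...])."""
    root = ET.fromstring(xml_text)
    owner = root.findtext("%sOwner/%sID" % (NS, NS))
    grants = []
    for grant in root.iter(NS + "Grant"):
        grantee = grant.find(NS + "Grantee")
        # A grantee is identified by a canonical ID or, for groups, a URI.
        who = grantee.findtext(NS + "ID") or grantee.findtext(NS + "URI")
        grants.append((who, grant.findtext(NS + "Permission")))
    return owner, grants
```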

I'm going to push this all up to my Mercurial repository soon, for other people to use.
fallenpegasus: (Default)
So, there was a command line toolset for manipulating S3, called the Hanzoarchives hanzo-s3-tools. They were kind of iffy in their installation and documentation, but they worked, more or less. However, their handling of S3 ACLs was poor, to the point of uselessness. And as the last straw, the project seems to have gone offline and vanished.

On the other hand, there is a really good Perl library, Net::Amazon::S3. Many of my clients are more or less Perl shops, so I have been recommending that they install it. But that module lacks the scripts that bring functionality out to the command line.

So I'm writing them. Today, I've got s3mkbucket and s3rmbucket working, complete with handling of ACLs and TLS. Tomorrow will be s3putacl and s3getacl, and quickly following will be s3put, s3get, s3head, s3rm and s3ls.

All open source, of course. It will be available in my public repository, and I'll see if I can get the Net::Amazon::S3 maintainer to accept it as a patch. If it isn't accepted, I'll just make a Freshmeat project, and advertise it that way.
fallenpegasus: (Default)
I went to the keynotes, took some pictures, which I will post to flickr.

While walking to my first session of the day, I ran into one of the OLPC people. Carrying around those bright plastic devices is a great calling card. We chatted a bit, and he told me what evil underhanded stuff Intel, Microsoft, and the Gates Foundation had just pulled last month that seriously threatens to destroy the OLPC.

Now I'm in "Managing Technical Debt" by Andy Lester. Good stuff, careful notes. But mostly stuff I already know.

The exhibition floor. Someone has a Penguin Robot! $85. Runs Python, controlled by Python. Can be used as a VoIP phone. It reads email and rss feeds.

Someone is showing off their 3D printer in the lobby. It can actually print its own parts. The guy says that version 1.1 will be completely self-fabricated. It will need a couple of bucks of cheap electronic chips, and then clever monkey paws to do final assembly, but otherwise is completely self-replicating.

Session "Xen Image Manager" by Jonathan Oxer. Must keep in touch with this project. It could evolve into, or point the way to, an open source gridOS.

Coordinated with [livejournal.com profile] snippy via SMS to meet for dinner at 6pm. I was originally going to couchsurf at their place this week, but then scored crashspace at the hotel.

Session "Google Summer of Code" with Chris DiBona and Leslie Hawthorn.

Dinner with [livejournal.com profile] snippy and [livejournal.com profile] sinanju. They picked me up, and took me out to a German place, for sausages and lentil soup.

BOF session for AWS EC2/S3. BOF session for MySQL Community.

Stayed up till midnight hacking.
fallenpegasus: (Default)
Sunday afternoon, I took the Amtrak from Seattle down to Portland. I think from now on, if I want to go to Portland, that will be the way I will go. It's about as fast as driving, a lot less stressful, it's cheaper (looking at the cost of gasoline), and I have a power outlet. And no damn TSA to deal with.

Portland is a lot like Seattle, only with trains and more hippies.

The O'Reilly folks are helpful and friendly.

Monday, I had wanted to do the morning session on Xen. But it had been cancelled. Foo!

So instead I went to the "Code Like a Pythonista: Idiomatic Python" by David Goodger. It was very cool, and I improved my Python skillz just from watching his examples.

That afternoon, I went to "A Taste of Haskell" by Simon Peyton-Jones. I picked it because I knew almost nothing about it, except that it's something very different from the CS research world that had made the jump to actual use. It made my head hurt, and I want to learn more about it.

Afterwards, at the end of the session, Nathan Torkington said hello, because I had asked a Perl related question. ("Is there anything like CPAN for Haskell?") Behind him was Larry Wall. I had to tell them that while I used to be a heavy Perl user and worked in a very heavy Perl-only shop, now my language of choice is Python.

That evening, I went to a keysigning BOF, and increased my meshing into the GPG web of trust, and also picked up id points from Thawte and from CAcert.

That night I went to dinner with Brian Aker of MySQL, Rasmus Lerdorf of Yahoo, and Rob Lanphier of Linden Lab.

Tuesday, I attended the morning session "OpenID Bootcamp" by Simon Willison and David Recordon. I didn't learn much new about OpenID itself, but I did learn about Jyte.com and more about ClaimID.com.

I had lunch with the OpenID guys. David gave me a "Verisign Identity Protection" fob. PayPal sells them for $5, Verisign sells them for $30. They probably cost a quarter each in quantity from the manufacturer. I then set up my PayPal account and eBay account to use it. I am annoyed that my bank and my credit card web accounts don't use it, and am annoyed that Verisign makes it difficult and expensive to be a VIP RP, when they should be making it cheap and easy.

That afternoon, I went to "Simple Ways To Be a Better Programmer" by Michael G. Schwern. There wasn't much new there for me, but it was interesting to see it all together in one place. Part of it was about code, part was about increasing your own productivity, part was how not "to be an asshole", and part was about peopleware.

After that, I went to the AWS S3/EC2 BOF. Interestingly, most people were there to learn about it, and I was the only one with both experience and opinions and advice. So I ended up being an impromptu speaker/moderator. I got a lot of business cards, and had productive exchanges with Renat Khasanshyn of Apatar, whom I had met at MySQLCon, and with Kimbro Staken of JumpBox.

That evening, I went to the MySQL party. Several people from O'Reilly helped me navigate the train system. At the party I met Kaj Arnö, which was productive and hopefully profitable. Then Kaj and Monty treated me to a particular Finnish drink called Salmiakki Koskenkorva. I liked it; but then, I got my taste for dark black licorice from my mother.

After that, I went back to the convention center, and hung out with Julian Cash, Rob Lanphier, and Robert Kaye of MusicBrainz. Fifteen years ago, I was annoyed when CDDB took all the data that I and many other people had shared together, and basically stole it to start GraceNote. Robert Kaye was so annoyed that he started MusicBrainz with the goal of smashing them. It had apparently been a threadbare operation until recently, when Google started buying his data feed.
fallenpegasus: (Default)
[livejournal.com profile] krow just posted about the difficulties of implementing a Queue Engine for MySQL.

I don't think it's really that impossible. Yes, there are some kinds of SELECTs that either don't make sense, are almost certainly not what the user wants, or are impossible to do.

But trying to use a very specialized engine for general purpose queries is not really something to worry about.

I would take the same approach that I do with my S3 storage engine, e.g., implement whatever makes sense, and then for the cases where it doesn't make sense, tell the user "So, don't do that, then!".

For a queue engine, I would separate the operations of "get next item" and "delete":

SELECT id,message FROM queue LIMIT 1;

DELETE FROM queue WHERE id="foo";

But then, I've been thinking about how to wrap the Amazon AWS SQS service up as a storage engine, and this approach fits with SQS reasonably well.
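The get-next/delete split above maps naturally onto SQS-style semantics: receiving a message hides it from other consumers but does not remove it, and a separate delete acknowledges it by id. Here is a toy in-memory model of that split; the class and method names are mine, purely for illustration.

```python
# Toy model of the "get next item" / "delete" split described above,
# roughly mirroring SQS semantics: a received message goes in-flight
# (hidden from other receivers) but stays in the queue until deleted.
class Queue:
    def __init__(self):
        self.messages = {}    # id -> message body
        self.in_flight = set()

    def put(self, mid, message):
        self.messages[mid] = message

    def get_next(self):
        """SELECT id, message FROM queue LIMIT 1 -- receive one message."""
        for mid in self.messages:
            if mid not in self.in_flight:
                self.in_flight.add(mid)
                return mid, self.messages[mid]
        return None

    def delete(self, mid):
        """DELETE FROM queue WHERE id='...' -- acknowledge and remove."""
        self.messages.pop(mid, None)
        self.in_flight.discard(mid)
```

A real SQS wrapper would also need a visibility timeout, so that an in-flight message reappears if the consumer never deletes it.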


Mark Atwood
Page generated Feb. 19th, 2017 11:30 pm
Powered by Dreamwidth Studios