fallenpegasus: (Default)
Last night, I pushed out a formal release of my S3 tools.

The big thing is that it's now a Perl module, what with an installer and namespace in CPAN and everything.

It has a Freshmeat page. You can grab the tarball here, or check it out of Mercurial here.

It should soon show up in CPAN as Net::Amazon::S3::Tools.
fallenpegasus: (Default)

It's not much of an SLA, but it's there. It reaches only to Amazon's network edge, and you have to actually ask for a reimbursement if you are owed one.
fallenpegasus: (Default)
I know that someone has already written a distributed multimountable filesystem for S3. But it's commercial and closed source.

I've not looked at how it works. But I've been thinking...

There already exist filesystems that are based on preallocated extents, and filesystems that are based on immutable extents. One can combine the two and build a filesystem on top of S3, like so...

Each inode structure is an S3 item. Also, each extent is likewise an S3 item. Actually, they are a sequence of S3 items, because they will be versioned. Every time an inode is changed, or an extent is modified, what actually happens is a new one gets written to S3, and the item names for them have a delimited suffix with the version number.
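A minimal sketch of the versioned naming idea, in Python. Everything here is an assumption made for illustration: the `.v` delimiter, the zero-padded width, the function names, and the plain dict standing in for an S3 bucket. It also ignores the case where two hosts pick the same next version number at the same moment.

```python
# Hypothetical naming scheme: "<base>.v<version>", zero-padded so that
# lexicographic key order matches version order in a bucket listing.
DELIM = ".v"
WIDTH = 10


def versioned_key(base, version):
    """Build the item name for one version of an inode or extent."""
    return f"{base}{DELIM}{version:0{WIDTH}d}"


def split_key(key):
    """Split an item name back into (base, version)."""
    base, _, suffix = key.rpartition(DELIM)
    return base, int(suffix)


def write_new_version(bucket, base, data):
    """Never overwrite: store data under the next unused version of base."""
    versions = [split_key(k)[1] for k in bucket if split_key(k)[0] == base]
    key = versioned_key(base, max(versions, default=0) + 1)
    bucket[key] = data
    return key


def latest_key(bucket, base):
    """Return the newest item name for base, or None if there is none."""
    keys = sorted(k for k in bucket if split_key(k)[0] == base)
    return keys[-1] if keys else None


bucket = {}  # stand-in for an S3 bucket: item name -> bytes
old = write_new_version(bucket, "inode/42", b"first")
new = write_new_version(bucket, "inode/42", b"second")
# A reader holding `old` keeps seeing b"first" until it looks up latest_key().
```

Because old versions are never overwritten, a host that is still reading `old` is unaffected by the write that produced `new`.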

This allows multiple hosts to mount the filesystem read-write, without being incoherent, and without needing a "live" distributed lock manager. If a host has it mounted and is reading from some extent, and some other host writes to that extent, the first host will keep reading from the old one.

On a regular basis, such as on sync, a host will issue a list request against all the extents and inodes it is using. It will thus discover any updated ones, and act accordingly.

Also, each host will write a "ping" item, probably at every such sync. Something can monitor the bucket and delete all obsoleted extents and inodes that are older than the newest ping of the farthest-behind mounting host.
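The reaper rule can be sketched as below. This is one conservative reading, not a worked-out design: a version is treated as deletable only once the version that replaced it was written at or before the cutoff, so every mounting host has synced since the replacement appeared. The function names and the data shapes (plain dicts and integer timestamps) are made up for the example.

```python
def reap_cutoff(pings):
    """pings maps host -> list of ping timestamps.

    Returns the newest ping of the farthest-behind host, or None if no
    host has pinged yet.
    """
    newest_per_host = [max(stamps) for stamps in pings.values() if stamps]
    return min(newest_per_host) if newest_per_host else None


def reapable(versions, cutoff):
    """versions maps base name -> list of (version, written_at) pairs.

    Returns the set of (base, version) pairs that are safe to delete.
    The newest version of each base is never returned.
    """
    doomed = set()
    if cutoff is None:
        return doomed
    for base, pairs in versions.items():
        ordered = sorted(pairs)
        # Pair each version with the write time of the one that replaced it.
        for (version, _), (_, replaced_at) in zip(ordered, ordered[1:]):
            if replaced_at <= cutoff:
                doomed.add((base, version))
    return doomed


pings = {"host-a": [10, 50], "host-b": [20, 30]}
versions = {"inode/1": [(1, 5), (2, 25), (3, 40)]}
cutoff = reap_cutoff(pings)           # host-b is farthest behind
print(reapable(versions, cutoff))     # only version 1 qualifies
```

Version 2 survives here because host-b last synced before version 3 was written and may still be reading version 2.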

If instead old extents are not deleted after they are obsoleted, it would in fact be possible to mount the filesystem read-only as it appeared at time X, for any arbitrary time between "now" and "just ahead of the reaper process".
fallenpegasus: (Default)
Things are always harder and take longer.

I have s3getacl working and documented.

$ ./s3getacl example
# bucket: example
# owner: 5a1568e09392dad4b4ceb54f29f0a64d651a531350d6f720fbd2367eed995f08
$ ./s3getacl example/thingee
# bucket: example
# item: thingee
# owner: 5a1568e09392dad4b4ceb54f29f0a64d651a531350d6f720fbd2367eed995f08
$ _

It uses Perl's XML::Simple. I'm thinking I'm going to need a more sophisticated Perl XML module to write the next step, s3setacl. If I do, I'll probably go back and change s3getacl to use it too.

I'm going to push this all up to my Mercurial repository soon, for other people to use.
fallenpegasus: (Default)
So, there was a command line toolset for manipulating S3, called the Hanzoarchives hanzo-s3-tools. They were kind of iffy in the installation and documentation, but they worked, more or less. However, their handling of S3 ACLs was pretty poor, very not useful. And as the last straw, it seems to have gone offline and vanished.

On the other hand, there is a really good Perl library, Net::Amazon::S3. Many of my clients are more or less Perl shops, so I have been recommending that they install it. But that module lacks the scripts that bring functionality out to the command line.

So I'm writing them. Today, I've got s3mkbucket and s3rmbucket working, complete with handling of ACLs and TLS. Tomorrow will be s3putacl and s3getacl, and quickly following will be s3put, s3get, s3head, s3rm and s3ls.

All open source, of course. It will be available in my public repository, and I'll see if I can get the Net::Amazon::S3 maintainer to accept it as a patch. If he doesn't, I'll just make a Freshmeat project, and advertise it that way.
fallenpegasus: (Default)
Brian has written a library that implements "XML Row Storage", and has added it to his HTTP Storage Engine.

I've got a couple of hours to add it to my S3 Storage Engine.

I've already stumbled over the fact that my engine as it was won't build against a very recent MySQL 5.1 source tree. Type byte is now uchar, and the call parameters of get_server_by_name have changed.

A contract

May. 31st, 2007 05:10 pm
fallenpegasus: (Default)
Yesterday morning, I had an initial face-to-face meeting with a prospective client. We met at the Victrola Cafe. He's putting together a pretty neat Web2.0 custom app. My first task for him will be writing a little process that monitors a particular public database and writes changes into S3, to start capturing a historical timeline. There is a great deal of additional AWS, DB, and web client programming work I am qualified to do for him.

Today, I pulled together a software development contractor contract, rewrote a bit of it to be aware of open source licensing "stuff", and then emailed it off to him, along with my rate and time estimate.
fallenpegasus: (Default)
Immediately after my presentation about my MySQL S3 storage engine, I was interviewed by Eric Lai, a reporter from ComputerWorld. The resulting article is here, and was abstracted by the conference here.

Also, Sheeri Kritzer recorded me. Here is the audio and the video.

When I get the larger form of the video, I'll push it into an S3 bucket/item and make it world readable...
fallenpegasus: (Default)
My talk was right after the PBXT storage engine talk. PBXT is darncool, it looks to be a leading contender for fast localstore nearly transactional data with lots of blobs and varchars. I was taking lots of notes, especially his "gotchas for storage engine developers" and his "streaming blobs" ideas.

Then a whole bunch of people came in for my talk, almost filling the room. There were a few startup problems with the projector and the mike, and we were off.

Unfortunately, I was scheduled opposite a talk on "Highly Available MySQL Cluster on Amazon EC2", which was annoying because I wanted to go to their talk, they wanted to come to mine, and there were likely a lot of people who wanted to go to both.

It went well. I lagged away from my slides only a few times. The questions at the end were useful, and so was the row of questioners afterwards. Someone from, I think, Computer World, did a fast interview. MySQL gifted me, as they did all the speakers, with a PowerSquid, which I just realized that I left in the presentation room.
fallenpegasus: (Default)
I've described this idea to a few people, but I figured I would post it here.

I've had an idea for using Amazon AWS S3 to distribute MySQL cluster replication data.

The existing architecture for MySQL clustering is as follows:
  1. The master has N slaves
  2. The master copies each binlog replication to each slave. If there are 7 slaves, then the master has to push the same data out its network pipe 7 times.
  3. The slave has a hot TCP connection to the master. The master has 7 hot TCP connections, one for each slave.
  4. The slave takes each replication chunk and applies it.

Here is my idea.
  • For each replication chunk, the master creates a handle name for it, and also the handle name for the next chunk.
  • The master copies each chunk into an S3 item, once. The item's name is its handle, and it has a piece of S3 metadata that is the handle of the next chunk.
  • Each slave tails the bucket's item list, and grabs each chunk in turn. After it has applied that chunk, it writes a short item back to the bucket, stating that it has applied the chunk.
  • A low priority reaper watches the bucket, and when every registered slave marks a given chunk as applied, the reaper deletes the chunk.
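The scheme above can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not an implementation: a dict stands in for the S3 bucket, the `chunk/` and `ack/` key prefixes and all class and function names are invented for the example, and each slave follows the "next handle" metadata rather than tailing the bucket listing.

```python
import uuid


class Master:
    """Publishes each replication chunk to the bucket exactly once."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.next_handle = self._new_handle()
        self.first_handle = self.next_handle

    @staticmethod
    def _new_handle():
        return "chunk/" + uuid.uuid4().hex

    def publish(self, chunk):
        # The handle for this chunk was chosen when the previous one was
        # written; now choose the one for the chunk that will follow.
        handle, self.next_handle = self.next_handle, self._new_handle()
        self.bucket[handle] = (chunk, {"next": self.next_handle})
        return handle


class Slave:
    """Follows the chain of handles, applying and acknowledging chunks."""

    def __init__(self, name, bucket, start):
        self.name, self.bucket, self.cursor = name, bucket, start
        self.applied = []

    def poll(self):
        while self.cursor in self.bucket:
            chunk, meta = self.bucket[self.cursor]
            self.applied.append(chunk)  # stand-in for applying the binlog
            self.bucket[f"ack/{self.name}/{self.cursor}"] = (b"", {})
            self.cursor = meta["next"]


def reap(bucket, slave_names):
    """Delete every chunk that all registered slaves have acknowledged."""
    for handle in [k for k in bucket if k.startswith("chunk/")]:
        acks = [f"ack/{name}/{handle}" for name in slave_names]
        if all(ack in bucket for ack in acks):
            for key in [handle, *acks]:
                del bucket[key]


bucket = {}
master = Master(bucket)
east = Slave("east", bucket, master.first_handle)
west = Slave("west", bucket, master.first_handle)
master.publish(b"event 1")
master.publish(b"event 2")
east.poll()                       # west is partitioned and falls behind
reap(bucket, ["east", "west"])    # nothing deleted yet: west has not acked
west.poll()                       # west catches up from the bucket
reap(bucket, ["east", "west"])    # now both chunks and their acks go away
```

Note that the master never learns how many slaves exist; only the reaper needs the list of registered slaves.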

The advantages are
  • The master only has to write the chunk out to the network once. There is no increased load when the number of slaves is increased.
  • The slaves can be very geographically dispersed without additional pain.
  • The master and the slave don't need hot TCP connections, VPN connections, or firewall configurations.
  • If the network partitions for a while, the slave falls behind, but will resync without pain. Also, a network partition doesn't crash the master when its binlog space is exhausted.
fallenpegasus: (Default)
I just pushed out a significant release of the Amazon AWS S3 MySQL storage engine. There are significant stability, performance, debugging, and feature improvements.

Available via


Unless there are any showstoppers, this is the one that is going to the MySQL Expo next week, to be presented on Wednesday.


It allows one to view and manipulate Amazon's S3 storage service as tables and items from MySQL. You can keep your blobs or large varchars or truly huge datasets in S3, and then join the tables against your local ones.

If you're a MySQL geek, please download it, install it, try to crash it.
Feedback and bug reports are most welcome.
fallenpegasus: (Default)
I've made a lot of additions and updates to the "MySQL Storage Engine for Amazon AWS S3". Available, as always, via hg.
fallenpegasus: (Default)
It works!

SELECT s3id from pierce;
| s3id |
992 rows in set (25 min 51.73 sec)
mysql> _

Ok, so it needs to be made a bit faster, but I have some ideas for that...
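For a sense of scale, here is the arithmetic on the figures in that output (nothing measured separately, just the row count and elapsed time shown above):

```python
rows = 992
elapsed_seconds = 25 * 60 + 51.73   # 25 min 51.73 sec

seconds_per_row = elapsed_seconds / rows
print(f"{seconds_per_row:.2f} seconds per row")   # about a second and a half
```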
fallenpegasus: (Default)
I just pushed out version 0.03 of the awss3 storage engine.

There should be no user-visible changes; it's all under-the-hood stuff.

Available via hg and tarball.
fallenpegasus: (Default)
Wow. 160 people in del.icio.us have bookmarked my project.
fallenpegasus: (Default)
I just got my first bug report for the mysql-awss3 project. Embarrassing bug too. So version 0.02 is now available from hg and tarball.
fallenpegasus: (Default)
My project has been linked to Digg.
fallenpegasus: (Default)
I've been linked to DZone.com. DZone appears to be like furl and digg, where people submit links and other readers vote them up or down.
fallenpegasus: (Default)
I just got mentioned on the official AWS blog:

Independent developer Mark Atwood has been working on a MySQL interface to Amazon S3. Released under the GNU Public License, the code is compatible with version 5.1 of MySQL. Once the interface has been installed and configured with your AWS developer credentials, you can now create tables using the AWSS3 storage engine like this:

CREATE TABLE atst (s3id VARCHAR(255) NOT NULL PRIMARY KEY, s3val BLOB) ENGINE='AWSS3' connection='awss3 bucketname aws_id aws_secret'

This is a bleeding-edge, first-cut release and, as is the case with popular open source projects, will undoubtedly evolve and mature rapidly over the coming weeks and months.

Based on Mark's S3 journal entries, the basic functionality is now in place. Each database table row is stored in an S3 object. The object's S3 key corresponds to the table's primary key (which must be of type VARCHAR). Inserts, deletes, and selects are functional.
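To make the row-to-object mapping concrete, here is a minimal sketch in Python. It is not the engine's code: the `S3Table` class and its method names are hypothetical, and a dict stands in for the bucket. It only illustrates the idea that the primary key is the S3 key and the row's value is the object body.

```python
class S3Table:
    """Toy table whose rows live as objects keyed by primary key."""

    def __init__(self, bucket):
        self.bucket = bucket  # stand-in for an S3 bucket: key -> bytes

    def insert(self, s3id, s3val):
        # INSERT: one object per row, named after the primary key.
        if s3id in self.bucket:
            raise KeyError(f"duplicate primary key: {s3id!r}")
        self.bucket[s3id] = s3val

    def delete(self, s3id):
        # DELETE: remove the object; report whether a row was affected.
        return self.bucket.pop(s3id, None) is not None

    def select(self, s3id=None):
        # SELECT by primary key is a single lookup; without a key it is
        # a listing of the bucket followed by a fetch of every object.
        if s3id is not None:
            return [(s3id, self.bucket[s3id])] if s3id in self.bucket else []
        return sorted(self.bucket.items())


table = S3Table({})
table.insert("thingee", b"some blob")
table.insert("another", b"more data")
print(table.select("thingee"))
print(table.select())
```

The difference between the two `select` paths is why a lookup by primary key is cheap while a full scan costs at least one request per row.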


fallenpegasus: (Default)
Mark Atwood

Page generated Apr. 24th, 2017 07:08 pm