fallenpegasus: amazon (Default)
I was working on a highly constrained consumer electronics device, a little "satellite device" that spoke to the main device over a CATV RF coax cable and also received commands from an IR remote control. My code was failing in bizarre ways. I adopted an extremely paranoid defensive programming stance, filling my code with asserts and doing paranoid cross checking of all inputs. This didn't make the device work. Instead, it now failed consistently rather than inconsistently, because the cross checks and asserts would usually (but not always) trip before it would crash. It also started to run out of memory because of all the paranoia code I had added.

I asked for the source code for the driver for the IR receiver, and for the driver for the CATV RF digital transceiver, and for the peer code that ran on the main device and drove its end of the digital cable link.

The driver for the CATV RF digital transceiver was handed to me the first time I asked. And by "handed to me" I mean that I was pointed to where it was sitting in the source repo.


The business partner / hardware supplier who was supplying the IR glue and drivers, after giving me a runaround, finally just flat out refused, citing trade secrets, confidentiality, secret sauce, and similar bullshit.

So, I finally "stole" the source code with a disassembler. And found the sources of many of my problems. It was complete shit. "Unexpected" input from the silicon would cause wild random pointer writes. And random sunlight on the receiver optics was enough to produce such input. "Expected" input of undefined remote commands wasn't much better, generating and handing back blocks of garbage with incorrect block length headers.

I ended up writing, nearly from scratch, a replacement IR receiver driver.


The peer device driver code was written by a developer in a different group in my same company. I finally got the P4 ACLs to read it after loudly escalating, over the objections of its developer and his group manager. It was also complete shit. I cannot even begin to remember everything that was wrong with it, but I not only figured out many of the sources of my own pain, I also found a significant source of crash and lockup bugs that afflicted the main device.

I was not allowed to rewrite the peer code, as it was not in my remit. However, I was able to sneak in and check in a large number of asserts, using the excuse that they were "inline documentation".


Oh, and the device driver for the CATV RF digital transceiver? The source code I got for the asking, without a fight? When I reviewed it, it was easy to understand, efficient, elegant, and as far as I could tell, bug free.


In the end, I made my part work. It just took over two months instead of the original guesstimate of less than two weeks. This caused a schedule slip in the release of the satellite box. Which would have been a more serious problem, except…


Except there was also a major schedule slip for the main box. A significant reason for that slip was that the peer code that I had filled with asserts was now crashing with assertion failures instead of emitting garbage. I was lucky that I was not more officially "blamed" for that. I wasn't mainly because the people who understood what I did understood the problem, and the executives who didn't understand what the problem was were also too clueless to blame anyone, let alone me.


My lesson learned from this experience is: if someone is refusing to show the source to suspect driver code, citing trade secrets, confidentiality, secret sauce, partnership agreements, or similar excuses, it's not because they are protecting their magic. It's because they have screwed up, and they are trying to hide it.

A second rule of thumb I have is: source control systems that don't allow any developers to check out and review any arbitrary source code file are expressions of moral failure. It is unethical for an engineer, designer, or other technologist to ever sign off on a project that has been mutilated by such a broken tool.
fallenpegasus: amazon (Default)
Six months ago, right around the O'Reilly MySQL Conference, my previous employer, Gear6, suffered from an "unfortunate cash flow event". That is, they ran out of money faster than their sales grew. Which is too bad, it was a good company with good and useful products, and it was staffed with good people. I appreciate the honest and ethical dealings of the board and the executive staff, who kept us, the staff, "in the light" as the situation developed, and did things like paying out the accumulated vacation time and such. No bounced paychecks, unpaid expense reports, or surprise locked doors.

I spent the time working on personal projects, preparing for and going to Burning Man, studying up more on open source community management, digging more into cloud computing, and interviewing at a number of interesting companies.

And now, as of November 1st, I have a new gig. I am the Community Manager for the open source company Eucalyptus Systems.

Eucalyptus is based in Santa Barbara. I will remain based in Seattle, will travel down to the offices regularly, and will be travelling for conferences.

My first conference in my new corporate livery will be the second Open Stack Design Conference, which is next week in San Antonio.
fallenpegasus: amazon (Default)
Dear recruiters,

I have not "sent you my resume", nor have I "recently posted it to Monster", and using boilerplate stating as such to justify your spam blast towards me warns me right from the start that you are either stupid or dishonest.

Since you claim to already have my resume, asking for a copy of it again "for reference" is suspect.

My actual resume is online and is easy to discover, directly on the web and in LinkedIn. If you claim to take a "personal touch" in finding "the perfect placement", then learn to use Google and LinkedIn.

That you then specifically demand that my resume be "Word formated" (sic) is also suspect. The main reason you would want that is so that you can edit it and thus try to turn me into a liar by proxy on a job application.

By the way, the word "formatted" is spelled with two t's.

Using the phrase "contract W2" to describe a position is, again, a lie. Employment is either W2, or it is 1099. What you really mean is that you want to resell my ass to a body shop, who will pay me mediocre wages and bennies, will resell it again at a huge markup to their one big client, and then fire me every 9 months, just so everyone can maintain the blatant fiction that I am a "contractor" for the tax man.

Nor would I be allowed to describe what I do in detail to my friends, nor would it increase my scores on ohloh, launchpad, or github.

I am not interested in sub-market-wages. Especially not in a Microsoft shop.

A "full commitment" to an "exciting high-energy and yet stable opportunity" means that I would be expected to work for a large and tediously uncaring company for 60 hours a week (with more at crunch time), with no reward other than said sub-market wages, and the occasional box of pizza. Since this is a "contract W2" position, I probably would not even get the free team and project t-shirt, nor an invitation to the release party.

If I'm going to be asked to sacrifice my health and my life to a series of projects-du-jour, I expect a more substantial set of rewards. Specifically, ones that I manage via my etrade account.

Also, as part of your commitment to finding "the perfect placement", do make some effort to only spam me with "opportunities" that are at least somewhat in alignment with my past experience, skill set, and stated objectives.

Thank you, and have a nice day.
fallenpegasus: amazon (Default)
Once someone starts using memcached, they tend to quickly find themselves in the state of: "my database servers overload and my site goes down if the memcached stops working". This isn't really surprising: quite often memcached was thrown into the stack because the database servers were melting under the load of the growing site.

But then they face an issue that is, as mathematicians and programmers like to call it, "interesting".

"How do I add more capacity without an outage?"

At first most people just live with having that outage. Most systems have regularly scheduled downtimes, and during them the memcached clusters can be shut down, more storage nodes are added, and then it is all restarted, with the new distributed hash values for the new number of nodes.
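To see why a restart with "new distributed hash values" empties the cache in practice, here is a minimal sketch. It assumes the simplest client-side scheme, hashing the key and taking it modulo the node count; the key names, the use of MD5, and the 4-to-5 node change are all illustrative assumptions, not details of any particular client.

```python
import hashlib


def node_for(key: str, node_count: int) -> int:
    """Pick a node by hashing the key and taking it modulo the node count.

    This is an assumed, simplified placement rule for illustration only.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def fraction_remapped(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose node changes when the cluster is resized."""
    moved = sum(
        1 for k in keys if node_for(k, old_count) != node_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical keys; grow the cluster from 4 nodes to 5.
keys = [f"user:{i}" for i in range(10_000)]
remapped = fraction_remapped(keys, old_count=4, new_count=5)
print(f"{remapped:.0%} of keys now hash to a different node")
```

Under this scheme a key stays put only when its hash gives the same remainder for both node counts, so most of the cache is effectively cold after the resize, which is why the restart behaves like a full rewarm.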

Ironically, the more successful the site is, the more it grows, the more costly that outage becomes. And not linear to that growth either. The increasing cost is more on the order of the square of the growth, until they literally cannot afford it at all. As the cache gets bigger, it takes longer for it to rewarm, from minutes to hours to days. And as your userbase grows, the more people there are to suffer the poor experience of the cache warming up. The product of those two values is the "cost" of the outage. This is bad for user satisfaction, and thus is bad for retention, conversion, and thus revenue.
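The "square of the growth" claim can be checked with a few lines of arithmetic. This sketch assumes that both the rewarm time and the number of affected users scale linearly with the size of the site; the specific numbers are made up for illustration.

```python
def outage_cost(rewarm_hours: float, affected_users: int) -> float:
    """Cost of an outage in user-hours of degraded experience.

    The product form follows the text; the unit is an assumption.
    """
    return rewarm_hours * affected_users


# Hypothetical site, before and after doubling in size.
base = outage_cost(rewarm_hours=0.5, affected_users=10_000)
doubled = outage_cost(rewarm_hours=1.0, affected_users=20_000)

ratio = doubled / base
print(f"Doubling the site multiplies the outage cost by {ratio:.0f}")
```

Doubling both factors quadruples the product, which is the quadratic growth described above.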

This can be especially frustrating in a cloud environment. In a physical datacenter, because you have to actually buy, configure, and install the hardware for a node, it somehow feels easier to justify needing an outage to add it to the cluster. But in a cloud, you can start a new node with the click of a button, without getting a purchase approval and filing a change plan. And also, in cloud environments, we all have been evangelizing "dynamic growth", and "loosely coupled components", and "design for 100% uptime in the face of change". And yet here is this very basic component, the DHT KVS cluster, that doesn't want to easily work that way.

There are ways to resize a DHT cluster while it is live, but doing so is an intricate and brittle operation, and requires the co-operation of all the clients, and there are no really good useful open source tools that make it easier. You have to develop your own custom bespoke operational processes and tools to do it, and they are likely to miss various surprising edge cases that get learned only by painful experience and very careful analysis. Which means that the first couple of times you try to resize your memcached cluster without a scheduled outage, you will probably have an unscheduled outage instead. Ouch.

One commonly proposed variety of solution is to make the memcached cluster nodes themselves more aware of each other and of the distributed hash table. Then you can add a new node (or remove a failed one), and the other nodes will tell each other about the change, and they all work together to recompute the new DHT, flow items back and forth to each other, put items into the new node, and try to keep this all more or less transparent to the memcached clients with some handwaving magic of proxying for each other.
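One way to keep the amount of data that has to flow to a new node small is consistent hashing. The sketch below is a generic hash ring, not a description of how any particular product rehashes its DHT; the node names, the replica count, and the use of MD5 are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Minimal consistent hash ring with virtual points per node."""

    def __init__(self, nodes, replicas: int = 100):
        self.replicas = replicas
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the node owning the first point at or after the key."""
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._points[idx][1]


# Hypothetical four-node cluster growing to five nodes.
keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("cache-e")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys need to flow to the new node")
```

Only the keys that now belong to the new node change owner, so the existing nodes keep serving the rest of their items while the moved ones are copied across.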

And that, more or less, is what Gear6 has just done, under the name "Dynamic Services". We have released it first in our Cloud Cache distribution, initially on Amazon AWS EC2, and then on other cloud infrastructure systems. Next it will be in our software and appliance distributions.

This is especially useful and neat in a cloud environment because the very act of requisitioning and starting a new node is something that the underlying infrastructure can provide. So you can go to the Gear6 Cloud Cache Web UI, and ask it to expand the memcached cluster. That management system will interface to the EC2 API, and spin up more Gear6 memcached AMIs, and once they are running, add them to the cluster and then rehash the DHT. All while the cluster is serving live data.


(This entry was originally posted at my Gear6 corporate blog. Please comment there.)
fallenpegasus: amazon (Default)
Last year, when I was doing MySQL Professional Services, I encountered a client that was already using memcached. They said they were caching the compiled bytecode of their PHP code in their memcached, which was a big win because they ran a large fleet of identical PHP based application servers. As soon as any one server encountered a given new piece of PHP, it would compile it and cache it, and immediately all the other app servers could use the same cached compiled bytecode, rather than repeat that work. They had recently changed to this approach, from caching the compiled bytecode on the disk of each app server.

I thought that was really neat, and kept digging elsewhere into their performance and scaling issues.

I had just assumed that this was some open source project, a modification or module to an existing PHP bytecode compiler / cacher / accelerator.

Except, it seems to not be. I've spent a couple of days now googling and reading up on the various "PHP accelerators", and they all appear to cache to disk or cache to local shared memory, but I can't find a reference anywhere to coupling one with memcached.

Am I just missing something, is my google-fu failing me, is this something this shop had written from scratch?

Do any of my readers know?
fallenpegasus: amazon (Default)
When I first started working with Gear6, I pushed for bundling a version of our memcache appliance as an AWS EC2 AMI. We had already done much of the work by making a version available as a trial VM image, and through our current effort on a "Universal Distro", which can run on any qualified piece of server hardware.

Because of that existing effort, and because of the nimbleness and skill of my coworkers here at Gear6, we were able to move from proposal to first release in only a few weeks.

We are releasing our first version of the Gear6 Web Cache for the Cloud today on Amazon EC2, at ami-2411f34d for 32 bit, and ami-2611f34f for 64 bit. The 32 bit version is "free", you only pay Amazon's EC2 charge, and the 64 bit version (which can cache much more, return results faster, and handle more client connections) is linked to Amazon DevPay, so you will pay some money to Gear6. But it works out that the "gigabyte hour" cost for the new "high memory" EC2 types is actually less than the cheaper smaller "free" 32 bit size.

Jeff Barr at Amazon AWS just blogged about it. The press releases and tech press articles happen today.

It's been a learning experience for me. I got to deal more with the tech press, and learned more about how publicity and press releases work.

I also got to learn more, and the hard way, about Gear6's internal development and build processes. And I broke the build, several times, on the integration day. I need to get better at testing.

But I'm excited, I hope this goes well.
fallenpegasus: amazon (Default)
I've been involved with the Drizzle project since very soon after it began, working on it on nights and weekends.

That has just changed. As of today, I'm no longer a MySQL Professional Services consultant. Instead, I'm part of a new division of Sun.

Much of my time is to be spent working on Drizzle, with a focus on plugin interfaces and making it work well in Extremely Large distributed environments.

I will be blogging heavily about what I am doing. How I sort that blogging out between my personal LiveJournal, my (mostly unused) Sun employee blog, and maybe some other blog system, remains TBD.

This is going to be fun.
fallenpegasus: amazon (Default)
Last November, I became an employee of MySQL Inc, which was owned by MySQL AB.

A few weeks ago, MySQL AB and MySQL Inc etc et al became wholly owned by Sun Microsystems, which immediately started rapidly digesting this new corporate M&A meal.

As of today, I am now an employee of Sun Microsystems.

For the most part, for the time being, nothing changes. I do the same kind of work, have the same lovely workplace (an array of local cafes), and the same annoying business travel. (And the same expense reporting "system". Ugh!)

We shall see what does change, next.

But for now, when someone asks me what my job is, my short answer will remain "I work on MySQL".
fallenpegasus: amazon (Default)
I've learned that Sun bought MySQL.

I have no idea right now how much money I just made, given that my hiring options are priced in Swedish currency, and I don't know what (minute) percentage of the company they are.

This explains why the company was spending no time or effort at all fixing brokenness in our internal business processes. Why spend the time fixing our internal expense reporting flow, when we're just going to throw it away and use Sun's?

This is certainly the end of BitKeeper here. Sun has converted over to Mercurial. And the BK license is biblically first commandment jealous about its competitors. Whether this means BZR or HG for MySQL, I don't know.
fallenpegasus: amazon (Default)
Bleah, not the best of days.

My car got towed again. Again, my fault. I parked next to a construction site (one of the many in this neighborhood). The last N times, there was no problem. This time, they brought in some heavy machinery, and it was in the way.

I got a call from my boss. I need to be in Minnesota on Monday morning for a two day gig. This is a task that had been on my calendar, then turned into "pending", and then vanished, and then today returned.

On a similar note, I'm already a day behind on the current gig, because my scheduler messed up the calendar and notifier. Not really her fault, my employer's internal business processes are some of the most high friction and messed up that I've ever experienced.

Because of all that, I had to cancel doing a favor for a friend, helping her take her cat to the vet. It's something that is very high stress for her, and I'm sorry I wasn't able to help.

Because I'm going to not be in town on Monday, I'm going to miss bowling, which means I'm going to miss seeing Amanda on her last time, as she leaves for Paris for 6 months on the 11th.

Because I'm going to not be in town on Tuesday, I'm not going to be able to take Bethieee to the airport.

I do have a commitment to Bethieee and to my other housemates to haul the rest of Sol's stuff from his room to his storage, stuff that he didn't move himself, because he ran out of time and energy before he left for Mexico last month. That's going to take a huge chunk of the weekend that I would rather not.

I'd rather spend it on, say, the Bondage Kinetics workshop that I had been invited to. But now can't fit in.

And my employer is being sticky about me living for the month of February in Hawaii. They've asked me to hold off on that until after the All Company Meeting. Which is the middle week of January. Asking me to hold off spending a month somewhere until two weeks before that month is pretty sticky.

They were apparently hoping to travel me a lot that month, and not even for storage engine work, but to shadow normal DBA PS troubleshooting gigs. I have less than zero desire to be a normal DBA troubleshooter. That's right down there with being a software tester again.

Project365 has been a dud so far today. I was originally going to photo the cat at the vet, but wasn't there for that. And then I was going to photo the car at the impound lot, and while I was there, it completely slipped my mind. I will have to shoot something tonight instead.

I finally got my smartphone syncing. Someone in RedHat/Fedora farked up the handling of Treos, and so I've not been able to sync it to the Thinkpad. Today I bought and installed TheMissingSync on the work MacBook, and then worked thru the frustration of making all that work, over bluetooth and/or USB. Having to reboot both my phone and my mac to get it finally working made me want to reach thru the ether, find a programmer at Palm and one at Apple, and proceed to beat them to death with their own keyboards.

And while I'm whining about work related stuff, my scheduler called me to let me know that our mutual boss had, well, to back up, our business processes are in an internal company MediaWiki. And the instructions on how to submit a gig timesheet were unclear and contradictory. I had got the howto from her, and what she said to do was something else entirely. So I reminded her that wikis were made to be fixed. So she fixed it, and then our boss reprimanded her, and undid her fix, so now it's wrong again. Words cannot express how much that makes my mind itch.
fallenpegasus: amazon (Default)
After I got off the plane in Toronto, I had to waste my time dealing with a pair of Canadian Immigration officers while they tried to decide what to do with me, and whether I needed to purchase a work permit or not. They finally decided not.

The distributed locationless internet-enabled model is colliding hard with the old regime of labor immigration and work visas.
fallenpegasus: amazon (Default)
For this international flight, I didn't give myself enough time, stepping thru the doors at SEA only 50 minutes before departure. Fortunately, the line at the counter was short, and the line at the TSA useless security theater was likewise very short.

If one has a T-Mobile cellphone account, that gets access to T-Mobile hotspots. Why can't AT&T be as accommodating? Or at the very least, have a straightforward way to charge access against one's cellphone. One could run ident/auth via a SMS, or looking at the SIM number, or something similar.

The EDGE/EVDO device that MySQL ordered for me doesn't fit this laptop (PCCard vs PCExpress), and won't even fit my new laptop (MacBooks don't have card slots at all). So either IT will have to swap it for a USB device, take it back entirely, or let me hand it off to someone in the company with a MacBook Pro.

In theory, the Treo's USB cable will give the laptop internet access. In practice, there seems to be a bug in the current FC7 and FC8 in handling the Treo, no connection, no sync. Annoying on so many levels, and spending a few hours trying to make it work is really what decided me to get off desktop linux for my work computing environment for the time being.

I don't actually have that new company laptop yet, so I'm going to be doing this business trip out of amsu. Which has an old battery, and needs a memory upgrade. I need to decide whether to spend money upgrading it. There is a lot of value in integrating to one environment, and also a lot of value in keeping a completely "my" computer.

My preferred airlines are Alaska and SouthWest. Yet the agency, for this and the next two trips, has, well, booked me on AA. I last flew AA about 10 years ago, and I hated it. The seat pitch was such that I literally could not put my knees in front of me and my feet on the floor, and the cushion was old, hard, and flat.

They seem to have gotten better since then, but not very much so. There isn't room for my backpack under the seat in front of me, and I'm painfully wedged in trying to keep an angle where I can type and see my screen. And the overhead light is burned out, so no bookreading.
fallenpegasus: amazon (Default)
My new work phone is a Treo680 smartphone.

My work email (and now also my personal email) is accessed via IMAP.

I'm getting really annoyed with the stock VersaMail program on the Treo. Its poor handling of subfolders is really stupid.

The two big competing contenders seem to be SnapperMail and ChatterMail.

Any experience? Recommendations?
fallenpegasus: amazon (Default)
On Nov 1, I started an undefined duration gig with MySQL Inc, as part of their Professional Services Department, with an emphasis on Storage Engines.

Mainly, this means I will write, tweak, and certify custom storage engines for specific MySQL users. My main job will not involve writing stuff that goes into the regular codebase, but who knows?
fallenpegasus: amazon (Default)
Good things about the Olympic Coffee and Roasting Cafe in Burien:
  • It's close to the SEA airport
  • They play quiet calm classical music
  • The internet connection is fast
  • It's not crowded
  • It's very close to a friend's place, who likes to host me for lunch
  • The chair is comfortable, and the table is large enough


Not-so-good things about the Olympic Coffee and Roasting Cafe in Burien:
  • It looks like it's trying to be a clean "corporate coffee" place, out-starbucking Starbucks.
  • The drinks are mediocre.
  • The tea is worse than mediocre. And after drinking at Remedy, my standards have gotten very high, especially for greens.
  • There is no eye candy. Instead it's full of little clusters of senior citizens. Even the barista is an older lady.
  • Did I mention how disappointed I am in the green tea? Both weak and scorched.


Update. And the worst thing about Olympic Coffee and Roasting Cafe, is they close at 3pm. WTF!
fallenpegasus: amazon (Default)
Does the Victrola Café put on the loud bad music just to try to drive out the laptop users?!
fallenpegasus: amazon (Default)
So, there was a command line toolset for manipulating S3, called the Hanzoarchives hanzo-s3-tools. They were kind of iffy in the installation and documentation, but they worked, more or less. However, their handling of S3 ACLs was pretty poor, very not useful. And as the last straw, it seems to have gone offline and vanished.

On the other hand, there is a really good Perl library, Net::Amazon::S3. Many of my clients are more or less Perl shops, so I have been recommending that they install it. But that module lacks the scripts that bring functionality out to the command line.

So I'm writing them. Today, I've got s3mkbucket and s3rmbucket working, complete with handling of ACLs and TLS. Tomorrow will be s3putacl and s3getacl, and quickly following will be s3put, s3get, s3head, s3rm and s3ls.

All open source, of course. It will be available in my public repository, and I'll see if I can get the Net::Amazon::S3 maintainer to accept it as a patch. If he doesn't, I'll just make a Freshmeat project, and advertise it that way.
fallenpegasus: amazon (Default)
I think I may have had a couple more gigs just drop into my inbox.

I'm going to have to start tracking tasks, task completion, hours, clients, and potential clients.

There has to be an app that does this. Lawyers and graphic designers do it all the time. The trick will be finding one that is open source and that doesn't suck.
fallenpegasus: amazon (Default)
... tweaking one's online resume, only to discover some misspellings, some of which have been there for probably more than ten years.
fallenpegasus: amazon (Default)
I don't know how effectively productive I was today, but I did discover that it's possible to make net-snmp 5.4 crash with a segfault by sending it what I think are valid AgentX commands.

Of course, it shouldn't crash and segfault, ever. Even if you pour utter garbage into the AgentX port.

This implies that net-snmp running all over the world could be crashed and possibly even powned via AgentX, which is Not Good, given that the AgentX sockets tend to be all-writable, and snmpd tends to run as root.

Profile

fallenpegasus: amazon (Default)
Mark Atwood

Page generated Sep. 26th, 2017 03:33 am
Powered by Dreamwidth Studios