GPS serendipity: Florence Avenue, Sebastopol

17:57 July 14th, 2008 by terry. Posted under companies, me, other. 1 Comment »

[Image: img_0601.jpg]

I drove from Oakland up to the O’Reilly Foo camp last Friday. The O’Reilly offices are just outside Sebastopol, CA. I stopped at an ATM and my GPS unit got totally confused. So I took a few turns at random and wound up on Florence Avenue. I drove a couple of hundred meters and started seeing big colorful structures out the front of many houses. They were so good I stopped, got out my camera, and took a whole bunch of pictures.

I talked to a man washing his car in his driveway. He told me that “Patrick” had created all the figures and installed them on the front lawns. I got the impression that it was all free. Soon after, I found the house that was unmistakably Patrick’s and, seeing a man loading things into a pickup truck, went up and asked if he was Patrick. It was, and we had a friendly talk (mainly me telling him he was amazing). He gave me a calendar of his work.

Click on the thumbnails below to see bigger versions. There’s even an FC Barcelona structure. As I found out later, lots of people (of course) have seen these sculptures. When I got to Foo, there was one (image above) outside the O’Reilly office. Google for Patrick Amiot or Florence Avenue, Sebastopol and you’ll find much more. And Patrick has his own web site.

[Thumbnail gallery: 23 photos of the Florence Avenue sculptures]


Sequoia Capital is the new Delphic Oracle

13:46 June 17th, 2008 by terry. Posted under books, companies. 2 Comments »

In a belated attempt to educate myself by reading some of the things that many people study in high school, I’m reading The Histories of Herodotus. It’s highly entertaining and easy to read. I read The History of the Peloponnesian War by Thucydides a few years ago and enjoyed that even more. Herodotus is the more colorful, but the speeches and drama in Thucydides are fantastic.

There were lots of oracles in classical Greece, and elsewhere. Of the Greek oracles, the Delphic Oracle was, and still is, the best known. People (kings, dictators, emperors, wannabes) would send questions like “Should I invade Persia?” to the oracle and receive typically ambiguous or cryptic responses. We have a large number of famous oracular replies. Herodotus recounts how Croesus decided to test the various oracles by sending them all the same question, asking what he was doing on a certain day. The oracle at Delphi won hands down. Croesus then immediately put more pressing matters to the Delphic oracle, famously misinterpreted the pronouncements, and was duly wiped out by the Persians.

Imagine yourself in the position of the Delphic oracle. You’ve got all sorts of rulers and aspiring rulers constantly sending you their thoughts and questions, asking what you think. You’re in a unique position, simultaneously privy to the most secret potential plans of many powerful rulers. You really know what’s going on. You know what’s likely to succeed or to fail, and why. You get to give the thumbs up or thumbs down. By virtue of your position and the information flowing through your temple, you can direct traffic; you can shape and create history. You might even be tempted to profit from your knowledge. Your successful accurate pronouncements invariably reap you rich tribute.

OK, you can see where this is leading…

Sequoia Capital, and other well-known venture firms, have a somewhat similar position. They have thousands of leaders and wannabe leaders bringing them their detailed secret plans, proposing to mount armies, found cities, build empires, to attack the modern-day Persians, etc. By virtue of their unusual position they probably have a pretty good idea of what might work, and why. Using this knowledge, but without necessarily revealing sources, they can cryptically but assuredly state “oh, that’ll never work” or they can encourage ideas that are new and which they can see will somehow fit and succeed. If company X has consulted the oracle, disclosing a detailed plan to go left, and company Y plans to attack from the right, well… why not?

Entrepreneurs beg an audience, get a tiny slice of time to make their pitch, and occasionally receive rare clear endorsements. Much more frequently they are left to scratch their heads over cryptic, ambiguous and unexplained responses (and non-responses). You can bet the Delphic oracle didn’t sign NDAs either.

It’s stretching it too far to seriously claim that Sequoia is the modern-day equivalent of the Delphic oracle. But on the other hand, over 2500 years have elapsed, so you’d expect a few changes.


Random thoughts on Twitter

02:48 June 9th, 2008 by terry. Posted under companies, tech. 12 Comments »

I’ve spent a lot of time thinking about Twitter this year. Here are a few thoughts at random.

Obviously Twitter have tapped into something quite fundamental, which at a high level we might simply call human sociability. We humans are primates, though there’s a remarkably strong tendency to forget or ignore this. We know a lot about the intensely social lives of our fellow primate species. It shouldn’t come as a surprise that we like to Twitter amongst ourselves too.

Here are a couple of interesting (to me) reasons for the popularity of Twitter.

One is that many people are in some sense atomized by the fact that many of us now work in an isolated way. Technical people who can do their work and communicate over the internet probably see less of their peers than others do. That’s just a general point, it’s not specific to Twitter or to 2008. It would have seemed unfathomably odd to humans 50 years ago to hear that many of us would be doing a large percentage of our work and social communication via machines, interacting with people who we don’t otherwise know, and who we rarely or never meet face to face. The rise of internet-based communication is obviously(?) helping to fill a gap created by this generational change.

The second point is specific to Twitter. Through brilliance or accident, the form of communication on Twitter is really special. Building a social network on nothing-implied asymmetric follower relationships is not something I would have predicted as leading to success. Maybe it worked, or could have all gone wrong, just due to random chance. But I’m inclined to believe that there’s more to it than that. Perhaps we’re all secretly voyeurs, or stickybeaks (nosy-parkers). Perhaps we like to see one half of conversations and be able to follow along if we like. Perhaps there’s a small secret thrill to promiscuously following someone and seeing if they follow you back. I don’t know the answer, but as I said above I do think Twitter have tapped into something interesting and strong here. There’s a property of us, we simple primates, that the Twitter model has managed to latch onto.

I think Twitter should change the dynamics for new users by initially assigning them ten random followers. New users can easily follow others, but if no-one is following them… why bother? New user uptake would be much higher if they didn’t have the (correct) feeling that they were for some reason expected to want to Twitter in a vacuum. Twitter could announce a new program called, e.g., Twitter Guides, and ask for people to volunteer to be guides (i.e., followers) of newbies. Lend a hand, make new friends, maybe get some followers yourself, etc. Lots of people would click to be a Guide. I bet this would change Twitter’s adoption dynamics. If you study things like random graph theory and dynamical systems, you know that making small changes to (especially initial) probabilities can have a dramatic effect on overall structure. If Twitter is eventually to reach a mass audience (whatever that means), it should be an uncontestable assertion that anything which significantly reduces the difficulty for new users to get into using it is very important.

Twitter should probably fix their reliability issues sometime soon.

I say “probably” because reliability and scaling are obviously not the most important things. Twitter has great value. It must have, or it would have lost its users long ago.

There’s a positive side to Twitter’s unreliability. People are amazed that the site goes down so often. Twitter gets snarled up in ways that give rise to a wide variety of symptoms. The result seems to be more attention, and a service that is somehow more charming. It’s like a bad movie that you remember long afterwards because it wasn’t good. We don’t take Twitter for granted and move on to the next service to pop up - we’re all busy standing around making snide remarks, playing armchair engineer, knowing that we too might face some of these issues, and talking, talking, talking. Twitter is a fascinating sight. Great harm is done by its unreliability, but the fact that their success so completely flies in the face of conventional wisdom is fascinating - and the fact that we find it so interesting and compelling a spectacle is fantastic for Twitter. They can fix the scaling issues, I hope. They should prove temporary. But the human side of Twitter, its character as a site, the site we stuck with and rooted for when times were so tough, the amazing little site that dropped to the canvas umpteen times but always got back to its feet, etc…. All that is permanent. If Twitter make it, they’re going to be more than just a web service. The public outages are like a rock musician or movie star doing something outrageous or threatening suicide - capturing attention. We’re drawn to the spectacle and the drama. We can’t help ourselves: it is our selves. We love it, we hate it, it brings us together to gnash our teeth when it’s down. But do we leave? Change the channel? No way.

Twitter is both the temperamental child rock star we love and, often, the medium by which we discuss it - an enviable position!

I’m reminded of a trick I learned during tens of thousands of miles of hitch-hiking. A great place to try for a lift is on a fairly high-speed curve on the on-ramp to the freeway / motorway / autopista / autoroute etc. Stand somewhere where a speeding car can only just manage a stop and only just manage to pull in away from the following traffic. Conventional wisdom tells you that you’ll never get a ride. But the opposite is true - you’ll get a ride extremely quickly. Invariably, the first thing the driver says when you get in is “Why on earth were you standing there? You’re very lucky I managed to stop. No-one would have ever picked you up standing there!” I’ve done this dozens of times. Twitter - being incredibly, unbelievably, frustratingly unreliable and running contrary to all received wisdom - is a powerful spectacle. The human psyche is a funny thing. That’s a part of why it’s probably impossible to foretell success when mass adoption is required.

If I were running Twitter, apart from working to get the service to be more reliable, I’d be telling the engineering team to log everything. There’s a ton of value in the data flowing into Twitter.

Just as Google took internet search to a new level by link analysis, there’s another level of value in Twitter that I don’t think has really begun to be tapped yet.

PageRank, at least as I understand its early operation, ran a kind of iterative relaxation algorithm assigning and passing on credit via linked pages. A similar thing is clearly possible with Twitter, and some people have commented on this or tried to build little things that assign some form of score to users. But I think there’s a lot more that can be done. Because the Twitter API isn’t that powerful (mainly because you’re largely limited to querying as a single authorized user) and certainly because it’s rate-limited to just 70 API calls an hour, this sort of analysis will need to be done by Twitter themselves. I’m sure they’re well aware of that. Rate limiting probably helps them stay up, but it also means that the truly interesting and valuable stuff can’t be done by outsiders. I have no beef with that - I just wish Twitter would hurry up and do some of it.
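To make the "iterative relaxation" idea concrete, here is a minimal sketch of how credit might be passed around a follower graph. It is only an illustration: the function name `follower_rank`, the `follows` dictionary, and the damping value of 0.85 are all assumptions made for the example, not anything Twitter or Google is known to run.

```python
def follower_rank(follows, damping=0.85, iterations=50):
    """Pass credit along follow links, PageRank-style.

    `follows` maps each user to the set of users they follow
    (a hypothetical input format chosen for this sketch).
    """
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    n = len(users)

    # Start with equal credit for everyone.
    rank = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Every user gets a small baseline share each round.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            followed = follows.get(u, set())
            if followed:
                # Split this user's credit among the people they follow.
                share = damping * rank[u] / len(followed)
                for v in followed:
                    new_rank[v] += share
            else:
                # A user who follows no one spreads credit evenly.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    example = {"a": {"c"}, "b": {"c"}, "c": {"a"}}
    for user, score in sorted(follower_rank(example).items()):
        print(user, round(score, 3))
```

Being followed by someone who is themselves widely followed counts for more than being followed by an account nobody follows, which is the point of going beyond a raw followers-to-following ratio.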

Some examples in no order:

  • The followers to following ratio of a Twitter user is obviously a high-level measure of that user’s “importance” (in some Twitter sense of importance). But there’s more to it than that. Who are the followers? Who do they follow, who follows them? Etc. This leads immediately back to Google PageRank.
  • If a user gets followed by many people and doesn’t follow those people back, what does it say about the people involved? If X follows Y and Y then goes to look at a few pages of X’s history but does not then follow X, what do we know?
  • If X has 5K followers and re-tweets a twit of Y, how many of X’s followers go check out and perhaps follow Y? What kind of people are these? (How do you advertise to them, versus others?)
  • Along the lines of co-citation analysis, Twitter could build up a map showing you who you might follow. I.e., you can get pairwise distances between users X and Y by considering how many people they follow in common and how many they follow not-in-common. That would lead to a “people you should be following but aren’t” kind of suggestion.
  • Even without co-citation analysis (or similar), Twitter should be able to tell me about people that many of the people I follow are following but whom I am not following. I’d find that very useful.
  • Twitter could tell me why someone chooses to follow me. What were they looking at (if anything) before they decided to follow me? I.e., were they browsing the following list of someone else? Did they see my user name mentioned in a Tweet? Did they come in from an outside link? Would a premium Twitter user pay to have that information?
  • Twitter has tons of links. They know the news as it happens. They could easily create a news site like Digg.
  • In some sense the long tail of Twitter is where the value is. For instance, it doesn’t mean much if a user following 10K others follows someone. But if someone is following just 10 people, it’s much more significant. There’s more information there (probably). The Twitter mega users are in some way uninteresting - the more people they have following them and the more they follow, the less you really know (or care) about them. Yes, you could probably figure out more if you really wanted to, but if someone has 10K followers all you really know is that they’re probably famous in some way. If they add another 100 followers it’s no big deal. (I say all this a bit lightly and generally - the details might of course be fascinating and revealing - e.g., if you notice Jason Calacanis and Dave Winer have suddenly started @ messaging each other again it’s like IRC coming back from a network split :-))
  • Similarly if someone with a very high followers to following ratio follows a Twitter user who has just a couple of followers, it’s a safe bet that those two are somehow friends with a pre-existing relationship.
  • I bet you could do a pretty good job of putting Twitter users into boxes just based on their overall behavior, something like the 16 Myers-Briggs categories. Do you follow people back when they follow you? Do you @ answer people who @ address you (and Twitter knows when you’ve seen the original message)? Do you send @ messages to people (and how influential are those people)? Do those people @ you back (and how influential those people are says something about how interesting / provocative you are)? Do you follow tons and tons of people? Do you follow people and then un-follow them if they don’t follow you back? Do you follow random links in other people’s Twitters, and are those links accompanied by descriptive text or tinyurl links? Do you @ message people after you follow their links? Do your Twitter times follow a strict pattern, or are you on at all hours, or suddenly spending days without Twittering? Do you visit and just read much more than you tweet? How much old stuff do you read? Do you tend to talk in public or via DM? Are your tweets public? All that without even considering the content of your Twitters.
  • Could Twitter become a search engine? That’s not a 100% serious question, but it’s worth considering. I don’t mean just making the content of all tweets searchable, I mean it with some sort of ranking algorithm, again perhaps akin to PageRank. If you somehow rank results by the importance or closeness of the user whose tweets match the search terms, you might have something interesting.
  • Twitter also presumably know who’s talking about whom in the DM backchat. They can’t use that information in any obvious way, but it’s of high value.
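Two of the ideas in the list above are simple enough to sketch: a pairwise distance between users based on who they follow in common versus not in common, and a suggestion list of people that many of the people I follow are following but I am not. The code below is a rough illustration only. The names `follow_distance` and `suggest_follows`, the `follows` dictionary format, and the choice of Jaccard distance are all assumptions made for the example.

```python
from collections import Counter


def follow_distance(follows, x, y):
    """Jaccard distance between the sets of people x and y follow.

    0.0 means they follow exactly the same people, 1.0 means no overlap.
    """
    fx = follows.get(x, set())
    fy = follows.get(y, set())
    union = fx | fy
    if not union:
        return 1.0  # neither follows anyone; treat as maximally distant
    return 1.0 - len(fx & fy) / len(union)


def suggest_follows(follows, me, limit=5):
    """Rank users I don't follow by how many of my followees follow them."""
    mine = follows.get(me, set())
    counts = Counter()
    for friend in mine:
        for candidate in follows.get(friend, set()):
            if candidate != me and candidate not in mine:
                counts[candidate] += 1
    return [user for user, _ in counts.most_common(limit)]


if __name__ == "__main__":
    follows = {
        "me": {"a", "b", "c"},
        "a": {"x", "y"},
        "b": {"x", "me"},
        "c": {"x", "y", "a"},
    }
    print(suggest_follows(follows, "me"))
    print(follow_distance(follows, "a", "c"))
```

With the full follow graph in hand this is cheap to compute, which is part of why it seems like something Twitter itself is best placed to do.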

I could go on for hours, but that’s more than enough for now. I don’t feel like any of the above list is particularly compelling, but I do think the list of nice things they could be doing is extremely long and that Twitter have only just begun (at least publicly) to tap into the value they’re sitting on.

I think Google should buy Twitter. They have what Twitter needs: 1) engineering and scale, 2) link analysis and algorithm brilliance, and 3) they’re in a position to monetize the value illustrated above (via their search engine, that already has ads) without pissing off the Twitter community by e.g., running ads on Twitter. What percentage of Twitter users also use Google? I bet it’s very high.


Google maps miles off on Barcelona hotel

22:19 April 22nd, 2008 by terry. Posted under barcelona, companies. 1 Comment »

[Image: hotel sofia]

I’m a big fan of Google maps.

But sometimes they get things very very wrong. In January I posted this example of them getting the location of the San Francisco international airport way wrong.


The screenshot linked above is supposed to show the location of the hotel Princesa Sofia in Barcelona. They have the address right, the zip code looks about right, but the location is about 30 miles off.

Caveat turista.


Individuality, transparency, and the cult of impersonality

23:58 April 3rd, 2008 by terry. Posted under companies. No Comments »

I’ve been talking to people about raising money for Fluidinfo over the last 5 months. Along the way I’ve had plenty of time to reflect on the process. I have a series of blog posts saved up. They’re mainly about oddities and discrepancies between appearance and reality. I plan to write them up gradually. Here’s one I wrote earlier this year but which I never finished. It’s still unpolished - but what the hell. This is a blog, after all.

In September 2007, Fred Wilson posted asking whether VCs should blog. The first thing I thought about when I read his title was transparency.

Increased transparency is a side-effect of easier communication between people. There are many relatively opaque human institutions and professions that have persisted for decades or centuries, relying on the fact that their subjects or customers were unable to communicate easily, to self-organize, to be widely heard, etc. Exclusionary access to knowledge is the foundation of power. As barriers to communication begin to fall, openness and transparency increase. Cracks appear in the walls. At that point anything can happen. The typical response is a heavy-handed crackdown to maintain or regain control. Examples are so numerous and widespread that any small sample would be woefully inadequate. This never-ending dynamic is just a part of the human condition and the nature of power.

But in some arenas, especially when there’s a market or in repeated games (a rich area of game theory), there may be a competitive advantage to (usually) smaller players who act disruptively to deliberately increase transparency. Those players differentiate themselves by (often informally) defecting from the (often tacit) group of gatekeepers. Advantages may include potential clients tending to trust you more, wide attention, and better opportunities. If increased transparency gets a foothold, there can then follow a kind of race to the bottom as players reveal increasingly more formerly-inside knowledge. This is also a drama that has been played out many times, and it’s fascinating and educational to watch.

We’re now seeing the cracks open wide in the VC world. The rise of the VC blogger has provided us with hundreds of eye-holes through which we can get some view of the works. The VC bloggers are implicitly calling out their less open colleagues, challenging them to open up. An extreme example is Venture Hacks, written by VC industry insiders, whose aim is to “open source” VC strategy in order to aid entrepreneurs. Then there’s The Funded, which shook the VC world as formerly isolated entrepreneurs got together (and in relative privacy, no less!) to exchange opinions and experiences. While The Funded is unquestionably biased, and based on small sample sizes, part of the fuss was unquestionably about control.

I awoke yesterday with another thought about transparency, why VCs should blog, and the curious dynamics of the VC/entrepreneur dance.

VCs should also blog because it allows entrepreneurs to see who they are as people. That may sound trite, but I think it’s quite interesting.

I’ve attended probably 50 events where one or more VCs takes the stage and gives some kind of a presentation. The presentations are very often excruciatingly dull. That’s because they’re filled to bursting with VC clichés. Even when VCs make an effort to differentiate themselves they tend to use clichés! They’re active investors, they have deep experience, broad contacts, want to help management, etc. I sat in the audience at Le Web a couple of weeks ago while several investors were on stage doing their thing. I wound up laughing with the guy who sat next to me, who I’d never met before. We rolled eyes at each other, passed notes, and ended up whispering nasty and disrespectful comments during the presentation. We were obviously there because we were interested to learn more, but we were served up standard VC fare. Steak and eggs.

The interesting thing is that entrepreneurs are a wildly idiosyncratic bunch. One would therefore expect that they’d tend to highly appreciate signs of character and individuality in VCs. Meanwhile VCs tend to keep things buttoned down and insist on making dreary presentations.

If nothing else, the existing dynamics are amusing. Wild-eyed, power-hungry, idiosyncratic, unconventional, and often deeply weird entrepreneurs are trying to act straight, to project an image of reliability, stability, balance, good sense, etc., in order to get funded. Simultaneously, the VC companies the entrepreneurs are evaluating, and who partly rely on being attractive to entrepreneurs, go to lengths to homogenize themselves - in the process washing out the very thing that an entrepreneur might find most reassuring.

There’s opportunity in this discrepancy. VCs who blog about themselves, in addition to talking about their industry and flogging their portfolio companies, may have tapped into this. Allowing entrepreneurs to see what you’re like as a person is a differentiator.


Another thought on Mahalo

04:04 March 9th, 2008 by terry. Posted under companies. 1 Comment »

Back in November I wrote some comments on Mahalo in an article titled The Mahalo-Wikipedia-Google love triangle.

I just read Jason’s comment that they’re staying up late to make a Mahalo page on the Super Smash Bros Brawl Walkthrough.

That’s interesting.

Mahalo is supposed to be making things so easy that our grandparents can use it. But my grandparents are all dead, and they certainly wouldn’t be playing Super Smash Bros Brawl if they were alive.

Thinking about this, I was struck by another thought.

Who’s this page for? Why stay up til after midnight to buy and play a kids’ game and document it on Mahalo? Couldn’t it wait? What are those guys smoking up there in LA?

Then the penny dropped. Another penny. If you’re the first site on the web that’s got the Super Smash Bros Brawl walkthrough, kids are going to go to your page first thing tomorrow. And they’re not just going to your page, they’re going to Mahalo.

They’re also going to link to your page, and we all know what that means.

Meanwhile, the latte-sipping crowd who like the feel of Wikipedia being an online encyclopedia can look down their noses and have joint editing catfights over just what should be on the Wikipedia page.

If Mahalo keep it up, might they not look like the default cool destination for baby-chino-sipping teens and pre-teens to go to to find things about their popular culture? Just like you and I head to Wikipedia to look things up, might not kids make Mahalo their destination of choice to find current cultural stuff? Might not Wikipedia look to them as slow-moving, quaint and, dare I say it, even out of date as a print encyclopedia now looks to us grown ups?

I think I’m very slow on the uptake on this one. But I don’t spend much time thinking about Mahalo, so perhaps I can be excused. My kids are into Webkinz and of course there’s Webkinz stuff in Mahalo. My grandparents aren’t into that either.

Jason’s point, that Mahalo isn’t designed for geeks like us, is well taken. But I’ve only heard examples of how older folks want a simplified and more guided experience. Going for teens and pre-teens would be nice too. It might even be worth staying up after midnight for.

Obvious in retrospect, I guess, but I was slow to see it.


Worst of the web award: Cheaptickets

16:22 February 14th, 2008 by terry. Posted under companies, me, tech. 5 Comments »

Here’s a great example of terrible (for me at least) UI design.

I was just trying to change a ticket booking at Cheaptickets. Here’s the interface for selecting what you want to change (click to see the full image).

[Image: cheaptickets]

As you can see, I indicated a date/time change on my return flight. When I clicked on the continue button, I got an error message:

An error has occurred while processing this page. Please see detail below. (Message 1500)

Please select flight attributes to change.

I thought there was some problem with Firefox not sending the information that I’d checked. So I tried again. Then I tried clicking a couple of the boxes. Then I tried with Opera. Then I changed machines and tried with IE on a Windows box. All of these got me the exact same error.

I looked at the page several times to see if I’d missed something - like a check box to indicate which of the flights to change. I figured Cheaptickets must have an error server side. Then I thought come on, you must be doing something wrong.

Then I figured it out. Can you?


Social Graph foo camp was a blast

00:11 February 9th, 2008 by terry. Posted under companies, travel. No Comments »

I spent last weekend at the Open Social Foo camp held on the O’Reilly campus in Sebastopol, CA. The camp was organized by David Recordon and Scott Kveton, with sponsorship from various companies, especially including O’Reilly. I was lucky enough to have my airfare paid for, so lots of thanks to all concerned for that.

The camp was great. Very few people actually camped, almost everyone just found somewhere to sleep in the O’Reilly offices. Many of us didn’t sleep that much anyway.

There’s something about the modern virtual lifestyle that so many of us lead that leaves a real social hole. It’s been about 20 years since I really hung out at all hours with other coders. It’s something I associate most strongly with being an undergrad, with working at Micro Forté, and then in doing a lot of hacking as a grad student at The University of Waterloo in Canada.

So even though it was just 48 hours at the foo camp, it was really great. It’s not often I have the pleasurable feeling of being surrounded by tons of people who know way way more than I do about almost everything under discussion. That’s not meant to sound arrogant - I mean that I don’t get out enough, and I don’t live in SF, etc. It’s nice to have spent many years hanging around universities studying all sorts of relatively obscure and academic topics, and sometimes you wonder what everyone else was doing. Some of those people spent the years hacking really deeply on systems, and their knowledge appears encyclopedic next to the smattering of stuff I picked up along the way. It’s nice to bump into a whole bunch of them at once. It was extremely hard to get a word in in many of the animated conversations, which reminded me at times of discussions at the Santa Fe Institute. That’s a bit of a pain, but it’s still far better than some alternatives - e.g., not having a room full of super confident deeply knowledgeable people who all want to have their say, even if that means trampling all over others, ignoring what the previous speaker said, not leaving even 1/10th of a second conversational gap, and just plain old bull-dozering on while others try to jump in and wrest away control of the conversation.

I could write much more about all this.

I also played werewolf with up to 20 others on the Saturday night. In some ways I don’t really like the game, but it’s fun to sit around with a bunch of smart people of all ages who are all trying to convince each other they’re telling the truth when you know for sure some are lying. I was up until 4:30am that night. I went to the office I slept in on the Friday night, but found it had about 10 people still up, all talking about code. When I got up at 8am the next morning, they were all still there, still talking about code. I felt a bit guilty, like a glutton, for allowing myself three and a half hours sleep. Nice.


S3 numbers revisited: six orders of magnitude does matter

17:18 January 29th, 2008 by terry. Posted under companies. 1 Comment »

OK…. I should have realized in my original posting that the Oct 2007 10,000,000,000,000 objects figure was the source of the problem. I knew S3 could not be doubling every week, and that Amazon could not be making $11B a month, but didn’t see the now-obvious error in the input.

So what sort of money are they actually making?

Don MacAskill pointed me to this article at Forbes which says the number of objects at the end of 2007 was up to 14B from 10B in October. So let’s suppose the number now stands at 15B (1.5e10) and that Amazon are currently adding about 1B objects a month.

I’ll leave the other assumptions alone, for now.

Amazon’s S3 pricing for storage is $0.15 per GB per month. Assume all this data is stored on their cheaper US servers and that objects take on average 1K bytes. So that’s roughly 1.5e10 * 1e3 / 1e9 = 1.5e4 gigabytes in storage, for which Amazon charges $0.15 per month, or $2250.

Next, let’s do incoming data transfer cost, at $0.10 per GB. That’s simply 2/3rds of the data storage charge, so we add another 2/3 * $2250, or $1500.

Then the PUT requests that transmit the new objects: 1B new objects were added in the last month. Each of those takes a PUT, and these are charged at $0.01 per thousand, so that’s 1e9 / 1e3 * $0.01, or $10,000.

Lastly, some of the stored data is being retrieved. Some will just be backups, and never touched, and some will simply not be looked at in a given month. Let’s assume that just 1% of all (i.e., not just the new) objects and data are retrieved in any given month.

That’s 1.5e10 * 1e3 * 0.01 / 1e9 = 150 GB of outgoing data, or 0.15 TB. That’s much less than 10TB, so all this goes out at the highest rate, $0.18 per GB, giving another $27 in revenue.

And if 1% of objects are being pulled back, that’s 1.5e10 * 0.01 = 1.5e8 GET operations, which are charged at $0.01 per 10K. So that’s 1.5e8 / 1e4 * $0.01 = $150 for the GETs.

This gives a total of $2250 + $1500 + $10,000 + $27 + $150 = $13,927 in the last month.
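The arithmetic above is easy to re-run with different inputs. Here is a minimal Python sketch of the same estimate. The function and parameter names are mine, not anything from Amazon or from the spreadsheet, and it keeps the simplifications used in the text: incoming transfer is charged on all stored data, and all outgoing data goes out at the single top rate.

```python
def s3_monthly_revenue(objects=1.5e10, new_objects=1e9, avg_bytes=1e3,
                       retrieved_fraction=0.01):
    """Rough monthly S3 revenue in dollars, using the assumptions in the text."""
    stored_gb = objects * avg_bytes / 1e9
    return {
        "storage": stored_gb * 0.15,                       # $0.15 per GB per month
        "incoming": stored_gb * 0.10,                      # $0.10 per GB, on all stored data
        "puts": new_objects / 1e3 * 0.01,                  # $0.01 per 1,000 PUTs
        "outgoing": stored_gb * retrieved_fraction * 0.18, # top rate, $0.18 per GB
        "gets": objects * retrieved_fraction / 1e4 * 0.01, # $0.01 per 10,000 GETs
    }

revenue = s3_monthly_revenue()
total = sum(revenue.values())
print({name: round(value) for name, value in revenue.items()}, round(total))
```

With much larger objects the flat outgoing rate will overstate that one charge a little, since big transfer volumes fall into cheaper tiers.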

And that doesn’t look at all like $11B!

Where did all that revenue go? Mainly it’s not there because Amazon only added 1e9 objects in the last month, not 1e15. That’s six orders of magnitude. So instead of $11B in PUT charges, they make a mere $10K. That’s about enough to pay one programmer.

I created a simple Amazon S3 Model spreadsheet where you can play with the numbers. The cells with the orange background are the variables you can change in the model. The variables we don’t have a good grip on are the average size of objects and the percentage of objects retrieved each month. If you increase average object size to 1MB, revenue jumps to $3.7M.

BTW, the spreadsheet has a simplification: regarding all data as being owned by one user, and using that to calculate download cost. In reality there are many users, and most of them will be paying for all their download data at the top rate. Also note that my % of objects retrieved is a simplification. Better would be to estimate how many objects are retrieved (i.e., including objects being retrieved multiple times) as well as estimating the download data amount. I roll these both into one number.


Google maps gets SFO location waaaay wrong

22:16 January 28th, 2008 by terry. Posted under companies, tech. 1 Comment »

Before leaving Barcelona yesterday morning, I checked Google maps to get driving directions from San Francisco International airport (SFO) to a friend’s place in Oakland.

Google got it way wrong. Imagine trying to follow these instructions if you didn’t know they were so wrong. Click on the image to see the full sized map. Google maps is working again now.



Amazon S3 to rival the Big Bang?

00:40 January 28th, 2008 by terry. Posted under companies, tech. 2 Comments »

Note: this posting is based on an incorrect number from an Amazon slide. I’ve now re-done the revenue numbers.

We’ve been playing around with Amazon’s Simple Storage Service (S3).

Adam Selipsky, Amazon VP of Web Services, has put some S3 usage numbers online (see slides 7 and 8). Here are some numbers on those numbers.

There were 5,000,000,000 (5e9) objects inside S3 in April 2007 and 10,000,000,000,000 (1e13) in October 2007. That means that in October 2007, S3 contained 2,000 times more objects than it did in April 2007. That’s a 26 week period, or 182 days. 2,000 is roughly 2^11. That means that S3 is doubling its number of objects roughly once every 182/11 = 16.5 days. (That’s supposing that the growth is merely exponential - i.e., that the logarithm of the number of objects is increasing linearly. It could actually be super-exponential, but let’s just pretend it’s only exponential.)

First of all, that’s simply amazing.

It’s now 119 days since the beginning of October 2007, so we might imagine that S3 now has 2^(119/16.5) or about 150 times as many objects in it. That’s 1,500,000,000,000,000 (1.5e15) objects. BTW, I assume by object they mean a key/value pair in a bucket (these are put into and retrieved from S3 using HTTP PUT and GET requests).
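The doubling estimate can be checked in a few lines of Python. This is just the arithmetic from the paragraphs above, assuming pure exponential growth; the variable names are mine.

```python
import math

objects_april = 5e9
objects_october = 1e13
days_between = 182

doublings = math.log2(objects_october / objects_april)  # log2(2000), just under 11
doubling_days = days_between / doublings                # a bit over 16.5 days

days_since_october = 119
growth = 2 ** (days_since_october / doubling_days)      # in the region of 150x
objects_now = objects_october * growth                  # in the region of 1.5e15

print(round(doublings, 2), round(doubling_days, 1), round(growth))
```

Using the exact logarithm rather than rounding to 11 doublings gives a growth factor slightly below 150, which makes no difference to the conclusions.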

Amazon’s S3 pricing for storage is $0.15 per GB per month. Assume all this data is stored on their cheaper US servers and that objects take on average 1K bytes. These seem reasonable assumptions. (A year ago at ETech, SmugMug CEO Don MacAskill said they had 200TB of image data in S3, and images obviously occupy far more than 1K each. So do backups.) So that’s roughly 1.5e15 * 1K / 1G = 1.5e9 gigabytes in storage, at $0.15 per GB per month, or $225M.

That’s $225M in revenue per month just for storage. And growing rapidly - S3 is doubling its number of objects every 2 weeks, so the increase in storage might be similar.

Next, let’s do incoming data transfer cost, at $0.10 per GB. That’s simply 2/3rds of the data storage charge, so we add another 2/3 * $225M, or $150M.

What about the PUT requests, that transmit the new objects?

If you’re doubling every 2 weeks, then in the last month you’ve doubled twice. So that means that a month ago S3 would have had 1.5e15 / 4 = 3.75e14 objects. That means 1.125e15 new objects were added in the last month! Each of those takes an HTTP PUT request. PUTs are charged at one penny per thousand, so that’s 1.125e15 / 1000 * $0.01.

Correct me if I’m wrong, but that looks like $11,250,000,000.

To paraphrase a scene I loved in Blazing Saddles (I was only 11, so give me a break), that’s a shitload of pennies.

Lastly, some of that stored data is being retrieved. Some will just be backups, and never touched, and some will simply not be looked at in a given month. Let’s assume that just 1% of all (i.e., not just the new) objects and data are retrieved in any given month.

That’s 1.5e15 * 1K * 1% / 1e9 = 15M GB of outgoing data, or 15K TB. Let’s assume this all goes out at the lowest rate, $0.13 per GB, giving another $2M in revenue.

And if 1% of objects are being pulled back, that’s 1.5e15 * 1% = 1.5e13 GET operations, which are charged at $0.01 per 10K. So that’s 1.5e13 / 10K * $0.01 = $15M for the GETs.

This gives a total of $225M + $150M + $11,250M + $2M + $15M = $11,642M in the last month. That’s $11.6 billion. Not a bad month.

Can this simple analysis possibly be right?

It’s pretty clear that Amazon are not making $11B per month from S3. So what gives?

One hint that they’re not making that much money comes from slide 8 of the Selipsky presentation. That tells us that in October 2007, S3 was making 27,601 transactions per second. That’s about 7e10 per month. If Amazon was already doubling every two weeks by that stage, then 3/4s of their 1e13 S3 objects would have been new that month. That’s 7.5e12, which is 100 times more transactions just for the incoming PUTs (no outgoing) than are represented by the 27,601 number. (It’s not clear what they mean by transaction - I mean what goes on in a single transaction.)
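Here is that sanity check as a short Python sketch. It assumes a 30-day month; the variable names are mine.

```python
transactions_per_second = 27_601
seconds_per_month = 30 * 24 * 60 * 60
transactions_per_month = transactions_per_second * seconds_per_month  # about 7e10

# If S3 was doubling every two weeks, 3/4 of October's objects were new that month.
new_objects = 0.75 * 1e13
ratio = new_objects / transactions_per_month  # about 100

print(transactions_per_month, round(ratio))
```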

So something definitely doesn’t add up there. It may be more accurate to divide the revenue due to PUTs by 100, bringing it down to a measly $110M.

An unmentioned assumption above is that Amazon is actually charging everyone, including themselves, for the use of S3. They might have special deals with other companies, or they might be using S3 themselves to store tons of tiny objects. I.e., we don’t know that the reported number is of paid objects.

There’s something of a give away the razors and charge for the blades feel to this. When you first see Amazon’s pricing, it looks extremely cheap. You can buy external disk space for, e.g., $100 for 500GB, or $0.20 per GB. Amazon charges you just $0.18 per GB for replicated storage. But that’s per month. A disk might last you two years, so we could conclude that Amazon is e.g., 8 or 12 times more expensive, depending on the degree of replication. But you don’t need a data center or to grow (or shrink) a data center, cooling, employees, replacement disks—all of which have been noted many times—so the cost perhaps isn’t that high.

But…. look at those PUT requests! If an object is 1K (as above), it takes 500M of them to fill a 500GB disk. Amazon charges you $0.01 per 1000, so that’s 500K * $0.01 or $5000. That’s $10 per GB just to access your disk (i.e., before you even think about transfer costs and latency), which is about 50 times the cost of disk space above.
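The same comparison in Python, as a quick check of the figures above. It uses the 1K object size and the $100 for 500GB disk from the text; the variable names are mine.

```python
disk_gb = 500
disk_price = 100.0
object_bytes = 1_000

objects_to_fill = disk_gb * 1e9 / object_bytes  # 500M objects
put_cost = objects_to_fill / 1_000 * 0.01       # $0.01 per 1,000 PUTs
put_cost_per_gb = put_cost / disk_gb            # dollars per GB, just for the PUTs
disk_cost_per_gb = disk_price / disk_gb         # dollars per GB for the raw disk

print(round(put_cost), round(put_cost_per_gb / disk_cost_per_gb))
```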

In paying by the PUT and GET, S3 users are in effect paying Amazon for the compute resources needed to store and retrieve their objects. If we estimate it taking 10ms for Amazon to process a PUT, then 1000 takes 10 seconds of compute time, for which Amazon charges $0.01. There are about 2.6M seconds in a month, so a machine doing nothing but PUTs brings in about $2,600 per month, which is around 36 times what Amazon would charge you to run a small EC2 instance for a month. Such a machine probably costs Amazon around $1500 to bring into service. So there’s no doubt they’re raking it in on the PUT charges. That makes the 5% margins of their retailing operation look quaint. Wall Street might soon be urging Bezos to get out of the retailing business.

Given that PUTs are so expensive, you can expect to see people encoding lots of data into single S3 objects, transmitting them all at once (one PUT), and decoding when they get the object back. That pushes programmers towards using more complex formats for their data. That’s a bad side-effect. A storage system shouldn’t encourage that sort of thing in programmers.
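To make the trade-off concrete, here is a minimal sketch of the kind of batching this pricing encourages: many small records packed into one blob, so they cost one PUT instead of a thousand. It uses only the Python standard library and does not talk to S3 at all; the record layout and function names are hypothetical.

```python
import json
import zlib

def pack(records):
    """Encode a list of small records as one compressed blob (one PUT)."""
    return zlib.compress(json.dumps(records).encode("utf-8"))

def unpack(blob):
    """Decode a blob fetched with a single GET back into its records."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

records = [{"id": i, "note": "item %d" % i} for i in range(1000)]
blob = pack(records)
assert unpack(blob) == records
print(len(records), "records in one blob of", len(blob), "bytes")
```

The price is the complexity complained about above: reading or changing a single record means fetching and decoding the whole blob.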

Nothing can double every two weeks for very long, so that kind of growth simply cannot continue. It may have leveled out in October 2007, which would make my numbers off by roughly 2^(119/16.5) or about 150, as above.

When we were kids they told us that the universe has about 2^80 particles in it. 1.5e15 is already about 2^50, so only 30 more doublings are needed, which would take Amazon just over a year. At that point, even if all their storage were in 1TB drives and objects were somehow stored in just 1 byte each, they’d still need about 2^40 disk drives. The earth has a surface area of 510,065,600 km² so that would mean over 2000 Amazon disk drives in each square kilometer on earth. That’s clearly not going to happen.
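Reading the figures above as powers of two (2^80 particles, 2^50 objects now, 2^40 drives), the back-of-envelope sum looks like this in Python. The particle count is taken at face value from the text, and a 1TB drive is counted as 2^40 bytes; both are assumptions of this sketch.

```python
particles = 2 ** 80         # figure quoted in the text, taken at face value
drive_bytes = 2 ** 40       # a 1TB drive, counted as 2^40 bytes
drives = particles // drive_bytes         # 2^40 drives at one byte per object

earth_km2 = 510_065_600
drives_per_km2 = drives / earth_km2       # a little over 2,000

doublings_needed = 80 - 50
days_needed = doublings_needed * 16.5     # just over a year

print(drives, round(drives_per_km2), days_needed)
```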

It’s also worth bearing in mind that Amazon claims data stored into S3 is replicated. Even if the replication factor is only 2, that’s another doubling of the storage requirement.

At what point does this growth stop?

Amazon has its Q4 2007 earnings call this Wednesday. That should be revealing. If I had any money I’d consider buying stock ASAP.


Final straws for Mac OS X

16:54 January 24th, 2008 by terry. Posted under companies, tech. 7 Comments »

I’ve had it with Mac OS X.

I’m going to install Linux on my MacBook Pro laptop in March once I’m back from ETech.

I’ve been thinking about this for months. There are just so many things I don’t like about Mac OS X.


Yes, it’s beautiful, and there are certainly things I do like (e.g., iCal). But I don’t like:

  • Waiting forever when I do a rm on a big tree
  • Sitting wondering what’s going on when I go back to a Terminal window and it’s unresponsive for 15 seconds
  • Weird stuff like this
  • Case insensitive file names (see above problem)
  • Having applications often freeze and crash. E.g. emacs, which basically never crashes under Linux

I could go on. I will go on.

I don’t like it when the machine freezes, and that happens too often with Mac OS X. I used Linux for years and almost never had a machine lock up on me. With Mac OS X I find myself doing a hard reset about once a month. That’s way too flaky for my liking.

Plus, I do not agree to trade a snappy OS experience for eye candy. I’ll take both if I can have them, but if it’s a choice then I’ll go back to X windows and Linux desktops and fonts and printer problems and so on - all of which are probably even better than they already were a few years back.

This machine froze on me 2 days ago and I thought “Right. That’s it.” When I rebooted, it was in a weird magnifying glass mode, in which the desktop was slightly magnified and moved around disconcertingly whenever I moved the mouse. Rebooting didn’t help. Estéve correctly suggested that I somehow had magnification on. But, how? WTF is going on?

And, I am not a fan of Apple.

In just the last two days, we have news that 1. Apple crippled its DTrace port so you can’t trace iTunes, and 2. Apple QuickTime DRM Disables Video Editing Apps so that Adobe’s After Effects video editing software no longer works after a QuickTime update.

It’s one thing to use UNIX, which I have loved for over 25 years, but it’s another thing completely to be in the hands of a vendor who (regularly) does things like this while “upgrading” other components of your system.

Who wants to put up with that shit?

And don’t even get me started on the iPhone, which is a lovely and groundbreaking device, but one that I would never ever buy due to Apple’s actions.

I’m out of here.


I just deactivated my Facebook account

20:45 January 3rd, 2008 by terry. Posted under companies, me. No Comments »

I just deactivated my Facebook account. This has nothing to do with Robert Scoble’s account being disabled earlier today, I’m just sick of Facebook. It does nothing whatsoever for me, except send messages that can and would otherwise have been sent in email. I don’t want to use a tool that encourages people to send me messages on a website that I then have to go log in to. I don’t want some website to hold my messages. I like them to be searchable with things like grep. I like to organize them my way. I like email. Apart from receiving messages in a totally unattractive way, Facebook is useless for me - just a steady stream of invitations to things I don’t want to attend from people I don’t know, plus a smattering of cream pies, flying sheep, etc. So I’m outta there. I wonder if I’ll manage to survive.


Amazon just billed me 14 cents

00:35 January 2nd, 2008 by terry. Posted under companies, tech. 2 Comments »

I’ve been messing around with Esteve setting up an Amazon EC2 machine.

We set up a machine the other day, ssh’d into it, took a look around, and then shut it down a little later. Amazon just sent me a bill:

Greetings from Amazon Web Services,

This e-mail confirms that your latest billing statement is available on the AWS web site. Your account will be charged the following:

Total: $0.14

Please see the Account Activity area of the AWS web site for detailed account information.

Isn’t that cool?

It would certainly cost more than 14 cents to get your hands on your own (virtual) Linux box any other way.


Pushing back on the elevator pitch

23:28 December 1st, 2007 by terry. Posted under companies, me. 4 Comments »

I’ve been out talking to people about raising money for Fluidinfo.

Over the last 7 years I’ve read literally thousands of articles on talking to potential investors, pitching, raising money, angels, VCs, dilution, control, rounds, boards, strategies, valuations, burn rates, equity, etc. I’ve bought and read dozens of related books. I’m a regular reader of about a dozen VC blogs and the blogs of several entrepreneurs. I’ve swapped stories in person and learned lessons from probably a hundred other entrepreneurs. I was CTO of Eatoni Ergonomics, a startup that raised $5M in NYC, and I sat on the board for 4 years.

I like to analyze things, to sit around thinking, to generalize, to look for lessons, to find patterns, etc. So I reckon I have a fairly good idea of what creating a startup and raising money is about.

Some aspects of doing that are relatively formulaic. But others have significant variation.

For example, what should you put in a business plan? You can spend many months working on business plans. It’s hard work to write well and concisely. Then you show it to VC A and they tell you they’d also like to see X and Y and Z, that are not in your plan. So you put them in. You show it to VC B, and they tell you the plan is way too long! That you should take out P, Q, R and S. That leaves you with a wholly new-looking plan and when you show that to VC C, they’ll tell you it’s incoherent and doesn’t flow and look at you like you’re some kind of innocent child who doesn’t even know how to structure its thoughts. When you tell them you actually already know all that and that you agree, they’ll think you’re even weirder. And so it continues.

Thinking is changing on the business plan front, though. Some entrepreneurs and some investors have realized that creating or insisting on a business plan too early is probably a waste of time. Everyone knows the market numbers and the financial projections are probably rubbish. People expect the business and the plan to change, etc., etc.

When someone asks me for a business plan, I (politely) tell them I don’t have one and don’t intend to write one. I tell them I’m looking for someone who wants to understand what I’m doing and fund it, without needing to see a formal written business plan. I suggest that if I reach the stage of looking for someone who wants the comfort of a better-thought-out plan that I will get back to them.

I think that’s a good change all round. You have to push back a little. A tiny engineering team focused on building a product probably shouldn’t stop, or be stopped, to write a business plan. I’m certainly not going to do that. I could spend that time writing code, working with people I’m paying to create more of a product, to get more online, to have more to point to, etc.

Elevator pitches

There’s definitely been a change with respect to business plans.

And now to the meat of this post, to a place where a similar change has yet to penetrate: the blind insistence on having an elevator pitch.

Almost universally, potential investors will want or expect an elevator pitch. Tons of VC sites will advise you that if you can’t describe your idea in a couple of sentences, it’s probably a non-starter. If you don’t have a compelling elevator pitch they won’t talk to you, won’t reply to email (even if you have been introduced), and they certainly won’t read any materials.

Some even go so far as to tell you that without an elevator pitch you won’t be able to communicate your ideas to your employees to motivate them! Uh, excuse me? Since when did the intelligent, driven, dig-in, curious, thoughtful, dedicated people who join startups acquire the attention span of gnats?

Listen. Some ideas can’t be summarized and/or grasped in a two-minute elevator ride. Sometimes you don’t even know yourself what the outcome will be. The history of science and innovation is full of examples. Imagine what the world would be like if, in order to get seed resources to push a new project along, all ideas had to be pre-vetted, each in 2 minutes, by a fairly general audience (I’m being polite again).

Entrepreneurs have to push back—where necessary—on the demand for an elevator pitch.

I’ve tried to put my round ideas into the square hole of an elevator pitch for long enough. I haven’t managed to do it and I don’t want to spend any more time trying.

Until tonight I’ve just been telling people I don’t have an elevator pitch, sorry. I’ve even told them (hi Nivi!) that instead of robotically insisting that I shape my ideas to their expectations that they try being more open minded about the process and try working on their expectations.

From now on, I’m going to give the following elevator pitch:

Here is a list of people. Each of them has had the curiosity, time, and patience to listen to my ideas for at least an hour. Ask them if I’m worth talking to further.

(See below for my list.)

If that’s not ok, then I agree that 1) if I reach the stage where I need to talk to people who really need an elevator pitch, and 2) you’re still interested, then I’ll try again to work on getting you what you need. Same goes for a business plan.

Up to this point I’ve tried to only talk to people who are willing to put the time in, to listen and think, to talk among themselves and draw their own conclusions. But I’ve still run into a bunch of people who won’t do that. That’s ok, of course. I also know what it’s like to be busy.

Here’s my list. I’m very happy and very thankful to have recently spent at least an hour, sometimes much more, with each of the following:

Bradley Allen,
Art Bergman,
Jason Calacanis,
Dick Costolo,
Daniel Dennett,
Esther Dyson (now an investor),
Brady Forrest,
Eric Haseltine,
David Henkel-Wallace,
Jim Hollan,
Steve Hofmeyr,
Mark Jacobsen,
Vicente Lopez,
Roger Magoulas,
Jerry Michalski,
Nelson Minar,
Roger Moody,
Ted Nelson,
Tim O’Reilly,
Norm Packard,
Jennifer Pahlka,
Andrew Parker,
Scott Rafer,
Clay Shirky,
Reshma Sohoni,
Graham Spencer,
Stefan Tirtey,
Mark Tluszcz,
David Weinberger, and
Fred Wilson.

That’s my new elevator pitch.

If you buy it, let’s talk properly sometime soon. If you don’t, but you’re still curious, talk to some of those folks. Take your pick.

And if you don’t know any of those people, maybe you should be sending me your elevator pitch.


Hacking Twitter on JetBlue

21:41 November 24th, 2007 by terry. Posted under companies, me, python. 3 Comments »

I have much better and more important things to do than hack on my ideas for measuring Twitter growth.

But a man’s gotta relax sometime.

So I spent a couple of hours at JFK and then on the plane hacking some Python to pull down tweets (is this what other people call Twitter posts?), pull out their Twitter id and date, convert the dates to integers, write this down a pipe to gnuplot, and put the results onto a graph. I’ve nothing much to show right now. I need more data.

But the story with Twitter ids is apparently not that simple. While you can get tweets from very early on (like #20 that I pointed to earlier), and you can get things like #438484102 which is a recent one of mine, it’s not clear how the intermediate range is populated. Just to get a feel for it, I tried several loops like the following at the shell:

i=5000

# Probe every 5000th id from 5000 up to 200000, pausing a second between requests.
while [ $i -lt 200000 ]
do
  wget --http-user terrycojones --http-passwd xxx \
    http://www.twitter.com/statuses/show/$i.xml
  i=`expr $i + 5000`
  sleep 1
done

Most of these were highly unsuccessful. I doubt that’s because there’s widespread deleting of tweets by users. So maybe Twitter are using ids that are not sequential.

Of course if I wasn’t doing this for the simple joy of programming I’d start by doing a decent search for the graph I’m trying to make. Failing that I’d look for someone else online with a bundle of tweets.

I’ll probably let this drop. I should let it drop. But once I get started down the road of thinking about a neat little problem, I sometimes don’t let go. Experience has taught me that it is usually better to hack on it like crazy for 2 days and get it over with. It’s a bit like reading a novel that you don’t want to put down when you know you really should.

One nice sub-problem is deciding where to sample next in the Twitter id space. You can maintain something like a heap of areas - where area is the size of the triangle defined by two tweets: their ids and dates. That probably sounds a bit obscure, but I understand it :-) Gradient of the growth curve is interesting - you probably want more samples when the gradient is changing fastest. Adding time between tweets to gradient gives you a triangle whose area you can measure. There are simpler approaches too, like uniform sampling, or some form of binary splitting of interesting regions of id space. Along the way you need to account for pages that give you a 404. That’s a data point about the id space too.
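Here is one reading of the heap-of-areas idea as a minimal Python sketch, not the code I actually wrote on the plane. The `fetch_date` callable is hypothetical: it stands in for whatever fetches a tweet and returns its date as an integer, or `None` on a 404. The demo uses a made-up growth curve instead of hitting Twitter. When an id is missing, the sketch records it and interpolates a date so that it can keep splitting; that interpolation is my assumption.

```python
import heapq

def area(a, b):
    """Area of the right triangle spanned by two (id, date) samples."""
    return 0.5 * abs(b[0] - a[0]) * abs(b[1] - a[1])

def sample_ids(fetch_date, lo_id, hi_id, budget):
    """Repeatedly split the interval with the largest area at its middle id.

    Assumes both end ids exist. Returns (sorted samples, missing ids).
    """
    lo = (lo_id, fetch_date(lo_id))
    hi = (hi_id, fetch_date(hi_id))
    samples, missing = [lo, hi], []
    heap = [(-area(lo, hi), lo, hi)]       # heapq is a min-heap, so negate
    while heap and budget > 0:
        _, a, b = heapq.heappop(heap)
        mid_id = (a[0] + b[0]) // 2
        if mid_id == a[0]:
            continue                       # interval too small to split
        budget -= 1
        date = fetch_date(mid_id)
        if date is None:
            missing.append(mid_id)         # a 404 is a data point too
            date = (a[1] + b[1]) // 2      # interpolate so splitting can go on
        else:
            samples.append((mid_id, date))
        mid = (mid_id, date)
        heapq.heappush(heap, (-area(a, mid), a, mid))
        heapq.heappush(heap, (-area(mid, b), mid, b))
    return sorted(samples), missing

def fake_fetch(tweet_id):
    """Made-up growth curve for the demo; every 7th id pretends to be a 404."""
    if tweet_id % 7 == 0:
        return None
    return int(1000 * tweet_id ** 0.5)

samples, missing = sample_ids(fake_fetch, 20, 1_000_000, budget=50)
print(len(samples), "samples,", len(missing), "missing ids")
```

Uniform sampling or plain binary splitting would be simpler. The heap just makes it cheap to ask which gap is currently the most interesting.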


Not exactly Brownian motion in Manhattan

22:08 November 19th, 2007 by terry. Posted under companies, tech. 5 Comments »

Today after some meetings I went out for a walk. I’m staying on 12th Street between 5th and University in Manhattan.

I had intended to “just wander around” pretty much at random. That’s what I really felt like too. But in the back of my mind, not quite so far back that I wasn’t aware of it, my brain was making sure that, like it or not, I went to the Apple store on the corner of 5th Avenue and Central Park.

I really have no need of an Apple store. There’s nothing I would buy, nothing I need. But.

So off I wandered… Broadway, 6th Av, 5th Av. I stopped briefly in many stores, had a coffee and a muffin, tried to tell myself that I actually wasn’t going to the Apple store. But.

I saw iPods and iPhones aplenty along the way. Hundreds of them. All identically priced. Best Buy, CompUSA, Circuit City, all the small electronic shops on 5th Av. No need, no need at all to go to the Apple store. None.

I’m walking up 5th Av in the boring super-rich area, Cartier, Dunhill, DeBeers. There can be no doubt whatsoever that I am heading to the Apple store. Most of my mind doesn’t want to go, but my legs and body seem determined. They know I need it.

And there it is. Amazing. I’ve been in several of these stores before, including this one, but there’s something you just have to see and feel. Maybe it’s the church of the 21st century… people are drawn in to worship an abstract god, to kneel at the altar and finger the icons.

It really is amazing. To me the Apple store is about the hippest place in Manhattan. Here you get to see all sorts of cool cats just hanging out with their favorite hardware. The place is full. Full of people from all over the world who’ve come to buy Apple gear. The place has a very definite atmosphere, and it’s not the atmosphere of a regular computer store. There are hundreds of Apple products out, they’re all on, and people are using them - surfing the web, reading email, listening to music, marveling. Spend half an hour in there people watching, and you want to run out and buy AAPL stock.

Apple and Nokia are two companies that really understand the importance of appearance, design, and fashion in technology. I think Nokia were the first company to see clearly that a phone is not just a phone - it’s a statement about yourself. It’s something you take out and leave on the table at the cafe, or casually flip open when you need to impress someone or get laid. Apple understands it even better. I’ll walk nearly 50 blocks just to get a fix - not to buy, just to look at the products, look at the people, be amazed at it all.

Fortunately I’m old enough to know that I don’t really need any of those shiny objects. I have a first generation iPod that I never use. I have a dead-simple phone that I don’t feel any need to upgrade. I haven’t bought myself a computer in I don’t know how long - maybe 10 years (I always get them through work). I’m not even sure that I’d own a computer if I didn’t work from home. But I sure do like to look at hardware. The new iPod nano is extraordinarily beautiful - dimensions, sleekness, feel, everything about it is divine - and at $149 (4GB) or $199 (8GB) it doesn’t feel expensive. But I know I simply wouldn’t use it. What a pity!


The Mahalo-Wikipedia-Google love triangle

00:12 November 18th, 2007 by terry. Posted under companies. 12 Comments »

Lots of people seem to like dumping on Mahalo and Jason Calacanis. For example, Andrew Baron recently posted about Why Mahalo is Fundamentally Flawed.

Try Googling Mahalo sucks and you’ll get about 232,000 hits. Take your pick of the highly critical coverage.

Some of the negative commentary on Mahalo is probably due to professional and personal jealousy. Some of it is due to the fact that it’s early days yet. And I think some of it may be due to Jason happily telling people to look left while he goes right.

How can Jason raise money for Mahalo at valuations north of $100M? Surely there must be a revenue plan that holds water? If you want to argue that Mahalo is a failure and that Jason is simply a ceaseless self-marketer full of hot air, you’ll need to argue that some of the same things are true of Mahalo’s investors. Or maybe we’re in a bubble and they’ve all simply lost it.

Here’s what I think is going on.

Firstly, I think Jason is using a little smoke and mirrors when he calls Mahalo a search engine and frequently compares Mahalo’s “search” results to Google’s. With few exceptions, everyone seems to be buying it! With few exceptions, people compare Mahalo with Google - presumably because Jason tells them to and because he talks about being a search engine. And, with few exceptions, the technorati tell us that Mahalo is a pretty crappy search engine.

I agree, because Mahalo is not a search engine. Putting a box labeled “Search” on your web site to dig hits out of your own content does not make you a search engine - if it did, millions of sites would qualify. Passing queries off to Google and showing the results does not make you a search engine, either. Telling people to compare your content with Google’s results does not make you a search engine. Nor does putting the words “search engine” in your company’s strapline.

Mahalo will never be a search engine, and almost certainly does not want to be a search engine. That would be suicidal.

I believe their strategy is entirely different and that the relevant comparison is not with Google, but with Wikipedia.

Mahalo is a rapidly growing collection of carefully curated content. Mahalo is Wikipedia with a different model of control, ownership, and content creation. It’s a benevolent dictator with a purchase agreement instead of a loose anarchy with the GNU Free Documentation License.

If you want to compare Mahalo to something, compare it to Wikipedia. Jason is a huge fan of Wikipedia. And here he is begging Jimbo Wales not to leave $100M/yr on the table. Interesting.

Right now Mahalo has roughly 25K pages. Google has information on, let’s say, 10 billion pages. By this simplistic measure, Google is about 400,000 times bigger than Mahalo. You’re not going to catch or compete with Google using people to make content. Yes, you can use Google for things you don’t have static pages for, as Mahalo does. But Mahalo is not a search engine. Never will be.

Now consider Wikipedia. Wikipedia has 1.2M English pages. That means that, in English, Wikipedia is a mere 48 times larger than Mahalo! Now we’re talking. Mahalo are currently adding something like 1,000 pages a week. Suppose Jason manages to double that quite soon. That would be 100K pages a year, or about 8.3% of Wikipedia annually. So I think it’s conceivable that Mahalo could catch Wikipedia. Even if they keep a steady ship and only gain linearly they could easily be 35-40% the size of Wikipedia in 4 years’ time.
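Those numbers in Python, for anyone who wants to vary them. The sketch holds Wikipedia fixed at 1.2M English pages, which is the simplification the estimate above rests on; the variable names are mine.

```python
mahalo_pages = 25_000
wikipedia_pages = 1_200_000
pages_per_year = 100_000   # roughly 2,000 pages a week

ratio_now = wikipedia_pages / mahalo_pages       # how many times larger Wikipedia is
yearly_share = pages_per_year / wikipedia_pages  # fraction of Wikipedia added per year
share_in_4_years = (mahalo_pages + 4 * pages_per_year) / wikipedia_pages

print(ratio_now, round(100 * yearly_share, 1), round(100 * share_in_4_years, 1))
```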

But sheer number of pages is only part of the story. Because the distribution of search requests will follow some kind of power law, you can pick up (say) half of all search requests by covering only a small number of the most popular topics, and, as always, leave the long tail to Google.

So with a small finite amount of work, you can cover a very large chunk of Wikipedia. And I think that’s exactly what Mahalo are aiming to do.
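To get a feel for how strong the power-law effect can be, here is a small Python sketch. It assumes request popularity follows a Zipf distribution with exponent 1 over a million topics. Both numbers are my assumptions for illustration, not measurements of real search traffic.

```python
def zipf_coverage(top_k, total_topics, exponent=1.0):
    """Fraction of requests that hit the top_k most popular of total_topics."""
    weights = [1.0 / rank ** exponent for rank in range(1, total_topics + 1)]
    return sum(weights[:top_k]) / sum(weights)

coverage = zipf_coverage(top_k=1_000, total_topics=1_000_000)
print(round(coverage, 2))
```

Under those assumptions, pages for the top 0.1% of topics pick up about half of all requests.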

A few weeks ago I pulled down all of Mahalo’s URIs for another project. Here’s a tiny sample - and I really did pick this out at random:

    http://www.mahalo.com/Valerie_Plame_Affair
    http://www.mahalo.com/Violence_on_Television
    http://www.mahalo.com/Violent_Crime_Rate
    http://www.mahalo.com/Virginia_Tech_Report
    http://www.mahalo.com/Voting_Machine_Controversy
    http://www.mahalo.com/Walter_Reed_Army_Medical_Center
    http://www.mahalo.com/War_Wounded
    http://www.mahalo.com/Washington_D.C._Lobbying_Scandal
    http://www.mahalo.com/Abdullah_Gul
    http://www.mahalo.com/Alan_Garcia
    http://www.mahalo.com/Alex_Salmond
    http://www.mahalo.com/Angela_Merkel

So what? you might ask. Well, let’s replace www.mahalo.com with en.wikipedia.org/wiki in the above. We get:

    http://en.wikipedia.org/wiki/Valerie_Plame_Affair
    http://en.wikipedia.org/wiki/Violence_on_Television
    http://en.wikipedia.org/wiki/Violent_Crime_Rate
    http://en.wikipedia.org/wiki/Virginia_Tech_Report
    http://en.wikipedia.org/wiki/Voting_Machine_Controversy
    http://en.wikipedia.org/wiki/Walter_Reed_Army_Medical_Center
    http://en.wikipedia.org/wiki/War_Wounded
    http://en.wikipedia.org/wiki/Washington_D.C._Lobbying_Scandal
    http://en.wikipedia.org/wiki/Abdullah_Gul
    http://en.wikipedia.org/wiki/Alan_Garcia
    http://en.wikipedia.org/wiki/Alex_Salmond
    http://en.wikipedia.org/wiki/Angela_Merkel

And guess what? All those URIs actually work! See below for a possible reason for this uncanny coincidence.

Research question: what percentage of Mahalo URIs work as Wikipedia URIs with the above simple substitution? I may do this test when I get a little more time. I bet the answer is high.
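The substitution itself is a one-liner. Here is a minimal Python sketch of it, with a hypothetical function name and two of the URIs from the sample above. It only builds the candidate Wikipedia URIs; actually checking which ones resolve would need an HTTP request per URI, which is left out here.

```python
MAHALO_PREFIX = "http://www.mahalo.com/"
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"

def mahalo_to_wikipedia(uri):
    """Rewrite a Mahalo page URI as the Wikipedia URI with the same page name."""
    if not uri.startswith(MAHALO_PREFIX):
        raise ValueError("not a Mahalo URI: %r" % uri)
    return WIKIPEDIA_PREFIX + uri[len(MAHALO_PREFIX):]

sample = [
    "http://www.mahalo.com/Valerie_Plame_Affair",
    "http://www.mahalo.com/Angela_Merkel",
]
candidates = [mahalo_to_wikipedia(uri) for uri in sample]
print("\n".join(candidates))
```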

Ask yourself again: does Mahalo look more like Google or more like Wikipedia?

The idea of Mahalo-as-search-alternative-to-Google is just Jason operating Mahalo in stealth mode in broad daylight. “Hey, Rocky, watch me pull a search engine out of my hat! Oops! That’s not a search engine. I swear there was a search engine in there somewhere.”

How is Mahalo different from Wikipedia?

A big one is that Mahalo owns all its content. If Mahalo puts one of your pages on its site, you’ll first sign a purchase agreement in which the

Seller hereby irrevocably sells, grants, assigns, conveys and transfers to Mahalo, exclusively and forever, Seller’s entire right, title and interest in and to the SeRPs

and in which you warrant that the content is legit, in which you fully indemnify Mahalo, and in which you agree to let them be your agent and attorney should they need to take some action to obtain or protect the content.

In consideration you get $10-$15 which you can have in cash. Or, in a wonderfully ironic and masterful gesture, you can have your earnings donated to the Wikimedia Foundation! That’s just brilliant, I love it. How can you not be in awe of that? The guy’s a genius.

Talking of genius, just look at the language on the payment details page at Mahalo: “A Greenhouse Guide begins their career in the Greenhouse…”. You see? Writing articles for Mahalo is the beginning of a career. George Lakoff would probably count that as a classic example of framing (also see here).

Unlike Wikipedia, Mahalo owns every word of its content. That means they can sell it. That they can be acquired. But who would want to acquire Mahalo? Wait.

What other differences are there between Wikipedia and Mahalo?

Another big one is the millions of links on the internet that point to Wikipedia pages. Those little tubules that make up the internets, with Google’s PageRank worming its way down each and every one, assigning and passing on credit.

There are two things here: 1) the links themselves and 2) the high consequent position Wikipedia’s pages have on Google.

Can Mahalo get large numbers of people to link to their pages? If the pages are any good (and they are), then why not? Plus, it may be that Mahalo can catch Wikipedia in terms of how many people link to them.

According to the Netcraft October 2007 Web Server Survey, the number of servers on the net has been growing at an amazing 5% per month!

That’s just the rate of increase of new servers, not the rate of new pages being put onto existing sites. Let’s assume the Netcraft server number isn’t too far from the overall growth, and that the web roughly doubles in size every two years. That means if the size today is X, then in 4 years, towards the end of Jason’s horizon, it will be size 4X. If so, there are 3X pages yet to come into existence. The creators of these will have a choice to point links at Wikipedia or Mahalo. If popular momentum can be shifted to Mahalo, it can grab a large chunk of the link pie graph.
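The arithmetic behind those numbers is easy to check. The sketch below uses only the figures in the text (5% per month, doubling every two years, a four-year horizon); the function names are my own. Note that 5% per month, compounded, doubles in a little over 14 months, so the two-year doubling assumption is the more conservative of the two.

```python
import math


def doubling_time_months(monthly_growth: float) -> float:
    """Months needed to double at a constant compound monthly growth rate."""
    return math.log(2) / math.log(1 + monthly_growth)


def growth_factor(years: float, doubling_years: float) -> float:
    """Size multiple after `years` if the size doubles every `doubling_years`."""
    return 2 ** (years / doubling_years)


if __name__ == "__main__":
    # Doubling time implied by 5% monthly growth: about 14.2 months.
    print(doubling_time_months(0.05))

    # Doubling every two years over a four-year horizon.
    factor = growth_factor(4, 2)
    print(factor)      # size in four years, as a multiple of X
    print(factor - 1)  # pages yet to come into existence, as a multiple of X
```

With the two-year doubling assumption, the factor is 4, which leaves 3X of new pages. If the web really grew as fast as the server count, the share of not-yet-written pages would be larger still.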

All of which brings us, inevitably, to Google.

Quick survey question: when you need to find something that you know you’d be happy to read in Wikipedia, do you first go to Wikipedia, find English (or your language), find their search box, enter your query, and click on the link? Or do you go to Google and take its Wikipedia link?

I thought so - you use Google. It’s a uniform way to get to things, it’s likely integrated into your browser, and they generally do a better and faster job of indexing sites’ content than the sites do themselves. So the existence and massive popularity of Wikipedia drives traffic to Google. And Google of course drives traffic to Wikipedia. The two of them are dating. But Wikipedia is not the perfect lover: they stubbornly refuse to put ads on their pages, to share the love. Along comes Jason Calacanis, then at AOL, to whom this is all very clear. He tells Wikipedia in no uncertain terms that with all that traffic they could make $100M per year from ads on just the home page. He points to a conservative estimate of the worth of Wikipedia at $600M, and his own estimate is $5B. Hmmmmm. What’s an entrepreneur to do when he sees someone leaving that much value on the table?

Back to Google. They would like to have more content. Traditionally, when you got back a page of their search results, you wouldn’t see links to pages on Google - that wouldn’t make sense: there were no pages on Google, after all. Google was supposed to point you to other pages. It was an index to help you find the things you actually wanted to look at. That was the old model. These days, Google is buying content (e.g., YouTube) and pointing their search results at their content, neatly taking the ad revenue in both places. All the better if the content comes with indemnification.

You can see where I’m going. Mahalo already does advertising with Google. In fact, they’re already a premium adsense publisher to the surprise of some. If ads on the single front page of Wikipedia could generate $100M annually, what could ads on all Mahalo pages generate if Mahalo grows to rival Wikipedia?

And… who weighs the importance of links (and other unknown factors) in Google’s results page? Yes, of course, Google does. According to this Fast Company article, Mahalo gets 65% of the revenue Google makes when Mahalo sends its users into Google. And Google makes money when it sends its own users into Mahalo.

If there’s really (say) $1B of value to be had by building a successful commercial version of Wikipedia, you can see why Google might have some interest in nudging links to Mahalo a little higher in its results. Maybe even higher than the equivalent page for Wikipedia. Now would be a good moment to remember that I illustrated above just how trivial it can be to match up equivalent Mahalo and Wikipedia pages… Got it? User enters a query, Google does the search and finds a highly-linked Wikipedia page, then in an instant they can make the substitution and display a link to the equivalent Mahalo page instead, optionally displaying the Wikipedia page below the fold. Would that qualify as evil?

All of which leads to a very clear answer to my “who would want to acquire Mahalo?” question. Interestingly, Google will want to wait until Mahalo is big (they will know exactly when, supposing Mahalo keeps using adsense). They want Mahalo to be independent and with strong momentum before they turn the corporate intake valve in the Mahalo direction.

Can Jason build a viable alternative to Wikipedia? I bet he can. He has the lessons of Wikipedia. He doesn’t have the anarchy factor. He has no spam. He knows what he’s doing, and he’s in control. It’s a content play, and Jason is a content guy. An editor with a track record of building valuable content in this way. He’s playing to his strengths. The engineering is not nearly as daunting as building a Google. He has the money. As he ramps it up he’s going to have more money.

Who’s going to stop him? Certainly not Google - that’s not in their interest at all. Almost certainly not Wikipedia - unless they start putting up ads and funneling large amounts of money back to Google. And Jason is unlikely to shoot himself in the foot either - quite the reverse.

So if that’s the strategy, and if he’s on track with content (as he seems to be), and if the content is passably good, or better (which it is), and if he has a good understanding with his “friends at Google” (which you can bet he does — let’s not forget the Sequoia factor either), and if the revenue numbers are about right, then a $175M valuation for an upcoming round to accelerate things might look like a steal.

Along the way, Jason gets to have a quiet inner smile at all the people whining about how Mahalo is a crap search engine. He feeds the fire all the while, telling them to go ahead, make his day, and compare Mahalo’s results to Google’s (but not Wikipedia’s). Misdirecting attention towards Google and having people write him off probably suits him just fine. Meanwhile, they’re getting on with the real mission.


Flakey Twitter and the use of consecutive ids

05:54 November 16th, 2007 by terry. Posted under companies, tech. 2 Comments »

Twitter was just inaccessible for maybe a couple of hours. Prior to that there was a 9-day gap in their timeline, noticed by at least a few people. And quite regularly, twitters I send don’t show up at all.

I wonder what could be going on over there? Things certainly don’t feel very stable.

A friend signed up tonight. Using the Twitter API you can see her id. It’s a bit over 10 million. You can also see the id of her first twitter, a bit over 417 million. The earliest twitter available on the system is number 20, “just setting up my twttr”, sent at 20:50:14 on Tue Mar 21 2006 by Jack Dorsey, who has user id 12 (the lowest user id I’ve seen).

Given that Twitter seem to be using consecutive ids for users and twitters, and that you can pull dates out of their API, it would be pretty easy to make graphs showing growth in users and twitters over time. You could probably also infer downtime by looking for periods when no twitters appeared. This would be pretty easy too. Beyond a certain point in time it would be very accurate (i.e., when there are so many twitters arriving that a twittering gap is suspicious), and you could calculate confidence estimates.
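A rough sketch of what that analysis might look like, assuming you have already collected (id, timestamp) pairs somehow. Nothing here calls the Twitter API; the function names and the gap threshold are mine. Only a pair of directly consecutive ids can show that nothing was posted in between, so the gap check is limited to those.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[int, datetime]  # (twitter id, time it was sent)


def rates_per_day(samples: List[Sample]) -> List[Tuple[datetime, float]]:
    """Average number of new ids per day between each pair of adjacent samples."""
    ordered = sorted(samples)
    rates = []
    for (id_a, time_a), (id_b, time_b) in zip(ordered, ordered[1:]):
        days = (time_b - time_a).total_seconds() / 86400
        if days > 0:
            rates.append((time_a, (id_b - id_a) / days))
    return rates


def suspicious_gaps(
    samples: List[Sample], max_gap: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Periods longer than `max_gap` between two directly consecutive ids."""
    ordered = sorted(samples)
    return [
        (time_a, time_b)
        for (id_a, time_a), (id_b, time_b) in zip(ordered, ordered[1:])
        if id_b == id_a + 1 and time_b - time_a > max_gap
    ]


if __name__ == "__main__":
    # The first sample is the one quoted above; the rest are invented.
    samples = [
        (20, datetime(2006, 3, 21, 20, 50, 14)),
        (21, datetime(2006, 3, 21, 21, 0, 0)),
        (22, datetime(2006, 3, 22, 9, 0, 0)),
        (122, datetime(2006, 3, 23, 9, 0, 0)),
    ]
    print(rates_per_day(samples))
    print(suspicious_gaps(samples, timedelta(hours=6)))
```

A sensible threshold would have to scale with the arrival rate at the time: a six-hour silence means little in the first week and a great deal once thousands of twitters are arriving every hour.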

I don’t have time for all that though.

But I wonder if Google did something like that as part of their competitive analysis when they decided to buy Jaiku, or if Twitter’s investors did it, and how the numbers would match up with whatever Twitter management might claim. I’ve no idea or opinion at all about any of that btw. But I don’t think I’d be exposing all that information by using consecutive ids for users and their twitters.


Is Andrew Parker secretly running Union Square Ventures?

18:06 November 14th, 2007 by terry. Posted under companies, me. 3 Comments »

Fred Wilson stirred up the entrepreneurial blogosphere 6 months ago with a series of posts wondering about the influence of founder age on startup success. I wrote one of my typically long comments.

Today I was making myself a coffee, and thinking about how fast/slow I can move, and how that’s changed over the years. When I was 24 I perhaps had more energy, but I often acted in a quite unfocused way. Now I’m 44, and I still have tons of energy. E.g., I was up coding last night until 6:30am, and then got up at 10am this morning and continued, so I’m not exactly loafing around with slippers and a pipe reflecting on my glory days. But I also have 3 kids, and other things going on. I have to act in a much more focused way or I couldn’t do the things I want to do.

But then, I thought, the life of your average VC probably has some strong similarities. A couple of kids, insanely busy when working, regularly carving out quality time for family, needing to stay very organized and on top of things, needing to keep multiple balls rolling, etc. Those thoughts led me to reconsider Fred’s posting, but in the context of VCs.

Might it be that the best VC general partners would actually be a bunch of 24 year olds? Of course they could have some older guys as analysts. What do 40-50 year old VCs have that 20-30 year olds don’t that makes them more qualified and better as VCs? If you want to argue that experience makes the older better, you probably need to argue that for entrepreneurs too. If you want to argue that the energy of youth makes for a better entrepreneur, you might need to argue that for VCs too. If you want to argue that young founders have unique insight into what products will be successful, you might think the same would be true of young VCs, if there were any.

It takes a massive amount of work to create and build a startup. Unless you’re a superstar, it’s also a huge amount of work to get funded. You have to go begging and scraping, on bended knee, hat in hand, to make mature and otherwise sober people with a lot of money believe in you. And that’s all done against a background of very steep odds. Similarly, it’s a massive amount of work to raise a venture fund. You have to make even more mature and more sober people with even more money believe in you. And you have to do it in a much less forgiving environment, also against steep odds.

Thirty or even twenty years ago, most CEOs would probably have scoffed at the idea that a 20-year-old could start and run a company, and sell it for tens or hundreds of millions, or even a billion, or take it public. We now know that that actually happens, and the idea that the very young can do it, including getting financial backing, is no longer foreign. Might not the same one day be true for fund managers? When will we see the first VC fund run by a couple of twenty-somethings? Will they exhibit a marked preference for funding older founders?

Back when Fred was posting, I pointed Howard Gutowitz to one of the postings. A couple of days later, Howard told me that he’d talked about it to his brother:

Robert made what is actually an interesting suggestion: get a figurehead 26 year old to be the CEO. Turn the old game around.

I think that’s pretty amusing.

Maybe Andrew Parker is actually running Union Square Ventures. Turn the old game around.
