Pond scum

15:53 September 5th, 2008 by terry. Posted under other, tech. 4 Comments »

I had breakfast this morning at a bar in the Santa Caterina market in Barcelona with Jono Bennett. He’s a writer. We were reflecting on similarities in our struggles to do our own thing. An email about a potential Fluidinfo investor that I’d recently sent to a friend came to mind. I wrote:

I had a really good call with AAA. He told me he’s interested and wants to talk to BBB and CCC. I then got mail the next day from DDD (of the NYT) who told me he’d just had dinner with AAA and BBB and that they’d talked about my stuff. So something may happen there (i.e., I’ll never hear from them again).

The last comment, that I’d probably never hear from them again, was entirely tongue-in-cheek. I wrote it knowing it was a possibility, but not really thinking it would happen.

But it did.

Things like that seem to be part & parcel of the startup world as you attempt to get funded. I have often asked myself how it can be possible for things to be this way. How can you have people so excited, telling you and others you’re going to change the world, be worth billions, and then never hear from them again? (Yes, of course you have to follow up, and I did. But that’s not the point: If you didn’t follow up you’d never hear from them.)

How can that be? In what sort of world is such a thing possible?

I came up with a highly flawed analogy. Despite its limited accuracy I find it amusing and can’t resist blogging it even if people will label me bitter (I’m not).

First: startup founders are pond scum. Second: potential investors are a troupe of young kids wandering through the park with sticks.

The kids poke into the ponds, stirring up the scum. They’re looking for cool things, signs of life, perhaps even something to take home. They’re genuinely interested. They’re fascinated. The pond scum listen to their excited conversation and think the kids will surely be back tomorrow. But it’s summer, and the world is so very very big.

The pond scum are working on little projects like photosynthesis, enhancements to the Krebs cycle, or the creation of life itself. All the while they’re pondering how to make themselves irresistible, believing that someday the kids with the sticks will be back, that they’ll eventually be scooped up.

As Paul Graham recently wrote, fundraising is brutal. His #1 recommendation is to keep expectations low.

Yep, you’re pond scum.

Get used to it.

Embrace it.


Minor mischief: create redirect loops from predictable short URLs

16:14 July 1st, 2008 by terry. Posted under other, python, tech. 1 Comment »

[Image: redirect loop]

I was checking out the new bit.ly URL shortening service from Betaworks.

I started wondering how random the URLs from these URL-shortening services could be. I wrote a tiny script the other day to turn URLs given on the command line into short URLs via is.gd:

# Python 2: print the is.gd short URL for each command-line argument
import urllib, sys
for arg in sys.argv[1:]:
    print urllib.urlopen(
        'http://is.gd/api.php?longurl=' + arg).read()

I ran it a couple of times to see what URLs it generated. Note that you have to use a new URL each time, as it’s smart enough not to give out a new short URL for one it has seen before. I got the sequence http://is.gd/JzB, http://is.gd/JzC, http://is.gd/JzD, http://is.gd/JzE,…

That’s an invitation to some minor mischief, because you can guess the next URL in the is.gd sequence before it’s actually assigned to redirect somewhere.

We can ask bit.ly for a short URL that redirects to our predicted next is.gd URL. Then we ask is.gd for a short URL that redirects to the URL that bit.ly gives us. If we do this fast enough, is.gd will not yet have assigned the predicted next URL and we’ll get it. So the bit.ly URL will end up redirecting to the is.gd URL and vice versa. In ugly Python (and with a bug/shortcoming in the nextIsgd function):

# Python 2 (print statement, urllib.urlopen, xrange)
import urllib, random

def bitly(url):
    # Ask bit.ly for a short URL that redirects to url.
    return urllib.urlopen(
        'http://bit.ly/api?url=' + url).read()

def isgd(url):
    # Ask is.gd for a short URL that redirects to url.
    return urllib.urlopen(
        'http://is.gd/api.php?longurl=' + url).read()

def nextIsgd(url):
    # Guess the next is.gd URL by bumping the last character.
    # Shortcoming: there is no carry into the earlier characters.
    last = url[-1]
    if last == 'z':
        next = 'A'
    else:
        next = chr(ord(last) + 1)
    return url[:-1] + next

def randomURI():
    # Build a throwaway URL that is.gd is unlikely to have seen before.
    return 'http://www.a%s.com' % \
           ''.join(map(str, random.sample(xrange(100000), 3)))

isgdURL = isgd(randomURI())
print 'Last is.gd URL:', isgdURL

nextIsgdURL = nextIsgd(isgdURL)
print 'Next is.gd URL will be:', nextIsgdURL

# Ask bit.ly for a URL that redirects to nextIsgdURL
bitlyURL = bitly(nextIsgdURL)
print 'Step 1: bit.ly now redirects %s to %s' % (
    bitlyURL, nextIsgdURL)

# Ask is.gd for a URL that redirects to that bit.ly url
isgdURL2 = isgd(bitlyURL)
print 'Step 2: is.gd now redirects %s to %s' % (
    isgdURL2, bitlyURL)

# The loop only exists if is.gd handed out the URL we predicted.
if nextIsgdURL == isgdURL2:
    print 'Success'
else:
    print 'Epic FAIL'

This worked first time, giving:

Step 1: bit.ly now redirects http://bit.ly/fkuL8 to http://is.gd/JA9
Step 2: is.gd now redirects http://is.gd/JA9 to http://bit.ly/fkuL8
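
One shortcoming of nextIsgd as written is that it only ever changes the last character, so it cannot carry into the characters to its left. Below is a minimal Python 3 sketch of an odometer-style increment. The name next_code and the alphabet ordering (lowercase, then uppercase, then digits) are assumptions for illustration: the ordering is guessed from the URLs shown in this post and is not taken from any is.gd documentation.

```python
import string

# ASSUMPTION: lowercase, then uppercase, then digits. Guessed from the
# example URLs in the post; the real is.gd ordering may differ.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def next_code(code, alphabet=ALPHABET):
    """Return the code after `code`, carrying leftwards like an odometer."""
    chars = list(code)
    for i in range(len(chars) - 1, -1, -1):
        pos = alphabet.index(chars[i])
        if pos + 1 < len(alphabet):
            chars[i] = alphabet[pos + 1]
            return "".join(chars)
        # Last symbol of the alphabet: wrap this position and keep carrying.
        chars[i] = alphabet[0]
    # Every position wrapped. Treating alphabet[0] as zero, the code grows
    # by one leading character.
    return alphabet[1] + "".join(chars)


if __name__ == "__main__":
    print(next_code("JzB"))
    print(next_code("Jz9"))
```

If the real alphabet were ordered differently, only the ALPHABET constant would need to change.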

In general it’s not a good idea to use predictable numbers like this, which hardly bears saying as just about every responsible programmer knows that already.
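
For contrast, here is a minimal Python 3 sketch of handing out short codes that cannot be guessed from the previous one. It is not how is.gd or bit.ly work; the function names, the in-memory dict standing in for a database, and the default length of 6 are all assumptions for illustration. The only library calls used are from the standard secrets and string modules.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_code(length=6):
    """Return a random code drawn from a cryptographically strong source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def assign_code(long_url, table, length=6):
    """Store long_url under an unused random code and return that code.

    `table` is a plain dict standing in for the service's database
    (hypothetical; a real service would need persistent, atomic storage).
    """
    while True:
        code = random_code(length)
        if code not in table:
            table[code] = long_url
            return code


if __name__ == "__main__":
    table = {}
    print(assign_code("http://example.com/some/long/path", table))
```

With codes drawn this way, seeing one short URL tells you nothing useful about the next one, so the trick above has nothing to predict.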

is.gd won’t shorten a tinyurl.com link, as tinyurl is on their blacklist. So they obviously know what they’re doing. The bit.ly service is brand new and presumably not on the is.gd radar yet.

And finally, what happens when you visit one of the deadly looping redirect URLs in your browser? You’d hope that after all these years the browser would detect the redirect loop and break it at some point. And that’s what happened with Firefox 3, producing the image above.
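
How Firefox actually detects the loop isn't covered here. As a rough Python 3 sketch of how a client could do it, the function below follows redirects through a lookup table, remembers every URL it has visited, and gives up on a repeat or after a fixed number of hops. The function name, the dict used in place of real HTTP requests, and the limit of 20 hops are assumptions for illustration.

```python
def follow_redirects(start, redirects, max_hops=20):
    """Follow redirects from `start` and return (url, status).

    `redirects` maps a URL to the URL it redirects to; it stands in for
    real HTTP responses. `status` is 'ok', 'loop' or 'too many redirects'.
    """
    seen = set()
    url = start
    for _ in range(max_hops):
        if url in seen:
            # We have been here before, so the chain can never terminate.
            return url, "loop"
        seen.add(url)
        if url not in redirects:
            # No further redirect: this is the final destination.
            return url, "ok"
        url = redirects[url]
    return url, "too many redirects"


if __name__ == "__main__":
    table = {
        "http://bit.ly/fkuL8": "http://is.gd/JA9",
        "http://is.gd/JA9": "http://bit.ly/fkuL8",
    }
    print(follow_redirects("http://bit.ly/fkuL8", table))
```

Either check alone (the visited set or the hop limit) would stop the two-URL loop described here. The hop limit also guards against very long chains that never repeat.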

If you want to give it a try, http://bit.ly/fkuL8 and http://is.gd/JA9 point to each other. Do I need to add that I’m not responsible if your browser explodes in your face?


Embracing Encapsulation

16:09 June 18th, 2008 by terry. Posted under me, python, tech. 21 Comments »

[This is a bit rambling / repetitive, sorry. I don’t have time to make it shorter, etc.]

Last year at FOWA I had a discussion with Paul Graham about programming and programmers in which we disagreed over the importance of knowing the fundamentals.

By this I mean the importance of knowing things down to the nuts and bolts level, to really understand what’s going on at the lower levels when you’re writing code. I used to think that sort of thing mattered a lot, but now I think it rarely does.

I well remember learning to program in AWK and being acutely aware of how resource intensive “associative arrays” (as we quaintly called them in those days) were, and knowing full well what was going on behind the scenes. I wrote a full Pascal compiler (no lex, no yacc) in the mid-80’s with Keith Rowe. If you haven’t done that, you really can’t appreciate the amount of computation that goes on when you compile a program to an executable. It’s astonishing. I did lots of assembly language programming, starting from age 15 or so, and spent years squeezing code into embedded environments, where a client might call to ask if you couldn’t come up with a way to reduce your executable code by 2 bytes so it would fit in their device.

But you know what? None of those skills really matter any more. Or they matter only very rarely.

The reason is that best practices have been worked out and incorporated into low-level libraries, and for the most part you don’t need to have any awareness at all of how those levels work. In fact it can be detrimental to you to spend years learning all those details if you could instead be learning how to build great things using the low-level libraries as black-box tools.

That’s the way the world moves in general. Successive generations get the accumulated wisdom of earlier generations packaged up for them. We used log tables, slide rules, and our heads, while our kids use calculators with hundreds of built-in functions. We learned to read analog 12-hour clocks, our kids learn to read digital clocks (so much easier!) and may not be able to read an analog clock until later. And it doesn’t matter. You buy a CD player (remember them?) or an iPod, and when it breaks you don’t even consider getting it “fixed” (remember that?). You just go out and buy another one. That’s because it’s cheaper and much faster and easier to just get a new one that has been put together by a machine than it is to have an actual human try to open the thing and figure out how to repair it. You can’t even (easily) open an iPod. And so the people who know how to do these things dwindle in number until there are none left. Like watchmakers, or the specialist knife sharpeners we have in Barcelona who ride around on motorcycles with their distinctive whistles, calling to people to bring down their blunt knives. And it doesn’t matter, at least from a technical point of view. Their brilliance and knowledge and hard-won experience have been encapsulated and put into machines and higher-level tools, or simply baked into society in smaller, more accurate, and easier-to-digest forms. In computers it goes down into libraries and compilers and hardware. There’s simply no need for anyone to know how, learn how, or bother to do those sorts of things any more.

Note that I’m not saying it’s not nice to have your watch repaired by someone with a jeweler’s eyepiece or your knife or scissors sharpened in the street. I’m just noting the general progression by which knowledge inevitably becomes encapsulated.

In my discussion with Paul Graham, he argued that it was still important for tech founders to be great programmers at a low level. I argued that that’s not right. Sure, people like that are good to have around, but I don’t think you need to be that way and as I said I think it can even be detrimental because all that knowledge comes at a price (other knowledge, other experience).

I work with a young guy called Esteve (Hi Esteve!). He’s great at many levels, including the lower ones. He’s also a product of a new generation of programmers. They’re people who grew up only knowing object-oriented programming, only really writing in very high-level languages (not you Esteve! I mean that just in general), who think in those terms, and who instead of spending many years working with nuts and bolts spent the years working with newer high-level tools.

I think people like Esteve have a triple advantage over us dinosaurs. 1) They tend to use more powerful tools; 2) Because they use better tools, they are more comfortable and think more naturally in the terms of the higher-level abstractions their tools present them; and 3) they also have more experience putting those tools and methods to good use.

The experience gap widens at double speed, just as when a single voter changes sides the gap between the two parties increases by two votes. Even when the dinosaur modernizes itself and learns a few new tricks, you’re still way behind because the 25-year-old you’re working with (again, excluding Esteve) has never had to work at the nuts and bolts level. They think with the new paradigms and can put more general and more powerful tools directly into action. They don’t have to think about protocols or timeouts or dynamically resizing buffers or partial reads or memory management or data structures or error propagation. They simply think “Computer, fetch me the contents of that web page!” And most of the time it all just works. When it doesn’t, you can call in a gray-haired repair person or, more likely, just throw the busted tool away and buy another (or just get it free, in the case of Open Source software).

That’s real progress, and to insist that we should make the young suffer through all the stuff we had to learn in order to build all the libraries and compilers etc., that are now available to us all is just wrong. It’s wrong because it goes against the flow of history, because it’s counter-productive, and because it smacks of “I had to suffer through this stuff, walk barefoot to school in the snow, and therefore you must too.”

Some of the above will probably sound a bit abstract, but to me it’s not. I think it’s important to realize and accept. The fact that your kid can’t tie their shoelaces because they have velcro and have never owned a shoe with a lace is probably a good thing. You don’t know how to hunt your own food or start a fire, and it just doesn’t matter. The same goes for programming. The collective brilliance of generations of programmers is now built into languages like Java, Python and Ruby, and into operating systems, graphics libraries, etc. etc., and it really doesn’t matter a damn if young people who are using those tools don’t have a clue what’s going on at the lower levels (as I said above, that’s probably a good thing). One day very few people will. The knowledge won’t be lost. It’s just encapsulated into more modern environments and tools.

I’m writing all this down because I’ve been thinking about it on and off since FOWA, but also because of what I’m working on right now. I’m trying to modify 12K lines of synchronous Python code to use Twisted (an extraordinarily good set of asynchronous networking libraries written by a set of extraordinarily young and gifted programmers). The work is a bit awkward and three times I’ve not known how best to proceed in terms of design. Each time, Esteve has taken a look at the problem and quickly suggested a fairly clean way to tackle it. Desperate to cook up a way to think that he might not be that much smarter than I am, I’m forced into a corner in which I conclude that he has spent more time working with new tools (patterns, OO, a nice language like Python). So he looks at the world in a different way and naturally says “oh, you just do that”. Then I go do the routine work of making his ideas work - which is great by me, I get to learn in the best way, by doing. How nice to hire people who are better than you are.

That’s it. Encapsulation is inevitable. So you either have to embrace it or become a hand-wringing dinosaur moaning about the kids of today and how they no longer know the fundamentals. It’s not as though any of us could survive if we suddenly had to do everything from first principles (hunt, rub sticks together to make fire, etc). So relax. Enjoy it. The young are much better than we are because they grow up with better tools and they spend more time using them. It’s not enough to learn them when you’re older, even if you can do that really fast. You’ll never catch up on the experience front.

But it sure is fun to try.


Random thoughts on Twitter

02:48 June 9th, 2008 by terry. Posted under companies, tech. 12 Comments »

I’ve spent a lot of time thinking about Twitter this year. Here are a few thoughts at random.

Obviously Twitter have tapped into something quite fundamental, which at a high level we might simply call human sociability. We humans are primates, though there’s a remarkably strong tendency to forget or ignore this. We know a lot about the intensely social lives of our fellow primate species. It shouldn’t come as a surprise that we like to Twitter amongst ourselves too.

Here are a couple of interesting (to me) reasons for the popularity of Twitter.

One is that many people are in some sense atomized by the fact that many of us now work in an isolated way. Technical people who can do their work and communicate over the internet probably see less of their peers than others do. That’s just a general point, it’s not specific to Twitter or to 2008. It would have seemed unfathomably odd to humans 50 years ago to hear that many of us would be doing a large percentage of our work and social communication via machines, interacting with people who we don’t otherwise know, and who we rarely or never meet face to face. The rise of internet-based communication is obviously(?) helping to fill a gap created by this generational change.

The second point is specific to Twitter. Through brilliance or accident, the form of communication on Twitter is really special. Building a social network on nothing-implied asymmetric follower relationships is not something I would have predicted as leading to success. Maybe it worked, or could have all gone wrong, just due to random chance. But I’m inclined to believe that there’s more to it than that. Perhaps we’re all secretly voyeurs, or stickybeaks (nosy-parkers). Perhaps we like to see one half of conversations and be able to follow along if we like. Perhaps there’s a small secret thrill to promiscuously following someone and seeing if they follow you back. I don’t know the answer, but as I said above I do think Twitter have tapped into something interesting and strong here. There’s a property of us, we simple primates, that the Twitter model has managed to latch onto.

I think Twitter should change the dynamics for new users by initially assigning them ten random followers. New users can easily follow others, but if no-one is following them... why bother? New user uptake would be much higher if they didn’t have the (correct) feeling that they were for some reason expected to want to Twitter in a vacuum. You announce a new program, called e.g., Twitter Guides, and ask for people to volunteer to be guides (i.e., followers) of newbies. Lend a hand, make new friends, maybe get some followers yourself, etc. Lots of people would click to be a Guide. I bet this would change Twitter’s adoption dynamics. If you study things like random graph theory and dynamic systems, you know that making small changes to (especially initial) probabilities can have a dramatic effect on overall structure. If Twitter is eventually to reach a mass audience (whatever that means), it should be an incontestable assertion that anything which significantly reduces the difficulty for new users to get into using it is very important.

Twitter should probably fix their reliability issues sometime soon.

I say “probably” because reliability and scaling are obviously not the most important things. Twitter has great value. It must have, or it would have lost its users long ago.

There’s a positive side to Twitter’s unreliability. People are amazed that the site goes down so often. Twitter gets snarled up in ways that give rise to a wide variety of symptoms. The result seems to be more attention, and a service that is somehow more charming. It’s like a bad movie that you remember long afterwards because it wasn’t good. We don’t take Twitter for granted and move on to the next service to pop up - we’re all busy standing around making snide remarks, playing armchair engineer, knowing that we too might face some of these issues, and talking, talking, talking. Twitter is a fascinating sight. Great harm is done by its unreliability, but the fact that their success so completely flies in the face of conventional wisdom is fascinating - and the fact that we find it so interesting and compelling a spectacle is fantastic for Twitter. They can fix the scaling issues, I hope. They should prove temporary. But the human side of Twitter, its character as a site, the site we stuck with and rooted for when times were so tough, the amazing little site that dropped to the canvas umpteen times but always got back to its feet, etc. All that is permanent. If Twitter make it, they’re going to be more than just a web service. The public outages are like a rock musician or movie star doing something outrageous or threatening suicide - capturing attention. We’re drawn to the spectacle and the drama. We can’t help ourselves: it is our selves. We love it, we hate it, it brings us together to gnash our teeth when it’s down. But do we leave? Change the channel? No way.

Twitter is both the temperamental child rock star we love and, often, the medium by which we discuss it - an enviable position!

I’m reminded of a trick I learned during tens of thousands of miles of hitch-hiking. A great place to try for a lift is on a fairly high-speed curve on the on-ramp to the freeway / motorway / autopista / autoroute etc. Stand somewhere where a speeding car can only just manage a stop and only just manage to pull in away from the following traffic. Conventional wisdom tells you that you’ll never get a ride. But the opposite is true - you’ll get a ride extremely quickly. Invariably, the first thing the driver says when you get in is “Why on earth were you standing there? You’re very lucky I managed to stop. No-one would have ever picked you up standing there!” I’ve done this dozens of times. Twitter, being incredibly, unbelievably, frustratingly unreliable and running contrary to all received wisdom, is a powerful spectacle. The human psyche is a funny thing. That’s a part of why it’s probably impossible to foretell success when mass adoption is required.

If I were running Twitter, apart from working to get the service to be more reliable, I’d be telling the engineering team to log everything. There’s a ton of value in the data flowing into Twitter.

Just as Google took internet search to a new level by link analysis, there’s another level of value in Twitter that I don’t think has really begun to be tapped yet.

PageRank, at least as I understand its early operation, ran a kind of iterative relaxation algorithm assigning and passing on credit via linked pages. A similar thing is clearly possible with Twitter, and some people have commented on this or tried to build little things that assign some form of score to users. But I think there’s a lot more that can be done. Because the Twitter API isn’t that powerful (mainly because you’re largely limited to querying as a single authorized user) and certainly because it’s rate-limited to just 70 API calls an hour, this sort of analysis will need to be done by Twitter themselves. I’m sure they’re well aware of that. Rate limiting probably helps them stay up, but it also means that the truly interesting and valuable stuff can’t be done by outsiders. I have no beef with that - I just wish Twitter would hurry up and do some of it.
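
To make the "iterative relaxation" idea concrete, here is a minimal Python 3 sketch of the textbook PageRank iteration applied to a toy follower graph, where following someone counts as passing credit to them. It is not Google's production algorithm and not anything Twitter is known to run. The function name, the damping value of 0.85 (the conventional textbook choice), and the fixed iteration count are assumptions for illustration.

```python
def rank_users(follows, damping=0.85, iterations=50):
    """Score users by repeatedly passing credit along follow links.

    `follows` maps each user to the set of users they follow.
    Returns a dict of user -> score; the scores sum to 1.
    """
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    n = len(users)
    score = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Everyone starts each round with a small baseline share.
        new = {u: (1.0 - damping) / n for u in users}
        for u in users:
            followed = follows.get(u, set())
            if followed:
                # Split this user's credit among the people they follow.
                share = damping * score[u] / len(followed)
                for v in followed:
                    new[v] += share
            else:
                # A user who follows no-one spreads their credit evenly.
                share = damping * score[u] / n
                for v in users:
                    new[v] += share
        score = new
    return score


if __name__ == "__main__":
    toy = {"a": {"c"}, "b": {"c"}, "c": {"a"}}
    for user, value in sorted(rank_users(toy).items()):
        print(user, round(value, 3))
```

In the toy graph, c is followed by two users and ranks highest. User a is followed only by c, but still outranks b because the credit it receives comes from a high-scoring user, which is the property a plain follower count misses.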

Some examples in no order:

  • The followers to following ratio of a Twitter user is obviously a high-level measure of that user’s “importance” (in some Twitter sense of importance). But there’s more to it than that. Who are the followers? Who do they follow, who follows them? Etc. This leads immediately back to Google PageRank.
  • If a user gets followed by many people and doesn’t follow those people back, what does it say about the people involved? If X follows Y and Y then goes to look at a few pages of X’s history but does not then follow X, what do we know?
  • If X has 5K followers and re-tweets a twit of Y, how many of X’s followers go check out and perhaps follow Y? What kind of people are these? (How do you advertise to them, versus others?)
  • Along the lines of co-citation analysis, Twitter could build up a map showing you who you might follow. I.e., you can get pairwise distances between users X and Y by considering how many people they follow in common and how many they follow not-in-common. That would lead to a “people you should be following that you’re not” kind of suggestion.
  • Even without co-citation analysis (or similar), Twitter should be able to tell me about people that many of the people I follow are following but whom I am not following. I’d find that very useful.
  • Twitter could tell me why someone chooses to follow me. What were they looking at (if anything) before they decided to follow me? I.e., were they browsing the following list of someone else? Did they see my user name mentioned in a Tweet? Did they come in from an outside link? Would a premium Twitter user pay to have that information?
  • Twitter has tons of links. They know the news as it happens. They could easily create a news site like Digg.
  • In some sense the long tail of Twitter is where the value is. For instance, it doesn’t mean much if a user following 10K others follows someone. But if someone is following just 10 people, it’s much more significant. There’s more information there (probably). The Twitter mega users are in some way uninteresting - the more people they have following them and the more they follow, the less you really know (or care) about them. Yes, you could probably figure out more if you really wanted to, but if someone has 10K followers all you really know is that they’re probably famous in some way. If they add another 100 followers it’s no big deal. (I say all this a bit lightly and generally - the details might of course be fascinating and revealing - e.g., if you notice Jason Calacanis and Dave Winer have suddenly started @ messaging each other again it’s like IRC coming back from a network split :-))
  • Similarly if someone with a very high followers to following ratio follows a Twitter user who has just a couple of followers, it’s a safe bet that those two are somehow friends with a pre-existing relationship.
  • I bet you could do a pretty good job of putting Twitter users into boxes just based on their overall behavior, something like the 16 Myers-Briggs categories. Do you follow people back when they follow you? Do you @ answer people who @ address you (and Twitter knows when you’ve seen the original message)? Do you send @ messages to people (and how influential are those people)? Do those people @ you back (and how influential those people are says something about how interesting / provocative you are)? Do you follow tons and tons of people? Do you follow people and then un-follow them if they don’t follow you back? Do you follow random links in other people’s Twitters, and are those links accompanied by descriptive text or tinyurl links? Do you @ message people after you follow their links? Do your Twitter times follow a strict pattern, or are you on at all hours, or suddenly spending days without Twittering? Do you visit and just read much more than you tweet? How much old stuff do you read? Do you tend to talk in public or via DM? Are your tweets public? All that without even considering the content of your Twitters.
  • Could Twitter become a search engine? That’s not a 100% serious question, but it’s worth considering. I don’t mean just making the content of all tweets searchable, I mean it with some sort of ranking algorithm, again perhaps akin to PageRank. If you somehow rank results by the importance or closeness of the user whose tweets match the search terms, you might have something interesting.
  • Twitter also presumably know who’s talking about whom in the DM backchat. They can’t use that information in an obvious way, but it’s of high value.
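
Two of the ideas in the list above are easy to sketch in Python 3: a pairwise distance based on who two users follow in common versus not in common, and suggestions drawn from people that many of the people I follow are following. The Jaccard distance is one reasonable choice of measure, not something the list prescribes. The function names and the toy data are assumptions for illustration.

```python
from collections import Counter


def follow_distance(following, x, y):
    """Jaccard distance between the sets of people x and y follow.

    0.0 means they follow exactly the same people; 1.0 means no overlap.
    `following` maps each user to the set of users they follow.
    """
    a = following.get(x, set())
    b = following.get(y, set())
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def suggest(following, me, top=3):
    """Users followed by the most people I follow, whom I don't follow yet."""
    mine = following.get(me, set())
    counts = Counter()
    for friend in mine:
        for candidate in following.get(friend, set()):
            if candidate != me and candidate not in mine:
                counts[candidate] += 1
    return [user for user, _ in counts.most_common(top)]


if __name__ == "__main__":
    toy = {
        "me": {"a", "b"},
        "a": {"x", "y", "b"},
        "b": {"x", "me"},
    }
    print(follow_distance(toy, "a", "b"))
    print(suggest(toy, "me"))
```

Both functions need the full follow graph rather than one user's view of it, which is why this kind of analysis is easier for Twitter than for an outsider working through a rate-limited API.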

I could go on for hours, but that’s more than enough for now. I don’t feel like any of the above list is particularly compelling, but I do think the list of nice things they could be doing is extremely long and that Twitter have only just begun (at least publicly) to tap into the value they’re sitting on.

I think Google should buy Twitter. They have what Twitter needs: 1) engineering and scale, 2) link analysis and algorithm brilliance, and 3) they’re in a position to monetize the value illustrated above (via their search engine, that already has ads) without pissing off the Twitter community by e.g., running ads on Twitter. What percentage of Twitter users also use Google? I bet it’s very high.


Python: looks great, stays wet longer

00:02 June 8th, 2008 by terry. Posted under python, tech. 5 Comments »

I should be coding, not blogging. But a friend noticed I hadn’t blogged in a month, so in lieu of emailing people, here are a couple of comments on programming in Python. There are many things that could be said, but I just want to make two points that I think aren’t so obvious.

1. Python looks great

In Python, indentation is used to delimit code blocks. I like that a lot - you would indent your code anyway, right? It reduces clutter. But apart from that, Python is very minimalistic in its syntax. There are rather few punctuation symbols used, and they’re used pretty consistently. As a result, Python code looks great on the page. It’s not painful to edit, and I mean that figuratively and literally. This is worth noting because when you write complex code it’s nice if the language you’re doing it in is very clean. That’s important because code can become hard to understand and unpleasant to work with. If you have pieces of code that you dread touching, that may be in part because the code is really ugly and complex on the page. Perl is a case in point - there’s tons of punctuation symbols, and in some cases the same thing (e.g., curly braces) is used in multiple (about 5!) different ways to mean different things. If the language is pleasant to look at for longer, you are more willing to work on code that might be more forbidding when expressed in other languages. Esthetics is important. Actively enjoying looking at code simply because the language is so clean is a great advantage—for you, and for the language.

This might not seem like a big point, but it’s important to me, it’s something I’ve never encountered before, and it’s a nice property of Python. BTW, people always make fun of Lisp for its parentheses. But Lisp is the cleanest language I know of in terms of simplicity on the page. The parens and the use of prefix operators in S-expressions remove the need for almost all other punctuation (and make programmatically generating code an absolute breeze).

2. Python stays wet longer

I don’t like to do too much formal planning of code. I much prefer to sit down and try writing something to see how it fits. That means I’ll often go through several iterations of code design before I reach the point where I’m happy. Sometimes this is an inefficient way to do things, particularly when you’re working on something very complex that you don’t really have your head around when you start. But I still choose to do things this way because it’s fun.

Sometimes I think of it like pottery. You grab a lump of wet clay and slap it down on the wheel. Then you try out various ideas to shape whatever it is you’re trying to create. If it doesn’t work, you re-shape it—perhaps from scratch. This isn’t a very accurate analogy, but I do think it’s valid to say that preferring to work with real code in an attempt to understand how best to shape your ideas is a much more physical process than trying to spec everything out sans code. I find I can’t know if code to implement an idea or solve a problem is going to feel right unless I physically play with it in different forms.

For me, Python stays wet longer. I can re-shape my code really easily in Python. In other languages I’ve often found myself in a position where a re-design of some aspect involves lots of work. In Python the opposite has been true, and that’s a real pleasure. When you realize you should be doing things differently and it’s just a bit of quick editing to re-organize things, you notice. I might gradually be becoming a better programmer, but I mainly feel that in using Python I simply have better quality clay.


Everything you think you know is wrong

01:05 April 11th, 2008 by terry. Posted under other, tech. 1 Comment »

I’m often surprised at how confident people are about their knowledge of the world. Looking at the history of thought and of science, you quickly see that it’s strewn with discredited and totally incorrect theories about almost everything. So I don’t understand why it’s not more commonplace to look at history and to arrive immediately at the most likely conclusion: that we too have almost everything wrong.

I don’t mean that literally everything we think is completely wrong. Some things are certainly partly right, or even mainly or fully right. But to have a high degree of confidence, or to assume we’re right just because we know so much more about the world than our ancestors did, or simply because we think we’re right, is just inviting ridicule. Considering our record, and our continual attendant misguided arrogance and confidence along the way, you’d be nuts to think that we know much today or that our confidence adds any weight at all. Many thousands of years of history argue strongly against that conclusion.

Thinking that almost everything is probably wrong in some important fundamental way is a useful default. That attitude stands you in good stead for digging into things, for reconsidering them, for asking questions at a low level. In mathematics when you know for sure that something is wrong (or right) it helps enormously in proving it. It’s a psychological thing. In my dissertation I proved a statistical result that I knew must be true from running simulations. It took me a week or two to nail the proof, and I would never have gotten there if I hadn’t known in advance that the equality I was trying to prove analytically was certainly true (pp 201-207 here in case you’re interested).

As an example of something that I think will be overturned, I think we’ll come to regard our decades of designing computational systems according to the von Neumann architecture as extremely primitive. Maybe that will involve some form of analog or quantum computation. I think we’ll take more and more from nature, for instance in solving optimization problems.

On a less grandiose note but still important, I think we’ll look back on our current information architecture and also see it as being extremely primitive. Or, as I’ve said before, we’re living in the shadow of information architecture decisions that were made decades ago. I think that’s all hopelessly wrong. In the real world, information processing simply doesn’t look much like a hierarchical file system.

Hence Fluidinfo.

And so ends another semi-cryptic and ultimately unsatisfying post. I do, as always, plan to eventually say more. And I will.


Twitter dynamics: unfollowing guykawasaki, Scobleizer and cameronreilly

16:16 March 22nd, 2008 by terry. Posted under tech. 8 Comments »

I’ve only got so much time a day to read blogs, Twitters, etc.

With blogs I find that I tend to try to keep up with those that post at a frequency at or below what I can handle, irrespective of quality of content. There are lots of blogs that I really enjoy, but which post new material so often that I end up never going to their sites. E.g., BoingBoing or ReadWriteWeb. I tend to always go to new content at blogs I like that have about one new article a day. I have dozens of examples in both these categories.

With blogs it’s no problem if some of the sites you’re subscribed to have tons of content. If you never click through on the indicator that there are 500 unread postings, you never see them.

On Twitter though the dynamic is very different. I follow about 140 people. From time to time during the day - normally when I’m drinking a coffee like I am now, or eating food - I’ll go have a look at Twitter to see what’s up in the wider world.

Unlike with blogs, if someone posts hundreds of Twitter updates you’re going to see them all. You’re perhaps going to see something like the image above (click for larger version). That’s not what I want to see at all. I’m hoping to see a whole bunch of people posting a few things, not screen after screen of one person talking to many people I don’t know or follow. It’s worse than being in a room with someone talking loudly on a mobile phone, hearing just one side of the conversation - this is like being in a room with that same person, but they’re talking to multiple people at once.

So with some reluctance I have recently un-followed Scobleizer, guykawasaki and cameronreilly. I actually like much of their content, but they have much too much of an unbalancing effect on my overall Twitter experience.

Move along.


iPod vending machine

12:25 March 9th, 2008 by terry. Posted under tech, travel. 1 Comment »

Here’s an iPod vending machine I just passed on Concourse A in the Atlanta airport. It also offers a variety of other audio components, like headphones from Harman Kardon and Bose, laptop chargers, digital cameras (including 2 models more advanced than the one I just bought), etc. I didn’t check on the prices, which are only available on the LCD screen you see the couple using.


ETech Antigenic Cartography presentation online

19:01 March 7th, 2008 by terry. Posted under tech. No Comments »

I gave my ETech talk on Wednesday afternoon. The Keynote presentation and a PDF of the slides are online.

This was my second presentation made with Keynote. It took me quite a few days to put it together. Keynote has a few nits that make it slightly awkward to use, but overall it’s really really good. I learned a lot.

With Powerpoint you need to put in a lot of work to make things look good. In Keynote it would take work to make them look bad. The presentation themes are beautiful out of the box. And it’s extremely easy to work with.

I’m even thinking of buying a new laptop to run Linux on so I don’t have to dump Keynote. I could use Parallels, but I don’t want to spend all my time running on a virtual machine.


Keynote is good

19:15 February 15th, 2008 by terry. Posted under tech. 1 Comment »

I’ve been playing with Keynote to make a presentation. There are a lot of things I don’t really like about using a Mac, but Keynote is not one of them.

It makes really attractive presentations. It’s easy to use. The help actually helps. You can export to multiple formats (Quicktime, Powerpoint, PDF, images, Flash, HTML, iPod).

And, it’s fun to use. I’m going to miss it when I head back to Linux.



Worst of the web award: Cheaptickets

16:22 February 14th, 2008 by terry. Posted under companies, me, tech. 5 Comments »

Here’s a great example of terrible (for me at least) UI design.

I was just trying to change a ticket booking at Cheaptickets. Here’s the interface for selecting what you want to change (click to see the full image).

cheaptickets

As you can see, I indicated a date/time change on my return flight. When I clicked on the continue button, I got an error message:

An error has occurred while processing this page. Please see detail below. (Message 1500)

Please select flight attributes to change.

I thought there was some problem with Firefox not sending the information that I’d checked. So I tried again. Then I tried clicking a couple of the boxes. Then I tried with Opera. Then I changed machines and tried with IE on a windows box. All of these got me the exact same error.

I looked at the page several times to see if I’d missed something - like a check box to indicate which of the flights to change. I figured Cheaptickets must have an error server side. Then I thought come on, you must be doing something wrong.

Then I figured it out. Can you?


The power of representation: Adding powers of two

17:42 February 13th, 2008 by terry. Posted under representation, tech. 1 Comment »

On the left is an addition problem. If you know the answer without thinking, you’re probably a geek.

Suppose you had to solve a large number of problems of this type; adding consecutive powers of 2 starting from 1. If you did enough of them you might guess that 1 + 2 + 4 + … + 2^(n-1) is always equal to 2^n - 1. In the example on the left, we’re summing from 2^0 to 2^10 and the answer is 2^11 - 1 = 2047.

And if you cast your mind back to high-school mathematics you might even be able to prove this using induction.

But that’s a lot of work, even supposing you see the pattern and are able to do a proof by induction.

Let’s instead think about the problem in binary (i.e., base 2). In binary, the sum looks like the image on the right.

There’s really no work to be done here. If you think in binary, you already know the answer to this “problem”. It would be a waste of time to even write the problem down. It’s like asking a regular base-10 human to add up 3 + 30 + 300 + 3000 + 30000, for example. You already know the answer. In a sense there is no problem because your representation is so nicely aligned with the task that the problem seems to vanish.
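If you'd like to check this rather than take my word for it, here is a minimal sketch in Python. The function name is just something I made up for illustration. It adds up the powers of two from 2^0 to 2^10 and shows the total in both decimal and binary.

```python
def sum_powers_of_two(n):
    """Return 2^0 + 2^1 + ... + 2^(n-1)."""
    return sum(2 ** k for k in range(n))


total = sum_powers_of_two(11)   # 2^0 up to 2^10, as in the example
print(total)                    # the sum in decimal
print(bin(total))               # the same sum in binary: a run of 1 digits

# The guessed closed form 2^n - 1 agrees for every n tried here.
assert all(sum_powers_of_two(n) == 2 ** n - 1 for n in range(1, 64))
```

In binary each power of two contributes a single 1 digit in its own column, so nothing ever carries and the "sum" is just the digits written side by side.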

Why am I telling you all this?

Because, as I’ve emphasized in three other postings, if you choose a good representation, what looks like a problem can simply disappear.

I claim (without proof) that lots of the issues we’re coming up against today as we move to a programmable web, integrated social networks, and as we struggle with data portability, ownership, and control will similarly vanish if we simply start representing information in a different way.

I’m trying to provide some simple examples of how this sort of magic can happen. There’s nothing deep here. In the non-computer world we wouldn’t talk about representation, we’d just say that you need to look at the problem from the right point of view. Once you do that, you see that it’s actually trivial.


Talking about Antigenic Cartography at ETech

13:34 February 12th, 2008 by terry. Posted under me, tech. No Comments »

Blogs are all about self-promotion, right? Right.

I’m talking at ETech in the first week of March in San Diego. The talk is at 2pm on Wednesday March 5, and is titled Antigenic Cartography: Visualizing Viral Evolution for Influenza Vaccine Design.

You can find out more about Antigenic Cartography here and here.

Here’s my abstract:

Mankind has been fighting influenza for thousands of years. The 1918 pandemic killed 50-100 million people. Today, influenza kills roughly half a million people each year. Because the virus evolves, it is necessary for vaccines to track its evolution closely in order to remain effective.

Antigenic Cartography is a new computational method that allows a unique visualization of viral evolution. First published in 2004, the technique is now used to aid the WHO in recommending the composition of human influenza vaccines. It is also being applied to the design of pandemic influenza vaccines and to the study of a variety of other infectious diseases.

The rise of Antigenic Cartography is a remarkable story of how recent immunological theory, mathematics, and computer science have combined with decades of virological and medical research and diligent data collection to produce an entirely new tool with immediate practical impact.

This talk will give you food for thought regarding influenza, and move on to explain what Antigenic Cartography is, how it works, and exactly how it is used to aid vaccine strain selection—all in layman’s terms, with no need for a biological or mathematical background.

In case you’re wondering, no, I didn’t go so far as to make the “I’m speaking” image above. I chose it from the conference speaker resources. Self-promotion has its limits.


Google maps gets SFO location waaaay wrong

22:16 January 28th, 2008 by terry. Posted under companies, tech. 1 Comment »

Before leaving Barcelona yesterday morning, I checked Google maps to get driving directions from San Francisco International airport (SFO) to a friend’s place in Oakland.

Google got it way wrong. Imagine trying to follow these instructions if you didn’t know they were so wrong. Click on the image to see the full sized map. Google maps is working again now.



Amazon S3 to rival the Big Bang?

00:40 January 28th, 2008 by terry. Posted under companies, tech. 2 Comments »

Note: this posting is based on an incorrect number from an Amazon slide. I’ve now re-done the revenue numbers.

We’ve been playing around with Amazon’s Simple Storage Service (S3).

Adam Selipsky, Amazon VP of Web Services, has put some S3 usage numbers online (see slides 7 and 8). Here are some numbers on those numbers.

There were 5,000,000,000 (5e9) objects inside S3 in April 2007 and 10,000,000,000,000 (1e13) in October 2007. That means that in October 2007, S3 contained 2,000 times more objects than it did in April 2007. That’s a 26 week period, or 182 days. 2,000 is roughly 2^11. That means that S3 is doubling its number of objects roughly once every 182/11 = 16.5 days. (That’s supposing that the growth is merely exponential - i.e., that the logarithm of the number of objects is increasing linearly. It could actually be super-exponential, but let’s just pretend it’s only exponential.)

First of all, that’s simply amazing.

It’s now 119 days since the beginning of October 2007, so we might imagine that S3 now has 2^(119/16.5) or about 150 times as many objects in it. That’s 1,500,000,000,000,000 (1.5e15) objects. BTW, I assume by object they mean a key/value pair in a bucket (these are put into and retrieved from S3 using HTTP PUT and GET requests).
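For anyone who wants to replay the doubling-time estimate, here is a small Python sketch. It uses only the figures quoted above. The variable names are mine, and it assumes plain exponential growth, just as the text does.

```python
import math

objects_april = 5e9        # objects in S3, April 2007 (figure quoted above)
objects_october = 1e13     # objects in S3, October 2007 (figure quoted above)
days_between = 182         # 26 weeks

growth = objects_october / objects_april       # how many times bigger
doublings = math.log2(growth)                  # a little under 11
doubling_days = days_between / doublings       # close to the 16.5 used above

days_since_october = 119
factor = 2 ** (days_since_october / 16.5)      # uses the rounded 16.5 from the text
objects_now = objects_october * factor

print(f"growth: {growth:,.0f}x in {days_between} days")
print(f"doublings: {doublings:.2f}, one every {doubling_days:.1f} days")
print(f"growth since October: {factor:.0f}x, so about {objects_now:.2e} objects")
```

The exact logarithm gives slightly fewer than 11 doublings, so rounding to 11 and to "about 150" is in the spirit of a back-of-envelope estimate.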

Amazon’s S3 pricing for storage is $0.15 per GB per month. Assume all this data is stored on their cheaper US servers and that objects take on average 1K bytes. These seem reasonable assumptions. (A year ago at ETech, SmugMug CEO Don MacAskill said they had 200TB of image data in S3, and images obviously occupy far more than 1K each. So do backups.) So that’s roughly 1.5e15 * 1K / 1G = 1.5e9 gigabytes in storage, for which Amazon charges $0.15 per month, or $225M.

That’s $225M in revenue per month just for storage. And growing rapidly - S3 is doubling its number of objects every 2 weeks, so the increase in storage might be similar.

Next, let’s do incoming data transfer cost, at $0.10 per GB. That’s simply 2/3rds of the data storage charge, so we add another 2/3 * $225M, or $150M.

What about the PUT requests, that transmit the new objects?

If you’re doubling every 2 weeks, then in the last month you’ve doubled twice. So that means that a month ago S3 would have had 1.5e15 / 4 = 3.75e14 objects. That means 1.125e15 new objects were added in the last month! Each of those takes an HTTP PUT request. PUTs are charged at one penny per thousand, so that’s 1.125e15 / 1000 * $0.01.

Correct me if I’m wrong, but that looks like $11,250,000,000.

To paraphrase a scene I loved in Blazing Saddles (I was only 11, so give me a break), that’s a shitload of pennies.

Lastly, some of that stored data is being retrieved. Some will just be backups, and never touched, and some will simply not be looked at in a given month. Let’s assume that just 1% of all (i.e., not just the new) objects and data are retrieved in any given month.

That’s 1.5e15 * 1K * 1% / 1e9 = 15M GB of outgoing data, or 15K TB. Let’s assume this all goes out at the lowest rate, $0.13 per GB, giving another $2M in revenue.

And if 1% of objects are being pulled back, that’s 1.5e15 * 1% = 1.5e13 GET operations, which are charged at $0.01 per 10K. So that’s 1.5e13 / 10K * $0.01 = $15M for the GETs.

This gives a total of $225M + $150M + $11,250M + $2M + $15M = $11,642M in the last month. That’s $11.6 billion. Not a bad month.
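Here is the same sum as a short Python sketch, so the individual pieces can be checked or changed. Every price and assumption comes from the paragraphs above (1K objects, 1% read back per month, two doublings in the last month). The variable names are mine and this is only a restatement of the arithmetic, not anything Amazon publishes.

```python
# Back-of-envelope S3 revenue for one month, using the figures from this post.
GB = 1e9                          # bytes per gigabyte, as used in the text
objects = 1.5e15                  # estimated objects now in S3
object_bytes = 1e3                # assumed average object size (1K)

stored_gb = objects * object_bytes / GB        # 1.5e9 GB
storage = stored_gb * 0.15                     # $0.15 per GB per month
transfer_in = stored_gb * 0.10                 # $0.10 per GB incoming

new_objects = objects - objects / 4            # two doublings in the last month
puts = new_objects / 1000 * 0.01               # $0.01 per 1,000 PUTs

retrieved = 0.01                               # assume 1% is read back each month
transfer_out = stored_gb * retrieved * 0.13    # $0.13 per GB outgoing
gets = objects * retrieved / 10_000 * 0.01     # $0.01 per 10,000 GETs

total = storage + transfer_in + puts + transfer_out + gets
for name, dollars in [("storage", storage), ("transfer in", transfer_in),
                      ("PUTs", puts), ("transfer out", transfer_out),
                      ("GETs", gets), ("total", total)]:
    print(f"{name:>12}: ${dollars / 1e6:,.0f}M")
```

Laid out like this it is obvious that the PUT charge swamps everything else, which is why the rest of the post concentrates on it.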

Can this simple analysis possibly be right?

It’s pretty clear that Amazon are not making $11B per month from S3. So what gives?

One hint that they’re not making that much money comes from slide 8 of the Selipsky presentation. That tells us that in October 2007, S3 was making 27,601 transactions per second. That’s about 7e10 per month. If Amazon was already doubling every two weeks by that stage, then 3/4s of their 1e13 S3 objects would have been new that month. That’s 7.5e12, which is 100 times more transactions just for the incoming PUTs (no outgoing) than are represented by the 27,601 number. (It’s not clear what they mean by transaction - I mean what goes on in a single transaction.)

So something definitely doesn’t add up there. It may be more accurate to divide the revenue due to PUTs by 100, bringing it down to a measly $110M.

An unmentioned assumption above is that Amazon is actually charging everyone, including themselves, for the use of S3. They might have special deals with other companies, or they might be using S3 themselves to store tons of tiny objects. I.e., we don’t know that the reported number is of paid objects.

There’s something of a give away the razors and charge for the blades feel to this. When you first see Amazon’s pricing, it looks extremely cheap. You can buy external disk space for, e.g., $100 for 500GB, or $0.20 per GB. Amazon charges you just $0.18 per GB for replicated storage. But that’s per month. A disk might last you two years, so we could conclude that Amazon is e.g., 8 or 12 times more expensive, depending on the degree of replication. But you don’t need a data center or to grow (or shrink) a data center, cooling, employees, replacement disks—all of which have been noted many times—so the cost perhaps isn’t that high.

But…. look at those PUT requests! If an object is 1K (as above), it takes 500M of them to fill a 500GB disk. Amazon charges you $0.01 per 1000, so that’s 500K * $0.01 or $5000. That’s $10 per GB just to access your disk (i.e., before you even think about transfer costs and latency), which is about 50 times the cost of disk space above.
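The disk comparison is easy to redo in Python too. This sketch assumes the same 1K objects, the $100 disk, and the $0.01 per 1,000 PUTs price mentioned above. The variable names are mine.

```python
disk_gb = 500                 # size of the external disk in GB
disk_bytes = disk_gb * 1e9    # the same disk in bytes
object_bytes = 1e3            # 1K objects, as assumed earlier
disk_price = 100.0            # dollars for the disk

puts_needed = disk_bytes / object_bytes            # PUTs to fill the disk
put_cost = puts_needed / 1000 * 0.01               # $0.01 per 1,000 PUTs
put_cost_per_gb = put_cost / disk_gb               # dollars per GB written
disk_cost_per_gb = disk_price / disk_gb            # dollars per GB of raw disk

print(f"PUTs needed: {puts_needed:,.0f}")
print(f"PUT charges: ${put_cost:,.0f} (${put_cost_per_gb:.2f} per GB)")
print(f"ratio to raw disk cost: {put_cost_per_gb / disk_cost_per_gb:.0f}x")
```

Changing `object_bytes` shows how sensitive the result is to object size: bigger objects mean fewer PUTs per GB and a much smaller ratio.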

In paying by the PUT and GET, S3 users are in effect paying Amazon for the compute resources needed to store and retrieve their objects. If we estimate it taking 10ms for Amazon to process a PUT, then 1000 takes 10 seconds of compute time, for which Amazon charges $0.01. A 30-day month has 259,200 ten-second slots, so that’s nearly $2.6K per month being paid for machines to do PUT storage, which is about 37 times more expensive than what Amazon would charge you to run a small EC2 instance for a month. Such a machine probably costs Amazon around $1500 to bring into service. So there’s no doubt they’re raking it in on the PUT charges. That makes the 5% margins of their retailing operation look quaint. Wall Street might soon be urging Bezos to get out of the retailing business.

Given that PUTs are so expensive, you can expect to see people encoding lots of data into single S3 objects, transmitting them all at once (one PUT), and decoding when they get the object back. That pushes programmers towards using more complex formats for their data. That’s a bad side-effect. A storage system shouldn’t encourage that sort of thing in programmers.

Nothing can double every two weeks for very long, so that kind of growth simply cannot continue. It may have leveled out in October 2007, which would make my numbers off by a factor of roughly 2^(119/16.5) or about 150, as above.

When we were kids they told us that the universe has about 2^80 particles in it. 1.5e15 is already about 2^50, so only 30 more doublings are needed, which would take Amazon just over a year. At that point, even if all their storage were in 1TB drives and objects were somehow stored in just 1 byte each, they’d still need about 2^40 disk drives. The earth has a surface area of 510,065,600 km² so that would mean over 2000 Amazon disk drives in each square kilometer on earth. That’s clearly not going to happen.

It’s also worth bearing in mind that Amazon claims data stored into S3 is replicated. Even if the replication factor is only 2, that’s another doubling of the storage requirement.

At what point does this growth stop?

Amazon has its Q4 2007 earnings call this Wednesday. That should be revealing. If I had any money I’d consider buying stock ASAP.


Worst of the web award: MIT/Stanford Venture Lab

00:03 January 25th, 2008 by terry. Posted under tech. 1 Comment »

I’ve just awarded one of my coveted Worst of the Web awards to the MIT/Stanford Venture Lab.

Here’s why. They are hosting a video I’d like to watch. You can see it on their home page right now, Web 3.0: New Opportunities on the Semantic Web.

If you click on that link, wonderful things happen.

You get taken to a page with a Watch Online link. Clicking on it tells you that this is a “Restricted Article!” and that you need to register to see the video. Another click and you’re faced with a page that gives you four registration options: Volunteers, Board Members, Standard Members, or Sponsors. Below each of them it says “Rates: Membership price: $0.00”.

Ok, so we’re going to pay $0.00 to sign up for a free video. That takes me to a page with 15 fields, including “Billing Address”. If you leave everything blank and try clicking through, it tells you “A user account with the same email you entered already exists in the system.” But I left the email field empty.

When you fill in email and your name, you get to confirm your purchase: Review your order details. If all appears ok, click “Submit Transaction ->” to finalize the transaction. There’s a summary of the charges, with Price and Total columns, Sub-totals, Tax, Shipping, Grand Total - all set to $0.00. There’s a button labeled “Submit Transaction” and a warning: “Important: CLICK ONCE ONLY to avoid being charged twice.”

You then wind up on a profile page with no less than 54 fields! Scroll to the bottom, take yourself off the mailing list, then “Update profile”.

OK, so you’re registered. The top left of the screen has your user name, and the top right has a link labeled “Sign Out”. So you’re apparently logged in too.

Now you go back to the home page, and click on the link for the video. Then click on the Watch Online link. And it tells you this is a “Restricted Article!” and that if you’re already a member you can log in. But I thought I was logged in?

OK…. click to log in. There’s a field for email address and password. What password? Hmmm. I can click to have it reset, so I do that. A password and log-in link arrives in email.

I follow the link and log in. I go back to the home page. I click on the link to the video I want. I click on Watch Online.

Now I get a screen with a flash player in it. It says Please wait. Apparently forever. I wait ten minutes and begin to blog about my wonderful experience at the MIT/Stanford Venture Lab.

The video never loads.

I actually went through this process twice to verify the steps. The first time was a bit more complex, believe it or not, and involved a Captcha. Also, the two welcome mails I got from signing up were totally different! One looked like

Dear Terry,

Welcome to vlab.org. You are now ready to enjoy the many benefits our site offers its registered users.

Please login using:
Login: terry@xxxjon.es
Password: lksjljls

For your convenience, you can change your password to something more easily remembered once you sign in.

and the other also greeted me and finally, as a footnote, at the very end of the mail after the goodbye:

IMPORTANT: Your account is now active. To log in, go to http://www.vlab.org/user.html?op=login and use “i2nosjf3p” as your temporary password.

So weird.

And then, to top off the whole thing, I get a friendly email greeting which includes the following:

Dear Terry,

Thank you, and welcome to our community.

Your purchase of Standard Members for the amount of $0 entitles you to enjoy more of our activities, gain greater access to site functionality, and enhance your overall experience with us.

Your Standard Members is now valid and will expire on January 16th, 2038.

You couldn’t make this stuff up. It’s 2008. We’re trying to look at a free online video. Hosted by MIT/Stanford of all people. We’re prepared to jump through hoops! We’ll even risk being billed $0.00 multiple times! But no cigar.


Final straws for Mac OS X

16:54 January 24th, 2008 by terry. Posted under companies, tech. 7 Comments »

I’ve had it with Mac OS X.

I’m going to install Linux on my MacBook Pro laptop in March once I’m back from ETech.

I’ve been thinking about this for months. There are just so many things I don’t like about Mac OS X.


Yes, it’s beautiful, and there are certainly things I do like (e.g., iCal). But I don’t like:

  • Waiting forever when I do a rm on a big tree
  • Sitting wondering what’s going on when I go back to a Terminal window and it’s unresponsive for 15 seconds
  • Weird stuff like this
  • Case insensitive file names (see above problem)
  • Having applications often freeze and crash. E.g. emacs, which basically never crashes under Linux

I could go on. I will go on.

I don’t like it when the machine freezes, and that happens too often with Mac OS X. I used Linux for years and almost never had a machine lock up on me. With Mac OS X I find myself doing a hard reset about once a month. That’s way too flaky for my liking.

Plus, I do not agree to trade a snappy OS experience for eye candy. I’ll take both if I can have them, but if it’s a choice then I’ll go back to X windows and Linux desktops and fonts and printer problems and so on - all of which are probably even better than they already were a few years back.

This machine froze on me 2 days ago and I thought “Right. That’s it.” When I rebooted, it was in a weird magnifying glass mode, in which the desktop was slightly magnified and moved around disconcertingly whenever I moved the mouse. Rebooting didn’t help. Estéve correctly suggested that I somehow had magnification on. But, how? WTF is going on?

And, I am not a fan of Apple.

In just the last two days, we have news that 1. Apple crippled its DTrace port so you can’t trace iTunes, and 2. Apple QuickTime DRM Disables Video Editing Apps so that Adobe’s After Effects video editing software no longer works after a QuickTime update.

It’s one thing to use UNIX, which I have loved for over 25 years, but it’s another thing completely to be in the hands of a vendor who (regularly) does things like this while “upgrading” other components of your system.

Who wants to put up with that shit?

And don’t even get me started on the iPhone, which is a lovely and groundbreaking device, but one that I would never ever buy due to Apple’s actions.

I’m out of here.


Understanding high-dimensional spaces

18:46 January 23rd, 2008 by terry. Posted under other, tech. 5 Comments »


I’ve spent lots of time thinking about high-dimensional spaces, usually in the context of optimization problems. Many difficult problems that we face today can be phrased as problems of navigating in high-dimensional spaces.

One problem with high-dimensional spaces is that they can be highly non-intuitive. I did a lot of work on fitness landscapes, which are a form of high dimensional space, and ran into lots of cases in which problems were exceedingly difficult because it’s not clear how to navigate efficiently in such a space. If you’re trying to find high points (e.g., good solutions), which way is up? We’re all so used to thinking in 3 dimensions. It’s very easy to do the natural thing and let our simplistic lifelong physical and visual 3D experience influence our thinking about solving problems in high-dimensional spaces.

Another problem with high-dimensional spaces is that we can’t visualize them unless they are very simple. You could argue that an airline pilot in a cockpit monitoring dozens of dials (each dial gives a reading on one dimension) does a pretty good job of navigating a high-dimensional space. I don’t mean the 3D space in which the plane is flying, I mean the virtual high-dimensional space whose points are determined by the readings on all the instruments.

I think that’s true, but the landscape is so smooth that we know how to move around on it pretty well. Not too many planes fall out of the sky.

Things get vastly more difficult when the landscape is not smooth. In fact they get positively weird. Even with trivial examples, like a hypercube, things get weird fast. For example, if you’re at a vertex on a hypercube, exactly one half of the space is reachable in a single step. That’s completely non-intuitive, and we haven’t even put fitness numbers on the nodes. When I say fitness, I mean goodness, or badness, or energy level, or heuristic, or whatever it is you’re dealing with.

We can visually understand and work with many 3D spaces (though 3D mazes can of course be hard). We can hold them in our hands, turn them around, and use our visual system to help us. If you had to find the high-point looking out over a collection of sand dunes, you could move to a good vantage point (using your visual system and understanding of 3D spaces) and then just look. There’s no need to run an optimization algorithm to find high points, avoiding getting trapped in local maxima, etc.

But that’s not the case in a high-dimensional space. We can’t just look at them and solve problems visually. So we write awkward algorithms that often do exponentially increasing amounts of work.

If we can’t visually understand a high-dimensional space, is there some other kind of understanding that we could get?

If so, how could we prove that we understood the space?

I think the answer might be that there are difficult high-dimensional spaces that we could understand, and demonstrate that we understand them.

One way to demonstrate that you understand a 3D space is to solve puzzles in it, like finding high points, or navigating over or through it without crashing.

We can apply the same test to a high-dimensional space: build problems and see if they can be solved on the fly by the system that claims to understand the space.

One way to do that is the following.

Have a team of people who will each sit in front of a monitor showing them a 3D scene. They’ll each have a joystick that they can use to “fly” through the scene that they see. You take your data and give 3 dimensions to each of the people. You do this with some degree of dimensional overlap. Then you let the people try to solve a puzzle in the space, like finding a high point. Their collective navigation gives you a way to move through the high-dimensional space.

You’d have to allocate dimensions to people carefully, and you’d have to do something about incompatible decisions. But if you built something like this (e.g., with 2 people navigating through a 4D space), you’d have a distributed understanding of the high-dimensional space. No one person would have a visual understanding of the whole space, but collectively they would.
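To make the allocation idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption of mine rather than a worked-out design: the function names are hypothetical, the dimensions are handed out as sliding windows with a fixed overlap, and incompatible decisions are resolved by simply averaging what each person proposes on the dimensions they share.

```python
def allocate_dimensions(n_dims, view_size=3, overlap=1):
    """Split dimension indices 0..n_dims-1 into overlapping windows.

    Each window holds `view_size` dimensions and shares `overlap` of them
    with the next one. The last window wraps around to the first
    dimensions if needed, so every person gets a full 3D scene.
    """
    if not 0 <= overlap < view_size <= n_dims:
        raise ValueError("need 0 <= overlap < view_size <= n_dims")
    step = view_size - overlap
    windows = []
    start = 0
    while True:
        windows.append(tuple((start + i) % n_dims for i in range(view_size)))
        if start + view_size >= n_dims:   # every dimension is now covered
            break
        start += step
    return windows


def combine_moves(n_dims, windows, moves):
    """Average the proposed step on each dimension over everyone who sees it.

    Averaging is only one (assumed) way to settle incompatible decisions.
    """
    totals = [0.0] * n_dims
    counts = [0] * n_dims
    for window, move in zip(windows, moves):
        for dim, delta in zip(window, move):
            totals[dim] += delta
            counts[dim] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Two people navigating a 4D space, sharing two of the four dimensions.
windows = allocate_dimensions(4, view_size=3, overlap=2)
step = combine_moves(4, windows, [(1.0, 1.0, 0.0), (-1.0, 0.0, 2.0)])
print(windows)
print(step)
```

In the example the two people disagree about the second dimension (one wants to go up, the other down), and the averaged step leaves it unchanged. Whether averaging is a sensible rule on a rugged landscape is exactly the kind of thing you'd have to find out by trying.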

In a way it sounds expensive and like overkill. But I think it’s pretty easy to build and there’s enormous value to be had from doing better optimization in high-dimensional spaces.

All we need is a web server hooked up to a bunch of people working on Mechanical Turk. Customers upload their high-dimensional data, specify what they’re looking for, the data is split by dimension, and the humans do their 3D visual thing. If the humans are distributed and don’t know each other they also can’t collude to steal or take advantage of the data - because they each only see a small slice.

There’s a legitimate response that we already build systems like this. Consider the hundreds of people monitoring the space shuttle in a huge room, each in front of a monitor. Or even a pilot and co-pilot in a plane, jointly monitoring instruments (does a co-pilot do that? I don’t even know). Those are teams collectively understanding high-dimensional spaces. But they’re, in the majority of cases, not doing overlapping dimensional monitoring, and the spaces they’re working in are probably relatively smooth. It’s not a conscious effort to collectively monitor or understand a high-dimensional space. But the principle is the same, and you could argue that it’s a proof the idea would work - for sufficiently non-rugged spaces.

Apologies for errors in the above - I just dashed this off ahead of going to play real football in 3D. That’s a hard enough optimization problem for me.


One email a day

01:58 January 19th, 2008 by terry. Posted under me, tech. No Comments »

I’ve got my email inbox locked down so tightly that only one email made it through today. That’s down from several hundred a day just a few weeks ago.

All the email that doesn’t make it immediately into my inbox gets filed elsewhere. I deal with it all quickly - either deleting stuff (mailing lists), saving, or replying and then saving.

I’m spending way less time looking at my inbox wondering what I didn’t reply to in a list of a few thousand emails. That’s good. I’m spending less time blogging. I haven’t been on Twitter for ages.

In the productivity corner, I somehow managed (with help) to get a 3 meter whiteboard up here and onto the wall. It’s fantastic. I spend 2+ hours every morning talking with Estéve, drawing circles, lines, trees, and random scrawly notes. Today I sat talking to him in my chair while using my laser pointer (thanks Derek!) to point to things on the whiteboard. Ah, the luxury.

Tagging in the year 3000 (BC)

19:44 January 4th, 2008 by terry. Posted under books, representation, tech. 3 Comments »

Jimmy Guterman recently called Marcel Proust an Alpha Geek and asked for thoughts on “what from 100 years ago might be the hot new technology of 2008?”

Here’s something about 5000 years older. As a bonus there’s a deep connection with what Fluidinfo is doing.

Alex Wright recently wrote GLUT: Mastering Information Through the Ages. The book is good. It’s a little dry in places, but in others it’s really excellent. I especially enjoyed the last 2 chapters, “The Web that Wasn’t” and “Memories of the Future”. GLUT has a non-trivial overlap with the even more excellent Everything is Miscellaneous by David Weinberger.

In chapter 4 of GLUT, “The Age of Alphabets”, Wright describes the rise of writing systems around 3000 BC as a means of recording commercial transactions. The details of the transactions were written onto a wet clay tablet, signed by the various parties, and then baked. Wright (p50) continues:

Once the tablet was baked, the scribe would then deposit it on a shelf or put it in a basket, with labels affixed to the outside to facilitate future search and retrieval.

There are two comments I want to make about this. One is a throwaway answer to Jimmy Guterman’s request, but the other deserves consideration.

Firstly, this is tagging. Note that the tags are attached after the data is put onto the clay tablet and it is baked. This temporal distinction is important - it’s not like other mentions of metadata or tagging given by Wright (e.g., see p51 and p76). Tags could presumably have different shapes or colors, and be removed, added to, etc. Tags can be attached to objects you don’t own - like using a database to put tags on a physically distant web page you don’t own. No-one has to anticipate all the tag types, or the uses they might be put to. If a Sumerian scribe decided to tag the best agrarian deals of 3000 BC or all deals involving goats, he/she could have done it just as naturally as we’d do it today.

Secondly, I find it very interesting to consider the location of information here and in other systems. The tags that scribes were putting on tablets in 3000 BC were stored with the tablets. They were physically attached to them. I think that’s right-headed. To my mind, the tag information belongs with the object that’s being tagged. In contrast, today’s online tagging systems put our tags in a physically separate location. They’re forced to do that because of the data architecture of the web. The tagging system itself, and the many people who may be tagging a remote web page, don’t own that page. They have no permission to alter it.

Let’s follow this thinking about the location of information a little further…

Later in GLUT, Wright touches on how the card catalog of libraries became separated from the main library content, the actual books. Libraries became so big and accumulated so many volumes that it was no longer feasible to store the metadata for each volume with the volume. So that information was collected and stored elsewhere.

This is important because the computational world we all inhabit has similarly been shaped by resource constraints. In our case the original constraints are long gone, but we continue to live in their shadow.

I’ll explain.

We all use file systems. These were designed many decades ago for a computing environment that no longer exists. Machines were slow. Core and disk memory were tiny. Fast indexing and retrieval algorithms had yet to be invented. Today, file content and file metadata are firmly separated. File data is in one place while file name, permissions, and other metadata are stored elsewhere. That division causes serious problems. The two systems need different access mechanisms. They need different search mechanisms.

Now would be a good time to ask yourself why it has traditionally been almost impossible to find a file based simultaneously on its name and its content.
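To make the split concrete, here's a rough Python sketch of a search on both name and content. The function name `find_files` and its parameters are made up for illustration, and it assumes the files are readable as text. Notice that it needs two separate mechanisms: the directory listing (the catalog) answers the name question, and then each candidate file has to be opened and read (a trip to the stacks) to answer the content question.

```python
import fnmatch
import os


def find_files(root, name_pattern, needle):
    """Return paths under root whose name matches name_pattern
    and whose text content contains needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Mechanism 1: names come from the directory metadata.
        for filename in fnmatch.filter(filenames, name_pattern):
            path = os.path.join(dirpath, filename)
            try:
                # Mechanism 2: content requires opening and reading the file.
                with open(path, "r", errors="ignore") as handle:
                    if needle in handle.read():
                        matches.append(path)
            except OSError:
                # Skip files that cannot be opened.
                continue
    return matches
```

Something like `find_files(".", "*.txt", "goats")` would list the text files under the current directory that mention goats. There's no index involved, so every matching file is read in full.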

Our file systems are like our libraries. They have a huge card catalog just inside the front door (at the start of the disk), and that’s where you go to look things up. If you want the actual content you go fetch it from the stacks. Wandering the stacks without consulting the catalog is a little like reading raw disk blocks at random (that can be fun btw).

But libraries and books are physical objects. They’re big and slow and heavy. They have ladders and elevators and are traversed by short-limbed humans with bad eyesight. Computers do not have these characteristics. By human standards, they are almost infinitely fast and their storage is cheap and effectively infinite. There’s no longer any reason for computers to separate data from metadata. In fact there’s no need for a distinction between the two. As David Weinberger put it, in the real world “everything is metadata”. So it should be in the computer world as well.

In other words, I think it is time to return to a more natural system of information storage. A little like the tagging we were doing in 3000 BC.

Several things will have to change if we’re to pull this off. And that, gentle reader, is what Fluidinfo is all about.

Stay tuned.
