I wonder if BitKeeper owner Larry McVoy has ever regretted not open sourcing his software? Git tools are a whole industry now. The most prominent one, github, recently became one of the 100 most popular websites on the planet.
Regret it? Sure. I'd do it in a heartbeat if I could figure out how to make it work. Still would, and there is plenty in BK that Git doesn't have. Like submodules that actually work exactly like a monolithic tree and just let you clone what you need.
But we've never figured out how to make it work financially. If anyone has any ideas I'm all ears (though pointing at github and saying "do that" isn't an idea that I can execute).
BTW, BK used to be pretty darned close to open source, you got the source code under a funky license that said "don't take out the part that lets us make money". We stopped shipping the source when we learned that the very first thing that someone committed to the repo was taking out the part that let us make money.
Very cool of you to share your thoughts. I sympathize with your dilemma. It seems that the people who end up making money out of free/open source software are often not the ones who write the code. And I remember reading back in the day that Linus would talk with you extensively about the nuts and bolts of DVCSs, so I'm sure all git users owe you some gratitude for inspiring him to create git and getting the fundamentals right from the get go.
Out of curiosity, and please feel free not to answer, is BK still a viable commercial product bringing in significant revenue? And what obstacle do you see with going the platform/service route like github? I assume that's something you've seriously considered, even without open sourcing BK.
Yeah, BK still pays the bills for our team. We're small though, I recently found out that Perforce has around 250 people, we're less than 1/10th that. But we pull in millions a year, enough to pay our people above scale even in the Bay Area, so far, so good.
I'll admit we've fallen off the radar (well, we were never really on the commercial radar, the only "marketing" we ever did was getting Linus to use it and that wasn't intended as marketing, it was intended to keep the kernel from diverging like the BSDs did. But it turned out to be a form of marketing that has kept us alive).
We're gonna try some actual marketing. Stay tuned. We'll probably screw it up :) But we hired a marketing company, I've gone back to writing papers, we'll give it a try. If you have ideas on how we can put ourselves back out there, we'd love to hear them.
As for viable, heck yeah. We work well on big repos (better than git), we've got what we call nested collections of repositories (did I mention I suck at marketing, yeah, I came up with what to call it) that are sort of like submodules except they work exactly like a single repo, sideways pulls work, anything that works with one repo works with N repos, that includes all the guis, command line, everything. We've got an answer for binaries that works for gaming companies. We've got a sane user interface (that's what Mercurial copied, in a somewhat sketchy way).
Git is sort of like the wild west, it never met an idea it didn't want to implement (at least partially). We're more enterprise ready (yeah, overused term) in that we work hard to make sure that BK has all the guard rails, seat belts, etc, so that you can deploy to people who couldn't care less how any SCM works and they don't drive themselves over a cliff. Definitely less cool than git in that we take away some (bad) options, but safer.
We have seriously considered open sourcing a version of BK. We've been doing a lot of performance work and we essentially have two BK's, the almost SCCS compat ascii format slow version (slow but any version of BK will talk to it), and the fast one with a new binary file format (stuff like showing the top commit comments is 35x faster in the linux kernel, that number goes up as you add more csets). We considered open sourcing the slow one but that effort has stalled. It could be revived, it just has to be worth it to us.
The github ship, in my opinion, has sailed. Maybe we could have open sourced BK back before git and done a github thing but it's all flashy UI stuff and we sort of suck at that. We're really good at systems stuff (you'll see when we start doing marketing, we scale, git doesn't) but flashy? Not so much. We do our UI in tcl/tk (I know, I know, but we have one UI person who makes it all work on windows/linux/macos and tcl/tk is a big part of that. At least we wrote a C like language that compiles to tcl byte codes so we're out of tcl. Thank God.)
Wouldn't the standard open source-as-freemium work? i.e. a free open source version with enough cool features (e.g. nested repositories), but not efficient. It's free marketing to keep you on the radar, that targets the people who appreciate your systems chops (and, like Atlassian, also gets it in under the radar, to developers). Enterprise customers happily pay ridiculous sums for full versions. And git/hg makes you immune to the competitive danger of open source clones.
I'd value your thoughts on this, as I also have a popular open source competitor, that followed me. The strategy seems sensible, but it might undermine perceived value; and it's a hassle to maintain two versions...
Also, can I please ask a technical BK question: How much does git differ from BK internally? i.e. git has graphs of commits, content-addressable for efficient checks of identity and integrity. Did git get any of that from BK? Or was it more the workflow and distributed concept of everyone having a copy of the repository? Many thanks!
Linus definitely did his own thing with git. The general ideas came from BK, BK gave you clone/pull/push/commit as the model. Everyone copied that because it just makes sense. The all or nothing clone model came from BK.
How it is all glued together differs quite a bit. BK has the concept of a revisioned file, git does not, it versions trees. That's why Linus thinks renames are silly, he doesn't care about them, he cares about the tree.
The graph of commits comes straight from BK, that's BK's changeset file - which is sort of neat in that it is a version controlled file itself. BK is the only system that I know of that uses a versioned file to store the metadata.
OK, so on the business model thing, I'm not sure. The way we did the old compatible format is compatible but it's pretty slow, it converts to the new format in memory and then converts back if you write it out. It's slower than the older implementation (but this way we have one in memory format, fewer bugs). I thought it was good enough for small projects, my team overrode me and said "too slow".
As for enterprise customers "happily paying", um, no. We constantly get whacked with "if you don't do this or that we're moving to git". Which could be viewed as a good thing, we have to keep making it better, but it gets tiresome.
Renames are a thing and git made the wrong choice there. It's not like we are perfect but we are way closer.
So on versioning changesets I didn't really explain. Lemme try again.
In any DVCS you have a bill of materials, that's what describes the tree. Git's is different than ours because they don't version files, we do. So our bill of materials looks like:
If you "cat" the changeset file as of any version you get what the tree looks like, a list of files and a list of revisions.
Of course it doesn't work like that because, um, reality and merges and parallel development. We have UUIDs for each file and each version so it looks like
UUID_for_a_file UUID_for_a_version
and our UUIDs are pretty sweet, not sha1 or some other useless thing, they are
those are for each node in the graph, for the very first node which is the UUID for the file, there is a "|<64 bits of /dev/random>" appended.
So the changeset file is just a list of
UUID UUID
Not sure if that helps.
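For readers who want something concrete, the pair-list idea above can be sketched roughly as follows. This is only an illustration under loud assumptions: the node keys are treated as opaque strings (their real layout is not shown in this thread), the hex encoding of the 64 random bits is a guess, and make_root_key and parse_changeset are made-up names, not BK commands or APIs.

```python
import secrets


def make_root_key(node_key):
    """Build a hypothetical file (root) key from an opaque node key.

    The thread only says that "|<64 bits of /dev/random>" is appended for
    the first node. Encoding those bits as 16 hex digits is an assumption,
    and secrets.token_hex draws from the OS random source rather than
    reading /dev/random directly.
    """
    return f"{node_key}|{secrets.token_hex(8)}"  # 8 bytes = 64 bits


def parse_changeset(text):
    """Parse lines of 'file_key version_key' into a dict.

    Assumes one pair per line and no whitespace inside a key.
    """
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        file_key, version_key = line.split()
        manifest[file_key] = version_key
    return manifest


root = make_root_key("hypothetical-node-key")
snapshot = parse_changeset(f"{root} hypothetical-version-key\n")
print(snapshot)
```

The point of the sketch is only that a snapshot of the tree reduces to a mapping from file identity to version identity.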
The benefit of versioning the file that holds all that data is we can use BK to ask it stuff.
Want to see the history of the repo? bk revtool ChangeSet
Want to see what files changed in a commit? bk diffs -r$commit ChangeSet
Yeah, we have to process all the UUIDs and turn them into pathnames and revisions but we can do that and do it fast. So it works.
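A rough sketch of what "which files changed in a commit" amounts to when the bill of materials is a list of key pairs: compare two snapshots and turn the keys that differ back into pathnames. Everything here is hypothetical (the short keys, the paths table, the changed_files name), and it assumes a single pathname per file key, which ignores renames.

```python
def changed_files(old, new, paths):
    """Return sorted pathnames whose version key differs between snapshots.

    old, new: dicts mapping file key -> version key (hypothetical layout).
    paths: dict mapping file key -> pathname, assumed fixed per file here.
    """
    changed = []
    for file_key in old.keys() | new.keys():
        # A file that exists in only one snapshot counts as changed.
        if old.get(file_key) != new.get(file_key):
            changed.append(paths.get(file_key, file_key))
    return sorted(changed)


before = {"F1": "V1", "F2": "V7"}
after = {"F1": "V2", "F2": "V7", "F3": "V1"}
paths = {"F1": "src/main.c", "F2": "README", "F3": "src/util.c"}

print(changed_files(before, after, paths))
# prints ['src/main.c', 'src/util.c']
```

F2 keeps the same version key in both snapshots, so it drops out; F1 moved to a new version and F3 is new, so both are reported.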
All the tools we built to look at stuff can look at the metadata. That's worked out well.
It's not as far fetched as you might think though, we've been building million cset trees and git is so darn slow that we made a "bk fast-export" that spits out the stuff that git wants, because Git was over 20x slower just running the commits.
It's really great to see you answer my question yourself! It seems to be much easier to create a business by using open source software, than writing open source software. Especially for small companies like yours that create complex software like BitKeeper.
BitKeeper is still around, selling to a few large corporations. It's a very different business model.
Git has a large industry of value-add / support companies now. But no one commercial company can own or control the core. And that's one key reason why it's so popular and has such a large ecosystem around it. So to be like git, BitKeeper would have had to really give up central ownership and control. Commercial companies never do that unless they're going out of business anyway, they just can't.