• TechNom (nobody)
    11 months ago

    Nicely written! Going into my bookmarks.

    SVN was, along with its proprietary contemporaries like Perforce, a pain in the ass as far as branching and merging and collaborative workflow across more than one branch of development or one machine installation.

    I’m one of those who was unfortunate enough to use SVN. It was my first version control system too. Committing took forever, because it had to fetch data from the server to see whether ‘trunk’ (the main branch) had been updated (I see why Torvalds hated it). Even committing caused conflicts sometimes. People also used to avoid branching, because merging back was hell! There were practically no merges that didn’t end in merge conflicts. The biggest advance in merge workflow since those days was the introduction of 3-way merges. Practically all modern VCSs - including git and mercurial - use 3-way merge. 3-way merges cut down merge conflicts by a huge margin overnight. Git uses 3-way merges even for seemingly unrelated tasks like revert, cherry-pick, rebase, etc. (and it works well for them; we barely even notice it!). Surprisingly though, 3-way merges have been around since the 1970s (the diff3 program). Why CVS and SVN didn’t use it is beyond me.
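
    For anyone who hasn’t seen it spelled out, here is a rough sketch of what a line-based 3-way merge does. It is a toy under stated assumptions, not how git or diff3 is actually implemented: `merge3`, `_surviving` and the marker labels are names made up for illustration, and it leans on Python’s difflib to find the lines that both sides left alone. The point is that the common ancestor (`base`) lets the tool tell which side changed a region, so only regions changed differently on both sides become conflicts.

```python
from difflib import SequenceMatcher


def _surviving(base, other):
    """Map each base line index that survives unchanged to its index in `other`."""
    kept = {}
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            kept[i + k] = j + k
    return kept


def merge3(base, ours, theirs):
    """Toy line-based 3-way merge. Returns (merged_lines, conflict_count)."""
    in_ours, in_theirs = _surviving(base, ours), _surviving(base, theirs)
    # Sync points: base lines untouched on both sides, plus an end sentinel.
    sync = [i for i in range(len(base)) if i in in_ours and i in in_theirs]
    sync.append(len(base))

    merged, conflicts = [], 0
    b = o = t = 0  # cursors into base, ours, theirs
    for i in sync:
        at_end = i == len(base)
        o_end = len(ours) if at_end else in_ours[i]
        t_end = len(theirs) if at_end else in_theirs[i]
        base_part, our_part, their_part = base[b:i], ours[o:o_end], theirs[t:t_end]

        if our_part == base_part:
            # Only "theirs" touched this region (or nobody did): take theirs.
            merged += their_part
        elif their_part == base_part or our_part == their_part:
            # Only "ours" touched it, or both made the identical change.
            merged += our_part
        else:
            # Both sides changed the same region differently: real conflict.
            conflicts += 1
            merged += ["<<<<<<< ours", *our_part, "=======", *their_part, ">>>>>>> theirs"]

        if not at_end:
            merged.append(base[i])  # the shared, unchanged line itself
        b, o, t = i + 1, o_end + 1, t_end + 1
    return merged, conflicts


base = ["a", "b", "c", "d", "e"]
ours = ["a", "B", "c", "d", "e"]    # we changed the second line
theirs = ["a", "b", "c", "d", "E"]  # they changed the last line
print(merge3(base, ours, theirs))   # (['a', 'B', 'c', 'd', 'E'], 0)
```

    Without `base`, a plain 2-way comparison of `ours` and `theirs` would see two differing lines and could not tell which version of each to keep.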

    Anyway, my next VCS was Bazaar. It’s still around as Breezy. It is a distributed VCS like Git, but was more similar to SVN in its interface. It was fun - but I moved on to Mercurial before settling with Git. Honestly, Mercurial was the sweetest VCS I ever tried. Git’s interface really shows the fact that it was created by kernel developers for kernel developers (more on this later). Mercurial’s interface, on the other hand, is well thought out and easy to figure out. This is surprising because both Git and Mercurial share a similar model of revisions. Mercurial was born a few days after Git. It even stood a chance of winning the race to become the dominant VCS. But Mercurial lost kernel developers’ mindshare due to Python - it simply wasn’t as fast as Git. Then GitHub happened and the rest is history.

    And therefore, the version control system used by the team for one of the most advanced pieces of technology in the world for quite a lot of its development, was… diff and patch. And email

    That sounds like a joke, but it actually made more sense than anything else they could do.

    I was a relatively late adopter of Git. But the signs of what you say are still there in Git. Git is still unapologetically based on the idea of keeping versioned folders and patches. Git actually has a dual personality based on these! While it treats commits as snapshots (similar to versioned folders), many operations are actually based on patches (3-way merges, actually). That includes merge, rebase, revert, cherry-pick, etc. There’s no getting around this fact. IMHO, not understanding this is what makes Git confusing for beginners.
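
    A small sketch of that dual personality, assuming a made-up model where a commit is just a dict of path -> lines. This is not git’s real object format, and `patch_between` is a hypothetical helper. The snapshots are what gets stored; the patch is computed on demand by comparing a snapshot with its parent.

```python
import difflib


def patch_between(parent, child):
    """Derive a unified diff from two snapshots (dicts of path -> list of lines)."""
    out = []
    for path in sorted(set(parent) | set(child)):
        old, new = parent.get(path, []), child.get(path, [])
        if old != new:
            out.extend(
                difflib.unified_diff(old, new, f"a/{path}", f"b/{path}", lineterm="")
            )
    return out


# Two stored snapshots; nothing patch-like is stored anywhere.
commit_1 = {"hello.txt": ["hello"]}
commit_2 = {"hello.txt": ["hello", "world"]}

# The "patch view" of commit_2 only exists once we ask for it.
for line in patch_between(commit_1, commit_2):
    print(line)
```

    Operations like revert or cherry-pick work from this derived view: they take the change between a commit and its parent and replay it somewhere else.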

    Perhaps this is no more apparent than in the case of quilt. Quilt is software that is used to manage a ‘stack of patches’. It gives you the ability to absorb changes to source code into a patch and apply or remove a set of patches. This is as close as you can get to a VCS without being a VCS. Kernel devs still use quilt sometimes and exchange quilt patch stacks. Git even has a command for importing quilt patch stacks - git-quiltimport. There are even tools that integrate patch stacks into Git - like stgit. If you haven’t tried it yet, you should. It’s hard to predict if you’ll like it. But if you do, it becomes a powerful tool in your arsenal. It’s like rebase on steroids. (aside: This functionality is built into mercurial).
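
    To give a feel for the ‘stack of patches’ idea, here is a minimal sketch. It is not quilt’s implementation or file format: `Patch` and `PatchStack` are made-up classes, a patch here is a single hunk at a fixed line index, and only the push/pop vocabulary is borrowed from quilt’s commands. Pushing applies the next patch in the series; popping reverts the most recently applied one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patch:
    name: str
    start: int        # 0-based index of the first line the hunk touches
    old: List[str]    # lines expected at `start` before applying
    new: List[str]    # lines that replace them

    def apply(self, lines):
        return self._swap(lines, self.old, self.new)

    def revert(self, lines):
        return self._swap(lines, self.new, self.old)

    def _swap(self, lines, expect, replace):
        end = self.start + len(expect)
        if lines[self.start:end] != expect:
            raise ValueError(f"{self.name}: does not apply cleanly")
        return lines[:self.start] + replace + lines[end:]


@dataclass
class PatchStack:
    series: List[Patch]   # ordered patches, bottom of the stack first
    applied: int = 0      # how many patches from `series` are currently applied

    def push(self, lines):
        if self.applied == len(self.series):
            raise IndexError("no more patches to push")
        lines = self.series[self.applied].apply(lines)
        self.applied += 1
        return lines

    def pop(self, lines):
        if self.applied == 0:
            raise IndexError("no patches applied")
        lines = self.series[self.applied - 1].revert(lines)
        self.applied -= 1
        return lines


tree = ["int main() {", "  return 0;", "}"]
stack = PatchStack([
    Patch("add-greeting", 1, [], ['  puts("hi");']),
    Patch("exit-code", 2, ["  return 0;"], ["  return 1;"]),
])
patched = stack.push(stack.push(tree))     # apply both patches in order
restored = stack.pop(stack.pop(patched))   # unwind them in reverse order
```

    The source tree stays a plain set of files the whole time; the only extra state is the ordered series and a counter of how far into it you are.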

    diff and patch and email could do that, and it was quick and easy for small changes and possible (with some discipline) even for large changes. Everyone sort of knew that it wasn’t ideal, but it was literally better than anything else available.

    I recently got into packaging for Linux. Trust me - there’s nothing as easy or convenient as dealing with patches. It’s closer to plain vanilla files than any VCS ever was.

    Some kernel developers were unhappy with having the source and all revisions “held hostage” within a proprietary VCS

    As I understand it, the biggest problem was that not everyone was given equal access. Most significantly, many developers didn’t have access to the repo metadata that was necessary to perform things like blame, bisect or even diffs.

    As best I remember, things came to a head when a couple members of the former group actually had their licenses pulled because McVoy said they had broken the agreement by “reverse engineering” his protocols, with people disagreeing over whether what they’d done actually fit that description, and the whole thing blew up completely with people arguing and some people unable to work.

    That sounds accurate. To add more context, it was Andrew Tridgell who ‘reverse engineered’ it. He became the target of Torvalds’s ire due to this. He did reveal his ‘reverse engineering’ later. He telnetted into the server and typed ‘help’.

    Linus, in his inimitable fashion, decided to solve the problem by putting his head down to create a solution and then dictatorially deciding that this was going to be the way going forward.

    I thought I should mention Junio Hamano. He was probably the second biggest contributor to git back then. Torvalds practically handed over the development of git to him a few months after its inception. Hamano has been the lead maintainer ever since. There is one aspect of his leadership that I really like. Git is by no means a simple or easy tool. There has been ample criticism of it. Yet the git team has sincerely tried to address it without hostility. Some of the earlier warts were satisfactorily resolved in later versions (for example, restore and switch are way nicer than checkout).

    • mo_ztt ✅
      11 months ago

      I’m one of those who was unfortunate enough to use SVN.

      Same. I guess I’m an old guy, because I literally started with RCS, then the big step up that was CVS, and then used CVS for quite some time while it was the standard. SVN was always ass. I can’t even really put my finger on what was so bad about it; I just remember it being an unpleasant experience, for all it was supposed to “fix” the difficulties with CVS. I much preferred CVS. Perforce was fine, and used basically the exact same model as SVN just with some polish, so I think the issue was the performance and interface.

      Also, my god, you gave me flashbacks to the days when a merge conflict would dump the details of the conflict into your source file and you’d have to go in and clean it up manually in the editor. I’d forgotten about that. It wasn’t pleasant.

      Git’s interface really shows the fact that it was created by kernel developers for kernel developers (more on this later).

      Yeah, absolutely. I was going to talk about this a little but my thing was already long. The two most notable features of git are its high performance and its incredibly cryptic interface, and knowing the history makes it make a lot of sense why that is.

      Mercurial’s interface, on the other hand, is well thought out and easy to figure out. This is surprising because both Git and Mercurial share a similar model of revisions. Mercurial was born a few days after Git. It even stood a chance of winning the race to become the dominant VCS. But Mercurial lost kernel developers’ mindshare due to Python - it simply wasn’t as fast as Git.

      Yeah. I was present on the linux-kernel mailing list while all this was going on, purely as a fanboy, and I remember Linus’s fanatical attention to performance as a key consideration at every stage. I actually remember there was some level of skepticism about the philosophy of “just download the whole history from the beginning of time to your local machine if you want to do anything” – like the time and space requirements in order to do that probably wouldn’t be feasible for a massive source tree with a long history. Now that it’s reality, it doesn’t seem weird, but at the time it seemed like a pretty outlandish approach, because with the VCS technologies that existed at the time it would have been murder. But, the kernel developers are not lacking in engineering capabilities, and clean design and several rounds of optimization to figure out clever ways to tighten things up made it work fine, and now it’s normal.

      Perhaps this is no more apparent than in the case of quilt. Quilt is software that is used to manage a ‘stack of patches’. It gives you the ability to absorb changes to source code into a patch and apply or remove a set of patches. This is as close as you can get to a VCS without being a VCS. Kernel devs still use quilt sometimes and exchange quilt patch stacks. Git even has a command for importing quilt patch stacks - git-quiltimport. There are even tools that integrate patch stacks into Git - like stgit. If you haven’t tried it yet, you should. It’s hard to predict if you’ll like it. But if you do, it becomes a powerful tool in your arsenal. It’s like rebase on steroids. (aside: This functionality is built into mercurial).

      That’s cool. Yeah, I’ll look into it; I have no need of it for any real work I’m doing right now but it sounds like a good tool to be familiar with.

      I still remember the days of big changes to the kernel being sent to the mailing list as massive series of organized patchsets (like 20 or more messages with each one having a pretty nontrivial patchset to implement some piece of the change), with each patch set as a conceptually distinct change, so you could review them one at a time and at the end understand the whole huge change from start to finish and apply it to your tree if you wanted to. Stuff like that was why I read the mailing list; I just remember being in awe of the type of engineering chops and the diligence applied to everyone working together that was on display.

      I recently got into packaging for Linux. Trust me - there’s nothing as easy or convenient as dealing with patches. It’s closer to plain vanilla files than any VCS ever was.

      Agreed. I was a little critical-sounding of diff and patch as a system, but honestly patches are great; there’s a reason they used that system for so long.

      As I understand it, the biggest problem was that not everyone was given equal access. Most significantly, many developers didn’t have access to the repo metadata that was necessary to perform things like blame, bisect or even diffs.

      Sounds right. It sounds like your memory on it is better than mine, but I remember there being some sort of “export” where people who didn’t want to use bk could look at the kernel source tree as a linear sequence of commits (i.e. not really making it clear what had happened if someone merged together two sequences of commits that had been developed separately for a while). It wasn’t good enough to do necessary work, more just a stopgap if someone needed to check out the current development head or something, and that’s it.

      That sounds accurate. To add more context, it was Andrew Tridgell who ‘reverse engineered’ it. He became the target of Torvalds’s ire due to this. He did reveal his ‘reverse engineering’ later. He telnetted into the server and typed ‘help’.

      😆

      I’ll update my comment to reflect this history, since I didn’t remember this level of detail.