Stop using the number of git commits as a metric for anything, it's idiocy

Stop measuring git commits, it is stupid! On so many levels, and from so many perspectives, the number of commits is a super duper terrible metric.

Before I argue my case, I would like to say that of course it looks bad when there is absolutely NO public activity from developers over a long period of time (6-12 months). I say 'public activity' because there can be activity, as in code being written, without it being public. More about that below.

Some folks seem to be very keen on using the number of commits as an indicator of the success of a project. There are sites highlighting these irrelevant metrics, e.g. https://ift.tt/2P9IvQM

About me:

I have a master's degree in computer science and I've worked professionally as a developer for 9 years. I have developed two crypto libraries and one crypto wallet, and I'm working on my second one. I will not mention which ones, primarily because it is irrelevant, and secondly because I don't want this post to be downvoted for shilling any specific crypto project.

First let's reiterate some important concepts

VCS (Version Control System): `git` is the most popular. `svn` is another, but older and not as widely used any more.
Git: a VCS, nothing else. Git is not GitHub. GitHub uses git.
GitHub: an American for-profit company owned by Microsoft (bought in 2018). It is one of the most popular code hosting platforms using git.
GitLab: an alternative to GitHub.
Bitbucket: an alternative to GitHub, owned by Atlassian, who also develops Jira.

More info about Git:

"git commit": A commit is like a bookmark. When you read a book, you can either put a bookmark on every page, or read the whole book without any bookmarks. The commit is saved only locally on your computer until you "git push", see below.
"git push": Sending your local commit or commits to a remote git repo, which is a project hosted by a code hosting platform, e.g. GitHub.
"git squash": Some people like to make many commits while coding, but just prior to pushing the code, they "merge" all commits together into a single one. (There is no literal `git squash` command; squashing is usually done with `git rebase -i` or `git merge --squash`.)
"commit --amend": Let's say I just committed a change to the README, and then I notice that I misspelled a word. I can fix that commit (changing it) and correct the misspelled word by using `git commit --amend`. Some developers do that, others just fix the misspelled word in a new commit. The difference is that `git commit --amend` results in one single (changed) commit, whereas the latter results in two commits.

Different methodology, but same code:

How often developers commit differs A LOT, and I mean completely. Personally, I tend to code for a couple of hours, days or even weeks without making a single commit (frowned upon by some), whereas other developers might commit every changed line of code. When the code gets pushed to the VCS remote repo (e.g. GitHub), it is still the exact same code. But coming from me it can be a single commit, and coming from Alice it can be 1000 commits. Same code, but a difference in the number of commits of 3 orders of magnitude.

Git squash

In the example above, maybe Alice committed 1000 times (whereas I committed once), but Alice also likes to have one single commit per feature/bugfix/improvement she is working on, so she squashes all her 1000 commits into one. So now Alice's method and my method are EXACTLY the same when the code is pushed to GitHub.
But it is impossible for the rest of us to know that Alice's single commit was actually 1000 commits prior to being squashed.

Private repos

Even though most crypto projects are open-source, some code might not be open-sourced at first, but might be at a later point in time. These repositories are hidden from the public, so there can be a lot of activity in a project without the public knowing about it.

Personal repos

Even though most companies/projects have all their repositories under the same organization on the code hosting platform, some repos relevant to the project might not be. E.g. if you look at Bitcoin's page on GitHub, you will find 4 repositories: https://ift.tt/12evB0f but some of its core developers might write experimental code in separate personal repos (that might be private). Or in repos not yet pushed, i.e. code sitting locally on the developer's computer.

Forks

When Alice codes in a distributed project with many contributors, it might be most suitable for her not to use the project's repo directly, but rather a personal copy of the whole code project, known as a fork. (Please note that this has nothing to do with 'forks of a DLT (e.g. blockchain)', as in spawning a new version of a crypto project, e.g. Litecoin and Bitcoin Cash being forks of Bitcoin. I'm talking about 'git fork' here.) So Alice codes away in any branch, in any number of commits, or a single one, in her own personal (git) fork of e.g. Bitcoin, her own repo. Then after some time (hours, days, weeks, months), she creates a Pull Request to the 'upstream repo' (original/source repo), and if other developers are happy with her work, it gets merged. So there might be activity, many or few commits, in another git repo that is a fork of the original one. The bitcoin C GitHub repo currently has 23,958 git forks: https://ift.tt/2faU320 That's actually so many that GitHub displays this message: "Woah, this network is huge! We’re showing only some of this network’s repositories".
So in order for you to KNOW that there is NO activity from any developer, you would actually need to go through ALL forks (in this case ~24 thousand) of a repo to see that no commits have been made recently. But as stated above, not even that is enough (the commits might not have been pushed yet, right?)

Irrelevant "wash" commits

I just coined the term "wash commits", so don't google it (you will only get images of jeans... LOL). Just like there are wash trades, faking volume, any developer can, either manually or using some trivial script, at a regular interval add some character, e.g. a space, to any file of a project, git commit and push that change, and then make another git commit removing the newly appended character. Then it will look like the project has activity. Hell, you can even do this in 10,000 commits daily. "Wow man! Look at all that activity! This crypto project is the best!" - well, no.

No squash and no amend

Take two developers, Alice and Bob. Neither of them squashes, but Alice uses `git commit --amend` to fix typos and other smaller changes, and Bob does not. Since neither squashes, over a long period of time this might result in a huge difference in the number of commits.

Rebase vs Merge

When Alice and Bob, working in the same repo, want to merge together the different features they have been working on, they can do so using two different methods, either `git merge` or `git rebase`. The former results in one extra commit, a commit of the merge event itself (unless git can fast-forward), whereas the latter does not result in any extra commit. These are different styles of working, and which one to prefer is often debated. Over a long period of time this might result in a huge difference in the number of commits.

More LoC, worse

LoC = Lines of Code. The more lines of code, the worse, okay. Many LoC is NOT at all, in any way, a good thing. The theoretically best code base (impossible in practice, of course) is the code base with 0 lines of code.
It is trivial to maintain, you just have to do... nothing. It contains NO bugs. Code is buggy in its natural state. So many commits ADDING new code are not always good. Commits removing code are better, given the same functionality.

"But but but... how can I easily determine which crypto project is best by looking at GitLab/Bitbucket/GitHub?"

Well, you can't, that is my point. But if you want some tips on what to look for, these metrics are actually relevant:

Number of contributors
Number of forks
Number of stars
Number of pull requests (PR for short, called "Merge Request" in GitLab), and how many of them are open? How fast does a PR get merged?
Last commit date: WARNING for false positives! Remember "wash commits" (mentioned above). If the last commit date is recent, it does NOT necessarily mean that the project is active. Have a look at the commit. Does it look trivial or not? A trivial commit is e.g. a commit adding a newline/space in the README.
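
The "does it look trivial or not?" check on a recent commit can be partly automated. Below is a minimal sketch in Python, assuming you already have the content of a file (say the README) after each commit as plain strings; fetching those from git is left out, and the function names `is_trivial_change` and `count_nontrivial_commits` are made up for this illustration. It only catches the whitespace kind of wash commit described above, nothing smarter.

```python
from typing import Sequence


def is_trivial_change(before: str, after: str) -> bool:
    """Return True when two versions of a file differ at most in whitespace."""
    # Removing every whitespace character makes added or removed spaces,
    # tabs and newlines invisible to the comparison.
    def squeeze(text: str) -> str:
        return "".join(text.split())

    return squeeze(before) == squeeze(after)


def count_nontrivial_commits(snapshots: Sequence[str]) -> int:
    """Count commits that change more than whitespace.

    `snapshots` holds the file content after each commit, oldest first
    (an assumption of this sketch, not something git hands you directly).
    """
    pairs = zip(snapshots, snapshots[1:])
    return sum(1 for old, new in pairs if not is_trivial_change(old, new))


# Hypothetical README history: two wash commits (add a space, remove it
# again), followed by one commit that adds actual text.
history = [
    "# Project\n",
    "# Project\n ",
    "# Project\n",
    "# Project\nNow with docs.\n",
]

real_commits = count_nontrivial_commits(history)
```

Here the history contains three commits after the initial one, but only the last one changes anything besides whitespace, so a raw commit count would overstate the activity by a factor of three.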

Submitted August 23, 2019 at 04:02PM
