GitHub Sucks

2010-05-22 13:46 - Rants

I know it's a contentious title. But it reflects my current attitude, however hyperbolic. Let me tell you the story of how I ended up feeling this way. It's one whopper of a rant, but it's worth explaining, and maybe it will mean GitHub improves, at least a bit.

I've been involved with user scripting and the Greasemonkey community for about five years. It's something I believe in and am passionate about. As a result, I participated in Greasemonkey's development from rather early on. Around a year ago, the source and project tracker host that Greasemonkey was using (DevjaVu) announced that it was shutting down. It was right around this time that Aaron Boodman, Greasemonkey's original creator, was looking to step down. I took over co-ownership/maintenance of the project, along with another active member of the community. We had some discussions, and much of the community was interested in moving from Subversion to Git. In hindsight, I'm very glad we switched. Git has a lot of powerful options (and a steep learning curve to go with them!) and is suited perfectly for open source development.

With the decision for Git as a source control system, our choice for a host was directed rather simply. In short, GitHub was the de facto choice. Coming from Subversion + Trac, we wanted a wiki for development-specific references, and a good issue tracker. GitHub claimed to offer both, plus Git source control. So off we went!

Nobody involved had used GitHub enough before to have a strong opinion, but we saw their feature list, and they seemed to have checks in the boxes we were concerned with. So it's true, GitHub offers a wiki, and an issue tracker. Barely.

The Wiki

We wanted to record simple things like: this is our coding style, this is how we like to accept patches, this is the sort-of-long-term roadmap plan, and so on. We had used the wiki section of Trac for this before. For a short period of time, we did exactly that. But it had issues from the get-go.

The wiki uses Textile. That's fine, on its own. But nothing else on GitHub uses it. (For comparison, the issue tracker, and comments in general on the site, use an enhanced version of Markdown. Other things support only plain text, i.e. newlines.) You have to learn this markup language to use the wiki, and you have to learn other markup languages to use other features. Consistency would be nice.

There's practically no version history. You can look up the rendered history. But if you want to restore from the history, you're going to have to turn the rendered HTML back into Textile by hand. You can only edit (and thus see the source for) the current revision. Any and all wikis fall victim to spammers, malicious users, and innocently ignorant users. Cleaning up after them must be a straightforward task. MediaWiki (which runs Greasemonkey's user support wiki) has a lot of great features in this area.

Along these same lines: there are also no permission controls of any kind. Any GitHub user can create and edit wiki pages for any project. This and the lack of accessible history bit us in the same way: malicious users. As we were migrating, somebody decided that they were the authoritative source for information about Greasemonkey, so they decided to insert dozens of pages, and edit existing pages, in the official greasemonkey GitHub project's wiki (since removed). Those of us actually in charge of the project didn't appreciate it, so we undid those changes. They redid them, we undid them. Eventually we wrote a script to clean up the wiki; they must have done the same because every few minutes it was back. The point? While yes, technically, there is a wiki, it's limited. The features are the absolute bare minimum to possibly meet the definition. And this caused us headaches and problems.

The Issue Tracker

At first, the issue tracker seemed ideal, in that it looked both streamlined and straightforward. Time quickly revealed that first impression to be false. Here's the deal.

Issues have: a number, a reporter, a title, a body, a single non-nested thread of comments, possible labels, a count of votes, and an open/closed status. That's it. No resolution type, no version, no severity, no cc: list, no owner/assignee, no milestones, no priority. Even worse is something they do have: editability. The body and title, and all the comments, can be edited after the fact (by project collaborators, and by the first party, at least). I'll get into notifications and communications later, but it's worth pointing out now that when these existing values are edited, it's completely silent. Nobody will know that it changed. And there's no evidence that the comment visible on the site isn't the original one. This is difficult to keep track of, and simply not a good way to communicate online.

Basically, every feature you want has to be built on top of labels. We'll label "wontfix" or "duplicate". We'll label with version numbers for milestones. But you can only search for open issues with a label. I can't, for example, finish a release, then go look up the (closed) issues with that label to review what has been accomplished. There's only one search box; but hey, there's no status or assignee or anything else to possibly limit the search to, so that makes sense.
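
To make that missing query concrete, here is a minimal Python sketch of the lookup I'd want: closed issues carrying a given label. The `Issue` record, the `issues_with_label` helper, and the sample data (including the "0.8.5" label) are all made up for illustration; this is not GitHub's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical issue record with only the fields this query needs."""
    number: int
    title: str
    is_open: bool
    labels: set = field(default_factory=set)


def issues_with_label(issues, label, *, is_open=None):
    """Return issues carrying `label`.

    `is_open=None` ignores status; True or False restricts to open or closed.
    """
    return [
        issue
        for issue in issues
        if label in issue.labels and (is_open is None or issue.is_open == is_open)
    ]


# Invented sample data, purely for illustration.
tracker = [
    Issue(101, "Crash on install", False, {"0.8.5", "bug"}),
    Issue(102, "Add menu shortcut", True, {"0.8.5"}),
    Issue(103, "Typo in dialog", False, {"duplicate"}),
]

closed_in_release = issues_with_label(tracker, "0.8.5", is_open=False)
print([issue.number for issue in closed_in_release])  # prints [101]
```

The point of the `is_open` parameter is that status is just one more filter next to the label, which is exactly the combination the single search box doesn't allow.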

The issue system is built on an "AJAX" interface, which has the potential to be fast and friendly. It's not. Actually, it's not even AJAX (see the postscript, below). What it almost always means is that after I follow a link, I get an empty page loaded, then another delay for the actual content to load into this empty shell. This is especially bad when following a link to a particular comment — the whole page loads, then later loads the second time and scrolls. When I post a comment, it isn't just added, the entire thing reloads. It's clunky, and actually quite slow. Why? Probably because the snazzy "AJAX" interface actually loads every ticket, so that you can quickly jump back and forth to the previous and next one. But I never want to do that. (P.S. It's not actually AJAX! All the data is in-line in the HTML, and the script just selects part of it to show. So, you have to re-load all this inline data at every single page load, which helps explain the slowness, and why commenting does a forced refresh of the whole page.)

Since the content of the issues is actually displayed by this faux AJAX, the title of every page is just "Issues - user/repo", which doesn't match the issue actually being displayed. Good luck sorting out three tabs with different issues open in each.

Comments are in GitHub Flavored Markdown, but there is no preview. You'd better get that syntax right the first time (or make silent edits that nobody will be notified about). Issues can only be in one of two statuses: "open" or "closed". And it's really hard to know, given a ticket, which status it is in. There's no log of when a ticket is closed. All you get is the comments, and you'll have to trust their content, if anyone even mentioned their closing action.

You actually can say "Fixes #123" in a commit log, but: not if you're just merging in a commit someone else wrote. This feature is almost completely undocumented, so good luck discovering it. The commit view will not auto-link the "#123" to the ticket, and the ticket will have no record that said commit was the reason it was closed. This feature doesn't provide any of the important automated help that would make it valuable.
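
For illustration, here is a rough Python sketch of the automated help I'd expect from a "Fixes #123" feature: pull the issue numbers out of a commit message and produce a note recording which commit closed each one. The regular expression is my assumption about the syntax (the real rules are undocumented, as noted), and `issues_fixed_by`, `close_note`, and the commit id `abc1234` are hypothetical names.

```python
import re

# Assumed syntax: the word "fixes" followed by "#<number>", case-insensitive.
# This is a guess at the convention, not GitHub's actual parsing rules.
FIXES_RE = re.compile(r"\bfixes\s+#(\d+)", re.IGNORECASE)


def issues_fixed_by(commit_message):
    """Return the issue numbers a commit message claims to fix, in order."""
    return [int(number) for number in FIXES_RE.findall(commit_message)]


def close_note(commit_id, commit_message):
    """Map each fixed issue number to a note naming the closing commit."""
    return {
        number: f"Closed by commit {commit_id}"
        for number in issues_fixed_by(commit_message)
    }


# Hypothetical commit id and message.
print(close_note("abc1234", "Tidy up storage code. Fixes #123"))
```

A note like this, attached to the ticket, is the record that is currently missing: given an issue, you could see that it was closed and by which commit.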

Notifications

To talk about notifications, first we need to explain the many types there are in GitHub. There are plain "notifications"; these will show, when unread, as a number beside your username on the site. You'll (optionally) also get an email about these notifications. There's a news (RSS) feed, for each user, as well. Why all three? Well, it might be nice to have options. But as I'll explain, this doesn't give you the flexibility of choice.

First, notifications. You'll get notifications about (from the notification options page):

There's two problems here.

First: you might notice that something is missing. Commits. There's no notifications for commits. (There are commit hooks, and one of them is email, but only one email address can be provided, users can't choose to listen or not, like all the other notifications.) Commits show up only in the news feed, so you'd better be ready to subscribe to that, if you want to know about new commits being added. The feed, of course, contains a different set of contents. I know you'll see new issues (but not comments on issues), commits, and comments on commits. But I can't say I've ever seen anything else in the feed. The feed will say "user opened issue 123 on user/repo" for a new issue, which makes sense. Email notifications, however, always say "[GitHub] Issue title [user/repo GH-123]", and they always say that, regardless of whether it's a newly opened issue, or a comment. And until recently, comments' email notifications didn't indicate anywhere who placed the comment. Often, you'll have to manually decode multiple emails for the issue, and end up reading the same comments twice, once in your email and once on the webpage.
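
Since the subject line is the same for every email about an issue, one workaround is to group the emails by issue yourself. A minimal Python sketch, assuming subjects follow exactly the "[GitHub] Issue title [user/repo GH-123]" layout quoted above; `parse_subject` and `group_by_issue` are hypothetical helpers, and the sample subjects are invented.

```python
import re
from collections import defaultdict

# Layout assumed from the example above:
#   "[GitHub] <title> [<user>/<repo> GH-<number>]"
SUBJECT_RE = re.compile(
    r"^\[GitHub\] (?P<title>.*) \[(?P<repo>[^\s\]]+/[^\s\]]+) GH-(?P<number>\d+)\]$"
)


def parse_subject(subject):
    """Return (repo, issue_number, title), or None if the layout doesn't match."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    return match["repo"], int(match["number"]), match["title"]


def group_by_issue(subjects):
    """Group subject lines by (repo, issue_number), skipping unrelated mail."""
    threads = defaultdict(list)
    for subject in subjects:
        parsed = parse_subject(subject)
        if parsed is not None:
            repo, number, _title = parsed
            threads[(repo, number)].append(subject)
    return dict(threads)


# Invented sample inbox.
inbox = [
    "[GitHub] Crash on install [user/repo GH-123]",
    "[GitHub] Crash on install [user/repo GH-123]",
    "[GitHub] Add menu shortcut [user/repo GH-124]",
    "Unrelated mail",
]
for key, mails in group_by_issue(inbox).items():
    print(key, len(mails))
```

This only tells you which issue an email belongs to. It still can't tell a newly opened issue from a comment, because the subject doesn't carry that information.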

Second, and more insidious: that frequent term "my". What are "my" repositories? Honestly I have no idea. It's either going to be "owned" repositories (those in your account, and those you are a collaborator on), or maybe it's watched repositories? Like so many things, it's hard to know, because there's no clear documentation. Either way, it's isolationist. It makes it a bit mysterious when, why, and to whom notifications will be sent. This is a big enough problem that it's got its own section, and here it comes.

Communication

By far my biggest problem with GitHub is its impact on communication within my project. Again, I'm one of two co-maintainers of the Greasemonkey project, and have been for about a year now. That means the two of us have final say what happens, what is included and what is not, and other administrative details like how we accept patches and such. Like I said at the beginning, I'm in this situation because I'm passionate; I want this process to work smoothly. Not only do I want a good user script manager for myself, I want one to be available for everyone, so the whole web is better. We try to accomplish this with an open community, around our open source project. Not only do Johan and I share decision making (and attempt to do so in the open), we invite other interested parties to take part in the process. I, at least, care what other people have to say (even if I don't always agree, and don't always follow their advice; after all, you can't follow everyone's advice, one person's will often contradict another's).

GitHub makes that hard. I'd like to try to explain to you why this is.

I've already alluded to the first problem: some things (comments) only show up as "notifications" and some things (commits) only show up as "news" items. If you want to keep an eye on everything that's going on (and I do!) you'll have to monitor both.

Second: you end up seeing the same stuff twice, once in your email (where I'd prefer to centralize — I have the power there to set read/unread, arrange into folders, delete when something is seen and done, etc, etc) and once in your news feed. This is compounded by every notification (which I've already read, in my email) showing up on the site; since I'm a dedicated "zero inbox" sort, I hate seeing that "unread" marker, especially for things I really have already read, so I waste bits of time here and there deleting notifications from the website, which I've already seen in my email. On top of that, any issue which generates a notification gets marked "unread". Again, I've probably read everything there is to read already via my email, but I have to go to yet another place to "mark read" this information, and get it out of my face.

So far these are just annoyances. The third problem, on the other hand, is truly serious. GitHub forces communication to be fragmented and disconnected. It makes collaborating and sharing decisions really difficult. This starts off with problem two above: it's a pain to keep up to date with what's actually new, even when you are told what's new, because you're probably told many times. However, the meat of this problem is that you often aren't even told once. It might be because something happens somewhere that is not "my" repository (like one of the many forks), and for the rest, here's the explanation, via examples.

It starts with Greasemonkey issue 1089. I started off with greasemonkey commit 9beb729d, and (manually) commented in the issue that I had pushed this commit to resolve the issue. I missed a detail (I didn't actually remove items over the limit, just their keys). Even though I saw the flaw and planned to fix it, I forgot between then and when I actually pushed the commit. So, erikvold helpfully pointed it out with a comment — in the commit. This feels natural, because in the commit you can point out specifically which part of the change you're discussing. However, this started off a thread of comments separated from the issue, in the commit. Problem the first: who gets notifications about this thread? Again, I have no idea. I strongly suspect that it is another class of "comments after me" — people who have commented get put on an invisible cc: list. So if you first read the commit, and it seems ok, you'll close it and think nothing more. If someone else points something out, and starts a thread which you never see, you'll never add your input, because you don't even know the conversation is happening. Collaboration fail.

Here's another kind of problem. I'd love to walk through an actual example here, but I simply can't find it anymore, even though I know I've seen it. (Another strike against GitHub: I can't use it as an effective reference to go back and see what people have said in the past. There's certainly no comprehensive search feature.) Communication is so fragmented that it is rapidly lost. Anyway, I've seen and hated this series of events: First, there's an open issue. Someone decides to resolve it, creates and pushes a proposed commit-to-fix, and puts a link to this commit in the issue. Other people start a separate isolated thread of discussion on the commit, and then the commit is superseded by another suggestion. A second proposed commit is provided, merged in, and again another isolated thread of conversation starts on the second proposed commit, on the third commit (the one merging it into the official repository), or on both. At every point through this chain, a potentially different and definitely limited set of people goes off into their own space. Collaboration is ruined.

The effect of all this is that communicating to a tiny group of interested parties is easy, but becoming part of that set of interested parties is really hard. You'll need to keep a very close eye on every commit and comment you see everywhere, and if they aren't already, make them part of the "my" set, so you'll get notified when the conversation continues.

What instead?

Personally, to make Greasemonkey go, I want something much simpler. Given my experience, I find that Git, with easy public repositories, is wonderful for collaboration. What we need to add onto that is (only!) an easy way to link to any particular commit, branch, file and/or line, and e-mail based collaboration, preferably on the developer's mailing list, which everyone sees and can participate in, and only once per event. Given even a large list of items in your email, it is quick and easy to read through, ignore if necessary, or just hit reply to join in.

Some kind of issue tracking is important, because I want users to be able to report problems (Trac is great in this regard, anyone can report problems with one web form, and no account setup or anything else), and especially I want to be able to track and manage them — which are important, which will be fixed for version X, which is user Y already working on, etc. Updates to these issues should, again, be sent exactly once, via email, to the list. Today, GitHub doesn't do these things I want, and it does do lots of stuff I don't appreciate.

So that's why the strong title of this post. The bottom line is that I've found I like a lot of what Git does and provides. GitHub layers a set of features on top of Git source control — but I find them all severely lacking.

Comments:

Possible Solution
2012-12-11 21:22 - Ekyo777

I had similar concerns, and solved them by using Assembla instead of GitHub. I suggest you try it.

Not accurate; things have improved in two years.
2013-01-17 11:15 - amcgregor

This article is no longer accurate despite being a fairly up-there hit when searching for 'github textile wiki'. Unfortunate!

The wiki system on GitHub can use any number of markup languages, and the markup in use can vary from page to page.

Additionally, the wikis are full Git repositories, allowing you to clone and manage revisions, branch, and roll back as needed. (Sure the web interface isn't the greatest as it doesn't give you full access to all actions possible on the repository, but neither does the primary repository browser…) One of my employers manages cloud infrastructure, and our VMs automatically update the wiki with information about themselves when they come online.

"No resolution type, no version, no severity, no cc: list, no owner/assignee, no milestones, no priority." -- In the statement prior to this you already presented the solution to these values being "missing". (Other than cc:, which does exist as people can subscribe and unsubscribe to the issue.) The tag/label system! Have a gander at the issues for my web framework for an example of the label structure and colour coding scheme I use across my projects. Additionally I've broken out the exact values I use in a gist. It's a slightly different workflow than having discrete singular values in drop-downs, but this is the new age. Tags are where it's at.

"There's no notifications for commits. There are commit hooks…" -- Send the e-mail hook to a in-company mailing list which lets people subscribe and unsubscribe. We have the technology. (That's what we do at my primary place of work.) When collaborating on large projects use the fork+pull-request model. The owners of the primary repository will be notified when a pull request comes in. Then the owners may choose to accept the pull (with GitHub handling merging, or warning you if it can't cleanly merge) or reject it, and while the request is open the original creator of the request may add additional commits to it. The workflow is surprisingly smooth:

Developer forks your project, creates a branch, and starts hacking away. Satisfied with his work, Developer pushes to his fork on GitHub and issues a pull request to the master repository. Owner gets notified and reviews the commit(s). Communication ensues, potentially with additional commits being pushed by Developer. Once Owner is satisfied, he pushes the button and bam, that's an accepted contribution created in isolation.

E-mail integration in GitHub has also greatly improved in the last two years. You can interact with issues and comment on commits by replying to the notification e-mails. In fact, for the majority of uses you never actually need to visit the GitHub website!

Have a great day, and this was an interesting read. (Just wanted to add a few notes on the improvements that have happened over time.)

P.s. your website is excruciatingly difficult to comment on. I'm logged in, then I'm not, then I preview but have no way to post (and pressing back deleted the content of the form…) Painful!
