I recently got lookatgit to scan various projects I am interested in. I won’t post results just yet, because they are not shiny, and they come with lots of caveats and enough room for error that people could easily misread them, but I’ve been thinking about them.
There are some inaccuracies in the way I’ve preserved attribution in source control, so I’ll just state a guess: contributions to my project (in terms of number of commits) from non-Red Hat domains (what we are most interested in from a community perspective) are probably in the 15-20% range. Ow! I could have sworn they felt like 50%. GitHub looks like 50% sometimes. Are my illusions shattered or what? At first I was sad, but before we go shoot OSS and Matt Asay writes about how the world is imploding, let me state what I think this really means. I think that 15-20% is quite awesome, and I’ll get to that. Func looks to be 45%, but that’s skewed by some heavy GSOC contributions (money was involved, via GSOC, so I don’t count that as quite as pure as I’d like, but thanks Google!). Still, what about my claim that one developer can be as powerful as 10? Is that still true? I want it to be true.
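To make the "share of commits from outside domains" figure concrete, here is a minimal Python sketch of how such a number could be computed. It is not how lookatgit does it. The function name `outside_commit_share` and the default home domain `redhat.com` are assumptions made for illustration, and the input is assumed to be one author email per commit, such as the output of `git log --format=%ae`.

```python
def outside_commit_share(author_emails, home_domain="redhat.com"):
    """Return the fraction of commits whose author email is outside home_domain.

    author_emails: iterable of strings, one author email per commit.
    home_domain: hypothetical "inside" domain; only an exact match counts as
    inside, so subdomains and personal addresses are treated as outside.
    """
    # Keep only the part after the last "@", lowercased, skipping blank lines.
    domains = [
        email.strip().lower().rpartition("@")[2]
        for email in author_emails
        if email.strip()
    ]
    if not domains:
        return 0.0
    outside = sum(1 for domain in domains if domain != home_domain)
    return outside / len(domains)


if __name__ == "__main__":
    import subprocess

    # Run inside a git checkout; %ae prints the author email of each commit.
    log = subprocess.run(
        ["git", "log", "--format=%ae"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(f"outside share of commits: {outside_commit_share(log.splitlines()):.1%}")
```

A number like this inherits every attribution problem in the history: patches committed on someone else's behalf, or employees committing from personal addresses, will be counted on the wrong side.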
My theory on this: OSS is most interesting not because of the size of “outside” contributions. The best patches are done by clever ninjas. Frequency of commits doesn’t always count … what counts is the /diversity/ of commits and all the other incidental things that you can’t compress into metrics. From the data we learn not what the value we get from OSS is (that requires humans to think about the data and the project), but more about what the patterns around OSS contribution are really like. As we see the data we begin to get a little better understanding. Want a good metric for success? Size of the contributor pool and what features they have added.
OSS is most interesting to me because of the feedback loops involved in the free exchange of ideas, which accelerate and motivate the pace of technology. Freenode and mailing list culture, basically. Being able to constantly relate to users on a day-to-day basis is not only inspiration, it’s raw data on how things are used in the field, and they (users) have a chance to influence the software in ways not traditionally found in proprietary software. “A mentat needs data”. Having data is how we excel and can create interesting things. That can’t be done in a vacuum, and OSS provides greater engagement levels. Also, limited resources and a good helping of YAGNI, tempered by the feedback loops, help us not design features we don’t need. Without that feedback loop, we’d guess what users needed and guess wrong half the time.
The other thing relates to my earlier posts about always working on the most important item to the userbase and to me. The most important item to me isn’t always the most important problem to Jimmy Bob in Outer Mongolia, so getting him to dive in and make a small contribution (in terms of LOC) allows him to help solve his own problems while allowing me to concentrate on “core” tasks. It may be the only way that it gets done, and having those things tweaked makes overall products seem that much more usable.
So, if my code scanner reports 20% outside LOC contribution (on a lines changed basis), and I don’t think that’s huge, well… time to rethink that.
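For the lines-changed version of the same figure, a rough Python sketch follows. Again, this is an illustration and not lookatgit's method. The function name `outside_line_share`, the `author:` marker, and the `redhat.com` default are assumptions. The input is assumed to be the output of `git log --numstat --format=author:%ae`, where each commit prints a marker line followed by tab-separated `added`, `deleted`, `path` lines.

```python
def outside_line_share(numstat_log, home_domain="redhat.com"):
    """Return the fraction of changed lines (added + deleted) by outside authors.

    numstat_log: text in the assumed shape of
    `git log --numstat --format=author:%ae`.
    """
    inside = outside = 0
    current_is_home = False
    for line in numstat_log.splitlines():
        if line.startswith("author:"):
            # New commit: remember whether its author is in the home domain.
            email = line[len("author:"):].strip().lower()
            current_is_home = email.rpartition("@")[2] == home_domain
            continue
        parts = line.split("\t")
        # Skip blank lines and binary files, which numstat reports as "-\t-".
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        changed = int(parts[0]) + int(parts[1])
        if current_is_home:
            inside += changed
        else:
            outside += changed
    total = inside + outside
    return outside / total if total else 0.0
```

Counting added plus deleted lines is only one possible definition of "lines changed"; a large rename or reformatting commit can swamp everything else, which is part of why the raw percentage needs a human to interpret it.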
What’s the difference between a project that is 80% complete and one that is 100% complete? What is the difference between a raw amethyst or ruby and a cut and polished gemstone? (Not saying I possess any gemstones). It’s that 20%. That 20% can be HUGE and often it is coming in the form of new ideas and outside perspective. 20% is the difference between “I’d rather use X” and “this really works for me”. It’s the difference between a C and an A grade. Further, the last 20% is the hardest area of polishing — it’s the stuff it takes many different people to notice, it’s not something you can do as roughly or quickly. Thus, looking at things in terms of commit counts or LOC counts is not sufficient. Another analogy — someone building a car. Maybe 80% of the car by mass is the engine or the frame, but you can’t stop there.
The feedback loop and that attention to the odd corners are a power source. Even if the maintainer does feel like they are doing most of the work, work is being done that they could not otherwise complete, and … if tapped into the feedback loop … they have something tremendously useful, as they know exactly where their app needs to go. Plus there are those surprises that shift you in new directions you never imagined. That’s where we see the benefit.
So, thank you, regardless of your patch size, for your contributions, your testing, and your ideas. It is time we assess OSS contributions by the ways they exert influence over a project and an ecosystem, beyond the realm of just code.
I would encourage people looking for the benefit of taking an app “open source” to look not just to sweeping new code contributions, but to the ability to engage with users and be surprised by what may come when they least expect it, just by being open to those new ideas and additions. It can also be shown that even if a project is Open, if the project is not run openly, those benefits won’t come as easily.
So, when gauging your contributions, it’s not about metrics, it’s about stories. It’s “remember when ____ rewrote my web interface for me”, or “remember when ____ and ____ worked together to write that awesome LDAP plugin” or “remember when _____ contributed the email script”. Those are things we want to foster.
I think there is still a /lot/ of value to be had in learning about the size of contributions around OSS (via software like “lookatgit”), when they happen, how frequently they happen, etc. But I should resist creating metrics that make one project look better or worse, and rather create visualizations that tell that story. What I really want to learn is how it ticks, what it accomplishes, who the contributors are, and what the stories behind those contributor patterns are. It’s the butterfly wingflap theory, really.
Apologies to Fedora Planet for the long rambling. I’m trying to formulate theories in the open; other insights are welcome. For truly, if we do want OSS to spread as widely as possible, we do have to make a quantifiable case for it… somehow, and better refine those theories, and learn from the projects that do it best. THAT is the intent with the analytics side projects: to know exactly what the contributions are so we can better talk about them. The kernel is an example, but it’s the kernel, and heck … it is dysfunctional to some extent.
We need more examples and stories. I plan to turn my attention to more external-to-Fedora examples next to see how what-I-know compares to them.
Diax’s Rake from Anathem — Never believe a thing simply because you want it to be true.
I’m working on getting ‘lookatgit’ running, under Fedora 11 currently, and seem to have made progress without actually getting a run to finish.
My instructions are here:
https://fedoraproject.org/wiki/How_to_use_lookatgit
When I use those instructions to install then do ‘make compile test’, after doing ‘cd ~/cg/lookat’, it runs for a few minutes, compiles, and finishes with no noticeable output.
Figuring I needed a git repo to actually check, I cloned ‘cobbler’, made a Makefile entry, and ran it against the repo: ‘make compile cobbler’. I know I didn’t have to recompile most likely, but I did it anyway because ‘make cobbler’ just ran without stopping.
Now it’s been running for over an hour, and the output looks like this:
make compile cobbler
/usr/bin/scalac -classpath . -sourcepath . *.scala
warning: there were deprecation warnings; re-run with -deprecation for details
one warning found
/usr/bin/scala -classpath . App ~/cg/cobbler
What is missing in my documentation or ideas?
Note that I’m using the scala package in Fedora:
rpm -q scala
scala-2.7.4-5.fc11.noarch
The current upstream is 2.7.7. I’ll rework and run against the upstream tarball if this run doesn’t work …