Today I’ve reworked LookAtGit which was a project I originally implemented because (A) I was really bored, and (B) I wanted to learn Scala.
The new “v2” is done in Ruby, which is nice because the regexes there don’t make me want to hurt someone, and Ruby is generally awesome. After a lot of corporate Python jobs, I’d forgotten how much I missed it. Ruby is fun.
So, lookatgit… I’ve also optimized it a HUGE amount since last time. For one, I found “git log --shortstat”, so it has to execute far fewer commands and can also skip binary commits.
The rewrite doesn’t quite yet offer some of the statistics-oriented statistics (SOS) that the previous version offered, but it does offer some additional reports and arbitrary field sorting, and it will make it easy to add many more reports in the future.
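To give a feel for why this helps, here is a rough sketch of how the output of a single “git log --shortstat” run might be tallied per author in Ruby. This is not the actual lookatgit code: the “@@” author marker, the tally_shortstat name, and the hash layout are assumptions made purely for illustration.

```ruby
# Hypothetical marker that prefixes each author line, as produced by e.g.:
#   git log --shortstat --pretty=format:'@@%an <%ae>'
AUTHOR_MARKER = '@@'

# Matches summary lines such as
#   " 3 files changed, 10 insertions(+), 2 deletions(-)"
# where either the insertions or the deletions part may be absent.
SHORTSTAT = /(\d+) files? changed(?:, (\d+) insertions?\(\+\))?(?:, (\d+) deletions?\(-\))?/

# Walks the log text once and accumulates per-author totals.
def tally_shortstat(log_text)
  totals = Hash.new { |h, k| h[k] = { added: 0, removed: 0, commits: 0 } }
  author = nil
  log_text.each_line do |line|
    line = line.chomp
    if line.start_with?(AUTHOR_MARKER)
      # A new commit begins; remember whose it is.
      author = line.delete_prefix(AUTHOR_MARKER)
      totals[author][:commits] += 1
    elsif author && (m = SHORTSTAT.match(line))
      # Missing groups are nil, and nil.to_i is 0.
      totals[author][:added]   += m[2].to_i
      totals[author][:removed] += m[3].to_i
    end
  end
  totals
end
```

In practice the text could come from something like `git log --shortstat --pretty=format:'@@%an <%ae>'` run once against the repo, so the whole history is read with one command instead of one command per commit.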
To get started, check out the code from GitHub and read the README file in the “v2” directory.
You can see from those instructions how to do simple things like see the top 50 most active files in a repo, or the top 100 contributors sorted by arbitrary statistics.
Here’s an example from Spacewalk. Spacewalk is a HUGE repo, and while I’m limiting the length of the output below, the report time is spent in the calculations, rather than the output, so with a reasonable machine it will only take a minute to scan the repo.
mdehaan@snowball:~/code/lookatgit/v2$ time ruby lookatgit.rb -r ~/code/spacewalk/ --limit 10 -T -F --header --verbose
scanning...
processing 9534 commits...
generating report...
--------------------------------------------------
TOP CONTRIBUTORS REPORT
name,lines_changed,lines_added,lines_removed,commit_ct
--------------------------------------------------
Miroslav Suchý <msuchy@redhat.com>,1587958,143182,1444776,1650
Jan Pazdziora <jpazdziora@redhat.com>,183227,121126,62101,1428
Michael Mraka <michael.mraka@redhat.com>,196978,94669,102309,781
Devan Goodwin <dgoodwin@redhat.com>,146951,44522,102429,611
Justin Sherrill <jsherril@redhat.com>,1398046,763492,634554,587
Pradeep Kilambi <pkilambi@redhat.com>,726023,361122,364901,476
Mike McCune <mmccune@gmail.com>,39470,27793,11677,447
jesus m. rodriguez <jesusr@redhat.com>,392192,112554,279638,430
Partha Aji <paji@redhat.com>,49294,29033,20261,406
Milan Zazrivec <mzazrivec@redhat.com>,42101,22507,19594,375
------------------------------------------
TOP FILES REPORT
filename,lines_changed,change_ct,author_ct,commit_ct
------------------------------------------
java/spacewalk-java.spec,2718,246,17,246
backend/spacewalk-backend.spec,2420,235,14,235
java/code/src/com/redhat/rhn/frontend/strings/jsp/StringResource_en_US.xml,23742,202,19,202
java/code/webapp/WEB-INF/struts-config.xml,8040,147,18,147
rel-eng/packages/spacewalk-java,291,146,13,146
web/spacewalk-web.spec,1739,139,11,139
java/code/src/com/redhat/rhn/frontend/strings/java/StringResource_en_US.xml,8999,128,14,128
schema/spacewalk/spacewalk-schema.spec,726,121,10,121
rel-eng/packages/spacewalk-backend,237,119,11,119
proxy/installer/spacewalk-proxy-installer.spec,533,99,7,99

real 1m1.190s
user 0m6.912s
sys 0m0.340s
The next step is to build in those “statistics-oriented statistics” and enhance the query capabilities. For instance, I’d like to be able to generate a report on the standard deviation of the times between commits, to show which developers on a given project are slacking off. Similarly, I’d like to generate aggregate statistics on a project so I can show that Project X contributors typically commit changes with certain distribution patterns, which may or may not be revealing.
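As a minimal sketch of what such a report could compute, the Ruby below takes one author’s commit timestamps and returns the mean and standard deviation of the gaps between consecutive commits. None of this exists in lookatgit yet: the commit_gap_stats name is hypothetical, and it assumes the timestamps arrive as strict ISO 8601 strings, such as those printed by git’s %aI format placeholder.

```ruby
require 'time'

# Mean and population standard deviation, in seconds, of the gaps
# between consecutive commits. Returns nil with fewer than two commits,
# since there is no gap to measure.
def commit_gap_stats(timestamps)
  times = timestamps.map { |t| Time.iso8601(t) }.sort
  return nil if times.size < 2

  # Subtracting two Time objects yields the difference in seconds (Float).
  gaps = times.each_cons(2).map { |earlier, later| later - earlier }
  mean = gaps.sum / gaps.size
  variance = gaps.sum { |g| (g - mean)**2 } / gaps.size
  { mean: mean, stddev: Math.sqrt(variance) }
end
```

A small standard deviation would suggest a steady committer, while a large one would suggest bursts of activity separated by long quiet stretches. Whether that really means anyone is slacking off is another matter.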
Contributors are very welcome. I currently do not have a project list, but if enough folks are interested we can get this going. It is not so much about what it can generate now but what we can generate in the future.
Very cool. Would be fun to put a front end on it, or make it a web service that analyzes GitHub repos.
I’m not working on (or maintaining) this and probably won’t in the near future, so feel free to fork/copy and do that. It would indeed be nice. I’d like GitHub to have something like this built in.
Nice work.
I like the author summary.
I see only one small flaw in it: some authors are split across separate entries:
User Name ,106936,90270,16666,1459,8.14,1.22
User Name ,3009,2025,984,28,1781.21,64.44
Git’s shortlog summary uses the .mailmap file in the root of the working tree and displays the merged result (commit count only):
git shortlog -s
1487 User Name
github pull requests accepted.