On Python Vs Ruby

Posted: June 15, 2011 in Uncategorized

Lately I’ve had some interesting random conversations with lots of folks who do a fair amount of hiring: recruiters, HR folks, etc. Basically the analysis was that there’s really not much of a Triangle/RTP job market for Ruby outside of web site development. There is some, of course, as I know some friends who are doing it, but it is still very small. Meanwhile, I’ve had 2 almost-exclusively-Python jobs here and am about to have a 3rd. I’ve come to the conclusion that the Ruby crowd (via ycombinator, reddit, etc) is perhaps just exceedingly vocal. Granted, there are a lot of serious projects in Ruby as well (Puppet & Chef, obviously), and there’s nothing wrong with picking something new and what you like best. Here’s my take, after doing a lot of Python, and after various (unsuccessful) attempts by Ruby to seduce me long term:

(Aside: we’re still a huge Java/.NET market here, so I’m glad it’s still possible to work in something else, and for the same reason, I’m really glad the alternative-JVM language space is heating up).

Things I like about Python:

* lots of traction in enterprise computing (Canonical, RH, Zenoss, OpenStack, Google, etc…)
* a large number of modules available for a wide variety of tasks, many already packaged in your Linux distros
* first-class functions without having to wrap them in a Proc
* less syntax, indentation system, and ability to leave off trailing “ends”
* code written the first time is often the final version and doesn’t have to be rewritten
* very readable to non-Python developers, so it’s easy to encourage them to pick up the language to patch code (well proven!)
* indentation system allows things to be more skimmable versus code on same lines, etc
* list comprehensions
* in the web space, Django’s built in admin views are awesome, and it’s very stable/compatible
* no trailing if/unless (I sometimes like it when I do it, but often not when other people do it, so that means I shouldn’t like it)
* feels minimal
* nice C API
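
A couple of the items above are easier to show than to describe. This is a minimal sketch of first-class functions and list comprehensions; the names `shout` and `apply_to_all` are made up for the illustration:

```python
def shout(text):
    """Return the text upper-cased with an exclamation mark."""
    return text.upper() + "!"


def apply_to_all(func, items):
    # func is passed around like any other value; no Proc-style wrapper is needed.
    return [func(item) for item in items]


words = ["spam", "eggs", "ham"]

# Pass the function itself as an argument.
loud = apply_to_all(shout, words)

# A list comprehension that filters and collects in one readable line.
short = [w for w in words if len(w) <= 3]
```

Here `loud` holds the shouted words and `short` holds only the words of three letters or fewer.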

Things I don’t like about Python:

* the regex module (though I more or less agree it’s best to not use regexes when you can) is a little weird
* eggs and easy_install
* raises exceptions on error conditions versus returning None a little too much (hash['foo'] vs hash.get('foo', None)), which makes code more complicated
* docstring syntax versus just commenting with “#” (minor)
* occasional inconsistent core library usage of CamelCase and with_underscores
* having to manually specify self (minor)
* map/lambda syntax is a little crippled, but I’m starting to agree with Guido about list comprehensions being better and that named functions are important
* a few functions that operate on duck typed things versus being methods (len, etc)
* somewhat confusing jumps between views/iterators and things that return lists between various versions of things
* unicode has been annoying in the past (basestring, etc)
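
To make the exceptions-versus-None complaint concrete, here is a small sketch of the two lookup styles on a plain dict. The `settings` dict and the default port are invented for the example:

```python
settings = {"host": "localhost"}

# Indexing a missing key raises KeyError, so the caller needs a try/except.
try:
    port = settings["port"]
except KeyError:
    port = None

# dict.get returns None (or a supplied default) instead of raising.
port_via_get = settings.get("port")
port_with_default = settings.get("port", 8080)
```

Both styles end up in the same place; the second just takes fewer lines when a missing key is not really an error.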

Things I like about Ruby:

* block syntax can be fun
* has the regex module from Perl, more or less
* built in classes like Array have lots of useful methods
* can often be more concise in terms of chaining functions (which can also be bad)
* rspec has some clever capabilities

Things I don’t like about Ruby:

* “module of the week” mentality… 7 different libraries for Markdown conversion, Passenger vs Unicorn vs Mongrel vs…
* Proc to get the equivalent of first class functions
* very web development centric culture, almost one and the same with Rails, if you’re not a web developer
* culture can get too obsessed with how pretty the code looks versus larger problems
* tendency for modules/tools to monkey-patch other modules, which is reckless
* gems
* lack of namespaces
* confusing changes between versions
* foo! and foo? syntax: it’s hard to remember which modules have it and which don’t
* way too much syntax in general
* attrs and implicit behaviors thereof, I just don’t like them
* trailing () on functions are optional, making it hard to tell what’s a function and what’s an attribute
* typing out “end” reminds me of basic
* compiler error messages for syntax errors are often poor, and get confused about which line something is on
* If you’ve seen Perl’s MooseX::Declare, and write to best practices, it’s not much more than Perl.
* “:” to get symbols
* implicit returns

Ultimately we all have to use eleventy billion different languages, and can’t always work in the ones we like. Nothing’s perfect, though some things fit our brains more than others. Anyway, I tend to see a lot of posts from the Ruby crowd, and I think the Python crowd is more quiet… probably because we’re working on different things (in the Rails case, I mean, less of it is about web applications), or maybe because we just like to think differently; my first real job was a lot of C. Often apps have to support things longer, and are less interested in change. It might take a bit more to excite us. The likelihood is it’s the application that’s interesting, not the code behind it, while I think the Ruby crowd is very enamored with the code layer. I’m interested in what I can build out of it, and how easy to maintain that is. I want it to be very easy to work on, but when I’m away from something, I want to think about the capabilities, not the lines of source. So, to my brain anyway, the tool that allows me to write something high-level that’s exceptionally readable several years later is the one I want, even if it doesn’t tend to encourage exciting code manipulations. Individual lines of code and how clever I can make a map/inject/block-thingy… not so much. It also seems with Python there’s a bit more of a “to the metal” feel of things, namely if you want to get inside the classes at a very low level, things like metaclasses allow for a nice level of exposure into the internals… but only if you do need them. That’s pretty cool, I think.

Python might be nicer if it had blocks, but I think in 95% of the cases, “with” and “list comprehensions” are good enough, and that keeps things simpler. I also mostly find my blocks usage is usually limited to each (which list comprehensions handle) and the occasional transaction like call (which “with” handles), so I should quit letting myself be seduced by them. Python lets me maintain the whole app easier, IMHO, and if I’ve got one function or two that is not entirely beautiful at the source level, that’s not a major concern, because most of it is… and it’s obfuscation resistant, making it work much better in group contexts.
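
As a rough illustration of that 95% claim, here is a sketch of the two cases: an "each"-style loop written as a list comprehension, and a transaction-like call written with "with". The `transaction` helper and the `log` list are hypothetical, standing in for whatever a real database layer would provide:

```python
from contextlib import contextmanager

log = []  # records what the fake transaction did, for illustration only


@contextmanager
def transaction(name):
    # Setup runs before the body of the "with" block.
    log.append("begin " + name)
    try:
        yield
        # Reached only if the body finished without raising.
        log.append("commit " + name)
    except Exception:
        log.append("rollback " + name)
        raise


with transaction("update"):
    # The "each"-style block becomes a list comprehension.
    squares = [n * n for n in range(5)]
```

The body of the "with" statement plays the role a Ruby block would, and the commit/rollback bookkeeping stays in one named place.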

I like to use GraphViz a lot to visualize unknown relationships in things. The problem is, most of the time, large graphs do not lay out well. The most common directed graph layout engine “dot”, tends to become very horizontal when there are a lot of items at the same “level” of relationship with other objects.

The answer is to use “twopi” instead of dot.

Also, use some “twopi” specific configuration options in your graphviz DSL:

digraph foo {
    overlap=false;
    splines=true;
    nodea;
    nodeb;
    nodec;
    nodea -> nodeb;
    nodea -> nodec; 
}

The example above is not large enough to need twopi, but this shows how the directives are added.

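overlap=false will ensure, somewhat obviously, that nodes do not overlap. splines=true will allow lines to curve to try to overlap less.
twopi is a radial layout engine: it places nodes on concentric circles around a root node, so items at the same “level” spread around a ring instead of along one long row. (The “spring model” engines in GraphViz are neato and fdp.)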
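
If the graph comes out of a program rather than being typed by hand, the same directives can be emitted from code. A minimal sketch in Python follows; `to_dot` is a hypothetical helper, and it assumes node names are already valid GraphViz identifiers:

```python
def to_dot(name, edges, overlap=False, splines=True):
    """Build GraphViz DOT text for a directed graph given (source, target) pairs."""
    # Collect every node mentioned in an edge, in a stable order.
    nodes = sorted({node for edge in edges for node in edge})
    lines = ["digraph %s {" % name]
    # GraphViz expects lowercase true/false for these attributes.
    lines.append("    overlap=%s;" % str(overlap).lower())
    lines.append("    splines=%s;" % str(splines).lower())
    lines.extend("    %s;" % node for node in nodes)
    lines.extend("    %s -> %s;" % (src, dst) for src, dst in edges)
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot("foo", [("nodea", "nodeb"), ("nodea", "nodec")])
```

Writing `dot_text` to a file such as foo.dot and running `twopi -Tpng foo.dot -o foo.png` renders it with the radial layout.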

The only downside I can really see to “twopi” over “dot” is that clusters don’t work… if they did, it would be very compelling.

ORMs (like ActiveRecord, sqlalchemy, or Django) typically provide nice hops across objects in related tables in one of two ways.

(A) They ask you to define it (ManyToMany, OneToOne, etc)
(B) They scan the database and infer it according to conventions.

Sometimes these cause extra lookups (which really isn’t a problem most of the time, and this one has more caching skills than most), though if your ORM is smart, they don’t. I wrote one that originally caused extra lookups. So, how to fix this in a legacy database where the metadata isn’t there, and foreign keys aren’t even there in most cases? Where you don’t have a good metaclass system that describes relationships? I might want a function that gets back a Foo and its associated Bars in one place, and a Foo and its associated Splats in another place.

Friday, I came up with a syntax for my data-layer that almost approaches map-reduce for SQL databases. I’m not particularly proud of even having this problem in that it would have been better if I *had* that metadata, but I didn’t. The result, I think, is somewhat interesting. As with some previous posts, this is Perl. This will be my last Perl blog post as I’m going to be back in the Python game very very shortly (w00t). Here’s the syntax, which should be comprehensible even if Perl’s not your thing:

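 my @foos = Elevator::Util::Joiner->join(
        # alias => class pairs, one per table taking part in the join
        unite => [
             [ 'foo' => 'Acme::SqlFoo' ],
             [ 'bar' => 'Acme::SqlBar' ]
        ],
        # the join condition, plus any other filters
        where => {
              'foo.bar' => 'bar.id',
        },
        # receives the joined objects and returns the one object to keep
        stitching  => sub {
             my $objects = shift();
             my $foo = $objects->[0];
             my $bar = $objects->[1];
             $foo->bar($bar);
             return $foo;
        }
 );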

What we have above in the “unite” is a list of two classes, and alias names assigned to each. This defines two tables defined in a more or less arbitrary join operation. The “where” clause at minimum must specify the join, but could also include other conditionals to select only parts of the data. The query here is done using the Perl SQL::Abstract module. What I think is *perhaps* innovative here is the stitching function, which from the join results on the two (or any number thereof) of tables, returns a list of only ONE type of object, with the associated objects “stitched in”. This is somewhat analogous (or at least it feels that way to me) to a map-reduce type operation, and we could support the notion of returning undef/null to limit the return of the datastructure from the result set. The code is free to act on the result set in any arbitrary way it wants.
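
For readers who think better in Python, the idea can be sketched roughly as follows. This is only an illustration: it is an in-memory nested-loop join rather than generated SQL, and the names `Foo`, `Bar`, `join`, and `stitch` are hypothetical stand-ins, not part of the Joiner class:

```python
class Foo:
    def __init__(self, id, bar_id):
        self.id = id
        self.bar_id = bar_id
        self.bar = None  # filled in by the stitching function


class Bar:
    def __init__(self, id, name):
        self.id = id
        self.name = name


def join(left, right, where, stitching):
    """Pair up rows that satisfy `where`, then let `stitching` pick what to return."""
    results = []
    for a in left:
        for b in right:
            if where(a, b):
                stitched = stitching([a, b])
                # Returning None drops the row from the result set.
                if stitched is not None:
                    results.append(stitched)
    return results


def stitch(objects):
    foo, bar = objects
    foo.bar = bar  # attach the related object
    return foo     # only ONE type of object comes back


foos = [Foo(1, 10), Foo(2, 20), Foo(3, 99)]
bars = [Bar(10, "ten"), Bar(20, "twenty")]

joined = join(foos, bars,
              where=lambda foo, bar: foo.bar_id == bar.id,
              stitching=stitch)
```

The caller gets back a list of Foo objects, each with its Bar already attached, and the Foo with no matching Bar simply does not appear.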

For those wishing to see the implementation, it contains some notable unpolished evils, namely a workaround for Perl’s DBI database accesses not returning column names properly — so it’s a little hacked up, but you can see it here on github. Obviously this is much less eloquent than something like how Django can express things, but it is also generic and pretty extensible. It did all that WITHOUT having the metadata system there.

One-to-many relationships are a little rough right now. If you had an A with 10 B’s, you’d get back 10 instances of the same A. So what we would need to do is build a function that takes the results of the “go” operation and consolidates them back down. So we have a “stitching” function, but instead of returning one type of thing, it returns lists. And then we have a reduce function that takes this data (with redundancies) and converts the multiple items into singles.
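
That consolidation step might look something like this sketch. It is not code from the Join class; `consolidate` is a hypothetical helper that assumes the stitching step hands back (parent, child) pairs and that the parent can be identified by some key:

```python
from collections import OrderedDict


def consolidate(pairs, key):
    """Collapse repeated parents from a one-to-many join into (parent, [children])."""
    grouped = OrderedDict()  # keeps parents in first-seen order
    for parent, child in pairs:
        k = key(parent)
        if k not in grouped:
            grouped[k] = (parent, [])
        grouped[k][1].append(child)
    return list(grouped.values())


# Strings stand in for real objects: a1 has two children, a2 has one.
rows = [("a1", "b1"), ("a1", "b2"), ("a2", "b3")]
result = consolidate(rows, key=lambda parent: parent)
```

With real objects the key would more likely be a primary key such as `lambda a: a.id`.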

Friends don’t let friends write their own model framework.

github link to Join class

Su-su-studio

Posted: May 28, 2011 in Uncategorized

Here’s a picture of my electronic music rig, in what I think so far is its best working configuration.



The Elektron Machinedrum (left center) provides MIDI sequencing, as well as having two configured outputs for its built-in synths. Output 1 goes into an OTO Biscuit (left of Machinedrum), and Output 2 goes straight into the mixer, which is just a cheap analog Yamaha. There’s an unused third output reserved for future abuse/enjoyment. In analog synth territory, there’s a Moog Voyager (in the background) and a Future Retro XS (center, right of Machinedrum). Everything is mostly hooked up in mono, except for the Blofeld (bottom left). The Waldorf Blofeld is connected directly into an Eventide Space (bottom right), providing background pads and such, while leaving other tracks basically reverb-less. There’s also room left for a Vermona Kick Lancet, which is on the way :)

I’m almost out of table space, so this setup is more or less complete, barring maybe a few more effects. There’s also a lamp!

So it turned out my Intel laptop wasn’t still running Anticlimatic Anteater after all and was instead running Fedora 13. Ok, cool, I said. I decided to pre-upgrade it to Fedora 15 (beta) and am… well… soon to be switching back to a release involving Fuzzy Animals again as my dom0 and desktop environment.

Preupgrade nicely smashed my SELinux install, so that I had to boot with selinux=0 (minor problem) and also left invalid entries in my /etc/yum.repos.d (super minor problem). Though you would expect this to more or less work.

However, Gnome Shell — what happened there? Trainwreck.

Despite the nice skinning of the bar at the top, Gnome shell is pretty rough. There’s no way to switch between applications using that bar at the top — you have to click “activities” to change to a different view. Windows have no visible maximize or minimize button, just a close “X”, and Firefox comes up unminimized by default, wasting valuable screen real estate. To maximize it, you must right click the grey area of the screen (for which there is far too much wasted grey area), and click “maximize”. To fire up a terminal, I have to press activities with the mouse, move my mouse to the search bar, and type “terminal”, or add it to the “activities” favorites. This is more or less a “start menu”, but it’s only accessible when I leave my current Window view.

Rather than being similar to a minimal tiled window manager (which Ubuntu’s netbook distro was so good at), Gnome shell in Fedora 15 doesn’t go minimal enough, but still hides features developers want, and actually takes away so many features that it becomes hard to pitch to less technical users.

I think the answer here is (A) UX research, (B) leadership, and (C) to realize and cater to the core audience — which I would say, for Fedora, is people who want to try out packages before they make themselves available in CentOS, RHEL, and Scientific Linux. AKA, developers. Ultimately though, a lot of those folks, like me, don’t really need the desktop experience at all — we’re coding for the browser and the server. And the reason I’m really not using a Mac as a VM host is my laptop isn’t a Mac. That’s kind of sad, but it’s true.

In the era of the diminishing need of the desktop, a nice way to jump between my browser, email, and terminal is still pretty important to me.

I would love it for Linux to go after Apple and make some really compelling environments — there’s huge opportunity there. Though I don’t think that is to be done by taking away buttons and features. You have to keep that developer audience as those are the folks who will be working on your desktop.

Amococo in Raleigh

Posted: May 20, 2011 in Uncategorized

In case this got missed on Twitter, if you live around Raleigh, check out Amococo this Friday/weekend by Architects of Air. Details here. I think I first noticed one of these things on Flickr 4+ years ago. I need to live in one.

You will need $5 and socks. Click the thumbnail for some of my pictures.

About that Scaling Thing

Posted: May 14, 2011 in Uncategorized

Not that I should give Red Hat Commercial Management Software any free advice about product strategy, but I might as well defend my family honor a bit since I know some folks who go to Summit still read this blog. Topic: Satellite’s dropping Cobbler. I probably wouldn’t have found the above linked presentation but someone mentioned it to me and I found it a bit amusing. To be honest, I don’t care what people choose to build their own software out of, but I do care about the way it was brought up, as those considerations may confuse existing users by spreading somewhat inaccurate data. In this particular case, the implication is that I wrote something that “didn’t scale”, despite the fact that I know it’s running some rather important stuff. Like banks and stock trading. And the space program. Ok, not much of that last part. But some. Anyone is free to drop anything, anytime they like, but if they talk about it in public, they need to make sure what they are saying is true.

  • Let’s not spread FUD about Cobbler not scaling. If you care about Cobbler scaling, you could switch to the CouchDB backend. It’s already there, and I wrote it during a single meeting. Or pick Riak or Mongo. You could build a backend in a few hours — the Mongo API is awesome. Riak is altogether awesome. Not even using these backends, I know of a couple of installs with 20k+ nodes. There’s really no reason to have a relational datastore when NoSQL will do the job just fine, no DBA required. The reason I never wrote a database backend for Cobbler was simple — it didn’t need it, and it would probably be slower.
  • Using the JSON backend has some advantages with git (reverting configs!), and since everything goes over XMLRPC to the central daemon, there is no startup cost in present versions.
  • IIRC, command line execution time speed has been solved since Cobbler 1.6 (or was it 1.8? Too long now). — It’s not their fault, but satellite was often downlevel.
  • Want cross platform? Cobbler is already supported by SuSE and Ubuntu and has even helped in some SuSE to RedHat migrations (and could work in the inverse!). I had wanted Red Hat to do an offering something like Ubuntu Orchestra, but it unfortunately couldn’t be made to happen.
  • Multi-subnet support and replication are already present, and there are some rather compelling examples of this freely available on the internet, covering, say, good parts of Europe.
  • Cobbler has puppet integration. It’s just that Satellite never really surfaced or integrated with it. Granted, it doesn’t push this integration on you, because you may have another CMS. If you don’t have one, though, pick Puppet (it’s my favorite), though I’d consider running serverless.
  • Is the new XYZ “on steroids”? Doubtful. Most of my professional experience with the other tool was from end-users telling me it was unreliable. Steroids are bad for you.
  • If someone is switching, make sure they can integrate in as many ways and do the same things. Cobbler is not unlike a modular synthesizer: there are lots of points to plug in without changing the code, not just the API. It can drive or be driven. There’s over 5 years of history from hundreds of ace admins in there.
  • As an OSS project, any features seen in a competing project that you want can be added to a new project without forcing a migration on users.

What Satellite was basically suffering from was a suboptimal Cobbler integration, which only surfaced about 25% of the features, was somewhat buggy in the adapter layers, and tried to wedge it into an abstraction that wasn’t appropriate. Did that not scale? Sure. But let’s not say Cobbler didn’t scale because Satellite’s API integration with Cobbler was not what it was with, say, other people who did the same kinds of integrations in-house. This is no knock at the developers; it’s just true, and they had limited time and had to contend with a legacy application. I know how that feels. It’s ok to be excited about a future direction for any product, but it’s also important to not include things in marketing slides that aren’t true, or in this case, aren’t new, where they may mislead customers and make them think they need to switch technology because they aren’t using some new hotness. In those cases, there are *quite* simple ways to tune things if you’re concerned about scaling or need to do multi-subnet management. Existing users of the stand alone project, if they are happy, should realize they can keep using the open source project directly, if they so choose. It’s also easy to export your cobbler config and drop the commercial GUI portion if you wanted to keep using Cobbler. There’s so much community effort in projects like that, no one owns it. As for what product is best for them, the original or the (what I view as a) NIH project, they can decide.

It’s different.

Backstory: I’ve had a long love-hate affair with test modules. Python offered many ok options (nosetests), but the syntax was annoying, and the comparison tools weren’t just right. RSpec had a lot of weird things in it but also had some niceness, yet a bit too much automagic. Perl was really in the worst shape, in my opinion, as it had things like Test::More, Test::Deep, Test::Harness, and OmgWeHadABabyEatsABoy, but the output was always insufficient for my tastes. Let’s not talk about JUnit. Despite writing decent OO code in Perl (thank you Moose and MooseX::Declare), the tests started ending up too procedural, with lots of boilerplate, and the lack of using classes in defining the tests did not tend to encourage inheritance. I wanted a good Moose based test module but didn’t really find what I wanted. I also didn’t like the comparison functions and the way I had to label test sections; I really just want to return a success/failure code most of the time. Why do I need to write so much boilerplate? I don’t really care about interop between Perl test modules, so why do I even need Test::Harness at all? I don’t. This finally made me want to pound something out to make something better, so I did. It’s evolving. It’s still young yet, but it’s usable as a starting point right now.

Anyway, it’s this.

In short, Procrustes (hey, it starts with a “P”, just like Perl… win) allows writing test code that looks something like this:

$self->describe("Another test suite...", [

        "Non zero return code should fail", sub {
            return 2 == 3;
        },

        "This will tell you why it is wrong", sub {
            return $self->is_equal(2,3);
        },

        "Inequality test will pass", sub {
            return $self->is_equal(2,3,0);
        },

        "This should work but takes 2 secs", sub {
            sleep(2);
            return 1;
        },

        "This will cause a traceback", sub {
            die "kaboom!\n"
        },

    ]);

And you get some awesome log summaries of what you test, complete with reports at the end of which tests had failures, and how many. Along the way, tracebacks are shown and kept sane — Perl doesn’t do tracebacks by default, so that’s awesome.

It’s designed to (A) make tests feel a little more natural to write, and (B) make test output really good so if you weld your tests into a Continuous Integration system like Jenkins or Cruise Control, the reports you get are really exactly what you want to see in the output. Mostly the latter.
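
To show how little machinery the idea needs, here is a rough Python sketch of the same shape: labelled callables that return true or false, a traceback on exceptions, and a summary at the end. It is not the Procrustes implementation; `describe` and `is_equal` here are hypothetical look-alikes, and the meaning of the third `is_equal` argument (whether equality is expected) is an assumption based on the example above:

```python
import traceback


def is_equal(a, b, expected=True):
    # Passes when the comparison outcome matches what the test expected.
    return (a == b) == bool(expected)


def describe(suite_name, cases):
    """Run (label, callable) pairs, print a report, and return the failed labels."""
    failures = []
    print("Suite: " + suite_name)
    for label, test in cases:
        try:
            passed = bool(test())
        except Exception:
            # Show the traceback but keep running the remaining tests.
            traceback.print_exc()
            passed = False
        print("  [%s] %s" % ("ok" if passed else "FAIL", label))
        if not passed:
            failures.append(label)
    print("%d of %d failed" % (len(failures), len(cases)))
    return failures


def explode():
    raise RuntimeError("kaboom!")


failed = describe("Another test suite...", [
    ("False return value should fail", lambda: 2 == 3),
    ("Inequality test will pass", lambda: is_equal(2, 3, False)),
    ("This will cause a traceback", explode),
])
```

Returning the list of failed labels is what makes the end-of-run report easy to hand to a CI system.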

A Cucumber-like side piece may be coming. Maybe. Perl’s a bit of a bitch to extend to make stuff look all super-elegant, so there’s a bit more syntactic cruft than I would like… but then again, I think it’s reasonably solid too.

See github, patches and ideas welcome. I hope you like the syntax and minimalism. If not, I believe we can adjust your measurements.

Keeping OO Functional

Posted: May 3, 2011 in Uncategorized

Let’s Do Less Work

A few days ago I posted that converting between datastructures felt like it was *most* of what programs do, and, to me, functional code appears to have to do a lot LESS of this.

I like writing code that works on *lists* in cases where I can get away with it. When I look over various programs there’s just tons of code that deals with converting between this datastructure and that, looping over this and that, rather than defining these logical pipelines that just say what something does. Way too much work is done.

Hypothesis

(1) in languages that offer them, grep and map are super-incredibly powerful because things that work on lists work on things that work on lists work on things that…
(2) in order to maximally use grep and map, the preferred data structure for everything is a list.
(3) while 10 years ago I thought nested hashrefs were awesome, they are not so awesome, because grep and map are more awesome
(4) using hashrefs for sets is ok, because you can extract keys/values in the hash, but looping over hashes kind of sucks

An Example

Assume the following code, which returns what tribbles are of a certain color, given a list of tribble objects. For purposes of not making it too easy, this is Perl, except I’ve added in the “method” keyword from Moose. Pretend it’s pseudocode or something:

    method tribbles_by_color($tribbles) {
            my $retval = {};
            foreach my $tribble (@$tribbles) {
                  # start an empty list the first time a color is seen
                  $retval->{$tribble->color()} = [] unless defined $retval->{$tribble->color()};
                  push @{ $retval->{$tribble->color()} }, $tribble;
            }
            return $retval;
    }

    method tribbles_for_color($tribbles, $color) {
            my $by_color = $self->tribbles_by_color($tribbles);
            return $by_color->{$color} || [];
    }

This is really just

    # each helper takes and returns an array reference, so the calls nest
    method tribbles_by_color($tribbles, $color) {
          return [ grep { $_->color() eq $color } @$tribbles ];
    }

That’s grep. Similarly, we should be able to do things like:

     method happy_tribbles($tribbles) {
           return [ grep { $_->is_happy() } @$tribbles ];
     }

     # this could be more OO but it's just a simple blog example
     my $blue_happy_tribbles = tribbles_by_color(happy_tribbles($tribbles), 'blue');

What about the birthdays of blue happy tribbles?

     method birthdays_of_tribbles($tribbles) {
           return [ map { $_->birthdays() } @$tribbles ];
     }
     my $bbht = birthdays_of_tribbles(tribbles_by_color(happy_tribbles($tribbles), 'blue'));

What about the count of Blue Happy Tribbles who have Birthdays in June?

    scalar grep { $_->month() == 6 } @$bbht;

What’s cool there is the methods stack in any order and can be used in any number of ways. They allow future unforeseen usage. Assume instead you wrote a method that partitioned the list of tribbles into a hash that was by date, and another by birthday month, and another by whether they were happy or not. Now the code to find out the intersection of these sets would need to loop over and consult all of these intermediate data structures. That’s a lot of work! You’d probably end up writing 5-10x as much code.
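
The same pipeline carries over almost word for word to Python, where list comprehensions play the role of grep and map. This is a sketch with made-up data; the `Tribble` tuple and its `birth_month` field are assumptions for the example, simplifying the birthday objects above to a month number:

```python
from collections import namedtuple

Tribble = namedtuple("Tribble", ["name", "color", "happy", "birth_month"])


def tribbles_by_color(tribbles, color):
    return [t for t in tribbles if t.color == color]   # grep


def happy_tribbles(tribbles):
    return [t for t in tribbles if t.happy]            # grep


def birth_months(tribbles):
    return [t.birth_month for t in tribbles]           # map


tribbles = [
    Tribble("a", "blue", True, 6),
    Tribble("b", "blue", False, 6),
    Tribble("c", "red", True, 6),
    Tribble("d", "blue", True, 2),
]

# The filters take a list and return a list, so they stack in either order.
blue_happy = tribbles_by_color(happy_tribbles(tribbles), "blue")
june_count = len([m for m in birth_months(blue_happy) if m == 6])
```

Swapping the order of the two filters gives the same list, which is the "stack in any order" property in action.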

With the grep/map approach, despite having something object oriented, your code ends up looking a little like lisp. A lot like Lisp. Paul Graham tells me this is awesome. However, I believe a lot of really uber-short functions is a great thing, and that’s really my end goal. If I can read a line of something, I can tell it’s right… as a method grows, that’s less likely the case.

Evidence

Our proof that this is a good approach is basically:

* Happy tribbles was a one line method
* Tribbles by color was a one line method
* Happy tribbles by color was a one line method/call.
* So was the birthdays of the happy blue tribbles
* So was counting the happy blue tribbles born in June.

Hashes Considered…

Thus, where lies my opinion of the simple hash?

A hash is a great way to store a set of key/value pairs in an array element. Otherwise, they are lightweight objects for when you do not have objects.
Hash interchange, or worse, nested hash interchange (or nested hashes/arrays/etc) is always difficult because you have to think about what the datastructure is, and convert between it, and there are numerous errors to make. Or the code becomes non-obvious. On the other hand code that is largely based on map/grep is often full of one liners…and at least in practice, this kind of code just doesn’t have as many bugs. Code review becomes super-simple. Bugs are mostly proportional to the number of lines of code you write.

And if you’re working in a language where the syntax is a source of guess-and-check frustration (such as Perl, as compared to something more terse, like Ruby or Python), having less syntax to type means you can code faster because you’re checking for syntax errors less.

Once you mix in things like list libraries that allow functions like “max” or “unique”, you have even less code to write. You should never have to write code that loops over a list and finds the maximum of something. I probably have done that fifty zillion times — I think fifty zillion is enough. Let me just call:

    use List::Util qw(max);
    my $highest_value = max map { $_->foo() } @$list;
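
For comparison, Python’s built-ins cover both flavors of this without any loop: the highest element by some attribute, and the highest value itself. The `Item` class and its `score` attribute are invented for the sketch:

```python
class Item:
    def __init__(self, name, score):
        self.name = name
        self.score = score


items = [Item("a", 3), Item("b", 9), Item("c", 5), Item("d", 9)]

# The element with the highest score (the first one wins on ties).
highest_element = max(items, key=lambda item: item.score)

# Just the highest score.
highest_value = max(item.score for item in items)

# "unique" is a set away; sorted only to get a stable order.
unique_scores = sorted({item.score for item in items})
```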

It could also be said that these data structures also lend themselves better to recursion, but I don’t care so much about that.

In conclusion

  • Code that builds up a hash in a loop can probably be simplified.
  • Hashes are ok as named tuples.
  • It’s nice to try to write as few loops as possible
  • Nested datastructures are bothersome. Code must be written to build them, and then to extract meaning from them.
  • Lists of objects allow the ability to have key/value pairs (object data and methods) along with list traversal, ergo hybrid OO-Functional code is great.
  • It seems universities (I’m saying this because it’s the way I learned) have taught OO in a very procedural way, when it’s actually possible to teach it in a functional way.

    In a future post, how to make OO code be less procedural by reducing side effects, or: why most Java developers don’t really understand what OO is supposed to be. We should liberate it by introducing more functional aspects — and we don’t have to give up objects to go there.

Previously I worked on a Ruby-based docs project that was a lot like Rails Guides, but generated docs from Markdown instead of Textile. Now, I very much like these ideas, and I prefer Markdown (and not needing Rails), but I want something simple and generic.

What I really wanted was a directory full of markdown files that I could, at will, turn into a directory full of HTML files very quickly, with a very low entry point for people adding docs, because the harder it is for people to add docs, the fewer docs get built. I also wanted it to be very simple to edit the format of the docs themselves, and to understand the program generating them. The idea is any project has a directory called “docs”, and it’s full of markdown documentation that gets versioned along with the app. Github already does this for you, which is nice, but … not everything needs to be on github.
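
The core loop of such a tool is small. The following Python sketch shows only the directory-in, directory-out shape and is not how Mortimer itself is written; `build_docs` is a hypothetical name, and `placeholder_convert` is a stand-in that merely escapes the text, where a real Markdown library would be plugged in:

```python
import html
from pathlib import Path


def placeholder_convert(markdown_text):
    # Stand-in only: escapes the text and wraps it in <pre>.
    # A real generator would call a Markdown renderer here.
    return "<pre>" + html.escape(markdown_text) + "</pre>"


def build_docs(source_dir, output_dir, convert=placeholder_convert):
    """Turn every *.md file in source_dir into a same-named .html file in output_dir."""
    source_dir = Path(source_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for md_file in sorted(source_dir.glob("*.md")):
        target = output_dir / (md_file.stem + ".html")
        target.write_text(convert(md_file.read_text(encoding="utf-8")),
                          encoding="utf-8")
        written.append(target.name)
    return written
```

Calling `build_docs("docs", "html")` would then regenerate the HTML directory from the project’s docs directory, with the converter passed in as an ordinary function.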

While I have a short list of TODO items I’d *like* to add to this project, it’s essentially complete in a half hour of hacking, and here’s the result: Mortimer — a really simple developer docs generator.

Patches very welcome, see TODO.md for some ideas. I think this has a lot of potential for some good OSS upgrades.