What’s the future of data interchange over web services? Currently, I believe too much work is required of clients to use remote applications at the level of capability provided by a local API. Basically, it all comes down to speaking over a very narrow wire. It’s like assembling a jigsaw puzzle by sending written letters back and forth to someone in a different country: you give the orders, but only the other person can see the pieces and move them.
Primitive data structures include strings, numbers, hashes and lists. It is upon this foundation that most web services are based. Whether the payload is XML (as in SOAP and many REST APIs) or JSON, these are two dimensional text entities that don’t represent more complex data very well. Well, they could just contain a flattened graph topology, sure, but it’s not natural, and it’s verbose.
My theory is that in designing web services for humans (so that things are readable/printable, and their various bash/Perl scripts can operate easily over an XML DOM or XPath, without deep modeling of the underlying business logic or data models in client code), we are invariably designing our data structures in only two dimensions, and probably reducing the number of links and cross-references between objects in ways that make the data less useful than in its natural state. In the case of SQL databases, we probably spend a lot of time writing code to move between an ideal database representation and the web services representation. In the case of NoSQL databases, we unfortunately let our web services representation limit the best possible way to access the data! The humans have told the computers what to do.
When modelling, say, a list of US cities and connecting roads and airport flight paths in a web service, you quickly get a graph with a lot of deeply interconnected parts. Expression of that graph is, fundamentally, an expression of nodes and edges. Do I want to print a list of cities and each of their neighbor cities? What if I want to produce a list of cities that are 1-2 cities away? What if I want to print a list of all cities in North Carolina? Or cities that are 50 miles away from cities in North Carolina? As we move beyond simple data access, eventually the client application will need to “think” about the data, rather than getting the answer from a simple DOM traversal or XPath query. That is expected… but if that data is accessed from 5 different programming languages, that’s 5 different times that logic will have to be written. Ouch.
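To make the “1-2 cities away” question concrete, here is a minimal Python sketch of the kind of logic every client ends up writing for itself. The city names, the `ROADS` table and the `cities_within` helper are all made up for illustration; they are not from any real service.

```python
from collections import deque

# Hypothetical sample data: city -> set of directly connected cities.
ROADS = {
    "Raleigh": {"Durham", "Fayetteville"},
    "Durham": {"Raleigh", "Greensboro"},
    "Greensboro": {"Durham", "Charlotte"},
    "Charlotte": {"Greensboro"},
    "Fayetteville": {"Raleigh"},
}


def cities_within(graph, start, max_hops):
    """Return {city: hop_count} for cities reachable in 1..max_hops edges."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        city = queue.popleft()
        if hops[city] == max_hops:
            continue  # do not expand past the requested radius
        for neighbor in sorted(graph.get(city, ())):
            if neighbor not in hops:
                hops[neighbor] = hops[city] + 1
                queue.append(neighbor)
    del hops[start]  # the starting city is not "away" from itself
    return hops


print(cities_within(ROADS, "Raleigh", 2))
```

It is only a breadth-first search, but it is exactly the sort of thing that gets rewritten once per client language when the wire format only carries flat lists.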
What if the thing we “got” for a web services request actually contained multiple data structure views and indexes into the thing we were querying, so that the client program did not have to calculate them? If web services APIs were instead written for machines, and client libraries were distributed to allow richer access to a software system, our programs would be less likely to be limited by the need to represent their best internal structure as something close to the representation of the API layer.
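As a rough sketch of what I mean by a response that carries its own views, assume a hypothetical server-side helper `build_payload` that ships the records along with a couple of precomputed indexes. The field names (`cities`, `by_state`, `neighbors`) are my own invention, not any existing format.

```python
def build_payload(cities, roads):
    """Bundle records with precomputed indexes.

    cities: list of (name, state) tuples; roads: list of (a, b) name pairs.
    """
    by_state = {}
    neighbors = {}
    for name, state in cities:
        by_state.setdefault(state, []).append(name)
        neighbors.setdefault(name, [])  # isolated cities still get an entry
    for a, b in roads:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    return {
        "cities": {name: {"state": state} for name, state in cities},
        "by_state": {s: sorted(names) for s, names in by_state.items()},
        "neighbors": {c: sorted(ns) for c, ns in neighbors.items()},
    }


payload = build_payload(
    [("Raleigh", "NC"), ("Durham", "NC"), ("Richmond", "VA")],
    [("Raleigh", "Durham"), ("Durham", "Richmond")],
)
# The client just looks things up; it does not rebuild the indexes.
print(payload["by_state"]["NC"])
print(payload["neighbors"]["Durham"])
```

The client answers “all cities in North Carolina” or “neighbors of Durham” with a dictionary lookup, in whatever language it happens to be written in.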
To do this, we really don’t even need human readable protocols, but we *do* need to settle on a more or less standard way of marshaling objects to binary. Thing is, we did this before, but we threw it out (probably because CORBA was a square wheel?). Hmm. I think we can do better though.
XML or JSON structured data is returned in a shape intended to make a particular set of access use cases easier. What if, when requesting a complex topology, I actually got a section of the database that I queried, complete with indexes and various foreign key relationships? Wouldn’t that be easier than traversing it with an XML DOM implementation to build up my own internal model of the original database model, so that I could think about it, and then get my result?
Unfortunately, this sounds a bit like I’m requesting the return of the 1990s, where it was permissible for a client application to have its own database handle. You don’t want that, but I might like the server to be able to export tables for the subset of the database matched by a server-cleared search pattern.
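A hedged sketch of that “export a subset” idea, using Python’s built-in `sqlite3` module with in-memory databases standing in for the real thing. The schema, the sample rows (including the mileages) and the `export_subset` helper are assumptions made up for this example.

```python
import sqlite3

# Hypothetical schema, shared by the server database and the exported slice.
SCHEMA = """
CREATE TABLE cities (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, state TEXT NOT NULL
);
CREATE TABLE roads (
    from_id INTEGER NOT NULL REFERENCES cities(id),
    to_id   INTEGER NOT NULL REFERENCES cities(id),
    miles   INTEGER NOT NULL
);
CREATE INDEX cities_by_state ON cities(state);
CREATE INDEX roads_by_from ON roads(from_id);
"""


def make_source():
    """Build a tiny stand-in for the server's database (made-up rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO cities VALUES (?, ?, ?)", [
        (1, "Raleigh", "NC"), (2, "Durham", "NC"), (3, "Richmond", "VA"),
    ])
    db.executemany("INSERT INTO roads VALUES (?, ?, ?)", [
        (1, 2, 25), (2, 3, 155),
    ])
    db.commit()
    return db


def export_subset(source, state):
    """Copy cities in `state`, and roads between them, into a fresh database."""
    subset = sqlite3.connect(":memory:")
    subset.executescript(SCHEMA)  # same tables, keys and indexes
    cities = source.execute(
        "SELECT id, name, state FROM cities WHERE state = ?", (state,)
    ).fetchall()
    subset.executemany("INSERT INTO cities VALUES (?, ?, ?)", cities)
    roads = source.execute(
        """SELECT r.from_id, r.to_id, r.miles FROM roads r
           JOIN cities a ON a.id = r.from_id
           JOIN cities b ON b.id = r.to_id
           WHERE a.state = ? AND b.state = ?""",
        (state, state),
    ).fetchall()
    subset.executemany("INSERT INTO roads VALUES (?, ?, ?)", roads)
    subset.commit()
    return subset


nc = export_subset(make_source(), "NC")
print(nc.execute("SELECT name FROM cities ORDER BY name").fetchall())
```

The server decides which rows match the cleared pattern (here, just a state), and the client receives a small relational slice it can query however it likes, without ever holding a handle to the real database.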
Anyway, the theory is this: humans are making us think about data in two dimensional views, and it is not good to store it that way. However, the easiest way to present data is just like it’s stored. Further, it would be nice if client applications, in the age of web services, could be as dumb as possible. To do this, they need rich data, so that they don’t have to re-model the relationships themselves just to turn those 2D data structures back into more complex forms.
Some sort of hybrid between a database handle, web services, and application remoting (RMI, drb, etc) is an interesting idea. Clear my app for the data it wants to access and let me access the raw classes and exported object methods, but limit what I can do based on some rather complex RBAC policy. Basically, I’m not convinced the older client/server methods were busted. They just need to be given some security extensions for the open internet and presented in a different form. The internal API and storage of the app is usually the closest thing to the true data model of the app, and has richer capabilities … finding a way to give web users access to parts of that, safely, is the ticket, and then we don’t have to do all the emailing of the letters back and forth to work on that puzzle — I can just become that remote person working on that puzzle, and use his hands and eyes, as long as he agrees with what I’m doing.
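A very rough, in-process sketch of the “exported object methods behind a policy” idea. There is no networking here at all, and `PolicyProxy`, `CityStore` and the `POLICY` table are hypothetical names invented for the example; a real RBAC policy would be far more involved.

```python
class AccessDenied(PermissionError):
    """Raised when a role is not cleared for the requested method."""


class PolicyProxy:
    """Wrap an object and forward only the methods a role is cleared for."""

    def __init__(self, target, role, policy):
        self._target = target
        self._allowed = policy.get(role, frozenset())

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        if name.startswith("_") or name not in self._allowed:
            raise AccessDenied(name)
        return getattr(self._target, name)


class CityStore:
    """Stand-in for the app's internal API (made-up data)."""

    def __init__(self):
        self._cities = {"Raleigh": "NC"}

    def state_of(self, name):
        return self._cities[name]

    def add_city(self, name, state):
        self._cities[name] = state


# Hypothetical role -> allowed method names.
POLICY = {
    "reader": frozenset({"state_of"}),
    "editor": frozenset({"state_of", "add_city"}),
}

reader = PolicyProxy(CityStore(), "reader", POLICY)
print(reader.state_of("Raleigh"))
try:
    reader.add_city("Durham", "NC")
except AccessDenied as denied:
    print("denied:", denied)
```

The point is only that the client talks to the app’s own objects, and the remote side gets to agree or disagree with each call.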
(P.S.: if you think I’m talking about Thrift, Protocol Buffers or something like that, I’m doing a pretty bad job of communicating what I’m talking about, because those don’t do what I’m talking about.)


