|29 Jul 2003 @ 20:06, by Roger Eaton|
Here's another old article, with a more technical bent. Already it is somewhat out of date -- for instance, JXTA will replace SOAP as the basis for inter-nodal communication. But there are a couple of very important ideas here. One is that of using a manual scaffold on which an automatic network can be constructed. This is a key idea. The database intelligence will depend heavily on the many manually applied individual linkages. And the notion of using bottom-up methods to create useful hierarchies is another key idea.
March 10, 2001 Go to original.
Definitions: An InterMix hub is a single instance of InterMix running on a web connected computer. Each hub has one or more database areas defined for holding messages or database items of a particular form, for instance a collection of information about nonprofit organizations in a particular city. These database areas, potentially many on a single hub, are the "nodes" of the global database. For purposes of the protocol in question, we will assume there may be hundreds of millions of hubs and billions of nodes.
We need one or two people to help us come up with and implement a simple protocol for a global database with billions of nodes, one that will be searchable and that will distribute data to where it is wanted. For now we assume we are dealing with public data, so security is not a problem, and that we are not going to take heroic measures to make everything perfect. Data synchronization is beyond our scope - that is, we will not try to control the order in which information arrives at its various destinations. We assume we are not dealing with real money or lives on the line.
We will use Simple Object Access Protocol (SOAP) to transport the data between InterMix nodes because SOAP will give us flexibility to interface with non-InterMix
There is an exciting opportunity here for the right person(s). The global database will be an indefinitely scalable, bottom-up, self-organizing, fault tolerant, loosely hierarchical network with peer-to-peer connections as well. The excitement lies in adding "hierarchical" to all those other free-wheeling adjectives. Normally hierarchy goes with a top-down view, but here the intent is to implement it in a fundamentally bottom-up structure.
For the initial design, most links between InterMix nodes will be added manually by database area managers on participating InterMix hubs. (These database area managers may well be individuals running InterMix on their home computers.) Some links may be added automatically as a way of healing broken networks. Links are always unidirectional, though usually implemented in bidirectional pairs, and are either hierarchical or peer-to-peer. Selection rules set by the receiving node govern whether a particular database item is forwarded or not. For instance, a local HealthWatch node might want from a national site only those health advisories marked "urgent".
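As a rough illustration of the selection rules described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Link` class, the `items_to_forward` helper, the node names, and the `type` and `priority` fields are hypothetical and are not part of InterMix or of any protocol defined here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical item shape: a flat dictionary of string fields.
Item = Dict[str, str]


@dataclass
class Link:
    """A unidirectional link from a source node to a target node."""
    source: str
    target: str
    kind: str  # "hierarchical" or "peer"
    # Selection rule set by the receiving (target) node.
    accepts: Callable[[Item], bool]


def items_to_forward(link: Link, items: List[Item]) -> List[Item]:
    """Return only the items that the receiving node's rule accepts."""
    return [item for item in items if link.accepts(item)]


# The local HealthWatch node wants only urgent health advisories.
national_to_local = Link(
    source="healthwatch-national",
    target="healthwatch-local",
    kind="hierarchical",
    accepts=lambda item: (
        item.get("type") == "health-advisory"
        and item.get("priority") == "urgent"
    ),
)

advisories = [
    {"title": "Boil-water notice", "type": "health-advisory", "priority": "urgent"},
    {"title": "Flu clinic hours", "type": "health-advisory", "priority": "routine"},
    {"title": "Volunteer picnic", "type": "event", "priority": "urgent"},
]

selected = items_to_forward(national_to_local, advisories)
print([item["title"] for item in selected])
```

Keeping the rule on the link, and owned by the receiving node, matches the idea that each database area manager decides locally what flows in. A bidirectional pair would simply be two such links with independent rules.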
The main purpose of the hierarchies is to collect the InterMix item ratings and to distribute highly rated items more widely. This collection and distribution of ratings will be part of the fundamental design. The hierarchies should also be helpful in organizing fuzzy searches into the distributed database. Those searches will be for a future phase, but we will consider them in the design to reduce the necessity of a retrofit. Since the hierarchies will be constructed bottom up from many individual choices, the resulting network structure is not predictable -- the database may well self-organize not into one, but into many hierarchies.
Avoiding circularity in the hierarchies while at the same time healing broken links is the number one challenge. In keeping with our bottom-up design, there will be no centralized map of the network. Instead, network information will be kept locally, with no single node having a complete picture. Each item, as it is passed around the database, carries with it a list of links traversed so far. Items are never propagated to a node they have already been sent to. When an item is received, its history is examined for possible circularity in the hierarchy, and if there is circularity, then action is taken to have the most recently added link in the circle removed. Each node constructs its map of the network as best it can from the history of the messages it receives and uses this map to heal broken links.
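The history check might look something like the following Python sketch. It is one possible reading of the paragraph above, not a settled design. The `Hop` record, the `added_at` counter (used to decide which link is "most recently added"), and both function names are assumptions introduced for illustration. The sketch treats a circle as detected when one of the receiving node's own upward links points back at a node the item has already passed through.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hop:
    """One link traversed by an item (hypothetical record layout)."""
    source: str
    target: str
    added_at: int  # assumed: a counter or timestamp set when the link was created


def may_forward(origin: str, history: List[Hop], target: str) -> bool:
    """An item is never propagated to a node it has already been sent to."""
    visited = [origin] + [hop.target for hop in history]
    return target not in visited


def newest_link_in_circle(history: List[Hop], closing: Hop) -> Optional[Hop]:
    """Check whether `closing`, an upward link of the receiving node,
    leads back to a node already on the item's path. If it does, return
    the most recently added link in the circle, the candidate for
    removal. Return None when no circle is found."""
    for i, hop in enumerate(history):
        if hop.source == closing.target:
            circle = history[i:] + [closing]
            return max(circle, key=lambda h: h.added_at)
    return None


# Item travelled A -> B -> C; node C also has an upward link C -> A.
history = [Hop("A", "B", added_at=1), Hop("B", "C", added_at=5)]
closing = Hop("C", "A", added_at=3)

print(may_forward("A", history, "A"))           # A is already on the path
print(may_forward("A", history, "D"))           # D has not been visited
print(newest_link_in_circle(history, closing))  # link proposed for removal
```

Because the check uses only the history carried by the item and the receiving node's own links, it needs no central map of the network. How the removal request is then delivered to the owner of the offending link is left open here, as it is in the text.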
Clearly there are more than a few details that need to be filled in before we can begin coding! Estimated time: 2000 hours. A team of two or three persons would be ideal. There are three major components to the job. First, filling in the design details to make sure the concept is really feasible - 400 hours. Second, readjusting InterMix internals so we are compatible with the new design - 600 hours. Third, coding and testing the distributed database - 1000 hours.
this log is published at http://www.newciv.org/nl/newslog.php/_v252