InterMix Middleware Writeup from Dec, 2002
9 Aug 2003 @ 17:12, by Roger Eaton
Written before I discovered JXTA, this writeup is still the most comprehensive document for the InterMix distributed database. Collective Communication is not mentioned, but the design is such as to underpin a global scale voice of humanity.
InterMix Middleware Overview
InterMix is envisioned as middleware for Linux, Windows and Macintosh. InterMix will connect locally via multiple protocols to local data sources defined to it as “nodes”, and it will connect across the internet with other instances of itself scalably and intelligently using SOAP. An indefinite number, at least hundreds of millions, of these InterMix “hubs” will be connected in a fundamentally bottom up organization that is efficient at bringing to each node just the kind of information from the global network that the owner(s) of the nodes desire.
"Bottom up" means that actions that will directly affect a hub are always initiated by that hub. There is no command hierarchy of hubs. The basic plumbing of InterMix, the connections between nodes, known as "pipes", will be created manually by node owners one by one. InterMix Middleware will provide intelligent suggestions for new pipes, but manual control will be maintained, pipe by pipe. An automated intelligent data transfer system built on top of the pipes is one goal of InterMix, but the detailed parameters of such transfer will always be under the control of the node owners, including the ability to turn the transfer on and off.
InterMix hubs need not function as document repositories (nodes), though they may. Either way, hubs do keep metadata about documents and also directories of hub, node, network, participant and other information as needed.
InterMix hubs also perform a useful function as a buffer between participants and the extra-hub digital world. In particular InterMix is relatively spam free because information moves by request only.
InterMix Middleware instances will self-check and "prove" that they match their source code as part of the transmission protocol between hubs. The self-check is intended as a way of preventing InterMix from being hacked, which is essential for any plan to provide security for personal information.
As part of the transmission protocol, each instance will send two random strings to the other hub, to be used to build two blocks of, say, 256 bytes selected from the binary at displacements calculated from the random string. These two blocks plus the version identification of the InterMix instance will be returned to the requesting hub.
Each hub will build a library of known random strings and blocks for each version of InterMix that it encounters. At each encounter, it sends one known random string and one new random string for the version. The known string "proves" the other hub is not hacked, and the new string builds the library for the future.
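The exchange described above can be sketched roughly as follows. This is only an illustration under assumptions of mine, not a settled protocol: the writeup does not say how a displacement is calculated from the random string, so the sketch hashes the string with SHA-256 and takes a single contiguous 256-byte block. The names displacement, respond and ChallengeLibrary are hypothetical.

```python
import hashlib
import secrets

BLOCK_SIZE = 256  # "say, 256 bytes" in the writeup


def displacement(challenge: bytes, binary_len: int) -> int:
    """Map a random challenge string to an offset in the binary (assumed scheme)."""
    digest = hashlib.sha256(challenge).digest()
    return int.from_bytes(digest[:8], "big") % (binary_len - BLOCK_SIZE + 1)


def respond(binary: bytes, challenge: bytes) -> bytes:
    """What the challenged hub returns: the block selected by the challenge."""
    start = displacement(challenge, len(binary))
    return binary[start:start + BLOCK_SIZE]


class ChallengeLibrary:
    """Per-version store of challenge -> expected block pairs seen so far."""

    def __init__(self):
        self.known = {}  # version -> {challenge: block}

    def make_challenges(self, version):
        """Pick one known challenge (None on first contact) and one fresh one."""
        seen = self.known.get(version, {})
        old = secrets.choice(list(seen)) if seen else None
        return old, secrets.token_bytes(16)

    def check_and_learn(self, version, old, old_block, new, new_block):
        """Verify the known challenge, then record the new pair for the future."""
        seen = self.known.setdefault(version, {})
        if old is not None and seen[old] != old_block:
            return False  # the response does not match what this version returned before
        seen[new] = new_block
        return True
```

On first contact with a version there is no known string yet, so the sketch simply trusts and records the first response; how that bootstrap should really work is one of the open security questions.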
In principle, there is no way to prove from a remote location that a process is pristine. In the above scenario, the InterMix process could be hacked and yet provide the proper response by passing off the request to an unhacked version. The chief problem with network security in a bottom up design, though, is not the single site which is malfunctioning, but rather the possibility of mass substitution at many sites, and the random string test should provide substantial if not quite perfect protection against the mass substitution possibility.
This discussion about security needs a lot more work. Clearly though, we need to address security concerns in the design phase as best we can, and continue to address them as the technology develops.
InterMix Middleware Elements
Each element will have a long, partly random ID presumed to be unique.
1) InterMix hubs are instances of InterMix running on a particular box with access to multiple local nodes.
Each hub has a built in hub registry with a list of all known hubs. When a hub is set up, the registry is begun with the information for that hub: ID, name, owner, owner info, IP address, IP address type (static or temporary), last active date, first active date, documents sent, bytes sent, documents received, bytes received.
2) InterMix nodes are document repositories accessible by the InterMix hub. One hub can have many local nodes, but each node is local to one and only one InterMix hub. If a node is not local then it is "remote" to a hub.
Since nodes are primarily logical constructs within InterMix, multiple nodes may share the same "physical" repository. For instance, one node might be defined as calendar events for Los Angeles, and another node be defined as calendar events for San Diego, but the repository might be an event database with Southern California events kept by metropolitan area. The only requirement is that InterMix must know how to get and/or put events for both nodes.
There is a seeming problem in the design here. How do we prevent duplicate or overlapping nodes, since there may be more than one hub on a box? But this is not really a problem. InterMix allows for this kind of duplication and overlap. The real problem is that the same document may enter the global InterMix network under multiple IDs. This problem can be resolved by indexing all documents by checksum and always pre-comparing documents against the checksum index as part of the transfer protocol. When a duplicate is found, the new and old origin dates are compared, and if the remote version is older, then the key and originating hub from the older version are used to rekey the document record. If the local version is older, then the remote hub is notified and adjusts its key instead.
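A minimal sketch of the checksum pre-comparison, assuming a SHA-256 checksum and an in-memory index keyed by checksum. DocRecord, checksum_of and reconcile are hypothetical names, and the returned strings merely stand in for whatever the transfer protocol would actually do.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass
class DocRecord:
    key: str               # document ID
    origin_hub: str        # hub where the document first entered the network
    origin_date: datetime
    checksum: str


def checksum_of(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()


def reconcile(index: dict, incoming: DocRecord) -> str:
    """Pre-compare an incoming record against the local checksum index."""
    local = index.get(incoming.checksum)
    if local is None:
        index[incoming.checksum] = replace(incoming)  # store a copy
        return "new"
    if local.key == incoming.key:
        return "same"  # already known under this key
    if incoming.origin_date < local.origin_date:
        # Remote version is older: adopt its key and originating hub locally.
        local.key = incoming.key
        local.origin_hub = incoming.origin_hub
        local.origin_date = incoming.origin_date
        return "rekeyed_local"
    # Local version is older: the remote hub should adjust its key instead.
    return "notify_remote"
```

Two different documents with the same checksum would be wrongly merged here, so a real implementation would presumably compare content as well before rekeying.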
Each hub has a built in node registry with a list of all known nodes. The hub's own hub and node registries are local nodes to the hub, so every hub has at least two local nodes.
Node information kept in the node registry: ID, name, list of owners with owner information, hub, number of documents, number of bytes, list of pipes to other nodes (see below for description of pipes).
Nodes may be email type nodes with send and/or receive functions, or they may be database type nodes with get and/or put functions.
Nodes may be public, semi-public or private. Information about semi-public and private nodes is never distributed to other hubs; such nodes have no pipes connecting them to other nodes. Documents in a private node are available only to a list of named participants. Documents in a semi-public node are available to anyone who knows the node exists.
"Participant" nodes represent particular participants, and are owned by those participants.
"Group" nodes represent groups of participants -- that is, they are defined by the participants that provide items or ratings.
3) InterMix participants are persons who have provided documents or ratings or have requested documents, or have simply been registered and perhaps provided personal information. (Participants may provide personal information so InterMix can further customize the selection of documents for the participant.)
InterMix assumes that a participant is related to one particular hub. A person may participate through more than one hub, but InterMix has no way of knowing that.
The handling of personal information is a difficult problem. Participants are not required to provide personal data, such as name, address, sex, age, education and so forth. However, personal information is key to automating the getting of quality information from the internet, so our aim must be to shelter the personal data and yet be able to use it effectively.
InterMix never forwards raw personal information, identifying the participant only by "handle", which must be unique to the home hub of the participant. Email to firstname.lastname@example.org is forwarded to the participant's email address, if known. Participants may use their true name as their handle, of course, but otherwise, they have to include personal information themselves in documents they publish on InterMix if they want to make that information public.
A participant may have a node just for that participant, in which case the participant is the owner of the node.
Note that hub and node owners do not have to be participants, though if they are participants, then their participant ID may be listed in their ownership info.
Participant identification is handled by having different categories of participant: a) anonymous/unknown b) pseudonymous c) identified and credited d) identified and proved by return email e) identified and proved by digital signature. Everyone except anonymous has a password and must provide an email address.
4) InterMix Middleware documents are of four kinds:
a) freeform messages
b) simple forms (same as html forms when auto-emailed)
c) xml forms, quite possibly with embedded blobs
Document IDs are formed from three parts: date/time stamp, random digits, and a checksum (checksum length?). When documents are physically transported from hub to hub, the checksum is recalculated, and if it does not match, either the document is not accepted or the checksum is replaced. When documents are modified, the new document retains the old key except for the checksum.
Documents must have been posted originally by a participant, who is identified by hub and handle. The date of the original posting is available as part of the ID. Additionally, a document may have a title, a short description, repository key information and a modification date. Beyond that, all document information, such as language, author, geographic origin and so forth is handled as keywords. Very possibly the third portion of the key should be the modification date/time, and the checksum should be data.
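To make the three-part document ID concrete, here is a sketch under assumptions: the checksum is the first 8 hex digits of a SHA-256 hash (the length is still an open question), the parts are joined with hyphens, and the random part is 8 decimal digits. None of these choices is fixed by the design, and the function names are hypothetical.

```python
import hashlib
import secrets
from datetime import datetime, timezone

CHECKSUM_LEN = 8  # hex digits; assumed, the checksum length is undecided


def checksum(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()[:CHECKSUM_LEN]


def new_document_id(body: bytes, now=None) -> str:
    """Build an ID from date/time stamp, random digits and checksum."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S")
    rand = f"{secrets.randbelow(10**8):08d}"
    return f"{stamp}-{rand}-{checksum(body)}"


def verify_on_transport(doc_id: str, body: bytes) -> bool:
    """Recalculate the checksum after a hub-to-hub transfer and compare."""
    return doc_id.rsplit("-", 1)[1] == checksum(body)


def id_after_modification(doc_id: str, new_body: bytes) -> str:
    """A modified document keeps the old key except for the checksum."""
    prefix = doc_id.rsplit("-", 1)[0]
    return f"{prefix}-{checksum(new_body)}"
```

Because the date/time stamp leads the ID, the original posting date can be read straight from the key, as the design requires.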
5) InterMix keywords are up to 256 bytes long, are made of displayable characters only, with no whitespace except spaces – i.e., no tabs, no carriage returns or newlines. Keywords may be internally delimited to act as a keyword hierarchy. For instance, "furnishings|home" would be distinguished from "website|home" where the pipe sign is used as delimiter. The exact mechanics of delimiting need to be decided.
Every keyword begins with two required levels: 1) keyword type schema and 2) keyword type. A simple default keyword type schema will be published with InterMix. The default keyword type schema identifier is null, so keywords using the default begin with the delimiter.
Each hub keeps a registry of keyword types as a built-in hub node indexed by schema and type. Default keyword schema types are always in English. For instance, "|language" or "|word" or "|node". Thus to keyword a message as being primarily in English, one would apply the keyword "|language|eng". (The default schema will use the international standard three character language codes.)
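The keyword rules can be sketched as a small parser. Since the exact mechanics of delimiting are undecided, this assumes the pipe sign as the only delimiter with no escaping, and measures the 256-byte limit on the UTF-8 encoding. validate_keyword and parse_keyword are hypothetical names.

```python
DELIMITER = "|"          # assumed; the exact delimiting mechanics are undecided
MAX_KEYWORD_BYTES = 256


def validate_keyword(keyword: str) -> None:
    """Enforce the length limit and the displayable-characters rule."""
    if len(keyword.encode("utf-8")) > MAX_KEYWORD_BYTES:
        raise ValueError("keyword longer than 256 bytes")
    for ch in keyword:
        # Plain spaces are allowed; tabs, newlines and control characters are not.
        if ch != " " and (ch.isspace() or not ch.isprintable()):
            raise ValueError(f"disallowed character {ch!r}")


def parse_keyword(keyword: str):
    """Split a keyword into (schema, type, remaining hierarchy levels)."""
    validate_keyword(keyword)
    parts = keyword.split(DELIMITER)
    if len(parts) < 2 or not parts[1]:
        raise ValueError("keyword needs a schema level and a type level")
    schema, kw_type, *rest = parts
    return schema, kw_type, rest
```

A keyword in the default schema starts with the delimiter, so its schema level comes back as the empty string, matching the "null" schema identifier.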
6) Ratings – an integral value in a particular rating scheme provided by a participant for a keyword as applied to a document.
Each rating scheme specifies: 1) A valid range of integer ratings storable in a single signed byte, i.e. -128 to 127. Thus, a valid range might be [0,4] or [-3, 3]. 2) One rating scheme may allow ratings by the anonymous participant, while another may allow ratings only by email verified or signature verified participants.
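A rating scheme could be modeled along these lines. A signed byte holds -128 to 127, so the sketch checks the range against those bounds. Treating the participant categories as an ordered scale with a minimum threshold per scheme is my assumption; the design only says that schemes differ in which categories may rate. Category and RatingScheme are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum


class Category(IntEnum):
    """Participant categories, ordered from least to most verified (assumed order)."""
    ANONYMOUS = 0
    PSEUDONYMOUS = 1
    CREDITED = 2
    EMAIL_VERIFIED = 3
    SIGNATURE_VERIFIED = 4


@dataclass(frozen=True)
class RatingScheme:
    name: str
    low: int
    high: int
    min_category: Category = Category.ANONYMOUS

    def __post_init__(self):
        # Ratings must be storable in one signed byte.
        if not (-128 <= self.low <= self.high <= 127):
            raise ValueError("rating range must fit in a signed byte")

    def accepts(self, value: int, category: Category) -> bool:
        """True if the value is in range and the rater is verified enough."""
        return self.low <= value <= self.high and category >= self.min_category
```

For example, RatingScheme("quality", 0, 4, Category.EMAIL_VERIFIED) would turn away an anonymous rating of 3 but accept the same rating from a signature verified participant.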
The difficult problem of preserving secrecy of ratings while making the rating system verifiable is handled by requiring manual intervention in order to verify a rating.
Each rating includes the originating hub ID and is kept on that hub with the rater's ID. Each rating also includes the participant category (credited, verified by email, or verified by digital signature). When distributed to remote nodes, ratings include their hub ID as well as their own ID. InterMix allows a hub owner to determine the handle for any particular rating, one by one. Therefore, to verify a rating the hub owner must be approached first to identify the rater by handle. Then the rater can be emailed through the InterMix hub for verification.
Such a very cumbersome system for checking any one particular rating might be combined with a system that makes the handle and rating of every thousandth rating automatically available to persons designated by the hub owner.
Ratings for a particular document are commonly aggregated across hubs, but ratings by a particular participant are aggregated only on that participant's hub. Aggregation of ratings is separately available by group.
7) A pipe is a set of rules for unidirectional information transfer between nodes. Pipes are manually created by node owners, with timing and selection specifications.
8) Offers and requests are standardized methods of setting up pipes. An offer can be published to the net, thereby setting up an automatic acceptance of a request for a predefined pipe. An unsolicited request (not in response to an offer) must be dealt with manually by the owner of the node to which the request is directed.
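The relationship between pipes, offers and requests might look like this in outline. The field names, the idea of naming offers, and the direction of the resulting pipe (from the offering node to the requester) are all assumptions made for the sake of illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pipe:
    """Rules for one-way transfer from source_node to target_node."""
    source_node: str
    target_node: str
    selection: str        # e.g. a keyword filter
    interval_hours: int   # timing specification


@dataclass
class Node:
    node_id: str
    offers: dict = field(default_factory=dict)   # offer name -> (selection, interval_hours)
    pipes: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # requests awaiting the owner's decision

    def publish_offer(self, name: str, selection: str, interval_hours: int) -> None:
        self.offers[name] = (selection, interval_hours)

    def receive_request(self, requester_id, offer_name=None,
                        selection=None, interval_hours=None):
        """Accept automatically if the request answers a published offer."""
        if offer_name in self.offers:
            sel, hours = self.offers[offer_name]
            pipe = Pipe(self.node_id, requester_id, sel, hours)
            self.pipes.append(pipe)
            return pipe
        # Unsolicited request: queue it for manual handling by the node owner.
        self.pending.append((requester_id, selection, interval_hours))
        return None

    def approve(self, index: int) -> Pipe:
        """Manual approval of a pending request by the node owner."""
        requester_id, selection, hours = self.pending.pop(index)
        pipe = Pipe(self.node_id, requester_id, selection, hours)
        self.pipes.append(pipe)
        return pipe
```

Nothing is ever pushed to a node that did not ask for it: a pipe comes into being only through a published offer or an explicit approval, which keeps the design bottom up.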
9) Networks are leagues of nodes defined for each node by network keywords. Network information is kept in a network directory for the hub. Networks can extend across multiple hubs.
How we build intelligence into the network
this is the next section to be worked on
How it looks to the user
(this section is barely begun)
1) There must be an interface to allow a hub owner to connect the hub to local nodes
a) As many kinds of local nodes as possible will be supported
i) Simple directory with content items as files
ii) Known database formats
iii) InterMix and other specialties
iv) Unknown formats based on protocols
b) The owner must be able to provide
i) A description of the node
ii) Keywords for the node
iii) A count of the number of items in the node
iv) Node type information
* email or database node
* rating repository - data items are ratings
* items: message, simple form, xml or binaries
11 Aug 2003 @ 17:18 by : Intermix
While I've seen this before, and while it sounds like things I might come up with, it is probably more useful to step back a bit and ask simple questions...
What is the main selling point to this kind of thing? Why would I want it?
It is a way of storing structured data in a universal way, right? So it is kind of like the Semantic Web, in a simple way. I.e. it makes certain assumptions that limit the scope: that the data is XML records, like forms. Or am I not getting it right? Can it be any kind of XML? In that case, why have the other types of documents?
I suppose the people working on the semantic web are trying to solve the same kinds of problems. Would this kind of make it simpler, by accepting certain universal restraints?
12 Aug 2003 @ 17:17 by : focusing community intelligence
I am still posting the old documents, so not everything in this particular document is going to end up in the final design. For instance, very possibly all InterMix items will be transformed into XML documents, so there'll be no simple text or binary items.
Once all the old docs are posted then the question will be how to set up a systematic method that will eventually produce a final design that is compelling and usable for coding. I don't mind admitting I am boggled at this point. I am pretty sure, though, that going after these smaller issues now will get us nowhere.
Your question about the main selling point is appropriate, though, at this stage. "Focusing community intelligence" is a good succinct description of what InterMix is about. While not neglecting the individual or the polities, the emphasis is on radically reorganizing the global information flow to benefit humanity as a whole.
There are elements in the design that will benefit the individual participant directly once we get over the critical-mass hurdle. Specifically, we should be able to provide a daily feed of new information that is customized to the individual and ordered so the cream is at the top. And we should be able to out-Google Google eventually, as unlikely as that seems at the moment. And we should be able to deliver music and video just like any p2p system, by title (which means a whole other set of problems).
Still, I think it is the community benefit that forms the heart of the InterMix appeal. And that makes the critical-mass problem a tough one. There are a couple of avenues of approach.
For one thing, small communities on the web might find InterMix useful for themselves, so that could be a starting point. Another idea is to hitch InterMix to a rising star. I am thinking Chandler, perhaps, and this is a reason to go with JXTA, maybe. Finally, I definitely want to implement collective communication on InterMix, so we might be able to get some action in that arena by pitching collective talks between adversarial email lists. That sounds rather fun! Might be just the ticket.
It is a tough one. Any thoughts appreciated.
17 Aug 2003 @ 13:39 by : Community Intelligence
Focusing Community Intelligence is an excellent focus. ...Hm, but, I guess I'm still searching for what it is that will make it compelling, irresistible, a no-brainer to adopt. I don't doubt that it is a good thing, and that it will be useful, but you're also talking about something very ambitious, something that millions of people will use. And that kind of requires that it isn't just nice and useful, but that it will spread like wildfire.