voice of humanity: scalable collaborative filtering network (SCFN)    
31 Jul 2003 @ 19:49, by Roger Eaton

Two articles on the (tongue-in-cheek) SCFN from 1998. Though, come to think of it, the concept is right on!

Subject: brainstorming scalable collaborative filtering network
Author: RogerMix [email] Created on: 1998-09-30 9:25am


we have messages and/or items
  a) an item is structured content, such as a recipe
     i) the structure supplies metadata automatically
     ii) within the structure we can automatically find keywords
  b) a message is unstructured content
     i) metadata can be applied to a message by people
     ii) keywords can be found in the message automatically
  c) in both cases it is the online content that is of interest

we have pointers
  a) pointers can point to items in the real world
  b) pointers can also point to messages and items online
  c) pointers have interest according to what they point to
  d) but pointers also can have online content
     i) pointer content is metadata about the item pointed to
     ii) part of the value of the pointer is in the accuracy
         and completeness of its metadata

there are indexers and authors
  a) indexers supply pointers
  b) authors supply messages and items
  c) an author may be a collective author
  d) pointers may also be created collectively
     through automated processes -- then the indexer
     is a system process

items pointed to also have creators, and if the item
is an online item, the creator may be an author, but
we need to keep the distinction in mind between the
creator of an item pointed to, and an author

now the interesting and tricky part comes in -- symbols

there are three kinds of reference: iconic, indexical and
symbolic -- this is originally from Charles Sanders Peirce,
the 19th-century American philosopher, but has been taken
up recently by Terrence W. Deacon in his seminal book,
The Symbolic Species: The Co-evolution of Language and the
Brain.

Symbolic reference builds on iconic and indexical references
to name a meaning. Once a meaning has been named, it becomes
free to be manipulated linguistically.

To build a scalable collaborative filtering network, we need
to automate the creation of online symbols that are built up
from the iconic and indexical references. There are several
ways to go about this, and we need to exploit them all. One
fascinating method already being pursued is the "hubs and
authorities" CLEVER system from IBM:

[link]

CLEVER extracts metadata from around existing web links to
identify hubs and authorities. Hubs are web pages with lots
of links related to a single topic. Authorities are web
pages pointed to by many hubs. CLEVER has identified about
50,000 web topics this way. This seems to me an excellent
beginning toward identifying the symbols of our scfn.
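
The mutual reinforcement between hubs and authorities can be sketched
in a few lines. This is only a minimal illustration of the basic
hubs-and-authorities iteration, not CLEVER's actual code, and it
ignores the link-text metadata that CLEVER also uses. The function
name and the tiny link graph are made up for the example.

```python
from math import sqrt

def hubs_and_authorities(links, iterations=50):
    """links maps each page to the list of pages it links to (hypothetical input format)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # Rescale both score sets to unit length so the numbers stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Made-up graph: three link pages pointing at two content pages.
example_links = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b"],
    "hub3": ["a"],
    "a": [],
    "b": [],
}
hub_scores, auth_scores = hubs_and_authorities(example_links)
top_authority = max(auth_scores, key=auth_scores.get)
```

In this toy graph, page "a" ends up as the strongest authority because
every hub points to it, and "hub1" and "hub2" outrank "hub3" because
they point to more of the authoritative pages.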

-------- end of first scfn article from 1998 -------------

Subject: Re: brainstorming scalable collaborative filtering network
Author: RogerMix [email] Created on: 1998-10-01 9:44am


in the previous message I began developing a scfn taxonomy

on rethinking, I see that the difference between pointer and item
is not enough to distinguish the role the item-poster plays. An
online recipe, for instance, might just be provided by a cataloger
from old recipe books whose copyright has run out, and not be the
creation of the poster.

To clear up the confusion, we need to allow the source of an item
to be specified. Messages we can assume are the creation of the
person who posts the message. But structured items should
tell us what their source is, with the default being the person
who posted the item.

Pointers perhaps should also have a source in case the indexer
wants to disclaim credit/blame for the validity of the metadata
being supplied. For instance, the contents of a small card
catalog that is being entered by hand should not reflect on the
reputation of the indexer.

So now we have the following:

items
  a) unstructured messages
  b) structured catalog items
keywords
metadata supplied by annotators
pointers with metadata
embedded links with implied metadata
sources
authors
indexers
annotators
symbols

With some work, these can mostly be specified as objects for
programming. Symbols are not yet defined, but we have an
example in CLEVER at

[link]
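
As a rough idea of what "objects for programming" might look like,
here is a minimal sketch of items and pointers with an optional
source. All class and field names are assumptions made up for the
example. The one rule taken from the text is that the source defaults
to the person who posted the item or supplied the pointer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Item:
    """A message (unstructured) or a catalog item (structured). Hypothetical layout."""
    content: str
    poster: str
    structured: bool = False
    source: Optional[str] = None          # None means "the poster created it"
    keywords: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

    def effective_source(self) -> str:
        # Default rule from the text: the source is the poster unless stated otherwise.
        return self.source if self.source is not None else self.poster

@dataclass
class Pointer:
    """A reference to something online or in the real world, carrying metadata about it."""
    target: str
    indexer: str
    metadata: Dict[str, str] = field(default_factory=dict)
    source: Optional[str] = None          # lets the indexer disclaim credit/blame

    def effective_source(self) -> str:
        return self.source if self.source is not None else self.indexer

# A recipe typed in from an old cookbook: the poster is not the creator.
recipe = Item(content="Boil the oats...", poster="cataloger",
              structured=True, source="old recipe book")
# An ordinary message: the poster is assumed to be the author.
message = Item(content="hello all", poster="RogerMix")
# A card catalog entry copied by hand: the indexer names the catalog as source.
card = Pointer(target="a book on a library shelf", indexer="volunteer",
               metadata={"subject": "cooking"}, source="card catalog")
```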

Now we add ratings to the mix. Ratings fit very well with the
RDF metadata language built on XML.

RDF uses the simple basic structure of resources that have
properties that have values. I believe that RDF allows
properties to be treated as resources, as well.

Using RDF, a rating could be a measure of how well a property
describes a resource. For instance, a movie could have a
property of being "educational". On a scale of 1 to 10, how
educational is Rambo vs Saltmen of Tibet? Maybe -3 vs 11!

Of course ratings reflect the rater as well as the item being
rated.

A "profile" of an item is constructed from its properties and
the ratings of its properties.

A profile of a rater is constructed from the rater's own
properties (age, sex, education, nationality, etc) and from the
ratings given taken together with the profiles of the items
rated.
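
To make the two kinds of profile concrete, here is a minimal sketch
that treats a rating as a (rater, resource, property, score) record.
It does not use any RDF library or RDF syntax. The names, the scores,
and the choice to summarize a rater by how far they deviate from the
average rating are all assumptions for illustration, and only one of
many possible readings of the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class Rating:
    rater: str
    resource: str   # the thing being rated, e.g. a movie
    prop: str       # the property being rated, e.g. "educational"
    score: float    # how well the property describes the resource

def item_profile(ratings, resource):
    """Average score per property for one resource."""
    by_prop = defaultdict(list)
    for r in ratings:
        if r.resource == resource:
            by_prop[r.prop].append(r.score)
    return {prop: mean(scores) for prop, scores in by_prop.items()}

def rater_profile(ratings, rater, own_properties):
    """The rater's own properties plus their average deviation from the item profiles."""
    deviations = defaultdict(list)
    for r in ratings:
        if r.rater == rater:
            consensus = item_profile(ratings, r.resource)[r.prop]
            deviations[r.prop].append(r.score - consensus)
    return {
        "properties": dict(own_properties),
        "leanings": {prop: mean(d) for prop, d in deviations.items()},
    }

# Made-up raters and scores on a 1 to 10 scale.
ratings = [
    Rating("ann", "Rambo", "educational", 2.0),
    Rating("bob", "Rambo", "educational", 4.0),
    Rating("ann", "Saltmen of Tibet", "educational", 9.0),
    Rating("bob", "Saltmen of Tibet", "educational", 9.0),
]
rambo = item_profile(ratings, "Rambo")
ann = rater_profile(ratings, "ann", {"education": "college"})
```

Here the item profile for Rambo averages the two scores, and the rater
profile records that "ann" tends to rate "educational" a little lower
than the group does.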

Now -- here's a big idea that still needs a lot of development:
we build a semantic space that has a metric on it. A metric
means that there is a distance between any two members of the
space. A numerical distance.

First we express the profiles of items as vectors where each
dimension represents a keyword, and where the value of the
profile for that keyword is determined somehow from the
properties and ratings of the item. The predefinition of the
dimensions is a difficulty that we bypass for now.
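
A minimal sketch of the vector idea, assuming the keyword dimensions
are simply given as a fixed list (the part the text sets aside) and
assuming plain Euclidean distance as the metric. The text does not say
which metric to use, so that choice is an assumption, as are the
function name and the example values.

```python
import math

def profile_to_vector(profile, vocabulary):
    """Turn a {keyword: value} profile into a vector with one slot per vocabulary keyword."""
    return [float(profile.get(keyword, 0.0)) for keyword in vocabulary]

# The dimensions must be the same, in the same order, for every item.
vocabulary = ["ocean", "recipe", "ship"]

voyage = profile_to_vector({"ship": 3, "ocean": 4}, vocabulary)
blank = profile_to_vector({}, vocabulary)

# Euclidean distance gives a number for any pair of vectors, is zero only
# for identical vectors, is symmetric, and obeys the triangle inequality.
distance = math.dist(voyage, blank)
```

With these values, `voyage` is `[4.0, 0.0, 3.0]` and its distance from
the empty profile is `5.0`.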

Then from a large random sample of profiles, we do a regression
analysis to reduce the dimensionality to something more
manageable. This means a big number crunching task, which
becomes dauntingly large as the length of the vectors increases,
because we have to invert a matrix of that dimension as part of
the regression analysis. Once we have done the big number
crunching, from then on, we can feed the long original vector
of the item into a process and quickly come out with a much
shorter vector which still captures almost all of the
information of the original vector.

What the regression analysis does is give us a much more
efficient representation of the profile of each item by
combining the correlated factors so that each dimension in
the resulting shorter vector is unrelated to the other
dimensions. For instance, if in the original vector we had
one dimension for "ship" and one dimension for "ocean" there
would be a correlation that can be captured to make a more
efficient representation of the semantic factors in the
item.
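
The "ship" and "ocean" example can be tried on a toy scale. Note that
the sketch below swaps in principal component analysis, a standard
technique for combining correlated dimensions into uncorrelated ones,
rather than a regression as such. It finds only the single strongest
combined dimension, using power iteration on the covariance matrix. A
real system would keep many such dimensions and would use a numerical
library. The data and function names are made up.

```python
import math

def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows], means

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]

def top_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by cov and rescale to unit length.

    Converges to the direction of greatest variance unless the start
    vector happens to be exactly perpendicular to it.
    """
    dims = len(cov)
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def project(row, means, component):
    """Coordinate of one item along the combined dimension."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

# Made-up keyword values per item: columns are ("ship", "ocean").
items = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
centered, means = center(items)
component = top_component(covariance(centered))
short = [project(row, means, component) for row in items]
```

Because "ship" and "ocean" rise and fall together in this data, the
combined dimension weights them equally, and each two-number item is
replaced by a single number that places it along a shared "seafaring"
axis.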

Now we use the efficient shorter vector as the basis for our
semantic space.

One great thing we can do with this semantic space is to find
the position of each of the symbols that CLEVER determined for
the web. CLEVER is great at finding the major topics of the
web, but it does not find the relationships between the topics.
Our semantic space fills in this important gap.

Now when a new item comes in, we analyse it by keyword and
whatever metadata was supplied with it to determine its position
in semantic space, and we then determine the nearest CLEVER
topic and assign it there.
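
Assigning a new item to its nearest topic is then a short search. This
is a minimal sketch that assumes each topic already has a position in
the reduced semantic space and that distance is Euclidean. The topic
names and coordinates are invented for the example.

```python
import math

def nearest_topic(position, topics):
    """Return the name of the topic whose position is closest to the given vector."""
    return min(topics, key=lambda name: math.dist(position, topics[name]))

# Hypothetical topic positions in a two-dimensional semantic space.
topics = {
    "sailing": [1.0, 0.0],
    "cooking": [0.0, 1.0],
}

new_item = [0.9, 0.1]   # position computed from the item's keywords and metadata
assigned = nearest_topic(new_item, topics)
```

A linear scan like this is fine for a handful of topics. With tens of
thousands of topics, some kind of spatial index would probably be
needed to keep lookups fast.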

There are so many faults and missing pieces to this analysis,
but for those who have followed in painstaking detail, I think
you will realize that it is the notion of a semantic metric
space that makes scfn doable. Not at all easy, but definitely
doable.








