LISTSERV mailing list manager LISTSERV 15.5

Help for DUBLINICA Archives





Subject: DublinICA@UCD Friday, Aug 19, 3:30-5PM, Eng234
From: Scott Rickard <[log in to unmask]>
Reply-To: "Dublin-Area Independent Components Analysis (ICA) Interest Group" <[log in to unmask]>
Date: Tue, 16 Aug 2005 13:10:13 +0100


The sixth DublinICA seminar will take place this Friday, the 19th of August,
and will feature Tom Melia and Susanna Still talking about source separation
and clustering. In keeping with tradition, visitors from Hawaiian
universities will be allowed to speak as long as they like (as long as it's
about an hour) - hence the 1.5-hour meeting.

The seminar will start at 3:30pm in Room 234 in the Engineering Building at
UCD. Transportation info can be found here:
(Engineering is building 21 on the above map). There will be
coffee/tea/cookies from 3:15. 


Tom Melia (UCD)

The DESPRIT Source Separation Algorithm


Susanna Still (Dept of Computer Science, University of Hawaii)


An information theoretic approach to clustering and complexity control.


I will give a brief introduction to clustering / unsupervised learning
within an information theoretic framework, and then I will discuss the
important problem of complexity control within this framework.

Clustering provides a common means of identifying structure in complex data,
and there is renewed interest in clustering as a tool for the analysis of
large data sets in many fields. A natural question is how many clusters are
appropriate for the description of a given system.

Traditional approaches to this problem are based on either a framework in
which clusters of a particular shape are assumed as a model of the system or
on a two-step procedure in which a clustering criterion determines the
optimal assignments for a given number of clusters and a separate criterion
measures the goodness of the classification to determine the number of clusters.
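As a concrete (and entirely illustrative) sketch of that two-step procedure, the snippet below clusters invented synthetic data with plain K-means for each candidate number of clusters, then applies a separate goodness criterion (here the mean silhouette) to choose among them. Neither the data nor the choice of criterion comes from the talk; they only stand in for "a clustering criterion" and "a separate criterion":

```python
import numpy as np

def kmeans(X, k, n_iter=50, restarts=5):
    """Plain Lloyd's k-means with random restarts (illustrative only)."""
    best = None
    for seed in range(restarts):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(n_iter):
            # step 1: assign each point to its nearest centre
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(1)
            # step 2: move each centre to the mean of its points
            new = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        sse = d2.min(1).sum()
        if best is None or sse < best[1]:
            best = (labels, sse)
    return best

def mean_silhouette(X, labels):
    """A separate 'goodness' criterion applied after clustering."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    n = len(X)
    s = np.empty(n)
    for i in range(n):
        own = labels == labels[i]
        own_other = own.copy()
        own_other[i] = False
        a = D[i, own_other].mean() if own_other.any() else 0.0
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# invented data: three well-separated synthetic clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2))
               for m in [(0, 0), (3, 0), (0, 3)]])
# step one: cluster for each candidate k; step two: score each result
scores = {k: mean_silhouette(X, kmeans(X, k)[0]) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
```

The point of the sketch is the separation of concerns the paragraph describes: the clustering criterion (here, sum of squared errors) fixes the assignments for each k, and a second, unrelated criterion picks the number of clusters.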

In a statistical mechanics approach, clustering can be seen as a trade-off
between energy- and entropy-like terms, with lower temperature driving the
proliferation of clusters to provide a more detailed description of the
data. For finite data sets, we expect that there is a limit to the
meaningful structure that can be resolved and therefore a minimum
temperature below which we begin to capture sampling noise. This suggests that
correcting the clustering criterion for the bias that arises due to sampling
errors will allow us to find a clustering solution at a temperature that is
optimal in the sense that we capture maximal meaningful structure, without
having to define an external criterion for the goodness or stability of the
clustering. We have shown that in a general information-theoretic framework,
the finite size of a data set determines an optimal temperature, and we have
introduced a method for finding the maximal number of clusters that can be
resolved from the data in the hard clustering limit. In my talk, for
simplicity, I will focus on this limit.
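The energy-entropy trade-off described above can be sketched with a generic deterministic-annealing-style soft clustering. This is an illustration of the general statistical-mechanics picture, not the speaker's algorithm, and the data and temperatures are invented: at high temperature the centres collapse onto the data mean (one effective cluster), while lowering the temperature drives them apart (cluster proliferation).

```python
import numpy as np

def soft_cluster(X, k, T, n_iter=200, seed=0):
    """Soft clustering at temperature T: p(c|x) ~ pi_c * exp(-||x-mu_c||^2 / T).
    A generic deterministic-annealing sketch, not the speaker's method."""
    rng = np.random.default_rng(seed)
    # start all centres near the data mean, slightly perturbed
    mu = X.mean(0) + 0.01 * rng.standard_normal((k, X.shape[1]))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        logp = np.log(pi) - d2 / T          # energy vs entropy trade-off
        logp -= logp.max(1, keepdims=True)  # stabilise before exponentiating
        p = np.exp(logp)
        p /= p.sum(1, keepdims=True)        # responsibilities p(c|x)
        pi = p.mean(0)                      # cluster weights
        mu = (p.T @ X) / p.sum(0)[:, None]  # weighted means
    return mu

# invented data: two well-separated 1-D groups at -2 and +2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.3, (50, 1)), rng.normal(2, 0.3, (50, 1))])
hot = soft_cluster(X, 2, T=50.0)   # high T: centres collapse onto the mean
cold = soft_cluster(X, 2, T=0.5)   # low T: centres split, one per group
```

At T=50 the two centres end up essentially on top of each other; at T=0.5 they separate toward the two groups, which is the "lower temperature driving the proliferation of clusters" behaviour in miniature.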

If there is remaining time, I will discuss how the very frequently used
K-means algorithm can be derived and understood from an information theoretic
perspective.
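The hard clustering limit can be checked numerically: as the temperature goes to zero, a Gibbs assignment p(c|x) proportional to exp(-d(x,c)/T) concentrates all its mass on the nearest centre, which is exactly the K-means assignment rule. The small sketch below uses invented points and centres, and is only a demonstration of that limit, not a derivation from the talk:

```python
import numpy as np

def soft_assign(d2, T):
    """Gibbs assignments p(c|x) ~ exp(-d2/T), computed stably in log space."""
    logp = -d2 / T
    logp -= logp.max(1, keepdims=True)
    p = np.exp(logp)
    return p / p.sum(1, keepdims=True)

# invented data and centres for the demonstration
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (20, 2))
centers = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)

hard = d2.argmin(1)             # hard (K-means) assignment rule
cold = soft_assign(d2, T=1e-3)  # near-zero temperature: nearly one-hot
warm = soft_assign(d2, T=10.0)  # high temperature: mass spread over clusters
```

At T near zero each row of the soft assignment is effectively one-hot and agrees with the argmin rule, while at high temperature the assignments stay soft.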

Relevant publications:

S. Still and W. Bialek (2004): "How many clusters? An information
theoretic perspective." Neural Computation, 16:2483-2506.

S. Still, W. Bialek and L. Bottou (2003): "Geometric Clustering using the
Information Bottleneck method." In Advances in Neural Information
Processing Systems 16.
