Data Transformation and Editing
We are generally dealing with a large set of objects sharing common
features. The challenge is to describe these objects with an
appropriate set of attributes. Objects have an infinite number of
properties, so the choice of attributes pertinent to the questions
being addressed is decisive. Often the best result is not obtained at
the first attempt, but at the end of a long series of trials and
errors, using an interactive and iterative feedback mechanism.
Dynamic DB Builder is designed to help the user in these time-consuming tasks.
Attribute Editor :
- When a database is built, it generally consists of a set of objects
described by a set of attributes taking a set of values (a so-called
multi-valued table). Dynamic DB Builder includes an Attribute Editor
that lets the user easily build and modify the whole set of
attribute-values (AV):
- to create new attributes and/or values
- to change their names
- to merge attributes
- to split attributes.
Modifications are instantly applied to the whole DB.
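The editing operations listed above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory representation (each object is a dictionary mapping attribute names to values); the function names are illustrative and are not the Dynamic DB Builder interface. Each operation loops over every object, which mirrors the idea that a modification applies to the whole DB.

```python
from typing import Dict

# Hypothetical representation: object name -> {attribute: value}.
DB = Dict[str, Dict[str, str]]

def rename_attribute(db: DB, old: str, new: str) -> None:
    """Rename an attribute in every object that uses it."""
    for av in db.values():
        if old in av:
            av[new] = av.pop(old)

def merge_attributes(db: DB, a: str, b: str, merged: str) -> None:
    """Merge attributes a and b into one; refuse if an object uses both."""
    for name, av in db.items():
        if a in av and b in av:
            raise ValueError(f"{name!r} uses both {a!r} and {b!r}")
    for av in db.values():
        for old in (a, b):
            if old in av:
                av[merged] = av.pop(old)

def split_attribute(db: DB, attr: str, target: Dict[str, str]) -> None:
    """Split attr: each value moves to the attribute named in target.
    Values absent from target stay under the original attribute."""
    for av in db.values():
        if attr in av:
            value = av.pop(attr)
            av[target.get(value, attr)] = value

db: DB = {
    "obj1": {"colour": "red", "len": "long"},
    "obj2": {"color": "blue", "len": "short"},
}
merge_attributes(db, "colour", "color", "colour")  # two spellings, one attribute
rename_attribute(db, "len", "length")
```

In this sketch, merging is refused when a single object uses both attributes, since the two values could not be kept in one cell of the table.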
Statistics and decision support :
DB Builder provides statistics about the use of
attributes and values. A report informs the user when two attributes
could be merged, i.e. when they are mutually
exclusive (when one is used, the other is not).
DB Builder also indicates the existence of
duplicates in the DB (i.e. objects that have
exactly the same set of AV) and gives a saturation index
for the DB (i.e. the number of combinations of AV actually used
relative to the theoretical number of combinations). This gives an
idea of how representative the sample is.
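As an illustration of these three indicators, here is a minimal sketch. It assumes the same hypothetical representation (object name mapped to an attribute-value dictionary). The way the theoretical number of combinations is computed is an assumption: the product of the number of distinct values observed for each attribute. The formula actually used by DB Builder is not given here.

```python
from itertools import combinations
from math import prod

def exclusive_pairs(db):
    """Pairs of attributes that are never used together in one object."""
    attrs = sorted({a for av in db.values() for a in av})
    return [(a, b) for a, b in combinations(attrs, 2)
            if not any(a in av and b in av for av in db.values())]

def duplicates(db):
    """Groups of objects sharing exactly the same set of attribute-values."""
    groups = {}
    for name, av in db.items():
        groups.setdefault(frozenset(av.items()), []).append(name)
    return [names for names in groups.values() if len(names) > 1]

def saturation_index(db):
    """Distinct AV combinations used / theoretical number of combinations.
    Assumption: theoretical = product of observed value counts per attribute."""
    values = {}
    for av in db.values():
        for a, v in av.items():
            values.setdefault(a, set()).add(v)
    used = {frozenset(av.items()) for av in db.values()}
    return len(used) / prod(len(vs) for vs in values.values())

db = {
    "o1": {"shape": "round", "size": "big"},
    "o2": {"shape": "round", "size": "small"},
    "o3": {"shape": "round", "size": "big"},
    "o4": {"shape": "square", "size": "big"},
}
print(duplicates(db))        # [['o1', 'o3']]
print(saturation_index(db))  # 0.75 (3 combinations used out of 2 x 2)
```

A value close to 1 would mean that almost every possible combination is represented in the sample; a low value means that large parts of the theoretical space are unobserved.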
Collector : Dynamic DB Builder has a procedure named Collector
which builds a table from the whole set of objects present in the DB.
This is a multi-valued table (AV-type). As such, multi-valued tables
are ready for RST and Decision logic.
Table conversions :
For other procedures, such as FCA (Formal Concept Analysis),
multi-valued tables must be converted into one-valued tables.
• This is generally achieved using nominal or
plain scaling (each value of a multi-valued attribute becomes
nominally a one-valued attribute).
• Logical scaling can also be used. It consists of
the combination of two (or more) attributes according to rules proposed
by the expert (see the example of the sleeping bags in S. Prediger 1997).
• Discretization: Quantitative measurements
(length, weight, notations, count of words, etc.) can be converted into
discrete values (called modalities) according to
conversion rules designed by the expert. Histograms can help with this.
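The three conversions above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the "attribute=value" column naming, and the threshold convention are all hypothetical choices, not the SEMANA implementation. The logical scaling rule shown is an arbitrary predicate, not Prediger's sleeping bag example.

```python
from bisect import bisect_right

def nominal_scale(db):
    """Nominal scaling: one binary column 'attr=value' per attribute value."""
    columns = sorted({f"{a}={v}" for av in db.values() for a, v in av.items()})
    table = {name: {c: False for c in columns} for name in db}
    for name, av in db.items():
        for a, v in av.items():
            table[name][f"{a}={v}"] = True
    return columns, table

def logical_scale(db, rules):
    """Logical scaling: one binary column per expert rule (a predicate)."""
    return {name: {label: bool(rule(av)) for label, rule in rules.items()}
            for name, av in db.items()}

def discretize(x, cuts, labels):
    """Map a number to a modality. cuts are sorted thresholds; a value equal
    to a threshold falls in the upper interval. Needs len(cuts) + 1 labels."""
    return labels[bisect_right(cuts, x)]

db = {
    "o1": {"shape": "round", "size": "big"},
    "o2": {"shape": "square", "size": "big"},
}
columns, table = nominal_scale(db)
print(columns)  # ['shape=round', 'shape=square', 'size=big']

rules = {"big and round": lambda av: av.get("size") == "big"
                                     and av.get("shape") == "round"}
print(logical_scale(db, rules))

print(discretize(12.0, [10, 20], ["short", "medium", "long"]))  # medium
```

The thresholds passed to `discretize` play the role of the expert's conversion rules; inspecting a histogram of the measurements is one way to pick them.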
Table reordering :
SEMANA implements a statistical test called the clustering index,
which indicates whether there is a trend toward clustering or toward
seriation (Renfrew and Sterud 1969). If there is a trend toward
seriation, the rows and columns of the table may be reorganized in
order to concentrate the positive values along the diagonal. The
following example is an ideal, theoretical case:
(after Caraux 1984)
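To make the idea of reordering concrete, here is a minimal sketch of a simple barycenter heuristic on a small binary table. It is an assumption chosen for illustration: it is neither the algorithm of Caraux (1984) nor the SEMANA implementation, and the function names are hypothetical. Rows are sorted by the mean position of their positive cells, then columns likewise, until the order stops changing.

```python
def barycenter(cells):
    """Mean position of the positive cells (placed last if there are none)."""
    pos = [i for i, c in enumerate(cells) if c]
    return sum(pos) / len(pos) if pos else float(len(cells))

def seriate(matrix, max_iter=50):
    """Return row and column orders that pull positive cells to the diagonal."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(max_iter):
        new_rows = sorted(
            rows, key=lambda r: barycenter([matrix[r][c] for c in cols]))
        new_cols = sorted(
            cols, key=lambda c: barycenter([matrix[r][c] for r in new_rows]))
        if new_rows == rows and new_cols == cols:
            break  # stable order reached
        rows, cols = new_rows, new_cols
    return rows, cols

# A scrambled version of an ideal "staircase" table.
matrix = [
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
]
rows, cols = seriate(matrix)
reordered = [[matrix[r][c] for c in cols] for r in rows]
for line in reordered:
    print(line)
# [1, 1, 0, 0, 0]
# [0, 1, 1, 0, 0]
# [0, 0, 1, 1, 0]
# [0, 0, 0, 1, 1]
```

This heuristic can stall in a non-optimal order when several rows or columns share the same barycenter, so on real data it should be read as a rough illustration of the principle only.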
References :
CARAUX G. (1984). Réorganisation et représentation
visuelle d'une matrice de données numériques: un
algorithme itératif. Revue de Statistique appliquée,
t. 32, n°4, pp. 5-23.
PREDIGER S. (1997). "Logical
scaling in Formal Concept Analysis". In Conceptual
structures: fulfilling Peirce's dream
(D. Lukose et al. eds.) Proceedings of the 5th Internat. Conf. on
conceptual structures (ICCS'97). Lecture notes in Artificial
Intelligence n°1257, Springer-Verlag: Berlin, pp. 332-341.
RENFREW C., G. STERUD (1969). Close-Proximity Analysis: A Rapid Method
for the Ordering of Archaeological Materials. American
Antiquity, Vol. 34, No. 3, pp. 265-277.
DEMSAR J., B. ZUPAN (2005). "From
Experimental Machine Learning to Interactive Data Mining" (white paper), Slovenia.
On conceptual modeling of data mining,
in: Wang, J., Zhou, Z.H., and Zhou, A.Y. (Eds.),
Machine Learning and Applications,
Tsinghua University Press, Beijing, pp. 238-255, 2006.