STATISTICAL ANALYSIS


     SEMANA contains an Editor that lets the user prepare tables directly, but tables may also be imported from a spreadsheet such as Excel® or obtained through the Collector associated with Dynamic DB Builder.
     Multivalued tables containing symbolic values are converted into one-valued tables called contingency tables. In turn, one-valued tables may be converted into Burt's tables (tables of co-occurrences). These tables are particularly useful for studying the dependence and clustering of attributes.
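
The two conversions can be sketched in a few lines of Python. This is only an illustration, not SEMANA's own code: it assumes that the one-valued table is a 0/1 indicator table with one column per "attribute=value" pair, and the function names and sample objects are hypothetical.

```python
def to_one_valued(objects):
    """Expand a multivalued symbolic table into a 0/1 table.

    objects: list of dicts mapping an attribute name to a set of symbolic values.
    Returns (columns, table), one column per "attribute=value" label (assumed encoding).
    """
    labels_per_object = [
        {f"{attr}={value}" for attr, values in obj.items() for value in values}
        for obj in objects
    ]
    columns = sorted(set().union(*labels_per_object))
    table = [[1 if col in labels else 0 for col in columns]
             for labels in labels_per_object]
    return columns, table


def burt_table(table):
    """Co-occurrence counts between columns: entry (i, j) is the number of
    objects that possess both column i and column j."""
    n_cols = len(table[0])
    return [[sum(row[i] * row[j] for row in table) for j in range(n_cols)]
            for i in range(n_cols)]


# Hypothetical sample data: three objects described by two attributes.
objects = [
    {"color": {"red"}, "shape": {"round"}},
    {"color": {"red", "green"}, "shape": {"square"}},
    {"color": {"green"}, "shape": {"round"}},
]
columns, table = to_one_valued(objects)
burt = burt_table(table)

for label, counts in zip(columns, burt):
    print(label, counts)
```

The diagonal of the Burt table holds the frequency of each value, and the off-diagonal cells show how often two values occur together, which is what makes the table useful for studying dependence between attributes.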

      The most powerful technique offered by SEMANA is Correspondence Factor Analysis (CFA) coupled with Hierarchical Ascending Classification (HAC), based on programs written by J.-P. Benzécri and coworkers in the 1970s.

      The report gives the eigenvalues (inertia of the axes), the projections of each object and attribute onto the first 4 axes, the contribution of each axis to the definition of each point, and the contribution of each point to the definition of each axis. Projections onto planes [1,2] and [1,3] are displayed by default, but any other plane may be represented. The HAC is also displayed, and classes may be colored to aid visualization.
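
To make the notion of "inertia of the axes" concrete, here is a minimal Python sketch of how the first eigenvalue of a correspondence analysis can be obtained from a contingency table. It is not Benzécri's program nor SEMANA's implementation: it uses the standard matrix of standardized residuals and a plain power iteration, and the function names and the sample table are assumptions chosen for illustration.

```python
import math


def standardized_residuals(table):
    """S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), where p is the table
    divided by its grand total and r, c are the row and column masses."""
    total = sum(sum(row) for row in table)
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(row[j] for row in table) / total
                for j in range(len(table[0]))]
    return [[(table[i][j] / total - row_mass[i] * col_mass[j])
             / math.sqrt(row_mass[i] * col_mass[j])
             for j in range(len(col_mass))]
            for i in range(len(row_mass))]


def total_inertia(table):
    """Sum of squared residuals; equals the sum of all eigenvalues."""
    return sum(s * s for row in standardized_residuals(table) for s in row)


def first_eigenvalue(table, iterations=500):
    """Inertia of the first axis, by power iteration on S^T S."""
    s = standardized_residuals(table)
    n_rows, n_cols = len(s), len(s[0])
    m = [[sum(s[i][a] * s[i][b] for i in range(n_rows))
          for b in range(n_cols)] for a in range(n_cols)]
    v = [float(j + 1) for j in range(n_cols)]  # arbitrary start vector
    for _ in range(iterations):
        w = [sum(m[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # rows and columns are independent
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    return sum(v[a] * sum(m[a][b] * v[b] for b in range(n_cols))
               for a in range(n_cols))


# Hypothetical 3 x 3 contingency table.
counts = [[20, 5, 5], [5, 20, 5], [5, 5, 20]]
lambda_1 = first_eigenvalue(counts)
inertia = total_inertia(counts)
print(f"first axis: {lambda_1:.4f} of total inertia {inertia:.4f}")
```

The ratio of an eigenvalue to the total inertia is the share of the table's structure that the corresponding axis accounts for, which is why the report lists the eigenvalues first.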

 Beginners in statistical data analysis looking for a good introduction to Factor Analysis may consult http://www.micheloud.com/FXM/COR/e/index.htm



Other statistical tools are available in SEMANA:
 K-Means : an algorithm that partitions a set of n objects defined by attributes into k clusters (k < n, chosen by the user). See the K-means algorithm article in Wikipedia.
 Correlation matrix : for each pair of attributes, a reduced deviation test ("écart-réduit") is performed and expressed as a number between -100 and +100, whose sign follows that of the correlation (+100 means a perfect correlation; 0 means independence; -100 means a perfect inverse correlation).
 Distance matrix : the matrix of distances between objects or between attributes is calculated according to various metrics : Euclidean distance, Chi2, Jaccard, Sokal and Michener (or Hamming distance). The results are displayed in cross tables and as ordered lists. The procedure applies to one-valued tables.
 Feature matching : a similarity index calculated according to Tversky's model is given for each pair of objects. The procedure applies to multi-valued tables (see Zhao et al. 2006).
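
As an illustration of the last two tools, the following Python sketch computes two of the listed distances on 0/1 rows and a Tversky similarity on feature sets. It is a sketch under stated assumptions, not SEMANA's code: the Sokal and Michener measure is taken here as the normalized Hamming distance, the Tversky index uses the usual ratio form, and the weights alpha and beta, the function names and the sample data are all hypothetical.

```python
def jaccard_distance(x, y):
    """1 - (shared presences) / (presences in at least one of the two rows)."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def simple_matching_distance(x, y):
    """Proportion of positions where the rows differ (Hamming distance
    divided by the row length, i.e. 1 - simple matching coefficient)."""
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return mismatches / len(x)


def tversky_index(features_a, features_b, alpha=0.5, beta=0.5):
    """Ratio-model similarity: common / (common + alpha*only_a + beta*only_b).
    alpha and beta are assumed weights, not values documented for SEMANA."""
    common = len(features_a & features_b)
    only_a = len(features_a - features_b)
    only_b = len(features_b - features_a)
    denominator = common + alpha * only_a + beta * only_b
    return 0.0 if denominator == 0 else common / denominator


# Two rows of a one-valued (0/1) table.
row_1 = [1, 1, 0, 0]
row_2 = [1, 0, 1, 0]
print(jaccard_distance(row_1, row_2))
print(simple_matching_distance(row_1, row_2))

# Two objects of a multi-valued table, seen as sets of features.
object_a = {"red", "round", "small"}
object_b = {"red", "square", "small"}
print(tversky_index(object_a, object_b))
```

With alpha = beta = 1 the Tversky index reduces to the Jaccard similarity, while unequal weights make the comparison asymmetric, so the similarity of A to B may differ from that of B to A.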



References

BENZECRI J.-P. (1984). L'analyse des données. Vol. 1 : La Taxinomie ;
     Vol. 2 : L'Analyse des Correspondances. Ed. Dunod, Paris, 4ème éd.
JAMBU M. (1978). Classification automatique pour l'analyse des données.
     Vol. 1 : Méthodes et algorithmes ; vol. 2 : Logiciels (avec M.-O. Lebeaux). Ed. Dunod, Paris.
FENELON J.-P. (1981). Qu'est-ce que l'analyse des données ? Ed. Lefonen, Paris.
ZHAO Y., WANG X., HALANG W. (2006). Ontology Mapping based on Rough Formal Concept Analysis.
     Proceedings of the Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services (AICT/ICIW 2006).

