Causality Workbench                                                             Challenges in Machine Learning

Unsupervised and Transfer Learning Challenge

Fact sheet

 

Team name: NG-A3
Team leader:
First Name: Nistor
Last Name: Grozavu
Website: http://www-lipn.univ-paris13.fr/~grozavu/
Institution: LIPN
Country: France
Photo:
 
Phase 1 experiment: Exp1TL
Phase 2 experiment: transfer7 (phase 2a)
abcnga3 (phase 2b)
 
Title of the contribution:

Unsupervised Weighted Topological Learning for Feature Transformation

 
Background:

The model combines several unsupervised approaches: normalization, feature selection, topological learning, and transfer or collaborative learning.
For the topological learning we use an adaptation of the Self-Organizing Map, called lwo-SOM, which detects and weights the relevant features during the learning process.
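For illustration only, the sketch below shows a SOM whose cells carry per-feature relevance weights, in the spirit of lwo-SOM; the function name, grid parameters and, in particular, the weight-update rule are assumptions of the sketch, not the published algorithm (see the references at the end of this sheet for the exact formulation).

    import numpy as np

    def weighted_som(X, grid_w=5, grid_h=5, n_epochs=20, lr0=0.5, sigma0=2.0, seed=0):
        """Online SOM with per-cell feature weights (illustrative, not the published lwo-SOM)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        n_cells = grid_w * grid_h
        # prototype vectors, initialized from random samples, and uniform per-cell feature weights
        W = X[rng.choice(n, n_cells, replace=False)].copy()
        alpha = np.full((n_cells, d), 1.0 / d)
        # 2-D coordinates of each cell on the map, used by the neighborhood kernel
        coords = np.array([(i % grid_w, i // grid_w) for i in range(n_cells)], float)
        for t in range(n_epochs):
            lr = lr0 * (1 - t / n_epochs)
            sigma = sigma0 * (1 - t / n_epochs) + 0.5
            for x in X[rng.permutation(n)]:
                # best matching unit under the feature-weighted distance
                dist = np.sum(alpha * (W - x) ** 2, axis=1)
                bmu = np.argmin(dist)
                # Gaussian neighborhood around the BMU on the grid
                h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
                W += lr * h[:, None] * (x - W)
                # push weights toward features with small local reconstruction error
                err = (W - x) ** 2
                alpha += lr * h[:, None] * (np.exp(-err) - alpha)
                alpha /= alpha.sum(axis=1, keepdims=True)  # keep each cell's weights normalized
        return W, alpha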

Before learning from the data, we introduce a preprocessing step: the dataset is normalized using the variance normalization technique and then reduced using Principal Component Analysis, or Singular Value Decomposition for the sparse datasets.
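A minimal sketch of this preprocessing step, assuming NumPy/SciPy and scikit-learn are available; the helper name and the 50 retained components are placeholders, not the values tuned for the challenge.

    import numpy as np
    from scipy import sparse
    from sklearn.decomposition import PCA, TruncatedSVD

    def preprocess(X, n_components=50):
        """Variance-normalize the features, then reduce with PCA (dense) or SVD (sparse)."""
        if sparse.issparse(X):
            # scale each column to unit variance without centering (centering would densify),
            # then apply truncated SVD, which accepts sparse input directly
            mean = np.asarray(X.mean(axis=0)).ravel()
            mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
            std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
            std[std == 0] = 1.0
            Xs = sparse.csr_matrix(X.multiply(1.0 / std))
            return TruncatedSVD(n_components=n_components).fit_transform(Xs)
        std = X.std(axis=0)
        std[std == 0] = 1.0
        return PCA(n_components=n_components).fit_transform(X / std)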

For the unsupervised learning task, we construct new datasets containing the distances between the samples of the validation and final datasets and the prototypes of the SOM maps.
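Concretely, each sample is re-represented by its vector of distances to all map prototypes, so the new dataset has one feature per map cell. A minimal sketch (the helper name map_representation is hypothetical):

    import numpy as np

    def map_representation(X, prototypes):
        """Represent each sample by its Euclidean distances to all SOM prototypes.

        The output (n_samples, n_cells) matrix is the new feature set that is
        handed to the challenge's linear classifier.
        """
        # squared distances via the expansion ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2
        d2 = (np.sum(X ** 2, axis=1)[:, None]
              - 2.0 * (X @ prototypes.T)
              + np.sum(prototypes ** 2, axis=1)[None, :])
        return np.sqrt(np.maximum(d2, 0.0))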

To transfer the knowledge, we propose a new methodology: the map is labeled using the transfer labels, and after labeling we prune the map by eliminating the cells that contain no labeled data.
After pruning the map, we assign new data by computing the Euclidean distance between the validation and final datasets and the pruned map.
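A minimal sketch of this labeling, pruning and assignment pipeline; it assumes integer class labels and labels each cell by majority vote, a detail this sheet does not spell out:

    import numpy as np

    def transfer_prune_assign(prototypes, X_dev, y_dev, X_new):
        """Label the map with transfer labels, prune unlabeled cells, assign new data."""
        # 1) find the best-matching cell for every labeled development point
        d_dev = ((X_dev[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
        bmu_dev = d_dev.argmin(axis=1)
        # 2) keep only the cells that captured at least one labeled point
        kept = np.unique(bmu_dev)
        pruned = prototypes[kept]
        # 3) label each surviving cell by majority vote (assumes non-negative integer labels)
        cell_label = np.array([np.bincount(y_dev[bmu_dev == c]).argmax() for c in kept])
        # 4) assign validation/final data to the nearest surviving cell (Euclidean distance)
        d_new = ((X_new[:, None, :] - pruned[None, :, :]) ** 2).sum(axis=-1)
        return pruned, cell_label[d_new.argmin(axis=1)]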
 
Results:
PHASE 1: UNSUPERVISED LEARNING
Dataset     Validation AUC   Validation ALC   Final AUC   Final ALC   Rank
avicenna    0.658561         0.149326         0.701728    0.182106       9
harry       0.978586         0.794511         0.961722    0.709893       5
rita        0.707198         0.284878         0.786303    0.489439       2
sylvester   0.937103         0.606385         0.825077    0.44926        7
terry       0.990022         0.780955         0.994574    0.808953       5

PHASE 2a: TRANSFER LEARNING (Official ranking)
Dataset     Validation AUC   Validation ALC   Final AUC   Final ALC   Rank
avicenna    0.627927         0.117747         0.631197    0.112783       6
harry       0.687074         0.221498         0.589041    0.0950054      6
rita        0.707523         0.259007         0.759892    0.363303       5
sylvester   0.936743         0.606771         0.624744    0.126217       6
terry       0.983234         0.739909         0.888154    0.566029       5

* The organizers detected that team 1055A mistakenly submitted their results on the validation set instead of those on the final evaluation set for the dataset Sylvester. The team was allowed to re-submit their results on that dataset, and those are shown in the table. Without this correction, team 1055A ranks 3rd, and this is the official ranking (with the correction they rank 2nd, ex aequo with tkgw).


PHASE 2b: (Supplemental ranking)
Dataset     Validation AUC   Validation ALC   Final AUC   Final ALC   Rank
avicenna    0.637932         0.130236         0.623894    0.105119       7
harry       0.978586         0.794511         0.961722    0.709893       4
rita        0.707523         0.259007         0.759892    0.363303       7
sylvester   0.936743         0.606771         0.624744    0.126217       8
terry       0.983234         0.739909         0.888154    0.566029       6

** Due to an accidental release of the results on the final evaluation set at the scheduled deadline of phase 2, the planned grace period was canceled. However, the participants were permitted to make one last submission.

Method:
Algorithm Phase 1 Phase 2
Preprocessing with no learning at all: Did you use...
P1 Normalization of data matrix lines (patterns)?
P2 Normalization of data matrix columns (features)?
P3 Construction of new features (e.g. products of original features)?
P4 Functional data transformations (e.g. take log or sqrt)?
P5 Feature orthogonalization?
P6 Another preprocessing with no learning at all?
Unsupervised learning: Did you use...
U1 Linear manifold learning (e.g. factor analysis, PCA, ICA)?
U2 "Shallow" non-linear manifold learning for dimensionality reduction (e.g. KPCA, MDS, LLE, Laplacian Eigenmaps, Kohonen maps)?
U3 "Shallow" non-linear manifold learning to expand dimension (e.g. sparse coding)?
U4 Clustering (e.g. K-means, hierarchical clustering)?
U5 Deep Learning (e.g. stacks of auto-encoders, stacks of RBMs)?
U6 Another unsupervised learning method?
Transfer learning: Did you...
T1 - Not use the transfer labels at all?
T2 - Use the transfer labels for selection of unsupervised learning methods, not for training?
T3 - Use only a subset of the available transfer labels (i.e. select the tasks that are most suitable for transfer)?
T4 - Learn a "shallow" representation with the transfer labels?
T5 - Learn a "deep" representation with the transfer labels?
T6 - Use transfer learning in another way?
Feature selection: Did you...
F1 Not perform any feature selection?
F2 Use a feature selection mechanism embedded in your algorithm?
F3 Use a filter method not taking into account the prediction performances of the classifier (e.g. use reconstruction error)?
F4 Use a wrapper method to select features based on the performance of the classifier (e.g. use the validation set results)?
Kernel (or metric) learning: Did you...
K1 - Learn parameters in a "shallow" architecture (e.g. kernel width, NCA)?
K2 - Learn parameters in a "deep" architecture (e.g. a Siamese neural network)?
Ensemble methods: Did you...
E1 - Concatenate multiple representations?
E2 - Average several kernels?
Model selection: Did you...
M1 - Submit results with the same algorithm on all datasets (possibly with some hyperparameter tuning)?
M2 - Select the model performing best on the validation set?
M3 - Use cross-validation on development data?
Induction/Transduction: To prepare the final results did you...
I1 - Use the development dataset for training?
I2 - Use the validation dataset for training?
I3 - Use the final evaluation dataset for training?
Classifier: Did you...
C1 - Make specific changes to your algorithm knowing that it would be evaluated with a linear classifier?
C2 - Take into account the specific type of linear classifier algorithm we are using?
Advantages of the methods employed:
  • Quantitative advantages


    The model constructs a compact feature representation (the size of the map), and the methods used are computationally simple.
  • Qualitative advantages


    We used the same model for every dataset, with dataset-specific parameters.
    Since topological learning is used, the results are easy to visualize.
  • Other methods

    PCA
    MDS,
    SVD
    K-means
    Normalization (variance, logarithmic, range)
    Self-Organizing Map
    Scree-test feature selection (see the sketch after this list)
    Observation weighting

    The critical element of success was the use of topological learning to construct a new feature representation.
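    The scree test listed above can be automated in several ways; below is a minimal sketch of the "largest consecutive eigenvalue drop" heuristic, one common reading of Cattell's scree test and not necessarily the exact variant used here. Applied to the eigenvalue spectrum of the PCA step, it picks how many components to keep.

        import numpy as np

        def scree_cutoff(eigenvalues):
            """Return the number of components to keep at the 'elbow' of the scree curve."""
            ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending eigenvalues
            drops = ev[:-1] - ev[1:]          # consecutive drops along the curve
            return int(np.argmax(drops)) + 1  # keep everything up to the largest drop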
  • Software implementation
     
    • Availability
      Proprietary in house software
      Commercially available in house software
      Freeware or shareware in house software
      Off-the-shelf third party commercial software
      Off-the-shelf third party freeware or shareware
    • Language
      C/C++
      Java
      Matlab
      R
      Weka
      SAS/SPSS
      Python
      Other (specify below)

  • Hardware implementation
     
    • Platform
      Windows
      Linux or other Unix
      Mac OS
      Other (specify below)
    • Memory
      <= 2 GB    <= 8 GB    > 8 GB    >= 32 GB
    • Parallelism
      Multi-processor machine
      Run in parallel different algorithms on different machines
      Other (specify below)
  • Development effort:

    How much time did you spend customizing your existing code (total human effort)?
    A few hours    A few days    1-2 weeks    >2 weeks   

    How much time did you spend experimenting with the validation datasets (total computer time effort)?
    A few hours    A few days    1-2 weeks    >2 weeks   

    Did you get enough development time?
    Yes    No

References:

GROZAVU N., BENNANI Y., LEBBAH M. (2010), "Cluster-dependent feature selection through a weighting learning paradigm", in Advances in Knowledge Discovery and Management, F. Guillet, G. Ritschard, D. Zighed and H. Briand (eds), Vol. 292, pp. 133-147, Springer. ISBN: 978-3-642-00579-4, DOI: 10.1007/978-3-642-00580-0. Invited paper.

GROZAVU N., BENNANI Y. (2010), "Topological Collaborative Clustering", in Proc. ICONIP'10, 17th International Conference on Neural Information Processing, LNCS, Springer, 22-25 November 2010, Sydney, Australia.

GROZAVU N., BENNANI Y., LEBBAH M. (2009), "From variable weighting to cluster characterization in topographic unsupervised learning", in Proc. IJCNN'09, International Joint Conference on Neural Networks, 14-19 June 2009, Atlanta, Georgia, USA.
Figures: