Classification module documentation
- 1 Introduction
- 2 JCLEC classification module
- 2.1 net.sf.jclec.problem.classification
- 2.2 net.sf.jclec.problem.classification.base
- 2.3 net.sf.jclec.problem.classification.blocks
- 2.4 net.sf.jclec.problem.classification.blocks.fuzzy
- 2.5 net.sf.jclec.problem.classification.crisprule
- 2.6 net.sf.jclec.problem.classification.fuzzyrule
- 2.7 net.sf.jclec.problem.classification.exprtree
- 2.8 net.sf.jclec.problem.classification.multiexprtree
- 2.9 net.sf.jclec.problem.classification.multisyntaxtree
- 2.10 net.sf.jclec.problem.classification.syntaxtree
- 2.11 net.sf.jclec.problem.classification.listener
- 3 API Reference
- 4 Download
- 5 Running a classification algorithm
- 6 Implementing new algorithms
- 7 Running Tests
JCLEC-Classification is an intuitive and extensible open source module for building classification algorithms based on genetic programming (GP) and grammar-guided genetic programming (G3P), aimed at both researchers and end users. It houses implementations of rule-based GP classification methods, supports multiple model representations, and gives users the tools to implement new classifiers. The library is a module for JCLEC, a software system for Evolutionary Computation (EC) research written in the Java programming language. JCLEC provides a high-level software environment for developing any kind of Evolutionary Algorithm (EA), with support for genetic algorithms (binary, integer and real encoding), genetic programming (Koza style, strongly typed, and grammar based) and evolutionary programming.
The classification module includes some GP and G3P proposals described in the literature, and provides the necessary classes and methods to develop any kind of evolutionary algorithm for easily solving classification problems.
JCLEC classification module
Like the JCLEC core, the JCLEC classification module is organized in packages. This section describes its main packages. For further information, please visit the API reference. The next figure shows the class diagram for the JCLEC classification module.
net.sf.jclec.problem.classification: this package contains all the interfaces of the JCLEC classification module. These interfaces represent classifiers and individuals.
net.sf.jclec.problem.classification.base: this base package provides the abstract classes with the properties and methods that any classification algorithm must contain.
net.sf.jclec.problem.classification.blocks: this package contains implementations of the primitive functions that can be used in an expression tree node.
net.sf.jclec.problem.classification.blocks.fuzzy: this package contains implementations of the fuzzy primitive functions that can be used in an expression tree node.
net.sf.jclec.problem.classification.crisprule: this package contains the classes that represent the phenotype of a crisp rule-base individual.
net.sf.jclec.problem.classification.fuzzyrule: this package contains the classes that represent the phenotype of a fuzzy rule-base individual.
net.sf.jclec.problem.classification.exprtree: this package defines the classes needed to implement individuals with a genetic programming encoding.
net.sf.jclec.problem.classification.multiexprtree: this package defines the classes needed to implement multiple individuals with a genetic programming encoding.
net.sf.jclec.problem.classification.multisyntaxtree: this package defines the classes needed to implement multiple individuals with a grammar-guided genetic programming encoding.
net.sf.jclec.problem.classification.syntaxtree: this package defines the classes needed to implement individuals with a grammar-guided genetic programming encoding.
net.sf.jclec.problem.classification.listener: this package defines the listener used to obtain reports in each generation.
For further information, please visit the API reference.
The JCLEC core and the classification module can be obtained as follows:
- Download jar and source files from SourceForge.net
- Download source files from SVN. View details.
- Download source files from Git. View details.
- Download source files from CVS. View details.
- Download source files and import as project in Eclipse. View details.
- Download source files and import as project in NetBeans. View details.
A tutorial for the JCLEC classification module is also available for download.
Running a classification algorithm
This section describes how to encode the configuration file required to run the algorithm in the JCLEC classification module.
How to encode the configuration file
The configuration file comprises the parameters required to run the algorithm. For further details, see the configuration page. The configuration file for the Tan algorithm is shown below.
<experiment>
  <process algorithm-type="net.sf.jclec.problem.classification.algorithm.tan.TanAlgorithm">
    <rand-gen-factory type="net.sf.jclec.util.random.RanecuFactory" seed="123456789"/>
    <population-size>100</population-size>
    <max-of-generations>100</max-of-generations>
    <max-deriv-size>20</max-deriv-size>
    <dataset type="net.sf.jclec.problem.util.dataset.KeelDataSet">
      <train-data>data/iris/iris-10-1tra.dat</train-data>
      <test-data>data/iris/iris-10-1tst.dat</test-data>
      <attribute-class-name>Class</attribute-class-name>
    </dataset>
    <w1>0.7</w1>
    <w2>0.8</w2>
    <recombination-prob>0.8</recombination-prob>
    <mutation-prob>0.1</mutation-prob>
    <copy-prob>0.01</copy-prob>
    <support>0.1</support>
    <elitist-prob>0.1</elitist-prob>
    <listener type="net.sf.jclec.problem.classification.listener.RuleBaseReporter">
      <report-dir-name>reports/reportTan</report-dir-name>
      <global-report-name>summaryTan</global-report-name>
      <report-frequency>10</report-frequency>
    </listener>
  </process>
</experiment>
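Before launching a long experiment it can be useful to check that the configuration file is well-formed and that its values are plausible. The following is a minimal sketch using only the JDK's DOM parser. The class ConfigPeek and its methods are hypothetical helpers written for this page, not part of JCLEC, and the check that probabilities lie in [0, 1] is an assumption about sensible values rather than a rule taken from the JCLEC documentation. The embedded XML is a shortened excerpt of the Tan configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of JCLEC): inspects an experiment file
// with the DOM parser that ships with the JDK.
final class ConfigPeek {

    // Parses XML text; checked parser exceptions are wrapped so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Configuration is not well-formed XML", e);
        }
    }

    // Trimmed text of the first element with this tag name, or null if absent.
    static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // True when the element exists and holds a number in [0, 1] (assumed range).
    static boolean isProbability(Document doc, String tag) {
        String value = text(doc, tag);
        if (value == null) {
            return false;
        }
        try {
            double p = Double.parseDouble(value);
            return p >= 0.0 && p <= 1.0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened excerpt of the Tan configuration shown above.
        String xml =
              "<experiment>"
            + "<process algorithm-type=\"net.sf.jclec.problem.classification.algorithm.tan.TanAlgorithm\">"
            + "<population-size>100</population-size>"
            + "<max-of-generations>100</max-of-generations>"
            + "<recombination-prob>0.8</recombination-prob>"
            + "<mutation-prob>0.1</mutation-prob>"
            + "<copy-prob>0.01</copy-prob>"
            + "</process>"
            + "</experiment>";

        Document doc = parse(xml);
        Element process = (Element) doc.getElementsByTagName("process").item(0);
        System.out.println("algorithm-type  = " + process.getAttribute("algorithm-type"));
        System.out.println("population-size = " + text(doc, "population-size"));
        for (String tag : new String[] {"recombination-prob", "mutation-prob", "copy-prob"}) {
            System.out.println(tag + " in [0, 1]: " + isProbability(doc, tag));
        }
    }
}
```

To check a real file instead of the embedded excerpt, the file contents can be read into a string and passed to the same parse method.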
How to execute the Tan et al. algorithm
Once we have downloaded JCLEC, the JCLEC classification module and its examples, and have designed our experiment in the configuration file, we can execute the experiment.
Using JAR file
You can execute JCLEC modules using a JAR file.
java -jar jclec4-classification.jar examples/Tan.cfg
Using Eclipse
You can also execute JCLEC modules using Eclipse.
Run -> Run Configurations
Create a new launch configuration as Java Application
Main class: net.sf.jclec.RunExperiment
Program arguments: examples/Tan.cfg
Finally, we execute our algorithm by clicking on the Run button.
Tan et al. algorithm results
File name: data/iris/iris-10-1tst.dat
Runtime (s): 4.407
Number of different attributes: 4
Number of rules: 4
Number of conditions: 8
Average number of conditions per rule: 2,0
Accuracy (Percentage of correct predictions): 0,9333
Geometric mean: 0,9283
Cohen's Kappa rate: 0,9000
AUC: 0,9667
Percentage of correct predictions per class
  Class Iris-setosa: 100,00%
  Class Iris-versicolor: 100,00%
  Class Iris-virginica: 80,00%
End percentage of correct predictions per class
Classifier
  1 Rule: IF (AND NOT AND AND IN SepalLength 5.521539 7.4334537 < PetalWidth 1.579072 > PetalWidth 1.275671 < PetalWidth 0.769985 ) THEN (Class = Iris-setosa)
  2 Rule: ELSE IF (AND IN PetalWidth 0.582293 1.815772 IN PetalWidth 0.190182 1.7987844 ) THEN (Class = Iris-versicolor)
  3 Rule: ELSE IF (>= PetalLength 4.755571 ) THEN (Class = Iris-virginica)
  4 Rule: ELSE (Class = Iris-setosa)
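The rule conditions in the classifier are printed in prefix notation, so each operator comes before its operands. The sketch below shows one way to read them: it evaluates the conditions of the classifier listed above against a single instance and applies the rules in order, as an IF / ELSE IF / ELSE list. The class PrefixRule is a hypothetical illustration, not JCLEC code. It assumes that AND takes two operands, NOT takes one, IN takes an attribute followed by a lower and an upper bound, and that the IN bounds are inclusive. The three sample instances are typical Iris measurements chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Map;

// Hypothetical illustration (not JCLEC code): reads prefix-notation rule conditions.
final class PrefixRule {

    // Conditions and classes copied from the classifier output, in rule order.
    static final String[] CONDITIONS = {
        "AND NOT AND AND IN SepalLength 5.521539 7.4334537 < PetalWidth 1.579072"
            + " > PetalWidth 1.275671 < PetalWidth 0.769985",
        "AND IN PetalWidth 0.582293 1.815772 IN PetalWidth 0.190182 1.7987844",
        ">= PetalLength 4.755571"
    };
    static final String[] CLASSES = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"};
    static final String DEFAULT_CLASS = "Iris-setosa"; // the final ELSE rule

    // Evaluates one condition against an instance (attribute name -> value).
    static boolean matches(String condition, Map<String, Double> instance) {
        Deque<String> tokens = new ArrayDeque<>(Arrays.asList(condition.trim().split("\\s+")));
        boolean result = eval(tokens, instance);
        if (!tokens.isEmpty()) {
            throw new IllegalArgumentException("Unparsed tokens: " + tokens);
        }
        return result;
    }

    // Recursive descent: consume the operator, then exactly as many operands as it needs.
    private static boolean eval(Deque<String> t, Map<String, Double> x) {
        String op = t.removeFirst();
        switch (op) {
            case "AND": {
                // Evaluate both sides first so that all tokens are consumed.
                boolean left = eval(t, x);
                boolean right = eval(t, x);
                return left && right;
            }
            case "NOT":
                return !eval(t, x);
            case "IN": {
                double v = attribute(t, x);
                double low = Double.parseDouble(t.removeFirst());
                double high = Double.parseDouble(t.removeFirst());
                return low <= v && v <= high; // inclusive bounds are an assumption
            }
            case "<":
                return attribute(t, x) < Double.parseDouble(t.removeFirst());
            case ">":
                return attribute(t, x) > Double.parseDouble(t.removeFirst());
            case ">=":
                return attribute(t, x) >= Double.parseDouble(t.removeFirst());
            default:
                throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    private static double attribute(Deque<String> t, Map<String, Double> x) {
        String name = t.removeFirst();
        Double value = x.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing attribute: " + name);
        }
        return value;
    }

    // Applies the rules in order; the first matching rule decides the class.
    static String classify(Map<String, Double> instance) {
        for (int i = 0; i < CONDITIONS.length; i++) {
            if (matches(CONDITIONS[i], instance)) {
                return CLASSES[i];
            }
        }
        return DEFAULT_CLASS;
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of("SepalLength", 5.1, "PetalLength", 1.4, "PetalWidth", 0.2)));
        System.out.println(classify(Map.of("SepalLength", 7.0, "PetalLength", 4.7, "PetalWidth", 1.4)));
        System.out.println(classify(Map.of("SepalLength", 6.3, "PetalLength", 6.0, "PetalWidth", 2.5)));
    }
}
```

Reading rule 1 this way gives NOT (SepalLength in range AND PetalWidth < 1.579072 AND PetalWidth > 1.275671) AND PetalWidth < 0.769985, which is why a small petal width alone is enough to select Iris-setosa.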
Test Classification Confusion Matrix
            Predicted
            C0   C1   C2   |
Actual  C0   5    0    0   |  C0 = Iris-setosa
        C1   0    5    0   |  C1 = Iris-versicolor
        C2   1    0    4   |  C2 = Iris-virginica
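Several of the summary figures in the report can be cross-checked by hand from the confusion matrix. The sketch below assumes the usual textbook definitions: accuracy as correct predictions over all predictions, the geometric mean taken over the per-class percentages of correct predictions, and Cohen's kappa computed from observed and chance agreement. Whether JCLEC computes them in exactly this way is an assumption, although these definitions do reproduce the reported 0,9333, 0,9283 and 0,9000. The class ConfusionMetrics is a hypothetical helper, and AUC is not covered.

```java
import java.util.Locale;

// Hypothetical helper (not JCLEC code): metrics from a confusion matrix
// whose rows are actual classes and whose columns are predicted classes.
// Assumes every class has at least one actual instance.
final class ConfusionMetrics {

    static int total(int[][] m) {
        int n = 0;
        for (int[] row : m) {
            for (int cell : row) {
                n += cell;
            }
        }
        return n;
    }

    // Fraction of instances on the diagonal.
    static double accuracy(int[][] m) {
        int correct = 0;
        for (int i = 0; i < m.length; i++) {
            correct += m[i][i];
        }
        return (double) correct / total(m);
    }

    // Per-class fraction of correct predictions (diagonal over row sum).
    static double[] recallPerClass(int[][] m) {
        double[] recall = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            int rowSum = 0;
            for (int cell : m[i]) {
                rowSum += cell;
            }
            recall[i] = (double) m[i][i] / rowSum;
        }
        return recall;
    }

    // Geometric mean of the per-class values.
    static double geometricMean(int[][] m) {
        double product = 1.0;
        for (double r : recallPerClass(m)) {
            product *= r;
        }
        return Math.pow(product, 1.0 / m.length);
    }

    // Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
    static double kappa(int[][] m) {
        double n = total(m);
        double chance = 0.0;
        for (int i = 0; i < m.length; i++) {
            int rowSum = 0;
            int colSum = 0;
            for (int j = 0; j < m.length; j++) {
                rowSum += m[i][j];
                colSum += m[j][i];
            }
            chance += (rowSum / n) * (colSum / n);
        }
        return (accuracy(m) - chance) / (1.0 - chance);
    }

    public static void main(String[] args) {
        // Confusion matrix from the Tan et al. test report.
        int[][] m = {
            {5, 0, 0},
            {0, 5, 0},
            {1, 0, 4}
        };
        System.out.println(String.format(Locale.ROOT, "Accuracy:       %.4f", accuracy(m)));
        System.out.println(String.format(Locale.ROOT, "Geometric mean: %.4f", geometricMean(m)));
        System.out.println(String.format(Locale.ROOT, "Cohen's kappa:  %.4f", kappa(m)));
    }
}
```

For this matrix, 14 of the 15 test instances lie on the diagonal, the per-class values are 1, 1 and 0.8, and the chance agreement is (5·6 + 5·5 + 5·4) / 15² = 1/3, which gives a kappa of 0.9.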
Implementing new algorithms
Adding a new classification algorithm to the module is straightforward. The following classes are required:
The algorithm. A new algorithm included in the module should inherit from the ClassificationAlgorithm class. This new class defines all of the algorithm's properties (parent selector, recombinator, mutator, species, etc.). Each of these properties is configured through the configuration file, which specifies the classes and the attribute values. (See [Bojarczuk Algorithm example].)
The evaluator. This new class inherits from the AbstractEvaluator class and evaluates each individual in the evolutionary algorithm. (See [Bojarczuk Evaluator example].)
The species. This class represents the species used by the evolutionary algorithm. Here the user can choose between an expression-tree and a syntax-tree representation. Each GP classification individual is represented by the ExprTreeRuleIndividual class, which comprises all the required features: the genotype, the phenotype, and the fitness value. The nodes and functions in GP trees are defined by the ExprTreeSpecies class. Similarly, the SyntaxTreeRuleIndividual class specifies all the features required to represent a G3P individual, while the SyntaxTreeSpecies class defines the terminal and nonterminal symbols of the grammar used to generate the individuals. (See [Bojarczuk Species example].)
This section presents the results of running unit tests over the algorithms available in the JCLEC classification package. A unit test is a piece of code, written by a developer, that exercises one specific piece of functionality. Unit tests can check that the functionality works and that it still works after code changes.
The unit tests were run with the JUnit framework.