Classification module documentation

From JCLEC wiki

Introduction

JCLEC-Classification is an intuitive, usable and extensible open source module for genetic programming (GP) classification algorithms. It is intended for researchers and end-users who want to develop classification algorithms based on GP and grammar guided genetic programming (G3P) models. It houses implementations of GP rule-based classification methods, supports multiple model representations and provides the tools to easily implement new classifiers. The library is a module for JCLEC, a software system for Evolutionary Computation (EC) research developed in the Java programming language. JCLEC provides a high-level software environment for implementing any kind of Evolutionary Algorithm (EA), with support for genetic algorithms (binary, integer and real encoding), genetic programming (Koza style, strongly typed, and grammar based) and evolutionary programming.

The classification module includes several GP and G3P proposals described in the literature, and provides the classes and methods needed to develop evolutionary algorithms that solve classification problems easily.

JCLEC classification module

As in the JCLEC core, the JCLEC classification module is organized in packages. This section describes its main packages; for further information, please visit the API reference. The next figure shows the class diagram of the JCLEC classification module.

JCLEC class diagram

net.sf.jclec.problem.classification

This package contains the interfaces of the JCLEC classification module. These interfaces represent a classifier and a classifier individual.

Interfaces:

  • IClassifier
  • IClassifierIndividual

net.sf.jclec.problem.classification.base

This base package provides the abstract classes with the properties and methods that any classification algorithm must contain.

Classes:

  • ClassificationAlgorithm
  • ClassificationReporter
  • Rule
  • RuleBase

net.sf.jclec.problem.classification.blocks

This package contains implementations of the primitive functions and terminals that can be used as nodes of an expression tree.

Classes:

  • And
  • Or
  • Not
  • Equal
  • NotEqual
  • Less
  • LessOrEqual
  • Greater
  • GreaterOrEqual
  • In
  • Out
  • AttributeValue
  • ConstantValue
  • RandomConstantOfContinuousValues
  • RandomConstantOfDiscreteValues
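The reports in the results section print rule antecedents in prefix (Polish) notation built from these primitives, for example AND IN PetalWidth 0.582293 1.815772 IN PetalWidth 0.190182 1.7987844. To illustrate how such a tree is evaluated, here is a minimal sketch of a recursive prefix evaluator. It is not JCLEC code: the class and method names are hypothetical, the tokens "=" and "!=" for Equal and NotEqual are assumed, and IN/OUT are assumed to test a closed interval [lo, hi].

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the JCLEC implementation: evaluates a rule antecedent written in
// prefix notation. The "=" / "!=" tokens and the closed IN interval are assumptions.
class PrefixRuleEvaluator {

    /** Evaluates a whitespace-separated prefix expression against one instance. */
    static boolean evaluate(String expr, Map<String, Double> instance) {
        Deque<String> tokens = new ArrayDeque<>(Arrays.asList(expr.trim().split("\\s+")));
        boolean result = evalBool(tokens, instance);
        if (!tokens.isEmpty()) {
            throw new IllegalArgumentException("Unexpected trailing tokens: " + tokens);
        }
        return result;
    }

    private static boolean evalBool(Deque<String> tokens, Map<String, Double> x) {
        String op = next(tokens);
        switch (op) {
            case "AND": {
                // Evaluate both subtrees first so their tokens are always consumed
                boolean left = evalBool(tokens, x);
                boolean right = evalBool(tokens, x);
                return left && right;
            }
            case "OR": {
                boolean left = evalBool(tokens, x);
                boolean right = evalBool(tokens, x);
                return left || right;
            }
            case "NOT":
                return !evalBool(tokens, x);
            case "IN":
            case "OUT": {
                double v = attribute(next(tokens), x);
                double lo = Double.parseDouble(next(tokens));
                double hi = Double.parseDouble(next(tokens));
                boolean inside = v >= lo && v <= hi;
                return op.equals("IN") ? inside : !inside;
            }
            default: {
                // Binary comparison: operator, attribute name, constant
                double v = attribute(next(tokens), x);
                double c = Double.parseDouble(next(tokens));
                switch (op) {
                    case "<":  return v < c;
                    case "<=": return v <= c;
                    case ">":  return v > c;
                    case ">=": return v >= c;
                    case "=":  return v == c;
                    case "!=": return v != c;
                    default: throw new IllegalArgumentException("Unknown operator: " + op);
                }
            }
        }
    }

    private static String next(Deque<String> tokens) {
        String t = tokens.poll();
        if (t == null) throw new IllegalArgumentException("Unexpected end of expression");
        return t;
    }

    private static double attribute(String name, Map<String, Double> x) {
        Double v = x.get(name);
        if (v == null) throw new IllegalArgumentException("Unknown attribute: " + name);
        return v;
    }

    public static void main(String[] args) {
        Map<String, Double> flower = new HashMap<>();
        flower.put("SepalLength", 5.0);
        flower.put("PetalWidth", 0.2);
        // Antecedent of rule 1 in the Tan et al. results, without the surrounding parentheses
        String rule1 = "AND NOT AND AND IN SepalLength 5.521539 7.4334537 < PetalWidth 1.579072 "
                + "> PetalWidth 1.275671 < PetalWidth 0.769985";
        System.out.println(evaluate(rule1, flower)); // prints true
    }
}
```

The recursive descent mirrors the tree shape: each operator consumes exactly as many subexpressions as its arity, so no parentheses are needed inside the antecedent.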

net.sf.jclec.problem.classification.blocks.fuzzy

This package contains implementations of the fuzzy primitive functions that can be used as nodes of an expression tree.

Classes:

  • Is
  • Maximum
  • MembershipFunction
  • Minimum
  • TriangularMembershipFunction
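The following sketch illustrates the usual semantics of such fuzzy blocks, under stated assumptions: a triangular membership function defined by its feet a and c and its peak b, with Minimum and Maximum acting as fuzzy AND and OR over membership degrees. All names are hypothetical and are not the JCLEC classes; an Is node would presumably return the degree to which an attribute value belongs to a linguistic label, which is what membership() computes here.

```java
// Hypothetical sketch of fuzzy building blocks: a triangular membership function plus
// Minimum / Maximum used as fuzzy AND / OR. Not JCLEC's classes.
class FuzzyBlocks {

    /** Triangular membership function with feet a, c and peak b (a < b < c). */
    static final class Triangular {
        final double a, b, c;

        Triangular(double a, double b, double c) {
            if (!(a < b && b < c)) throw new IllegalArgumentException("need a < b < c");
            this.a = a;
            this.b = b;
            this.c = c;
        }

        /** Degree in [0, 1] to which x belongs to the fuzzy set. */
        double membership(double x) {
            if (x <= a || x >= c) return 0.0;
            return x <= b ? (x - a) / (b - a) : (c - x) / (c - b);
        }
    }

    /** Minimum t-norm (fuzzy AND). */
    static double and(double... degrees) {
        double m = 1.0;
        for (double d : degrees) m = Math.min(m, d);
        return m;
    }

    /** Maximum t-conorm (fuzzy OR). */
    static double or(double... degrees) {
        double m = 0.0;
        for (double d : degrees) m = Math.max(m, d);
        return m;
    }

    public static void main(String[] args) {
        Triangular medium = new Triangular(1.0, 4.0, 7.0); // e.g. "PetalLength IS medium"
        Triangular high = new Triangular(4.0, 7.0, 10.0);  // e.g. "PetalLength IS high"
        double x = 5.5;
        System.out.println(medium.membership(x));                          // 0.5
        System.out.println(and(medium.membership(x), high.membership(x))); // 0.5
    }
}
```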

net.sf.jclec.problem.classification.crisprule

This package contains the classes that represent the phenotype of a crisp rule-based individual.

Classes:

  • CrispRule
  • CrispRuleBase
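A crisp rule base, as printed in the Tan et al. results, behaves as an ordered decision list: the first rule whose antecedent covers an instance assigns its class, and the final ELSE supplies a default class. The following hypothetical sketch (not the CrispRuleBase API) hand-translates those four rules into Java predicates, assuming the attribute order SepalLength, SepalWidth, PetalLength, PetalWidth and closed IN intervals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not JCLEC's CrispRuleBase: an ordered list of crisp rules evaluated
// as IF / ELSE IF / ... / ELSE, the structure printed in the Tan et al. results.
class DecisionList<T> {
    private final List<Predicate<T>> antecedents = new ArrayList<>();
    private final List<String> consequents = new ArrayList<>();
    private final String defaultClass;

    DecisionList(String defaultClass) {
        this.defaultClass = defaultClass;
    }

    DecisionList<T> addRule(Predicate<T> antecedent, String consequent) {
        antecedents.add(antecedent);
        consequents.add(consequent);
        return this;
    }

    /** Returns the consequent of the first rule covering the instance, else the default class. */
    String classify(T instance) {
        for (int i = 0; i < antecedents.size(); i++) {
            if (antecedents.get(i).test(instance)) {
                return consequents.get(i);
            }
        }
        return defaultClass;
    }

    /** The four Tan et al. rules; instances are {SepalLength, SepalWidth, PetalLength, PetalWidth}. */
    static DecisionList<double[]> tanIris() {
        return new DecisionList<double[]>("Iris-setosa")
                .addRule(x -> !(x[0] >= 5.521539 && x[0] <= 7.4334537
                                && x[3] < 1.579072 && x[3] > 1.275671)
                        && x[3] < 0.769985, "Iris-setosa")
                .addRule(x -> x[3] >= 0.582293 && x[3] <= 1.815772
                        && x[3] >= 0.190182 && x[3] <= 1.7987844, "Iris-versicolor")
                .addRule(x -> x[2] >= 4.755571, "Iris-virginica");
    }

    public static void main(String[] args) {
        DecisionList<double[]> classifier = tanIris();
        System.out.println(classifier.classify(new double[] {5.1, 3.5, 1.4, 0.2})); // Iris-setosa
        System.out.println(classifier.classify(new double[] {6.0, 2.9, 4.5, 1.5})); // Iris-versicolor
        System.out.println(classifier.classify(new double[] {6.3, 3.3, 6.0, 2.5})); // Iris-virginica
    }
}
```

Because rules are tried in order, a later rule only needs to be correct for the instances that earlier rules leave uncovered.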

net.sf.jclec.problem.classification.fuzzyrule

This package contains the classes that represent the phenotype of a fuzzy rule-based individual.

Classes:

  • FuzzyRule
  • FuzzyRuleBase

net.sf.jclec.problem.classification.exprtree

This package defines the classes needed to implement individuals with a genetic programming (expression tree) encoding.

Classes:

  • ExprTreeSpecies
  • ExprTreeRuleIndividual
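Since the species of a GP algorithm defines the functions and terminals available to its trees, a sketch of how random trees can be grown from such sets may help. This is the classic Koza "grow" method under simplifying assumptions: the function set is AND/OR/NOT, and complete comparisons are treated as atomic terminals to keep the example short. None of these names are JCLEC API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of Koza's "grow" initialisation. Function/terminal sets are
// illustrative; comparisons are treated as atomic terminals for brevity.
class GrowTree {
    static final String[] FUNCTIONS = {"AND", "OR", "NOT"};
    static final int[] ARITY = {2, 2, 1};
    static final String[] TERMINALS = {
        "(< PetalWidth 0.8)", "(>= PetalLength 4.8)", "(IN SepalLength 5.5 7.4)"
    };

    /** Appends a random tree, in prefix order, rooted at the given depth. */
    static void grow(int depth, int maxDepth, Random rnd, List<String> out) {
        // At maximum depth only terminals are allowed; elsewhere pick uniformly from F and T
        boolean terminal = depth >= maxDepth
                || rnd.nextInt(FUNCTIONS.length + TERMINALS.length) >= FUNCTIONS.length;
        if (terminal) {
            out.add(TERMINALS[rnd.nextInt(TERMINALS.length)]);
            return;
        }
        int f = rnd.nextInt(FUNCTIONS.length);
        out.add(FUNCTIONS[f]);
        for (int i = 0; i < ARITY[f]; i++) {
            grow(depth + 1, maxDepth, rnd, out);
        }
    }

    static int arityOf(String symbol) {
        for (int i = 0; i < FUNCTIONS.length; i++) {
            if (FUNCTIONS[i].equals(symbol)) return ARITY[i];
        }
        return 0; // terminals take no arguments
    }

    /** Depth of a prefix-encoded tree (a lone terminal has depth 0); pos[0] is a read cursor. */
    static int depth(List<String> prefix, int[] pos) {
        int arity = arityOf(prefix.get(pos[0]++));
        int deepest = 0;
        for (int i = 0; i < arity; i++) {
            deepest = Math.max(deepest, 1 + depth(prefix, pos));
        }
        return deepest;
    }

    public static void main(String[] args) {
        List<String> tree = new ArrayList<>();
        grow(0, 4, new Random(123456789L), tree);
        System.out.println(String.join(" ", tree));
        System.out.println("depth = " + depth(tree, new int[] {0}));
    }
}
```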

net.sf.jclec.problem.classification.multiexprtree

This package defines the classes needed to implement individuals whose genetic programming encoding comprises multiple expression trees.

Classes:

  • MultiExprTreeRuleIndividual

net.sf.jclec.problem.classification.multisyntaxtree

This package defines the classes needed to implement individuals whose grammar guided genetic programming encoding comprises multiple syntax trees.

Classes:

  • MultiSyntaxTreeRuleIndividual

net.sf.jclec.problem.classification.syntaxtree

This package defines the classes needed to implement individuals with a grammar guided genetic programming (syntax tree) encoding.

Classes:

  • SyntaxTreeRuleIndividual
  • SyntaxTreeSchema
  • SyntaxTreeSpecies
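In G3P, individuals are derived from a context-free grammar rather than grown freely from function and terminal sets, and the Tan configuration file caps derivations with max-deriv-size. The following hypothetical sketch derives a random rule antecedent from a toy grammar under such a cap. The grammar, the convention that the first production of each nonterminal is non-recursive, and the way the cap is enforced are assumptions of this example, not the behaviour of SyntaxTreeSchema or SyntaxTreeSpecies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of grammar-guided (G3P) derivation. The grammar is illustrative, and
// the "first production is the non-recursive one" convention is an assumption of this sketch.
class GrammarDerivation {
    static final Map<String, List<String>> GRAMMAR = new HashMap<>();
    static {
        GRAMMAR.put("<antecedent>", Arrays.asList("<condition>", "AND <condition> <antecedent>"));
        GRAMMAR.put("<condition>", Arrays.asList("<op> <attribute> <value>"));
        GRAMMAR.put("<op>", Arrays.asList("<", ">="));
        GRAMMAR.put("<attribute>", Arrays.asList("PetalLength", "PetalWidth"));
        GRAMMAR.put("<value>", Arrays.asList("0.8", "1.75", "4.9"));
    }

    private final Random rnd;
    private final int maxDerivSize;
    private int steps = 0;

    GrammarDerivation(Random rnd, int maxDerivSize) {
        this.rnd = rnd;
        this.maxDerivSize = maxDerivSize;
    }

    /** Expands symbol depth-first, appending the derived terminal symbols to out. */
    void derive(String symbol, List<String> out) {
        List<String> productions = GRAMMAR.get(symbol);
        if (productions == null) { // terminal symbol
            out.add(symbol);
            return;
        }
        steps++; // one derivation step per nonterminal expansion
        // Once the budget is spent, always take the first (non-recursive) production so the
        // derivation terminates; the cap therefore bounds growth rather than the exact size.
        String production = steps >= maxDerivSize
                ? productions.get(0)
                : productions.get(rnd.nextInt(productions.size()));
        for (String s : production.split(" ")) {
            derive(s, out);
        }
    }

    int steps() {
        return steps;
    }

    public static void main(String[] args) {
        GrammarDerivation g = new GrammarDerivation(new Random(123456789L), 20);
        List<String> sentence = new ArrayList<>();
        g.derive("<antecedent>", sentence);
        System.out.println(String.join(" ", sentence) + "  (" + g.steps() + " derivation steps)");
    }
}
```

Every sentence the grammar produces is a syntactically valid antecedent by construction, which is the main practical difference from unconstrained GP trees.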

net.sf.jclec.problem.classification.listener

This package defines the listener used to obtain reports at each generation.

Classes:

  • RuleBaseReporter

API Reference

For further information, please visit the API reference.

Download

The JCLEC core and the classification module can be obtained as follows:


A tutorial of the JCLEC classification module is also available for download.

Running a classification algorithm

This section describes how to encode the configuration file required to run the algorithm in the JCLEC classification module.

How to encode the configuration file

The configuration file comprises the parameters required to run the algorithm. For further details, see the configuration page. The configuration file for the Tan algorithm is shown below.

<experiment>
	<process algorithm-type="net.sf.jclec.problem.classification.algorithm.tan.TanAlgorithm">
		<rand-gen-factory type="net.sf.jclec.util.random.RanecuFactory" seed="123456789"/>
		<population-size>100</population-size>
		<max-of-generations>100</max-of-generations>
		<max-deriv-size>20</max-deriv-size>
		<dataset type="net.sf.jclec.problem.util.dataset.KeelDataSet">
			<train-data>data/iris/iris-10-1tra.dat</train-data>
			<test-data>data/iris/iris-10-1tst.dat</test-data>
			<attribute-class-name>Class</attribute-class-name>
		</dataset>
		<w1>0.7</w1>
		<w2>0.8</w2>
		<recombination-prob>0.8</recombination-prob>
		<mutation-prob>0.1</mutation-prob>
		<copy-prob>0.01</copy-prob>
		<support>0.1</support>
		<elitist-prob>0.1</elitist-prob>
		<listener type="net.sf.jclec.problem.classification.listener.RuleBaseReporter">
			<report-dir-name>reports/reportTan</report-dir-name>
			<global-report-name>summaryTan</global-report-name>
			<report-frequency>10</report-frequency>	
		</listener>
	</process> 
</experiment>
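Each child element of process is one parameter, and attributes such as algorithm-type and type name the Java class to instantiate. Purely to illustrate that structure, the sketch below reads a few parameters with the JDK's built-in DOM parser. JCLEC has its own configuration loading, which this does not reproduce; the class ConfigPeek is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: this is not how JCLEC loads its configuration. It uses the JDK DOM
// parser to show that parameters are elements nested in <process>.
class ConfigPeek {

    static Element process(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return (Element) doc.getElementsByTagName("process").item(0);
    }

    /** Trimmed text of the first descendant element with the given tag. */
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) throw new IllegalArgumentException("Missing <" + tag + ">");
        return nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Excerpt of the Tan configuration file
        String xml = "<experiment><process algorithm-type="
                + "\"net.sf.jclec.problem.classification.algorithm.tan.TanAlgorithm\">"
                + "<population-size>100</population-size>"
                + "<max-of-generations>100</max-of-generations>"
                + "<dataset type=\"net.sf.jclec.problem.util.dataset.KeelDataSet\">"
                + "<train-data>data/iris/iris-10-1tra.dat</train-data></dataset>"
                + "<w1>0.7</w1><w2>0.8</w2>"
                + "</process></experiment>";
        Element proc = process(xml);
        System.out.println("algorithm:  " + proc.getAttribute("algorithm-type"));
        System.out.println("population: " + Integer.parseInt(text(proc, "population-size")));
        System.out.println("train data: " + text(proc, "train-data"));
        System.out.println("w1, w2:     " + Double.parseDouble(text(proc, "w1"))
                + ", " + Double.parseDouble(text(proc, "w2")));
    }
}
```

To read the actual file instead of the inline excerpt, pass new java.io.File("examples/Tan.cfg") to DocumentBuilder.parse.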

How to execute the Tan et al. algorithm

Once we have downloaded JCLEC, the JCLEC classification module and its examples, and have designed our experiment in the configuration file, we can execute the experiment.

Using JAR file

You can execute JCLEC modules using a JAR file.

java -jar jclec4-classification.jar examples/Tan.cfg

Using Eclipse

You can execute JCLEC modules using Eclipse.

Run -> Run Configurations

Create a new launch configuration as Java Application

Execution1.jpg

Project: jclec4-classification

Main class: net.sf.jclec.RunExperiment

TanConfigure.jpg

Program arguments: examples/Tan.cfg

Finally, we execute our algorithm by clicking on the Run button.

Tan et al. algorithm results

File name: data/iris/iris-10-1tst.dat
Runtime (s): 4.407
Number of different attributes: 4
Number of rules: 4
Number of conditions: 8
Average number of conditions per rule: 2,0
Accuracy (Percentage of correct predictions): 0,9333
Geometric mean: 0,9283
Cohen's Kappa rate: 0,9000
AUC: 0,9667

Percentage of correct predictions per class
Class Iris-setosa: 100,00%
Class Iris-versicolor: 100,00%
Class Iris-virginica: 80,00%
End percentage of correct predictions per class
Classifier
1 Rule: IF (AND NOT AND AND IN SepalLength 5.521539 7.4334537 < PetalWidth 1.579072 > PetalWidth 1.275671 < PetalWidth 0.769985 ) THEN (Class = Iris-setosa)
2 Rule: ELSE IF (AND IN PetalWidth 0.582293 1.815772 IN PetalWidth 0.190182 1.7987844 ) THEN (Class = Iris-versicolor)
3 Rule: ELSE IF (>= PetalLength 4.755571 ) THEN (Class = Iris-virginica)
4 Rule: ELSE (Class = Iris-setosa)

Test Classification Confusion Matrix

			Predicted
			C0	C1	C2	|
	Actual	C0	5	0	0	|	C0 = Iris-setosa
		C1	0	5	0	|	C1 = Iris-versicolor
		C2	1	0	4	|	C2 = Iris-virginica
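Three of the summary metrics can be recomputed from the confusion matrix, which is a useful sanity check when implementing new algorithms. The sketch assumes rows are actual classes and columns predicted classes (as labelled), takes the geometric mean over per-class recall, and uses the standard definition of Cohen's kappa; with this matrix it reproduces the reported 0.9333, 0.9283 and 0.9000. AUC is not recomputed, because the report does not state which multi-class AUC definition is used.

```java
import java.util.Locale;

// Recomputes accuracy, geometric mean of per-class recall, and Cohen's kappa from a
// confusion matrix (rows = actual class, columns = predicted class). A sketch; the module's
// own implementation may differ in details such as the handling of empty classes.
class ConfusionMetrics {

    static double accuracy(int[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m.length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return (double) correct / total;
    }

    /** Geometric mean of per-class recall (fraction of each actual class predicted correctly). */
    static double geometricMean(int[][] m) {
        double product = 1.0;
        for (int i = 0; i < m.length; i++) {
            long row = 0;
            for (int v : m[i]) row += v;
            if (row == 0) throw new IllegalArgumentException("class " + i + " has no instances");
            product *= (double) m[i][i] / row;
        }
        return Math.pow(product, 1.0 / m.length);
    }

    /** Cohen's kappa: (observed - expected agreement) / (1 - expected agreement). */
    static double cohensKappa(int[][] m) {
        int k = m.length;
        long total = 0;
        long[] rowSum = new long[k];
        long[] colSum = new long[k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                total += m[i][j];
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
            }
        }
        double expected = 0.0;
        for (int i = 0; i < k; i++) expected += (double) rowSum[i] * colSum[i];
        expected /= (double) total * total;
        return (accuracy(m) - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        int[][] m = {{5, 0, 0}, {0, 5, 0}, {1, 0, 4}}; // Tan et al. test confusion matrix
        System.out.println(String.format(Locale.ROOT, "Accuracy:       %.4f", accuracy(m)));      // 0.9333
        System.out.println(String.format(Locale.ROOT, "Geometric mean: %.4f", geometricMean(m))); // 0.9283
        System.out.println(String.format(Locale.ROOT, "Cohen's kappa:  %.4f", cohensKappa(m)));   // 0.9000
    }
}
```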

Implementing new algorithms

Adding a new classification algorithm is straightforward. The following classes are required:

net.sf.jclec.problem.classification.algorithm.myalgorithm.MyAlgorithm

A new algorithm included in the module should inherit from the ClassificationAlgorithm class. This new class defines all the algorithm's properties (parent selector, recombinator, mutator, species, etc.). Each of these properties is configured through the configuration file, which specifies the classes to use and their attribute values. (See [Bojarczuk Algorithm example]).
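To illustrate what those properties do during a run, here is a hypothetical, self-contained sketch of a generational loop on a toy bit-string problem (OneMax) rather than on rules: binary tournament as parent selector, one-point crossover as recombinator, bit-flip mutation as mutator, and elitism. In the module these components are JCLEC classes chosen in the configuration file; the class below does not use the JCLEC API, and applying the mutation probability per bit is a choice of this sketch.

```java
import java.util.Random;

// Hypothetical sketch of a generational evolutionary loop (not the JCLEC API).
// Toy problem: OneMax, i.e. maximise the number of true bits in a fixed-length genome.
class GenerationalLoop {
    static final int LENGTH = 32;

    /** Evaluator: fitness is the number of true bits. */
    static int fitness(boolean[] genome) {
        int f = 0;
        for (boolean b : genome) if (b) f++;
        return f;
    }

    /** Index of the fittest individual (first one on ties). */
    static int best(boolean[][] pop) {
        int idx = 0;
        for (int i = 1; i < pop.length; i++) {
            if (fitness(pop[i]) > fitness(pop[idx])) idx = i;
        }
        return idx;
    }

    /** Parent selector: binary tournament. */
    static boolean[] tournament(boolean[][] pop, Random rnd) {
        boolean[] a = pop[rnd.nextInt(pop.length)];
        boolean[] b = pop[rnd.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    static boolean[][] run(int popSize, int generations, double recombProb,
                           double mutProbPerBit, long seed) {
        Random rnd = new Random(seed);
        boolean[][] pop = new boolean[popSize][LENGTH];
        for (boolean[] g : pop) {
            for (int i = 0; i < LENGTH; i++) g[i] = rnd.nextBoolean();
        }
        for (int gen = 0; gen < generations; gen++) {
            boolean[][] next = new boolean[popSize][];
            next[0] = pop[best(pop)].clone(); // elitism: the best individual survives unchanged
            for (int k = 1; k < popSize; k++) {
                boolean[] child = tournament(pop, rnd).clone();
                if (rnd.nextDouble() < recombProb) { // recombinator: one-point crossover
                    boolean[] mate = tournament(pop, rnd);
                    int cut = rnd.nextInt(LENGTH);
                    System.arraycopy(mate, cut, child, cut, LENGTH - cut);
                }
                for (int i = 0; i < LENGTH; i++) { // mutator: independent bit flips
                    if (rnd.nextDouble() < mutProbPerBit) child[i] = !child[i];
                }
                next[k] = child;
            }
            pop = next;
        }
        return pop;
    }

    public static void main(String[] args) {
        boolean[][] pop = run(100, 100, 0.8, 1.0 / LENGTH, 123456789L);
        System.out.println("best fitness = " + fitness(pop[best(pop)]) + " / " + LENGTH);
    }
}
```

Keeping the selector, recombinator, mutator and evaluator as separate pieces is what allows them to be swapped from the configuration file without touching the loop itself.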

net.sf.jclec.problem.classification.algorithm.myalgorithm.MyEvaluator

This new class inherits from the AbstractEvaluator class and evaluates each individual in the evolutionary algorithm. (See [Bojarczuk Evaluator example]).
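The core of most rule evaluators is measuring how one rule performs on the training set for its consequent class. The sketch below counts true/false positives and negatives and combines them as sensitivity times specificity, a common rule-quality measure. It is a hypothetical example, not claimed to be the fitness function of the Bojarczuk, Tan or Falco algorithms, and the Antecedent interface is a stand-in for a rule's phenotype.

```java
// Hypothetical sketch of an evaluator's core computation: the confusion counts of one rule on
// the training data for its consequent class, combined as sensitivity x specificity.
class RuleFitness {

    /** Hypothetical stand-in for a rule antecedent. */
    interface Antecedent {
        boolean covers(double[] instance);
    }

    static double sensitivityTimesSpecificity(Antecedent rule, String consequent,
                                              double[][] data, String[] labels) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < data.length; i++) {
            boolean covered = rule.covers(data[i]);
            boolean positive = labels[i].equals(consequent);
            if (covered && positive) tp++;
            else if (covered) fp++;
            else if (positive) fn++;
            else tn++;
        }
        double sensitivity = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
        double specificity = (tn + fp) == 0 ? 0.0 : (double) tn / (tn + fp);
        return sensitivity * specificity;
    }

    public static void main(String[] args) {
        // Toy data: {PetalLength, PetalWidth}
        double[][] data = {{5.8, 2.0}, {5.0, 1.5}, {1.4, 0.2}, {4.8, 1.9}};
        String[] labels = {"Iris-virginica", "Iris-virginica", "Iris-setosa", "Iris-versicolor"};
        Antecedent rule = x -> x[1] >= 1.8; // IF PetalWidth >= 1.8 THEN Iris-virginica
        System.out.println(sensitivityTimesSpecificity(rule, "Iris-virginica", data, labels)); // 0.25
    }
}
```

Multiplying the two terms penalises rules that achieve high coverage of the target class only by also covering many instances of other classes.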

net.sf.jclec.problem.classification.algorithm.myalgorithm.MySpecies

This class represents the species used in the evolutionary algorithm. Here, the user chooses between an expression-tree (GP) and a syntax-tree (G3P) representation. Each GP classification individual is represented by the ExprTreeRuleIndividual class, which comprises all the required features: the genotype, the phenotype, and the fitness value. The nodes and functions of GP trees are defined by the ExprTreeSpecies class. Similarly, the SyntaxTreeRuleIndividual class specifies all the features required to represent a G3P individual, while the SyntaxTreeSpecies class defines the terminal and nonterminal symbols of the grammar used to generate the individuals. (See [Bojarczuk Species example]).

Running Tests

This section presents the results of running unit tests on the algorithms available in the JCLEC classification module. A unit test is a piece of code, written by a developer, that checks a specific piece of functionality. Unit tests help ensure that the functionality works and can be rerun to validate that it still works after code changes.

The unit tests were run with the JUnit framework.

Falco unit test

TestUnitFalco.png

Tan unit test

TestUnitTan.png

Bojarczuk unit test

TestUnitBojarczuk.png