How to Apply the Random Forest Algorithm in PMML Format

SkLearn2PMML

Python library for converting Scikit-Learn pipelines to PMML.

Features

This library is a thin wrapper around the JPMML-SkLearn command-line application.

For a list of supported Estimator and Transformer types, please refer to JPMML-SkLearn supported packages.

Prerequisites

Python 2.7, 3.4 or newer.
Java 1.8 or newer. The Java executable must be available on the system path.
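
Both prerequisites can be checked from Python before installing anything. The following is a minimal sketch using only the standard library; the function names are hypothetical, and it only confirms that a `java` executable can be found on the system path, not that its version is 1.8 or newer.

```python
import shutil
import sys


def python_version_supported(version_info=None):
    """Return True for Python 2.7, or for Python 3.4 and newer."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (2, 7) or (major, minor) >= (3, 4)


def find_java():
    """Return the full path of the java executable, or None if it is not on the path."""
    # shutil.which is available on Python 3.3 and newer.
    return shutil.which("java")
```

If `find_java()` returns None, the conversion step described under Usage is likely to fail, because the library delegates the actual conversion to a Java application.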

Installation

Installing a release version from PyPI:

              pip install sklearn2pmml

Alternatively, installing the latest snapshot version from GitHub:

              pip install --upgrade git+https://github.com/jpmml/sklearn2pmml.git

Usage

A typical workflow can be summarized as follows:

Create a PMMLPipeline object, and populate it with pipeline steps as usual. Class sklearn2pmml.pipeline.PMMLPipeline extends class sklearn.pipeline.Pipeline with the following functionality:

If the PMMLPipeline.fit(X, y) method is invoked with a pandas.DataFrame or pandas.Series object as the X argument, then its column names are used as feature names. Otherwise, feature names default to "x1", "x2", .., "x{number_of_features}".
If the PMMLPipeline.fit(X, y) method is invoked with a pandas.Series object as the y argument, then its name is used as the target name (for supervised models). Otherwise, the target name defaults to "y".

Fit and validate the pipeline as usual.
Optionally, compute and embed verification data into the PMMLPipeline object by invoking the PMMLPipeline.verify(X) method with a small but representative subset of the training data.
Convert the PMMLPipeline object to a PMML file in the local filesystem by invoking the utility method sklearn2pmml.sklearn2pmml(pipeline, pmml_destination_path).
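
The naming rules from the first step can be illustrated without the library itself. The helpers below are hypothetical and only mirror the documented defaults; they are not how PMMLPipeline is implemented internally.

```python
def resolve_feature_names(number_of_features, column_names=None):
    """Use the given column names if present, else fall back to x1..xN."""
    if column_names is not None:
        return list(column_names)
    return ["x{}".format(i) for i in range(1, number_of_features + 1)]


def resolve_target_name(series_name=None):
    """Use the name of the y series if present, else fall back to "y"."""
    return series_name if series_name is not None else "y"
```

In practice this means that fitting on a plain NumPy array yields generic field names in the PMML file, so passing pandas objects is usually the more readable choice.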

Developing a simple decision tree model for the classification of iris species:

              import pandas

              iris_df = pandas.read_csv("Iris.csv")

              iris_X = iris_df[iris_df.columns.difference(["Species"])]
              iris_y = iris_df["Species"]

              from sklearn.tree import DecisionTreeClassifier
              from sklearn2pmml.pipeline import PMMLPipeline

              pipeline = PMMLPipeline([
                  ("classifier", DecisionTreeClassifier())
              ])
              pipeline.fit(iris_X, iris_y)

              from sklearn2pmml import sklearn2pmml

              sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr = True)

Developing a more elaborate logistic regression model for the same:

              import pandas

              iris_df = pandas.read_csv("Iris.csv")

              iris_X = iris_df[iris_df.columns.difference(["Species"])]
              iris_y = iris_df["Species"]

              from sklearn_pandas import DataFrameMapper
              from sklearn.decomposition import PCA
              from sklearn.feature_selection import SelectKBest
              from sklearn.impute import SimpleImputer
              from sklearn.linear_model import LogisticRegression
              from sklearn2pmml.decoration import ContinuousDomain
              from sklearn2pmml.pipeline import PMMLPipeline

              pipeline = PMMLPipeline([
                  ("mapper", DataFrameMapper([
                      (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), SimpleImputer()])
                  ])),
                  ("pca", PCA(n_components = 3)),
                  ("selector", SelectKBest(k = 2)),
                  ("classifier", LogisticRegression(multi_class = "ovr"))
              ])
              pipeline.fit(iris_X, iris_y)
              pipeline.verify(iris_X.sample(n = 15))

              from sklearn2pmml import sklearn2pmml

              sklearn2pmml(pipeline, "LogisticRegressionIris.pmml", with_repr = True)
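
Because PMML is an XML format, a generated file can be inspected with the Python standard library to confirm which feature and target names were recorded. The sketch below lists the DataField entries of a document; the helper names are hypothetical, and the embedded sample is a hand-written minimal fragment for illustration, not actual converter output.

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative fragment; a real file contains much more.
SAMPLE_PMML = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="Species" optype="categorical" dataType="string"/>
    <DataField name="Petal.Length" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>
""".strip()


def local_name(tag):
    """Strip the "{namespace}" prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def list_data_fields(pmml_text):
    """Return (name, optype, dataType) for every DataField element, in document order."""
    root = ET.fromstring(pmml_text)
    fields = []
    for element in root.iter():
        if local_name(element.tag) == "DataField":
            fields.append((element.get("name"), element.get("optype"), element.get("dataType")))
    return fields
```

For a file on disk, the text can be read first, for example with `open("LogisticRegressionIris.pmml").read()`, and then passed to `list_data_fields`. Matching on the local tag name avoids hard-coding a particular PMML schema version.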

Documentation

Up-to-date:

Benchmarking Scikit-Learn against JPMML-Evaluator in Java and Python environments
Extending Scikit-Learn with outlier detector transformer type
Analyzing Scikit-Learn feature importances via PMML
Training Scikit-Learn based TF(-IDF) plus XGBoost pipelines
Converting Scikit-Learn based TF(-IDF) pipelines to PMML documents
Converting Scikit-Learn based Imbalanced-Learn (imblearn) pipelines to PMML documents
Extending Scikit-Learn with date and datetime features
Extending Scikit-Learn with feature specifications
Converting logistic regression models to PMML documents
Stacking Scikit-Learn, LightGBM and XGBoost models
Converting Scikit-Learn hyperparameter-tuned pipelines to PMML documents
Extending Scikit-Learn with GBDT plus LR ensemble (GBDT+LR) model type
Converting Scikit-Learn based TPOT automated machine learning (AutoML) pipelines to PMML documents
Converting Scikit-Learn based LightGBM pipelines to PMML documents
Extending Scikit-Learn with business rules (BR) model type

Slightly outdated:

Converting Scikit-Learn to PMML

De-installation

Uninstalling:

              pip uninstall sklearn2pmml

License

SkLearn2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use SkLearn2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes SkLearn2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

SkLearn2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io