public class NaiveBayes
extends Object
implements scala.Serializable
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all kinds of
discrete data. For example, by converting documents into TF-IDF vectors, it can be used for
document classification. By making every vector a 0-1 vector, it can also be used as
Bernoulli NB (http://tinyurl.com/p7c96j6). The input feature values must be nonnegative.
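The following is a minimal sketch of how multinomial Naive Bayes with an additive smoothing parameter can be fitted and used for prediction. It is plain Java and does not use Spark. The class `MultinomialNbSketch`, its fields, and its method signatures are hypothetical names chosen for illustration, and the smoothing formula shown is the standard additive (Laplace) form, assumed here rather than taken from this page.

```java
import java.util.Arrays;

/** Illustrative multinomial Naive Bayes; NOT the Spark MLlib implementation. */
final class MultinomialNbSketch {
    final double[] logPrior;        // log P(class)
    final double[][] logLikelihood; // log P(feature | class)

    private MultinomialNbSketch(double[] logPrior, double[][] logLikelihood) {
        this.logPrior = logPrior;
        this.logLikelihood = logLikelihood;
    }

    /** Labels are assumed to be integers in [0, numClasses). */
    static MultinomialNbSketch train(int[] labels, double[][] features,
                                     int numClasses, double lambda) {
        int numFeatures = features[0].length;
        double[] classCounts = new double[numClasses];
        double[][] featureSums = new double[numClasses][numFeatures];

        // Aggregate per-class document counts and per-class feature totals.
        for (int i = 0; i < labels.length; i++) {
            classCounts[labels[i]] += 1.0;
            for (int j = 0; j < numFeatures; j++) {
                if (features[i][j] < 0.0) {
                    throw new IllegalArgumentException("feature values must be nonnegative");
                }
                featureSums[labels[i]][j] += features[i][j];
            }
        }

        double[] logPrior = new double[numClasses];
        double[][] logLikelihood = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            // Additive smoothing: lambda is added to every count.
            logPrior[c] = Math.log(classCounts[c] + lambda)
                    - Math.log(labels.length + numClasses * lambda);
            double total = Arrays.stream(featureSums[c]).sum();
            for (int j = 0; j < numFeatures; j++) {
                logLikelihood[c][j] = Math.log(featureSums[c][j] + lambda)
                        - Math.log(total + numFeatures * lambda);
            }
        }
        return new MultinomialNbSketch(logPrior, logLikelihood);
    }

    /** Returns the class with the highest log posterior score. */
    int predict(double[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < x.length; j++) {
                score += x[j] * logLikelihood[c][j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

With `lambda` greater than zero, a feature that never occurs in a class still receives a small nonzero probability, so a single unseen term does not drive the class score to negative infinity.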
| Constructor and Description |
|---|
| NaiveBayes() |
| NaiveBayes(double lambda) |
| Modifier and Type | Method and Description |
|---|---|
| double | getLambda() Get the smoothing parameter. |
| String | getModelType() Get the model type. |
| NaiveBayesModel | run(RDD<LabeledPoint> data) Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries. |
| NaiveBayes | setLambda(double lambda) Set the smoothing parameter. |
| NaiveBayes | setModelType(String modelType) Set the model type using a string (case-sensitive). |
| static NaiveBayesModel | train(RDD<LabeledPoint> input) Trains a Naive Bayes model given an RDD of (label, features) pairs. |
| static NaiveBayesModel | train(RDD<LabeledPoint> input, double lambda) Trains a Naive Bayes model given an RDD of (label, features) pairs. |
| static NaiveBayesModel | train(RDD<LabeledPoint> input, double lambda, String modelType) Trains a Naive Bayes model given an RDD of (label, features) pairs. |
public static NaiveBayesModel train(RDD<LabeledPoint> input)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the default Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all
kinds of discrete data. For example, by converting documents into TF-IDF vectors, it
can be used for document classification.
This version of the method uses a default smoothing parameter of 1.0.
input - RDD of (label, array of features) pairs. Every vector should be a frequency
vector or a count vector.
public static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the default Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all
kinds of discrete data. For example, by converting documents into TF-IDF vectors, it
can be used for document classification.
input - RDD of (label, array of features) pairs. Every vector should be a frequency
vector or a count vector.
lambda - The smoothing parameter
public static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda, String modelType)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
The model type can be set to either Multinomial NB (http://tinyurl.com/lsdw6p)
or Bernoulli NB (http://tinyurl.com/p7c96j6). The Multinomial NB can handle
discrete count data and can be called by setting the model type to "multinomial".
For example, it can be used with word counts or TF-IDF vectors of documents.
The Bernoulli model fits presence or absence (0-1) counts. By making every vector a
0-1 vector and setting the model type to "bernoulli", the model fits and predicts as
Bernoulli NB.
input - RDD of (label, array of features) pairs. Every vector should be a frequency
vector or a count vector.
lambda - The smoothing parameter
modelType - The type of NB model to fit from the enumeration NaiveBayesModels, can be
multinomial or bernoulli
public NaiveBayes setLambda(double lambda)
public double getLambda()
public NaiveBayes setModelType(String modelType)
modelType - (undocumented)
public String getModelType()
public NaiveBayesModel run(RDD<LabeledPoint> data)
data - RDD of LabeledPoint.
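To illustrate the "bernoulli" model type described under train(input, lambda, modelType), here is a minimal plain-Java sketch of Bernoulli Naive Bayes on 0-1 vectors. It does not use Spark. The class `BernoulliNbSketch` and its members are hypothetical names, and the smoothing and scoring formulas are the textbook Bernoulli NB forms, assumed here rather than confirmed from this page. The point it shows is that, unlike the multinomial model, an absent feature (value 0) also contributes to the class score.

```java
/** Illustrative Bernoulli Naive Bayes; NOT the Spark MLlib implementation. */
final class BernoulliNbSketch {
    final double[] logPrior;      // log P(class)
    final double[][] presentProb; // P(feature = 1 | class)

    private BernoulliNbSketch(double[] logPrior, double[][] presentProb) {
        this.logPrior = logPrior;
        this.presentProb = presentProb;
    }

    private static void requireBinary(double v) {
        if (v != 0.0 && v != 1.0) {
            throw new IllegalArgumentException("Bernoulli NB requires 0-1 feature values");
        }
    }

    /** Labels are assumed to be integers in [0, numClasses). */
    static BernoulliNbSketch train(int[] labels, double[][] features,
                                   int numClasses, double lambda) {
        int numFeatures = features[0].length;
        double[] classCounts = new double[numClasses];
        double[][] presentCounts = new double[numClasses][numFeatures];

        // Count, per class, how many examples have each feature present.
        for (int i = 0; i < labels.length; i++) {
            classCounts[labels[i]] += 1.0;
            for (int j = 0; j < numFeatures; j++) {
                requireBinary(features[i][j]);
                presentCounts[labels[i]][j] += features[i][j];
            }
        }

        double[] logPrior = new double[numClasses];
        double[][] presentProb = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            logPrior[c] = Math.log(classCounts[c] + lambda)
                    - Math.log(labels.length + numClasses * lambda);
            for (int j = 0; j < numFeatures; j++) {
                // Two outcomes per feature (present or absent), hence 2 * lambda.
                presentProb[c][j] = (presentCounts[c][j] + lambda)
                        / (classCounts[c] + 2.0 * lambda);
            }
        }
        return new BernoulliNbSketch(logPrior, presentProb);
    }

    /** Returns the class with the highest log posterior score. */
    int predict(double[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < x.length; j++) {
                requireBinary(x[j]);
                double p = presentProb[c][j];
                // Present features add log p; absent features add log(1 - p).
                score += (x[j] == 1.0) ? Math.log(p) : Math.log(1.0 - p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

In this sketch a vector containing any value other than 0 or 1 is rejected with an `IllegalArgumentException`, both at training and at prediction time.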