
Machine Learning Chapter 6: Naive Bayes Models

The Naive Bayes Model (Understanding)

    • 6.1 Principles of the Naive Bayes Algorithm
      • 6.1.1 Bayesian modeling under one-dimensional feature vectors
      • 6.1.2 Bayesian modeling under two-dimensional feature vectors
      • 6.1.3 Bayesian modeling under n-dimensional feature vectors
      • 6.1.4 A Simple Code Demonstration of Naive Bayes Modeling
    • 6.2 Case Study - Tumor Prediction Model (Classification Model)
      • 6.2.1 Case background
      • 6.2.2 Data reading and segmentation
      • 6.2.3 Modeling and Prediction
    • 6.3 Course-related resources

This chapter focuses on the naive Bayes model in machine learning, covering both the principles of the naive Bayes algorithm and its implementation in code. A classic naive Bayes example, determining whether a tumor is benign or malignant, will also be presented to reinforce the points learned.

6.1 Principles of the Naive Bayes Algorithm

Bayesian classification is one of the more widely used classification approaches in machine learning. It arises from Bayesian thinking about the inverse-probability problem, and naive Bayes is one of the simplest Bayesian models.
The core of the algorithm is the Bayes formula:
P(A|B) = P(B|A) × P(A) / P(B)
where P(A) is the probability of event A occurring and P(B) is the probability of event B occurring. P(A|B) denotes the probability of event A occurring given that event B has occurred; similarly, P(B|A) denotes the probability of event B occurring given that event A has occurred.
As a simple example: suppose the probability of a person catching a cold during flu season (event A) is 40% (P(A)), the probability of a person sneezing (event B) is 80% (P(B)), and the probability of a person sneezing given that they have a cold is 100% (P(B|A)). Knowing that a person has started sneezing, what is the probability that they have a cold? This amounts to finding the probability of a cold given sneezing, P(A|B), and the solution is shown below:
P(A|B) = P(B|A) × P(A) / P(B) = 100% × 40% / 80% = 50%
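
As a quick sanity check, this calculation can be reproduced in a few lines of Python (the variable names are ours, chosen for readability):

# Bayes formula: P(A|B) = P(B|A) * P(A) / P(B)
p_cold = 0.4               # P(A): probability of catching a cold
p_sneeze = 0.8             # P(B): probability of sneezing
p_sneeze_given_cold = 1.0  # P(B|A): probability of sneezing given a cold

p_cold_given_sneeze = p_sneeze_given_cold * p_cold / p_sneeze
print(p_cold_given_sneeze)  # 0.5, i.e. a 50% chance of having a cold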

6.1.1 Bayesian modeling under one-dimensional feature vectors

Let's look at a more detailed, real-world application of Bayes' formula: determining whether a person has a cold. Suppose we have 5 samples of data, as shown in the table below:
Sample  Sneezing (X1)  Cold (Y)
1       1              1
2       1              1
3       1              1
4       0              1
5       1              0
For demonstration purposes, a feature variable is chosen here: sneezing (X1), where the number 1 means sneezing and 0 means not sneezing; the target variable here is cold (Y), where the number 1 means having a cold and the number 0 means not having a cold.
Based on the above data, we will use the Bayes formula to predict whether a person has a cold. For example, if a person sneezes (X1=1), what is the probability that he has a cold? In other words, what is the probability of predicting that he is in a cold state, which mathematically we write as P(Y|X1).
Applying the Bayes formula gives:
P(Y|X1) = P(X1|Y) × P(Y) / P(X1)
Based on the above data, we can calculate the probability of getting a cold under the condition of sneezing (X1=1) as
P(Y=1|X1=1) = P(X1=1|Y=1) × P(Y=1) / P(X1=1) = (3/4 × 4/5) / (4/5) = 3/4
where P(X1=1|Y=1) is the probability of sneezing given that the person already has a cold: 3 of the 4 samples with colds sneeze, so it is 3/4; P(Y=1) is the probability of having a cold across all samples: 4 of the 5 samples have colds, so it is 4/5; and P(X1=1) is the probability of sneezing across all samples: 4 of the 5 samples sneeze, so it is 4/5.
Similarly the probability of not getting a cold conditional on sneezing (X1=1) is:
P(Y=0|X1=1) = P(X1=1|Y=0) × P(Y=0) / P(X1=1) = (1 × 1/5) / (4/5) = 1/4
where P(X1=1|Y=0) is the probability of sneezing given no cold, which is 1 (the single sample without a cold sneezes); P(Y=0) is the probability of not having a cold across all samples, which is 1/5; and P(X1=1) is the probability of sneezing across all samples, which is 4/5.
Since 3/4 is greater than 1/4, the probability of getting a cold under the condition of sneezing (X1=1) is higher than the probability of not getting a cold, so the person is judged to have a cold.
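
The same one-feature calculation can be reproduced directly from the data; a minimal sketch, assuming the five samples listed in the table above:

# Each pair is (X1 = sneezing, Y = cold), matching the five samples above
samples = [(1, 1), (1, 1), (1, 1), (0, 1), (1, 0)]
n = len(samples)

p_y1 = sum(1 for x1, y in samples if y == 1) / n   # P(Y=1) = 4/5
p_y0 = 1 - p_y1                                    # P(Y=0) = 1/5
p_x1 = sum(1 for x1, y in samples if x1 == 1) / n  # P(X1=1) = 4/5

cold = [x1 for x1, y in samples if y == 1]
no_cold = [x1 for x1, y in samples if y == 0]
p_x1_given_y1 = sum(cold) / len(cold)              # P(X1=1|Y=1) = 3/4
p_x1_given_y0 = sum(no_cold) / len(no_cold)        # P(X1=1|Y=0) = 1

print(p_x1_given_y1 * p_y1 / p_x1)                 # P(Y=1|X1=1) = 0.75
print(p_x1_given_y0 * p_y0 / p_x1)                 # P(Y=0|X1=1) = 0.25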

6.1.2 Bayesian modeling under two-dimensional feature vectors

Going a step further, we add another feature variable: headache (X2), where the number 1 means headache and 0 means no headache; the target variable is still cold (Y).
[Table: the five samples above with an added headache column (X2); 3 of the 4 cold samples have a headache, and the sample without a cold has a headache]
Based on the above data, we again use the Bayes formula to predict whether a person has a cold. For example, if a person sneezes and has a headache (X1=1, X2=1), what is the probability that he has a cold? Mathematically, we write this as P(Y|X1,X2).
Applying the Bayes formula gives:
P(Y|X1,X2) = P(X1,X2|Y) × P(Y) / P(X1,X2)
When comparing P(Y=1|X1,X2) with P(Y=0|X1,X2), the denominator P(X1,X2) has the same value, so we can skip that part of the calculation and directly compare the two numerators, namely:
P(X1,X2|Y=1) × P(Y=1)  vs.  P(X1,X2|Y=0) × P(Y=0)
Supplementary knowledge: the independence assumption
Before calculating this probability, we first introduce the independence assumption of the naive Bayes model: the features are assumed to be mutually independent, that is, P(X1,X2|Y) = P(X1|Y) × P(X2|Y), so the above equation can be written as:
P(Y|X1,X2) = P(X1|Y) × P(X2|Y) × P(Y) / P(X1,X2)

Under the independence assumption, computing the probability of having a cold given sneezing and a headache (X1=1, X2=1), P(Y=1|X1,X2), reduces to computing the value of P(X1=1|Y=1) × P(X2=1|Y=1) × P(Y=1):
P(X1=1|Y=1) × P(X2=1|Y=1) × P(Y=1) = 3/4 × 3/4 × 4/5 = 9/20
Similarly, we can calculate the probability P(Y=0|X1,X2) of not having a cold under the condition of sneezing and having a headache (X1=1, X2=1), which simplifies to calculating P(X1|Y=0)P(X2|Y=0)P(Y=0):
P(X1=1|Y=0) × P(X2=1|Y=0) × P(Y=0) = 1 × 1 × 1/5 = 1/5
Since 9/20 is greater than 1/5, the probability of having a cold given sneezing and a headache (X1=1, X2=1) is higher than the probability of not having one, so the person is judged to have a cold.
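
These two numerators can also be checked in code; a minimal sketch using the conditional probabilities read off the table above:

# Naive Bayes numerators for X1=1, X2=1 (the denominator P(X1,X2) cancels)
p_y1, p_y0 = 4/5, 1/5
p_x1_y1, p_x2_y1 = 3/4, 3/4   # P(X1=1|Y=1), P(X2=1|Y=1)
p_x1_y0, p_x2_y0 = 1.0, 1.0   # P(X1=1|Y=0), P(X2=1|Y=0)

score_y1 = p_x1_y1 * p_x2_y1 * p_y1   # 9/20 = 0.45
score_y0 = p_x1_y0 * p_x2_y0 * p_y0   # 1/5 = 0.2
print('cold' if score_y1 > score_y0 else 'no cold')   # cold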

6.1.3 Bayesian modeling under n-dimensional feature vectors

We can generalize from 2 feature variables to n feature variables X1, X2, ..., Xn; applying the Bayes formula gives:
P(Y|X1,X2,...,Xn) = P(X1,X2,...,Xn|Y) × P(Y) / P(X1,X2,...,Xn)
Similarly, since the denominators are the same, we only need to focus on the numerator:
P(X1,X2,...,Xn|Y) × P(Y)
The naive Bayes model assumes that the features are mutually independent given the target value, so the above expression can be written as:
P(X1|Y) × P(X2|Y) × ... × P(Xn|Y) × P(Y)
Here P(X1|Y), P(X2|Y), ..., P(Xn|Y) and P(Y) can all be estimated from the known data, so the formula above lets us compute the probability of the target variable taking each value given the n feature values, and classify the sample into the class with the higher probability.
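
To make the n-dimensional rule concrete, here is a minimal sketch of the decision rule for binary (0/1) features; the function naive_bayes_predict and the data layout are our own illustration, not a library API:

# Pick the class y that maximizes P(X1|y) * ... * P(Xn|y) * P(y).
# priors[y] holds P(Y=y); cond_probs[y][i] holds P(Xi=1|Y=y).
def naive_bayes_predict(x, cond_probs, priors):
    best_class, best_score = None, -1.0
    for y, prior in priors.items():
        score = prior
        for i, xi in enumerate(x):
            p = cond_probs[y][i]
            score *= p if xi == 1 else (1 - p)
        if score > best_score:
            best_class, best_score = y, score
    return best_class

# The cold example: two binary features (sneezing, headache)
priors = {1: 4/5, 0: 1/5}
cond_probs = {1: [3/4, 3/4], 0: [1.0, 1.0]}
print(naive_bayes_predict([1, 1], cond_probs, priors))  # 1, i.e. a cold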

6.1.4 A Simple Code Demonstration of Naive Bayes Modeling

The naive Bayes model (here the Gaussian naive Bayes classifier is used) is imported as shown below:

from sklearn.naive_bayes import GaussianNB

In the Jupyter Notebook editor, after importing the library, you can view the official documentation with the following code:

GaussianNB?

A simple code demonstration of the naive Bayes model is shown below:

from sklearn.naive_bayes import GaussianNB
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]  # feature variables: 2 features per sample
y = [0, 0, 0, 1, 1]  # target variable: two classes, 0 and 1

model = GaussianNB()  # create the Gaussian naive Bayes model
model.fit(X, y)  # train the model

print(model.predict([[5, 5]]))  # predict the class of a new sample

Here X is the feature variable, with 2 features per sample; y is the target variable, with two classes, 0 and 1. Line 5 creates the model, line 6 trains it with the fit() function, and the last line makes a prediction with the predict() function. The prediction result is as follows:

[0]
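
Besides predict(), the model also provides a predict_proba() function, which returns the estimated probability of each class rather than only the winning class; a short follow-up to the demo above:

print(model.predict_proba([[5, 5]]))  # one probability per class, summing to 1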

6.2 Case Study - Tumor Prediction Model (Classification Model)

This section explains how to apply the naive Bayes model in practice, using one of the more classic tumor prediction models in the healthcare industry as an example; we will use it to predict whether a tumor is benign or malignant.

6.2.1 Case background

With the rapid advancement of medical care, hospitals increasingly need to determine quickly whether a tumor is benign or malignant; the ability to judge the nature of a tumor quickly from the measured characteristics of a patient's tumor affects how the patient is treated and how fast they recover. Traditionally, doctors determine the nature of a tumor from dozens of indicators, but this method depends on the doctor's personal experience and is less efficient; with machine learning, we expect to be able to predict the nature of the tumor quickly.

6.2.2 Data reading and segmentation

1. Read data
First, import the breast tumor data of 569 patients from a hospital, with 6 feature dimensions plus the nature of the tumor (Y) as the target variable. There are 358 cases of benign tumors and 211 cases of malignant tumors.

import pandas as pd
df = pd.read_excel('Tumor data.xlsx')
df.head()

The head() function is used to display the first five rows of data. The result is shown below:
[Output: the first five rows of the tumor data, showing the six feature columns and the tumor-nature column]
The six feature variables are: maximum perimeter, maximum concavity, average concavity, maximum area, maximum radius, and average gray value of the surface texture. Maximum perimeter is the average of the three largest perimeter values among the tumor measurements; maximum concavity is the average of the three largest concavity values; average concavity is the average concavity over all measurements; maximum area is the average of the three largest area values; maximum radius is the average of the three largest radius values; and the average gray value is the average gray value of the tumor images. For the target variable, the nature of the tumor, Y=0 means the tumor is malignant and Y=1 means it is benign.
Note that only six feature variables were selected here for demonstration purposes; in the healthcare industry, far more feature variables are used to determine whether a tumor is benign.

2. Separating the feature and target variables
The feature variables and the target variable are extracted separately with the following code:

X = df.drop(columns='Nature of the tumor') 
y = df['Nature of the tumor']   

Here, the drop() function removes the column "Nature of the tumor", and the remaining data is assigned to the variable X as the feature variables; the column "Nature of the tumor" is then extracted as the target variable, using DataFrame column extraction, and assigned to the variable y.

6.2.3 Modeling and Prediction

1. Divide the training set and test set
Similar to the previous sections, the data is divided into a training set and a test set with the following code; the training set is used to train the model and the test set is used to test it, and thus to assess the quality of the model.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

Here X_train and y_train are the feature and target data of the training set, and X_test and y_test are the feature and target data of the test set; interested readers can print them out and take a look. Note that there are just over 500 samples here, which is not a lot, so the training and test sets are split in an 8:2 ratio, i.e. test_size is set to 0.2.
Because the train_test_split() function splits the data randomly on every run, setting the random_state parameter to 1 makes the split the same on every run. The number 1 has no special meaning and can be replaced by another number; it simply acts as a random seed so that the data is split consistently each time.
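
As a quick check of the 8:2 split (a small addition of ours; the shapes assume the 569-row, 6-feature dataset described above):

print(X_train.shape, X_test.shape)  # expect roughly (455, 6) and (114, 6)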

2. Model building
The process of model building is relatively easy; a naive Bayes model can be built with the following code.

from sklearn.naive_bayes import GaussianNB
nb_clf = GaussianNB()  # Gaussian naive Bayes model
nb_clf.fit(X_train, y_train)

The first line of code imports the naive Bayes models (naive_bayes) from the Scikit-Learn library; the one used here, GaussianNB (Gaussian naive Bayes), is the most widely used in practice and is suitable for continuous numerical features.
The second line of code assigns the Gaussian naive Bayes model to the variable nb_clf; no parameters are passed here, i.e. the default parameters are used.
The third line of code trains the model with the fit() method, passing in the training-set data X_train and y_train obtained in the previous step.
At this point, a naive Bayes model has been built, and it can now be used to make predictions; this is where the previously divided test set comes in, since we can use it to make predictions and evaluate the model's performance.

3. Model Prediction - Predicting Data Results
The purpose of building the model is to use it to predict data. Here, the test-set data is fed into the model for prediction with the following code, where nb_clf is the naive Bayes model built above.

y_pred = nb_clf.predict(X_test)

The first 100 predictions in y_pred can be printed with print(y_pred[:100]); the result is shown below. The 0s and 1s are the predicted results: 0 means the tumor is predicted to be malignant, and 1 benign.
[Output: an array of 100 predictions, each 0 or 1]
Using the knowledge points about creating a DataFrame, the predicted values y_pred and the actual values y_test of the test set can be put side by side with the following code:

a = pd.DataFrame()  # Create an empty DataFrame
a['Predicted value'] = list(y_pred)
a['Actual value'] = list(y_test)

The comparison table generated at this point is shown below:
[Output: a comparison table of predicted and actual values; 4 of the first 5 rows match]
You can see that the prediction accuracy on the first five rows is 80%; to see the prediction accuracy on the whole test set, use the following code:

from sklearn.metrics import accuracy_score
score = accuracy_score(y_test, y_pred)  # compare actual and predicted values

The first line imports the accuracy_score() function, which computes prediction accuracy; the second line passes the actual values y_test and the predicted values y_pred into accuracy_score(). Printing score shows a value of 0.947, i.e. a prediction accuracy of 94.7%: of the 114 (569 × 0.2) test samples, about 108 were predicted correctly and 6 incorrectly.
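
These counts can be verified directly by comparing the two arrays (a small check of ours):

correct = (y_pred == y_test.values).sum()           # number of matching predictions
print(len(y_test), correct, len(y_test) - correct)  # total, correct, wrong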
As a classification model, the naive Bayes model can also be evaluated with a ROC curve; interested readers can try this themselves, in the same way as for the logistic regression and decision tree models in earlier chapters. The six feature variables here were already screened by the author for high feature importance, so the model's ROC curve rises steeply and its AUC value is high.
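
For readers who want to try the ROC evaluation, a minimal sketch using scikit-learn's roc_curve and roc_auc_score (the exact AUC value depends on the data and the split):

from sklearn.metrics import roc_curve, roc_auc_score
y_score = nb_clf.predict_proba(X_test)[:, 1]   # predicted probability of the benign class (Y=1)
fpr, tpr, thresholds = roc_curve(y_test, y_score)
print(roc_auc_score(y_test, y_score))          # AUC: the closer to 1, the better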

To summarize, the naive Bayes model is a very classic machine learning model. It is based mainly on the Bayes formula, and in application it treats the features in the dataset as independent of each other, without considering correlations between them, which makes it fast to compute. Compared with other classic machine learning models, its ability to generalize is slightly weaker, but when the number of samples and features is large, its prediction performance is also good.

6.3 Course-related resources

How to contact the author: via WeChat

Add the following WeChat ID: huaxz001.

The author's website:

Wang Yutao's courses are available via:
JD link: [/Search?keyword=Wang Yutao] (search for "Wang Yutao"); they can also be purchased on Taobao and Dangdang. To join the learning exchange group, add the WeChat ID huaxz001 (please state your reason).

Various courses are available on NetEase Cloud and 51CTO; search for Wang Yutao to view them.