Machine Learning, Chapter 1: Introduction to Big Data Analytics and Machine Learning


- 1.1 Overview of Big Data Analytics and Machine Learning
  - 1.1.1 Application areas of big data analytics and machine learning
  - 1.1.2 Basic concepts of machine learning
  - 1.1.3 Python's role in data science
- 1.2 Python environment deployment
  - 1.2.1 Python Installation
  - 1.2.2 PyCharm Installation
  - 1.2.3 Jupyter Notebook usage
- 1.3 Summary of Python Basics
- 1.4 Course-related resources

This chapter first introduces the principles and application areas of big data analytics and machine learning, and the basic role of Python in data science. It then explains how to install Python and use the related code editors, and finally points to resources for quickly mastering the basics of Python.

Source code
/docs/y6cCpQWqXCWvvyy8/ The Python Source Code and Related Benefits (Machine Learning)

1.1 Overview of Big Data Analytics and Machine Learning

Some readers may be unfamiliar with big data analytics and machine learning, but most of us have heard of AlphaGo, the Go-playing program that defeated the world's top Go players. AlphaGo is built on big data analysis: through continuous training and learning on a huge accumulated volume of data, it gradually mastered a large number of Go techniques, and with its high-speed computing power it defeated top human players. Machine learning simulates or implements human learning behavior in order to discover the patterns hidden in big data; to some extent, it can be regarded as the core of artificial intelligence.

1.1.1 Application areas of big data analytics and machine learning

Beyond Go, big data analytics has wide application in many other fields. In the information age we are exposed to huge amounts of data every day, and finding patterns in that data by human effort alone is severely limited, whereas big data analysis based on machine learning can analyze the data and extract its patterns quickly and efficiently.
The table below briefly introduces applications of big data analytics and machine learning in 8 major fields; most of them are explained with worked examples in later chapters.

[Table: applications of big data analytics and machine learning in 8 major fields]
The cases above are only demonstrations; in practice, big data analytics and machine learning have many more application scenarios. Although the scenarios differ from industry to industry, the underlying principles are shared, so after studying the following chapters you should have a much clearer understanding of these cases.

1.1.2 Basic concepts of machine learning

Machine learning is a powerful tool for big data analytics. It falls into two main categories, supervised learning and unsupervised learning, which differ in whether the training data contains a target variable (the variable to be predicted).
Two diagrams illustrate the difference. Supervised learning is shown in the figure below: the training data has three feature variables (body size, hair, and characteristics) and one target variable (breed), and the goal is to build a model from the training data that predicts a dog's breed.
[Figure: supervised learning, dog training data with feature variables and the target variable (breed)]
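To make the supervised setting concrete, here is a minimal sketch in plain Python. It assumes a hypothetical numeric encoding of the three features (the third feature is encoded as an invented "activity score", and all values and breed names are made up for illustration) and uses a 1-nearest-neighbour rule as a stand-in model: the predicted breed of a new dog is the breed of the most similar training sample.

```python
import math

# Hypothetical training data: (body_size_cm, hair_length_cm, activity_score) -> breed.
# The numeric encoding and every value here are invented for illustration.
TRAIN = [
    ((70.0, 5.0, 8.0), "Husky"),
    ((68.0, 4.5, 9.0), "Husky"),
    ((25.0, 2.0, 6.0), "Chihuahua"),
    ((22.0, 1.5, 7.0), "Chihuahua"),
    ((60.0, 1.0, 5.0), "Labrador"),
    ((58.0, 1.2, 4.0), "Labrador"),
]

def distance(p, q):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def predict_breed(features):
    """Predict the target variable (breed) from the feature variables by
    returning the breed of the closest training sample (1-nearest neighbour)."""
    _, breed = min(TRAIN, key=lambda sample: distance(sample[0], features))
    return breed

print(predict_breed((66.0, 4.8, 8.5)))  # Husky
```

The training data carries the answer (breed) for every sample; that is exactly what makes this supervised learning.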
Unsupervised learning is shown in the figure below. Its main difference from supervised learning is that the training data contains only feature variables and no target variable (breed), so the goal is not to predict breeds. Taking the clustering model of Chapter 13 as an example, the model groups the dogs in the training data by their features, for instance into class A, class B, and class C dogs; a new sample can then be assigned to one of these classes based on its features.
[Figure: unsupervised learning, dog training data with feature variables only and no breed]
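A minimal k-means sketch in plain Python shows the unsupervised version: the samples have no breed column, the algorithm groups them into three clusters (playing the role of class A, B, and C), and a new dog is assigned to the nearest cluster. The data, the fixed starting centroids, and the function names are hypothetical; the clustering model in Chapter 13 may differ in details such as initialization.

```python
import math

# Hypothetical unlabeled samples: (body_size_cm, hair_length_cm); there is no breed column.
DOGS = [(70, 5), (68, 4.5), (25, 2), (22, 1.5), (60, 1), (58, 1.2)]
CLASS_NAMES = ["A", "B", "C"]

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))

def kmeans(points, init_centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

centroids = kmeans(DOGS, init_centroids=[DOGS[0], DOGS[2], DOGS[4]])
new_dog = (65, 4)
print("new dog belongs to class", CLASS_NAMES[nearest(new_dog, centroids)])  # class A
```

No breed labels are used anywhere; the class names A, B, and C are just labels we attach to the clusters afterwards.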
Subdivided further, supervised learning mainly covers regression problems, which predict a continuous numeric value, and classification problems, which predict a discrete category.
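As a sketch of the regression side, the snippet below fits a straight line by ordinary least squares in plain Python, on hypothetical body-size and weight numbers (the function name and data are illustrative assumptions). Its output is a continuous number, a predicted weight, whereas a classification model outputs one of a fixed set of labels such as a breed.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: body size (cm) -> weight (kg).
sizes = [20, 40, 60]
weights = [3, 13, 23]
slope, intercept = fit_line(sizes, weights)
print(slope, intercept)        # 0.5 -7.0
print(slope * 50 + intercept)  # 18.0: a continuous prediction, not a category
```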

Unsupervised learning mainly covers data clustering (Clustering) and data dimensionality reduction (Dimension Reduction).
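Dimensionality reduction can be illustrated with principal component analysis (PCA), one common technique; whether later chapters use exactly this method is an assumption here. The plain-Python sketch below reduces 2-D points to 1-D by projecting them onto the direction of largest variance, using the closed-form eigenvector of a 2x2 covariance matrix. The function name `pca_1d` is hypothetical.

```python
import math

def pca_1d(points):
    """Project 2-D points onto their first principal component (1-D output)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Covariance matrix [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in centered) / n
    c = sum(dy * dy for _, dy in centered) / n
    b = sum(dx * dy for dx, dy in centered) / n
    # Largest eigenvalue of the symmetric 2x2 matrix.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector; (lam - c, b) is nonzero whenever b != 0.
    if b != 0:
        vx, vy = lam - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Each 2-D point becomes a single coordinate along that direction.
    return [dx * vx + dy * vy for dx, dy in centered]

print([round(v, 4) for v in pca_1d([(1, 1), (2, 2), (3, 3)])])  # [-1.4142, 0.0, 1.4142]
```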

From the perspective of the models used, machine learning can be divided into the algorithm models listed in the table below. Each model is explained in detail in later chapters, and each chapter reinforces the model through the real-world cases mentioned in the previous subsection, so that you understand both how the model works and how it is applied.
[Table: machine learning algorithm models covered in later chapters]

1.1.3 Python's role in data science

There are many tools for data analysis, such as the classic MATLAB and R, as well as Python, which is currently very popular. A major reason Python has become the main tool for big data analysis is the large number of data analysis and machine learning toolkits (called "libraries") that others have already written in Python, such as the numpy library, the pandas library, and the Scikit-learn library (sklearn for short). These libraries encapsulate many ready-made algorithm models that we can call directly, so we can focus on analyzing the data instead of programming the underlying mathematics ourselves.
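The benefit of calling ready-made library code can be seen even with Python's standard library. The sketch below computes a population standard deviation twice: once by coding the formula by hand, and once by calling `statistics.pstdev`, which encapsulates the same formula. The data values are arbitrary.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def pstdev_by_hand(values):
    """Population standard deviation written out from the formula."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

print(pstdev_by_hand(data))     # 2.0
print(statistics.pstdev(data))  # 2.0, same result from a single library call
```

Libraries such as numpy, pandas, and sklearn apply the same idea to much larger algorithms; for example, sklearn's `LinearRegression().fit(X, y)` performs a least-squares fit without you writing the formulas.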
Having seen what makes Python powerful, we now explain how to install Python and use the related code editors.

1.2 Python Environment Deployment

This section explains how to install Python and the code editor PyCharm, and introduces the Jupyter Notebook editor. Since this book focuses on big data analysis and machine learning, only the core points are covered here. For more details, see the author's first book, "Python Financial Big Data Mining and Analysis: The Whole Process in Detail", whose accompanying videos and PDF materials also explain these basics in detail.

1.2.1 Python Installation

Installing Anaconda is recommended. Anaconda is a Python distribution: installing it is equivalent to installing Python, and it also bundles many third-party Python libraries for scientific computing, which makes data analysis and machine learning work much easier.
Anaconda's official download address is /download/ (or simply search the web for Anaconda and go to the official website). Choose the Python 3.7 version; the installer defaults to 64-bit.

[Figure: Anaconda download page]
After the installer has downloaded, it is recommended to keep the default installation path (to avoid possible installation problems) and click through the installation. One point is very important: at the installation step shown in the figure below, be sure to check the first box. For beginners this configures the environment variables (PATH) automatically; otherwise you would have to go through the trouble of configuring them manually.
[Figure: Anaconda installer options, with the first box checked]
Then keep clicking Next. At the intermediate "Install Microsoft VSCode" step, choose Skip.
For all other steps choose Next, and finally click Finish; Python is now installed.
Anaconda also installs some code editors for you (software in which you write code), such as Spyder and Jupyter Notebook. Next, we introduce my personal favorite editor, PyCharm.
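After installation, a short script run with the new Python can confirm the setup. This is a minimal sketch: the function name `check_environment` and the package list are assumptions chosen for illustration. `python_on_path` shows which `python` the PATH resolves to (`None` means PATH was not configured), and `packages` shows whether some libraries that Anaconda typically bundles can be imported.

```python
import importlib.util
import shutil
import sys

def check_environment(packages=("numpy", "pandas", "sklearn", "matplotlib")):
    """Report the running interpreter, the python found on PATH, and importable packages."""
    return {
        "interpreter": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "python_on_path": shutil.which("python"),  # None if no python is on PATH
        # find_spec returns None when a top-level package cannot be found.
        "packages": {name: importlib.util.find_spec(name) is not None for name in packages},
    }

if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, ":", value)
```

On a working Anaconda install you would typically expect the interpreter path to lie inside the Anaconda folder and all listed packages to report `True`.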

1.2.2 PyCharm Installation

PyCharm is another Python code editor (an IDE). As the figure below shows, its interface has a strong technical style.

[Figure: PyCharm interface]
Go to the official website /pycharm/download/#section=windows to download the PyCharm installer. Choose the free Community edition, which is perfectly adequate.

After downloading, double-click the installer. During installation, keep choosing Next and Install; on the page shown below, check the two options indicated.
[Figure: PyCharm installation options page]
After that, keep clicking Next until Finish appears, then click Finish. If anything is unclear, refer to the book's accompanying tutorials and videos, which cover most situations you may encounter.
Step 1 after clicking Finish: select "Do not import settings".
[Figure: "Do not import settings" dialog]
Step 2: Choose the UI theme; the default dark theme is recommended.
Step 3: Choose the optional plugins; you can skip this step.
Step 4: Click "Create New Project" to create a Python project.
Step 5: Name the project. In this step, be sure to expand Project Interpreter and select Existing interpreter.
[Figure: New Project dialog with Existing interpreter selected]
Then click the button on the far right, and in the pop-up page select System Interpreter. The Interpreter field changes to the Anaconda3\ path; click OK.

Back on the project creation page, click Create to create the new Python project.
Step 6: Close the official tips dialog and wait for the Indexing progress at the bottom to finish; while indexing, PyCharm is configuring your Python runtime environment. Indexing took a long time the first time I ran PyCharm but was faster afterwards.
Step 7: After indexing finishes, create a Python file: as shown below, right-click the project folder you created earlier, click New, and select Python File.

[Figure: New -> Python File context menu]
Name the new Python file hello world.
Step 8: Type print('hello world') with the keyboard in English input mode. Single and double quotes are equivalent here.

print('hello world')
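A quick check of the claim about quotes, and of why English input mode matters. The assumption here is that a Chinese input method produces curly quotation marks (U+2018 and U+2019 are used for illustration), which Python rejects with a SyntaxError.

```python
# Single and double quotes create exactly the same string.
single = 'hello world'
double = "hello world"
print(single == double)  # True

# Quotes typed in a Chinese input mode are curly/full-width characters
# (here U+2018 and U+2019), which Python does not accept as string delimiters.
bad_source = "print(\u2018hello world\u2019)"
try:
    compile(bad_source, "<demo>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```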

Wait for the indexing mentioned earlier to finish, then right-click the file name in the project panel or inside the code editor and select Run 'hello world'. The program runs and outputs hello world at the bottom. Note that if indexing has not finished, Run 'hello world' may not appear in the right-click menu, because the runtime environment has not been configured yet.
[Figure: output of the hello world program]
Afterwards, you can also run the program by clicking the green Run button in the top right corner of the interface, or by pressing the shortcut Shift + F10. However, I personally recommend running the program by right-clicking the file and selecting Run 'Python filename', which is less error-prone for beginners.
Next, let's look at PyCharm's font size settings. Click File and select Settings, as shown in the figure below.
[Figure: File -> Settings menu]

In Settings, select Editor and then Font, as shown in the figure below; you can adjust the display font size (Size) and line spacing on the right.
[Figure: Editor -> Font settings]
Frequently asked questions about using PyCharm:
Q1: Why do I have to wait a long time the first time I open it before I can continue?
A1: The first time you open PyCharm, especially right after installation, you need to wait a while for indexing. Wait for the Indexing progress at the bottom to finish, and the following steps will work normally.
[Figure: Indexing progress at the bottom of PyCharm]
Q2: Why does PyCharm tell me that I don't have an interpreter (runtime environment) when I open it?

A2: When you reopen PyCharm, it creates a new project by default, and your Python file belongs to that project. PyCharm's default runtime environment is empty, so if the project has no runtime environment configured, the Python file cannot run and you need to configure one.
The solution is as follows: click "Configure Python interpreter" in the figure above to set the runtime environment of the individual project, or change PyCharm's default runtime environment: click File -> Default Settings (called Other Settings -> Settings for New Projects in some PyCharm versions).
[Figure: File -> Default Settings menu]
Select Project Interpreter and choose the interpreter you installed (click the gear-shaped settings button on the right and select Anaconda as the runtime environment), then click Apply at the lower right and OK to exit. The default interpreter is now associated with Anaconda, as shown in the following figure:
[Figure: Project Interpreter set to Anaconda]
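To confirm which interpreter a PyCharm project actually runs, you can run a small file inside the project. This sketch relies on one assumption worth stating: conda environments, including Anaconda's base environment, contain a `conda-meta` folder under `sys.prefix`. The helper name `is_conda_interpreter` is hypothetical.

```python
import os
import sys

def is_conda_interpreter(prefix=None):
    """True if prefix (default: the running interpreter's sys.prefix) looks like a conda environment."""
    prefix = sys.prefix if prefix is None else prefix
    # Conda environments keep their package metadata in a conda-meta folder.
    return os.path.isdir(os.path.join(prefix, "conda-meta"))

print("Interpreter:", sys.executable)
print("Anaconda/conda interpreter:", is_conda_interpreter())
```

If the second line prints `False` inside PyCharm, the project is probably still using a different interpreter.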

1.2.3 Jupyter Notebook usage

Jupyter Notebook is an excellent code editor that comes with Anaconda. Its strengths are:
1. It is very convenient for running code in blocks;
2. Run results are kept, so you do not need to rerun code that has already run;
3. You can print data in a single block to inspect it, which makes debugging very convenient; this is very helpful in machine learning, where you constantly work with data;
4. Because it comes with Anaconda, there is no need to configure the environment;
5. Compared with PyCharm, Jupyter Notebook opens very quickly, but its automatic error checking and interface polish are somewhat weaker.
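Points 1 and 2 can be pictured with a toy model: each block is executed in one shared namespace that stays in memory, so later blocks use earlier results without rerunning them. This is a simplified illustration under that assumption, not how Jupyter is actually implemented.

```python
# Toy model of notebook blocks: each block runs in one shared namespace,
# so later blocks can use results from earlier ones without rerunning them.
namespace = {}
blocks = [
    "data = [3, 1, 4, 1, 5]",    # block 1: load data
    "total = sum(data)",         # block 2: uses data from block 1
    "mean = total / len(data)",  # block 3: uses results from blocks 1 and 2
]
for source in blocks:
    exec(source, namespace)

print(namespace["mean"])  # 2.8
```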
I usually use Jupyter Notebook to debug and organize machine learning code, and finally run the complete project in PyCharm. The following explains how to use Jupyter Notebook:
1. Open and view Jupyter Notebook
When you first use Jupyter Notebook, the way it opens may seem a little cumbersome compared with PyCharm, where you simply click a Python file to open it; but it opens very quickly, and once you are familiar with it, it is easy to use. First, here is how to open and view Jupyter Notebook.
(1) Open files on the C drive
To open Jupyter Notebook, open the Anaconda folder in the Start menu at the lower left corner of your computer and click Jupyter Notebook.
[Figure: Jupyter Notebook in the Anaconda folder of the Start menu]
Jupyter Notebook then opens in your default browser. The browser is only a container, so no Internet connection is needed. The figure below shows the initial interface, which lists folders on the C drive; you can create a Python file in any of these folders (how to do so is explained in a later step).
[Figure: Jupyter Notebook initial interface]
Besides the browser interface, a Jupyter Notebook management window also pops up. You normally do not need to use it, but do not close it: once it is closed, the Jupyter Notebook in the browser shows that the connection is lost. If the browser does not open the Jupyter Notebook interface automatically, copy the link in the red box in the figure below into the browser's address bar.
[Figure: Jupyter Notebook management window, with the access link in a red box]
(2) Open files on any drive
The steps above open files on the C drive. What if the Jupyter Notebook code is stored on another drive? As shown in the figure below, the "Machine Learning Demo" folder on the E drive contains some code files in Jupyter Notebook format (files with the .ipynb suffix are Python files in Jupyter Notebook format). How can we open them?
[Figure: .ipynb files in the "Machine Learning Demo" folder on the E drive]
One way is to copy the code to a folder on your desktop (which is on the C drive) and open it as described in (1).
A much more convenient way is to type "cmd" in the folder's address bar and press Enter, as shown in the figure below.
[Figure: typing cmd in the folder's address bar]
In the command window that pops up, type "jupyter notebook" and press Enter.

Alternatively, hold Shift, right-click inside the folder, choose "Open PowerShell window here", and enter the same command.

Your default browser then shows the files in that folder; click a Python file to open and view it.
For example, opening the second of these files displays its code and results in the notebook interface.

Because Jupyter Notebook runs in a browser, you can zoom the interface with Ctrl + mouse wheel if the font looks too small.

2. Create a Python file
As shown in the figure below, click the New button in the upper right corner and select Python3 to create a Python file; to create a new folder, select Folder instead.
[Figure: New menu in the upper right corner]
For example, selecting "Python3" opens a new notebook; click Untitled at the top to rename the file.

As shown earlier, Python files in Jupyter Notebook format have the .ipynb suffix, whereas regular Python files have the .py suffix; so in Jupyter Notebook we create and open files with the .ipynb suffix.

3. Writing code
As shown in the figure below, write code in a block. When you are done, press Ctrl + Enter to run the current block, or click the Run button in the toolbar. While you are editing, the block's border is green.
[Figure: editing and running a code block]
As mentioned earlier, one benefit of Jupyter Notebook is that code can be run in blocks. How do you add a new block? As shown in the figure below, click the "+" button in the upper left corner to add a new block below the current one. Alternatively, click to the left of the current block (its left border turns blue), then press the shortcut "b" to add a new block below it (the shortcut "a" adds a new block above it).
[Figure: adding a new code block]
Another benefit of Jupyter Notebook is that you do not need print() to view a variable: if a block ends with the variable's name, its contents are displayed directly, which is convenient for the programmer.

For some types of data, such as the DataFrame tables discussed in the next chapter, displaying the variable by name gives a nicer presentation than print().
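The "type the variable name to see it" behavior comes from the notebook displaying the value of a block's last expression. The sketch below is a simplified model of that rule, not Jupyter's real implementation (which also handles rich display such as DataFrame tables); the function name `run_cell` is hypothetical.

```python
import ast

def run_cell(source, namespace):
    """Run a block of code; if it ends with an expression, return that expression's value."""
    tree = ast.parse(source, mode="exec")
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last = tree.body.pop()  # split off the trailing expression
        exec(compile(tree, "<cell>", "exec"), namespace)
        return eval(compile(ast.Expression(last.value), "<cell>", "eval"), namespace)
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None

ns = {}
print(run_cell("x = 41", ns))        # None: an assignment displays nothing
print(run_cell("x = x + 1\nx", ns))  # 42: the trailing expression is displayed
```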

4. Introduction to the menu bar
Next, the menu bar. We usually do not use it often, but some functions deserve attention; we focus on some features of the Cell and Kernel menus later. The figure below shows the menu bar:
[Figure: Jupyter Notebook menu bar]
The File menu opens and saves files, and its Download As option can save a notebook created in Jupyter Notebook (.ipynb extension) as a regular Python file with the .py extension.
The Edit menu contains block editing operations such as cut, copy, and delete. Some of these can also be done with the toolbar buttons shown in the figure below; hover the mouse over a button to see its description.
The Insert menu inserts blocks, although this is usually done with the shortcuts described below; the Cell menu runs the current block, all blocks above it, all blocks below it, and so on; the Kernel menu interrupts or restarts the program; and Keyboard Shortcuts in the Help menu lists the shortcuts.
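An .ipynb file is a JSON document whose "cells" list stores each block's type and source, which is why it can be exported to a .py file. The sketch below shows the core idea of such an export; the function name `notebook_to_script` and the tiny example notebook are hypothetical. Jupyter's own exporter (`jupyter nbconvert --to script`) produces its own format, while this sketch simply drops Markdown cells and adds no cell markers.

```python
import json

def notebook_to_script(notebook_text):
    """Concatenate the code cells of an .ipynb (JSON) document into one .py source string."""
    nb = json.loads(notebook_text)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip Markdown and raw cells in this simplified sketch
        source = cell.get("source", "")
        # The notebook format allows source as a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"

# A tiny hand-written notebook in nbformat 4 style, used only for illustration.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
})
print(notebook_to_script(example))
```

To save the result, write the returned string to a file whose name ends in .py.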
The most useful functions in the Cell menu are shown below:
[Figure: Cell menu]
The Kernel menu also has some particularly useful features, described next.

I want to emphasize the Restart option in the Kernel menu because a program in Jupyter Notebook sometimes gets stuck (for example, when the code enters an infinite loop) and cannot be stopped with the stop button or the Interrupt option, whereas Restart terminates it very quickly.

5. Toolbar shortcut buttons
Besides the menu bar, Jupyter Notebook has convenient toolbar buttons, shown below:
[Figure: Jupyter Notebook toolbar]
From left to right, they are: save; insert a block below; cut the block, copy the block, paste below; move the selected block up, move the selected block down; run the current block, interrupt the kernel (if it cannot be interrupted, restarting is recommended), restart the kernel (the Restart option in the Kernel menu mentioned above), restart and run all code; the block type box; and open the command palette.
The block type box deserves a separate explanation: it sets a block to Code, Heading, or Markdown (similar to notes or comments; Markdown is a dedicated markup language for notes, and you can search online for more Markdown tips), as shown in the figure below. With it we can add titles and notes to the code, which makes the code easier to read. Note that after setting the type, you must press Ctrl + Enter to run the block to complete the setup.
[Figure: setting a block's type to Markdown]
In addition, the shortcut "m" quickly switches a block from code to Markdown, and the shortcut "y" switches it back to code.

6. Commonly used shortcuts
In practice, shortcuts are used more often. The commonly used Jupyter Notebook shortcuts are shown below:
[Table: common Jupyter Notebook shortcuts]
Note that shortcuts such as a, b, and pressing d twice in a row only take effect when the block is selected; a selected block has a blue border.

In addition, Jupyter Notebook does not display code line numbers by default. To display them, press the shortcut Shift + L; the effect is shown in the following figure:
[Figure: code block with line numbers displayed]

1.3 Summary of Python Basics

Since this book focuses on big data analysis and machine learning, the Python basics it requires are not complex; they are explained in detail in the author's first book, "Python Financial Big Data Mining and Analysis: The Whole Process in Detail", and are not repeated here. For readers starting from zero, this book also provides accompanying videos and PDF materials on this content. Readers new to Python are advised to study the Python basics first before moving on to the next stage. The table below summarizes the Python basics covered in the accompanying videos.
[Table: summary of the Python basics covered in the accompanying videos]

1.4 Course-related resources

How to contact the author: via WeChat

Add the following WeChat account: huaxz001.

The author's website:

Yutao Wang's related courses are available at:
JD.com link: [/Search?keyword=Wang Yutao]; search for "Wang Yutao". They can also be purchased on Taobao and Dangdang. To join the learning exchange group, add the WeChat account huaxz001 (please state your reason).

Various courses are available on NetEase Cloud Classroom and 51CTO; search for Yutao Wang to view them.