An overview of the similarities and differences between multi-processing and multi-threading in Python and Java

While recently studying Python multithreading, I came across this statement: "For any Python program, no matter how many processors there are, there is always only one thread executing at any given time." In other words, multithreading in Python is "fake multithreading". Why would anyone say that? I collected and read some material and compared Python with Java; what follows is my personal understanding of the issue:

Interpreted languages

Compiled languages, such as C: after writing a program in C, you must use a compiler to translate it into machine language (a binary file that the computer can execute directly). Because the binary format differs between operating systems, a C program has to be recompiled when it is ported (for example, it compiles to an .exe file on Windows and to an ELF executable on Linux).

Java and Python are both commonly described as interpreted languages. Relatively speaking, a program written in an interpreted language is not compiled to machine code in advance; it is distributed as source text (or as bytecode), so releasing it seems to skip the compilation step. However, the program must be interpreted at run time before it can execute. Take Java: a Java program is first compiled by a compiler into class files, which are then interpreted and executed by the Java Virtual Machine (JVM). On Windows the class files are run by the Windows JVM; on Linux they are run by the Linux JVM. That is why Java is cross-platform, provided the target platform has a matching JVM; without one, the program cannot run there.

Compiled languages such as C/C++ and Pascal/Object Pascal (Delphi) are typically used to develop operating systems, large applications, database systems, and the like, because their programs execute quickly and, all things being equal, have lower system requirements. Interpreted languages such as Java, JavaScript, VBScript, Perl, Python, Ruby, and MATLAB are usually used for web scripts, server scripts, and auxiliary development interfaces, where speed requirements are modest and some compatibility across system platforms is needed.

Reasons why Python is not really multithreaded

To take advantage of multi-core systems, Python has to support multithreading. As an interpreted language, Python's interpreter must be both safe and efficient. We all know the problems multithreaded programming can run into: the interpreter has to avoid having different threads manipulate its internally shared data at the same time, while still making the best possible use of computing resources when it schedules user threads. So what mechanism protects data that different threads access simultaneously? The answer is the Global Interpreter Lock (GIL). The name says a lot: it is a global lock (from the interpreter's point of view) placed on the interpreter itself (for an explanation, see a previous article: /megustas_jjc/article/details/79110284). But it also leads to the problem mentioned above: for any Python program running on the standard CPython interpreter, no matter how many processors there are, only one thread executes Python bytecode at any given time. This is where Python differs from Java, which on a multi-core machine can run several threads truly in parallel.

"Why does my brand-new multi-threaded Python program run slower than when it had only one thread?" Many people are genuinely puzzled when they ask this, because a program with two threads should obviously be faster than one with a single thread (assuming the work can indeed be parallelized). In fact, the question is asked so often that Python experts have crafted a standard answer: "Don't use multithreading, use multiprocessing."

So for compute-intensive work, I recommend using multiple processes rather than Python's multithreading. For I/O-intensive work I would still recommend multiple processes, because when something goes wrong in a multithreaded program you may never find out where the problem lies (this is my personal view; for more on compute-intensive versus I/O-intensive work, see a previous article: /megustas_jjc/article/details/79110063).
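The claim that threads do not speed up compute-intensive work can be checked with a small timing sketch. It assumes a standard CPython build with the GIL enabled; the names count_down, timed, run_sequential and run_threaded and the loop size are illustrative choices of mine, not from the original article. Timings depend on the machine, so no expected numbers are shown.

```python
import threading
import time

def count_down(n):
    """Pure-Python CPU-bound loop; returns the number of iterations performed."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps

def timed(label, fn):
    """Run fn() once and print how long it took."""
    start = time.time()
    fn()
    print("%-12s %.3f s" % (label, time.time() - start))

def run_sequential(n):
    """Do the work twice, one call after the other, in the main thread."""
    count_down(n)
    count_down(n)

def run_threaded(n):
    """Do the same two units of work in two threads and wait for both."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    N = 2000000  # illustrative size, adjust for your machine
    timed("sequential", lambda: run_sequential(N))
    timed("two threads", lambda: run_threaded(N))
```

On a GIL build, the two-thread run is typically no faster than the sequential one, and it can be slower because the threads compete for the lock.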

Let's talk about "multiprocessing" first.

Java

Programs written in Java run in the Java Virtual Machine (JVM). Every time a Java application is started with the java command, a JVM process is started, and within that JVM there is one and only one process: the JVM itself. Inside this JVM, all program code runs in threads. The JVM finds the program's entry point, main(), and runs it in a thread created for that purpose, called the main thread. When main() returns, the main thread ends, and the JVM process exits once no other non-daemon threads are still running.

Python

I have heard that Python's multithreading cannot really use multiple cores, so with multiple threads you are still only doing concurrent processing on a single core. With multiple processes, however, you really can use multiple cores: processes are independent of each other and do not share resources, so different processes can run on different cores and achieve true parallelism.
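As a rough illustration of processes being independent, the sketch below starts two child processes that each do some compute-bound work and print their own process ID. The helper names cpu_task and report and the input size are hypothetical; whether the children actually land on different cores is up to the operating system scheduler.

```python
import multiprocessing
import os

def cpu_task(n):
    """CPU-bound work: sum of the squares of 0..n-1."""
    return sum(i * i for i in range(n))

def report(n):
    """Runs in a child process: print this process's ID and its result."""
    print("pid %d computed %d" % (os.getpid(), cpu_task(n)))

if __name__ == "__main__":
    print("parent pid %d, %d cores available"
          % (os.getpid(), multiprocessing.cpu_count()))
    workers = [multiprocessing.Process(target=report, args=(100000,))
               for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Each child reports a PID different from the parent's, which shows that each one runs in its own interpreter and is therefore not limited by the parent's GIL.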

Python's multi-process library: multiprocessing

Process-based multiprocessing

The multiprocessing module provides the Process class for creating a new process. The following code creates a new child process.

from multiprocessing import Process

def f(name):
    print('hello %s' % name)

if __name__ == '__main__':
    # Create a new child process p: target is the function to run,
    # args is the tuple of arguments passed to f
    p = Process(target=f, args=('bob',))
    p.start()  # Start the child process
    p.join()   # Wait for the child process to finish

In the above code, p.join() means waiting for the child process to finish before executing the operations that follow; it is generally used to synchronize processes. For example, if there is a read process pw and a write process pr, you need to call pr.join() before starting pw, which means waiting for the write process to finish before the read process begins.
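The write-then-read ordering described above can be sketched as follows. This is only an illustration under my own assumptions: the two processes exchange data through a temporary file, and the names write_numbers, read_total, writer and reader are hypothetical rather than taken from the original article.

```python
import os
import tempfile
from multiprocessing import Process

def write_numbers(path, n):
    """Writer: store the integers 0..n-1, one per line."""
    with open(path, "w") as fh:
        for i in range(n):
            fh.write("%d\n" % i)

def read_total(path):
    """Reader: sum the integers found in the file, print and return the total."""
    with open(path) as fh:
        total = sum(int(line) for line in fh)
    print("total = %d" % total)
    return total

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
    writer = Process(target=write_numbers, args=(path, 5))
    reader = Process(target=read_total, args=(path,))
    writer.start()
    writer.join()   # block until the writer has finished
    reader.start()  # only now is the file guaranteed to be complete
    reader.join()   # the reader prints "total = 10"
```

Without the writer.join() call, the reader could start while the file is still missing or incomplete.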

Multiple processes with a process pool (Pool)

To create multiple child processes at the same time, you can use the multiprocessing.Pool class. It creates a pool of worker processes and distributes the submitted tasks among them, so the work can run on multiple cores.

import multiprocessing

def func(msg):
    print(multiprocessing.current_process().name + '-' + msg)

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)  # Create a pool of 4 worker processes
    for i in range(10):
        msg = "hello %d" % i
        pool.apply_async(func, (msg,))  # Submit a task without waiting for its result
    pool.close()  # Close the pool: no more tasks can be submitted
    pool.join()   # Wait for all workers to finish; must be called after close()
    print("Sub-process(es) done.")

Results:

Sub-process(es) done.
PoolWorker-34-hello 1
PoolWorker-33-hello 0
PoolWorker-35-hello 2
PoolWorker-36-hello 3
PoolWorker-34-hello 7
PoolWorker-33-hello 4
PoolWorker-35-hello 5
PoolWorker-36-hello 6
PoolWorker-33-hello 8
PoolWorker-36-hello 9

A Pool can also process data with the apply() function or the map() function.
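A minimal sketch of those two calls follows. The function square is a hypothetical example of mine; Pool.map and Pool.apply are the standard multiprocessing methods.

```python
from multiprocessing import Pool

def square(x):
    """Work done by a pool worker for a single input."""
    return x * x

if __name__ == "__main__":
    pool = Pool(processes=4)
    # map() splits the iterable across the workers and returns
    # the results in the same order as the inputs
    print(pool.map(square, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    # apply() runs a single call in one worker and blocks until it returns
    print(pool.apply(square, (7,)))     # 49
    pool.close()
    pool.join()
```

Unlike apply_async(), both calls block until the results are available, so they are convenient when the parent needs the return values right away.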

And then there's "multithreading."

Java

You can refer to these previous articles:

Java multithreading implementation and control
Multithreading study notes (1): thread creation and thread states
Multithreading study notes (2): thread safety issues
Multithreading study notes (3): thread problems in the singleton pattern
Multithreading study notes (4): inter-thread communication and the wait/notify mechanism
Multithreading study notes (5): thread problems with multiple producers and multiple consumers
Multithreading study notes (6): the Lock object
Multithreading study notes (7): the difference between wait and sleep, stopping threads, daemon threads, etc.
Some thoughts on choosing the number of concurrent threads

Python

You can refer to the previous article:

Python Multithreading
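For readers without that article at hand, here is a minimal sketch of Python threads using the standard threading module. The names counter, counter_lock, add_many and run_threads are hypothetical; the point is only to show start(), join(), and a lock protecting shared state.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    """Increment the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1

def run_threads(num_threads, times):
    """Start num_threads threads, wait for all of them, and return the final counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(run_threads(4, 10000))  # 40000
```

All four threads update the same counter object because threads share the memory of their process; the lock keeps each read-modify-write step from being interleaved with another thread's.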

Data sharing between processes and threads

# -*- coding: utf-8 -*-

# The multiprocessing module provides a Process class to represent a process object.
import multiprocessing
from multiprocessing import Process
import threading

def run(lock1, info_list, i):
    with lock1:
        info_list.append(i)
        print(info_list)

def run2(info_list, i):
    lock2.acquire()
    info_list.append(i)
    print(info_list)
    lock2.release()

info = []

if __name__ == '__main__':
    # Multi-process execution: memory is independent, each process has its own copy of info
    lock1 = multiprocessing.Lock()
    for i in range(10):
        p = Process(target=run, args=(lock1, info, i))
        p.start()

    # Multi-threaded execution: memory is shared
    lock2 = threading.Lock()
    for i in range(10):
        p = threading.Thread(target=run2, args=(info, i))
        p.start()

# So to communicate between processes, a bridge is needed

Run results:

[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 5]
[0, 1, 2, 3, 4, 5, 6]
[0, 1, 2, 3, 4, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Because memory is not shared between processes, inter-process communication needs an intermediary such as a queue (from multiprocessing import Queue). You can also use Value and Array to place data in shared memory, so that several processes can read and modify the same data, for example:

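# -*- coding: utf-8 -*-
from multiprocessing import Process, Array, Value

def f(n, a):
    n.value = 3.1415926  # Write to the shared double
    for i in range(len(a)):
        a[i] = -a[i]     # Negate each element of the shared array in place

if __name__ == "__main__":
    num = Value("d", 0.0)          # Shared double, initial value 0.0
    array = Array("i", range(10))  # Shared array of ints 0..9
    print("%s %s" % (num.value, array[:]))
    p = Process(target=f, args=(num, array))
    p.start()
    p.join()
    print("%s %s" % (num.value, array[:]))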

Results:

0.0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3.1415926 [0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

Value and Array make it possible for the child process to modify num and array, and for the parent to see those changes.
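The Queue approach mentioned earlier has no listing in this article, so here is a minimal sketch of it. The function name square_worker and the choice of four workers are my own assumptions; Process and Queue are the standard multiprocessing classes.

```python
from multiprocessing import Process, Queue

def square_worker(q, i):
    """Runs in a child process: send (input, result) back to the parent."""
    q.put((i, i * i))

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=square_worker, args=(q, i)) for i in range(4)]
    for p in procs:
        p.start()
    # Read one result per child before joining, so that no child is left
    # blocked while flushing its data into the queue
    results = sorted(q.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

The results arrive in whatever order the children finish, which is why the sketch sorts them before printing.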