Ray official website: Welcome to the Ray documentation — Ray 2.3.1
Some python modules depend on ray, which can be really annoying if ray is not installed and configured properly. There are also differences between using ray on a server and on a Windows system.
1. Creating a virtual environment with conda
If the python path inside the newly created virtual environment does not match the python that the system resolves first (e.g., when there are multiple virtual environments), always use absolute paths in subsequent calls to pip or ray.
# Create and activate a virtual environment named ray
conda create -c conda-forge python=3.8 -n ray
conda activate ray
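One way to spot the path mismatch described above is to ask Python which interpreter is running and which pip and ray executables the shell would pick up. This is a minimal sketch using only the standard library; the helper names `describe_environment` and `is_inside` are made up for illustration.

```python
import shutil
import sys
from pathlib import Path


def describe_environment():
    """Collect the running interpreter and the pip/ray executables found on PATH."""
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "pip": shutil.which("pip"),
        "ray": shutil.which("ray"),
    }


def is_inside(executable, prefix):
    """Return True when `executable` is located somewhere under `prefix`."""
    if executable is None:
        return False
    return Path(prefix).resolve() in Path(executable).resolve().parents


env = describe_environment()
print(f"python: {env['python']}")
for name in ("pip", "ray"):
    where = "inside" if is_inside(env[name], env["prefix"]) else "NOT inside"
    print(f"{name}: {env[name]} ({where} the active environment)")
```

If pip or ray is reported as not inside the active environment, calling them by the absolute path under the environment's own directory avoids picking up the wrong installation.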
2. Installation of ray
Besides the most basic features of ray, you also need to install some dependency libraries, which are effectively extensions of ray. For the specifics, refer to the Ray installation documentation:
# There are other extras besides these five, but these are enough
pip install -U "ray[default]"
pip install -U "ray[air]"
pip install -U "ray[tune]"
pip install -U "ray[rllib]"
pip install -U "ray[serve]"
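After the pip commands finish, it can help to confirm from Python that ray actually landed in the active environment. The sketch below uses the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, and it only checks the base `ray` package, not the individual extras.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("ray")
if version:
    print(f"ray version: {version}")
else:
    print("ray is not installed in this environment")
```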
3. ray start
Unlike on Windows, to run ray on a server you must first create a ray cluster and then initialize (connect to) that cluster from python. If you only need to run commands on a single server and do not need communication between servers, or between a remote server and a local computer, the simple command below is enough.
# Create a ray cluster with 30 CPUs
ray start --head --num-cpus=30
# Logs
Usage stats collection is enabled. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run .

Local node IP: 10.11.11.179
2023-04-16 01:49:35,305 ERROR :1169 -- Failed to start the dashboard
2023-04-16 01:49:35,306 ERROR :1194 -- Error should be written to '' or ''. We are printin.
2023-04-16 01:49:35,307 ERROR :1238 --
The last 20 lines of /tmp/ray/session_2023-04-16_01-48-54_707151_109593/logs/ (it contains the error message from
2023-04-16 01:49:32,933 INFO :239 -- Starting dashboard metrics server on port 44227
2023-04-16 01:49:32,941 INFO :112 -- Get all modules by type: DashboardHeadModule


--------------------
Ray runtime started.
--------------------

Next steps
To connect to this Ray runtime from another node, run
ray start --address='10.11.11.179:6379'

Alternatively, use the following Python code:
import ray
ray.init(address='auto')

To see the status of the cluster, use
ray status

If connection fails, check your firewall settings and network configuration.

To terminate the Ray runtime, run
ray stop
4. Head nodes and worker nodes
If servers need to communicate with each other, or a remote server needs to communicate with a local computer, things get a bit more complicated. Configuring such a ray cluster involves two concepts: head nodes and worker nodes.
The head node is the central node of the Ray cluster. It coordinates task execution and resource management, and is responsible for:
- Assigning tasks: the head node assigns tasks to worker nodes so that they can execute them.
- Managing resources: the head node manages the resources in the cluster, such as CPU, memory, and GPUs, so that tasks run correctly and use the appropriate resources.
- Tracking task status: the head node tracks the execution status of each task and returns the result to the caller when the task is completed.
The worker nodes are the compute nodes of the Ray cluster. They execute tasks and return the results to the head node, and are responsible for:
- Receiving tasks: worker nodes receive the tasks assigned to them by the head node.
- Executing tasks: the worker node executes the code for the task and returns the result to the head node.
- Releasing resources: the worker node releases the resources it used after completing the task so that other tasks can use them.
In a nutshell, head nodes and worker nodes differ in their responsibilities. The head node is the central node of the cluster and coordinates and manages the execution of tasks, while worker nodes are the compute nodes of the cluster and execute the tasks. In a Ray cluster, the head node and the worker nodes communicate with each other to ensure that tasks are executed correctly and results are returned.
PS: Personally, I think of it as similar to the relationship between parent and child processes, or between the base environment and virtual environments.
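The division of labor described above can be seen from the caller's side with a small remote task. This is a minimal sketch, assuming ray is installed and a cluster has already been started with `ray start`; `square` and `run_on_cluster` are hypothetical names, and ray is imported inside the function so the file still loads where ray is absent.

```python
def square(x):
    """Plain Python function; it becomes a Ray task in run_on_cluster."""
    return x * x


def run_on_cluster(values, address="auto"):
    """Submit one task per value to a running cluster and collect the results."""
    import ray  # assumption: ray is installed in the active environment

    ray.init(address=address)
    try:
        remote_square = ray.remote(square)  # wrap the function as a remote task
        futures = [remote_square.remote(v) for v in values]  # scheduled by the cluster
        return ray.get(futures)  # blocks until all results are back at the caller
    finally:
        ray.shutdown()
```

Calling `run_on_cluster([1, 2, 3])` on a machine that is part of the cluster should return the squares in the same order; which node executes each task is decided by Ray, not by the caller.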
Specific steps:
Find the server's IP address
ifconfig
Initialize and create the head node
ray start --head --port=6379 --num-cpus=<number_of_cpus> --redis-password=<password>
# port=0: random port
# port=6379: default port
Connect a worker node to the head node
ray start --address=<address_of_head_node>:<port_of_head_node> --num-cpus=<number_of_cpus>
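Once a worker has been started, one way to confirm that it joined is to list the nodes from Python. The sketch below assumes a running cluster and that `ray.nodes()` returns one dict per node with `NodeManagerAddress`, `Alive`, and `Resources` keys; treat those key names as my understanding of its output rather than a guarantee. `summarize_nodes` and `list_cluster_nodes` are hypothetical helper names.

```python
def summarize_nodes(nodes):
    """Reduce ray.nodes()-style dicts to (address, alive, cpu_count) tuples."""
    summary = []
    for node in nodes:
        resources = node.get("Resources", {})
        summary.append(
            (node.get("NodeManagerAddress"), node.get("Alive", False), resources.get("CPU", 0.0))
        )
    return summary


def list_cluster_nodes(address="auto"):
    """Connect to the running cluster, summarize its nodes, and disconnect."""
    import ray  # assumption: ray is installed in the active environment

    ray.init(address=address)
    try:
        return summarize_nodes(ray.nodes())
    finally:
        ray.shutdown()
```

With one head node and one worker connected, `list_cluster_nodes()` would be expected to return two entries, one per IP address.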
5. Running commands in the ray cluster via python
import ray
# Initialize ray
ray.init(address='auto')
# Some specified functions
ray.shutdown()
Note: In python, the session must be ended with ray.shutdown(); otherwise the next call to ray.init() in the same session will raise an error!
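To make sure the connection is always closed, even when the code in between raises, the init and shutdown calls can be wrapped in a context manager. This is a sketch rather than an official Ray pattern; `ray_session` is a hypothetical name, and the `ray_module` parameter exists only so the helper can be exercised with a stand-in object.

```python
from contextlib import contextmanager


@contextmanager
def ray_session(address="auto", ray_module=None):
    """Connect to a running cluster and always shut the connection down afterwards."""
    if ray_module is None:
        import ray as ray_module  # assumption: ray is installed

    ray_module.init(address=address)
    try:
        yield ray_module
    finally:
        ray_module.shutdown()  # runs even if the body raised an exception
```

Usage would look like `with ray_session() as ray: print(ray.cluster_resources())`, after which a later `ray_session()` block in the same interpreter can connect again.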
6. Other commands
ray dashboard

# Run without a cluster config file, this command reports the error below (CLUSTER_CONFIG_FILE is a required argument); it does not affect use of the cluster
'''
Usage: ray dashboard [OPTIONS] CLUSTER_CONFIG_FILE
Try 'ray dashboard --help' for help.

Error: Missing argument 'CLUSTER_CONFIG_FILE'.
'''

ray status
# Logs
'''
======== Autoscaler status: 2023-04-16 04:04:48.643492 ========
Node status
---------------------------------------------------------------
Healthy:
1 node_36f47b4427ed06ce849863e323f684649dce7aa5c1ad7d3be38416aa
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
---------------------------------------------------------------
Usage:
0.0/30.0 CPU
0.00/595.632 GiB memory
0.00/186.265 GiB object_store_memory

Demands:
(no resource demands)
'''

# Shut down the ray cluster
ray stop
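The "Usage" part of the `ray status` output can also be approximated from Python by comparing `ray.cluster_resources()` with `ray.available_resources()`. This is a sketch assuming a running cluster; `format_usage` and `cluster_usage` are hypothetical helper names, and memory values are left in whatever unit Ray reports rather than converted to GiB.

```python
def format_usage(total, available):
    """Build 'used/total NAME' lines from total and available resource dicts."""
    lines = []
    for name in sorted(total):
        # A resource missing from `available` is treated as fully in use.
        used = total[name] - available.get(name, 0.0)
        lines.append(f"{used:.1f}/{total[name]:.1f} {name}")
    return lines


def cluster_usage(address="auto"):
    """Connect to the running cluster and return its resource usage lines."""
    import ray  # assumption: ray is installed in the active environment

    ray.init(address=address)
    try:
        return format_usage(ray.cluster_resources(), ray.available_resources())
    finally:
        ray.shutdown()
```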