Article Directory
- gossip automated trading system
- requirements
- data
- Download Qlib Data
- Download CN Data
- Download US Data
- Download CN Simple Data
- Help
- Using in Qlib
- US data
- CN data
- highfreq
- High-Frequency Dataset
- Get High-Frequency Data
- Dump & Reload & Reinitialize the Dataset
- About Reinitialization
- Run the Code
- StockSelection
- Introduction
- ENVS
- strategy and forecast
- workflow
gossip automated trading system
- The system combines qlib with custom-written modules to handle data acquisition, stock selection, strategy analysis, model training, model prediction, and backtesting, which together form a quantitative trading system.
- Currently supported algorithms include
- GBDT based on XGBoost (Tianqi Chen, et al. 2016)
- GBDT based on LightGBM (Guolin Ke, et al. 2017)
- GBDT based on CatBoost (Liudmila Prokhorenkova, et al. 2017)
- MLP based on PyTorch
- LSTM based on PyTorch (Sepp Hochreiter, et al. 1997)
- GRU based on PyTorch (Kyunghyun Cho, et al. 2014)
- ALSTM based on PyTorch (Yao Qin, et al. 2017)
- GATs based on PyTorch (Petar Velickovic, et al. 2017)
- SFM based on PyTorch (Liheng Zhang, et al. 2017)
- TFT based on TensorFlow (Bryan Lim, et al. 2019)
- TabNet based on PyTorch (Sercan O. Arik, et al. 2019)
- DoubleEnsemble based on LightGBM (Chuheng Zhang, et al. 2020)
requirements
qlib
loguru
fire
requests
pandas
lxml
numpy
tqdm
yahooquery
data
Download Qlib Data
Download CN Data
python get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
Download US Data
python get_data.py qlib_data --target_dir ~/.qlib/qlib_data/us_data --region us
Download CN Simple Data
python get_data.py qlib_data --name qlib_data_simple --target_dir ~/.qlib/qlib_data/cn_data --region cn
Help
python get_data.py qlib_data --help
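The download commands above differ only in the region, the target directory, and the optional dataset name. As a minimal sketch, the helper below assembles the same command lines so they can be scripted. The function names build_download_command and run_download are hypothetical and not part of get_data.py; only the flags shown above are used, and the script is assumed to be in the current directory.

```python
import os
import subprocess


def build_download_command(region, name=None):
    """Return the get_data.py argument list for one region ("cn" or "us")."""
    command = ["python", "get_data.py", "qlib_data"]
    if name is not None:
        # Optional dataset name, e.g. "qlib_data_simple" for the simple CN data.
        command += ["--name", name]
    command += ["--target_dir", f"~/.qlib/qlib_data/{region}_data", "--region", region]
    return command


def run_download(region, name=None):
    """Run the download; assumes get_data.py is in the current directory."""
    command = build_download_command(region, name)
    # No shell is involved here, so "~" is expanded explicitly.
    command = [os.path.expanduser(part) for part in command]
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the three commands listed above without running them.
    for region, name in [("cn", None), ("us", None), ("cn", "qlib_data_simple")]:
        print(" ".join(build_download_command(region, name)))
```

Calling run_download("us") would then perform the US download, provided the script and a network connection are available.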
Using in Qlib
For more information: /en/latest/start/
US data
Download the data first; see Download US Data.
import qlib
from qlib.config import REG_US
provider_uri = "~/.qlib/qlib_data/us_data" # target_dir
qlib.init(provider_uri=provider_uri, region=REG_US)
CN data
Download the data first; see Download CN Data.
import qlib
from qlib.config import REG_CN
provider_uri = "~/.qlib/qlib_data/cn_data" # target_dir
qlib.init(provider_uri=provider_uri, region=REG_CN)
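Once qlib is initialized, data can be read through its data API. The following is a minimal sketch, assuming qlib is installed and the data has been downloaded to the default directories used above. The helper names provider_uri_for and load_price_volume are hypothetical, and any instrument code or date range passed in is the caller's choice rather than something this project prescribes.

```python
def provider_uri_for(region):
    """Map a region ("cn" or "us") to the default target_dir used above."""
    return f"~/.qlib/qlib_data/{region}_data"


def load_price_volume(region, instruments, start_time, end_time):
    """Initialize qlib for one region and fetch close price and volume."""
    # Imported lazily so provider_uri_for works without qlib installed.
    import qlib
    from qlib.config import REG_CN, REG_US
    from qlib.data import D

    regions = {"cn": REG_CN, "us": REG_US}
    qlib.init(provider_uri=provider_uri_for(region), region=regions[region])
    # "$close" and "$volume" are field expressions understood by qlib.
    return D.features(
        instruments,
        ["$close", "$volume"],
        start_time=start_time,
        end_time=end_time,
    )
```

For example, load_price_volume("cn", ["SH600000"], "2020-01-01", "2020-01-10") would return a pandas DataFrame indexed by instrument and datetime, provided that instrument and period exist in the downloaded data.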
highfreq
High-Frequency Dataset
This dataset is an example for reinforcement learning (RL) high-frequency trading.
Get High-Frequency Data
Get high-frequency data by running the following command:
python get_data
Dump & Reload & Reinitialize the Dataset
The High-Frequency Dataset is implemented as a DatasetH, a serializable class whose state can be dumped to disk or loaded from disk in pickle format.
About Reinitialization
After reloading a Dataset from disk, Qlib also supports reinitializing it. Users can reset some states of the Dataset or DataHandler, such as instruments, start_time, end_time and segments, and then generate new data according to those states.
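The dump, reload, and reinitialize cycle can be illustrated with a toy class. This is only a sketch of the pattern under stated assumptions: ToyDataset and its methods are hypothetical stand-ins written for this illustration and are not Qlib's Dataset or DataHandler API. Only the states are pickled, and the "data" is a summary string rebuilt from them.

```python
import pickle

# The states that may be reset, mirroring the ones named in the text.
STATES = ("instruments", "start_time", "end_time", "segments")


class ToyDataset:
    """Hypothetical stand-in for a dataset; NOT Qlib's DatasetH."""

    def __init__(self, instruments, start_time, end_time, segments):
        self.instruments = instruments
        self.start_time = start_time
        self.end_time = end_time
        self.segments = segments
        self.data = None
        self.setup_data()

    def config(self, **states):
        """Reset selected states; the data is not regenerated yet."""
        for key, value in states.items():
            if key not in STATES:
                raise KeyError(f"unknown state: {key}")
            setattr(self, key, value)

    def setup_data(self):
        """Regenerate the data from the current states (here, a summary string)."""
        self.data = f"{self.instruments}:{self.start_time}..{self.end_time}"

    def dump(self):
        """Serialize the states in pickle format."""
        return pickle.dumps({key: getattr(self, key) for key in STATES})

    @classmethod
    def load(cls, payload):
        """Rebuild a dataset from pickled states."""
        return cls(**pickle.loads(payload))


if __name__ == "__main__":
    dataset = ToyDataset("csi300", "2020-01-01", "2020-06-30",
                         {"train": ("2020-01-01", "2020-06-30")})
    restored = ToyDataset.load(dataset.dump())
    # Reinitialize: reset some states, then generate new data from them.
    restored.config(start_time="2020-07-01", end_time="2020-12-31",
                    segments={"test": ("2020-07-01", "2020-12-31")})
    restored.setup_data()
    print(restored.data)
```

Separating config from setup_data keeps the reset cheap: several states can be changed before the data is regenerated once.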
Users can run the example code as follows.
Run the Code
Run the example by running the following command:
python dump_and_load_dataset
StockSelection
Introduction
This project demonstrates how to apply machine learning algorithms to distinguish "good" stocks from "bad" stocks. To this end, we construct 244 technical and fundamental features to characterize each stock, and label stocks according to their ranking with respect to the return-volatility ratio. Algorithms ranging from traditional statistical learning methods to recently popular deep learning methods, namely Logistic Regression (LR), Random Forest (RF), Deep Neural Network (DNN), and a Stacking Ensemble model, are trained to solve the classification task. A Genetic Algorithm is also used for feature selection. The effectiveness of the stock selection strategy is validated in the Chinese stock market from both statistical and practical aspects, showing that:
- Stacking outperforms the other models, reaching an AUC score of 0.972;
- The Genetic Algorithm picks a subset of 114 features, and the prediction performance of all models remains almost unchanged after the selection procedure, which suggests that some features are indeed redundant;
- LR and DNN are radical models; RF is a risk-neutral model; Stacking is somewhere between DNN and RF.
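The labeling step described in the introduction, ranking stocks by their return-volatility ratio, can be sketched as follows. This is an illustration under assumptions and not the project's actual code: the ratio is taken as mean return divided by the sample standard deviation of returns, and the top and bottom fractions used to assign labels (with the middle left unlabeled) are hypothetical choices, since the text does not state the thresholds.

```python
from statistics import mean, stdev


def return_volatility_ratio(returns):
    """Mean return divided by sample standard deviation (assumes non-constant returns)."""
    return mean(returns) / stdev(returns)


def label_stocks(returns_by_stock, top_fraction=0.3, bottom_fraction=0.3):
    """Label the best-ranked stocks 1 ("good") and the worst-ranked 0 ("bad").

    The fractions are assumed values; stocks in the middle get no label.
    """
    if top_fraction + bottom_fraction > 1:
        raise ValueError("top_fraction + bottom_fraction must not exceed 1")
    ratios = {
        stock: return_volatility_ratio(returns)
        for stock, returns in returns_by_stock.items()
    }
    # Rank from the highest ratio to the lowest.
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    n_top = int(len(ranked) * top_fraction)
    n_bottom = int(len(ranked) * bottom_fraction)
    labels = {stock: 1 for stock in ranked[:n_top]}
    labels.update({stock: 0 for stock in ranked[len(ranked) - n_bottom:]})
    return labels


if __name__ == "__main__":
    # Made-up periodic returns for four hypothetical stocks.
    sample = {
        "A": [0.02, 0.03, 0.02, 0.03],
        "B": [0.05, -0.04, 0.06, -0.03],
        "C": [-0.01, -0.02, -0.01, -0.02],
        "D": [0.01, 0.00, 0.02, 0.01],
    }
    print(label_stocks(sample, top_fraction=0.25, bottom_fraction=0.25))
```

Dropping the middle of the ranking is one common way to obtain a cleaner binary target; whether this project does so is not stated in the text.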
ENVS
- python = 3.5
numpy
pandas
matplotlib
math
os
sklearn
tensorflow
keras