gossip automated trading system (self-developed, to be open-sourced later)

Article Directory

gossip automated trading system
requirements
data
- Download Qlib Data
  - Download CN Data
  - Download US Data
  - Download CN Simple Data
  - Help
- Using in Qlib
  - US data
  - CN data
highfreq
- High-Frequency Dataset
  - Get High-Frequency Data
  - Dump & Reload & Reinitialize the Dataset
    - About Reinitialization
    - Run the Code
StockSelection
- Introduction
- ENVS
strategy and forecast
workflow

gossip automated trading system

The system mainly uses Qlib together with custom-written modules to handle data acquisition, stock selection, strategy analysis, model training, model prediction, backtesting, and other operations, which together form a quantitative trading system.
Currently supported algorithms include:
- GBDT based on XGBoost (Tianqi Chen, et al. 2016)
- GBDT based on LightGBM (Guolin Ke, et al. 2017)
- GBDT based on CatBoost (Liudmila Prokhorenkova, et al. 2017)
- MLP based on PyTorch
- LSTM based on PyTorch (Sepp Hochreiter, et al. 1997)
- GRU based on PyTorch (Kyunghyun Cho, et al. 2014)
- ALSTM based on PyTorch (Yao Qin, et al. 2017)
- GATs based on PyTorch (Petar Velickovic, et al. 2017)
- SFM based on PyTorch (Liheng Zhang, et al. 2017)
- TFT based on TensorFlow (Bryan Lim, et al. 2019)
- TabNet based on PyTorch (Sercan O. Arik, et al. 2019)
- DoubleEnsemble based on LightGBM (Chuheng Zhang, et al. 2020)

requirements

qlib
loguru
fire
requests
pandas
lxml
numpy
tqdm
yahooquery

data

Download Qlib Data

Download CN Data

python get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

Download US Data

python get_data.py qlib_data --target_dir ~/.qlib/qlib_data/us_data --region us

Download CN Simple Data

python get_data.py qlib_data --name qlib_data_simple --target_dir ~/.qlib/qlib_data/cn_data --region cn

Help

python get_data.py qlib_data --help
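The download commands above differ only in the region, the target directory and the optional data set name, so they can be assembled programmatically. The following is a minimal sketch, assuming get_data.py is in the current directory; the helper names build_download_command and download are hypothetical and not part of Qlib, and only the flags shown above are used. The source shows the simple data set for the cn region only, so passing simple=True with another region is untested here.

```python
import os
import subprocess

# Root directory used by the commands above.
QLIB_DATA_ROOT = "~/.qlib/qlib_data"


def build_download_command(region, simple=False, script="get_data.py"):
    """Return the argument list for one get_data.py invocation (hypothetical helper)."""
    if region not in ("cn", "us"):
        raise ValueError(f"unsupported region: {region!r}")
    # subprocess does not expand "~", so expand it here.
    target_dir = os.path.expanduser(f"{QLIB_DATA_ROOT}/{region}_data")
    command = ["python", script, "qlib_data"]
    if simple:
        command += ["--name", "qlib_data_simple"]
    command += ["--target_dir", target_dir, "--region", region]
    return command


def download(region, simple=False):
    """Run the download and raise CalledProcessError if the script fails."""
    subprocess.run(build_download_command(region, simple), check=True)
```

Calling download("cn") would then run the same command as in "Download CN Data", and download("cn", simple=True) the one in "Download CN Simple Data".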

Using in Qlib

For more information: /en/latest/start/

US data

Download the data first (see "Download US Data").

import qlib
from qlib.config import REG_US
provider_uri = "~/.qlib/qlib_data/us_data"  # target_dir
qlib.init(provider_uri=provider_uri, region=REG_US)

CN data

Download the data first (see "Download CN Data").

import qlib
from qlib.config import REG_CN
provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir
qlib.init(provider_uri=provider_uri, region=REG_CN)
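Since both snippets require the data to be downloaded first, it can help to check for the data directory before initializing. The following is a minimal sketch, assuming the target directories used above; the helper names resolve_provider_uri and init_qlib are hypothetical, and Qlib is imported lazily so the path check works even where Qlib is not installed.

```python
import os

# Directory names follow the download commands above.
PROVIDER_URIS = {
    "cn": "~/.qlib/qlib_data/cn_data",
    "us": "~/.qlib/qlib_data/us_data",
}


def resolve_provider_uri(region):
    """Return the expanded data directory for a region (hypothetical helper)."""
    if region not in PROVIDER_URIS:
        raise ValueError(f"unsupported region: {region!r}")
    return os.path.expanduser(PROVIDER_URIS[region])


def init_qlib(region, provider_uri=None):
    """Initialize Qlib for a region after checking that the data directory exists."""
    provider_uri = provider_uri or resolve_provider_uri(region)
    if not os.path.isdir(provider_uri):
        raise FileNotFoundError(
            f"{provider_uri} not found; download the {region} data first"
        )
    # Imported here so the checks above run without Qlib installed.
    import qlib
    from qlib.config import REG_CN, REG_US

    qlib.init(provider_uri=provider_uri, region={"cn": REG_CN, "us": REG_US}[region])
    return provider_uri
```

With this, init_qlib("us") fails early with a clear message when the US data has not been downloaded yet.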

highfreq

High-Frequency Dataset

This dataset is an example for reinforcement learning (RL) high-frequency trading.

Get High-Frequency Data

Get high-frequency data by running the following command:

    python  get_data

Dump & Reload & Reinitialize the Dataset

The High-Frequency Dataset is implemented as a DatasetH, whose state can be dumped to or loaded from disk in pickle format.

About Reinitialization

After reloading the Dataset from disk, Qlib also supports reinitializing it. This means users can reset some states of the Dataset or DataHandler, such as instruments, start_time, end_time, and segments, and generate new data according to those states.

An example is provided, and users can run it as follows.

Run the Code

Run the example by running the following command:

    python  dump_and_load_dataset
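The dump, reload and reinitialize cycle described in this section can be illustrated without Qlib. The following is a minimal sketch using only the standard pickle module; ToyDataset and its methods are a hypothetical stand-in, not Qlib's Dataset or DataHandler API, and the dates are made up. It stores only the states (instruments, start_time, end_time, segments) on disk and regenerates the derived data after the states are reset.

```python
import pickle

STATE_NAMES = ("instruments", "start_time", "end_time", "segments")


class ToyDataset:
    """Hypothetical stand-in for a dataset with dumpable, resettable states."""

    def __init__(self, instruments, start_time, end_time, segments):
        self.instruments = list(instruments)
        self.start_time = start_time
        self.end_time = end_time
        self.segments = dict(segments)
        self.data = None  # derived data, rebuilt by setup_data()

    def config(self, **states):
        """Reset selected states; unknown names are rejected."""
        for name, value in states.items():
            if name not in STATE_NAMES:
                raise AttributeError(f"unknown state: {name}")
            setattr(self, name, value)

    def setup_data(self):
        """Regenerate the derived data from the current states."""
        self.data = [(inst, self.start_time, self.end_time) for inst in self.instruments]

    def to_pickle(self, path):
        """Dump only the states, not the derived data."""
        with open(path, "wb") as f:
            pickle.dump({name: getattr(self, name) for name in STATE_NAMES}, f)

    @classmethod
    def from_pickle(cls, path):
        """Rebuild a dataset from dumped states; data stays empty until setup_data()."""
        with open(path, "rb") as f:
            return cls(**pickle.load(f))
```

A typical round trip is: call to_pickle("dataset.pkl"), later call ToyDataset.from_pickle("dataset.pkl"), then config(start_time=..., end_time=..., segments=...) followed by setup_data() to generate data for the new time range.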

StockSelection

Introduction

This project demonstrates how to apply machine learning algorithms to distinguish “good” stocks from “bad” stocks. To this end, we construct 244 technical and fundamental features to characterize each stock, and label stocks according to their ranking with respect to the return-volatility ratio. Algorithms ranging from traditional statistical learning methods to recently popular deep learning methods, namely Logistic Regression (LR), Random Forest (RF), Deep Neural Network (DNN), and a Stacking Ensemble model, are trained to solve the classification task. A Genetic Algorithm is also used for feature selection. The effectiveness of the stock selection strategy is validated on the Chinese stock market from both statistical and practical aspects, showing that:

- Stacking outperforms the other models, reaching an AUC score of 0.972;
- The Genetic Algorithm picks a subset of 114 features, and the prediction performance of all models remains almost unchanged after the selection procedure, which suggests that some features are indeed redundant;
- LR and DNN are radical models, RF is a risk-neutral model, and Stacking lies somewhere between DNN and RF.
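The labeling step mentioned in the introduction, ranking stocks by their return-volatility ratio, can be sketched as follows. This is a minimal illustration, assuming the ratio is the mean return divided by the sample standard deviation and that the top and bottom 30% of the ranking become the "good" and "bad" classes; neither the exact ratio definition nor the cut-off fractions are given in the source, and the function names are hypothetical.

```python
from statistics import mean, stdev


def return_volatility_ratio(returns):
    """Mean return divided by the sample standard deviation of the returns."""
    volatility = stdev(returns)
    if volatility == 0:
        raise ValueError("volatility is zero; the ratio is undefined")
    return mean(returns) / volatility


def label_stocks(returns_by_stock, top_fraction=0.3, bottom_fraction=0.3):
    """Label top-ranked stocks 1 ("good") and bottom-ranked stocks 0 ("bad").

    Stocks in the middle of the ranking stay unlabeled. The default
    fractions are illustrative assumptions, not values from the project.
    """
    if len(returns_by_stock) < 2:
        raise ValueError("need at least two stocks to rank")
    if top_fraction + bottom_fraction > 1:
        raise ValueError("fractions must not overlap")
    ratios = {s: return_volatility_ratio(r) for s, r in returns_by_stock.items()}
    ranked = sorted(ratios, key=ratios.get, reverse=True)  # best ratio first
    n_top = max(1, int(len(ranked) * top_fraction))
    n_bottom = max(1, int(len(ranked) * bottom_fraction))
    labels = {stock: 1 for stock in ranked[:n_top]}
    labels.update({stock: 0 for stock in ranked[-n_bottom:]})
    return labels
```

The resulting labels would serve as the binary target for the classifiers, with the constructed features as inputs.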

ENVS

python = 3.5

numpy
pandas
matplotlib
math
os
sklearn
tensorflow
keras