IGMC -- Inductive Graph-based Matrix Completion

Update

9/23/2021: Created a "latest" branch to enable running IGMC with the latest PyG versions.

11/23/2020: Optimized the subgraph extraction speed further. On large datasets it shows up to 20 times speed-up.

8/27/2020: Significantly improved the subgraph extraction speed. With an 8-core machine, now it only takes 30 seconds and 10 minutes to extract subgraphs for ml_100k and ml_1m, respectively. Using --dynamic-dataset also shows about 50% speed-up.

About

IGMC is an inductive matrix completion model based on graph neural networks that does not use any side information. Traditional matrix factorization approaches factorize the (rating) matrix into the product of low-dimensional latent embeddings of rows (users) and columns (items). They are transductive, since the learned embeddings cannot generalize to unseen rows/columns or to new matrices. Previously, making matrix completion inductive required content (side information), such as a user's age or a movie's genre. However, high-quality content is not always available and can be hard to extract. Under the extreme setting where no side information is available other than the matrix to complete, can we still learn an inductive matrix completion model? IGMC achieves this by training a graph neural network (GNN) purely on local subgraphs around (user, item) pairs extracted from the bipartite graph formed by the rating matrix, and mapping these subgraphs to their corresponding ratings. It does not rely on any global information specific to the rating matrix or the task, nor does it learn embeddings specific to the observed users/items. Thus, IGMC is a completely inductive model.
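The idea of an enclosing subgraph around a (user, item) pair can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this repository: the helper names build_bipartite and extract_enclosing_subgraph and the toy ratings are hypothetical, and dropping the target edge (so that the rating to predict is not visible in the input) is an assumption made for the example.

```python
from collections import defaultdict


def build_bipartite(ratings):
    """Index (user, item, rating) triples as two adjacency maps (hypothetical helper)."""
    user_items = defaultdict(dict)
    item_users = defaultdict(dict)
    for user, item, rating in ratings:
        user_items[user][item] = rating
        item_users[item][user] = rating
    return user_items, item_users


def extract_enclosing_subgraph(user_items, item_users, user, item, hops=1):
    """Collect the users, items, and rated edges within `hops` of the target pair."""
    users, items = {user}, {item}
    user_frontier, item_frontier = {user}, {item}
    for _ in range(hops):
        # Items rated by the frontier users, and users who rated the frontier items.
        new_items = {i for u in user_frontier for i in user_items.get(u, {})} - items
        new_users = {u for i in item_frontier for u in item_users.get(i, {})} - users
        users |= new_users
        items |= new_items
        user_frontier, item_frontier = new_users, new_items
    # Keep every edge between selected nodes except the target edge itself
    # (assumption: the target rating is the label, so it is hidden from the input).
    edges = sorted(
        (u, i, r)
        for u in users
        for i, r in user_items.get(u, {}).items()
        if i in items and (u, i) != (user, item)
    )
    return users, items, edges


toy_ratings = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i3", 2),
    ("u3", "i3", 1),
]
user_items, item_users = build_bipartite(toy_ratings)
users, items, edges = extract_enclosing_subgraph(user_items, item_users, "u1", "i1")
print(sorted(users), sorted(items), edges)
```

Nothing in the extracted subgraph depends on a learned per-user or per-item embedding, which is why a model that reads only such subgraphs can be applied to users and items that were not seen during training, as long as they have some interactions.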

Since IGMC is inductive, it can generalize to users/items unseen during training (given that their interactions exist), and can even transfer to new tasks. Our transfer learning experiments show that a model trained on the MovieLens dataset can be directly used to predict Douban movie ratings and works surprisingly well. For more information, please check our paper:

M. Zhang and Y. Chen, Inductive Matrix Completion Based on Graph Neural Networks. [PDF]

Requirements

Stable version: Python 3.8.1 + PyTorch 1.4.0 + PyTorch_Geometric 1.4.2. If your PyG version is higher than this, please refer to #7.

If you use latest PyTorch/PyG versions, you may also refer to the latest branch.

Install PyTorch

Install PyTorch_Geometric

Other required Python libraries: numpy, scipy, pandas, h5py, networkx, tqdm, etc.

Usage

Flixster, Douban and YahooMusic

To train on Flixster, type:

python Main.py --data-name flixster --epochs 40 --testing --ensemble

The results will be saved in "results/flixster_testmode/". The processed enclosing subgraphs will be saved in "data/flixster/testmode/". Change flixster to douban or yahoo_music to run the same experiments on the Douban and YahooMusic datasets, respectively. Remove --testing to evaluate on a validation set for hyperparameter tuning.

MovieLens-100K and MovieLens-1M

To train on MovieLens-100K, type:

python Main.py --data-name ml_100k --save-appendix _mnph200 --data-appendix _mnph200 --epochs 80 --max-nodes-per-hop 200 --testing --ensemble --dynamic-train

where the --max-nodes-per-hop argument specifies the maximum number of neighbors to sample for each node during the enclosing subgraph extraction. Its purpose is to limit the subgraph size to accommodate large datasets. The --dynamic-train option makes the training enclosing subgraphs dynamically generated rather than generated in a preprocessing step and saved to disk, which reduces memory consumption. However, you may remove the option to generate a static dataset for future reuse. Append "--dynamic-test" to make the test dataset dynamic as well. The default batch size is 50. If a batch cannot fit into your GPU memory, you can reduce the batch size by appending "--batch-size 25" to the above command.
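To illustrate how a cap such as --max-nodes-per-hop keeps subgraphs small, here is a minimal sketch of random down-sampling. It is not taken from this repository: the helper name cap_fringe is hypothetical, and applying the cap to the set of newly reached nodes at each hop is an assumption about how the limit is enforced.

```python
import random


def cap_fringe(fringe, max_nodes_per_hop=None, rng=random):
    """Return at most `max_nodes_per_hop` nodes, sampled uniformly without replacement.

    `fringe` is the collection of nodes newly reached at one hop (assumption).
    With no cap, or when the fringe is already small enough, it is returned unchanged.
    """
    fringe = list(fringe)
    if max_nodes_per_hop is not None and len(fringe) > max_nodes_per_hop:
        return rng.sample(fringe, max_nodes_per_hop)
    return fringe


rng = random.Random(0)  # fixed seed so the sketch is reproducible
large_fringe = list(range(1000))
print(len(cap_fringe(large_fringe, 200, rng)))  # capped at 200 nodes
print(len(cap_fringe([1, 2, 3], 200, rng)))     # small fringes are kept whole
```

A smaller cap yields smaller subgraphs and lower memory use, at the cost of discarding part of each neighborhood.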

The results will be saved in "results/ml_100k_mnph200_testmode/". The processed enclosing subgraphs will be saved in "data/ml_100k_mnph200/testmode/" if you do not use dynamic datasets.

To train on MovieLens-1M, type:

python Main.py --data-name ml_1m --save-appendix _mnhp100 --data-appendix _mnph100 --max-nodes-per-hop 100 --testing --epochs 40 --save-interval 5 --adj-dropout 0 --lr-decay-step-size 20 --ensemble --dynamic-train

Sparse rating matrix

To repeat the sparsity experiment in the paper (sparsify MovieLens-1M's rating matrix to keep only 20% of the ratings), type the following:

python Main.py --data-name ml_1m --save-appendix _mnhp100_ratio02 --ratio 0.2 --data-appendix _mnph100 --max-nodes-per-hop 100 --testing --epochs 40 --save-interval 5 --adj-dropout 0 --lr-decay-step-size 20 --ensemble --dynamic-train

Modify --ratio 0.2 to change the sparsity ratio. Attach --ensemble and run again to get the ensemble test results.
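What "keep only 20% of the ratings" means can be sketched as uniform subsampling of the observed ratings. This is an illustration only: the function name sparsify_ratings and its seed parameter are hypothetical, and whether the repository subsamples in exactly this way (for example, only the training ratings) is an assumption.

```python
import random


def sparsify_ratings(ratings, ratio, seed=0):
    """Keep a uniformly sampled fraction `ratio` of (user, item, rating) triples."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    ratings = list(ratings)
    n_keep = round(len(ratings) * ratio)
    rng = random.Random(seed)  # seeded so repeated runs keep the same subset
    return rng.sample(ratings, n_keep)


# Toy rating list: 10 users x 10 items, all rated 1.
toy_ratings = [(u, i, 1) for u in range(10) for i in range(10)]
kept = sparsify_ratings(toy_ratings, ratio=0.2)
print(len(toy_ratings), len(kept))
```

With ratio=0.2, only a fifth of the observed entries remain, so each enclosing subgraph is built from far fewer edges.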

Transfer learning

To repeat the transfer learning experiment in the paper (transfer the model trained previously on MovieLens-100K to Flixster, Douban, and YahooMusic), use the provided script by typing:

./run_transfer_exps.sh DATANAME

Replace DATANAME with flixster, douban, or yahoo_music to transfer to the corresponding dataset. The results will be appended to each dataset's original "log.txt" file.

Visualization

After training a model on a dataset, to visualize the testing enclosing subgraphs with the highest and lowest predicted ratings, type the following (we use Flixster as an example):

python Main.py --data-name flixster --epochs 40 --testing --no-train --visualize

It will load "results/flixster_testmode/model_checkpoint40.pth" and save the visualization in "results/flixster_testmode/visualization_flixster_prediction.pdf".

Check "Main.py" and "train_eval.py" for more options to play with. Check "models.py" for the graph neural network used.

Reference

If you find the code useful, please cite our paper.

@inproceedings{
  Zhang2020Inductive,
  title={Inductive Matrix Completion Based on Graph Neural Networks},
  author={Muhan Zhang and Yixin Chen},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=ByxxgCEYDS}
}

Check out another successful work of ours on inductive link prediction.

Muhan Zhang, Washington University in St. Louis muhan@wustl.edu 10/13/2019

MIT License

Copyright (c) 2019 Muhan Zhang

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.