Heterogeneous Graph Condensation

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 36, No. 7, pp. 3126-3138
Main Authors: Gao, Jian; Wu, Jianshe; Ding, Jingyi
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-07-2024
Description
Summary: Graph neural networks (GNNs) greatly facilitate data processing on both homogeneous and heterogeneous graphs. However, training GNNs on large-scale graphs places a significant demand on computing resources. The problem is especially prominent on heterogeneous graphs, which contain multiple types of nodes and edges, and heterogeneous GNNs are also several times more complex than ordinary GNNs. Recently, graph condensation (GCond) was proposed to address this challenge by condensing large-scale homogeneous graphs into small, informative graphs. Its label-based feature initialization and fully-connected design perform well on homogeneous graphs, but in heterogeneous graphs label information generally exists only in specific node types, so GCond is difficult to apply directly. In this article, we propose heterogeneous graph condensation (HGCond). HGCond uses clustering information instead of label information for feature initialization and constructs a sparse connection scheme accordingly. In addition, we found that the simple parameter exploration strategy in GCond leads to insufficient optimization on heterogeneous graphs; this article proposes an exploration strategy based on orthogonal parameter sequences to address the problem. We experimentally demonstrate that the novel feature initialization and parameter exploration strategy are effective. Experiments show that HGCond significantly outperforms baselines on multiple datasets. On the DBLP dataset, HGCond can condense DBLP to 0.5% of its original scale, yielding DBLP-0.005; GNNs trained on DBLP-0.005 retain nearly 99% of the accuracy of GNNs trained on full-scale DBLP.
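
The abstract gives no implementation details, but the clustering-based feature initialization it describes can be illustrated with a small, hedged sketch. The snippet below is not the authors' code: the function name, the 0.5% per-type reduction ratio, and the use of scikit-learn's KMeans are assumptions made purely for illustration. It shows one plausible way to seed the features of a condensed heterogeneous graph with per-node-type cluster centroids, which is the general idea the abstract attributes to HGCond.

# A minimal sketch of clustering-based feature initialization for a condensed
# heterogeneous graph, in the spirit of the abstract above. This is NOT the
# paper's implementation; the function name, the 0.5% reduction ratio, and the
# choice of scikit-learn KMeans are assumptions for illustration only.
import torch
from sklearn.cluster import KMeans

def init_condensed_features(features_by_type, reduction_ratio=0.005, seed=0):
    """For each node type, cluster the original node features and use the
    cluster centroids as the initial (learnable) features of the condensed nodes.

    features_by_type: dict mapping node-type name -> (N_t, d_t) float tensor.
    reduction_ratio:  fraction of original nodes kept per type (assumed 0.5%).
    """
    condensed = {}
    for ntype, feats in features_by_type.items():
        n_condensed = max(1, int(feats.shape[0] * reduction_ratio))
        km = KMeans(n_clusters=n_condensed, n_init=10, random_state=seed)
        km.fit(feats.cpu().numpy())
        # The centroids become the starting point for the small graph's features,
        # which would then be refined during the condensation optimization.
        condensed[ntype] = torch.nn.Parameter(
            torch.tensor(km.cluster_centers_, dtype=torch.float32))
    return condensed

# Example usage with random stand-in features for two hypothetical node types.
if __name__ == "__main__":
    feats = {"author": torch.randn(4000, 128), "paper": torch.randn(14000, 64)}
    small = init_condensed_features(feats)
    print({k: tuple(v.shape) for k, v in small.items()})

Seeding with per-type centroids rather than label-matched samples is what makes such a scheme usable when, as in heterogeneous graphs, most node types carry no labels.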
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2024.3362863