Predicting Information Pathways Across Online Communities

Yiqiao Jin
Yeon-Chang Lee
Kartik Sharma
Meng Ye
Karan Sikka
Ajay Divakaran
Srijan Kumar
Georgia Institute of Technology, SRI International

⭐ Accepted as a research paper at KDD 2023 main conference



Community-level information pathway prediction (CLIPP) aims to predict the transmission trajectory of content across online communities. A successful solution to CLIPP matters because it facilitates the distribution of valuable information to a larger audience and helps prevent the proliferation of misinformation. Solving CLIPP is non-trivial because inter-community relationships and influence are unknown, information spread is multi-modal, and new content and new communities appear over time. In this work, we address CLIPP by collecting large-scale, multi-modal datasets to examine the diffusion of YouTube videos on Reddit. We analyze these datasets to construct community influence graphs (CIGs) and develop a novel dynamic graph framework, INPAC (Information Pathway Across Online Communities), which incorporates CIGs to capture the temporal variability and multi-modal nature of video propagation across communities. Experimental results in both warm-start and cold-start scenarios show that INPAC outperforms seven baselines in CLIPP.


Code and Dataset

Code: We make the code for INPAC available at

Datasets: We constructed two real-world, large-scale datasets covering 54 months of historical Reddit posts from January 2018 to June 2022:

                      Large        Small
#Videos               183,596      6,802
#Subreddits           57,894       7,319
#Users                291,047      8,752
#Shares               1,323,714    36,118
Density               7.96E-05     6.11E-04
#Cold-Start Videos    3,042,068    68,095

Sample data

Columns: url, netloc, post_id, timestamp, subreddit, author, v (the url and netloc values are not shown in the rows below)

post_id    timestamp     subreddit         author               v
eiazyl     1577836805    virtualreality    Zweetprot            tmmpaOZ3nQg
eib0a6     1577836845    FTMMen            00110100-00110010    LuAyGWqYza4
eib0a6     1577836845    FTMMen            00110100-00110010    d4hJA7IUaDs
eib0a6     1577836845    FTMMen            00110100-00110010    5U_2V6yr-Nw
eib0em     1577836862    SteamVR           Zweetprot            tmmpaOZ3nQg
eib0h6     1577836869    SmallYTChannel    thevinamazing        mumHdNhclrM
eib0nk     1577836892    VRGaming          Zweetprot            tmmpaOZ3nQg
eib0se     1577836909    ripplers          daNext1              uxtqIvOP0rQ
eib0ur     1577836917    HTC_Vive          Zweetprot            tmmpaOZ3nQg
eib0wn     1577836926    HelpMeFind        Sanojoj              HE1Vy5lKuzw
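
To illustrate how share records like the sample rows relate to an information pathway, here is a minimal Python sketch. It groups shares by video and orders each video's subreddits by share time. It assumes that the `v` column holds the YouTube video ID and that `timestamp` is a Unix time in seconds. The function name `build_pathways` is hypothetical and is not part of the INPAC code; the sketch only shows the raw trajectories that CLIPP is concerned with, not the prediction model.

```python
from collections import defaultdict

# Rows copied from the sample data: (post_id, timestamp, subreddit, author, v).
# Assumption: `v` is the YouTube video ID, `timestamp` is Unix time in seconds.
SAMPLE_ROWS = [
    ("eiazyl", 1577836805, "virtualreality", "Zweetprot", "tmmpaOZ3nQg"),
    ("eib0a6", 1577836845, "FTMMen", "00110100-00110010", "LuAyGWqYza4"),
    ("eib0a6", 1577836845, "FTMMen", "00110100-00110010", "d4hJA7IUaDs"),
    ("eib0a6", 1577836845, "FTMMen", "00110100-00110010", "5U_2V6yr-Nw"),
    ("eib0em", 1577836862, "SteamVR", "Zweetprot", "tmmpaOZ3nQg"),
    ("eib0h6", 1577836869, "SmallYTChannel", "thevinamazing", "mumHdNhclrM"),
    ("eib0nk", 1577836892, "VRGaming", "Zweetprot", "tmmpaOZ3nQg"),
    ("eib0se", 1577836909, "ripplers", "daNext1", "uxtqIvOP0rQ"),
    ("eib0ur", 1577836917, "HTC_Vive", "Zweetprot", "tmmpaOZ3nQg"),
    ("eib0wn", 1577836926, "HelpMeFind", "Sanojoj", "HE1Vy5lKuzw"),
]


def build_pathways(rows):
    """Map each video ID to its subreddits, ordered by share time (hypothetical helper)."""
    shares = defaultdict(list)
    for _post_id, timestamp, subreddit, _author, video_id in rows:
        shares[video_id].append((timestamp, subreddit))
    # Sorting the (timestamp, subreddit) pairs orders each pathway chronologically.
    return {
        video_id: [subreddit for _, subreddit in sorted(events)]
        for video_id, events in shares.items()
    }


pathways = build_pathways(SAMPLE_ROWS)
for video_id, communities in pathways.items():
    print(video_id, " -> ".join(communities))
```

In this sample, the video `tmmpaOZ3nQg` is shared in four subreddits in sequence, while each of the other videos appears in a single community.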

title={Predicting Information Pathways Across Online Communities},
author={Jin, Yiqiao and Lee, Yeon-Chang and Sharma, Kartik and Ye, Meng and Sikka, Karan and Divakaran, Ajay and Kumar, Srijan},

