Predicting Information Pathways Across Online Communities

Yiqiao Jin
Yeon-Chang Lee
Kartik Sharma
Meng Ye
Karan Sikka
Ajay Divakaran
Srijan Kumar
Georgia Institute of Technology, SRI International



⭐ Accepted as a research paper at KDD 2023 main conference

[Paper]
[GitHub]
[Slides]

Abstract

The problem of community-level information pathway prediction (CLIPP) aims at predicting the transmission trajectory of content across online communities. A successful solution to CLIPP holds significance as it facilitates the distribution of valuable information to a larger audience and prevents the proliferation of misinformation. Notably, solving CLIPP is non-trivial as inter-community relationships and influence are unknown, information spread is multi-modal, and new content and new communities appear over time. In this work, we address CLIPP by collecting large-scale, multi-modal datasets to examine the diffusion of online YouTube videos on Reddit. We analyze these datasets to construct community influence graphs (CIGs) and develop a novel dynamic graph framework, INPAC (Information Pathway Across Online Communities), which incorporates CIGs to capture the temporal variability and multi-modal nature of video propagation across communities. Experimental results in both warm-start and cold-start scenarios show that INPAC outperforms seven baselines in CLIPP.


Code and Dataset

Code: We make the code for INPAC available at https://github.com/claws-lab/INPAC

Datasets: We constructed two real-world, large-scale datasets covering 54 months of historical Reddit posts, from January 2018 to June 2022.

Statistic             Large        Small
#Videos               183,596      6,802
#Subreddits           57,894       7,319
#Users                291,047      8,752
#Shares               1,323,714    36,118
Density               7.96E-05     6.11E-04
#Cold-Start Videos    3,042,068    68,095

Sample data

url netloc post_id timestamp subreddit author v
https://youtu.be/tmmpaOZ3nQg youtu.be eiazyl 1577836805 virtualreality Zweetprot tmmpaOZ3nQg
https://www.youtube.com/watch?v=LuAyGWqYza4 www.youtube.com eib0a6 1577836845 FTMMen 00110100-00110010 LuAyGWqYza4
https://www.youtube.com/watch?v=d4hJA7IUaDs www.youtube.com eib0a6 1577836845 FTMMen 00110100-00110010 d4hJA7IUaDs
https://www.youtube.com/watch?v=5U_2V6yr-Nw&feature=youtu.be www.youtube.com eib0a6 1577836845 FTMMen 00110100-00110010 5U_2V6yr-Nw
https://youtu.be/tmmpaOZ3nQg youtu.be eib0em 1577836862 SteamVR Zweetprot tmmpaOZ3nQg
https://youtu.be/mumHdNhclrM youtu.be eib0h6 1577836869 SmallYTChannel thevinamazing mumHdNhclrM
https://youtu.be/tmmpaOZ3nQg youtu.be eib0nk 1577836892 VRGaming Zweetprot tmmpaOZ3nQg
https://www.youtube.com/watch?v=uxtqIvOP0rQ www.youtube.com eib0se 1577836909 ripplers daNext1 uxtqIvOP0rQ
https://youtu.be/tmmpaOZ3nQg youtu.be eib0ur 1577836917 HTC_Vive Zweetprot tmmpaOZ3nQg
https://youtu.be/HE1Vy5lKuzw youtu.be eib0wn 1577836926 HelpMeFind Sanojoj HE1Vy5lKuzw
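
The sample rows above suggest how a video's pathway across communities could be recovered from the raw shares: the `v` column holds the YouTube video ID found in `url`, and ordering one video's shares by `timestamp` gives the sequence of subreddits it reached. The following is a minimal sketch under that reading, not the preprocessing used in the INPAC repository; the function names `extract_video_id` and `build_pathways` are hypothetical, and only the two URL shapes visible in the sample (`youtu.be/<id>` and `youtube.com/watch?v=<id>`) are handled.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# A few rows copied from the sample data above:
# (url, post_id, timestamp, subreddit, author)
SAMPLE_ROWS = [
    ("https://youtu.be/tmmpaOZ3nQg", "eiazyl", 1577836805, "virtualreality", "Zweetprot"),
    ("https://www.youtube.com/watch?v=5U_2V6yr-Nw&feature=youtu.be", "eib0a6", 1577836845, "FTMMen", "00110100-00110010"),
    ("https://youtu.be/tmmpaOZ3nQg", "eib0em", 1577836862, "SteamVR", "Zweetprot"),
    ("https://youtu.be/tmmpaOZ3nQg", "eib0nk", 1577836892, "VRGaming", "Zweetprot"),
    ("https://youtu.be/tmmpaOZ3nQg", "eib0ur", 1577836917, "HTC_Vive", "Zweetprot"),
    ("https://youtu.be/HE1Vy5lKuzw", "eib0wn", 1577836926, "HelpMeFind", "Sanojoj"),
]


def extract_video_id(url):
    """Hypothetical helper: return the YouTube video ID from a URL, or None."""
    parsed = urlparse(url)
    if parsed.netloc == "youtu.be":
        # Short links carry the ID as the path: youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if parsed.netloc.endswith("youtube.com"):
        # Watch links carry the ID in the "v" query parameter
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def build_pathways(rows):
    """Hypothetical helper: map each video ID to its subreddits in share order."""
    shares = defaultdict(list)
    for url, _post_id, timestamp, subreddit, _author in rows:
        video_id = extract_video_id(url)
        if video_id is not None:
            shares[video_id].append((timestamp, subreddit))
    # Sort each video's shares chronologically and keep the subreddit names
    return {vid: [sub for _, sub in sorted(items)] for vid, items in shares.items()}


pathways = build_pathways(SAMPLE_ROWS)
print(pathways["tmmpaOZ3nQg"])
# ['virtualreality', 'SteamVR', 'VRGaming', 'HTC_Vive']
```

In this sample, the same author shares one video to four VR-related subreddits within about two minutes, which is the kind of cross-community trajectory CLIPP aims to predict.
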
Bibtex:

@inproceedings{jin2023predicting,
title={Predicting Information Pathways Across Online Communities},
author={Jin, Yiqiao and Lee, Yeon-Chang and Sharma, Kartik and Ye, Meng and Sikka, Karan and Divakaran, Ajay and Kumar, Srijan},
booktitle={KDD},
year={2023}
}


Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.