I Know You’ll Be Back: Interpretable New User Clustering and Churn Prediction on a Mobile Social Application

Carl Yang, Xiaolin Shi, Jie Luo, Jiawei Han

Taking anonymous large-scale real-world data from Snapchat as an example, we develop ClusChurn, a systematic two-step framework for interpretable new user clustering and churn prediction, based on the intuition that proper user clustering can help understand and predict user churn. It simultaneously predicts user churn and user types from very limited initial behavior data, which enables rapid reactions to different types of user churn.

The 24th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), London, United Kingdom, 2018.


Did You Enjoy the Ride: Understanding Passenger Experience via Heterogeneous Network Embedding

Carl Yang, Chao Zhang, Xuewen Chen, Jieping Ye, Jiawei Han

We model the large-scale real-world transportation data of DiDi China with heterogeneous information networks (HIN), and design pattern-aware HIN embedding to accurately predict passenger satisfaction with their rides and understand the key factors that lead to good or bad experiences. The learned passenger/driver/spot embeddings can further support a wide range of applications, such as passenger/driver profiling and route planning.

The 34th IEEE International Conference on Data Engineering (ICDE), Paris, France, 2018.


Bridging Collaborative Filtering and Semi-Supervised Learning: A Neural Approach for POI Recommendation

Carl Yang, Lanxiao Bai, Chao Zhang, Quan Yuan, Jiawei Han

For POI recommendation, we aim to alleviate the scarcity of check-in data via smoothing among similar users and places on context graphs, which are constructed to capture various context information around users (e.g., friendships) and places (e.g., geographical distances). A deep neural architecture called PACE, which generalizes matrix factorization and graph Laplacian regularization, is developed to bridge collaborative filtering and semi-supervised learning.

The 23rd SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.
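
As a rough illustration of the two ingredients named in the PACE summary above, a textbook matrix factorization objective with a graph Laplacian regularizer can be written as follows. This is a generic form under assumed notation, not the loss used in the paper: r_{ij} is an observed check-in signal for user i and place j, \Omega is the set of observed pairs, u_i and v_j are latent vectors, A is the adjacency matrix of a user context graph, and \lambda is a hypothetical trade-off weight.

```latex
\min_{U,V}\;
  \sum_{(i,j)\in\Omega} \bigl(r_{ij} - u_i^{\top} v_j\bigr)^2
  \;+\; \frac{\lambda}{2} \sum_{i,k} A_{ik}\,\lVert u_i - u_k \rVert^2 ,
\qquad
\frac{1}{2}\sum_{i,k} A_{ik}\,\lVert u_i - u_k \rVert^2
  = \operatorname{tr}\!\bigl(U^{\top} L\, U\bigr),
\quad L = D - A,\;\; D_{ii} = \sum_{k} A_{ik}.
```

The second term penalizes differences between the latent vectors of users connected in the context graph (assuming a symmetric A), which is the smoothing effect the summary refers to; an analogous term could be added for places.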


Bi-directional Joint Inference for User Links and Attributes on Large Social Graphs

Carl Yang, Zhong Lin, Li-Jia Li, Jie Luo

We propose to jointly infer user links and attributes by exploiting homophily and iteratively enforcing smoothness on social graphs in two directions, i.e., from closeness to similarity (stronger links lead to more similar attributes), and vice versa. The two processes are carried out in a unified probabilistic framework through label propagation and graph construction.

The 26th International World Wide Web Conference (WWW), Perth, Australia, 2017.
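
The "closeness to similarity" direction in the summary above can be illustrated with plain label propagation. The following is a minimal sketch of the generic technique, not the paper's probabilistic framework; the function name `propagate_labels`, the toy graph, and all parameters are assumptions made for illustration.

```python
import numpy as np


def propagate_labels(adjacency, labels, known_mask, num_iters=50):
    """Generic label propagation (illustrative, not the paper's model).

    Each node repeatedly takes the edge-weighted average of its
    neighbors' label scores; nodes with known labels are clamped.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degrees = adjacency.sum(axis=1, keepdims=True)
    degrees[degrees == 0] = 1.0          # avoid division by zero
    transition = adjacency / degrees     # row-normalized edge weights

    scores = np.asarray(labels, dtype=float).copy()
    clamped = scores[known_mask].copy()
    for _ in range(num_iters):
        scores = transition @ scores     # average over neighbors
        scores[known_mask] = clamped     # keep known labels fixed
    return scores


# Toy path graph 0 - 1 - 2 - 3 with a weak link between nodes 1 and 2.
adjacency = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
# Node 0 is known to have attribute 0, node 3 attribute 1.
labels = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
known_mask = np.array([True, False, False, True])

scores = propagate_labels(adjacency, labels, known_mask)
predicted = scores.argmax(axis=1)
print(predicted)
```

In this toy graph, each unlabeled node ends up with the attribute of the labeled node it is strongly linked to, which is the sense in which stronger links lead to more similar attributes.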


CONE: Community Oriented Network Embedding

Carl Yang, Hanqing Lu, Kevin Chang

We question the universality of two widely adopted assumptions for community detection: link density and node homogeneity. We compare human-labeled communities with machine-predicted ones obtained via typical mainstream applications and show their deficiencies. A supervised approach based on network embedding is devised to integrate network attributes, links, and example communities to capture the underlying social patterns for reliable community detection.

The 2018 IEEE International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 2018.


Geodesic Distance Function Learning via Heat Flows on Vector Fields

Binbin Lin, Ji Yang, Xiaofei He, Jieping Ye

We propose to learn the geodesic distance function d(p, ·) on data manifolds based on rigorous theoretical analysis. Specifically, we first learn the gradient field of the distance function by transporting an initial vector field around p to the whole manifold via heat flows on vector fields. Then we obtain d(p, ·) by requiring its gradient field to be close to the normalized vector field.

The 31st International Conference on Machine Learning (ICML), Beijing, China, 2014.


Multi-Query Parallel Field Ranking for Image Retrieval

Ji Yang, Bin Xu, Binbin Lin, Xiaofei He

For multi-query retrieval tasks, we propose a novel approach that finds an optimal ranking function whose gradient field is as parallel as possible. In this way, the obtained ranking function varies linearly along the geodesics of the data manifold and achieves its highest value at multiple queries simultaneously, making efficient use of query information and the intrinsic distribution of data.

Neurocomputing, 2014.


Local Coordinate Concept Factorization for Image Representation

Haifeng Liu, Zheng Yang, Ji Yang, Zhaohui Wu, Xuelong Li

We introduce a locality constraint into the traditional concept factorization. By requiring the concepts (basis vectors) to be as close to the original data points as possible, we represent each data point by a linear combination of only a few basis concepts, thus addressing sparsity and locality simultaneously.

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2014.