Publications
2022 Data-Driven Network Neuroscience: On Data Collection and Benchmark
This paper presents a comprehensive, high-quality collection of functional human brain network data for research at the intersection of neuroscience, machine learning, and graph analytics. Anatomical and functional MRI images of the brain have been used to understand the functional connectivity of the human brain and are particularly important in identifying underlying neurodegenerative conditions such as Alzheimer’s, Parkinson’s, and Autism. Recently, the study of the brain in the form of brain networks using machine learning and graph analytics has become increasingly popular, especially to predict the early onset of these conditions. A brain network, represented as a graph, retains richer structural and positional information that traditional examination methods are unable to capture. However, the lack of brain network data transformed from functional MRI images prevents researchers from data-driven explorations. One of the main difficulties lies in the complicated domain-specific preprocessing steps and the exhaustive computation required to convert data from MRI images into brain networks. We bridge this gap by collecting a large number of available MRI images from existing studies, working with domain experts to make sensible design choices, and preprocessing the MRI images to produce a collection of brain network datasets. The datasets originate from 5 different sources, cover 3 neurodegenerative conditions, and consist of a total of 2,642 subjects. We test our graph datasets on 5 machine learning models commonly used in neuroscience and on a recent graph-based analysis model to validate the data quality and to provide domain baselines. To lower the barrier to entry and promote research in this interdisciplinary field, we release our complete preprocessing details, code, and brain network data.
The brain network datasets can be accessed at https://figshare.com/s/e389233c2090e00635af. During this phase we are sharing via a private sharing link. A public DOI assigned through Figshare will be used, with the exception of the brain networks created with neuroimages from ADNI: ADNI has requested that derived data be hosted in the same data repository as the original imaging data, LONI IDA.
We have a GitHub repository at https://github.com/bna-data-analysis/extract-brain-network which includes all our preprocessing code and a step-by-step demo on how to convert a raw fMRI image into a brain network, with sample input images from a subject in TaoWu.
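As a rough illustration of how the released brain networks could be consumed downstream, the sketch below thresholds per-subject connectivity matrices into graphs, extracts a few simple graph features, and fits a baseline classifier. The input format, threshold, features, and classifier here are assumptions made for illustration; they are not the repository's actual interface nor the benchmark models used in the paper.

```python
# Illustrative sketch only: assumes each subject's brain network is given as a
# symmetric functional connectivity matrix (NumPy array); the actual file
# format of the released datasets may differ.
import numpy as np
import networkx as nx
from sklearn.linear_model import LogisticRegression

def connectivity_to_graph(conn, threshold=0.3):
    """Threshold a connectivity matrix into an unweighted graph over ROIs."""
    adj = (np.abs(conn) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return nx.from_numpy_array(adj)

def graph_features(g):
    """A tiny feature vector: mean degree, edge density, average clustering."""
    degrees = [d for _, d in g.degree()]
    return np.array([np.mean(degrees), nx.density(g), nx.average_clustering(g)])

def fit_baseline(conns, labels):
    """conns: list of per-subject connectivity matrices; labels: 0/1 condition."""
    X = np.vstack([graph_features(connectivity_to_graph(c)) for c in conns])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```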
- BibTex Key
- Authors GURURAJAPATHY Sophi Shilpa | HUANG David Tse Jung | KE Yiping | KUMAR Haribalan | QIAO Miao | WANG Alan
- Tags dataset
- DOI Number 10.48550/arXiv.2211.12421
2022 Combining advanced magnetic resonance imaging (MRI) with finite element (FE) analysis for characterising subject-specific injury patterns in the brain after traumatic brain injury
Journal Article
Traumatic brain injury (TBI) is a leading cause of death and disability. The way mechanical impact is transferred to the brain has been shown to be a major determinant of structural damage and subsequent pathological sequelae. Although finite element (FE) models have been used extensively in the investigation of various aspects of TBI and have been instrumental in characterising a TBI injury threshold and the pattern of diffuse axonal injuries, subject-specific analysis has been difficult to perform due to the complexity of brain structures and their material properties. We have developed an efficient computational pipeline that can generate subject-specific FE models of the brain made up of conforming hexahedral elements directly from advanced MRI scans. This pipeline was applied and validated in our sheep model of TBI. Our FE model of the sheep brain accurately predicted the damage pattern seen on post-impact MRI scans. Furthermore, our model also showed a complex time-varying strain distribution pattern, which was not present in the homogeneous model without subject-specific material descriptions. To our knowledge, this is the first fully subject-specific FE model of the sheep brain able to predict structural damage after a head impact. The pipeline developed has the potential to augment the analysis of human brain MRI scans to detect changes in brain structures and function after TBI.
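The paper's meshing pipeline is not reproduced here, but the minimal sketch below conveys the core idea of building a hexahedral mesh directly from imaging data: every foreground voxel of a hypothetical binary brain segmentation becomes one 8-node hexahedral element on the voxel grid. All names and the input format are assumptions; the actual pipeline additionally handles conforming boundaries and subject-specific material properties.

```python
# Minimal sketch, not the paper's pipeline: map each foreground voxel of a
# binary segmentation to an 8-node hexahedral element on the voxel grid.
import numpy as np

def voxels_to_hex_mesh(mask):
    """mask: 3D boolean array (True = tissue). Returns (node_coords, elements),
    where elements holds the 8 node indices of each hexahedral element."""
    nodes = {}        # (i, j, k) grid corner -> node index
    elements = []
    # The 8 corners of a unit voxel, in a consistent local ordering.
    corners = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
               (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
    for i, j, k in zip(*np.nonzero(mask)):
        elem = []
        for di, dj, dk in corners:
            key = (i + di, j + dj, k + dk)
            if key not in nodes:
                nodes[key] = len(nodes)
            elem.append(nodes[key])
        elements.append(elem)
    coords = np.array(sorted(nodes, key=nodes.get), dtype=float)
    return coords, np.array(elements, dtype=int)
```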
- BibTex Key @article{shim2022combining, title={Combining advanced magnetic resonance imaging (MRI) with finite element (FE) analysis for characterising subject-specific injury patterns in the brain after traumatic brain injury}, author={Shim, Vickie and Tayebi, Maryam and Kwon, Eryn and Guild, Sarah-Jane and Scadeng, Miriam and Dubowitz, David and McBryde, Fiona and Rosset, Samuel and Wang, Alan and Fernandez, Justin and others}, journal={Engineering with Computers}, pages={1--13}, year={2022}, publisher={Springer} }
- Authors DUBOWITZ David | FERNANDEZ Justin | GUILD Sarah-Jane | HOLDSWORTH Samantha | KWON Eryn | LI Shaofan | MCBRYDE Fiona | ROSSET Samuel | SCADENG Miriam | SHIM Vickie | TAYEBI Maryam | WANG Alan
- Tags
- DOI Number 10.1007/s00366-022-01697-4
- Book Title Engineering with Computers
- Publisher Springer
2021 Subdomain adaptation with manifolds discrepancy alignment
Journal Article
Reducing domain divergence is a key step in transfer learning. Existing works focus on the minimization of global domain divergence. However, two domains may consist of several shared subdomains, and differ from each other in each subdomain. In this article, we take the local divergence of subdomains into account in transfer. Specifically, we propose to use the low-dimensional manifold to represent the subdomain, and align the local data distribution discrepancy in each manifold across domains. A manifold maximum mean discrepancy (M3D) is developed to measure the local distribution discrepancy in each manifold. We then propose a general framework, called transfer with manifolds discrepancy alignment (TMDA), to couple the discovery of data manifolds with the minimization of M3D. We instantiate TMDA in the subspace learning case considering both the linear and nonlinear mappings. We also instantiate TMDA in the deep learning framework. Experimental studies show that TMDA is a promising method for various transfer learning tasks.
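As a hedged sketch of the basic ingredient, the snippet below computes a plain (squared) maximum mean discrepancy between two samples with an RBF kernel. The paper's M3D applies a discrepancy of this kind per discovered low-dimensional manifold and couples it with manifold learning, which is not reproduced here.

```python
# Plain squared MMD with an RBF kernel; M3D restricts such a discrepancy to the
# data lying on each discovered manifold, which this sketch does not do.
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-gamma * sq)

def mmd2(source, target, gamma=1.0):
    """Biased estimator of MMD^2 between two samples of shape (n, d)."""
    k_ss = rbf_kernel(source, source, gamma)
    k_tt = rbf_kernel(target, target, gamma)
    k_st = rbf_kernel(source, target, gamma)
    return k_ss.mean() + k_tt.mean() - 2 * k_st.mean()
```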
- BibTex Key @article{wei2021subdomain, title={Subdomain adaptation with manifolds discrepancy alignment}, author={Wei, Pengfei and Ke, Yiping and Qu, Xinghua and Leong, Tze-Yun}, journal={IEEE Transactions on Cybernetics}, year={2021}, publisher={IEEE} }
- Authors KE Yiping | LEONG Tze-Yun | QU Xinghua | WEI Pengfei
- Tags
- DOI Number 10.1109/TCYB.2021.3071244
- Book Title IEEE Transactions on Cybernetics
- Issue Title Volume: 52, Issue: 11
- Publisher IEEE
2021 Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions
Inproceeding
Attributed event sequences are commonly encountered in practice. A recent line of research incorporates neural networks into marked point processes, the conventional statistical tool for dealing with attributed event sequences. Neural marked point processes possess the interpretability of probabilistic models as well as the representational power of neural networks. However, we find that the performance of neural marked point processes does not always increase as the network architecture becomes larger and more complicated, a phenomenon we call performance saturation. This is because the generalization error of neural marked point processes is determined jointly by the network's representational ability and the model specification. We therefore draw two major conclusions: first, simple network structures can perform no worse than complicated ones in some cases; second, a proper probabilistic assumption is at least as important as increasing the complexity of the network. Based on this observation, we propose a simple graph-based network structure called GCHP, which uses only graph convolutional layers and can therefore be easily parallelized. We directly consider the distribution of interarrival times instead of imposing a specific assumption on the conditional intensity function, and propose a likelihood ratio loss with a moment matching mechanism for optimization and model selection. Experimental results show that GCHP significantly reduces training time and that the likelihood ratio loss with interarrival time probability assumptions greatly improves model performance.
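As a hedged sketch of the kind of building block GCHP relies on, the snippet below implements a single graph convolutional layer (symmetrically normalised adjacency followed by a linear map) in PyTorch. The actual GCHP architecture, the interarrival-time distribution, and the likelihood ratio loss are not reproduced here.

```python
# A single graph convolutional layer of the generic GCN form; the full GCHP
# model and its likelihood ratio loss are not reproduced here.
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        """x: (num_nodes, in_dim) node features; adj: (num_nodes, num_nodes)."""
        a_hat = adj + torch.eye(adj.size(0))           # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt     # D^-1/2 (A + I) D^-1/2
        return torch.relu(self.linear(norm_adj @ x))
```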
- BibTex Key @inproceedings{li2021mitigating, title={Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions}, author={Li, Tianbo and Luo, Tianze and Ke, Yiping and Pan, Sinno Jialin}, booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining}, pages={986--994}, year={2021} }
- Authors KE Yiping | LI Tianbo | LUO Tianze | PAN Sinno Jialin
- Tags
- DOI Number 10.1145/3447548.3467436
- Book Title KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
- Publisher ACM
2021 Two-Attribute Skew Free, Isolated CP Theorem, and Massively Parallel Joins
Inproceeding
This paper presents an algorithm to process a multi-way join with load O(n / p^(2/(αφ))) under the MPC model, where n is the number of tuples in the input relations, α the maximum arity of those relations, p the number of machines, and φ a newly introduced parameter called the generalized vertex packing number. The algorithm builds on two new findings. The first is a two-attribute skew free technique to partition the join result for parallel computation. The second is an isolated cartesian product theorem, which provides fresh graph-theoretic insights on joins with α ≥ 3 and generalizes an existing theorem on α = 2.
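To make the load bound concrete, here is a small worked example with assumed parameter values (the values are purely illustrative, not taken from the paper):

```python
# Worked example of the load bound n / p^(2/(alpha*phi)) with assumed values:
# n = tuples, p = machines, alpha = maximum arity, phi = generalized vertex
# packing number of the join.
n, p, alpha, phi = 10**9, 1_000, 3, 2
load = n / p ** (2 / (alpha * phi))   # 10^9 / 1000^(1/3) = 10^8
print(f"per-machine load ~ {load:.2e} tuples")
```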
- BibTex Key @inproceedings{qiao2021two, title={Two-attribute skew free, isolated CP theorem, and massively parallel joins}, author={Qiao, Miao and Tao, Yufei}, booktitle={Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems}, pages={166--180}, year={2021} }
- Authors QIAO Miao | TAO Yufei
- Tags
- DOI Number 10.1145/3452021.3458321
- Book Title PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
- Publisher ACM
2020 Easy-But-Effective Domain Sub-Similarity Learning for Transfer Regression
Journal Article
The transfer covariance function, which can model domain similarity and adaptively control the knowledge transfer across domains, is widely used in transfer learning. In this paper, we concentrate on Gaussian process (GP) models using a transfer covariance function for regression problems in a black-box learning scenario. Precisely, we investigate a family of rather general transfer covariance functions, T∗, that can model the heterogeneous sub-similarities of domains through multiple kernel learning. A necessary and sufficient condition to obtain valid GPs using T∗ (GPT∗) for any data is given. This condition is especially handy for practical applications as (i) it enables semantic interpretations of the sub-similarities and (ii) it can readily be used for model learning. In particular, we propose a computationally inexpensive model learning rule that can explicitly capture different sub-similarities of domains. We propose two instantiations of GPT∗, one with a set of predefined constant base kernels and one with a set of learnable parametric base kernels. Extensive experiments on 36 synthetic transfer tasks and 12 real-world transfer tasks demonstrate the effectiveness of GPT∗ on the sub-similarity capture and the transfer performance.
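A hedged sketch of the simplest transfer covariance function of this flavour is given below: within-domain pairs use a base kernel directly, while cross-domain pairs are scaled by a similarity coefficient lam in [-1, 1]. The paper's T∗ family combines multiple base kernels to capture heterogeneous sub-similarities, which this single-kernel sketch does not attempt.

```python
# Single-kernel transfer covariance sketch: cross-domain entries are scaled by
# a similarity coefficient lam; T* generalizes this to multiple base kernels.
import numpy as np

def rbf(x, y, gamma=1.0):
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-gamma * sq)

def transfer_cov(x, y, dom_x, dom_y, lam=0.5, gamma=1.0):
    """dom_x, dom_y: integer domain labels per row; lam in [-1, 1]."""
    k = rbf(x, y, gamma)
    cross = dom_x[:, None] != dom_y[None, :]
    return np.where(cross, lam * k, k)
```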
- BibTex Key @article{wei2020easy, title={Easy-but-effective Domain Sub-similarity Learning for Transfer Regression}, author={Wei, Pengfei and Sagarna, Ramon and Ke, Yiping and Ong, Yew-Soon}, journal={IEEE Transactions on Knowledge and Data Engineering}, year={2020}, publisher={IEEE} }
- Authors KE Yiping | ONG Yew-Soon | SAGARNA Ramon | WEI Pengfei
- Tags
- DOI Number 10.1109/TKDE.2020.3039806
- Book Title IEEE Transactions on Knowledge and Data Engineering
- Issue Title Volume: 34, Issue: 9
- Publisher IEEE
2022 On Scalable Computation of Graph Eccentricities
Inproceeding
Given a graph, eccentricity measures the distance from each node to its farthest node. Eccentricity indicates the centrality of each node and collectively encodes fundamental graph properties: the radius and the diameter, i.e., the minimum and maximum eccentricity, respectively, over all the nodes in the graph. Computing the eccentricities of all the graph nodes, however, is challenging in theory: any approach must either complete in quadratic time or introduce a 1/3 relative error under certain hypotheses. In practice, the state-of-the-art approach PLLECC for computing exact eccentricities relies heavily on a precomputed all-pair-shortest-distance index whose expensive construction prevents PLLECC from scaling up. This paper provides insights to enable scalable exact eccentricity computation that does not rely on any index. The proposed algorithm IFECC handles billion-scale graphs that no existing approach can process and achieves up to two orders of magnitude speedup over PLLECC. As a by-product, IFECC can be terminated at any time during execution to produce approximate eccentricities, which is empirically more stable and reliable than KBFS, the state-of-the-art algorithm for approximately computing eccentricities.
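For reference, the textbook way to compute all eccentricities is one BFS per node, which is exactly the quadratic behaviour the paper works around; the sketch below shows this baseline (assuming an unweighted, connected graph), not the IFECC algorithm.

```python
# Quadratic baseline: one BFS per node on an unweighted, connected graph.
# This is the behaviour IFECC avoids; it is not the paper's algorithm.
import networkx as nx

def all_eccentricities(g):
    ecc = {}
    for v in g.nodes:
        dist = nx.single_source_shortest_path_length(g, v)
        ecc[v] = max(dist.values())
    return ecc

# radius = min(ecc.values()); diameter = max(ecc.values())
```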
- BibTex Key @inproceedings{li2022scalable, title={On Scalable Computation of Graph Eccentricities}, author={Li, Wentao and Qiao, Miao and Qin, Lu and Chang, Lijun and Zhang, Ying and Lin, Xuemin}, booktitle={Proceedings of the 2022 International Conference on Management of Data}, pages={904--916}, year={2022} }
- Authors CHANG Lijun | LI Wentao | LIN Xuemin | QIAO Miao | QIN Lu | ZHANG Ying
- Tags
- Editors ACM
- DOI Number 10.1145/3514221.3517890
- Book Title SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
- Publisher ACM
2022 Anchored Densest Subgraph
Inproceeding
Given a graph, densest subgraph search reports a single subgraph that maximizes the density (i.e., average degree). To diversify the search results without imposing rigid constraints, this paper studies the problem of anchored densest subgraph search (ADS). Given a graph, a reference node set R and an anchored node set A with A ⊆ R, ADS reports a supergraph of A that maximizes the R-subgraph density, a density that favors the nodes that are close to R and are not over-popular in comparison with the nodes in R. These two levels of locality enable a wide range of applications, as demonstrated by our use cases. For ADS, we propose an algorithm that is local in the sense that its complexity is related only to the nodes in R as opposed to the entire graph. Extensive experiments show that our local algorithm for ADS outperforms the global algorithm by up to three orders of magnitude in time and space consumption; moreover, our local algorithm outperforms existing local community detection solutions in locality, result density, and query processing time and space.
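To illustrate the density objective, the sketch below runs the classical greedy peeling heuristic for the unanchored densest subgraph under the average-degree density; the anchored, R-subgraph-density variant and the local algorithm of the paper are not reproduced here.

```python
# Greedy peeling for the classical densest subgraph (average-degree density);
# the anchored, R-subgraph-density variant of the paper is not reproduced.
import networkx as nx

def avg_degree(h):
    n = h.number_of_nodes()
    return 2 * h.number_of_edges() / n if n else 0.0

def densest_subgraph_peel(g):
    """Repeatedly remove a minimum-degree node; keep the densest prefix seen."""
    h = g.copy()
    best_nodes, best = list(h.nodes), avg_degree(h)
    while h.number_of_nodes() > 1:
        v = min(h.nodes, key=h.degree)
        h.remove_node(v)
        if avg_degree(h) > best:
            best_nodes, best = list(h.nodes), avg_degree(h)
    return g.subgraph(best_nodes), best
```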
- BibTex Key @inproceedings{dai2022anchored, title={Anchored Densest Subgraph}, author={Dai, Yizhou and Qiao, Miao and Chang, Lijun}, booktitle={Proceedings of the 2022 International Conference on Management of Data}, pages={1200--1213}, year={2022} }
- Authors CHANG Lijun | DAI Yizhou | QIAO Miao
- Tags
- Editors ACM
- DOI Number 10.1145/3514221.3517890
- Book Title SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
- Publisher ACM
2022 Clustering Activation Networks
Inproceeding
A real-world graph often has frequently interacting nodes on less frequently updated edges. Each interaction activates an existing edge and changes the activeness of the edge. In such an activation network, nodes that are cohesively connected by active edges form a cluster in both structural and temporal senses. For activation networks, it is thus important to incrementally maintain a structure for efficient clustering query processing. This raises the problems of maintaining the edge activeness, combining structural cohesiveness and activeness for clustering, and designing indexes for online clustering queries. This paper adopts a time-decay scheme to model activeness and proposes a suite of techniques, with substantial effort on simplification and innovation, for efficiency, effectiveness and scalability. The query time is related only to the query results, as opposed to the entire graph. The index size is linear up to a logarithmic factor. Extensive experiments verify the quality of the clustering results; moreover, the update time is up to six orders of magnitude faster than the baseline.
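A hedged sketch of the time-decay idea is given below: every interaction on an edge adds to its activeness, which otherwise decays exponentially with elapsed time. The class name, decay rate, and lazy-update scheme are assumptions for illustration; the paper's clustering index and query processing are not reproduced.

```python
# Lazy exponential time-decay of edge activeness; illustration only, not the
# paper's index or clustering machinery.
import math

class ActivationEdges:
    def __init__(self, decay_rate=0.01):
        self.decay_rate = decay_rate
        self.state = {}                       # edge -> (activeness, last_time)

    def activate(self, u, v, t, weight=1.0):
        edge = (min(u, v), max(u, v))
        act, last = self.state.get(edge, (0.0, t))
        act *= math.exp(-self.decay_rate * (t - last))   # decay since last update
        self.state[edge] = (act + weight, t)

    def activeness(self, u, v, t):
        edge = (min(u, v), max(u, v))
        act, last = self.state.get(edge, (0.0, t))
        return act * math.exp(-self.decay_rate * (t - last))
```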
- BibTex Key @inproceedings{feng2022clustering, title={Clustering Activation Networks}, author={Feng, Zijin and Qiao, Miao and Cheng, Hong}, booktitle={2022 IEEE 38th International Conference on Data Engineering (ICDE)}, pages={780--792}, year={2022}, organization={IEEE} }
- Authors CHENG Hong | FENG Zijin | QIAO Miao
- Tags
- Editors IEEE
- DOI Number 10.1109/ICDE53745.2022.00063
- Book Title 2022 IEEE 38th International Conference on Data Engineering (ICDE)
- Publisher IEEE