
Fabrizio Frasca

Hey! I am Fabrizio. I am a Postdoctoral Fellow at Technion, the Israel Institute of Technology, where I work with Prof. Haggai Maron on Geometric Deep Learning, in particular on equivariance, expressiveness and Graph Neural Networks.

I obtained a PhD in Computing from Imperial College London under the supervision of Prof. Michael Bronstein. My research focussed on overcoming the intrinsic representational limits of Graph Neural Networks: I explored extensions of message-passing schemes that can capture non-trivial meso-scale topological patterns in networks, and used equivariance to symmetries as an overarching principle for designing provably expressive architectures.

Earlier, I conducted more applied research on Machine Learning approaches to problems in Computational Biology and Bioinformatics, in particular drug repurposing and the epigenetic regulation of gene expression.

From 2019, when Twitter acquired Fabula AI, to early 2023, I was also a Machine Learning Researcher at Twitter Cortex.


News

[2024/1] Quite an update: I have just started as a Postdoctoral Fellow at Technion (Israel), where I will work with Prof. Haggai Maron on Equivariance, Expressiveness, Graph Neural Networks, Geometric Deep Learning and friends :)

[2023/11] Big news: I have just defended my PhD thesis before examiners Prof. Ben Glocker and Prof. Yaron Lipman. I am excited to share that I passed the viva without corrections, marking the (successful) end of my PhD journey :D

[2023/8] It's been a while, huh? ... Well, these past months I've been mostly working on my PhD thesis 'Expressive and Efficient Graph Neural Networks'. I am happy to share that I have finally submitted it!

... see more here ...


Works as (co-)lead

(* indicates equal contribution; see here for a complete list).


Understanding and Extending Subgraph GNNs by Rethinking Their Symmetries

Fabrizio Frasca*, Beatrice Bevilacqua*, Michael M. Bronstein, Haggai Maron

(paper) (code) (video)

NeurIPS 2022
Oral (1.7% acceptance rate)

Abstract
Subgraph GNNs are a recent class of expressive Graph Neural Networks (GNNs) which model graphs as collections of subgraphs. So far, the design space of possible Subgraph GNN architectures as well as their basic theoretical properties are still largely unexplored. In this paper, we study the most prominent form of subgraph methods, which employs node-based subgraph selection policies such as ego-networks or node marking and deletion. We address two central questions: (1) What is the upper bound of the expressive power of these methods? and (2) What is the family of equivariant message passing layers on these sets of subgraphs? Our first step in answering these questions is a novel symmetry analysis which shows that modelling the symmetries of node-based subgraph collections requires a significantly smaller symmetry group than the one adopted in previous works. This analysis is then used to establish a link between Subgraph GNNs and Invariant Graph Networks (IGNs). We answer the questions above by first bounding the expressive power of subgraph methods by 3-WL, and then proposing a general family of message-passing layers for subgraph methods that generalises all previous node-based Subgraph GNNs. Finally, we design a novel Subgraph GNN dubbed SUN, which theoretically unifies previous architectures while providing better empirical performance on multiple benchmarks.
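To make the symmetry argument a little more concrete, here is a toy numerical check (an illustration only, not code from the paper). A bag of node-based subgraphs, here assumed to be 1-hop ego-networks kept in the original node indexing, can be stored as an n x n x n tensor whose first axis indexes the root node. Relabelling the graph by a permutation then permutes all three axes by the same permutation, which is the intuition behind needing a single node permutation rather than independent permutations of subgraphs and nodes. The helper names ego_bag and permute_graph are hypothetical.

```python
import numpy as np


def ego_bag(adj: np.ndarray) -> np.ndarray:
    """Bag of 1-hop ego-network subgraphs as an (n, n, n) tensor.

    bag[i] is the adjacency matrix of the ego-network rooted at node i, kept in
    the original node indexing (nodes outside it become isolated).
    """
    n = adj.shape[0]
    within = (adj + np.eye(n)) > 0                  # within[i, j]: dist(i, j) <= 1
    mask = within[:, :, None] & within[:, None, :]  # both endpoints inside ego(i)
    return adj[None, :, :] * mask


def permute_graph(adj: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Relabel nodes so that new node a is old node p[a]."""
    return adj[np.ix_(p, p)]


# Path graph 0-1-2-3-4 and an arbitrary relabelling.
adj = np.zeros((5, 5))
for u in range(4):
    adj[u, u + 1] = adj[u + 1, u] = 1
p = np.array([3, 0, 4, 1, 2])

# A single permutation acts jointly on the subgraph axis and both node axes.
joint = ego_bag(adj)[np.ix_(p, p, p)]
print(np.array_equal(ego_bag(permute_graph(adj, p)), joint))  # True
```

Other node-based policies (node marking, node deletion) can be stored the same way, with one subgraph per root node.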


Accurate and Highly Interpretable Prediction of Gene Expression from Histone Modifications

Fabrizio Frasca, Matteo Matteucci, Michele Leone, Marco J. Morelli, Marco Masseroli

(paper) (code)

BMC Bioinformatics (2022)

Abstract
Histone Mark Modifications (HMs) are crucial actors in gene regulation, as they actively remodel chromatin to modulate transcriptional activity: aberrant combinatorial patterns of HMs have been connected with several diseases, including cancer. HMs are, however, reversible modifications: understanding their role in disease would allow the design of ‘epigenetic drugs’ for specific, non-invasive treatments. Standard statistical techniques were not entirely successful in extracting representative features from raw HM signals over gene locations. On the other hand, deep learning approaches allow for effective automatic feature extraction, but at the expense of model interpretation. Here, we propose ShallowChrome, a novel computational pipeline to model transcriptional regulation via HMs in both an accurate and interpretable way. We attain state-of-the-art results on the binary classification of gene transcriptional states over 56 cell types from the REMC database, largely outperforming recent deep learning approaches. We interpret our models by extracting insightful gene-specific regulative patterns, and we analyse them for the specific case of the PAX5 gene over three differentiated blood cell lines. Finally, we compare the patterns we obtained with the characteristic emission patterns of ChromHMM, and show that ShallowChrome is able to coherently rank groups of chromatin states w.r.t. their transcriptional activity. In this work we demonstrate that it is possible to model HM-modulated gene expression regulation in a highly accurate, yet interpretable way. Our feature extraction algorithm leverages data downstream of the identification of enriched regions to retrieve gene-wise, statistically significant and dynamically located features for each HM. These features are highly predictive of gene transcriptional state, and allow for accurate modeling by computationally efficient logistic regression models.
These models allow a direct inspection and a rigorous interpretation, helping to formulate quantifiable hypotheses.


Equivariant Subgraph Aggregation Networks

Beatrice Bevilacqua*, Fabrizio Frasca*, Derek Lim*, Balasubramaniam Srinivasan, Chen Cai, Gopinath Balamurugan, Michael M. Bronstein, Haggai Maron

(paper) (code) (post) (video)

ICLR 2022
Spotlight (5% acceptance rate)

Abstract
Message-passing neural networks (MPNNs) are the leading architecture for deep learning on graph-structured data, in large part due to their simplicity and scalability. Unfortunately, it was shown that these architectures are limited in their expressive power. This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN) to address this issue. Our main observation is that while two graphs may not be distinguishable by an MPNN, they often contain distinguishable subgraphs. Thus, we propose to represent each graph as a set of subgraphs derived by some predefined policy, and to process it using a suitable equivariant architecture. We develop novel variants of the 1-dimensional Weisfeiler-Leman (1-WL) test for graph isomorphism, and prove lower bounds on the expressiveness of ESAN in terms of these new WL variants. We further prove that our approach increases the expressive power of both MPNNs and more expressive architectures. Moreover, we provide theoretical results that describe how design choices such as the subgraph selection policy and equivariant neural architecture affect our architecture's expressive power. To deal with the increased computational cost, we propose a subgraph sampling scheme, which can be viewed as a stochastic version of our framework. A comprehensive set of experiments on real and synthetic datasets demonstrates that our framework improves the expressive power and overall performance of popular GNN architectures.
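The core observation can be checked on a classic pair of graphs. The sketch below is illustrative only and is not ESAN's implementation: it runs 1-WL colour refinement, then a simplified DS-WL-style variant that applies 1-WL to every node-deleted subgraph and compares the resulting multisets of colour histograms. A 6-cycle and two disjoint triangles are both 2-regular, so plain 1-WL cannot tell them apart, while their node-deleted subgraphs (a 5-node path versus a triangle plus a separate edge) can be distinguished. All function names are hypothetical.

```python
from collections import Counter


def wl_histogram(adj, rounds, palette):
    """1-WL colour refinement; returns the multiset of final colours.

    adj maps each node to its set of neighbours. palette is shared across
    graphs so that identical signatures receive identical colour ids.
    """
    colours = {v: 0 for v in adj}  # uniform initial colouring
    for t in range(rounds):
        new = {}
        for v in adj:
            sig = (t, colours[v], tuple(sorted(colours[u] for u in adj[v])))
            new[v] = palette.setdefault(sig, len(palette))
        colours = new
    return frozenset(Counter(colours.values()).items())


def delete_node(adj, v):
    """Node-deletion policy: drop v and all edges incident to it."""
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}


def ds_wl(adj, rounds, palette):
    """Multiset of 1-WL histograms, one per node-deleted subgraph."""
    return Counter(wl_histogram(delete_node(adj, v), rounds, palette) for v in adj)


def cycle(nodes):
    n = len(nodes)
    return {nodes[i]: {nodes[(i - 1) % n], nodes[(i + 1) % n]} for i in range(n)}


c6 = cycle(list(range(6)))
two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}

palette = {}
print(wl_histogram(c6, 3, palette) == wl_histogram(two_triangles, 3, palette))  # True
print(ds_wl(c6, 3, palette) == ds_wl(two_triangles, 3, palette))  # False
```

Node deletion is only one of the selection policies discussed in the paper; edge deletion or ego-networks would slot in by swapping delete_node for another subgraph generator.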


Weisfeiler and Lehman Go Cellular: CW Networks

Cristian Bodnar*, Fabrizio Frasca*, Nina Otter, Yu Guang Wang, Pietro Liò, Guido Montúfar, Michael M. Bronstein

(paper) (code) (post) (video)

NeurIPS 2021

Abstract
Graph Neural Networks (GNNs) are limited in their expressive power, struggle with long-range interactions and lack a principled way to model higher-order structures. These problems can be attributed to the strong coupling between the computational graph and the input graph structure. The recently proposed Message Passing Simplicial Networks naturally decouple these elements by performing message passing on the clique complex of the graph. Nevertheless, these models can be severely constrained by the rigid combinatorial structure of Simplicial Complexes (SCs). In this work, we extend recent theoretical results on SCs to regular Cell Complexes, topological objects that flexibly subsume SCs and graphs. We show that this generalisation provides a powerful set of graph "lifting" transformations, each leading to a unique hierarchical message passing procedure. The resulting methods, which we collectively call CW Networks (CWNs), are strictly more powerful than the WL test and not less powerful than the 3-WL test. In particular, we demonstrate the effectiveness of one such scheme, based on rings, when applied to molecular graph problems. The proposed architecture benefits from provably larger expressivity than commonly used GNNs, principled modelling of higher-order signals and from compressing the distances between nodes. We demonstrate that our model achieves state-of-the-art results on a variety of molecular datasets.
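As a rough illustration of a ring-based lifting (a brute-force sketch under stated assumptions, not the CWN codebase), the snippet below enumerates induced, i.e. chordless, cycles up to a maximum length, which would become the 2-cells attached to the graph, and lists the edges on each ring's boundary. It is exponential in the worst case and only meant for small, molecule-sized graphs; max_len and all function names are assumptions.

```python
from itertools import combinations


def is_chordless(adj, cyc):
    """True if no two non-consecutive nodes of the cycle are adjacent."""
    m = len(cyc)
    for a, b in combinations(range(m), 2):
        consecutive = b - a == 1 or (a == 0 and b == m - 1)
        if not consecutive and cyc[b] in adj[cyc[a]]:
            return False
    return True


def rings(adj, max_len):
    """Induced cycles with 3..max_len nodes, each reported once.

    Cycles start at their smallest node; the second node must be smaller than
    the last one so that each cycle is kept in a single orientation.
    """
    found = set()

    def extend(path):
        start, last = path[0], path[-1]
        for nxt in adj[last]:
            if nxt == start and len(path) >= 3:
                cyc = tuple(path)
                if cyc[1] < cyc[-1] and is_chordless(adj, cyc):
                    found.add(cyc)
            elif nxt > start and nxt not in path and len(path) < max_len:
                extend(path + [nxt])

    for s in adj:
        extend([s])
    return found


def ring_boundaries(found):
    """Boundary of each 2-cell: the edges (sorted node pairs) along the ring."""
    return {
        r: {tuple(sorted((r[i], r[(i + 1) % len(r)]))) for i in range(len(r))}
        for r in found
    }


# A 4-cycle 0-1-2-3 with the chord 0-2: only the two triangles are rings.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(sorted(rings(adj, max_len=6)))  # [(0, 1, 2), (0, 2, 3)]
```

Message passing on the lifted complex would then exchange information between each ring and the edges in its boundary, in addition to the usual node-edge relations.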


Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks

Cristian Bodnar*, Fabrizio Frasca*, Yu Guang Wang*, Nina Otter, Guido Montúfar*, Pietro Liò, Michael M. Bronstein

(paper) (code) (post)

ICML 2021

Abstract
The pairwise interaction paradigm of graph machine learning has predominantly governed the modelling of relational systems. However, graphs alone cannot capture the multi-level interactions present in many complex systems and the expressive power of such schemes was proven to be limited. To overcome these limitations, we propose Message Passing Simplicial Networks (MPSNs), a class of models that perform message passing on simplicial complexes (SCs). To theoretically analyse the expressivity of our model we introduce a Simplicial Weisfeiler-Lehman (SWL) colouring procedure for distinguishing non-isomorphic SCs. We relate the power of SWL to the problem of distinguishing non-isomorphic graphs and show that SWL and MPSNs are strictly more powerful than the WL test and not less powerful than the 3-WL test. We deepen the analysis by comparing our model with traditional graph neural networks (GNNs) with ReLU activations in terms of the number of linear regions of the functions they can represent. We empirically support our theoretical claims by showing that MPSNs can distinguish challenging strongly regular graphs for which GNNs fail and, when equipped with orientation equivariant layers, they can improve classification accuracy in oriented SCs compared to a GNN baseline.
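For intuition on the clique-complex lifting that MPSNs operate on, here is a small brute-force sketch (not the authors' code): every set of k+1 mutually adjacent nodes forms a k-simplex, the boundary of a simplex is its set of codimension-one faces, and two k-simplices are upper adjacent when they are faces of a common (k+1)-simplex, one of the relations along which messages can be passed. The function names and the max_dim cut-off are assumptions.

```python
from itertools import combinations


def clique_complex(adj, max_dim=2):
    """Simplices of the clique complex, grouped by dimension k (k+1 nodes each)."""
    nodes = sorted(adj)
    return {
        k: [s for s in combinations(nodes, k + 1)
            if all(b in adj[a] for a, b in combinations(s, 2))]
        for k in range(max_dim + 1)
    }


def boundary(simplex):
    """Faces of codimension one, e.g. the three edges of a triangle."""
    return list(combinations(simplex, len(simplex) - 1))


def upper_adjacent(cplx, k):
    """Pairs of k-simplices that share a common (k+1)-dimensional coface."""
    pairs = set()
    for coface in cplx.get(k + 1, []):
        pairs.update(combinations(boundary(coface), 2))
    return pairs


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cplx = clique_complex(adj)
print(cplx[2])                       # [(0, 1, 2)]
print(len(upper_adjacent(cplx, 1)))  # 3
```

The three upper-adjacent edge pairs all come from the single triangle; the pendant edge (2, 3) has no 2-dimensional coface, so it only interacts with other edges through shared nodes.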


Scalable Inception Graph Neural Networks

Fabrizio Frasca*, Emanuele Rossi*, Davide Eynard, Ben Chamberlain, Michael M. Bronstein, Federico Monti

(paper) (code) (post)

Abstract
Graph representation learning has recently been applied to a broad spectrum of problems ranging from computer graphics and chemistry to high energy physics and social media. The popularity of graph neural networks has sparked interest, both in academia and in industry, in developing methods that scale to very large graphs such as Facebook or Twitter social networks. In most of these approaches, the computational cost is alleviated by a sampling strategy retaining a subset of node neighbors or subgraphs at training time. In this paper we propose a new, efficient and scalable graph deep learning architecture which sidesteps the need for graph sampling by using graph convolutional filters of different sizes that are amenable to efficient precomputation, allowing extremely fast training and inference. Our architecture allows using different local graph operators (e.g. motif-induced adjacency matrices or Personalized PageRank diffusion matrix) to best suit the task at hand. We conduct extensive experimental evaluation on various open benchmarks and show that our approach is competitive with other state-of-the-art architectures, while requiring a fraction of the training and inference time. Moreover, we obtain state-of-the-art results on ogbn-papers100M, the largest public graph dataset, with over 110 million nodes and 1.5 billion edges.
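The precomputation idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the released implementation: it assumes a single family of operators (powers of a symmetrically normalised adjacency with self-loops), random untrained weights, and hypothetical function names. The point is that all graph-dependent work happens once in precompute_features; the forward pass is only dense matrix products on the cached feature blocks, so no neighbour sampling is needed.

```python
import numpy as np


def normalized_adjacency(adj):
    """GCN-style operator D^{-1/2} (A + I) D^{-1/2}, with self-loops."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def precompute_features(adj, x, r):
    """[X, A_hat X, A_hat^2 X, ..., A_hat^r X], computed once before training."""
    a_hat = normalized_adjacency(adj)
    feats = [x]
    for _ in range(r):
        feats.append(a_hat @ feats[-1])
    return feats


def sign_forward(feats, thetas, omega):
    """Z = ReLU([X Theta_0 | A_hat X Theta_1 | ...]), then a softmax over Z Omega.

    Only dense matrix products: no graph access happens here.
    """
    z = np.maximum(np.concatenate([f @ th for f, th in zip(feats, thetas)], axis=1), 0.0)
    logits = z @ omega
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))               # 4 nodes, 3 input features
feats = precompute_features(adj, x, r=2)  # 3 feature blocks
thetas = [rng.normal(size=(3, 8)) for _ in feats]
omega = rng.normal(size=(3 * 8, 2))       # 2 output classes
print(sign_forward(feats, thetas, omega).shape)  # (4, 2)
```

On a real graph the cached blocks would be computed with sparse operators, and other operators such as a PPR diffusion matrix would simply contribute further blocks to the concatenation.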


Exposing and Characterizing Subpopulations of Distinctly Regulated Genes by K-Plane Regression

Fabrizio Frasca, Matteo Matteucci, Marco J. Morelli, Marco Masseroli

(paper)

LNBI (Lecture Notes in Bioinformatics, 2020; extended from CIBB 2018)

Abstract
Understanding the roles and interplays of histone marks and transcription factors in the regulation of gene expression is of great interest in the development of non-invasive and personalized therapies. Computational studies at genome-wide scale represent a powerful explorative framework, from which general conclusions can be drawn. However, a genome-wide approach only identifies generic regulative motifs, and possible multi-functional or co-regulative interactions may remain concealed. In this work, we hypothesize the presence of a number of distinct subpopulations of transcriptional regulative patterns within the set of protein coding genes that explain the statistical redundancy observed at a genome-wide level. We propose the application of a K-Plane Regression algorithm to partition the set of protein coding genes into clusters with specific shared regulative mechanisms. Our approach is completely data-driven and computes clusters of genes significantly better fitted by specific linear models, in contrast to single regressions. These clusters are characterized by distinct and sharper histonic input patterns, and different mean expression values.
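A generic K-plane (also called clusterwise linear) regression loop is sketched below under explicit assumptions; it is not the exact procedure of the paper. It alternates between fitting one least-squares plane per cluster and reassigning each sample to the plane with the smallest squared residual. Here each row of X would hypothetically hold the histone-mark and transcription-factor signals of a gene and y its expression; the synthetic data is made up for illustration. Initialisation is random, so results depend on the seed and several restarts are advisable.

```python
import numpy as np


def k_plane_regression(X, y, k, n_iter=50, seed=0):
    """Alternate (1) a least-squares fit per cluster and (2) reassignment of
    each sample to the plane with the smallest squared residual."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])  # append an intercept column
    labels = rng.integers(k, size=n)      # random initial partition
    coefs = np.zeros((k, Xb.shape[1]))
    for _ in range(n_iter):
        for j in range(k):
            mask = labels == j
            if mask.sum() >= Xb.shape[1]:  # refit only clusters with enough points
                coefs[j], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)
        residuals = (Xb @ coefs.T - y[:, None]) ** 2  # shape (n, k)
        new_labels = residuals.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # partition is stable
        labels = new_labels
    return labels, coefs


# Synthetic data drawn from two hypothetical linear regimes.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = np.where(rng.random(200) < 0.5, 2 * X[:, 0], -2 * X[:, 0] + 20)
y = y + rng.normal(scale=0.1, size=200)
labels, coefs = k_plane_regression(X, y, k=2)
print(coefs.round(1))
```

With well-separated regimes the two rows of coefs should approach the generating slopes and intercepts, in some order; samples near the intersection of the planes are inherently ambiguous.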


Learning Interpretable Disease Self-Representations for Drug Repositioning

Fabrizio Frasca*, Diego Galeano*, Guadalupe Gonzalez, Ivan Laponogov, Kirill Veselkov, Alberto Paccanaro, Michael M. Bronstein

(paper) (code)

Abstract
Drug repositioning is an attractive cost-efficient strategy for the development of treatments for human diseases. Here, we propose an interpretable model that learns disease self-representations for drug repositioning. Our self-representation model represents each disease as a linear combination of a few other diseases. We enforce proximity in the learnt representations so as to preserve the geometric structure of the human phenome network, a domain-specific knowledge source that naturally adds relational inductive bias to the disease self-representations. We prove that our method is globally optimal and show results outperforming state-of-the-art drug repositioning approaches. We further show that the disease self-representations are biologically interpretable.


Modeling Gene Transcriptional Regulation by Means of Hyperplanes Genetic Clustering

Fabrizio Frasca, Matteo Matteucci, Marco Masseroli, Marco J. Morelli

(paper)

IJCNN 2018

Abstract
In the wide context of biological processes regulating gene expression, transcriptional regulation driven by epigenetic activity is among the most effective and intriguing ones. Understanding the complex language of histone modifications and transcription factor bindings is an appealing yet hard task, given the large number of involved features and the specificity of their combinatorial behavior across genes. Genome-wide regression models for predicting mRNA abundance quantifications from epigenetic activity are interesting in an exploratory framework, but their effectiveness is limited as the relative predictive power of epigenetic features is hard to discern at such level of resolution. On the other hand, an investigative analysis cannot rely on prior biological knowledge to perform sensible grouping of genes and locally study epigenetic regulative processes. In this context, we shaped the “gene stratification problem” as a form of epigenetic feature-based hyperplanes clustering, and proposed a genetic algorithm to approach this task, aiming at a data-driven partitioning of the whole set of protein coding genes of an organism based on the characteristic relation between their expression and the associated epigenetic activity. We observed not only that the hyperplanes described by the resulting partitions significantly differ from each other, but also that different epigenetic features are of differing importance in predicting gene expression within each partition. This demonstrates the validity and biological interest of the proposed computational method and the obtained results.





My personal website has been realised with Jekyll and GitHub Pages. The theme, slightly customised, is by orderedlist. Drop me a message if you'd fancy a chat!