GNN

Meta-ESM, Evolutionary Scale Modeling

GNN

Background

[Figure: gnnlayer]

To start, let’s establish what a graph is. A graph represents the relations (edges) between a collection of entities (nodes).
To further describe each node, edge or the entire graph, we can store information in each of these pieces of the graph.
We can additionally specialize graphs by associating directionality to edges (directed, undirected).
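As a minimal sketch of this definition, the snippet below stores attributes on nodes, edges and the whole graph, and supports both directed and undirected edges. The `Graph` class and its method names are hypothetical, chosen only for illustration; real projects would more likely use a graph library.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny attributed graph (illustrative only, not a library API)."""
    directed: bool = False
    node_attrs: dict = field(default_factory=dict)    # node id -> attributes
    edge_attrs: dict = field(default_factory=dict)    # (src, dst) -> attributes
    global_attrs: dict = field(default_factory=dict)  # whole-graph attributes

    def add_node(self, node, **attrs):
        self.node_attrs[node] = attrs

    def add_edge(self, src, dst, **attrs):
        # Make sure both endpoints exist as nodes.
        self.node_attrs.setdefault(src, {})
        self.node_attrs.setdefault(dst, {})
        self.edge_attrs[(src, dst)] = attrs
        if not self.directed:
            # An undirected edge is stored in both directions.
            self.edge_attrs[(dst, src)] = attrs

    def neighbors(self, node):
        return sorted(dst for (src, dst) in self.edge_attrs if src == node)


# Undirected graph with node, edge and global attributes.
g = Graph(directed=False, global_attrs={"name": "toy molecule"})
g.add_node("a", element="C")
g.add_node("b", element="O")
g.add_node("c", element="H")
g.add_edge("a", "b", bond="double")
g.add_edge("a", "c", bond="single")

# Directed graph: the edge a -> b does not imply b -> a.
d = Graph(directed=True)
d.add_edge("a", "b")
```

In the undirected graph `g.neighbors("b")` contains `"a"`, while in the directed graph `d.neighbors("b")` is empty.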

[Figure: graph]

Tasks:

A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global-context) that preserves graph symmetries (permutation invariances).
GNNs adopt a “graph-in, graph-out” architecture: the model accepts a graph as input, with information loaded into its nodes, edges and global context, and progressively transforms these embeddings without changing the connectivity of the input graph.
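A minimal sketch of one such layer, assuming sum aggregation over neighbors, a ReLU nonlinearity and node features only (edge and global attributes are left out for brevity). The function name `gnn_layer` and the weight names are hypothetical. The example also checks the symmetry property: relabeling the nodes of the input only relabels the rows of the output.

```python
import numpy as np


def gnn_layer(node_feats, adjacency, w_self, w_neigh):
    """One sum-aggregation message-passing layer (illustrative sketch)."""
    messages = adjacency @ node_feats                  # sum of neighbor embeddings
    updated = node_feats @ w_self + messages @ w_neigh
    return np.maximum(updated, 0.0), adjacency         # ReLU; connectivity unchanged


rng = np.random.default_rng(0)
num_nodes, d_in, d_out = 4, 3, 5
x = rng.normal(size=(num_nodes, d_in))
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
w_self = rng.normal(size=(d_in, d_out))
w_neigh = rng.normal(size=(d_in, d_out))

h, adj_out = gnn_layer(x, adj, w_self, w_neigh)

# Relabel the nodes: permute the feature rows and both axes of the adjacency.
perm = np.array([2, 0, 3, 1])
h_perm, _ = gnn_layer(x[perm], adj[perm][:, perm], w_self, w_neigh)

# The output rows are permuted in exactly the same way as the input rows.
print(np.allclose(h_perm, h[perm]))
```

The layer returns the adjacency matrix untouched, which is the "graph-in, graph-out" part: only the embeddings change.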

[Figure: mp]

The design space for our GNN has many levers that can customize the model:

[Figure: multigraph]

Other types of graphs

The Challenges of Computation on Graphs:

Introduce operation to GNN:

[Figure: gcn]

Modern GNN

GVP-GNN

GVP-GNN introduces a new module, the geometric vector perceptron (GVP), to replace the dense layers in a GNN:

[Figure: gvp]
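A rough NumPy sketch of a single GVP, based on my reading of the GVP paper and not taken from these notes: scalar features `s` and 3D vector features `V` are processed together, vectors are only mixed linearly and gated by their norms, and vector norms are fed into the scalar channel. The function name `gvp`, the weight names, the dimensions and the choice of ReLU and sigmoid as nonlinearities are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gvp(s, V, W_h, W_mu, W_m, b):
    """Geometric vector perceptron sketch.

    s: (n,) scalar features; V: (nu, 3) vector features.
    Returns scalar features of shape (m,) and vector features of shape (mu, 3).
    """
    V_h = W_h @ V                                       # (h, 3) linear mix of vectors
    V_mu = W_mu @ V_h                                   # (mu, 3) output vectors before gating
    s_h = np.linalg.norm(V_h, axis=1)                   # (h,) rotation-invariant norms
    v_mu = np.linalg.norm(V_mu, axis=1, keepdims=True)  # (mu, 1)
    s_m = W_m @ np.concatenate([s_h, s]) + b            # (m,) dense layer on scalars + norms
    s_out = np.maximum(s_m, 0.0)                        # ReLU on the scalar channel
    V_out = sigmoid(v_mu) * V_mu                        # gate each vector by its own norm
    return s_out, V_out


rng = np.random.default_rng(0)
n, nu, h, mu, m = 4, 3, 3, 2, 6                         # illustrative sizes
s = rng.normal(size=n)
V = rng.normal(size=(nu, 3))
W_h = rng.normal(size=(h, nu))
W_mu = rng.normal(size=(mu, h))
W_m = rng.normal(size=(m, h + n))
b = rng.normal(size=m)

s_out, V_out = gvp(s, V, W_h, W_mu, W_m, b)

# Rotate the input vectors about the z-axis and run the GVP again.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
s_rot, V_rot = gvp(s, V @ R.T, W_h, W_mu, W_m, b)
```

Under these assumptions the scalar outputs do not change when the input is rotated (`s_rot` matches `s_out`), and the vector outputs rotate with it (`V_rot` matches `V_out @ R.T`). That behavior is what an ordinary dense layer applied to flattened coordinates would not give.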

Meta-ESM, Evolutionary Scale Modeling

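The common thread of the five papers in the ESM series is the use of protein language models to predict protein structure and function from protein sequences. Together they propose five Transformer-based unsupervised protein language models: ESM-1b, ESM-MSA-1b, ESM-1v, ESM-IF1, and ESM-Fold.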

| Shorthand | esm.pretrained. | Dataset | Description |
|---|---|---|---|
| ESM-1b | esm1b_t33_650M_UR50S() | UR50, 12m seqs | SOTA general-purpose protein language model. Can be used to predict structure, function and other protein properties directly from individual sequences. Released with Rives et al. 2019 (Dec 2020 update). |
| ESM-MSA-1b | esm_msa1b_t12_100M_UR50S() | UR50 + MSA | MSA Transformer language model. Can be used to extract embeddings from an MSA. Enables SOTA inference of structure. Released with Rao et al. 2021 (ICML'21 version, June 2021). |
| ESM-1v | esm1v_t33_650M_UR90S_1() ... esm1v_t33_650M_UR90S_5() | UR90 | Language model specialized for prediction of variant effects. Enables SOTA zero-shot prediction of the functional effects of sequence variations. Same architecture as ESM-1b, but trained on UniRef90. Released with Meier et al. 2021. |
| ESM-IF1 | esm_if1_gvp4_t16_142M_UR50() | CATH + UR50 | Inverse folding model. Can be used to design sequences for given structures, or to predict functional effects of sequence variation for given structures. Enables SOTA fixed backbone sequence design. Released with Hsu et al. 2022. |

| Model name | Input data type | Generality |
|---|---|---|
| ESM-1b | single sequence | family-specific |
| ESM-MSA-1b | MSA | few-shot |
| ESM-1v | single sequence | zero-shot |
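To make "zero-shot prediction of variant effects" more concrete, here is a small sketch of one common scoring rule for masked language models: mask the mutated position and compare the log-probability of the mutant residue with that of the wild-type residue. The logits below are synthetic, not produced by ESM-1v, and the helper names (`log_softmax`, `variant_score`) and the 20-letter alphabet ordering are assumptions made for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering, for illustration only


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def variant_score(logits_at_masked_pos, wild_type, mutant):
    """Log-odds of the mutant vs. the wild-type residue at one masked position.

    Scores below zero mean the model finds the mutant less likely than the
    wild type at that position.
    """
    log_probs = log_softmax(logits_at_masked_pos)
    return log_probs[AMINO_ACIDS.index(mutant)] - log_probs[AMINO_ACIDS.index(wild_type)]


# Synthetic logits for one masked position whose wild-type residue is L.
logits = np.zeros(len(AMINO_ACIDS))
logits[AMINO_ACIDS.index("L")] = 3.0   # the model strongly prefers leucine
logits[AMINO_ACIDS.index("I")] = 2.0   # isoleucine is nearly as plausible

score_conservative = variant_score(logits, "L", "I")  # L -> I substitution
score_disruptive = variant_score(logits, "L", "P")    # L -> P substitution
```

No labels are needed to compute these scores, which is why the approach counts as zero-shot: the more negative the score, the more the substitution is predicted to hurt.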

ESM-1b

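Labels in gene mutation datasets come from clinical observation. They are usually qualitative categories stored as an int dtype (pathogenic, benign, uncertain), with no precise score.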

Evaluation metric of unsupervised method:

ESM-MSA-1b

Model modification:

ESM-1v

Similarities:

ESM-IF1

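ESM-IF1 predicts a protein sequence from a protein structure through autoregressive training: given the protein backbone coordinates (the center coordinates of three atoms per amino acid, two carbons and one nitrogen: N, Cα, and C), it predicts the corresponding protein sequence.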
[Figure: esmif1]
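The autoregressive formulation can be written out as follows. The symbols are my own notation, not taken from the notes: $X$ denotes the backbone coordinates, $Y = (y_1, \dots, y_n)$ the amino acid sequence of length $n$, and $\theta$ the model parameters.

```latex
% Autoregressive factorization of the sequence given the structure
p_\theta(Y \mid X) = \prod_{i=1}^{n} p_\theta\left(y_i \mid y_1, \dots, y_{i-1};\, X\right)

% Training minimizes the negative log-likelihood of the native sequence
\mathcal{L}(\theta) = -\sum_{i=1}^{n} \log p_\theta\left(y_i \mid y_1, \dots, y_{i-1};\, X\right)
```

At design time, a sequence for a given backbone can then be sampled one residue at a time, each conditioned on the structure and on the residues already chosen.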

ESM-Fold

[Figure: esmvsalpha]