Representing Crystalline Structures as Graphs (with Feature Vectors).
Below, the representation of crystalline structures as graphs is discussed. This is a framework often used in machine-learning models (namely graph neural networks) applied to materials science.
Modern machine-learning models for predicting material properties from crystalline structures typically represent those structures as graphs. In these graphs, nodes represent atoms and edges represent some general bond between, or pairing of, atoms. Constructing such a graph then requires us to define, at minimum, some criterion for edge construction. Often, edges are simply drawn to the 12 nearest neighbors, or to all neighbors within some distance cut-off.
Such a bare model of a crystalline structure obviously includes no substantial atomic information, and hence carries too little physical information for a downstream model to perform well on most predictive tasks.
To resolve this, the nodes and edges of these graphs each carry an additional 'hidden' feature vector, or 'representation', that describes the physical characteristics of its corresponding graph element. Generally, these initial features are derived from known properties of the physical object the graph element represents, and are then updated via some graph convolutional network. The output of the network is a 'learned representation' of the crystal structure (generally obtained after pooling the node features), which is then used as input to some (trained) predictive neural network.
Example Graph Network:
As an example of the use of these graph encodings, consider the graph convolutional network architecture below, which is typically trained on the order of tens of thousands of crystals to predict specific physical properties (such as formation energy per atom, total energy, minimum band gap, etc.).
The architecture displayed here is heavily inspired by the model utilized in this paper, which we refer to as CGCNN.
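As a rough illustration of this pipeline, the sketch below (assuming PyTorch) stacks a few simplified message-passing layers, pools the node features into a crystal-level vector, and regresses a single scalar property. The layer here is a generic sum-aggregation update, not the exact gated convolution of CGCNN; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """One message-passing layer: each node sums a learned transform of
    (its own feature, a neighbor's feature, the connecting edge's feature)."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.message = nn.Linear(2 * node_dim + edge_dim, node_dim)

    def forward(self, x, edge_index, edge_attr):
        src, dst = edge_index          # each of shape (n_edges,)
        msg = torch.relu(self.message(
            torch.cat([x[src], x[dst], edge_attr], dim=-1)))
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, msg)    # sum incoming messages per node
        return x + agg                 # residual feature update

class CrystalPropertyModel(nn.Module):
    """Stack of graph convolutions -> mean pooling -> scalar prediction."""
    def __init__(self, node_dim=64, edge_dim=32, n_layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [SimpleGraphConv(node_dim, edge_dim) for _ in range(n_layers)])
        self.readout = nn.Sequential(
            nn.Linear(node_dim, node_dim), nn.ReLU(), nn.Linear(node_dim, 1))

    def forward(self, x, edge_index, edge_attr):
        for conv in self.convs:
            x = conv(x, edge_index, edge_attr)
        crystal_vec = x.mean(dim=0)    # the pooled 'learned representation'
        return self.readout(crystal_vec)
```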
Below, we discuss the usual graph-encoding schemes, along with their feature schemes, employed in some well-known crystal graph convolutional networks. Then we consider limitations of these models and discuss a possible extension of the framework.
Crystal Graphs
The mathematical object known generally as a graph is simply a set of nodes (or vertices) together with a set of edges between them. Graphs may be represented diagrammatically rather nicely, with one displayed below. They may also be described explicitly as a pair of sets: a set of nodes \(\mathcal{V}\), and a set of tuples \(\mathcal{E}\), each denoting an edge between two nodes of \(\mathcal{V}\).
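For concreteness, the triangle graph on three nodes may be written explicitly as
\[
\mathcal{V} = \{1, 2, 3\}, \qquad \mathcal{E} = \{(1,2),\, (2,3),\, (1,3)\},
\]
so that the graph is the pair \(G = (\mathcal{V}, \mathcal{E})\).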
For the matter at hand, we'll use this basic structure as a framework for the mathematical representation of crystalline structures, and then attach some sort of representation to the constituent elements in a way that encodes the physical information of the system.
For the underlying graph, the nodes can simply be taken as the atomic sites of the lattice, but edges must be constructed according to some criterion: we need to define explicitly what an edge between two nodes represents. Below, some of these criteria for edge construction are discussed, with specific criteria from previous works used as examples.
Edge Criteria
Most of the edge criteria employed are very basic, considering only distance. This maintains generality across tasks and minimizes computational complexity (perhaps as well as intellectual complexity). These criteria are crude, but have proven to work rather well with modern CGCNNs.
Distance Cut-off
Distance cut-off is one of the most common approaches, with the interatomic cut-off often in the range of 5-10 Angstroms. For each atomic site (node), edges are then drawn to every site within the cut-off distance.
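A minimal sketch of cut-off edge construction, assuming sites are given as fractional coordinates along with a 3x3 lattice matrix (rows as lattice vectors). For simplicity it scans only the 27 nearest periodic images, which suffices when the cell parameters are comparable to the cut-off; production codes (e.g. pymatgen's neighbor routines) are far more efficient.

```python
import itertools
import numpy as np

def cutoff_edges(frac_coords, lattice, cutoff=6.0):
    """Return (i, j, distance) for every pair of sites within `cutoff`
    Angstroms, scanning the 3x3x3 block of neighboring periodic images."""
    edges = []
    n = len(frac_coords)
    for i, j in itertools.product(range(n), repeat=2):
        for shift in itertools.product((-1, 0, 1), repeat=3):
            if i == j and shift == (0, 0, 0):
                continue  # skip the self-pair in the home cell
            delta = frac_coords[j] + np.array(shift) - frac_coords[i]
            d = np.linalg.norm(delta @ lattice)  # fractional -> Cartesian
            if d <= cutoff:
                edges.append((i, j, d))
    return edges
```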
Set Number of Neighbors
Sometimes a protocol calls for a fixed number of neighbors for every node. In crystalline structures this can always be accomplished by assuming periodicity and including sites from neighboring cells (if there aren't enough within a single unit cell).
Often, a maximum number of neighbors and a maximum distance cut-off are used in conjunction, minimizing the total number of edges while still capturing the relevant neighborhood of each site.
To avoid treating all of these neighbors (at varying distances) exactly the same, the differences in distance are accounted for in the associated edge features, as discussed later.
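Continuing the sketch above, the candidates found within the cut-off can then be truncated to a fixed number of nearest neighbors per site (illustrative code, reusing cutoff_edges from the previous sketch):

```python
def knn_edges(frac_coords, lattice, cutoff=8.0, max_neighbors=12):
    """Keep only the `max_neighbors` nearest candidates for each site,
    out of all pairs found within the distance cut-off."""
    per_site = {}
    for i, j, d in cutoff_edges(frac_coords, lattice, cutoff):
        per_site.setdefault(i, []).append((d, j))
    kept = []
    for i, candidates in per_site.items():
        for d, j in sorted(candidates)[:max_neighbors]:
            kept.append((i, j, d))
    return kept
```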
Other Criteria
More sophisticated criteria are rarely used in modern models, but more rigorous means of determining the set of relevant neighbors for a site are possible. Some other commonly used criteria (in the general determination of nearest neighbors) are solid-angle cut-offs, Voronoi tessellation neighbors, and scaled cut-offs (dependent on a site's closest neighbor).
These more sophisticated criteria are more often used in physical applications, such as the determination of a site's local symmetry or motif structure.
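As a concrete example of the Voronoi approach, pymatgen provides a VoronoiNN class that weights each neighbor by the solid angle of the shared Voronoi face (this assumes pymatgen is installed; the CsCl cell below is just a toy input):

```python
from pymatgen.core import Lattice, Structure
from pymatgen.analysis.local_env import VoronoiNN

# CsCl: a simple cubic two-atom cell
structure = Structure(Lattice.cubic(4.12), ["Cs", "Cl"],
                      [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])

# Neighbors of site 0, weighted by solid angle of shared Voronoi faces
vnn = VoronoiNN()
for info in vnn.get_nn_info(structure, 0):
    print(info["site"].specie, round(info["weight"], 3))
```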
Initial Features
With a graph in hand, we now need a way to associate the important physical information of the crystal with the graph. This is done by associating a vector, termed the feature vector, with each node, and often each edge, of the graph. The feature vectors are intended to encode the physical data relevant to the chosen predictive task.
Note that we term these initial features, since they are usually taken as input to some machine-learning model, which then updates the features layer by layer. Having acknowledged this, we drop the term 'initial' in the following discussion and consider the features of two objects: the atoms (nodes) and the edges.
Node/Atom Features
Nodes of the crystal graph most often represent atoms, so we may essentially consider this a discussion of atom representation vectors.
Of course, atoms have corresponding valence charges, differing orbital structures, and group numbers. A common scheme for atom representation vectors is that employed in the classic CGCNN of Xie & Grossman, described succinctly in the diagram below. Note that some values are one-hot encoded, while others are expansions, etc. An implementation of this scheme can be found as atom_init.json here, organized by proton number.
One-Hot Encoding:
When we need to describe a true-or-false property, we may use a single zero or one. Alternatively, increasing the dimensionality, we could use a less compact two-dimensional representation in which exactly one of the two entries is one: the first entry being one denotes true, and the second being one denotes false.
This padded representation of extended dimensionality is termed one-hot encoding; the true/false example above extends naturally to any larger set (>2) of mutually exclusive categories (such as an element's group number).
One-hot encoding is inefficient, but it may better communicate the distinct classes to the model it is fed into. This is especially true in models whose first layers act effectively as linear maps.
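A toy helper makes the encoding concrete (illustrative code, not CGCNN's actual implementation):

```python
import numpy as np

def one_hot(value, categories):
    """Encode `value` as a vector with a single 1 at its category index."""
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

# e.g. an element's group number: one of 18 mutually exclusive categories
group_feature = one_hot(14, categories=list(range(1, 19)))  # silicon, group 14
```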
Sometimes no atomic information is provided with the graph (or a randomly initialized set of orthogonal vectors is chosen). The workflow defined in this paper, which we will refer to as MEGNet, implemented random feature vectors for the atoms. Surprisingly, negligible difference in model performance was observed for certain tasks (compared to the physical atom vectors discussed previously).
Edge/"Bond" Features
Owing to their simple construction criteria, the feature vectors of most crystal-graph edges are based only on distance.
A distance can always be described as a scalar, though it may also be described as an offset vector (with some ambiguity up to a minus sign). More frequently, the scalar distance corresponding to an edge is expanded in a so-called Gaussian distance expansion, or radial basis function (RBF) expansion. This gives a higher-dimensional representation vector based only on the distance.
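Concretely, a distance \(d\) is mapped to the vector with components \(\exp(-(d - \mu_k)^2/\sigma^2)\) for evenly spaced centers \(\mu_k\). A minimal sketch follows; the range, number of centers, and width below are typical but assumed:

```python
import numpy as np

def gaussian_expansion(d, d_min=0.0, d_max=8.0, n_centers=40, sigma=0.5):
    """Expand a scalar distance into a smooth n_centers-dimensional vector:
    one Gaussian evaluated at each evenly spaced center mu_k."""
    centers = np.linspace(d_min, d_max, n_centers)
    return np.exp(-((d - centers) ** 2) / sigma ** 2)

edge_feature = gaussian_expansion(2.35)  # e.g. a 2.35 Angstrom Si-Si bond
```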
One could also conceivably include other information about the bond, such as the combined charge or combined electrostatic valency of the paired atoms, but this wouldn't apply well to the second- or third-nearest neighbors that may have been included during edge construction. Indeed, the loose definition of an edge forces us to drop any interpretation of edges as 'chemical bonds' between graph neighbors. Other bond-specific attributes may still be incorporated, however, such as displacement vectors (in terms of components) and the like.
Conclusion
Modern models utilizing the basic constructions discussed above have shown surprisingly good performance on a multitude of prediction tasks, such as formation energy per atom, minimum band gap, and metal/non-metal classification. This suggests that the information packaged in such a structure is an intrinsically useful representation of the corresponding physical structure.
Greater performance and more robust representations are almost certainly intertwined, however. To that end, it's useful to consider what flaws such models may have, in the hope of improving them.
Flaws
One major flaw of these techniques is that they fail to include larger-scale geometric information, such as bond angles. Then again, the high performance of such models without this geometric information may be a testament to its negligible effect on the chosen targets.
Extensions
The above graph constructions have been extended to some extent in more modern models; two notable examples are the atomistic line graph and the state vector of the MEGNet model. The atomistic line graph approach associates a secondary 'line' graph with each crystal, in which nodes represent edges of the crystal graph and connections between these 'edge' nodes describe the angle between connected bonds (see the sketch below). MEGNet's approach associates an additional state-level vector with each crystal graph, as a sort of graph-level feature.
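A rough sketch of the line-graph idea (illustrative only, not the reference implementation; it assumes each crystal-graph edge stores its Cartesian offset vector):

```python
import numpy as np

def line_graph(edges, offsets):
    """Given crystal-graph edges (i, j) and their Cartesian offset vectors,
    connect every pair of edges meeting at a shared atom and store the
    cosine of the angle between them as the line-graph edge feature."""
    lg_edges = []
    for a, (i1, j1) in enumerate(edges):
        for b, (i2, j2) in enumerate(edges):
            if b <= a:
                continue
            if j1 == i2:  # edge a ends where edge b begins: angle at atom j1
                u, v = -offsets[a], offsets[b]
                cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
                lg_edges.append((a, b, cos))
    return lg_edges
```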