Atlas Schema

From AtlasWiki

The way GraphElement objects are attributed and tagged, and the way nodes are connected by edges in the Atlas index, follows a strict rule set that forms the Atlas Schema. The rule set is influenced by the language the index is being created for (e.g. Java, C, C++) and by the way Atlas represents that language's abstractions. For example, call edges only exist between two method nodes, or from a callsite (control flow block) to a method. Adding a call relationship anywhere else in the graph would be confusing (and would violate our schema!). Some edges are summaries of other, more fine-grained relationships. A call edge between two methods summarizes at least one call edge from a control flow block inside the calling method to the target method. This summary relationship between an edge and its finer-grained edges can be thought of as a hierarchy of edges.
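The call-edge rule above can be pictured as a constraint on edge endpoints. The following is a minimal, self-contained Java sketch, not the Atlas API: the `Kind` enum, the `CallEdge` record, and `isValidCallEdge` are hypothetical names that model just enough of the schema to check that a call edge runs from a method or a callsite (control flow block) to a method.

```java
import java.util.List;

// Hypothetical, simplified node kinds; real XCSG tags are far richer.
enum Kind { METHOD, CONTROL_FLOW_BLOCK, FIELD, CLASS }

// Hypothetical call edge, described only by its endpoint names and kinds.
record CallEdge(String from, Kind fromKind, String to, Kind toKind) {}

class CallEdgeRule {
    // A call edge is allowed only if it targets a method and originates
    // either at a method (summary level) or at a callsite block.
    static boolean isValidCallEdge(CallEdge e) {
        boolean validSource = e.fromKind() == Kind.METHOD
                || e.fromKind() == Kind.CONTROL_FLOW_BLOCK;
        return validSource && e.toKind() == Kind.METHOD;
    }

    public static void main(String[] args) {
        List<CallEdge> edges = List.of(
                new CallEdge("foo()", Kind.METHOD, "bar()", Kind.METHOD),
                new CallEdge("foo():block3", Kind.CONTROL_FLOW_BLOCK, "bar()", Kind.METHOD),
                new CallEdge("foo()", Kind.METHOD, "count", Kind.FIELD));
        for (CallEdge e : edges) {
            System.out.println(e.from() + " -> " + e.to() + ": "
                    + (isValidCallEdge(e) ? "ok" : "violates schema"));
        }
    }
}
```

The third edge is rejected because a field can never be the target of a call, which is the kind of structural guarantee an analysis can rely on when it knows the schema.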

To see a list of all the Atlas node and edge types, along with their summary relationships, refer to the official Extensible Common Software Graph (XCSG) documentation, which Atlas implements.

In general the index currently contains:

  • The major declarations (projects, types, packages, fields, methods), and associated relationships, such as the type hierarchy.
  • A "summary" graph, which includes method-granularity and control-flow-granularity relationships, such as calls and reads/writes of fields, among other things. For example, if a method foo() calls bar() in 5 places, the method-granularity relationships will have a single CALL edge representing that at least one call from foo() to bar() exists. Similarly, the control-flow-granularity relationships will have a single CALL edge to bar() from each control flow block that contains a call to bar().
  • Control flow.
  • Data flow.
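The relationship between the two granularities in the foo()/bar() example can be sketched as a projection: collapse duplicate calls within a block to get one control-flow-granularity CALL edge per block, then collapse all blocks of a method to get a single method-granularity CALL edge. This is a hypothetical model for illustration, not how Atlas builds its index; `CallSite`, `blockEdges`, and `methodEdges` are invented names, and the sketch assumes each call site is already labeled with its enclosing method and a globally unique block ID.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// One call expression in the source: its enclosing method and block, and its target.
record CallSite(String method, String block, String target) {}

class SummaryEdges {
    // Control-flow granularity: one CALL edge per (block, target) pair,
    // no matter how many calls to that target the block contains.
    static Set<String> blockEdges(List<CallSite> sites) {
        Set<String> edges = new LinkedHashSet<>();
        for (CallSite s : sites) edges.add(s.block() + " -CALL-> " + s.target());
        return edges;
    }

    // Method granularity: one CALL edge per (method, target) pair, recording
    // only that at least one call exists.
    static Set<String> methodEdges(List<CallSite> sites) {
        Set<String> edges = new LinkedHashSet<>();
        for (CallSite s : sites) edges.add(s.method() + " -CALL-> " + s.target());
        return edges;
    }

    public static void main(String[] args) {
        // foo() calls bar() in 5 places spread over 4 blocks (b2 holds two calls).
        List<CallSite> sites = List.of(
                new CallSite("foo", "b1", "bar"), new CallSite("foo", "b2", "bar"),
                new CallSite("foo", "b2", "bar"), new CallSite("foo", "b3", "bar"),
                new CallSite("foo", "b4", "bar"));
        System.out.println(blockEdges(sites).size());  // 4
        System.out.println(methodEdges(sites).size()); // 1
    }
}
```

Each method-level edge stands for one or more block-level edges, which is the hierarchy of edges described at the top of this page.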

Schema Importance

To become proficient in wielding Atlas, you should have a decent understanding of the Atlas Schema, which means having some familiarity with the Extensible Common Software Graph documentation and with the properties of the language you are analyzing. Let's consider an illustrative example.

Suppose you are writing an analysis program to discover common bugs in Java programs. While developing it, you discover that it is detecting results in nested classes, which for whatever reason you would like to exclude. In Java source, a nested class is a class that is defined inside of another class. We know from the Atlas Schema that every class has a contains (declaration) edge from its declaring parent, and that packages declare top-level classes. This knowledge of the graph's structure is enough to isolate the set of nested classes: start with the set of all classes and remove those that are reachable in one step forward along contains edges from the set of all package nodes.

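// All Java class nodes and all package nodes in the index
Q allClasses = Common.universe().nodesTaggedWithAny(XCSG.Java.Class);
Q allPackages = Common.universe().nodesTaggedWithAny(XCSG.Package);
// Contains edges run from each declaring parent to the elements it declares
Q containsEdges = Common.universe().edgesTaggedWithAny(XCSG.Contains);
// Classes one step forward from a package are top-level classes
Q topLevelClasses = containsEdges.successors(allPackages);
// Classes not declared directly by a package are nested classes
Q nestedClasses = allClasses.difference(topLevelClasses);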
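To see why one forward step plus a set difference isolates nested classes, here is a self-contained Java model of the same query over a tiny hand-built graph. It only mimics the shape of the Atlas query: the `CONTAINS` adjacency map, the `successors` helper, and the node names are hypothetical stand-ins for `Common.universe()`, `Q.successors`, and `Q.difference`, not the Atlas API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NestedClassModel {
    // Hypothetical contains edges: declaring parent -> declared children.
    static final Map<String, List<String>> CONTAINS = Map.of(
            "pkg.a", List.of("Outer", "Other"),
            "Outer", List.of("Outer.Inner", "Outer.run()"),
            "Outer.Inner", List.of("Outer.Inner.Deep"));

    // Immediate successors of the origin nodes along contains edges.
    static Set<String> successors(Set<String> origin) {
        Set<String> result = new HashSet<>();
        for (String n : origin) result.addAll(CONTAINS.getOrDefault(n, List.of()));
        return result;
    }

    static Set<String> nestedClasses(Set<String> allClasses, Set<String> allPackages) {
        Set<String> topLevel = successors(allPackages); // one step forward from packages
        Set<String> nested = new HashSet<>(allClasses);
        nested.removeAll(topLevel);                     // all classes minus top-level ones
        return nested;
    }

    public static void main(String[] args) {
        Set<String> classes = Set.of("Outer", "Other", "Outer.Inner", "Outer.Inner.Deep");
        System.out.println(nestedClasses(classes, Set.of("pkg.a")));
    }
}
```

Because only the package's direct children are removed, classes nested at any depth (here `Outer.Inner.Deep`) remain in the result.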

Note that in this case the Atlas Schema actually provides a tag that would be useful, so the above queries could be rewritten as follows. There are, however, many analysis tasks for which knowledge of the graph's structure is essential (so don't depend on a tag existing for everything!).

Q nestedClasses = Common.universe().nodesTaggedWithAny(XCSG.InnerClass);

Finally, it's worth noting that you also need to keep in mind the language you are indexing. This example makes sense for Java source, but if we were to index Java bytecode (using the Atlas for Jimple indexer), the concept of a top-level class becomes moot. When the Java compiler compiles Java source, each nested class is split out into its own separate class file, so in the bytecode every class is a top-level class!
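The split is easy to observe from Java itself: the compiler gives a nested class a binary name that joins the outer and inner names with `$`, and javac writes it to its own class file (for example `Outer$Inner.class`). A minimal sketch; `Outer`, `Inner`, and `BinaryNames` are illustrative names only.

```java
class Outer {
    // A static nested class: in source it is declared inside Outer.
    static class Inner {}
}

class BinaryNames {
    public static void main(String[] args) {
        // The binary name of the separately compiled class,
        // "Outer$Inner" when compiled in the default package.
        System.out.println(Outer.Inner.class.getName());
    }
}
```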
