Using the Atlas Shell

From AtlasWiki
Revision as of 18:52, 6 March 2014 by Admin (Talk | contribs) (Data Flow)

The easiest way to get started with Atlas is Smart Views, but sometimes you need a different kind of graph, statistics, or even your own custom tooling. That's where the Atlas Interpreter comes in. The interpreter allows you to execute commands interactively using our Scala-based scripting language, or to write scripts using Java, Scala, or any other JVM-compatible language.

Configuring an Interpreter Project

The Atlas Interpreter requires a Scala project in the workspace for context (mostly for establishing the classpath). This project can also be used to build up longer scripts and utilities, which may be written in Java, Scala, or any mix of JVM-compatible languages.

To configure the interpreter project please follow the instructions in the video below. You can download the zip file referenced in the video here: Example-Interpreter-Plug-in.zip.

Importing a Scala Project Template

For convenience, a sample Scala project is supplied in this Example Interpreter zip file. You can import the project from the zip file using the steps below.

  • Menu: File -> Import...
  • Select "Existing Projects into Workspace" under "General"
  • Use "Select archive file" and pick the zip file above.
  • Select Finish.

If you would prefer to create the Scala project manually, refer to the section Manually Creating a Scala Project below.

Manually Creating a Scala Project for use with the Interpreter

Create a Plug-in project, then convert it into a Scala project.

Then, to set the classpath for use with Atlas, open the manifest and add as dependencies:

  • the plugins included with the Scala IDE (org.scala*, scalariform)
  • the Atlas plugins
    • com.ensoftcorp.atlas.java.core
    • com.ensoftcorp.atlas.java.ui
    • com.ensoftcorp.atlas.java.interpreter

Finally, disable indexing of the Scala project from the menu item Atlas -> Manage Project Settings.

Executing Commands Interactively

Once you have configured an interpreter project, right-click on the project and select the menu item Atlas -> Open Interpreter. The interpreter will open and automatically index your project if the Atlas Smart View has not done so already.

Screenshot of the Atlas Interpreter

Let's try a simple command. Type in the command below and then press Enter. The result is the number of private fields in your indexed projects.

index.nodesTaggedWithAll(Node.IS_PRIVATE,Node.FIELD).eval.nodes.size

To create custom graphs, scripts, and much more, follow the Query Language Tutorial below.

What kind of information can I query?

Atlas indexes source code in the workspace, and produces an index which is essentially a graph. The entire graph is usually referred to as the "index".  The Atlas query language is an internal DSL embedded within Java. The primary interface used to build queries is Q. Queries written using Q are evaluated, yielding a Graph, which is a subset of the index. One may write queries entirely in terms of Graph, but most routine queries are easier to write using Q.

The index currently contains:

  • The major declarations (projects, types, packages, fields, methods), and associated relationships, such as the type hierarchy.
  • A "summary" graph, which includes method and control-flow-granularity relationships, such as calls and reads/writes of fields, among other things.
  • Control flow.
  • Data flow.

Whether using Graph or Q, most queries will involve use of the constants defined in Attr, which essentially forms the schema for the index. Studying the javadocs for this interface is highly recommended. In addition to the above, the Common class provides a handful of convenience methods for writing queries. In particular, Common.index() is the starting point for a query based on Q.

Additional documentation about the query language, graph data structures and more can be found in the Atlas javadocs.

New users are encouraged to review the Query Language Tutorial below.

Query Language Tutorial

The following is a brief tutorial on the Atlas query language. To follow along, you will need a Scala project for the interpreter and the sample project code to index. You can import them into your workspace from the following zip files:

Import each project from its zip file using the steps below.

  • Menu: File -> Import...
  • Select "Existing Projects into Workspace" under "General"
  • Use "Select archive file" and pick the zip file above.
  • Select Finish.

This tutorial is divided into sections.

Scala Mini-tutorial

This section assumes you have already performed the setup steps described at the start of this guide.

The Atlas interpreter is a Scala interpreter with access to the Atlas index and API. Once you learn a few basics about Scala syntax, you will find it easy to write queries for Atlas. The Atlas API is written in Java, but since Scala seamlessly interoperates with Java, you have access to the full API, including the Java API itself.

Queries in the interpreter are written as Scala expressions. The first thing we will do is define a new variable. Copy the following into your interpreter and press Enter.

var i = 42

That's it - you've just written something in Scala! The result of interpreting the expression will appear below:

i: Int = 42

Notice that you did not have to specify the type of i - it was inferred from the value assigned to it. The type of i appears after the :. Note that Int is the Scala equivalent of Java's primitive int.

Scala is a statically typed language. If you wish to declare the type of a variable explicitly, add it after the variable name, separated by a colon.

var d:Double = 42;

Once defined, you may use the variable.

i = i + 17
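The points above can be collected into one short sketch. The names count, ratio, offset and total are made up for illustration, and val (a variable that cannot be reassigned) is standard Scala that this tutorial does not otherwise use.

```scala
// Type inference: the compiler infers Int from the literal 42.
var count = 42

// Explicit annotation: the Int literal 42 is widened to the Double 42.0.
var ratio: Double = 42

// A val works like a var, but it cannot be reassigned afterwards.
val offset = 17

// Variables can be combined in expressions; Int + Int + Double yields a Double.
val total = count + offset + ratio   // 101.0
```

If you paste these lines into the interpreter one at a time, the inferred type of each definition is printed after the colon, just as it was for i.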

You may also find it helpful to define a method. You can do this using def. The following defines a method called inc which takes an Int and returns that plus 17.

def inc(i:Int) = { i + 17 }

The equivalent in Java is:

int inc(int i) { return i + 17; }

In the Scala version, note the = between the method signature and the body - it makes the method return the value of the last expression in the body.

Once you have the method defined, you can use it like any other method.

inc(3)
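As a small sketch of a few variations on the same method: the result type can be written out explicitly, the braces can be dropped for a single expression, and a longer body returns its last expression. The names incShort and incTwice are hypothetical and exist only for this example.

```scala
// Same method as above, with the result type written out explicitly.
def inc(i: Int): Int = { i + 17 }

// For a single expression the braces are optional.
def incShort(i: Int): Int = i + 17

// A multi-line body: the value of the last expression is the result.
def incTwice(i: Int): Int = {
  val once = inc(i)
  inc(once)
}

val a = inc(3)        // 20
val b = incShort(3)   // 20
val c = incTwice(3)   // 37
```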

You may have noticed the lack of semicolons. In Scala, semicolons are usually optional, but they are helpful when a method with several statements is written on a single line.

Parentheses for method calls are not always required either; however, for clarity, the tutorials will use parentheses.
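The following sketch shows both conventions in plain Scala. The methods incAndDouble and answer are made up for illustration; String.length() is the ordinary Java method.

```scala
// Semicolons separate several statements placed on one line.
def incAndDouble(i: Int): Int = { val bumped = i + 17; bumped * 2 }

// A method declared without a parameter list is called without parentheses.
def answer: Int = 42

// Java methods with an empty parameter list, such as String.length(),
// may be called from Scala with or without the parentheses.
val withParens = "Atlas".length()
val withoutParens = "Atlas".length
```

This is why a query such as index.eval.nodes.size works in the interpreter even though the underlying Java methods are declared with parentheses.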

On to Basic Atlas Queries.

Basic Atlas Queries

This section assumes you have already performed the setup steps described at the start of this guide.

Before you write any queries, Atlas has to index the workspace. When you start the interpreter, it will generally trigger indexing (or loading of a previous index). If you need to refresh the index, you can do so by typing:

indexWorkspace

If you need to clear any variables you've defined, you can restart the interpreter with the toolbar button or by typing:

:restart

Note: you should restart the interpreter after re-indexing the workspace. This removes any variables which might be pointing to the previous index.

Your First Query

Your first Atlas query: ask for everything. Go ahead, it won't take long, we promise.

index

In response, you'll see something similar to:

res0: com.ensoftcorp.atlas.java.core.query.Q = <Atlas query expression>

What happened?

First, we did not specify what to assign the result to, so the interpreter helpfully made a new variable called res0 for the result. The type of the result is com.ensoftcorp.atlas.java.core.query.Q, and the string representation is <Atlas query expression>.

Evaluating the Index

The expression index is a starting point for Atlas queries based on Q. In short, Q is a way to build up expressions which specify what you want, but nothing actually happens until you evaluate and start enumerating the result.

Evaluating Q will yield a Graph, which has a set of nodes and a set of edges. Try that next.

index.eval()

The result is a Graph.

res1: com.ensoftcorp.atlas.java.core.db.graph.Graph = ....

Let's find out how many nodes and edges are in the index, shall we?

var g = index.eval();
g.nodes().size();
g.edges().size();

You should see counts for the number of nodes and edges, respectively.

The key point here is that we write queries using Q, and eval() will return a Graph with nodes and edges.
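To make the idea of "build the expression first, evaluate later" concrete, here is a toy model in plain Scala. It is only a sketch of the concept: ToyQuery, ToyGraph, named and the evaluations counter are invented for this example and say nothing about how Atlas implements Q or Graph.

```scala
// Toy stand-in for a graph: just a set of node names. Not the Atlas Graph class.
case class ToyGraph(nodes: Set[String])

// Toy stand-in for Q: wraps a function that only runs when eval() is called.
class ToyQuery(compute: () => ToyGraph) {
  def eval(): ToyGraph = compute()

  // Chaining builds a new deferred query; nothing is computed yet.
  def named(prefix: String): ToyQuery =
    new ToyQuery(() => ToyGraph(eval().nodes.filter(_.startsWith(prefix))))
}

// Counts how often the toy index is actually evaluated.
var evaluations = 0
val toyIndex = new ToyQuery(() => { evaluations += 1; ToyGraph(Set("Base", "Bar", "Flow")) })

val q = toyIndex.named("B")   // builds the expression only
val before = evaluations      // still 0
val result = q.eval()         // now the work happens
val after = evaluations       // 1
```

In this model, as with Q, chaining is cheap and the cost is paid once, at the point where the result is evaluated.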

Seek and Display

Let's try some more interesting queries.

The Q interface contains a variety of methods for building expressions. These methods will return the query, allowing you to chain expressions together.

Starting from the index, we can select all types named "Base", using the query Q.types(java.lang.String). By "types", we include classes, interfaces, enums and annotations.

// all types named Base
var t = index.types("Base")

Let's display the result in a graph editor. The show() method takes a Q directly, so there is no need to evaluate it first.

show(t)

Basic-type.png

You will see the class Base, along with its parent package and project. This is because show() automatically provides some visual context by including the nodes which point to the result via an Attr.Edge.DECLARES edge.

If you want to see the exact result of your query, which is sometimes helpful for debugging, you can tell show() not to extend your result.

show(t, extend=false)

Basic-type-no-extend.png

The above syntax, extend=false, is the Scala way of passing a parameter value by naming the parameter. By default, extend is true. There are other parameters which can be set in this manner, including highlighter and title.
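Named and default parameters are a general Scala feature. The sketch below uses a hypothetical describe() function whose parameter names merely echo those mentioned above; it is not the signature of show().

```scala
// Hypothetical function for illustration only; this is not the Atlas show() signature.
def describe(name: String, extend: Boolean = true, title: String = "Result"): String =
  s"$title: $name (extend=$extend)"

val d1 = describe("Base")                                   // both defaults are used
val d2 = describe("Base", extend = false)                   // one parameter named
val d3 = describe("Base", title = "Types", extend = false)  // named, in any order
```

Naming a parameter lets you skip the ones before it that have defaults, which is handy when a method has many optional settings.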

Note: show() is defined in com.ensoftcorp.atlas.java.interpreter.lib.Common. You can look at the code for show() yourself by using Eclipse's Open Type feature (usually Ctrl+Shift+T).

Attributes and Tags

The types() query is a convenience for searching for all nodes with a given name and kind (i.e. Java types). By way of introducing other queries, we'll break this down into steps and search for both aspects separately.

Recall that evaluating Q results in a Graph. Graphs have nodes and edges, which are represented by GraphElement. GraphElement has both attributes and tags. The values of attributes and tags can be specified using queries, and we'll need both to implement our own types() query.

The name of the node is encoded as an attribute. An attribute key is a string, but best practice is to use the constant defined in the Attr interface. In this case, we need the key Attr.Node.NAME.

The kind of a node is encoded as a tag. A tag is just a string, but again, best practice is to use the constant. For historical reasons, constants for tags are also found in the Attr interface. The tag for Java types is Attr.Node.TYPE.

Now we can find all the types with a given name step by step. First, select all the nodes with a given name using the query for node attributes, Q.selectNode().

// select by attribute
var named = index.selectNode(Attr.Node.NAME, "Base")

Next, select all nodes tagged Attr.Node.TYPE using the query Q.nodesTaggedWithAny().

// select by tag
var tagged = index.nodesTaggedWithAny(Attr.Node.TYPE)

Finally, take the intersection.

var result = named.intersection(tagged);
show(result)

Basic-type.png

We now have the same result as if we had used Q.types() directly.

Note: the interfaces Attr, Attr.Node and Attr.Edge are automatically imported, so one can simply write Node.TYPE in a query. We will continue to fully qualify the constants for the remainder of this section for clarity.

Chaining

Queries can build on one another directly. The following query has the same effect as the query in the previous section, accomplished in a slightly different way. From the index, the graph is reduced first by tags. From within that graph, nodes named "Base" are selected.

var result = index.nodesTaggedWithAny(Attr.Node.TYPE).selectNode(Attr.Node.NAME, "Base")
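Why do the intersection and the chained query give the same answer? A toy model in plain Scala may help. ToyNode, the "name" key and the "type" and "package" tags are placeholders invented for this sketch, and the two helper functions only imitate the idea behind selectNode() and nodesTaggedWithAny().

```scala
// Toy node: a map of attributes plus a set of tags. Not the Atlas GraphElement class.
case class ToyNode(attrs: Map[String, String], tags: Set[String])

val nodes = Set(
  ToyNode(Map("name" -> "Base"), Set("type")),
  ToyNode(Map("name" -> "Base"), Set("package")),
  ToyNode(Map("name" -> "Flow"), Set("type"))
)

// Keep the nodes whose attribute `key` has the given value.
def selectNode(candidates: Set[ToyNode], key: String, value: String): Set[ToyNode] =
  candidates.filter(_.attrs.get(key).contains(value))

// Keep the nodes that carry at least one of the given tags.
def taggedWithAny(candidates: Set[ToyNode], tags: String*): Set[ToyNode] =
  candidates.filter(n => tags.exists(n.tags.contains))

// Two independent selections, then the intersection.
val viaIntersection =
  selectNode(nodes, "name", "Base").intersect(taggedWithAny(nodes, "type"))

// Chaining: filter by tag first, then by name within that result.
val viaChaining = selectNode(taggedWithAny(nodes, "type"), "name", "Base")
```

Both values contain only the node that is named "Base" and tagged "type", because each approach applies the same two filters.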

Walking the Graph

Now that we have a way to query for the type called "Base", what can we do with it? A simple thing to do is to get the type hierarchy.

In consulting the Attr interface, we find that Attr.Node.TYPE nodes have edges tagged Attr.Edge.SUPERTYPE which point to their parent type. How can we query for the subtypes of "Base"?

We can walk the Attr.Edge.SUPERTYPE edges down to the subtypes. Since a type points to its supertype, we walk the edges in the reverse direction to get the subtypes. To walk the edges, we will use the Q.reverse() query.

But first, we have to set up the Graph in which the traversal will take place. The easiest way to walk over a certain kind of edge is to set up a filter on the edges. We accomplish this using the Q.edgesTaggedWithAny() query.

The edges we are interested in are tagged Attr.Edge.SUPERTYPE.

Now, put it all together.

// get a reference to Base
var base = index.types("Base")
 
// filter edges
var edgesSupertype = index.edgesTaggedWithAny(Attr.Edge.SUPERTYPE)
 
// walk from Base to its subtypes
var result = edgesSupertype.reverse(base)
show(result)

Basic-supertypes.png

That's it - you now have a type hierarchy consisting of "Base" and its subtypes.
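If the direction of the walk is hard to picture, the following toy model in plain Scala may help. It is a sketch under simplifying assumptions: edges are plain (from, to) pairs, the type names are made up, and this reverse() returns only the nodes it reaches, whereas the Atlas query yields a graph with edges as well.

```scala
// Toy edges: (from, to). A supertype-style edge points from a type to its parent type.
val supertypeEdges = Set(
  "Child" -> "Base",
  "GrandChild" -> "Child",
  "Base" -> "Object",
  "Other" -> "Object"
)

// Walk the edges backwards from the origin until no new nodes are found.
def reverse(edges: Set[(String, String)], origin: Set[String]): Set[String] = {
  val predecessors = edges.collect { case (from, to) if origin.contains(to) => from }
  val reached = origin ++ predecessors
  if (reached == origin) origin else reverse(edges, reached)
}

val subtypes = reverse(supertypeEdges, Set("Base"))
// Set(Base, Child, GrandChild)
```

Note that "Object" and "Other" are not reached: walking backwards from "Base" only follows edges that point into nodes already in the result.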

Note: in the future, you can use the convenience method Common.edges() to filter edges. The equivalent query is:

var base = index.types("Base")
var result = edges(Attr.Edge.SUPERTYPE).reverse(base)
show(result)

Finding Methods Under a Type

Now that we know how to walk edges, let's query for everything under a type. Parent/child relationships between nodes are represented using edges tagged Attr.Edge.DECLARES.

To expand the graph to include all the declarations immediately under a type, we can walk a single step along the Attr.Edge.DECLARES edges, using the Q.forwardStep() query.

var base = index.types("Base")
var result = edges(Attr.Edge.DECLARES).forwardStep(base)
show(result)

Basic-methods-under-type.png

You should see several methods declared by "Base", including the method "helloWorld()". If you double-click on the method, Atlas will open and highlight the corresponding source code.
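The difference between a single step and a full traversal can be sketched with a toy model in plain Scala. The node names (including the field "count") are invented, and the sketch assumes that the origin itself stays in the result, as the figure above suggests.

```scala
// Toy declares-style edges: (parent, child).
val declares = Set(
  "project" -> "pkg",
  "pkg" -> "Base",
  "Base" -> "helloWorld()",
  "Base" -> "count"   // hypothetical field, for illustration only
)

// One step along the edges: the origin plus its direct successors.
def forwardStep(edges: Set[(String, String)], origin: Set[String]): Set[String] =
  origin ++ edges.collect { case (from, to) if origin.contains(from) => to }

// Transitive walk: repeat single steps until nothing new is reached.
def forward(edges: Set[(String, String)], origin: Set[String]): Set[String] = {
  val next = forwardStep(edges, origin)
  if (next == origin) origin else forward(edges, next)
}

val oneStep = forwardStep(declares, Set("pkg"))   // Set(pkg, Base)
val allSteps = forward(declares, Set("pkg"))      // Set(pkg, Base, helloWorld(), count)
```

A single step is usually what you want for "immediate children", while the transitive walk answers "everything underneath".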

Next...

On to Intermediate Atlas Queries.

Intermediate Atlas Queries

This section assumes you have already performed the setup steps described at the start of this guide.

A Forward Call Graph

Getting a forward call graph is a traversal, much like walking edges to get the type hierarchy.

For this example, we will start the call graph from the method com.ensoftcorp.atlas.java.example.project.Flow.foo(). That's a lot to type in. Since the method name is reasonably unique, we can use the convenience method Common.methodSelect(), which allows us to partially qualify the method name.

var methodFoo = methodSelect("Flow", "foo");
var cg = edges(Edge.CALL).forward(methodFoo);
show(cg);

Int-cg.png

Adding Some Color

A traversal might result in a fairly large graph, so it may help to add some color to call attention to where the traversal started. You can do that with a Highlighter. A Highlighter associates a color with a query expression, which is then passed to show() via the highlighter parameter.

// call graph query
var methodFoo = methodSelect("Flow", "foo");
var cg = edges(Attr.Edge.CALL).forward(methodFoo);
 
// highlight the origin in red
var h = new Highlighter();
h.highlight(methodFoo, java.awt.Color.RED);
 
// display
show(cg, highlighter=h);

Int-cg-color.png

A Reverse Call Graph, Part 1

Reverse call graphs are specified in a similar way to forward call graphs, by walking backwards along Attr.Edge.CALL edges.

var methodPrint = methodSelect("Flow", "print");
var cg = edges(Attr.Edge.CALL).reverse(methodPrint);
show(cg);

Int-rcg-1.png

You will notice one big difference between the forward call graph and the reverse call graph: the inclusion of control flow blocks, which appear in yellow. This is a feature: an Attr.Edge.CALL edge from a method represents the existence of one or more calls within the method. Likewise, there is an Attr.Edge.CALL edge at the control flow block granularity as well. In the next step, we'll learn how to distinguish between the two.

A Reverse Call Graph, Part 2

In the previous step, we wrote a query for a reverse call graph which included both methods and control flow blocks. In this step, we will write a query that includes only methods.

Again, an Attr.Edge.CALL edge from a method represents the existence of one or more calls within the method. Likewise, there is an Attr.Edge.CALL edge at the control flow block granularity as well.

As a convenience to distinguish between the two granularities, Atlas provides additional tags. They are:

In conjunction with the query Q.edgesTaggedWithAll(), we can quickly obtain a reverse call graph using the tag Attr.Edge.PER_METHOD.

var methodPrint = methodSelect("Flow", "print");
var callEdges = index.edgesTaggedWithAll(Attr.Edge.CALL, Attr.Edge.PER_METHOD);
var cg = callEdges.reverse(methodPrint);
show(cg);
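The difference between "tagged with any" and "tagged with all" can be sketched with a toy model in plain Scala. ToyEdge and the tag strings below are placeholders chosen for this example; they are not the actual Atlas tag names.

```scala
// Toy edge with a set of tags. Not the Atlas edge class.
case class ToyEdge(from: String, to: String, tags: Set[String])

val edges = Set(
  ToyEdge("main()", "print()", Set("call", "per-method")),
  ToyEdge("block in main()", "print()", Set("call", "per-control-flow")),
  ToyEdge("Flow", "print()", Set("declares"))
)

// Keep the edges that carry at least one of the given tags.
def taggedWithAny(es: Set[ToyEdge], tags: String*): Set[ToyEdge] =
  es.filter(e => tags.exists(e.tags.contains))

// Keep the edges that carry every one of the given tags.
def taggedWithAll(es: Set[ToyEdge], tags: String*): Set[ToyEdge] =
  es.filter(e => tags.forall(e.tags.contains))

val anyCall = taggedWithAny(edges, "call")                     // both call edges
val methodCalls = taggedWithAll(edges, "call", "per-method")   // method-level edge only
```

Requiring both tags is what drops the block-level call edge, which is why the second reverse call graph contains only methods.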

Int-rcg-2.png

Data Flow

Suppose that you want to find out where the values in a field flow to. Atlas provides data flow edges which you can query to find out.

For this example, we will start from the field Flow.source. We will use the convenience method Common.fieldSelect() to obtain the field, which works similarly to Common.methodSelect().

Starting from the field, we can walk forward over all of the data flow edges, tagged Attr.Edge.DATA_FLOW.

var src = fieldSelect("Flow", "source");
var dataFlowGraph = edges(Attr.Edge.DATA_FLOW).forward(src);
show(dataFlowGraph);

Int-dataflow.png

In the resulting graph, the data flow nodes are colored green, and appear nested in control flow blocks, which are colored yellow.

For the example code, the forward data flow from Flow.source will show the flow being returned from getSource(), passing through foo() and bar(), ultimately ending at the field Flow.sink.

Also note that some edges are labeled local and others interprocedural. The tag Attr.Edge.DATA_FLOW overlaps several kinds of data flow edges, including:

  • Attr.Edge.DF_LOCAL
  • Attr.Edge.DF_INTERPROCEDURAL

These tags are more specific kinds of data flow edges. Edges tagged Attr.Edge.DF_LOCAL represent local data flow within a method. Edges tagged with Attr.Edge.DF_INTERPROCEDURAL represent data flow between methods, including flows to parameters, from the return statement of a method, or to and from fields.

Filling in Edges

Suppose that you have the nodes of interest, but need to fill in the edges in order to see the direct relationships between the nodes. You could use traversals, but it may be awkward to ensure that the traversal does not add more nodes to the result. In these cases, it is more convenient to simply define the result in terms of the nodes and the space of edges.

By using induce(), one can specify additional edges to fill in. The edges are only added if the nodes they connect are already in the graph, so it is only necessary to specify the kinds of edges to add.

As an example, we construct a call graph by starting from the methods of interest, and add the call edges afterwards.

// obtain methods
var parentType = index.types("Flow");
var members = edges(Edge.DECLARES).forwardStep(parentType);
var methodsInFlow = members.nodesTaggedWithAny(Attr.Node.METHOD);
 
// fill in call edges
var callEdges = index.edgesTaggedWithAll(Attr.Edge.CALL, Attr.Edge.PER_METHOD);
var cg = methodsInFlow.induce(callEdges);
show(cg);

Int-induce.png

The resulting call graph includes only methods from the class Flow, and the call relationships between them.
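The rule that an edge is only added when both of its endpoints are already present can be sketched in a few lines of plain Scala. This is a toy model with invented method names, not the Atlas implementation of induce().

```scala
// Keep only the edges whose endpoints are both already in the node set.
def induce(nodes: Set[String], edges: Set[(String, String)]): Set[(String, String)] =
  edges.filter { case (from, to) => nodes.contains(from) && nodes.contains(to) }

val callEdges = Set("foo()" -> "bar()", "bar()" -> "print()", "main()" -> "foo()")
val methodsOfInterest = Set("foo()", "bar()")

val induced = induce(methodsOfInterest, callEdges)
// Set((foo(),bar()))
```

The calls involving "print()" and "main()" are left out because those nodes were never part of the selection, so the node set does not grow.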

Focusing on Edges

At some point, you may wish to see all the edges of a particular kind, without specifying the nodes that they are connected to.

You will recall that edges can be filtered using the query Q.edgesTaggedWithAny(). If you display the result of such a query, you will notice that all of the nodes in the index are included. This is by design - it ensures that the starting point of a traversal query is always included in the answer, even if the starting point itself is not connected along the edges of interest.

To trim the graph to just the nodes which are connected to an edge in the graph, use the query Q.retainEdges(). For a small project, this is a useful way to get a better understanding of how particular kinds of edges connect.

For example, to see all the local and interprocedural data flow edges (and the nodes they connect) in the sample project, try the following.

var dataFlowEdges = index.edgesTaggedWithAny(Attr.Edge.DF_INTERPROCEDURAL, Attr.Edge.DF_LOCAL);
var connected = dataFlowEdges.retainEdges();
show(connected);

Int-edges.png
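The trimming step can be pictured with a toy model in plain Scala. ToyGraph and the node names are invented for this sketch; it only imitates the idea of keeping the nodes that touch at least one edge.

```scala
// Toy graph: node names plus (from, to) edges. Not the Atlas Graph class.
case class ToyGraph(nodes: Set[String], edges: Set[(String, String)])

// Drop every node that is not an endpoint of some edge in the graph.
def retainEdges(g: ToyGraph): ToyGraph = {
  val connected = g.edges.flatMap { case (from, to) => Set(from, to) }
  ToyGraph(g.nodes.intersect(connected), g.edges)
}

// After filtering edges, all nodes are still present, including unconnected ones.
val filtered = ToyGraph(Set("a", "b", "c", "isolated"), Set("a" -> "b", "b" -> "c"))

val trimmed = retainEdges(filtered)
// trimmed.nodes is Set(a, b, c); the edges are unchanged
```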

Troubleshooting

General

Atlas logs errors and status messages to the Eclipse Error Log. If there are problems indexing the code, or if there are more serious runtime errors, you should check the log. The Eclipse Error Log can be opened in Eclipse using the Window menu: Window -> Show View -> Other... Then, under General, select the "Error Log" view.

Atlas Interpreter

Query results are not as expected

There are a lot of reasons why a query might not return the results you expected. Here are a few basic steps to try next:

  • If the code which you indexed with Atlas changed, you may need to index the code again. You can do this via the Atlas menu: Atlas -> Index Workspace
  • If you re-index, the interpreter may be referring to data from a prior index. Restart the interpreter by entering :restart at the interpreter prompt.
  • If the code in your interpreter plug-in changed, you may need to restart the interpreter to load the latest version of the code by entering :restart at the interpreter prompt.

If you suspect a logical error in the query, you should try breaking the query into smaller parts, and display each step separately.
