Basic Queries

From AtlasWiki
Revision as of 11:54, 17 February 2016 by Kscott (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

This tutorial assumes you have the HelloWorld Java project in your workspace and have just completed the Learning Atlas Setup tutorials.

If you haven't already entered the following queries in the Atlas Shell. Don't worry what they mean right now, by the end of this tutorial you will!

var app = universe.project("HelloWorld").contained()
var appMethods = app.nodesTaggedWithAny(XCSG.Method)
var initializers = app.methods("<init>") union app.methods("<clinit>")
var constructors = app.nodesTaggedWithAny(XCSG.Constructor)
var callEdges = universe.edgesTaggedWithAny(XCSG.Call)
var graph = (appMethods difference (initializers union constructors)).induce(callEdges)
show(graph)

Your Eclipse should now look something like the following screenshot.

Hello-world-queries.png

Note: You can drag and drop Eclipse views around and rearrange them by pinning views to different areas in Eclipse.

Let's take a look at the graph that Atlas computed for us in response to our queries. We see a graph with 7 methods (A, B, ... G). We also see that all of these methods are public (green dot icon) static (S icon) methods. We see that all methods are declared inside the class MyClass. MyClass is declared by the package com.example which is declared in the project HelloWorld. Most importantly, we see that many of the methods have a directed call relationship to other methods. Atlas has recorded and abstracted the relationships in the MyClass.java source code file. You can see in the source that method A calls method B in its body. In the graph this is represented by a call edge from the A method node to the B method node.

As a developer if I wanted to know what methods method A calls it would be pretty easy to open up the MyClass.java source code file, find the method and read its source. However if I wanted to know what methods called method A I would have to search all source files in my project since method A could be called from anywhere. With Atlas you can use the graph abstraction to easily answer this question. First we start at method A on the graph and follow all call edges backwards to find the methods that call method A. In this case no methods call method A. If we look at the graph node for method B we quickly learn that method B is called by methods A and C. This graph with call edges is typically called a "call graph". A graph that starts at a set of method nodes and expands backwards along call edges is called a "reverse call graph". Call graphs show a rough granularity of program control flow at the method level. Atlas tracks many relationships between program artifacts, which we will explore more at a later time.

Query Language

Atlas indexes Eclipse projects in the workspace, and produces an index, which is essentially a graph data structure containing all the necessary program artifacts and relationships needed to perform program analysis. The entire graph is usually referred to as the "universe". The universe graph may be accessed directly via Common.universe(). Note that on the Atlas Shell we can just type universe instead of Common.universe(), because we have automatically imported the class Common and the ., (, and ) characters are implicit.

The Atlas query language is an internal DSL embedded within Java. The primary interface used to build queries is Q. Queries written using Q are evaluated, yielding a Graph, which is a subset of the "universe". One may write queries entirely in terms of Graph, but most routine queries are easier to write using Q. This tutorial focuses on the common uses of Q (query) objects.

Queries are methods of a Q object that tend to follow the form graph.operation(orgin). Here "graph" is the universe or a subgraph of the universe that the graph traversal will operate in. The "operation" is the Q method to perform in the context of the subgraph. The "origin" is Q that defines where the operation (typically a graph traversal) will begin. Note that Q’s can be chained together to form more complex queries.

In the sections below we are going to explore some of the most common Q operations in detail. To follow along you should declare a few variables to represent different methods in our example graph. Declare these variables by running the following queries in the Atlas Shell.

// create a query for each method in the HelloWorld project
// recall: var app = universe.project("HelloWorld").contained()
var A = app.methods("A")
var B = app.methods("B")
var C = app.methods("C")
var D = app.methods("D")
var E = app.methods("E")
var F = app.methods("F")
var G = app.methods("G")

In the examples below you will need to pass the query to the show method to display the graph. Example: show(graph.forward(D)).

graph.forward(origin)

Selects the subgraph reachable from the given nodes using a forward transitive traversal. This query includes the origin in the resulting graph.

graph.forward(D) will output the graph D->E and D->G.

graph.forward(C) will output the graph C->B->C, C->D->E and C->D->G.

graph.forwardStep(origin)

Selects the subgraph reachable from the given nodes along a path length of 1 in the forward direction. This query includes the origin in the resulting graph.

graph.forwardStep(D) will output the graph D->E and D->G.

graph.forwardStep(C) will output the graph C->B and C->D.

graph.forwardStep(F) will output a graph with only F.

graph.successors(origin)

Selects the successors reachable from the given nodes along a path length of 1. By definition this query does not include the origin unless a node succeeds itself (has a self pointer). The final result never includes edges.

graph.successors(C) will output a graph with only D and B.

graph.successors(F) will output an empty graph.

graph.reverse(origin)

Selects the subgraph reachable from the given nodes using reverse transitive traversal. This query includes the origin in the resulting graph.

graph.reverse(D) will output the graph D<-C<-B<-A and D<-C<-B<-C.

graph.reverse(C) will output the graph C<-B<-A and C<-B<-C.

graph.reverseStep(origin)

Selects the subgraph reachable from the given nodes along a path length of 1 in the reverse direction. This query includes the origin in the resulting graph.

graph.reverseStep(D) will output the graph D<-C.

graph.reverseStep(C) will output the graph C<-B.

graph.predecessors(origin)

Selects the predecessors reachable from the given nodes along a path length of 1. By definition this query does not include the origin unless a node precedes itself (has a self-pointer). The final result never includes edges.

graph.predecessors(C) will output a graph with only B.

graph.predecessors(F) will output an empty graph.

graph.between(from, to)

Selects the subgraph such that the given nodes in to are reachable from the nodes in from using forward traversal. Note this query selects all paths in the graph between the “from” and “to” nodes.

graph.between(C, A) will output an empty graph.

graph.between(C, E) will output the graph C->D->E, C->B->C.

graph.betweenStep(from, to)

Selects the subgraph such that the given nodes in to are reachable from the nodes in from using forward step traversal.

graph.betweenStep(C, D) will output the graph from C->D.

graph.betweenStep(D, C) will output an empty graph.

graph.betweenStep(C, E) will output an empty graph.

Note: We can intuitively say that if there is a direct edge from the 'from' node to the 'to' node then betweenStep() will output the nodes with the edge between them. A possible implementation of betweenStep could be:

graph.forwardStep(from).intersection(graph.reverseStep(to))

graph.union(otherGraphs...)

Yields the union of this graph and the given graphs. That is, the resulting graph's nodes are the union of all nodes, and likewise for edges.

B.union(C) outputs a graph with nodes B and C.

A.union(B, C) outputs a graph with nodes A, B, and C.

graph.union(C) outputs the entire graph (shown previously).

graph.difference(otherGraphs...)

Select the current graph, excluding the given graphs. Note that, because an edge is only in a graph if its nodes are in a graph, removing an edge will necessarily remove the nodes it connects as well. Removing either node would remove the edge as well. This behavior may seem counter-intuitive if one is thinking in terms of removing a single edge from a graph. Consider the graphs: g1: a->b->c and g2: a->b. Then g1.difference(g2) yields the graph containing only node c: because b is removed, so b->c is also removed. In general, this operation is useful for removing nodes from a graph, but may not be as useful for operating on edges.

B.difference(C) outputs a graph with only node B.

B.difference(A, B) outputs an empty graph.

graph.difference(C) outputs the figure above without the node C and any edges entering or leaving node C.

graph.differenceEdges(otherGraphs...)

Selects the current graph, excluding the edges from the given graphs.

graph.differenceEdges(universe.edgesTaggedWithAny(Edge.CALL)) outputs a graph with only the nodes A,B,C,D,E,F,G.

graph.intersection(otherGraphs...)

Yields the intersection of this graph and the given graphs. That is, the resulting graph's nodes are the intersection of all node sets, and likewise for edges.

A.intersection(B) outputs an empty graph.

graph.intersection(C) outputs a graph with only the node C.

graph.leaves()

Selects the nodes from the given graph with no successors.

graph.leaves() outputs a graph with the nodes E, F, and G.

graph.roots()

Selects the nodes from the given graph with no predecessors.

graph.roots() outputs a graph with the nodes A, F.

subgraph.induce(graph)

Adds edges from the given graph to the subgraph.

var subgraph = B.union(C)

subgraph.induce(graph) will induce the edges between B and C and gives the output graph B->C->B.

var subgraph2 = B.union(F)

subgraph2.induce(graph) will result in a graph with nodes B and F (with no new edges added since no edges between B and F existed in the graph).

graph.nodesTaggedWithAny(...)

Selects the subgraph where nodes are tagged with at least one of the given tags. The final result contains only nodes.

graph.nodesTaggedWithAny(Node.METHOD) will output a graph with only the METHOD nodes A,B,C,D,E,F,G.

graph.nodesTaggedWithAny(Node.CLASS) will output an empty graph because graph does not contain CLASS nodes.

graph.nodesTaggedWithAny(Node.METHOD, Node.CLASS) will output a graph with only the METHOD nodes A,B,C,D,E,F,G.

graph.nodesTaggedWithAll(...)

Selects the subgraph where nodes are tagged with all of the given tags. The final result contains only nodes.

graph.nodesTaggedWithAll(Node.METHOD) will output a graph with only the METHOD nodes A,B,C,D,E,F,G.

graph.nodesTaggedWithAll(Node.CLASS) will output an empty graph because graph does not contain CLASS nodes.

graph.nodesTaggedWithAll(Node.METHOD, Node.CLASS) will output an empty graph because graph does not contain nodes that are both CLASS and METHOD nodes.

graph.edgesTaggedWithAny(...)

Selects the subgraph where edges are tagged with at least one of tags. Since an edge is defined as a relationship between two nodes this query returns a graph with nodes and edges. Note this will also it will return all the nodes in graph.

graph.egdesTaggedWithAny(Edge.CALL) will generate the graph shown in the figure above.

graph.edgesTaggedWithAll(...)

Selects the subgraph where edges are tagged with all of the given tags. Since an edge is defined as a relationship between two nodes this query returns a graph with nodes and edges. Note this will also it will return all the nodes in graph.

graph.egdesTaggedWithAll(Edge.CALL) will generate the graph shown in the figure above.

graph.retainNodes()

Selects all nodes from the graph, ignoring edges.

graph.retainNodes() will output a graph with only the nodes A,B,C,D,E,F,G.

graph.retainEdges()

Retain only edges, retaining only those nodes directly connected to a retained edge.

graph.retainEdges() will return the figure above without the node F because the node F does not have a call edge.

Chaining Queries

Remember that we can chain queries together to form more complex queries and that Q objects may contain multiple nodes and edges (so an origin can include multiple starting points). Let’s take a look at the queries we started the example with again.

1) First create a subgraph (called app) of the universe that only contains nodes and edges contained in the HelloWorld project.

var app = universe.project("HelloWorld").contained()

2) From the app subgraph we select all nodes that are methods.

var appMethods = app.nodesTaggedWithAny(XCSG.Method)

3) From the app subgraph we select all nodes that are methods that are named "<init>" (instance initializer methods) or "<clinit>" (static initializer methods). We are using a query method called methods(String methodName) that selects methods that have a name that matches the given string. We will explore more query methods in the next section.

var initializers = app.methods("<init>") union app.methods("<clinit>")

Note that since this is Scala the ".", "(", ")", and ";" characters are implied. So the above could also be written as the following:

var initializers = app.methods("<init>").union(app.methods("<clinit>"));

4) From the app subgraph we select all nodes that are constructors.

var constructors = app.nodesTaggedWithAny(XCSG.Constructor)

5) From the universe create a subgraph (called callEdges) that only contains nodes and edges connected by a call relationship.

var callEdges = universe.edgesTaggedWithAny(XCSG.Call)

6) Define graph to be the methods in the app ignoring initializers and constructors with call edges added in where they exist.

var graph = (appMethods difference (initializers union constructors)).induce(callEdges)

7) Display the graph.

show(graph)

Query Evaluation

Note that until you called show(graph) you were only defining and chaining together Q's. The show(graph) is the step that actually evaluates your queries, up until that point you are simply defining a recipe for a graph. You can think of Q's as a way to specify constraints to a constraint satisfaction problem (CSP) that is solved when the graph is finally evaluated (in this example when show(graph) is invoked). Just like most CSPs there are many ways you could specify the constraints that get you the same answer, some more efficient than others. As you are writing queries consider the cost and whether you could write the query in a more efficient manner. As a rule of thumb, if you are able to reduce the size of your node/edge sets early (i.e. perform your main query on a smaller subgraph of the universe) then your queries are likely to be much more efficient. In later sections we will be pointing out optimizations that you could make to your queries as we go.

Declarative Content

Let's discuss one last point with this example before we move on. When you called show(graph) you saw 7 method nodes displayed in the graph, but the graph that got displayed actually has 10 nodes! An astute reader would point out that there are nodes for the project, package, and class displayed in the graph as well. With that knowledge, let's ask ourselves how many edges are displayed in this graph? That's sort of a trick question because there are only 6 call edges displayed in the graph, but there are contains edges from the project to the package, the package to the class, and the class to each method that have not been displayed, but must exist in the graph that was shown. When you called show(graph), Atlas added one extra query on top of your query to traverse backwards along contains edges (i.e. show(universe.edgesTaggedWithAny(XCSG.Contains).reverse(graph))) to provide you some context for what classes, packages, and projects your results were in. The contains edges are the only edges not shown explicitly in Atlas graphs. Instead they are shown implicitly by nesting the nodes within the declaring parent node (sort of like Russian dolls). To disable this feature call show with the extra parameter set to false to disable extending the context.

show(graph, extend=false)

Your resulting graph will now show only the nodes and edges contained in the graph query.

NoExtendGraph.png


← Atlas Shell | Learning Atlas | Common Queries →