Difference between revisions of "Discovering Valid Java Main Methods"
BenHolland (Talk | contribs) (→Analysis Step 4) Select methods that only take one parameter) |
Revision as of 16:22, 6 February 2015
The Toolbox Commons project defines an Analyzer
interface that encapsulates the logic for traversing a program graph to extract an "envelope" (a subgraph that is either empty if a property is satisfied or non-empty containing the necessary information to locate the violation of the property). Analyzers encapsulate their descriptions, assumptions, analysis context, and analysis logic. Of course you can define your own "Analyzer" simply by writing a program with your analysis logic, but we find this abstraction helps keep code organized when contributing to a toolbox project.
Development Process
Let's start off with a simple analysis goal. Write an analyzer that discovers all valid Java main methods in a program. We might want to discover main methods to locate developer test code or alternate entry points into an application.
Step 1) Understanding the problem
The first step is always to ask yourself if you really understand the problem. Now is the time to do some background research. Can main methods be located in inner classes? Can main methods be final? Can main methods return anything other than void? This blog has enumerated through several variations on main methods.
Step 2) Developing test cases
A little upfront work to create a decent test set will likely save you a lot of time in the development of your analyzer since you will be able to quickly identify the cases you are not handling correctly. For this tutorial we've already created an application with several test cases for you! Just check out the https://github.com/EnSoftCorp/LearningAtlas repository and import the MainMethodTestCases Eclipse project into your workspace. Since you didn't have to go through the work of developing the test cases yourself, it's probably a good idea to go through the test cases now. We have created several Java classes, each with a main method. The classes with valid main methods are in the com.example.valid package, whereas the classes with invalid main methods are in the com.example.invalid package.
Step 3) Create Analyzer
From the exercise of going through steps 1 and 2, we can now create a new Analyzer and document our assumptions (at least the assumptions we've made so far).
In the Starter Toolbox create a new class in the toolbox.analysis.analyzers
package. Name the class DiscoverMainMethods
and extend the com.ensoftcorp.open.toolbox.commons.analysis.Analyzer
base class.
Let's take this opportunity to fill out the information that we know so far. We learned that Java main methods must be public, static, void methods named "main" (case-sensitive). Main methods take a single parameter of a one-dimensional String array. The main method may optionally be marked as final, synchronized, or strictfp. After documenting this information, your analyzer should look something like the following.
package toolbox.analysis.analyzers;

import com.ensoftcorp.atlas.core.query.Q;
import com.ensoftcorp.open.toolbox.commons.analysis.Analyzer;

public class DiscoverMainMethods extends Analyzer {

    @Override
    public String getName(){
        return "Discover Main Methods";
    }

    @Override
    public String getDescription() {
        return "Locates valid Java main methods.";
    }

    @Override
    public String[] getAssumptions() {
        return new String[]{"Main methods are methods.",
                            "Main methods are case-sensitively named \"main\"",
                            "Main methods are public.",
                            "Main methods are static.",
                            "Main methods return void.",
                            "Main methods take a single String array parameter",
                            "Main methods may be final.",
                            "Main methods may have restricted floating point calculations.",
                            "Main methods may be synchronized."};
    }

    @Override
    protected Q evaluateEnvelope() {
        // TODO: Implement
        return null;
    }
}
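To make the documented assumptions concrete, here is a minimal sketch that checks the same rules with plain Java reflection instead of an Atlas graph query. It is not part of Atlas or Toolbox Commons; the class name MainMethodRules, its methods, and the nested sample classes are all hypothetical and exist only for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical helper (not part of Atlas or Toolbox Commons). */
final class MainMethodRules {
    private MainMethodRules() {}

    /** Returns true if the method satisfies the assumptions documented in the analyzer. */
    static boolean isValidMain(Method method) {
        int modifiers = method.getModifiers();
        Class<?>[] parameters = method.getParameterTypes();
        return method.getName().equals("main")          // case-sensitive name
                && Modifier.isPublic(modifiers)
                && Modifier.isStatic(modifiers)
                && method.getReturnType() == void.class
                && parameters.length == 1                // exactly one parameter
                && parameters[0] == String[].class;      // one-dimensional String array
    }

    /** Returns true if the class itself declares at least one valid main method. */
    static boolean declaresValidMain(Class<?> type) {
        for (Method method : type.getDeclaredMethods()) {
            if (isValidMain(method)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical sample classes, loosely modeled on the kinds of cases in the test project.
    static final class Valid { public static final synchronized void main(String[] args) {} }
    static final class WrongName { public static void Main(String[] args) {} }
    static final class NonVoid { public static int main(String[] args) { return 0; } }
    static final class NoParams { public static void main() {} }
    static final class TwoParams { public static void main(String[] a, String[] b) {} }
    static final class TwoDimensions { public static void main(String[][] args) {} }
    static final class NotStatic { public void main(String[] args) {} }
}
```

The optional modifiers (final, synchronized) are deliberately not checked, which mirrors the "may be" wording of the last three assumptions.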
Step 4) Develop and Debug Analyzer Logic
The evaluateEnvelope
method is where we put our analysis logic. The Analyzer
base class defines a method getEnvelope
that lazily evaluates the result of your analysis defined in the evaluateEnvelope
method and caches the result for later. Future calls to getEnvelope
return the cached result.
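The lazy evaluation and caching described above can be sketched in plain Java as follows. This is only an illustration of the pattern, not the Toolbox Commons source; the class names LazyEnvelope and CountingEnvelope are hypothetical, and the real Analyzer base class may differ in its details.

```java
/** Hypothetical stand-in for an analyzer base class (not the Toolbox Commons source). */
abstract class LazyEnvelope<T> {
    private T cached;
    private boolean evaluated;

    /** Subclasses put their analysis logic here. */
    protected abstract T evaluateEnvelope();

    /** Runs the analysis on the first call only and returns the cached result afterwards. */
    public final T getEnvelope() {
        if (!evaluated) {
            cached = evaluateEnvelope();
            evaluated = true; // a separate flag lets a null result be cached too
        }
        return cached;
    }
}

/** Hypothetical subclass that counts how often the analysis logic actually runs. */
final class CountingEnvelope extends LazyEnvelope<String> {
    int evaluations = 0;

    @Override
    protected String evaluateEnvelope() {
        evaluations++;
        return "envelope";
    }

    public static void main(String[] args) {
        CountingEnvelope analysis = new CountingEnvelope();
        analysis.getEnvelope();
        analysis.getEnvelope();
        System.out.println("evaluations: " + analysis.evaluations); // prints "evaluations: 1"
    }
}
```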
We can run our analyzer on the Atlas shell by instantiating a new DiscoverMainMethods
object and calling the getEnvelope
method.
var analyzer = new DiscoverMainMethods()
var envelope = analyzer.getEnvelope()
show(envelope)
The show
method should fail here because our getEnvelope
currently returns null. We've broken out the steps to implement our analysis logic into five steps, but keep in mind there is more than one way to implement this analyzer! If you are ambitious, you could try stopping here and implementing your own analyzer then comparing your solution with ours.
Analysis Step 1) Select public static methods
Let's start off our implementation by making a set of all the public static methods we can find in the program graph. These three properties are all Atlas Tags so we can query for them directly. Note that we use nodesTaggedWithAll here instead of nodesTaggedWithAny because we want nodes that are public and static and methods.
protected Q evaluateEnvelope() {
    // Step 1) select nodes from the index that are marked as public, static, methods
    Q mainMethods = Common.universe().nodesTaggedWithAll(Node.IS_PUBLIC, Node.IS_STATIC, Node.METHOD);
    return mainMethods;
}
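The difference between the "all" and "any" selections can be illustrated with a small plain-Java model in which each node is just a name mapped to a set of tags. This is a sketch of the idea only; TagSelectionSketch, its methods, and the sample node names are hypothetical and are not Atlas APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of tag-based selection (not the Atlas implementation). */
final class TagSelectionSketch {
    private TagSelectionSketch() {}

    /** Selects nodes that carry every one of the given tags. */
    static Set<String> taggedWithAll(Map<String, Set<String>> tagsByNode, String... tags) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> node : tagsByNode.entrySet()) {
            if (node.getValue().containsAll(Arrays.asList(tags))) {
                selected.add(node.getKey());
            }
        }
        return selected;
    }

    /** Selects nodes that carry at least one of the given tags. */
    static Set<String> taggedWithAny(Map<String, Set<String>> tagsByNode, String... tags) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> node : tagsByNode.entrySet()) {
            for (String tag : tags) {
                if (node.getValue().contains(tag)) {
                    selected.add(node.getKey());
                    break;
                }
            }
        }
        return selected;
    }

    /** Three made-up nodes: a public static method, a public instance method, a public static field. */
    static Map<String, Set<String>> sampleNodes() {
        Map<String, Set<String>> nodes = new LinkedHashMap<>();
        nodes.put("Main1.main", new TreeSet<>(Arrays.asList("public", "static", "method")));
        nodes.put("Helper.run", new TreeSet<>(Arrays.asList("public", "method")));
        nodes.put("Config.DEFAULT", new TreeSet<>(Arrays.asList("public", "static", "field")));
        return nodes;
    }
}
```

With the sample data, asking for all of "public", "static", and "method" selects only Main1.main, while asking for any of them selects all three nodes, which is why the "any" variant would be far too broad for this step.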
If we run our analyzer now, we should be returning all public static methods in the universe. Let's test it out. If you haven't already, save your DiscoverMainMethods.java file. Now reload the Atlas Shell. It's important to reload the Atlas Shell at this point because if you don't, you will be running the version of DiscoverMainMethods that was compiled the last time you reloaded the shell. After reloading the shell, run the following query.
show(new DiscoverMainMethods().getEnvelope())
After running the query, you will notice an Eclipse job progress monitor in the bottom right corner of the Eclipse window as shown below.
If you included Jar Indexing in your Atlas preferences, then this job will be very slow. This is because you are trying to display a very large graph! Displaying a large graph should be avoided because it will be too large for a human to understand anyway. Let's cancel this job. Click on the Eclipse job progress icon in the bottom right hand corner of the Eclipse window or open the Eclipse Progress view. Now click on the red cancel job button to the right of the task name to cancel the job as shown in the screenshot below.
Let's find out just how big that graph was. Run the following query in the Atlas Shell to count the number of nodes in the graph.
CommonQueries.nodeSize(new DiscoverMainMethods().getEnvelope())
With Jar Indexing, this graph is about 10,000 nodes (no edges). In general, a good rule of thumb is to avoid showing graphs with more than about 100 nodes. You will end up spending too much time trying to understand a graph visually if it is too large. Instead, focus on refining your queries to produce a graph with only the information you need to understand the result.
In our case, we probably don't care about main methods in the Java runtime libraries, so let's refine our queries to exclude those results. The Analyzer base class defines two default analysis contexts for us: context (defaults to Common.universe()) and appContext (defaults to SetDefinitions.app()). An analyzer "context" defines a subgraph of the universe that the results should be calculated in. Given the Analyzer context, the appContext is automatically calculated as the intersection between SetDefinitions.app() and the given context. These analysis contexts can be changed after the Analyzer is instantiated with the setContext method; see the Analyzer Javadocs for Toolbox Commons for more details.
Let's use the appContext
to refine our query.
protected Q evaluateEnvelope() {
    // Step 1) select nodes from the index that are marked as public, static, methods
    Q mainMethods = appContext.nodesTaggedWithAll(Node.IS_PUBLIC, Node.IS_STATIC, Node.METHOD);
    return mainMethods;
}
Reload your Atlas Shell, and run the following.
var analyzer = new DiscoverMainMethods()
var envelope = analyzer.getEnvelope()
CommonQueries.nodeSize(envelope)
Since the graph is small, let's show it.
show(envelope)
You should see the following graph.
Analysis Step 2) Select methods named "main"
Currently we select all public static methods, which means we are also selecting methods that are not named "main". Notice that the class Main17 contains a public static method named Main that is included in our results. Let's examine the attributes of the selected nodes and filter based on name.
We are going to open a new Eclipse view to help inspect the attributes and tags of a GraphElement. Navigate to Eclipse or Window > Show View > Other... > Atlas > Element Detail View. Click on the "Main" method in the Main17 class in the graph we displayed in Step 1 (if you closed the graph already, you will need to display it again). Notice that as you click on elements in the graph, the Element Detail View updates to show you the different attributes and tags applied to the selected GraphElement. Notice also that the value of the node name is different for the public static method in the Main17 class. We will use this knowledge to filter out methods with the wrong method name.
Note: We could also access this information on the Atlas Shell using the selected
variable.
var graphElement = selected.eval().nodes().getFirst()
Now we can use the selectNode
query to select nodes that match the given attribute key and value. Add the following code to your evaluateEnvelope
method.
// Step 2) select nodes from the public static methods that are named "main"
mainMethods = mainMethods.selectNode(Node.NAME, "main");
Test your updated analyzer by running the following Atlas Shell commands and noting that the method in Main17
is no longer included in the result. Don't forget to save your analyzer code and reload the Atlas Shell first! This is the last time we will remind you to reload the shell after editing your analyzer.
show(new DiscoverMainMethods().getEnvelope())
Analysis Step 3) Select methods that return void
By looking at the graph our analyzer is generating at the end of Step 2, we see that we are still failing test cases related to non-void return types and incorrect method parameters. Let's tackle the return type test case (Main22
) now.
Atlas defines an abstraction of a "void" type (even though a void type doesn't technically exist in Java, see JLS 14.8). We can select the void type with a query of Common.types("void"). We can then find all methods that return void by walking one step backwards along returns edges starting at the void type. First let's create a subgraph of only returns edges.
Q returnsEdges = context.edgesTaggedWithAny(Edge.RETURNS).retainEdges();
Note: We are grabbing edges from context (which defaults to Common.universe() unless configured otherwise) instead of appContext because SetDefinitions.app() does not contain the void type and would therefore exclude the edges we are interested in.
Then within the returnsEdges
graph let's find all methods that return void. A method node has a returns
edge from the method to the return type. Since we are starting with the return type of interest (void) we should walk backwards one step along the edge to find the methods that return void. Since we are only interested in the method nodes, we can use the predecessors
query to exclude the query origin (void) in the result.
Q voidMethods = returnsEdges.predecessors(Common.types("void"));
Now that we have a set of all voidMethods, we can use the intersection of the mainMethods
set we calculated earlier and voidMethods
to remove non-void methods.
mainMethods = mainMethods.intersection(voidMethods);
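The backwards walk followed by an intersection can be modeled with ordinary Java collections. The sketch below is only an approximation of what the predecessors and intersection queries do on node sets; ReturnsEdgeSketch, its Edge class, and the sample edges (including the assumption that one candidate returns int) are hypothetical and not taken from Atlas or the test project.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a one-step backwards walk along "returns" edges. */
final class ReturnsEdgeSketch {
    private ReturnsEdgeSketch() {}

    /** A directed edge from a method node to its return type node. */
    static final class Edge {
        final String from;
        final String to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Nodes with an edge pointing into the origin set; the origin itself is not included. */
    static Set<String> predecessors(Collection<Edge> edges, Set<String> origin) {
        Set<String> result = new TreeSet<>();
        for (Edge edge : edges) {
            if (origin.contains(edge.to)) {
                result.add(edge.from);
            }
        }
        return result;
    }

    /** Elements present in both sets. */
    static Set<String> intersection(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.retainAll(right);
        return result;
    }

    /** Made-up returns edges: two void methods and one method assumed to return int. */
    static List<Edge> sampleReturnsEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("Main1.main", "void"));
        edges.add(new Edge("Main22.main", "int"));
        edges.add(new Edge("Logger.flush", "void"));
        return edges;
    }
}
```

Here the predecessors of "void" are Logger.flush and Main1.main. Intersecting them with the candidate set {Main1.main, Main22.main} keeps only Main1.main, so the non-void candidate drops out and the unrelated void method never enters the result.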
Your evaluateEnvelope method should now have the following code added to the previous queries.
// Step 3) filter out methods that are not void return types
Q returnsEdges = context.edgesTaggedWithAny(Edge.RETURNS).retainEdges();
Q voidMethods = returnsEdges.predecessors(Common.types("void"));
mainMethods = mainMethods.intersection(voidMethods);
Test your updated analyzer on the Atlas Shell. The method in Main22
should now be removed from the resulting envelope (graph).
Analysis Step 4) Select methods that only take one parameter
Looking at the envelope our analyzer is producing after Step 3 shows that the only test cases we are failing deal with method parameters. Let's break these test cases into two smaller problems. First some test cases fail because there are the wrong number of parameters (there should only be one parameter) and the remaining test cases fail because the parameter is the wrong type. Let's deal with the wrong number of parameters in this step by filtering out all methods that do not take exactly one parameter (test cases Main23
and Main24
).
Atlas provides param edges from the method node to the parameter node. If you didn't know this, you could discover it by selecting all the current mainMethods nodes and running the following query on the Atlas Shell. If you are being lazy, you could just use the selected variable here (in place of mainMethods) by clicking to select one or more methods in the graph.
show(universe.forwardStep(mainMethods) union universe.reverseStep(mainMethods))
This query shows a graph that is one step forward and one step backward along every edge type starting at the mainMethods
. Examining this graph would reveal the presence of the param
edge.
First let's create a subgraph of param
edges to work with.
Q paramEdgesInContext = appContext.edgesTaggedWithAny(Edge.PARAM).retainEdges();
Let's use param edges to filter out all methods that don't take any parameters and then filter out any methods that take two or more parameters (leaving us methods that only take one parameter). Methods with no parameters will not have a param edge leaving the method node.
Starting at the set of mainMethods
and walking forward one step along param
edges gives us all parameters of the mainMethods
methods.
Q mainMethodParams = paramEdgesInContext.successors(mainMethods);
If we now start from the set of mainMethodParams
and walk backwards along the param
edges we will reach all methods that have at least one parameter. Taking the difference of mainMethods
and the reachable methods from parameters (main methods with params) leaves us with a set of methods with no parameters. We can then difference out the main methods with no parameters.
Q methodsWithNoParams = mainMethods.difference(paramEdgesInContext.predecessors(mainMethodParams));
mainMethods = mainMethods.difference(methodsWithNoParams);
Note: The code above is very verbose and tries to show its work. Since paramEdgesInContext.predecessors(mainMethodParams)
returns all main methods with at least one parameter, you could shorten the code above to the following.
mainMethods = paramEdgesInContext.predecessors(mainMethodParams);
Next let's filter methods with two or more parameters (leaving only methods with a single parameter of any type). Parameter nodes in Atlas have an attribute denoting the index of the parameter. The index count starts at zero and increments for each parameter. Selecting nodes with parameter index one gives us parameters of methods that have two or more parameters.
Q mainMethodSecondParams = mainMethodParams.selectNode(Node.PARAMETER_INDEX, 1);
Walking backwards one step along the param edges from the parameter nodes in the mainMethodSecondParams set leads us to all method nodes with two or more parameters. We can then simply difference out the methods with two or more parameters from the set of mainMethods.
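// walk backwards within the param edge subgraph; the parameter node set alone contains no edges to walk
Q methodsWithTwoOrMoreParams = paramEdgesInContext.predecessors(mainMethodSecondParams);
mainMethods = mainMethods.difference(methodsWithTwoOrMoreParams);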
After completing this step you should have added the following to your evaluateEnvelope method.
// Step 4) filter out methods that do not take exactly one parameter
Q paramEdgesInContext = appContext.edgesTaggedWithAny(Edge.PARAM).retainEdges();
Q mainMethodParams = paramEdgesInContext.successors(mainMethods);
// methods with no parameters will not have a PARAM edge (and won't be reachable from parameters)
mainMethods = paramEdgesInContext.predecessors(mainMethodParams);
// methods with 2 or more parameters will have at least one parameter node with attribute
// PARAMETER_INDEX == 1 (index 0 is the first parameter)
Q mainMethodSecondParams = mainMethodParams.selectNode(Node.PARAMETER_INDEX, 1);
// walk backwards within the param edge subgraph (the parameter node set alone has no edges)
Q methodsWithTwoOrMoreParams = paramEdgesInContext.predecessors(mainMethodSecondParams);
mainMethods = mainMethods.difference(methodsWithTwoOrMoreParams);
Now would be a good time to test your analyzer and make sure it is no longer detecting methods in Main23
and Main24
.
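The reasoning in this step (keep methods reachable from some parameter, then remove methods reachable from a parameter with index 1) can be checked on a toy model. The following plain-Java sketch assumes a simplified representation in which each parameter records its owning method and its zero-based index; ParamCountSketch, its Param class, and the sample data are hypothetical and are not Atlas APIs.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the "exactly one parameter" filter. */
final class ParamCountSketch {
    private ParamCountSketch() {}

    /** A parameter node: the method it belongs to and its zero-based position. */
    static final class Param {
        final String method;
        final int index;

        Param(String method, int index) {
            this.method = method;
            this.index = index;
        }
    }

    /** Keeps only the methods that have exactly one parameter. */
    static Set<String> withExactlyOneParam(Set<String> methods, Collection<Param> params) {
        Set<String> atLeastOne = new TreeSet<>(); // reachable backwards from any parameter
        Set<String> twoOrMore = new TreeSet<>();  // reachable backwards from an index-1 parameter
        for (Param param : params) {
            if (!methods.contains(param.method)) {
                continue; // parameter of a method outside the candidate set
            }
            atLeastOne.add(param.method);
            if (param.index == 1) {
                twoOrMore.add(param.method);
            }
        }
        atLeastOne.removeAll(twoOrMore); // the difference step
        return atLeastOne;
    }

    /** Made-up parameters: A.main has one, C.main has two, B.main has none. */
    static List<Param> sampleParams() {
        List<Param> params = new ArrayList<>();
        params.add(new Param("A.main", 0));
        params.add(new Param("C.main", 0));
        params.add(new Param("C.main", 1));
        return params;
    }
}
```

With candidates A.main, B.main, and C.main, only A.main survives: B.main is never reached from a parameter, and C.main is removed because it owns a parameter at index 1.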
Analysis Step 5) Select methods that take a String array
TODO
Final Implementation
public class DiscoverMainMethods extends Analyzer {

    @Override
    public String getName(){
        return "Discover Main Methods";
    }

    @Override
    public String getDescription() {
        return "Locates valid Java main methods.";
    }

    @Override
    public String[] getAssumptions() {
        return new String[]{"Main methods are methods.",
                            "Main methods are case-sensitively named \"main\"",
                            "Main methods are public.",
                            "Main methods are static.",
                            "Main methods return void.",
                            "Main methods take a single String array parameter",
                            "Main methods may be final.",
                            "Main methods may have restricted floating point calculations.",
                            "Main methods may be synchronized."};
    }

    @Override
    protected Q evaluateEnvelope() {
        // Step 1) select nodes from the index that are marked as public, static, methods
        Q mainMethods = appContext.nodesTaggedWithAll(Node.IS_PUBLIC, Node.IS_STATIC, Node.METHOD);

        // Step 2) select nodes from the public static methods that are named "main"
        mainMethods = mainMethods.selectNode(Node.NAME, "main");

        // Step 3) filter out methods that are not void return types
        mainMethods = mainMethods.intersection(Common.stepFrom(Common.edges(Edge.RETURNS), Common.types("void")));

        // Step 4) filter out methods that do not take exactly one parameter
        Q paramEdgesInContext = appContext.edgesTaggedWithAny(Edge.PARAM).retainEdges();
        // methods with no parameters will not have a PARAM edge
        Q methodsWithNoParams = mainMethods.difference(Common.stepFrom(paramEdgesInContext, Common.stepTo(paramEdgesInContext, mainMethods)));
        // methods with 2 or more params will have at least one edge with PARAMETER_INDEX == 1 (index 0 is the first parameter)
        Q methodsWithTwoOrMoreParams = Common.stepFrom(paramEdgesInContext, Common.stepTo(paramEdgesInContext, mainMethods).selectNode(Node.PARAMETER_INDEX, 1));
        mainMethods = mainMethods.difference(methodsWithNoParams, methodsWithTwoOrMoreParams);

        // Step 5) filter out methods that do not take a String array
        // get the 1-dimensional String array type
        Q stringArrays = Common.stepFrom(Common.edges(Edge.ELEMENTTYPE), Common.typeSelect("java.lang","String"));
        Q oneDimensionStringArray = stringArrays.selectNode(Node.DIMENSION, 1);
        Q mainMethodParams = CommonQueries.methodParameter(mainMethods, 0);
        Q validMethodParams = mainMethodParams.intersection(Common.stepFrom(Common.edges(Edge.TYPEOF), oneDimensionStringArray));
        mainMethods = Common.stepFrom(paramEdgesInContext, validMethodParams);

        return mainMethods;
    }
}
Alternative Implementation
TODO