Discovering Valid Java Main Methods

From AtlasWiki
Jump to: navigation, search

The Toolbox Commons project defines an Analyzer interface that encapsulates the logic for traversing a program graph to extract an "envelope" (a subgraph that is either empty if a property is satisfied or non-empty containing the necessary information to locate the violation of the property). Analyzers encapsulate their descriptions, assumptions, analysis context, and analysis logic. Of course you can define your own "Analyzer" simply by writing a program with your analysis logic, but we find this abstraction helps keep code organized when contributing to a toolbox project.

Development Process

Let's start off with a simple analysis goal. Write an analyzer that discovers all valid Java main methods in a program. We might want to discover main methods to locate developer test code or alternate entry points into an application.

Step 1) Understanding the problem

The first step is always to ask yourself if you really understand the problem. Now is the time to do some background research. Can main methods be located in inner classes? Can main methods be final? Can main methods return anything other than void? This blog has enumerated through several variations on main methods.

Step 2) Developing test cases

A little upfront work to create a decent test set will likely save you a lot of time in the development of your analyzer since you will be able to quickly identify the cases you are not handling correctly. For this tutorial we've already created an application with several test cases for you! Just checkout the https://github.com/EnSoftCorp/LearningAtlas repository and import the MainMethodTestCases Eclipse project into your workspace. Since you didn't have to go through the work of developing the test cases yourself, its probably a good idea to go through the test cases now. We have created several Java classes, each with a main method. The classes with valid main methods are in the com.example.valid package, whereas the classes with invalid main methods are in the com.example.invalid package.

Step 3) Create Analyzer

From the exercise of going through steps 1 and 2, we can now create a new Analyzer and document our assumptions (at least the assumptions we've made so far).

In the Starter Toolbox create a new class in the toolbox.analysis.analyzers package. Name the class DiscoverMainMethods and extend the com.ensoftcorp.open.toolbox.commons.analysis.Analyzer base class.

Let's take this opportunity to fill out the information that we know so far. We learned that Java main methods must be public, static, void methods named "main" (case-sensitive). Main methods take a single parameter of a one-dimensional String array. The main method may optionally be marked as final, synchronized, or strictfp. After documenting this information, your analyzer should look something like the following.

package toolbox.analysis.analyzers;
 
import com.ensoftcorp.atlas.core.query.Q;
import com.ensoftcorp.open.toolbox.commons.analysis.Analyzer;
 
public class DiscoverMainMethods extends Analyzer {
 
	@Override 
	public String getName(){
		return "Discover Main Methods";
	}
 
	@Override
	public String getDescription() {
		return "Locates valid Java main methods.";
	}
 
	@Override
	public String[] getAssumptions() {
		return new String[]{"Main methods are methods.",
				    "Main methods are case-sensitively named \"main\"",
				    "Main methods are public.", 
				    "Main methods are static.", 
				    "Main methods return void.", 
				    "Main methods take a single one dimensional String array parameter", 
				    "Main methods may be final.", 
				    "Main methods may have restricted floating point calculations.", 
				    "Main methods may be synchronized."};
	}
 
	@Override
	protected Q evaluateEnvelope() {
		// TODO: Implement
		return null;
	}
 
}

Step 4) Develop and Debug Analyzer Logic

The evaluateEnvelope method is where we put our analysis logic. The Analyzer base class defines a method getEnvelope that lazily evaluates the result of your analysis defined in the evaluateEnvelope method and caches the result for later. Future calls to getEnvelope return the cached result.

We can run our analyzer on the Atlas shell by instantiating a new DiscoverMainMethods object and calling the getEnvelope method.

var analyzer = new DiscoverMainMethods()
var envelope = analyzer.getEnvelope()
show(envelope)

The show method should fail here because our getEnvelope currently returns null. We've broken out the steps to implement our analysis logic into five steps, but keep in mind there is more than one way to implement this analyzer! If you are ambitious, you could try stopping here and implementing your own analyzer then comparing your solution with ours.

Analysis Step 1) Select public static methods

Let's start off our implementation by making a set of all the public static methods we can find in the program graph. These two properties are both Atlas Tags so we can query for them directly. Note we use nodesTaggedWithAll here instead of nodesTaggedWithAny because we want nodes that are public and static and methods.

protected Q evaluateEnvelope() {
	// Step 1) select nodes from the index that are marked as public, static, methods
	Q mainMethods = Query.universe().nodesTaggedWithAll(XCSG.publicVisibility, XCSG.ClassMethod);
	return mainMethods;
}

If we run our analyzer now, we should be returning all public static methods in the universe. Let's test it out. If you haven't already, save your DiscoverMainMethods.java file. Now reload the Atlas Shell. Its important to reload the Atlas shell at this point because if you don't you will be running the version of DiscoverMainMethods that was compiled the last time you reloaded the shell. After reloading the shell, run the following query.

show(new DiscoverMainMethods().getEnvelope())

After running the query, you should see a graph similar to the following.

MainMethodsStep1.png

Analysis Step 2) Only select methods from project "MainMethodTestCases"

Currently we select all public static void methods in the universe. In our case, we probably don't care about methods in the Java runtime libraries, so let's refine our queries to exclude those results. We will use the project query to select all elements within the MainMethodTestCases project.

Q project = Query.universe().project("MainMethodTestCases").contained();

Now that we have a set of all elements withing the MainMethodTestCases project, we can use the intersection of mainMethods and project to remove all public static methods not contained in our project.

mainMethods = mainMethods.intersection(project);

Your evaluationEnvelope method should now look like this

protected Q evaluateEnvelope() {
	// Step 1) select nodes from the index that are marked as public, static, methods
	Q mainMethods = Query.universe().nodesTaggedWithAll(XCSG.publicVisibility, XCSG.ClassMethod);
 
	// Step 2) select methods from project "MainMethodTestCases"
	Q project = Query.universe().project("MainMethodTestCases").contained();
	mainMethods = mainMethods.intersection(project);
 
	return mainMethods;
}

Test your updated analyzer by running the following Atlas Shell commands and noting that all the methods that are not part of our project are no longer included in the result. Don't forget to save your analyzer code and reload the Atlas Shell first! This is the last time we will remind you to reload the shell after editing your analyzer.

show(new DiscoverMainMethods().getEnvelope())

At this point your graph should look like this.

Step1.png

Analysis Step 3) Select methods named "main"

Currently we select all public static void methods in our project, but we are also selecting methods that are not named "main". Notice that the class Main17 contains a public static method named Main that is included in our results. Let's examine the attributes of the selected nodes and filter based on name.

We are going to open a new Eclipse view to help inspect the attributes and tags of a GraphElement. Navigate to Eclipse or Window > Show View > Other... > Atlas > Element Detail View. Click on the "Main" method in the Main17 class in the graph we displayed in Step 1 (if you closed the graph already, you will need to display it again). Notice that as you click on elements in the graph, the Element Detail View updates to show you the different attributes and tags applied to the selected GraphElement. Notice also the value for the node name is different for public static method in the Main17 class. We will use this knowledge to filter out methods with the wrong method name.

ElementDetailView.png

Note: We could also access this information on the Atlas Shell using the selected variable.

var graphElement = selected.eval().nodes().getFirst()

Now we can use the selectNode query to select nodes that match the given attribute key and value. Add the following code to your evaluateEnvelope method.

// Step 3) select nodes from the public static methods that are named "main"
mainMethods = mainMethods.methods("main");

Test your updated analyzer by running the following Atlas Shell commands and noting that the method in Main17 is no longer included in the result.

Analysis Step 4) Select methods that return void

By looking at the graph our analyzer is generating at the end of Step 2, we see that we are still failing test cases related to non-void return types and incorrect method parameters. Let's tackle the return type test case (Main22) now.

Atlas defines an abstraction of a "void" type (even though a void type doesn't technically exist in Java, see JLS 14.8). We can select the void type with a query of Common.types("void"). We can then find all methods that return void, by walking one step backwards along returns edges starting at the void type. First lets create a subgraph of only returns edges.

Q returnsEdges = Query.universe().edgesTaggedWithAny(XCSG.Returns).retainEdges();

Note: We are grabbing edges from Query.universe() instead of project because project does not contain the void type and would therefore exclude the edges we are interested in.

Then within the returnsEdges graph let's find all methods that return void. A method node has a returns edge from the method to the return type. Since we are starting with the return type of interest (void) we should walk backwards one step along the edge to find the methods that return void. Since we are only interested in the method nodes, we can use the predecessors query to exclude the query origin (void) in the result.

Q voidMethods = returnsEdges.predecessors(Common.types("void"));

Now that we have a set of all voidMethods, we can use the intersection of the mainMethods set we calculated earlier and voidMethods to remove non-void methods.

mainMethods = mainMethods.intersection(voidMethods);

Your evaluationEnvelope method should now have the following code added to the previous queries.

// Step 4) filter out methods that are not void return types
Q returnsEdges = Query.universe().edgesTaggedWithAny(XCSG.Returns).retainEdges();
Q voidMethods = returnsEdges.predecessors(Common.types("void"));
mainMethods = mainMethods.intersection(voidMethods);

Test your updated analyzer on the Atlas Shell. The method in Main22 should now be removed from the resulting envelope (graph).

Analysis Step 5) Select methods that only take one parameter

Looking at the envelope our analyzer is producing after Step 3 shows that the only test cases we are failing deal with method parameters. Let's break these test cases into two smaller problems. First some test cases fail because there are the wrong number of parameters (there should only be one parameter) and the remaining test cases fail because the parameter is the wrong type. Let's deal with the wrong number of parameters in this step by filtering out all methods that do not take exactly one parameter (test cases Main23 and Main24).

Atlas provides HasParameter edges from the method node to the parameter node. If you didn't know this, you could discover by selecting all the current mainMethods nodes and running the following query on the Atlas Shell. If you are being lazy, you could just use the selected variable here (in place of mainMethods) by clicking to select one or more methods in the graph.

show(mainMethods.forwardStepOn(universe.edgesTaggedWithAny(XCSG.HasParameter)))

This query shows a graph that has all nodes that are reachable within one step of mainMethods along edges tagged with XCSG.HasParameter. Note that the HasParameter edge is not visible here because HasParameter extends from the Contains tag and edges extending from Contains are not shown. Instead, the containing node is turned into a container and all nodes that are reachable via a Contains edge are shown inside the container.

First let's create a subgraph of HasParameter edges to work with. The HasParameter edges we are interested in will be contained in the project, but you could play it safe if you weren't sure and use Query.universe() here as well.

Q paramEdges = project.edgesTaggedWithAny(XCSG.HasParameter).retainEdges();

Let's use HasParameter edges to filter out all methods that don't take any parameters and then filter out any methods that take two or more parameters (leaving us methods that only take one parameter). Methods with not parameters will not have a HasParameter edge leaving the method node.

Starting at the set of mainMethods and walking forward one step along HasParameter edges gives us all parameters of the mainMethods methods.

Q mainMethodParams = paramEdges.successors(mainMethods);

If we now start from the set of mainMethodParams and walk backwards along the HasParameter edges we will reach all methods that have at least one parameter. Taking the difference of mainMethods and the reachable methods from parameters (main methods with params) leaves us with a set of methods with no parameters. We can then difference out the main methods with no parameters.

Q methodsWithNoParams = mainMethods.difference(paramEdges.predecessors(mainMethodParams));
mainMethods = mainMethods.difference(methodsWithNoParams);

Note: The code above is very verbose and tries to show its work. Since paramEdges.predecessors(mainMethodParams) returns all main methods with at least one parameter, you could shorten the code above to the following.

mainMethods = paramEdges.predecessors(mainMethodParams);

Next let's filter methods with two or more parameters (leaving only methods with a single parameter of any type). Parameter nodes in Atlas have an attribute denoting the index of the parameter. The index count starts at zero and increments for each parameter. Selecting nodes with parameter index one gives us parameters of methods that have two or more parameters.

Q mainMethodSecondParams = mainMethodParams.selectNode(XCSG.parameterIndex, 1);

Walking backwards one step along the HasParameter edge from the parameter nodes in the mainMethodSecondParams set leads us to all method nodes with two or more parameters. We can then simply difference out the methods with two or more parameters from the set of mainMethods.

Q methodsWithTwoOrMoreParams = mainMethodSecondParams.predecessors(mainMethodSecondParams);
mainMethods = mainMethods.difference(methodsWithTwoOrMoreParams);

After completing this step you should have added the following to you evaluateEnvelope method.

Q paramEdges = appContext.edgesTaggedWithAny(XCSG.Parameter).retainEdges();
Q mainMethodParams = paramEdges.successors(mainMethods);
// methods with no parameters will not have a Parameter edge (and won't be reachable from parameters)
mainMethods = paramEdges.predecessors(mainMethodParams);
// methods with 2 or more parameters will have at least one parameter node with attribute 
// parameterIndex == 1 (index 0 is the first parameter)
Q mainMethodSecondParams = mainMethodParams.selectNode(XCSG.parameterIndex, 1);
Q methodsWithTwoOrMoreParams = paramEdges.predecessors(mainMethodSecondParams);
mainMethods = mainMethods.difference(methodsWithTwoOrMoreParams);

Now would be a good time to test your analyzer and make sure it is no longer detecting methods in Main23 and Main24.

Analysis Step 6) Select methods that take a one dimensional String array

The only test cases remaining are invalid parameter types. A valid Java main method takes a one dimensional String array. It is possible to write the method signature using varargs. Atlas abstracts varargs for you, so you don't need to consider varargs as a special case.

In Atlas, arrays are represented as array types of a given type. To get an array type for java.lang.String we should walk backwards along ArrayElementType edges that lead to the java.lang.String type. We can then select the one dimensional String array types from the set of all string array types.

Q elementTypeEdges = Query.universe().edgesTaggedWithAny(XCSG.ArrayElementType).retainEdges();
Q stringArraysTypes = elementTypeEdges.predecessors(Common.typeSelect("java.lang","String"));
Q oneDimensionStringArrayType = stringArraysTypes.selectNode(XCSG.Java.arrayTypeDimension, 1);

Now let's filter out the first parameter of the remaining mainMethod nodes that are not the right type. We can do this by selecting the first parameters of the remaining main methods. We can then check if the TypeOf edge leading from the parameter to the type matches the one dimensional String array type, by creating a set of main method parameters that are a valid type and walking backwards from that set to find the parent method nodes.

Q mainMethodFirstParams = CommonQueries.methodParameter(mainMethods, 0);
Q typeOfEdges = Query.universe().edgesTaggedWithAny(XCSG.TypeOf);
Q oneDimensionStringArrayParameters = typeOfEdges.predecessors(oneDimensionStringArrayType);
Q validMainMethodFirstParams = mainMethodFirstParams.intersection(oneDimensionStringArrayParameters);
mainMethods = paramEdges.predecessors(validMainMethodFirstParams);

You should now be able to detect all valid Java main methods in the test set. As a final sanity check we can devise an Atlas Shell query to compare the number of all methods named "main" in the com.example.valid package and the number of all methods detected by the analyzer. The result should be true.

var groundTruth = CommonQueries.declarations(universe.pkg("com.example.valid")).nodesTaggedWithAny(XCSG.Method).methods("main")
var analyzerResults = new DiscoverMainMethods().getEnvelope()
groundTruth.eval().nodes().size() == analyzerResults.eval().nodes().size()

If you got lost along the way, check out the final implementation in the next section.

Final Implementation

public class DiscoverMainMethods extends Analyzer {
 
	@Override 
	public String getName(){
		return "Discover Main Methods";
	}
 
	@Override
	public String getDescription() {
		return "Finds valid Java main methods.";
	}
 
	@Override
	public String[] getAssumptions() {
		return new String[]{"Main methods are methods.",
						    "Main methods are case-sensitively named \"main\"",
						    "Main methods are public.", 
						    "Main methods are static.", 
						    "Main methods return void.", 
						    "Main methods take a single one dimensional String array parameter", 
						    "Main methods may be final.", 
						    "Main methods may have restricted floating point calculations.", 
						    "Main methods may be synchronized."};
	}
 
	@Override
	protected Q evaluateEnvelope() {
		// Step 1) select nodes from the index that are marked as public, static, methods
		Q mainMethods = Query.universe().nodesTaggedWithAll(XCSG.publicVisibility, XCSG.ClassMethod);
 
		// Step 2) select methods from project "MainMethodTestCases"
		Q project = Query.universe().project("MainMethodTestCases").contained();
		mainMethods = mainMethods.intersection(project);
 
		// Step 3) select nodes from the public static methods that are named "main"
		mainMethods = mainMethods.methods("main");
 
		// Step 4) filter out methods that are not void return types
		Q returnsEdges = Query.universe().edgesTaggedWithAny(XCSG.Returns).retainEdges();
		Q voidMethods = returnsEdges.predecessors(Common.types("void"));
		mainMethods = mainMethods.intersection(voidMethods);
 
		// Step 5) filter out methods that do not take exactly one parameter
		Q paramEdges = project.edgesTaggedWithAny(XCSG.HasParameter).retainEdges();
		Q mainMethodParams = paramEdges.successors(mainMethods);
		// methods with no parameters will not have a Parameter edge (and won't be reachable from parameters)
		Q methodsWithNoParams = mainMethods.difference(paramEdges.predecessors(mainMethodParams));
		mainMethods = mainMethods.difference(methodsWithNoParams);
		// methods with 2 or more parameters will have at least one parameter node with attribute 
		// parameterIndex == 1 (index 0 is the first parameter)
		Q mainMethodSecondParams = mainMethodParams.selectNode(XCSG.parameterIndex, 1);
		Q methodsWithTwoOrMoreParams = paramEdges.predecessors(mainMethodSecondParams);
		mainMethods = mainMethods.difference(methodsWithTwoOrMoreParams);
 
		// Step 6) filter out methods that do not take a one dimensional String array
		Q elementTypeEdges = Query.universe().edgesTaggedWithAny(XCSG.ArrayElementType).retainEdges();
		Q stringArraysTypes = elementTypeEdges.predecessors(Common.typeSelect("java.lang", "String"));
		// array types have a arrayTypeDimension attribute
		Q oneDimensionStringArrayType = stringArraysTypes.selectNode(XCSG.Java.arrayTypeDimension, 1);
		Q mainMethodFirstParams = CommonQueries.methodParameter(mainMethods, 0);
		Q typeOfEdges = Query.universe().edgesTaggedWithAny(XCSG.TypeOf);
		Q oneDimensionStringArrayParameters = typeOfEdges.predecessors(oneDimensionStringArrayType);
		Q validMainMethodFirstParams = mainMethodFirstParams.intersection(oneDimensionStringArrayParameters);
		mainMethods = paramEdges.predecessors(validMainMethodFirstParams);
 
		return mainMethods;
	}
 
}

Alternative Implementations

There is almost always more than one way to solve a problem. Some approaches may be more efficient or accurate than others, but at the end of the day there is no single correct solution. You should consider the trade offs in your approach in terms analysis time, resources, accuracy, etc.


Back to Learning Atlas