Sunday, September 30, 2007

Review of Webspider-chiaofen-1.0.926

Review package: webspider-chiaofen-1.0.926, Package Author: ChiaFen Zielinski
1. Installation Review:
Installation was smooth. I did not encounter any problems when installing or trying to use this package.
JUnit, PMD, Checkstyle, and FindBugs all passed with no errors or warnings.
Emma coverage summary:
class: 100% (1/1)
method: 100% (9/9)
block: 93% (726/784)
line: 92% (140.1/152)
This is good coverage.
Running ant -f dist.build.xml also ended with BUILD SUCCESSFUL.
However, there is one small problem: when running java -jar webspider-chiaofen.jar -totallinks http://hackystat.org 100, it printed out plenty of empty lines. This may be because of the logging scheme, which was implemented with println().

2. Code format and conventions review
Violations:


Files                  Lines  Violation  Comments
WebSpiderExample.java  25-31  EJS-39     Document all members.
WebSpiderExample.java  294    EJS-32     Documentation is unrelated to the method; it was copy-pasted from another method.
WebSpiderExample.java  241    EJS-33     Keep comments and code in sync.
WebSpiderExample.java  299    EJS-9      Use meaningful names.

3. Test case review
  • Black box perspective.
traversePages(String urlLink, int numLink), findMostPopularPage() and findTotalPages(): these three methods are not actually tested. The test method only calls the main method once, which in turn uses these methods, and then makes a meaningless assertion.

getNumLinks(): this method is tested properly.

isNum(String s): the method is only called. The result is not verified properly.

logList(String urlLink, int numLink, String logging): same as above. This method returns void, so verifying the result may be difficult. However, I think some digging into the data of the test object could verify at least a bit of the result, while the author verifies none.

  • White box perspective.
I have to say that the test case is terrible. It does no meaningful testing. All it does is run the methods and see whether any exception or error comes out. Almost no result is tested. 5 of the 8 test methods use assertEquals("test", testUrl, testWebSpider.startUrl) to verify the result, which is meaningless in most of the tests. The tests still pass even when I add buggy code to the class. I think the only purpose of this test case is to pass EMMA with high coverage.
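To illustrate the point about assertions, here is a minimal sketch in plain Java (no JUnit, so it runs on its own). The class name, the deliberately buggy isNum and the two check helpers are my own assumptions for illustration, not the package author's code. The weak check mirrors the style of asserting on an unrelated field and passes even though the method is wrong; the strong check looks at the return value and catches the bug.

```java
// Hypothetical stand-in for the reviewed class; not the package author's code.
class AssertionSketch {
  String startUrl;

  AssertionSketch(String startUrl) {
    this.startUrl = startUrl;
  }

  // Deliberately buggy: accepts the empty string as a number.
  boolean isNum(String s) {
    for (char c : s.toCharArray()) {
      if (!Character.isDigit(c)) {
        return false;
      }
    }
    return true; // bug: "" falls through to here
  }

  // Weak check: calls the method but only verifies an unrelated field.
  static boolean weakCheck(AssertionSketch spider, String url) {
    spider.isNum("");
    return url.equals(spider.startUrl);
  }

  // Strong check: verifies what the method actually returned.
  static boolean strongCheck(AssertionSketch spider) {
    return spider.isNum("123") && !spider.isNum("12a") && !spider.isNum("");
  }

  public static void main(String[] args) {
    AssertionSketch spider = new AssertionSketch("http://hackystat.org");
    System.out.println("weak check passes:   " + weakCheck(spider, "http://hackystat.org"));
    System.out.println("strong check passes: " + strongCheck(spider));
  }
}
```

The weak check reports success while the strong check fails, which is the same situation as a test suite that stays green after a bug is introduced.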

  • Break da buggah.
When I run the program with the arguments -totallinks http://www.google.com, it ends with Array Index: java.lang.ArrayIndexOutOfBoundsException: 2.
When using -totallinks http://www.google.com 99999999999999999, it throws a java.lang.NumberFormatException and crashes.
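Both failures come from reading the third argument without checking it first. A minimal sketch of a guard, assuming a hypothetical parseLimit helper and message texts of my own (the package's real main method is not shown here):

```java
// Hypothetical argument check; names and messages are assumptions.
class ArgsSketch {
  // Returns the link limit, or -1 if the arguments are unusable.
  static int parseLimit(String[] args) {
    if (args.length != 3) {
      // Avoids ArrayIndexOutOfBoundsException when the number is missing.
      System.err.println("Usage: -totallinks <url> <number of links>");
      return -1;
    }
    try {
      int limit = Integer.parseInt(args[2]);
      if (limit <= 0) {
        System.err.println("Number of links must be positive: " + args[2]);
        return -1;
      }
      return limit;
    } catch (NumberFormatException e) {
      // Covers both non-numeric input and values too large for an int.
      System.err.println("Not a valid int: " + args[2]);
      return -1;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseLimit(new String[] {"-totallinks", "http://www.google.com"}));
    System.out.println(parseLimit(new String[] {"-totallinks", "http://www.google.com", "99999999999999999"}));
    System.out.println(parseLimit(new String[] {"-totallinks", "http://www.google.com", "100"}));
  }
}
```

Returning a sentinel keeps the sketch short; a real program might prefer to print the usage message and exit.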


4. Summary and Lessons Learned
We should really do our tests right and use them to test our code, not to pass the QA tools.
My own code does not do well either when I run it with the problem-causing arguments above. I had not thought of these cases; I only catch the exceptions and give a warning. I think I can do better by adding more useful messages that help the user run the program correctly.

Wednesday, September 26, 2007

About the extension assignment of WebSpider

In fact, I did not do much on it. However, I found a page that mentions something similar to this matter. Here is the link: http://life.neophi.com/danielr/2005/10/httpunit_and_javascript.html. I have downloaded and tried his code, but it does not behave well enough. The retrieval seems to be unstable. Sometimes it can retrieve some JavaScript pages, but sometimes it gives the error Invalid URL encoding or Unexpected end of ZLIB input stream. I don't know why, but this may have something to do with my search algorithm, in which I store only the link to visit later, not the WebConversation object. I did not take the time to modify my code for further testing.
I think hacking into httpunit is the only way. I did find some crawlers, but most of them are implemented like our WebSpider and are also based on httpunit, so of course they do not fully support JavaScript.
I read on httpunit's website that full support for JavaScript is one of their aims now. Maybe the next version will fix this problem. :)

Tuesday, September 25, 2007

WebSpider Extension

webspider-shaoxuan-1.0.926.zip
Most of what I did was about testing.
First, I rewrote my test class and separated the tests so that each test method tests one of the target's methods. There is one exception, which I will talk about later.
Second, I modified the methods that returned void: retrivePage(String) and getPages(int).
Both of them now return an int value. retrivePage(String) returns the number of links found on the page at the given link. getPages(int) returns the number of pages that were actually retrieved. After that, I found them easier to test, and the return values are useful in themselves, because before I did not know how many pages were actually retrieved.
When I ran PMD, it told me I had to add an assert in testWebSpider(), where I test the main method by only calling it. In order to pass PMD, I added a test of another method with an assert. So this is the only test method that tests two methods.
I admit that doing this just to pass PMD is not a good way. From PMD's point of view, I think the proper way is to remove that whole test method, because it actually has nothing to assert. However, I would like to keep the test. I think keeping the test is more useful than satisfying PMD.
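The idea of returning a count instead of void can be sketched as follows. This is not my actual WebSpider code: an in-memory map stands in for the real web, and the class name, the sample data and the corrected spelling retrievePage are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: an in-memory "web" replaces real page retrieval.
class CountingSpiderSketch {
  private final Map<String, List<String>> web;
  private final Deque<String> toVisit = new ArrayDeque<String>();
  private final Set<String> seen = new HashSet<String>();

  CountingSpiderSketch(Map<String, List<String>> web, String startUrl) {
    this.web = web;
    toVisit.add(startUrl);
    seen.add(startUrl);
  }

  // Returns the number of links found on the page instead of void.
  int retrievePage(String url) {
    List<String> links = web.containsKey(url) ? web.get(url) : Collections.<String>emptyList();
    for (String link : links) {
      if (seen.add(link)) {
        toVisit.add(link); // queue only links not seen before
      }
    }
    return links.size();
  }

  // Returns the number of pages actually retrieved, which may be below max.
  int getPages(int max) {
    int retrieved = 0;
    while (retrieved < max && !toVisit.isEmpty()) {
      retrievePage(toVisit.poll());
      retrieved++;
    }
    return retrieved;
  }

  static Map<String, List<String>> sampleWeb() {
    Map<String, List<String>> web = new HashMap<String, List<String>>();
    web.put("a", Arrays.asList("b", "c"));
    web.put("b", Arrays.asList("a"));
    return web;
  }

  public static void main(String[] args) {
    CountingSpiderSketch spider = new CountingSpiderSketch(sampleWeb(), "a");
    System.out.println("pages retrieved: " + spider.getPages(10));
  }
}
```

With a return value, a test can ask for 10 pages and check that only the 3 reachable ones were retrieved, which a void method could not tell it.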

Monday, September 24, 2007

WebSpider

Download: WebSpider-Shaoxuan.zip

Tasks 1~3 are accomplished.

Troubles encountered:

1. Saving URLs from WebLink. The trouble is that the WebLink object's getURLString method sometimes returns an incomplete URL. I tried to fix it manually, but did not succeed. Then I tried to store the WebLink object instead, but it still did not seem to work. Finally I was told that webLink.getRequest().getURL() would return what I was looking for, so I used it.

2. Dealing with exceptions while accessing pages. At first I tried to modify my code so that it could deal with more and more exceptions. But soon I found that the exception types vary as much as the pages on the Internet, and some of them are caused by JavaScript, which httpunit supports poorly. So, instead of solving them, I catch and log them. I also turned off httpunit's exception throwing when it encounters a JavaScript page.

3. Logging. After reading about it, I found that the logging package is really convenient. I can log different messages at different levels, which can be easily controlled and filtered. I only need to change a setting at initialization. But then I encountered a problem when trying to print the logs to the console. I tried nearly all the classes in the package before I found out that a new handler must be created in order to do what I want.
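Items 2 and 3 can be sketched together with java.util.logging. The logger name, the messages and the visit helper are assumptions for illustration, and no real page is fetched. The point about the handler is that a ConsoleHandler has its own level (INFO by default), so lower levels only reach the console when a handler is created and its level is set as well.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical logger setup; the logger name and messages are assumptions.
class LoggingSketch {
  static Logger createLogger(Level level) {
    Logger logger = Logger.getLogger("webspider.sketch");
    // Stop records from also going to the root logger's default handler.
    logger.setUseParentHandlers(false);
    // Remove handlers left over from an earlier call so output is not duplicated.
    for (Handler old : logger.getHandlers()) {
      logger.removeHandler(old);
    }
    // The handler has its own level, so it is set as well as the logger's.
    Handler console = new ConsoleHandler();
    console.setLevel(level);
    logger.addHandler(console);
    logger.setLevel(level);
    return logger;
  }

  // Catch and log a failure instead of trying to handle every exception type.
  static boolean visit(Logger logger, String url) {
    try {
      if (url.isEmpty()) {
        throw new IllegalArgumentException("empty URL");
      }
      logger.fine("Retrieved " + url);
      return true;
    } catch (RuntimeException e) {
      logger.log(Level.WARNING, "Skipping page: " + url, e);
      return false;
    }
  }

  public static void main(String[] args) {
    Logger logger = createLogger(Level.FINE);
    visit(logger, "http://hackystat.org");
    visit(logger, "");
  }
}
```

Changing the single Level argument at initialization is enough to make the output more or less verbose.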

In testing, I found out that command-line arguments can be passed to the main() method as a String array. But as main has no return value, I don't know how to assert on the result of the run. The test just proves that the code can be executed, but does not guarantee its correctness. However, the test covers over 90% of method lines. What is not covered are the exception-catching statements. If I changed the arguments to make it process a larger job, these lines would surely be covered. I don't know whether testing like this is the right way, but it seems to be the only way to test a method without a return value.
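One possible way around the missing return value is to capture what main prints and check that. A minimal sketch, assuming a hypothetical FakeSpider program in place of the real WebSpider and a runMain helper of my own:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Helper that runs a main method and returns what it printed.
class MainOutputSketch {
  static String runMain(String... args) {
    PrintStream original = System.out;
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    System.setOut(new PrintStream(buffer, true));
    try {
      FakeSpider.main(args);
    } finally {
      System.setOut(original); // always restore the real console
    }
    return buffer.toString().trim();
  }

  public static void main(String[] args) {
    String output = runMain("-totallinks", "http://hackystat.org", "100");
    System.out.println("captured: " + output);
  }
}

// Hypothetical program under test; the real WebSpider main is not shown here.
class FakeSpider {
  public static void main(String[] args) {
    System.out.println("Arguments received: " + args.length);
  }
}
```

A test can then compare the captured text with the expected result instead of only checking that no exception was thrown.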

In this task, I really feel that the QA tools are not extra work, but facilities. Using the XML files built in the previous assignment, I hardly need to look into them anymore. I just use them to examine my code! That feels great. I think I will benefit greatly from them in my future work.

Sunday, September 16, 2007

Stack - Experience of Ant & QAs

Stack-shaoxuan-5.0.916 is released. The distribution package can be downloaded from stack-shaoxuan-5.0.916.zip.

The goal of this assignment is to learn the Ant build system and open source Java automated quality assurance tools (FindBugs, Checkstyle, PMD, etc.). It was an interesting experience.

Tasks and Problems:
There are five tasks in the assignment. All of them have been completed. I encountered some problems when installing the QA tools and creating the JavaNCSS project (Tasks 1 & 4). The other three tasks were easy.
The problems in these tasks arose mainly because I got my MacBook this week and it is the first time I have used Mac OS. It took me a long time to figure out how to set the environment variables when I was trying to install the QA tools. Then, when I was trying to build the JavaNCSS project, the CLASSPATH held me up for a while again. But now that I have solved all these tasks, I have learnt things not only about Ant and the QA tools, but also about Mac OS.
The most important thing in these tasks is to set up the environment variables for every QA tool. It is essential for making them run right. On the Mac, I need to open a terminal and use vi to create/modify a file named .profile in my home directory (where cd ~ will take you). You cannot see .profile in Finder, which is why I was confused. In this file I put the definitions of all the environment variables. When doing Task 4, I got confused about the CLASSPATH because printenv shows no CLASSPATH definition, yet the Java compiler runs well. Then I learnt that this is because Java is set up on Mac OS so that the compiler searches a default directory for Java classes. I can use CLASSPATH=./:{other directory} to add new locations to the CLASSPATH. But later, I found that the classes required by javancss can be defined inside javancss.build.xml. That is a better way and I use it.
When improving the unit tests, I found it easy: just add lines to test every method and use try/catch expressions to test the exceptions.
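The try/catch way of testing an exception can be sketched like this. It uses java.util.Stack as a stand-in, since the assignment's own stack class is not shown here, and the helper name is an assumption.

```java
import java.util.EmptyStackException;
import java.util.Stack;

// Uses java.util.Stack as a stand-in for the assignment's own stack class.
class ExceptionTestSketch {
  // Returns true only if popping an empty stack throws EmptyStackException.
  static boolean popOnEmptyThrows() {
    Stack<String> stack = new Stack<String>();
    try {
      stack.pop();
      return false; // reaching this line means no exception was thrown
    } catch (EmptyStackException e) {
      return true; // the expected outcome
    }
  }

  public static void main(String[] args) {
    System.out.println("pop on empty stack throws: " + popOnEmptyThrows());
  }
}
```

The line right after the call matters: without it, the check would also pass when no exception is thrown at all.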

Impression of Ant and QA tools:
Ant is really a great tool that facilitates the automatic build procedure. Though its command structure is not so easy to learn at the beginning, its power enables us to build however we want to.

Difference between SCLC and JavaNCSS:
Though these two tools are both used to count lines in code files, SCLC also counts blank lines, which I think is useless, while JavaNCSS counts the CCN (Cyclomatic Complexity Number) and the averages of NCSS, CCN and Javadoc per project/package/class/method, which makes sense for knowing the complexity of your code. Therefore, JavaNCSS is what I prefer.

Saturday, September 1, 2007

CodeRuler Revision

New version of source code: http://www2.hawaii.edu/~sz/ics613/sz_v3.zip

After studying the scoring rules and testing code from the UCSD Spring 2004 Programming Contest, I made a lot of changes to my strategy.

Strategy of Knight:
By testing the code of XXX_420_XXX, the second-place entry of the UCSD Spring 2004 Programming Contest, I was surprised to discover how well a simple, brainless strategy can perform. Its idea is very simple: capture the nearest castle. It doesn't even group knights up when attacking, or annihilate enemy units once all castles are captured. It just lets every knight attack the castle nearest to it and lets them stand still once all castles are captured. However, in most of the test matches, it does succeed in capturing all castles! No one can beat it at castle capturing, though its stupid stand-still behaviour causes it some trouble, because it doesn't notice when enemy knights are killing its peasants, as long as they don't capture a castle.
Therefore, I decided to switch my strategy to a similar one: capture the nearest castle, and annihilate enemy units once all castles are captured. That works pretty well. It acts just like a blitz, and is able to take all the castles within 150 turns when fighting the samples! (500 turns per match)
I think one of the reasons for its success is that it naturally separates its force into groups, which maximizes both the chance of surviving raids and the efficiency of capturing castles. The success of such a simple idea is amazing.
I have also tested the code of the champion of the UCSD Spring 2004 Programming Contest. Its AI is much better. It scores every castle depending on the distance and the nearby defence and selects the best one to attack. It sounds nice, but it doesn't work so well in my tests. One of the reasons is that its knights seem hesitant when selecting a target, and this makes the capturing much slower.
Perhaps being simple is just a good idea.
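The "capture the nearest castle" rule is small enough to sketch. This is not CodeRuler code and does not use its API: positions are plain coordinates, the names are hypothetical, and squared straight-line distance is my assumption for "nearest".

```java
// Hypothetical sketch; it does not use the CodeRuler API, only plain coordinates.
class NearestCastleSketch {
  // Returns the index of the castle closest to the knight, or -1 if there are none.
  static int nearestCastle(int knightX, int knightY, int[][] castles) {
    int best = -1;
    int bestDistance = Integer.MAX_VALUE;
    for (int i = 0; i < castles.length; i++) {
      int dx = castles[i][0] - knightX;
      int dy = castles[i][1] - knightY;
      int distance = dx * dx + dy * dy; // squared distance is enough for comparison
      if (distance < bestDistance) {
        bestDistance = distance;
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    int[][] castles = {{10, 10}, {3, 4}, {50, 2}};
    System.out.println("knight at (0,0) targets castle " + nearestCastle(0, 0, castles));
    System.out.println("knight at (48,0) targets castle " + nearestCastle(48, 0, castles));
  }
}
```

Because each knight chooses independently, knights in different parts of the map pick different castles, which is how the force ends up split into groups without any explicit grouping logic.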

Strategy of producing:
By studying the scoring rules, I found a key to getting a high score when winning a match: produce knights rather than peasants once you have enough peasants. The reason is that a knight scores 2 and a peasant only scores 1 at the end of the match. And you don't need to keep creating peasants in order to claim all the land. As I tested, even 40 peasants are enough to claim more than 4000 land. So I switch to producing knights once I have 50 peasants. It is just like a cheat! I increased my winning score by 400 just by adding such simple expressions! That makes it possible for me to get the highest score even if I lose over half of the matches, and guarantees me the championship as long as I win half of the matches.
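The production rule itself can be written in a few lines. This is a sketch with hypothetical names, not CodeRuler API calls; the threshold of 50 and the unit values of 2 and 1 are the ones described above.

```java
// Hypothetical sketch of the production rule; not the CodeRuler API.
class ProductionSketch {
  static final int PEASANT_TARGET = 50; // threshold described in the post
  static final int KNIGHT_SCORE = 2;    // end-of-match value of a knight
  static final int PEASANT_SCORE = 1;   // end-of-match value of a peasant

  // Decide what a castle should produce next.
  static String nextUnit(int peasantCount) {
    return peasantCount < PEASANT_TARGET ? "peasant" : "knight";
  }

  // End-of-match points contributed by surviving units.
  static int unitScore(int knights, int peasants) {
    return knights * KNIGHT_SCORE + peasants * PEASANT_SCORE;
  }

  public static void main(String[] args) {
    System.out.println(nextUnit(49));
    System.out.println(nextUnit(50));
    System.out.println(unitScore(30, 50));
  }
}
```

Since every unit produced after the threshold is worth twice as much at the end of the match, the gain grows with the number of turns left after reaching 50 peasants.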

Strategy of peasant:
Nothing special. I just extended the search area when choosing directions, to improve the efficiency of claiming land.