Wednesday, September 26, 2007

About the extension assignment of WebSpider

In fact, I did not do much on it. However, I found a page that mentions something similar. Here is the link: http://life.neophi.com/danielr/2005/10/httpunit_and_javascript.html. I downloaded and tried his code, but it did not work well enough. Retrieval seems to be unstable: sometimes it can retrieve JavaScript pages, but sometimes it fails with the error "Invalid URL encoding" or "Unexpected end of ZLIB input stream". I don't know why, but this may have something to do with my search algorithm, which stores only the links to visit later and not the WebConversation object. I did not take the time to modify my code for a further test.
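For what it is worth, here is a minimal sketch of the change I have in mind: the queue of pages to visit keeps each link together with the session that discovered it, instead of the bare URL. This is only an illustration under assumptions. `Session` is a hypothetical stand-in for httpunit's WebConversation (it just carries an id here), and `FrontierSketch` and `Pending` are names made up for this example, not part of our WebSpider or of httpunit. I have not verified that this would actually cure the errors above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class FrontierSketch {
    /** Hypothetical stand-in for a WebConversation: here it only carries an id. */
    static final class Session {
        final String id;
        Session(String id) { this.id = id; }
    }

    /** A pending visit: the URL plus the session it was discovered in. */
    static final class Pending {
        final String url;
        final Session session;
        Pending(String url, Session session) {
            this.url = url;
            this.session = session;
        }
    }

    private final Deque<Pending> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    /** Enqueue a link once, remembering which session found it. */
    boolean add(String url, Session session) {
        if (!seen.add(url)) {
            return false; // already queued or visited
        }
        queue.addLast(new Pending(url, session));
        return true;
    }

    /** Next page to visit in breadth-first order, or null when the queue is empty. */
    Pending next() {
        return queue.pollFirst();
    }

    public static void main(String[] args) {
        FrontierSketch frontier = new FrontierSketch();
        Session s = new Session("conversation-1");
        frontier.add("http://example.com/", s);
        frontier.add("http://example.com/a", s);
        frontier.add("http://example.com/", s); // duplicate, ignored
        Pending p;
        while ((p = frontier.next()) != null) {
            // A real spider would fetch p.url through p.session here.
            System.out.println(p.url + " via " + p.session.id);
        }
    }
}
```

In a real spider the fetch would go through the stored session, so cookies and other state from the page that produced the link are still there when the link is finally visited.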
I think hacking into httpunit is the only way. I did find some other crawlers, but most of them are implemented like our WebSpider, which is also based on httpunit, so of course they do not fully support JavaScript.
I read on httpunit's website that full JavaScript support is one of their aims now. Maybe the next version will fix this problem. :)
