Monday, December 8, 2008

Portfolio is in ICU now

Actually, Portfolio is not in ICU at all. On the contrary, it is in its best shape ever. I am just kidding: what I mean is that I have decided to name my research on Portfolio the Software ICU, which is the name used in the ICS 413 class that my research is based on.

I have spent most of the past week on the issue sensor. I believe it is mostly done now, but I have not tested it yet, either manually or via JUnit. I will test it manually in the coming days and make an initial release. Since there is no way to run Google Project Hosting locally, the only way to write a unit test is against the public Google Project Hosting. Though we have less control over the data than we would with a local server, the data over a given period of time appears consistent, because there is no obvious way to change historical data in Google Project Hosting (unless someone hacks it from within Google, which is extremely unlikely).

Now the sensor collects the following data:
  • Author - The author of the issue becomes the owner of the sensor data.
  • Issue Number - The issue number in Google Project Hosting.
  • Update Number - The number of updates to the issue; if the issue has just been created, this is 0.
  • Status - This might better be called "new status", because it records the new status after the change; if the status is unchanged, this field is empty.
  • Issue link - The link to this issue's page on the Google Project Hosting website.
  • Comment - The comment the user made, if any.
One major problem is that the issue-updates Atom feed provided by Google gives no way to extract the current state of the issue, such as its status, priority, or type, when those fields are not changed by the update event. There are two ways to extract this data. The first is to retrieve the issue's page. The link to the page is collected by the sensor, so locating the page is not a problem; however, the page always shows the newest state of the issue, which may not match the state the sensor data represents. The other way is to get the data from the Google Group postings sent by Google Project Hosting. The postings keep track of every change along with the initial state, which is ideal for making sensor data, but it requires more work to set up the Google group.

On second thought, the Google Group posting approach seems somewhat better. If we switch to it, there is no need to use the issue-updates Atom feed at all, and all data can be extracted from the postings. I will look into it later.

Tuesday, December 2, 2008

Portfolio enhancement and thesis draft

Sortable table columns have been implemented. I did this by making the measure headers links that sort the internal data model according to the selected measure. The implementation is somewhat hacky, because the internal data model stores a list of MiniBarChart objects under each project, and the MiniBarChart instances carry no measure name. These instances are generated from Telemetry stream data, and the order of the Telemetry analyses is identical for every project. Therefore the sort key is not really the name of the measure, as the user assumes, but its index among the enabled measures.
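Since the charts carry no measure name, the sort key is just the column index. Below is a minimal, self-contained sketch of that idea; the row layout (one list of latest values per project, in the fixed measure order) is illustrative and not the actual internal data model.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MeasureIndexSort {

  // One row per project: the latest value of each enabled measure, always in
  // the same order, so "sort by column" is really "sort by index".
  static Comparator<List<Double>> byMeasure(int measureIndex, boolean descending) {
    Comparator<List<Double>> cmp =
        Comparator.comparingDouble(row -> row.get(measureIndex));
    return descending ? cmp.reversed() : cmp;
  }

  public static void main(String[] args) {
    List<List<Double>> rows = new ArrayList<List<Double>>();
    rows.add(Arrays.asList(85.0, 12.0));   // project A: e.g. coverage, churn
    rows.add(Arrays.asList(60.0, 40.0));   // project B
    rows.sort(byMeasure(0, true));         // sort by the first enabled measure, descending
    System.out.println(rows);              // [[85.0, 12.0], [60.0, 40.0]]
  }
}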

After the intensive implementation of portfolio page features, I took a small break around Thanksgiving. Then I worked on the thesis. The draft of my thesis can be found here.

In the coming week, my major coding effort will go into the issue sensor. The first experimental version should be available by the end of this week.

Monday, November 24, 2008

Big improvements to Portfolio

Last week I implemented a big improvement to the Portfolio page: coloring the previously uncolored measures by member participation. The idea came from here: http://code.google.com/p/hackystat-ui-wicket/issues/detail?id=118

Most of the previously uncolored measures are implemented using member-level telemetry, with the exception of FileMetric. Therefore it is both possible and useful to color the chart according to the members' participation in the project for these measures, because a healthy project expects roughly equal contributions from each member.

When fixing Issue 97, I added an interface called StreamTrendClassifier to let users select a different coloring method for a measure. I thought this would facilitate future development, and it does to some extent. When implementing participation coloring, I used the classifier interface to attach a new classifier to the measure. The problem I did not realize at the beginning is that the configuration for the new classifier, which requires text fields for member percentage, threshold value and frequency percentage, is quite different from the stream trend one, which requires higher/lower thresholds and a "higher is better" check box. So I modified the interface substantially to let each classifier provide its own configuration panel. The same idea might apply to the DPD page, where each data model could provide its own data panel instead of letting the session create the panel for it.
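A rough sketch of this reworked interface, with hypothetical method names (the real StreamTrendClassifier API may differ): each classifier both classifies a stream and builds the Wicket panel holding exactly the configuration fields it needs.

import java.io.Serializable;
import java.util.List;
import org.apache.wicket.markup.html.panel.Panel;

public interface StreamClassifier extends Serializable {

  // Classify one measure's stream into a display color, e.g. "green" or "red".
  String getColor(List<Double> stream);

  // Each classifier supplies its own configuration panel: the stream trend
  // classifier would add higher/lower threshold fields and a "higher is
  // better" check box, while the participation classifier would add member
  // percentage, threshold value and frequency percentage fields.
  Panel createConfigurationPanel(String wicketId);
}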

After completing the functionality, I changed the layout of the configuration panel substantially to make it look better. The old format, where each measure occupied a table row and each field used a column, worked fine when the configuration fields were uniform, but not when the fields vary between measures. For example, if the first three measures have higher/lower thresholds and a "higher is better" check box, and the latter three have member percentage, threshold value and frequency percentage, then the higher threshold ends up aligned in the same column as the member percentage even though they have nothing in common. This would confuse users both conceptually and visually. After some attempts, I finally arranged the fields of a stream classifier vertically inside a single table cell. A screenshot of the new interface is shown below:

This makes it much clearer that the configuration fields follow the classifier selection and do not imply any commonality across different measures, or rather classifiers. As a result of this change, the configuration panel now sits between the input panel and the detail panel, because it is now too tall and narrow to be placed above the detail panel.

Plan for this week
The two priority jobs for this week are issue 113, which provides sortable columns in the portfolio table, and the issue tracker sensor for Ant.

Monday, November 17, 2008

Portfolio and Issue sensor

Effort estimation of sortable portfolio columns

It would be very useful to make the portfolio table sortable by the values of the measures. There are two ways to do it:
  1. Make the table headers links that will sort the table according to the value of the measure.
  2. Re-implement the portfolio detail table using DataTable, which supports sorting.
The major effort of the first approach is coding the comparator; the second's is deploying the DataTable. Deploying the DataTable is certainly more work, because we have to implement the comparator anyway. The benefits are that the DataTable looks more elegant, and that we have more control over the content of each cell, because each column can have its own class and HTML to define its display. However, since the current portfolio detail table looks good enough, I would prefer the first approach: it adds the feature with less work.

Issue Sensor

I started the sensor from the SVN sensor, because it uses data from the internet (an SVN repository) and is therefore more similar to the issue sensor than the other Ant sensors, which use data from local tools such as JUnit and EMMA. The issue sensor will grab data from the feeds of Google Project Hosting, such as the feeds of hackystat-ui-wicket. I found a Java API for RSS and Atom named ROME to parse the feeds. The issue sensor data will record the opening/closing of an issue, a change to an issue's state, and/or a new comment on an issue. The initial implementation should be finished this week.
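Here is a minimal sketch of reading the issue-updates Atom feed with ROME; the feed URL is illustrative, and mapping the entries to the sensor data fields (author, issue number, status, ...) would happen where the printlns are.

import java.net.URL;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class IssueFeedReader {
  public static void main(String[] args) throws Exception {
    URL feedUrl = new URL(
        "http://code.google.com/feeds/p/hackystat-ui-wicket/issueupdates/basic");
    SyndFeed feed = new SyndFeedInput().build(new XmlReader(feedUrl));
    for (Object o : feed.getEntries()) {
      SyndEntry entry = (SyndEntry) o;
      System.out.println(entry.getAuthor());       // who made the issue update
      System.out.println(entry.getTitle());        // e.g. "Update 3 to issue 118 ..."
      System.out.println(entry.getLink());         // link to the issue page
      System.out.println(entry.getUpdatedDate());  // when the change happened
    }
  }
}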

Wednesday, November 12, 2008

Review of two papers

[1] A risk based economical approach for evaluating software project portfolios

[2] Portfolio management of software development projects using COCOMO II

The software project portfolios discussed in these two papers are quite different from the software portfolio I am working on and researching. The portfolios they talk about are used to evaluate investments in software projects in order to maximize profits and help managers optimize resource distribution, while our software portfolio is used to monitor the health of software projects and help users improve their development practices. However, there is an opportunity for these two kinds of portfolio to work together.

When estimating a project, [1] uses a term called Risk Level and [2] introduces a set of factors to represent a similar attribute. In our portfolio, the chance that a project may fail is indicated by the health state of the project, which is more accurate and objective than a risk value assigned by the manager. If we combined these two kinds of portfolio and used ours to evaluate each project's risk level, the estimation of final profits should be more reasonable and accurate.

Monday, November 10, 2008

Everything seems to be going well

I am glad that the students in ICS 413 are starting to use Hackystat now. I read some of their blog postings about it, and they seem to like the project portfolio quite a bit. Some of them did complain about installation difficulties, but I complained about that too when I was new to Hackystat, because we are all spoiled by "click next to finish" install packages.

Portfolio drill down to DPD

I was trying to add a feature to the portfolio that lets users click on a value in the detail table and retrieve the corresponding DPD analysis page. It would be an easy job if I could add REST API support to the DPD page. But after digging into the DPD page code, I found it is not easy to accomplish, because there is no way to determine programmatically which context-sensitive menus a given DPD analysis uses. So both when constructing the URL for a certain analysis and when parsing a URL to retrieve a page, we would have to write hard-wired code to teach the system which arguments a certain analysis takes and how to parse/construct them. That would introduce a high risk of bugs whenever the DPD page is modified. In my opinion, we should restructure the DPD page to better accommodate this and other potential changes, if we have time to do so. For the moment, perhaps the best and fastest way is to provide a partial REST API for DPD that includes only the day, the projects and the analysis, without the arguments. We can see how that feels before planning further steps.

Telemetry

Users can now specify the size of the chart on the telemetry page. They can enter any size they want; the only restriction is that the total size cannot be larger than 300,000 pixels, which is a limitation of Google Chart.
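A minimal sketch of that restriction, assuming the check is simply width times height against the 300,000-pixel cap; the class and method names are illustrative, not the actual page code.

public final class ChartSizeValidator {

  private static final int MAX_PIXELS = 300000;  // Google Chart limit on width * height

  static boolean isValidSize(int width, int height) {
    return width > 0 && height > 0 && width * height <= MAX_PIXELS;
  }

  public static void main(String[] args) {
    System.out.println(isValidSize(1000, 300));  // true: exactly 300,000 pixels
    System.out.println(isValidSize(600, 800));   // false: 480,000 pixels
  }
}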

Thesis

I am reading two papers and will probably write a review of them tomorrow.

Monday, November 3, 2008

To-date and portfolio

Portfolio and ToDate

As the functionality of the portfolio page reaches a relatively complete level, we are starting the To-Date page. I started it by copying the code from the portfolio page and modifying it slightly to fit its goal. The only differences between to-date and portfolio are that the end date is fixed to today and the cumulative parameters are fixed to true. While testing to-date, I opened the corresponding portfolio page to compare the results, and then I suddenly realized that all of to-date's functionality can be produced from the portfolio; the code is just redundant. So why bother creating another page for to-date analysis?

So we are thinking about removing the to-date page and directing potential users to the portfolio to retrieve "to date" analyses. We could add a button to the portfolio that provides an easy way to get a to-date analysis, or even just write documentation telling users how to use the portfolio for this purpose. Either way, we do not need to continue the to-date page at this moment.

Thesis

The need for coding on the project browser has now been reduced a lot, so it is a good time to seriously restart the long-postponed thesis research work. The major task currently is more literature review. Also, the student evaluation is about to start, so we should sit tight and wait for the users' feedback before doing any more heavy coding on our system.

Monday, October 27, 2008

Portfolio and To date analysis

Member-level telemetry in portfolio page

Now when a user clicks a sparkline on the portfolio page, it goes to the associated member-level telemetry analysis, which provides more detail about that sparkline. In fact, the analysis used to generate the sparkline is the member-level telemetry analysis: it returns multiple streams, and the sparkline is generated by merging those streams. A new attribute, "merge", has been added to the portfolio definitions. Currently the system supports the "sum", "avg", "min" and "max" merge methods, but only "sum" is actually being used.
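A minimal sketch of merging member-level streams into one sparkline stream, assuming equal-length numeric streams; the four methods mirror the "sum", "avg", "min" and "max" values of the merge attribute, and the class is illustrative rather than the actual page code.

import java.util.Arrays;
import java.util.List;

public class StreamMerger {

  static double[] merge(List<double[]> streams, String method) {
    int length = streams.get(0).length;
    double[] merged = new double[length];
    for (int i = 0; i < length; i++) {
      double sum = 0, min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
      for (double[] stream : streams) {
        sum += stream[i];
        min = Math.min(min, stream[i]);
        max = Math.max(max, stream[i]);
      }
      if ("sum".equals(method)) { merged[i] = sum; }
      else if ("avg".equals(method)) { merged[i] = sum / streams.size(); }
      else if ("min".equals(method)) { merged[i] = min; }
      else if ("max".equals(method)) { merged[i] = max; }
      else { throw new IllegalArgumentException("Unknown merge method: " + method); }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<double[]> members = Arrays.asList(
        new double[] {2, 3, 5},   // member A's daily values
        new double[] {1, 0, 4});  // member B's daily values
    System.out.println(Arrays.toString(merge(members, "sum")));  // [3.0, 3.0, 9.0]
  }
}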

At the beginning of implementing this feature for the portfolio page, I tried to associate a separate telemetry analysis with each portfolio measure, so that clicking the sparkline would invoke the associated telemetry analysis rather than the analysis used to generate the sparkline. But after some coding I found that the associated telemetry analysis is likely to have different parameters from the original analysis. That means either the associated analysis has to use default parameters, or we need to add parameter configuration for it. Neither choice satisfied us, so we discarded this idea and went with merging.

To Date Analysis

The portfolio page is now mostly done at its current level, so it is a good time to start the to-date analysis page. Surprisingly, the to-date page is pretty much the same as the portfolio:
  • It has few inputs: projects only.
  • For each project we generate some analyses and list them in a table.
  • It has a configuration panel to configure these analyses.
So after copying the code from the portfolio, I just need to make some small changes and it will work.

There are two major issues we need to address:
  1. Which analyses should use cumulative values, and which should use the latest value.
  2. When the accumulation should start.
The first one is easy, because it can be told from the cumulative parameter in the telemetry definition. If a telemetry definition has a cumulative parameter, accumulation makes sense for it, so it should use the cumulative value; the others should use the latest value.

The second one is more difficult to answer. A quick idea is to start from the start date of the project, but that would cause serious performance problems, especially when requesting the Default project, whose start date is set to 2000-01-01 by default. The other problem is that even if starting from the project's start date is doable, does it make sense to do so? Maybe it would be better to let the user select the start date? This is the first research question for the to-date analysis.

Monday, October 20, 2008

Progress in project browser

Ant task improvement:

After tolerating the long verify time for quite a while, I finally started to fix it. When running verify.build.xml, the unit test and EMMA targets are both invoked: the unit test target ensures all unit tests pass, and EMMA measures the coverage of the current system. But both of them need to run JUnit to execute all the unit tests in the system, so exactly the same tests are executed twice. That is certainly a waste of time, especially when these tests take several minutes to complete and I have to wait until they finish before I can commit the code. So the idea is to combine these two tasks. The EMMA task already has to run JUnit and instrument it to get coverage data, but in our old version it just collected the coverage data and discarded the test results. So I added code to make it record the test results as well and generate JUnit reports in the build directory, which can then be used to produce JUnit results and sensor data. Then I added a verify task in EMMA, which invokes emma.tool, junit.report, junit.sensor, emma.report, emma.sensor and emma.echo to complete everything that the unit test and EMMA targets did before. Now verify.build.xml runs almost twice as fast as before.

Telemetry Improvement:

The color issue is fixed now. The random colors are selected so that they have the maximum difference from each other, based on the color wheel concept. If we need to pick n random colors, the first color is selected randomly, and each subsequent color is 360/n degrees away from the preceding one, so they are distributed around the color wheel with the maximum distance.
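A minimal sketch of this color-wheel selection: one random starting hue, then n colors spaced 360/n degrees apart, using java.awt's HSB-to-RGB conversion. The class name and the fixed saturation/brightness are illustrative choices, not the actual chart code.

import java.awt.Color;
import java.util.Random;

public class ColorWheelPicker {

  static Color[] pickColors(int n, Random random) {
    Color[] colors = new Color[n];
    float startHue = random.nextFloat();               // hue in [0, 1) maps to [0, 360) degrees
    for (int i = 0; i < n; i++) {
      float hue = (startHue + (float) i / n) % 1.0f;   // step of 360/n degrees around the wheel
      colors[i] = Color.getHSBColor(hue, 0.8f, 0.9f);  // fixed saturation and brightness
    }
    return colors;
  }

  public static void main(String[] args) {
    for (Color c : pickColors(5, new Random())) {
      System.out.printf("%02x%02x%02x%n", c.getRed(), c.getGreen(), c.getBlue());
    }
  }
}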

Moreover, the stream coloring mechanism has been improved to make better use of color. The new rules are:
  • If there is more than one unit axis, streams are colored according to their axis.
  • If there is only one axis but multiple stream names, streams are colored according to their stream name.
  • If there is also only one stream name, all streams are colored separately.
So when users view member-level telemetry, or one telemetry chart across several projects, they will no longer see a chart whose streams all share the same color; there will almost always be streams with different colors.

Portfolio Improvement:

Trend interpretation has been improved. Before, only monotonic trends were classified as stable, increasing or decreasing; everything else was classified as unstable. That is not how people interpret trends: people accept a certain amount of vibration in a trend and still consider it stable, increasing or decreasing. So I changed the system as follows. If a point differs from the preceding point by less than a certain amount, calculated as 5% of the average of the stream, it is considered the same as its predecessor; otherwise, if the point is higher it counts as increasing, and if lower it counts as decreasing. The system then goes through the stream: if there are no increasing or decreasing points, the stream is stable; if it has increasing but no decreasing points, it is increasing; if it has decreasing but no increasing points, it is decreasing; and if it has both, it is unstable. In this way the system tolerates small vibrations and classifies trends the way people usually do. The size of the acceptable difference is still open to discussion.

What's more, I added a classifier class to do this job. That means users can write their own classifier to interpret the stream trends, perhaps using more sophisticated mathematical methods as well.
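A minimal sketch of the classification described above; the class and method names are illustrative, not the actual classifier class, but the 5%-of-average tolerance and the four outcomes follow the rules just described.

public class SimpleTrendClassifier {

  enum Trend { STABLE, INCREASING, DECREASING, UNSTABLE }

  static Trend classify(double[] stream) {
    double average = 0;
    for (double v : stream) { average += v; }
    average /= stream.length;
    double tolerance = Math.abs(average) * 0.05;      // the 5% acceptable difference

    boolean hasIncreasing = false, hasDecreasing = false;
    for (int i = 1; i < stream.length; i++) {
      double diff = stream[i] - stream[i - 1];
      if (Math.abs(diff) <= tolerance) { continue; }  // treated as "same as predecessor"
      if (diff > 0) { hasIncreasing = true; } else { hasDecreasing = true; }
    }
    if (hasIncreasing && hasDecreasing) { return Trend.UNSTABLE; }
    if (hasIncreasing) { return Trend.INCREASING; }
    if (hasDecreasing) { return Trend.DECREASING; }
    return Trend.STABLE;
  }

  public static void main(String[] args) {
    System.out.println(classify(new double[] {10, 10.3, 9.8, 10.1}));  // STABLE
    System.out.println(classify(new double[] {10, 12, 14, 15}));       // INCREASING
  }
}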

Monday, October 13, 2008

Thesis outline

The final tech report for 699 will be the draft of my thesis. The semester is halfway through, so it is time to think about it again and add something new to it.

Development on the project portfolio page is going well, but my literature review is far behind schedule. I need to catch up with it now.

Here is the outline of my thesis:

1. Introduction

2. Related work

  • 2.1 Software Product and Process Metrics
    • 2.1.1 Product Metrics (Coverage, Complexity)
    • 2.1.2 Process Metrics (LOC, Churn, DevTime)
  • 2.2 Software Project Portfolios
    • 2.2.1 Risk Based Software Project Portfolio
    • 2.2.2 Some other Portfolios

3. Research Questions
  • 3.1 How useful is this approach?
  • 3.2 Does the system successfully achieve its goal?

4. System Description
  • 4.1 Overview
  • 4.2 Portfolio Table
    • 4.2.1 Sparklines
    • 4.2.2 Evaluation color rules
  • 4.3 User Configurations
  • 4.4 Portfolio Definitions

5. Experiment Design and Analysis
  • 5.1 Research Approach
  • 5.2 Classroom Evaluation
  • 5.3 Industry Approach

6. Future directions

Monday, October 6, 2008

Struggling with HTML stylesheets

We recently found that, because the width of the project selection field is limited and the multiple-selection box is not horizontally scrollable, there is no way to tell apart projects with long names such as hackystat-sensor-ant, hackystat-sensor-eclipse and hackystat-sensor-emacs, so we had to change it. I found an interesting page, Select Multiple Form Fields, which made me think more about the solution. But Wicket supports a multiple-selection check box group natively, so that is the easiest acceptable solution. In order to scale the selection field properly inside the input table, I got stuck struggling with HTML stylesheets again, but I am getting better with them now. There are so many stylesheet properties that they were overwhelming at the beginning, but I finally found this page, http://www.w3.org/TR/CSS2/propidx.html, to be very helpful: it is a full list of stylesheet properties and makes it quite easy to find what you need.

Also, user configuration persistence using UriCache and the initial portfolio definition XML have been implemented. The portfolio is now more user friendly, especially if you have customized your portfolio configuration. However, there is currently no working sample of the XML definition, so I want to move some of the hard-wired measures into the sample XML, perhaps those uncolored process analyses such as Build, Commit and UnitTest.

As the students in ICS 413 will be using Hackystat soon, the logger for user usage now has a higher implementation priority. Hopefully we will obtain a useful set of evaluation data.

Monday, September 29, 2008

XML-based portfolio measure definition

The software project portfolio page is making good development progress. After a week of effort, most HTML issues in the project browser pages are solved.

The portfolio measure definition XML has been added. Users can now use an XML file to customize the measure definitions on the portfolio page. When a new telemetry chart becomes available, a user can simply add it to the portfolio measure definition XML file and the portfolio page will be ready to use it. Because the system was designed to be easy to configure, it was quite straightforward to replace the previous code-based approach with an XML one. The default settings remain the same, so a user who does not create the settings file will not notice any difference.

This week I will implement persisting the user's last configuration settings.

Wednesday, September 24, 2008

A week passes fast

After the parameter list was added to the portfolio configuration panel, the portfolio's settings became more similar to telemetry's. Thus, at last, we changed all the portfolio page's settings to match those of the telemetry page. The time interval in the configuration panel has been replaced by start and end dates in the input panel, and the granularity has moved to the input panel as well. The order of the input elements in the input panels of both portfolio and telemetry has also been adjusted so that the layout is more consistent across pages.

Before, I was reluctant to give users the ability to select any start/end date on the portfolio page, because I wanted an alternative mechanism that would emphasize the week granularity and the purpose of up-to-date state analysis. But I now realize that even a stock portfolio system, where most attention is focused on current and future prices, provides a function to view historical data. Giving users more control over the system will not harm its original purpose; furthermore, users may discover uses that we never thought of.

While arranging the elements in the input panel, I got a chance to look into the panels' layout in the page, and accidentally figured out how to control the layout. Now the pages are more compact and look nicer.

After adding the date selections to the portfolio page, the SimData scenario for the portfolio can be placed in any time period I wish. So SimData has been changed accordingly.

Tuesday, September 16, 2008

Weekly progress report 0915

The parameter list in the portfolio configuration panel is finished! The layout may still need to change, but the functionality is complete. I finally chose to use the well-constructed methods in the telemetry session to get the parameter definition list. The parameter definitions are needed when showing the configuration panel and when verifying parameters before retrieving data for the detail panel. Therefore, if the definitions were retrieved every time they were needed, the page would slow down, especially when opening the configuration panel. Since there is already a map of telemetry definitions, which contains the parameter definitions, managed in the telemetry session, it is easier to reuse it rather than manage another map of instances in the portfolio session. The telemetry and portfolio pages were designed to be separate, so is it better to keep each page's classes separate as well? However, there are already many connections between the telemetry and portfolio pages, both logical and physical; for example, charts in the portfolio link to the associated telemetry pages. So it should be acceptable to connect their sessions.

After this critical implementation, it is much easier to finish the delayed portfolio simulation data. Before, I had trouble constructing useful coverage data with the existing methods in SimData, because the granularity of the constructed coverage data is "line" while the default in the parameter definition, which the portfolio page uses, is "method". There was no reasonable way to change parameters on the portfolio page before, so if I had insisted on doing it before adding the parameter configuration, I would have had to add some dirty code such as if (measure.name == "Coverage") {...}. I am glad that I don't have to worry about that any more.

As for the literature review, I started last week but have not done well so far. I only finished one article during the week, and that article took me over 5 hours. I really need to find more time for it and improve my reading speed.

Thursday, September 11, 2008

Literature Review: Development governance for software management

The article: Development governance for software management

Summary:
Governance is an interesting concept that is distinct from management. Governance is the act of exerting management control to guide development practices toward compliance, while management normally consists of overseeing inward-facing personnel and operations plus a set of outward-facing responsibilities: planning, budgeting, forecasting, and so on.

To measure how well governance is working, the article uses the term key performance indicator (KPI). The two KPIs it mainly discusses are the volatility KPI and the volume KPI, both of which are process KPIs. Volatility is good for measuring and predicting development processes. Monitoring volume helps to forecast the project so as to adjust resource distribution, assure quality, and measure the efficacy of a software design from a programming perspective. The article also talks a little about work-product KPIs such as coding guidelines and complexity.


Relevance to my research:
In Hackystat, volatility is measured as Churn and volume is measured as FileMetric. Hackystat also provides other measures such as coupling, coverage, build, test, commits and code issues. They are currently all treated equally, but following the insight from the article, I realize they should be grouped into two: process measures and work-product measures. The former show the performance of the development team, while the latter show the quality of the product. These two groups are understood differently, so research on them should be differentiated accordingly.

Process measures will include:
Work-product measures will include:

An important idea from the article is that all these measures have to be monitored over time to carry significant meaning. Even when talking about volume, what makes sense is the change of the volume over time: LOC increasing relatively slowly indicates an efficient design, while bloated code usually brings negative impacts. The idea is to look at the trends of these measures in order to predict the process. This matches our idea in Project Portfolio: we show not only the current state of each measure but also the historical trends, and we estimate project performance from the analysis trends as well.

Monday, September 8, 2008

Weekly progress report

Last week I mainly worked on the project portfolio. Validators have been added to the configuration panel to ensure that the higher threshold is always higher than the lower threshold.

Now I am working on a "simpleportfolio" simulated dataset in SimData for use in the Tutorial Guided Tour of the Portfolio analysis. Hopefully it will be finished today or tomorrow.

Future improvements for the portfolio will include:
  • telemetry parameters for each measure
  • new measures from telemetry: dev time, build & unit test
  • showing the members of each project -- this may not need to be an analysis; just show it near the project name, or in the active-members measure if available
  • a new analysis: active members -- inactive members are those with significantly less dev time than the others; everyone else is an active member
I will start the literature review this week: read and summarize at least 2 articles per week, with a blog entry summarizing each article.

Monday, September 1, 2008

Plan of thesis research

This fall I am starting the research for my master's thesis.

The topic of the thesis is how to compare projects using the metrics provided by Hackystat.

In large companies, it is possible to have hundreds of projects running at the same time. The ability to understand and compare this large number of projects is a great challenge and a great opportunity for both project managers and developers. Managers want to figure out how those projects are doing as well as how the development groups are doing. Developers would like to find similar projects or groups for further communication about techniques, tools, and so on.

Utilizing Hackystat's automatically collected data about the software development process, we built a Software Project Portfolio Management page in the Project Browser. It presents projects using a set of software process metrics, each with both its present value and a historical chart. These values and charts are colored according to thresholds and trends. Users can quickly scan this portfolio, get a fast understanding from the colors, and narrow their interest down to a few projects, which they can then investigate further.

However, our understanding of such a software project portfolio is still far from sufficient. We don't know whether all the metrics we are using are useful, or whether there are interesting metrics we should add to the system. How to color the charts and values is also uncertain. So we gave the system the capacity to adapt to different situations and needs. Users can define the value thresholds for each metric, specify whether a higher value is better, and select the colors for good, bad and so-so values and trends. In the future, users may even be able to define more than 3 color ranges for values and more colors for different kinds of trends.

To study this system, we plan to first evaluate it in the classroom. We will:
  • let the students in ICS 413 use the system during their course project development
  • run several surveys about their opinion of the system and their preferred customizations
  • track their usage of the system
Hopefully, after analyzing these data, we will acquire a deeper understanding of this Hackystat-powered Software Project Portfolio Management.

Wednesday, August 20, 2008

Lessons Learned from GSoC 2008

This summer I participated in the Google Summer of Code 2008 (GSoC) program under Project Hackystat. My project was to develop the Telemetry and Project Portfolio Management pages for the Hackystat Project Browser, the overall viewer for Hackystat, which uses the Wicket web app framework. It has now ended, and I can summarize it a little bit.

* Technical skills acquired

I had been doing Wicket development for the Hackystat Project Browser before GSoC. After this summer with Wicket, I am more familiar with it and love it even more.
I learned lots of new things about Wicket during GSoC, and the most important one, I think, is unit testing Wicket. Unit testing in Wicket is said to be well supported, but I had not dug into it for quite a long time, because I could not find a way to manipulate forms or assert values in tables based on ListView. During the summer I dug into this area again and finally found a way to access components inside a ListView, using a path like "projectTable:1:projectStream:0:streamCheckBox". Therefore, I can now fully control the page with WicketTester and test most parts of the page, and test coverage has increased greatly since then.
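A minimal sketch of driving such a page with WicketTester; PortfolioPage and the exact path are illustrative assumptions, and the point is simply that ListView children are addressed by their index in the component path.

import org.apache.wicket.markup.html.form.CheckBox;
import org.apache.wicket.util.tester.WicketTester;
import org.junit.Test;

public class PortfolioPageTest {

  @Test
  public void canReachStreamCheckBoxInsideListView() {
    WicketTester tester = new WicketTester();
    tester.startPage(PortfolioPage.class);
    tester.assertRenderedPage(PortfolioPage.class);

    // ListView children are addressed by index: row 1 of the project table,
    // stream 0 inside that row, and finally the check box component.
    String path = "projectTable:1:projectStream:0:streamCheckBox";
    tester.assertComponent(path, CheckBox.class);

    CheckBox box = (CheckBox) tester.getComponentFromLastRenderedPage(path);
    System.out.println(box.getModelObject());  // assert on or change the model here
  }
}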

* Lessons on software development

This was the first time I worked remotely with my mentor, and I gained a lot of helpful experience from it.

The most important thing here is keeping good communication with my mentor. We chose a weekly Skype phone call, but it never worked, because the time zone difference kept making us miss the right meeting time. That delayed the communication and delayed the progress as well. Finally, we settled on an email approach: an asynchronous way seems much easier when communicating across time zones.

* Other

As it is a program during the summer vacation, I initially treated it as if I really were on vacation: lazy about the work. I did not accomplish much until I realized my progress was quite a bit behind schedule. If I have one piece of advice for future GSoC students, it is: don't consider yourself really on vacation that summer; treat the program as a full-time job. Another piece of advice is to start getting familiar with the project as soon as possible, even before GSoC starts.

Blog entries of Shaoxuan's GSoC progress

I was back in China for the whole summer while working on the Google Summer of Code project for Hackystat.

I kept writing blog posts about my project's progress, but unfortunately I cannot access Google Blogger from my home. So I put all my posts on one of the project's wiki pages. Here is the link:
http://code.google.com/p/hackystat-ui-wicket/wiki/ShaoxuanBlog

Wednesday, April 2, 2008

Brief review of Informative-workspace

Informative workspace is a project that uses a screen of 9 LCD monitors to provide a development team with various kinds of software engineering information. Details can be found here.

It is mainly a Java web app project using the Wicket framework. I have just started learning Wicket and find it amazingly cool. It turns web app elements such as pages, forms and text fields into Java classes. Developers can handle these components by handling Java classes, which they are much more familiar with, so the code ends up resembling a Swing application. Another cool thing is that Wicket does not introduce special HTML tags; all its tags are simple standard HTML, which means we can use any kind of HTML editor, such as Dreamweaver, to design our pages. This is extremely cool.

Back to the Informative Workspace project. Its main component is a project overviewer, which is intended to show an overview of a project we are interested in. The viewer looks pretty nice, and its code is well written, with rich documentation. But when importing the project into Eclipse, I found that the project's build path is empty, so I had to add all the Wicket and Hackystat libraries manually. That is not a big deal for me, but it might be for a new developer who is not very familiar with Wicket or Hackystat. I hope the development team will fix this problem.

Another issue I noticed is that the developer left his Hackystat account and password in the code. That is OK for an ongoing project that does not yet have an external configuration component, but at the very least they should change or remove their account information temporarily before making a release distribution. It might not be a big deal in this case; it is just a matter of good development habits.

About the functionality of the viewer: the idea of showing the editing state of the project's files is good, but the file-tree-based implementation does not emphasize that goal. I would suggest splitting the viewer into two: a file viewer and a developer viewer.
The file viewer would show when and by whom each file was last modified and how often it is modified, and the files should be groupable by last modified date. Then we can tell which files are active and which are stable.
The developer viewer would show which files each developer is working on, and highlight a file if other developers are working on it as well.

Tuesday, April 1, 2008

Execution Permission in Macintosh/Unix

Today I downloaded the latest Tomcat and tried to install it. But surprisingly I could not get Tomcat to run. The error messages are shown below:
    delia:ambienthackystat ZsxKiNG$ startup.sh
    -bash: /Applications/Develop/apache-tomcat-6.0.16/bin/startup.sh: Permission denied
    delia:ambienthackystat ZsxKiNG$ catalina.sh
    -bash: /Applications/Develop/apache-tomcat-6.0.16/bin/catalina.sh: Permission denied
    delia:ambienthackystat ZsxKiNG$ sudo /Applications/Develop/apache-tomcat-6.0.16/bin/catalina.sh
    Password:
    sudo: /Applications/Develop/apache-tomcat-6.0.16/bin/catalina.sh: command not found
    delia:ambienthackystat ZsxKiNG$ 
That was quite weird to me, because I had installed Tomcat before my last system reinstallation and it worked just fine. It seemed to be some permission issue in a Unix-like OS.

After some Google searching, I finally got the answer: I needed to change the binary files' mode to executable for me. This is done by calling chmod a+x catalina.sh. Chmod is a Unix command that changes a file's mode bits, such as r (read), w (write) and x (execute). Users can only perform these actions when the file is set to permit them to do so.
After this command, I no longer got the permission-denied error. Instead, I got an error from Tomcat saying that the BASEDIR variable is not set correctly. This variable should default to CATALINA_HOME, but something seemed to prevent that from happening. As there are lots of files in the bin directory, it is reasonable to guess that Tomcat needs to execute some other file that does not have execution permission. So I just typed chmod a+x *.* and tried again, and Tomcat ran.

That was a very important experience for me, a newcomer to Macintosh from Windows. In Windows, authorization is weak and trivial, but on the Mac it becomes dominant. Even an administrator/power user is not allowed to read, write or execute a file without permission on that file. It takes a lot of practice to get used to this, and knowing how to change a file's permissions is an essential step.

Wednesday, March 19, 2008

Ideas of Thesis on Ambient Devices

About the Project
I am now working on a Hackystat-related project called AmbientHackystat, which combines Hackystat with ambient devices to show users useful project states indicated by the data Hackystat collects. For example, turn the Ambient Orb red when a build fails, and let the Nabaztag (a bunny) tell you when new changes are committed to the code repository. I am thinking about what a thesis based on this would look like.

The system is defined by a set of trigger-action pairs, where a trigger gets data from the Hackystat database and checks it, then invokes some ambient device's action when the data satisfies a defined condition, such as the last build failing.
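A minimal sketch of this trigger-action pairing; the Trigger and Action interfaces and the build-failure example are illustrative assumptions, not the actual AmbientHackystat classes, and a real trigger would query Hackystat instead of returning a constant.

public class TriggerActionSketch {

  interface Trigger {
    // Check the latest Hackystat data; return true when the condition holds.
    boolean fire();
  }

  interface Action {
    // Drive an ambient device, e.g. turn the Orb red or make the bunny talk.
    void run();
  }

  static void poll(Trigger trigger, Action action) {
    if (trigger.fire()) {
      action.run();
    }
  }

  public static void main(String[] args) {
    Trigger lastBuildFailed = () -> false;       // placeholder for a real Hackystat query
    Action turnOrbRed = () -> System.out.println("Orb -> red");
    poll(lastBuildFailed, turnOrbRed);           // prints nothing: the build is fine
  }
}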

Current progress
At the moment, the functionality is quite simple. The Orb can change color and pulse speed, and the bunny can only talk. Only two triggers are available: SensorData and Coverage. Though the SensorData trigger can be configured to monitor any kind of sensor data from Hackystat, sensor data is so low-level that only a few types of it are meaningful to monitor. My partner is working on the bunny, and hopefully it will be fully utilized soon. I will keep working on the triggers, and more will be built in the future.

Thesis research points
There are two research points within this project:

1. What kind of trigger will be useful
This part is more like a data mining process. How do we generate meaningful information from thousands of raw data points? What kind of information will be interesting to developers and project managers? How much detail should be provided; for example, should it report the coverage number, or just show whether the coverage is within a certain level? We have to be careful when abstracting information: too much detail makes it verbose, while too little makes it meaningless. However, this part is mostly similar to another project in my lab called Boswell, which is an automatic micro-biographer that generates informative messages and sends them to all kinds of information platforms such as Twitter, email, text message or even Facebook. The core part of these two projects is the same: abstracting information from Hackystat's raw data. So maybe we should work together on this part of the research.

2. How to present a project state/development event on ambient devices
This part is more unique. How things are presented really matters. A presentation has to be noticeable enough for people to become aware of it, but it should not be too distracting; people need to focus on their work and don't want to be startled by ambient devices. Hence color and animation seem to fit this situation better than sounds and speech. However, some particular events may well be worth a vocal alarm, and a build failure may be one of them. These issues need to be settled.

Problems of the research
The most important issue for this research is the users. We need a group of users to really try this out and tell us whether a certain function is useful. It might be OK even if we are the users ourselves, but in order to evaluate the system, we would have to be a group working on the same project, and working in ways that make these ambient devices useful. For example, we should have people working at the same time, such that they cannot easily keep the whole view of their project; then we can evaluate whether it is useful to let the ambient devices tell them when the build and tests fail, or when someone else is opening/editing the same file they are working on.

At the moment, we have only 3 people working on this project. We usually work at different times, and I, the only one with easy access to these ambient devices, am the one who writes most of the project, so I already know what is happening before the devices change their state. I hope we can find some real users to try this mechanism and provide feedback so that we can evaluate and improve it. I think this will be an essential part of the thesis.

Ant, Hudson and Continuous Integration

Use Hudson
I have just hooked my project onto Hudson, a continuous integration build system. The process was amazingly easy; I didn't even need to read any guide before finishing it. Just click "new job", fill out the configuration form, save, and I was done. The configuration form is nicely self-documented, and every entry is followed by a small question mark you can click for more help.

For my project's job, I use Subversion for source code management and invoke an Ant task for the build process. It was pretty straightforward, since I have been using these tools for quite a while.

Write an Ant Task for Hudson
After hooking my project onto Hudson, I became more concerned about the old way my project was being verified: running the junit task for unit testing and the emma task for coverage separately, because emma does not fail even when the unit tests fail, and it does not send JUnit sensor data. I had felt for a long time that this was not a good way to do it, but I had not bothered to look into the Ant files to fix it. Now that Hudson is checking for new commits every minute and the Hudson server is hosting over 20 projects at the same time, I have become somewhat worried about wasting the machine's power on redundant work. Also, since one purpose of using Hudson is to generate useful project data for my AmbientDevice project, it is better to get the data sooner so that the triggers can react earlier. So I looked into junit.build.xml and emma.build.xml for a solution.

What I found is that the build fails not directly because the JUnit tests fail, but because a variable is set when a test fails. The variable is named by the failureproperty attribute of the junit task, and it was not set in the junit task inside the emma task. So I just added it back to the emma one to make the build fail when tests fail. That makes more sense, because the coverage of failing tests will not be accurate. I also added the junit sensor to the task dependencies to send the unit test data. But then I noticed that even though the build fails when a test fails, the coverage data is still computed and the emma sensor data is still sent. The horrible thing is that this incorrect coverage data will corrupt the project data; for example, a false alarm of coverage dropping by 20% may be raised just because the JUnit tests failed, which would be quite misleading and confusing. The solution is to add unless="junit.failed" to emma.report and emma.sensor to stop them from running. But the unless attribute was already occupied by a variable called emma.disable; however, that variable is not actually used anywhere so far, so I just removed it. I am also wondering whether, and how, it is possible to put an AND condition into attributes like if and unless.

But anyway, the project is now verified the way I want: the unit tests run once, and I get both the unit test results and the coverage results if the tests pass. And emma fails when a test fails, so it will not generate misleading coverage data. I remember that I ran into this kind of problem before: I ran emma after some small changes and suddenly found the coverage was much lower than I expected. I thought I might have forgotten to test some method, so I looked into the test cases, but everything was there. It took me quite a while to notice that one test had failed right at its start, so the rest of the tests were not executed. That was a terrible experience, and I am glad I have fixed it along the way.

Code Review for iHacky

Review project: iHacky for Hackystat
This is a Facebook application for software developers to generate their professional profile and connect with other developers. It is based on the software engineering toolkit framework called Hackystat, which collects and analyzes software development process and product data in order to get a better view of the state of software development projects.

For user installation, there is an InstallationGuide wiki page on their project site. The process is described pretty clearly in it. Moreover, the installation itself is so well self-documented that I finished it smoothly without going through the InstallationGuide.

iHacky is written in PHP, which I am not that familiar with, so I did not spend much time going over the code. I will focus on the functionality of the application.

Inside iHacky, I have to say it is almost empty right now; no actually useful function is provided. I have not used Facebook much and am not very familiar with what it is capable of, so it is hard for me to judge whether iHacky is useful or not. It is still under development; even so, it is not really appropriate to release an empty framework for code review. Though one of the goals of our review is to suggest what kinds of functions would suit this application, if the developers (who are also users) themselves have no idea what the application could provide, how can they assume that other users will? From my perspective, I find it useless at this moment, and I have no idea what it might be capable of beyond what the Hackystat ProjectViewer and TelemetryViewer already offer.

As for advice, the idea of showing which tools, IDEs and programming languages are my favorites sounds cool. Then maybe I could find more people who use a particular tool, and go to them for help and discussion.

Monday, February 25, 2008

What's a good design for an XML schema?

With the help of JAXB, playing with XML becomes quite a simple thing in Java.
But while working on my project AmbientHackystat, I found it is not that simple any more. Although using the JAXB classes is easy, you have to have a well-designed XML schema to generate those convenient classes, and XML schema design is something I had never thought about before.
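For context, a minimal sketch of the JAXB round trip, assuming a schema-generated (or annotated) Configuration class; the class, its fields and the file name are illustrative, not the actual AmbientHackystat schema.

import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlRootElement;

public class ConfigLoader {

  @XmlRootElement(name = "configuration")
  public static class Configuration {
    public String triggerName;  // mapped from <triggerName> in the XML
    public String orbColor;     // mapped from <orbColor> in the XML
  }

  public static void main(String[] args) throws Exception {
    JAXBContext context = JAXBContext.newInstance(Configuration.class);
    Configuration config = (Configuration)
        context.createUnmarshaller().unmarshal(new File("ambient-config.xml"));
    System.out.println(config.triggerName + " -> " + config.orbColor);
  }
}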
I am using XML to configure the system, defining the system's functions as trigger and action pairs. It worked well for our first milestone. But as we go further, I find it somewhat constrains our design: when I think of an interesting function, I cannot help thinking about how to describe it in our XML file, and it becomes quite frustrating when that turns out to be hard. The XML file is limiting my design now... Maybe it is time to modify the XML schema, but then the system, which is based on that XML, will probably need to be redesigned too. It turns out to be a huge task all of a sudden.
Though I still want the schema change, this time we will do it more carefully.