Wednesday, December 22, 2010

Comic: Public Key Cryptography in a fairy tale!

The comic above shows a PKC scenario based on the well-known fairy tale "The Frog Prince". It was done by a group of ten people, including me. Try and see whether you can understand it! :) (Click on the image to see it in its original size.)

HINT: It involves a witch, a princess, and a frog (prince), and it shows the encryption scheme of PKC.

This comic was done as a requirement for the "Computer Security" module in my department, Dept. of Computer Science & Engineering, University of Moratuwa.

Sunday, December 12, 2010

How to Use docbkx-tools as Maven DocBook plugin

When it comes to documenting open source projects, DocBook is the favorite choice of many developers. As you may know, DocBook provides a comprehensive XML schema for documenting your project, which you can transform into formats such as HTML, PDF, and WebHelp. Here, I'm going to show you how to integrate DocBook into your Maven project via docbkx-tools. Its docbkx-maven-plugin makes integrating DocBook with Apache Maven much easier.

Reference DocBook XSLs from your customization layer
You are probably not planning to use the standard DocBook stylesheets alone, but want to customize them a little. For that, you have to create a new customization layer and import the standard docbook.xsl stylesheet.
How this is done in this plugin is a little tricky. When importing the stylesheets, you should give a symbolic reference rather than an actual file location: <xsl:import href="urn:docbkx:stylesheet/docbook.xsl" />.
What "urn:docbkx:stylesheet" refers to depends on which customization (html, xhtml, etc.) you want to use. If you want to use the xhtml/chunk.xsl customization, first import it in your XSL with <xsl:import href="urn:docbkx:stylesheet/chunk.xsl"/>, and then specify your XSL in the <xhtmlCustomization> tag in your POM. Further details are explained below.

Integrating docbkx-tools into Maven projects
OK, now let's see how to integrate this nice DocBook plugin into your Maven project.
  • First, you have to add the DocBook plugin to your parent POM file, pom.xml. The DocBook XML documentation should be located at trunk/src/docbkx.

As you can see, the plugin depends on the docbook-xml artifact. After adding this structure to your POM, you just have to customize it according to your needs.
  • Then, you have to set the configuration in the <configuration> tag. There, you should mention the XSL customization file if you have customized DocBook XSLs. Otherwise, you can just use the generic XSLs provided by DocBook. At the time of writing, the docbkx version is 2.0.11, and it uses DocBook-XSL-NS version 1.75.2. Mind that this is the namespace (NS) version, so make sure your customization is written against the NS stylesheets too. Otherwise, you may encounter errors that are hard to track down. To specify the customization, use the <htmlCustomization> tag for HTML, <foCustomization> for PDF, and <xhtmlCustomization> for, obviously, XHTML. A sample is below. You may put the configuration under <executions> too, if you wish.
  • The newest customization in the DocBook-XSL family, DocBook WebHelp, is not yet included in this version, but people are working on making it available as soon as possible. I'm eagerly waiting for it, mainly because I contributed a lot to it; plus, it's a nice customization whose output you can actually put on the web for your audience to see.
  • Now it's time to set the 'executions' you need to run. Executions specify which goals to run, copy any CSS/JavaScript files you might have, and so on. A sample execution is given below.

                        <copy todir="target/site/userguide/">
                            <fileset dir="src/docbkx/template"/>
                        </copy>

What happens there? First, it specifies the phase 'pre-site', which tells Maven that this execution should run first when the 'mvn site' command is invoked.
    • <goal> - is set to generate-xhtml, which generates XHTML output. Other options are generate-html and generate-pdf.
    • <includes> - This tag specifies which document should be processed. The path was trunk/src/docbkx/userguide.xml.
    • <configuration> - Then comes the configuration part. Most of it is self-explanatory. As you can see, I've set the XSL customization I wish to use there. You can put the configuration either at the plugin level or under the execution; both worked fine for me.
      • <chunkedOutput> generates a set of HTML files rather than one BIG HTML file, split on tags such as <chapter> and <section> in your DocBook XML file.
      • I copied all the CSS/JavaScript/images to the target folder using the <postProcess> tag. It's easier. The source files were at src/docbkx/template.
  • You may write another <execution> to generate the PDF output. The whole code I used is given below.

                                    <copy todir="target/site/userguide/">
                                        <fileset dir="src/docbkx/template"/>
                                    </copy>


Thursday, December 9, 2010

Heard of 2D bar codes? It's awesome!

QR code for the URL
While crawling through the web, I found this awesome technique called QR codes. QR is short for "Quick Response"; it is a 2D bar code system that was first introduced in Japan way back (in 1994 precisely, says Wikipedia :) ). These codes could be the future, where every magazine and newspaper carries them to give readers important details or a URL. They can easily be read by an Apple iPhone, an Android phone... or basically any phone with a camera and a QR reader app!

So, what is it exactly?
You might be wondering whether it is a way to communicate secret messages, or just another bar code to be used in your grocery store. Well, it's not a secret cipher where you have to know the key to decode it. Then what's the use of it? Suppose you are this big guy/gal who has a business card. How long will your customers keep your card in their pockets after you give it to them? A month? Say two months at most; then they will surely lose it somewhere, or it may end up in the laundry! But if you print your QR code on it, they can scan it and store your info as a 'contact' in their phone. Problem solved! Another use is to encode your website's URL in the image, so that interested people can quickly scan it and go to your site. For instance, the QR image above opens this weblog. QR codes are expected to become popular, especially among smartphone users.

How to create them

I used QR-Code Generator to create them. There are plenty of free sites for creating QR codes. With them, you can create an image from a URL which, when scanned, takes you to that URL in your mobile browser. Alternatively, you can simply store any text you like. For example, here's my business card. Decode it and see ;)

Here's a video of using QR codes in business cards.

Tuesday, October 12, 2010

French: JavaScript Stemmer for the French Language

This is the (Google) translated version of the English blog post "JavaScript Stemmer for French Language". I apologize if there are any errors in the translated version. About a month ago, I wrote a JavaScript port of the Porter French stemming algorithm published on the Snowball website. The algorithm was pretty clear, so the coding was just a day of work :) I did this port for a requirement of the Google Summer of Code DocBook WebHelp project, which I worked on over the last few months.
Here's a brief introduction to stemmers, in case you are not familiar with what a stemmer is. :) What a stemmer basically does is extract the root of a given word. Stemmers are very useful for search engines, because users can enter a search query in any variant of a word and still see the content for the root word, which is probably what they meant. (Google does this ;) ) The following example shows what a stem is:

contractait     =
      ====> contract
As human languages are very complex, it is really difficult to devise an algorithm that extracts the exact root. Therefore, for some words, the extracted word may not be the exact root but a slightly different one. But for computation and for use in applications, it is sufficient. :) This issue is not specific to French; it is common to stemmers in all other languages.

This is the stemmer for French. It has now been added to the Snowball site of Porter, who wrote the algorithms and maintains them along with other contributors. Download the stemmer from:
The Stemmer
To invoke the stemmer, call the stemmer function with the relevant word string.
ex: var stem = stemmer("foobar"); 
I ran the given test cases to verify the accuracy of the implementation. It correctly stemmed nearly 19,500 words out of 21,000. The accuracy is more than 90%.
An English stemmer is already available on the Porter site. You can view all the existing stemmer implementations on the Snowball site. Plenty of Java and C++ implementations are available, but a JavaScript port was lacking. That was the main reason for me to write this. I hope you find it useful! Should write a French version of this post too.... :)

Friday, October 8, 2010

JavaScript Stemmer for French Language

About a month ago, I wrote a JavaScript port of the Porter French stemming algorithm in Snowball. The algorithm was pretty clear, so that was just a day of work :) I did this port for a requirement of the Google Summer of Code DocBook WebHelp project, which I worked on over the last few months.

If you are not familiar with what a stemmer is, here's a brief introduction :). What a stemmer basically does is extract the root form of a given word. Stemmers are very useful for search engines, because users can enter a search query in any variant of a word and still see the content for the root word, which is probably what they meant. (Google does this ;) ) The following example shows what a stem is:
Playing     =
Played      ====> Play
Plays       =
Play        =
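
To make the idea concrete, here is a deliberately naive suffix-stripping sketch in JavaScript. It is not the Porter/Snowball algorithm and not the stemmer described in this post; the function name toyStem and the suffix list are made up for illustration, and it is only meant to cover the four example words above.

```javascript
// Toy example only: strips a few English suffixes.
// This is NOT the Snowball algorithm; real stemmers apply many more rules.
function toyStem(word) {
  const lower = word.toLowerCase();
  const suffixes = ["ing", "ed", "s"]; // illustrative list, checked in order
  for (const suffix of suffixes) {
    // Keep at least three characters so very short words are left alone.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

for (const w of ["Playing", "Played", "Plays", "Play"]) {
  console.log(w + " -> " + toyStem(w));
}
```

All four words come out as "play" here. A rule set this small breaks quickly on other words, which is exactly why a carefully designed algorithm is needed.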

As human languages are very complex, it is really difficult to devise an algorithm that extracts the exact root. Therefore, for some words, the extracted word may not be the exact root but a slightly different one. But for computation and for use in applications, it is sufficient.  :) This issue is not specific to French; it is common to stemmers in all other languages.

This is the stemmer for French. It has now been added to the Snowball site of Porter, who wrote the algorithms and maintains them along with other contributors. Download the stemmer from:
The Stemmer:
To invoke the stemmer, call the stemmer function with the relevant word string.
ex: var stem = stemmer("foobar"); 
I ran the given test-cases to verify the accuracy of the implementation. It correctly stemmed nearly 19,500 words out of 21,000 words. The accuracy is more than 90%.
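
For reference, 19,500 correct out of 21,000 works out to roughly 93%. A check of this kind could be sketched as below. The helper name measureAccuracy and the [word, expectedStem] pair format are assumptions made for illustration, not the actual test harness I used.

```javascript
// Hypothetical harness: stemFn is any stemmer function,
// cases is an array of [word, expectedStem] pairs.
function measureAccuracy(stemFn, cases) {
  let correct = 0;
  for (const [word, expected] of cases) {
    if (stemFn(word) === expected) {
      correct += 1;
    }
  }
  const total = cases.length;
  // Avoid dividing by zero when no test cases are given.
  const percent = total === 0 ? 0 : (100 * correct) / total;
  return { correct, total, percent };
}

// Stand-in "stemmer" that only lower-cases its input, just to show the call.
const demoResult = measureAccuracy(
  (word) => word.toLowerCase(),
  [["Play", "play"], ["Playing", "play"]]
);
console.log(demoResult.correct + " of " + demoResult.total + " correct");
```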

An English stemmer is already available on the Porter site. You can view all the existing stemmer implementations on the Snowball site. Plenty of Java and C++ implementations are available, but a JavaScript port was lacking. That was the main reason for me to write this. Hope you all find this useful! Should write a French version of this post too.... :)

Saturday, August 28, 2010

DocBook Version 5.1b2 and DocBook-XSL Version 1.76.0 released

On 27th August 2010, DocBook version 5.1b2 and DocBook-XSL version 1.76.0 were released. DocBook V5.1b2 is the second test release of DocBook V5.1. Version 5.1b2 is available in RELAX NG and non-normatively in DTD and W3C XML Schema formats.

Summary of Changes in DocBook V5.1
The largest change is the introduction of support for topic-based authoring through the addition of the topic element and the assembly structure. For more information about assemblies, see DocBook Assemblies.

DocBook V5.1 also addresses the following requests for enhancement:
  • RFE 1679665 Add better support for modular documentation
  • RFE 1722935 Add a proofreader value to the class attribute for othercredit
  • RFE 1770787 Add givenname as an alternative for firstname
  • RFE 1899655 Allow more elements to be the root of a DocBook document
  • RFE 2100736 Allow constant in initializer
  • RFE 2791288 Added several additional elements, including quote, to the ubiquitous inlines
  • RFE 2820190 Add a topic element
  • RFE 2821653 Remove the constraint that indexterm elements must not appear in footnotes
  • RFE 2907124 Allow personal name components directly in bibliomset.
  • RFE 2907125 Allow all inlines in remark
  • RFE 2907131 Allow simplesect in colophon
  • RFE 2964576 Fix the bug that allowed table to appear inside entry
Element References can be found at DocBook 5: The Definitive Guide - Element Reference

DocBook-XSL Release Notes: 1.76.0

This release includes important bug fixes and adds the following significant feature changes:

  • Gentext: Many updates and additions to translation/locales thanks to Red Hat, the Fedora Project, and other contributors.
  • Common: Faster localization support, as language files are loaded on demand.
  • FO: Support for SVG content in imagedata added.
  • HTML: Output improved when using 'make.clean.html' and a stock CSS file is now provided.
  • EPUB: A number of improvements to NCX, cover and image selection, and XHTML 1.1 element choices

Saturday, August 21, 2010

DocBook WebHelp Project

DocBook WebHelp was the project I worked on for the Google Summer of Code 2010 program. The pencils-down date was 16th August 2010, which means coding officially finished on that day. So, together with my mentor David Cramer, I finished all the planned requirements and wrote all the documentation needed. The results were announced today, 21st August, by the Google Open Source Program team; I successfully finished the project :)

The demo of the output produced by the WebHelp XSL customization is available at the following link. The demo shown is the documentation of DocBook WebHelp itself.

The latest output in the snapshots has a lot more features and looks quite beautiful compared to the released version. Do have a look -

WebHelp Output

WebHelp Search tab
You can download DocBook installation from,

The WebHelp customization is available under docbook-xsl-ns-1.76.1/webhelp. Following are some brief details about the DocBook WebHelp customization.

A common requirement for technical publications groups is to produce a Web-based help format that includes a table of contents pane, a search feature, and an index similar to what you get from the Microsoft HTML Help (.chm) format or Eclipse help. If the content is help for a Web application that is not exposed to the Internet or requires that the user be logged in, then it is impossible to use services like Google to add search.

DocBook WebHelp provides a browser-independent, platform-independent documentation “Web Help” output format for DocBook files. WebHelp provides a sophisticated but inexpensive web publishing option for DocBook.

  • Full text search.
    • Stemming support for English, French, and German. Stemming support can be added for other languages by implementing a stemmer.
    • Support for Chinese, Japanese, and Korean using code from the Lucene search engine.
    • Search highlighting shows where the searched for term appears in the results. Use the H button to toggle the highlighting on and off.
  • Search results can include brief descriptions of the target.
  • Table of Contents (TOC) pane with collapsible toc tree.
  • Autosynchronization of content pane and TOC.
  • TOC and search pane implemented without the use of a frameset.
  • An Ant build.xml file to generate output. You can use this build file by importing it into your own or use it as a model for integrating this output format into your own build system.
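
As a rough illustration of the search highlighting feature in the list above, here is a minimal JavaScript sketch. It is not the code used in WebHelp: the function names and the "highlight" CSS class are hypothetical, and it assumes the input is plain text rather than HTML markup.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every case-insensitive occurrence of term in a span.
// Assumption: text is plain text, not HTML.
function addHighlight(text, term) {
  if (!term) {
    return text;
  }
  const pattern = new RegExp(escapeRegExp(term), "gi");
  return text.replace(pattern, (match) => '<span class="highlight">' + match + "</span>");
}

// Undo addHighlight, which is what toggling the highlighting off amounts to.
function removeHighlight(html) {
  return html.replace(/<span class="highlight">(.*?)<\/span>/g, "$1");
}

const marked = addHighlight("Search the search index", "search");
console.log(marked);
console.log(removeHighlight(marked));
```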

So, what do you think of the output? Are you interested in giving it a try?

Follow-ups to my posts related to DocBook WebHelp are here
Follow-ups to my posts related to DocBook are here

PS: For discussions, please subscribe to my comment feed so that you won't miss my replies. Alternatively, contact us via the docbook-apps list -

Sunday, July 4, 2010

Colombo Ride 3D

Following is the trailer of Colombo Ride 3D, a Sri Lankan 3D mobile game developed by Gamos Technology Solutions (GTS). This will bring a new world of experience to Sri Lankan mobile game lovers. Further details can be found at

Friday, July 2, 2010

Install/Update Mozilla Firefox 3.6.6

Mozilla has launched a new update for Mozilla Firefox. This release is an update to the crash protection feature, which was first introduced in Mozilla Firefox 3.6.4. The Mozilla blog says the crash protection feature,

"protects Windows and Linux users from crashes and freezes caused by third party plugins such as Flash and Silverlight. When a plugin crashes, users can reload the Web page to restart the plugin and continue browsing. When a plugin freezes, making the whole browser unresponsive, Firefox 3.6.4 terminates the unresponsive plugin after waiting 10 seconds. These changes were tested with a beta audience of close to one million users."
 I just installed it on my machine, which runs Ubuntu 10.04 Lucid Lynx. I found this nice article which describes how to do the update: How to update Firefox to 3.6.6 on Ubuntu 10.04, Lucid Lynx

NOTE: When updating, if you don't want to install all the updates (for other packages), you should uncheck them. This will reduce the size of the download. The easiest way to uncheck them all at once is to double-click the update group titles in "Update Manager", for example "Recommended Updates", "Proposed Updates", etc. Then, just tick the check-boxes of the Firefox packages.

Sunday, June 27, 2010

Wordle word clouds

Here are some word clouds I created using Wordle. Wordle is a service which generates a word cloud from a given text. It gives greater prominence to words that appear more frequently, making them appear bigger. The following clouds were created using the content of this blog.
Try it:
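
If you'd like a feel for how frequency-based sizing might work, here is a small JavaScript sketch. It is only my guess at the general idea, not Wordle's actual algorithm; the function names and the 12 to 48 pixel range are arbitrary choices for illustration.

```javascript
// Count how often each word occurs in the text (case-insensitive).
function wordFrequencies(text) {
  const counts = new Map();
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Scale font sizes linearly, so the most frequent word gets maxPx.
// The pixel range is an arbitrary assumption.
function fontSizes(counts, minPx = 12, maxPx = 48) {
  const maxCount = Math.max(...counts.values());
  const sizes = new Map();
  for (const [word, count] of counts) {
    sizes.set(word, Math.round(minPx + (maxPx - minPx) * (count / maxCount)));
  }
  return sizes;
}

const cloud = fontSizes(wordFrequencies("docbook docbook docbook maven"));
for (const [word, px] of cloud) {
  console.log(word + ": " + px + "px");
}
```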

Thursday, June 24, 2010

Test Post from GoogleCommandLine: GoogleCL

Hello, I'm posting this from Ubuntu 10.04 terminal using the GoogleCL tool. It's awesomeeee...... :D Should learn the full api and command line arguments. This is just a test post to check how it works.

Command I'm using is: 'google blogger post --tags tagsInQuotes,GoogleCL,goole --title TitleInQuotes TheBodyContent.

Added p html tag just to check how it would look like! You can download GoogleCL from
That would be all. Good Bye!'

Wednesday, May 26, 2010

Remote Debugging Java packages in WSO2 Carbon using IntelliJ IDEA

Here I'm going to tell you an easy way of debugging Java packages deployed under any WSO2 product, so you don't need to wander through thousands of lines of code looking for the bug in your package.
What you need:
  • IntelliJ IDEA
That's it. Now, open your Java package/bundle in IDEA.
Then, go to Run -> Edit Configurations
Add a remote debug configuration by clicking on the '+' sign (at the top left corner) and then choosing Remote.
It comes with a default set of settings. Keep them intact, and note the port number (generally it's 5005).

Now, start the WSO2 product you are using with the debug parameter followed by the port number.
sh debug 5005

It waits and listens on port 5005 until the remote debugging client is started. Start the client in IDEA with,
Run -> Debug
All done. Now you can debug as usual by putting breakpoints in your .java files.

That's how I got it done. Does this help you succeed in remote debugging?

Tuesday, May 18, 2010

Remote Desktop for Linux Using VNC

Here, I'm showing you an easy way to do remote desktop on Linux. You can access a Linux machine from another machine running a Linux distribution, or from Microsoft Windows as well.
The method used for this is VNC (Virtual Network Computing).
Before trying anything, you need to know this: the machine that you use to access a remote machine is called the client, and the remote desktop machine is the server. With VNC, we install VNC server software on the server side (the remote machine) and access it through client software.
Let's get down to business. It's pretty easy. Here I'm describing remote desktop access to a Linux machine from any environment. I did this on Ubuntu 10.04 Lucid Lynx, but it is applicable to other distros as well.
  • On the server (remote machine), first install the vnc4server package from Synaptic Package Manager. (Go to System -> Administration -> Synaptic Package Manager.) Search for vnc.

  • Install the package named vnc4server. Or, in a terminal,
sudo apt-get install vnc4server
  • Then, go to the client machine and install xvnc4viewer, from a terminal or from Synaptic. This accesses the server and displays the virtual desktop.
sudo apt-get install xvnc4viewer
Now the installation is over. How do you run it?
  • First, create a server instance on the remote machine you want to connect to by entering the command 'vnc4server' in a terminal. Your output should look similar to the one below if everything went smoothly.
kasun@uom:~$ vnc4server
New 'uom:1 (kasun)' desktop is uom:1
Starting applications specified in /home/kasun/.vnc/xstartup
Log file is /home/kasun/.vnc/uom:1.log
NOTE: On the first run, it will ask you to set a password. Give it a strong password, as anyone who has it can access your server while it's turned on.
You can create as many instances as you like. They are numbered as <host>:<id>, and each is distinguished by a port number. You can find the port number in the log file at ~/.vnc/<host>:<id>.log, e.g. cat ~/.vnc/uom\:1.log . You need the port to access the server instance; it is generally 5901 or nearby. You also need the server's IP address (run ifconfig).
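
By convention, a VNC display numbered :N listens on TCP port 5900 + N, which is why instance :1 usually ends up on 5901. The JavaScript sketch below only illustrates that arithmetic; the function names and the IP address are placeholders I made up, and the log file remains the place to confirm the real port.

```javascript
// Conventional base port for VNC displays (display :N -> 5900 + N).
const VNC_BASE_PORT = 5900;

// Compute the expected port for a display id such as 1 in "uom:1".
function vncPort(displayId) {
  if (!Number.isInteger(displayId) || displayId < 0) {
    throw new RangeError("display id must be a non-negative integer");
  }
  return VNC_BASE_PORT + displayId;
}

// Build the <ip>:<port> string that the viewer asks for.
function vncTarget(ip, displayId) {
  return ip + ":" + vncPort(displayId);
}

// 192.0.2.10 is a placeholder documentation address, not a real server.
console.log(vncTarget("192.0.2.10", 1));
```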

Now go to Client.
  • Run xvnc4viewer. It will ask for the server to connect to: give it in the format <ip>:<port> ex.
  • OR, there are alternatives available for Microsoft Windows, such as RealVNC. Download it from here. (Filling in the form is not compulsory!)
    If nothing went wrong, a window will now open with the remote machine's desktop. Here I'm running a Red Hat Linux instance from my Ubuntu Lucid Lynx machine. :-)

    Any comment about your experience, or anything else, is appreciated. Good luck!

    Saturday, May 15, 2010

    Fix for 'Ubuntu Lucid Touchpad not working'

    Ubuntu 10.04 is intended as a Long Term Support (LTS) release, but it's surely not free of bugs. The touchpad stopping working when the disable button is pressed is surely one major bug of Lucid, usability-wise. My friend Subash and I both faced this problem when we installed Lucid. We both have HP Pavilion series laptops. In theory, this fix should work on any laptop/touchpad. Based on user comments, it was successful on both Acer and Lenovo (and of course HP). Does this apply to later versions of Ubuntu as well? Just let me know. I'm using Gentoo Linux now for a change!

    There's a fix for Ubuntu 11.04, 11.10 (Unity) as well. Read to the end. 

    The Synaptics touchpad driver is installed, and the touchpad is recognized. The problem is that it sometimes gets disabled for no reason. Subash's touchpad didn't work right after a fresh install, forcing him to reinstall, as the cause was unclear at the time. Later we found the problem and the fix.

    What we have to do is enable the touchpad by setting the touchpad_enabled property to true. Here's the command to do it in one step.
    gconftool --type Boolean --set /desktop/gnome/peripherals/touchpad/touchpad_enabled true
    This changes the touchpad_enabled property to true in ~/.gconf/desktop/gnome/peripherals/touchpad/%gconf.xml file. This will fix the issue.

    As an alternative to the above, you may edit the file manually and restart GNOME. Change the touchpad_enabled property to true (see screenshot), and enter the following command in a terminal:
    killall gnome-session

    As I've said, this fixed the issue on my laptop (HP DV-5, Synaptics touchpad, Ubuntu 10.04). Share the model of your laptop as well if this fix worked for you too.

    The above solution is for GNOME, and probably won't work on the Unity desktop environment (i.e. Ubuntu 11.04, 11.10). For that, try the following.

    • unity --replace
    Wait a few minutes until everything loads back into place. If this didn't work, try the following command. It solved my issue on Ubuntu 11.10, but I had tried the one above first.

    • sudo modprobe -r psmouse && sudo modprobe psmouse

    Does this fix work for you? I haven't checked it extensively, so I'd value your feedback.

    UPDATE 1: With the touchpad not working, it's quite difficult to navigate, so I created a simple script for the fix. Save it to your home (~/) folder. Then, you can run it with the following command from a terminal.
    sh ~/

    (You can get a terminal by pressing ALT+F2, typing "gnome-terminal" without quotes, and pressing Enter. Or just press Ctrl + Alt + T.)

    UPDATE 2: Sometimes, when the touchpad disable button is pressed, the keyboard stops working. Fortunately, some key combinations still work. The reason for the keyboard not functioning is unknown to me, but the following workaround fixed the issue.

    That is, go to console tty1 and come back! Strange, but it does work.
    • Press CTRL+ALT+F1  (This takes you out of the X Window session for a moment)
    • Then, CTRL+ALT+F7  (This brings X back)

    Wednesday, May 12, 2010

    Fixing the error in installing Maven plugin in Netbeans 6.8

    Yesterday I installed the newest release of the Ubuntu family, Ubuntu Lucid Lynx (Ubuntu 10.04), and then went through the step of installing software. (I tried Ubuntu's upgrade option from Karmic Koala, but it got screwed up! Anyway, that's another story.)
    Ubuntu Lucid comes with NetBeans version 6.8. But there's a bug in it which prevents users from installing the Maven plugin. It gives an error saying,
     Some plugins require plugin Utilities API to be installed. The plugin Utilities API is requested in version >= 8.0 but only was found.

     This is a bug in NetBeans, and it is reported on Launchpad. Luckily, a workaround is available. It's listed under #2. What you have to do is,
    1. Click on "Settings" tab in "Plugins"
    2. Activate "NetBeans" (if not activated) and deactivate "Ubuntu Specific" (or "NetBeans for Ubuntu" in case of 6.8-0ubuntu4)
    3. Go back to "Available Plugins" and Reload the catalog.
    4. Then install Maven.

    Wednesday, April 28, 2010

    Google Summer of Code 2010

    Finally, this year's Google Summer of Code (GSoC) 2010 students have been announced. I am glad to announce that I am one of them. :-)

    I chose DocBook as my organization. DocBook provides a set of standards and tools for technical documentation. It was initially and is primarily intended for technical documentation, but has been extended for use in other domains as well.

    The proposal I submitted "Web Help Output for DocBook" was to add a browser-independent, platform-independent documentation “Web Help” output format using a combination of HTML, CSS, and JavaScript with a search index created at build time by an indexer application written in Java.

    Finally, one of my dreams has come true. There is great enthusiasm about the program in our university, the University of Moratuwa, especially in the Department of Computer Science & Engineering. Our university has led in the number of accepted students for the last 3 years consecutively. You can find more info at the Open Source at Google Blog.
    Unofficial stats for this year for the University of Moratuwa are,

    CSE Batch '07: 12 students
    CSE Batch '06: 10 students

    ENTC Batch '07: 1 student
    Engineering Faculty: 23

    Faculty of IT: 3

    I represent CSE Batch '07. My project proposal can be found here. The selected student list is at

    I should thank my mentor David Cramer for motivating me and guiding me to this success. I would also like to thank Jirka Kosek, who was a co-mentor to me (and who later resigned as a mentor to take part as a student), along with Richard Hamilton, who administers the DocBook project for GSoC. Sincere thanks to the Head of Department, Vishaka Nanayakkara, and the dearest staff members for guiding us to reach this success.

    I'm really excited and hoping to have a great experience! :-)

    Proposal "Web Help Output for DocBook" for GSoC 2010

    UPDATE: The DocBook WebHelp XSL customization is now integrated into the DocBook XSL release, starting from version 1.76.0. The release is available for download at Release notes are at DocBook WebHelp Project (22/10/2010)

    UPDATE: Google Summer of Code 2010 program finished on 20th August. See DocBook WebHelp Project for the end notes, features and to view the demo of the beta release. 

    The modified schedule can be found in WebHelpGsoc2010. Though the schedule is no longer needed at this point, it may give an idea of how the development process went, which might help a new developer coming into WebHelp.
    - 23/08/2010

    Google Summer of Code 2010 - Project Proposal

    Project: WebHelp Output for DocBook
    Student Name: Kasun Gajasinghe
    IM: kasun (irc://
    Time zone:
    Mentors: David Cramer, Jirka Kosek

    DocBook is a set of standards and tools for technical documentation. A vital requirement for technical publications is to produce a Web-based help format that is synchronized with the content, so that the documentation stays up to date and site maintenance becomes easier. The output will include client-side searching with support for stemming, a table of contents, an index, and an HTML export ability. The main idea is to generate Web Help output from the DocBook content XML files using an Ant build.

    About me
    Participating in GSoC
    I am passionate about the open source world and love contributing to free software. I hope that Google Summer of Code will be a great opportunity for me to become part of another open source community, contribute to the development of the project, make new friends, and develop new skills. I believe that GSoC will be an excellent starting point for this.
    Why DocBook

    DocBook is a leading format for documentation and is especially popular with Open Source projects. So, I am particularly interested in DocBook and hope to become a permanent member of DocBook project.
    I have planned to devote 35-40 hours per week for this project.

    With suggestions from my co-mentors, I have researched ways of implementing client-side searching and came up with the following options.

    Use Lucene QueryParser
    Use the Java Indexer of the htmlsearch demo plugin as a base and add needed features

    As Lucene works in Server-side, we have to compile it into JavaScript to make it work in client-side. For that Jirka suggested the use of GWT. But unfortunately Lucene isn't ported to GWT yet. I've looked at it and found that Luke, the Lucene Index toolbox has ported to GWT. Then, I went to Lucene IRC channel to get further details (#lucene @ freenode). Their I found that it is not possible to use Lucene only in client-side. They said that having queryparser in JavaScript can not be done and Luke uses a Java back-end in server-side for searching. So have to give up this option.
    Java indexer is a good starting point. It does basic indexing and stores it in js files with keys (words) and their relevant file names. Then, it does basic searching based on given key words. This could be used as the base and improve the code and add new features. I have downloaded the source code and studied it. The proposed enhancements are listed under Proposal section.
    For Table of Content tree generation, Considered,

    Frameset approach with the tree included in a separate file.
    Generate complete toc for every generated files and make it appear to be a pane

    I chose the second method. With it, "deep linking" happens automatically, and the output stays functional to a good extent in an environment without JavaScript. This is also the method my mentor recommended. Further research will happen in the following days.

    DocBook is a set of standards and tools for technical documentation. It was initially and is primarily intended for technical documentation, but has been extended for use in other domains. The current DocBook schema is available in several schema languages, including RELAX NG and DTD, and is maintained by the DocBook Technical Committee of OASIS. The DocBook Open Repository is a project hosted on SourceForge that maintains a set of XSL stylesheets for converting a DocBook instance into a variety of output formats, including various HTML formats, PDF (via XSL-FO), and man pages. The currently supported HTML output formats include monolithic HTML, chunked HTML, Microsoft HTML Help (.chm), Eclipse documentation plugins, and Java Help.

    This proposal is to add a browser-independent, platform-independent documentation “Web Help” output format using a combination of HTML, CSS, and JavaScript with a search index created at build time by an indexer application written in Java.

    Search is done on the client side. For that, I plan to use the “htmlsearch1.04” demo plugin from the DITA Open Toolkit as a base and enhance it with the needed features. As DocBook is included as one of its supported products, it will be compatible with this project. Further, its license allows it to be used in commercial applications as well.

    The proposed design for searching is to first generate JavaScript files that contain all glossary terms of the HTML files together with the matching file locations (i.e., as key-value pairs). Then, for a given query, the keywords are extracted and looked up in the generated glossary/index, and the matching pages are displayed.
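
    A minimal sketch of this lookup in JavaScript is shown here. The index contents, the page names, and the function names extractKeywords and search are hypothetical and are not taken from the htmlsearch plugin; the tokenizer only handles ASCII words, so it would need more work for other languages.

```javascript
// Hypothetical index shape: keyword -> list of pages that contain it.
// In the proposed design, a file like this would be generated at build time.
const searchIndex = {
  docbook: ["ch01.html", "ch02.html"],
  stylesheet: ["ch02.html"],
  maven: ["ch03.html"],
};

// Split a query into lowercase ASCII keywords, dropping punctuation and empty tokens.
function extractKeywords(query) {
  return query
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
}

// Return every page that contains at least one keyword,
// ordered by the number of matching keywords, then by page name.
function search(index, query) {
  const hits = new Map();
  for (const word of extractKeywords(query)) {
    const pages = Object.prototype.hasOwnProperty.call(index, word)
      ? index[word]
      : [];
    for (const page of pages) {
      hits.set(page, (hits.get(page) || 0) + 1);
    }
  }
  return [...hits.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([page]) => page);
}
```

    For example, search(searchIndex, "DocBook stylesheet") ranks ch02.html ahead of ch01.html, because ch02.html matches both keywords.
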
    The enhancements currently planned are:

    • Support for stemming and lemmatization for a given query
    • Search with Boolean operators (AND, OR)
    • Meta-data such as 'Prev' and 'Next' in the content page will be ignored when indexing.
    • Improved support for Asian languages (for Japanese and other Asian languages, the meta tag content is used).
    • As searching on the client side may slow down the application, the necessary optimizations will be adopted.
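
    The first two enhancements could be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: the index contents and the names naiveStem, pagesFor, and booleanSearch are hypothetical, the stemmer is a naive suffix stripper rather than a real stemming algorithm such as Porter's, lemmatization is not covered, and operators are evaluated left to right without precedence or parentheses.

```javascript
// Hypothetical index keyed by stemmed terms; values are page names.
const stemmedIndex = {
  index: ["search.html", "build.html"],
  search: ["search.html", "toc.html"],
  build: ["build.html"],
};

// Naive suffix stripping for illustration only (not the Porter algorithm).
function naiveStem(word) {
  const w = word.toLowerCase();
  for (const suffix of ["ing", "es", "ed", "s"]) {
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      return w.slice(0, -suffix.length);
    }
  }
  return w;
}

// Look up the set of pages for one query word after stemming it.
function pagesFor(index, word) {
  const stem = naiveStem(word);
  const pages = Object.prototype.hasOwnProperty.call(index, stem)
    ? index[stem]
    : [];
  return new Set(pages);
}

// Evaluate a flat query such as "searching AND indexes" or "builds OR searches".
function booleanSearch(index, query) {
  const tokens = query.trim().split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length === 0) return [];
  let result = pagesFor(index, tokens[0]);
  for (let i = 1; i < tokens.length; i += 2) {
    const op = tokens[i].toUpperCase();
    const right = pagesFor(index, tokens[i + 1] || "");
    if (op === "AND") {
      // Intersection: keep only pages present in both sets.
      result = new Set([...result].filter((page) => right.has(page)));
    } else if (op === "OR") {
      // Union: pages present in either set.
      result = new Set([...result, ...right]);
    } else {
      throw new Error("Unsupported operator: " + tokens[i]);
    }
  }
  return [...result].sort();
}
```

    With this index, booleanSearch(stemmedIndex, "searching AND indexes") returns only search.html, since that is the only page listed under both stems.
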

    I plan to use the YUI library for the TOC tree generation. I will abandon the frameset-based approach and instead use a CSS-based mechanism in which the TOC is generated in every page and CSS is used to format it properly for viewing. With this approach, synchronization with the content file happens automatically, and so does deep linking.
    The UI design will be developed using CSS and other technologies and will be somewhat similar to Eclipse Help.
    The planned development schedule is given below.

    Development Schedule
    I am already proficient in Java, JavaScript, XML and CSS, but will start studying XSL in the bonding period and continue to learn it while doing the programming.

    Community Bonding Period: April 26 - May 24
    • Getting to know the mentor and the community
    • Studying the required API and features for WebHelp
    • Preparing the development environment
    • Looking for a good searching approach
    • Starting to design a good model
    Interim Period: May 24 - July 12
    • Dividing the development process into stages with the help of the mentor
    • Developing the TOC tree using a CSS-based mechanism (YUI)
    • Implementing the synchronization with the content
    • Adding an index with the help of the DocBook schema
    • Designing the client-side search mechanism, taking features such as stemming and lemmatization into consideration, and starting to code
    • Designing a better user interface
    July 12 - July 16
    • Submitting mid-term evaluations and continuing with the development
    Interim Period: July 16 - August 9
    • Completing the TOC with synchronization
    • Continuing to develop the search mechanism
    • Testing the synchronization and searching
    • Developing the user interface
    August 9 - August 16
    • Refining and testing the code and making the necessary improvements
    August 20
    • Final evaluation deadline
    August 30
    • Submitting the required code to Google

    References and Resources

    [1] My Blog: Kasun's Tech Thoughts
    [2] Twitter:
    [3] My Google Code Hosting Profile
    (Projects hosted: documentation-aggregation-application, KFinder:A file searcher, cse-checkers(Java), cse-l3-2009-070137m:A Firefox extension)
    [4] DocBook 5.0: The Definitive Guide
    [5] DocBook XSL: The Complete Guide
    [6] dita-users · DITA users yahoo group
    [7] YUI Library
    [8] Documentation Aggregation Application
    [9] Delicious Extension for Google Chrome
    [10] University of Moratuwa, Sri Lanka
    [11] Department of Computer Science & Engineering, Faculty of Engineering