Saturday, August 21, 2010

DocBook WebHelp Project


DocBook WebHelp was the project I worked on for the Google Summer of Code 2010 program. The "pencils down" date was 16th August 2010, which means coding officially finished on that day. By then, my mentor David Cramer and I had finished all the planned requirements and written all the documentation needed. Results were announced today, 21st August, by the Google Open Source Program team: I successfully finished the project :)

A demo of the output produced by the WebHelp XSL customization is available at the following links. The demo is the documentation of DocBook WebHelp itself.
http://docbook.sourceforge.net/release/xsl/current/webhelp/docs/ch01.html
http://www.thingbag.net/docbook/gsoc2010/doc/content/ch01.html

The latest output in the snapshots has a lot more features and looks much better than the released version. Do have a look:
http://snapshots.docbook.org/xsl/webhelp/docs/index.html
http://vulture.gentoo.org/~kasun/docbook/docbook-webhelp-snapshot-current/content/ch01.html


WebHelp Output

WebHelp Search tab

You can download the DocBook XSL stylesheets from:
http://sourceforge.net/projects/docbook/

The WebHelp customization is available under docbook-xsl-ns-1.76.1/webhelp. Following are some brief details about the DocBook WebHelp customization.

A common requirement for technical publications groups is to produce a Web-based help format that includes a table of contents pane, a search feature, and an index similar to what you get from the Microsoft HTML Help (.chm) format or Eclipse help. If the content is help for a Web application that is not exposed to the Internet or requires that the user be logged in, then it is impossible to use services like Google to add search.

DocBook WebHelp provides a browser-independent, platform-independent documentation “Web Help” output format for DocBook files. WebHelp provides a sophisticated but inexpensive web publishing option for DocBook.

Features
  • Full text search.
    • Stemming support for English, French, and German. Stemming support can be added for other languages by implementing a stemmer.
    • Support for Chinese, Japanese, and Korean using code from the Lucene search engine.
    • Search highlighting shows where the searched-for term appears in the results. Use the H button to toggle the highlighting on and off.
  • Search results can include brief descriptions of the target.
  • Table of Contents (TOC) pane with collapsible TOC tree.
  • Autosynchronization of content pane and TOC.
  • TOC and search pane implemented without the use of a frameset.
  • An Ant build.xml file to generate the output. You can import this build file into your own, or use it as a model for integrating this output format into your own build system.
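To picture what search highlighting does, here is a minimal sketch that wraps every case-insensitive occurrence of the query terms in a marker element. It is not WebHelp's actual code, which works on the rendered page rather than on a plain string; the function name highlightTerms and the CSS class "highlight" are assumptions made for illustration.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return HTML in which each case-insensitive occurrence
// of any search term is wrapped in <span class="highlight">.
function highlightTerms(text, terms) {
  const words = terms.filter((t) => t.length > 0).map(escapeRegExp);
  if (words.length === 0) return escapeHtml(text);
  const pattern = new RegExp("(" + words.join("|") + ")", "gi");
  // split() with a capturing group keeps the matched terms at odd indices.
  return text
    .split(pattern)
    .map((part, i) =>
      i % 2 === 1
        ? '<span class="highlight">' + escapeHtml(part) + "</span>"
        : escapeHtml(part)
    )
    .join("");
}
```

With markup like this in place, toggling the highlighting on and off could be as simple as adding or removing a class on the content container that makes the highlight style visible. Note that this sketch matches raw substrings, so it knows nothing about stemming.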

So, what do you think of the output? Are you interested in giving it a try?

My follow-up posts related to DocBook WebHelp are here
My follow-up posts related to DocBook are here

PS: For discussions, please subscribe to my comment feed so that you won't miss my replies. Alternatively, contact us via the docbook-apps list: http://www.oasis-open.org/mlmanage/index.php

50 comments:

  1. Hi Kasun. Congratulations on your work! I belong to the Learning Technologies group of the Open University of Catalonia (Barcelona, Spain), and we also work on projects related to DocBook, such as translators to EPUB and Mobipocket. If you are interested in any international collaboration, please contact us at jrivera@uoc.edu.
    Best regards

  2. Excellent, excellent! I've been thinking of something like this for years, and now you've done it!

  3. Hello Kasun. Do you plan to make the word search 'smarter', that is, to add scoring to the search? For example:
    if you look for the word 'test' and it appears in a , it would get a better score than the same word found in a <p></p>.
    This way, the list of found words would be sorted by score.

  4. Hi tinti,
    Thanks for your interest in the DocBook WebHelp project.
    Currently, the only smartness it has is stemming support and international language support. I suppose you know what I mean by this.
    Yes, it would be great to have Lucene-like scoring for search terms, but it will take a little more time. The limitation in our case is that the solution must work in both Java (for indexing) and JavaScript (for searching in the browser), which is hard to achieve. Let's see.

    BTW, I heard that the oXygen editor is implementing search with scoring. If we can use it (no licensing issues!) and it is compatible with ours, we'll include it in WebHelp.
    http://www.mail-archive.com/oxygen-user@oxygenxml.com/msg02757.html

  5. Nice work.
    Are you aware of a project that allows editing DocBook via a web interface?
    There are many export formats, but very few editors for DocBook.

  6. Thanks for creating this nice DocBook output!

  7. Hi Kasun,
    This is great. I've been trying it out.
    Two questions:
    * Search: Can it limit search results to a specific multi-word phrase like "Java 1.6"?
    * Handling Includes and Entities with Saxon: My XML files use includes and entities. I can generate the files from the command line using Saxon, etc., as shown here: http://www.sagehill.net/docbookxsl/Xinclude.html. However, I'm wondering whether you know of an easy way to modify build.xml to handle these. I'm not sure whether Ant's xslt or java tasks can be used for this purpose.

    Thanks for your help, and great job!

  8. To answer the question from Bruno V.: as someone who has tested all the commercially available XML editors, my recommendation (for what it is worth) is to try the Oxygen XML editor. It's Java-based, so it is available on all platforms. As far as I'm concerned, it is the best DocBook editor available. YMMV.

  9. First of all, thanks for your great work!
    I've been playing around with DocBook and WebHelp. I have a problem: when I click in the left navigation tree, #content sometimes scrolls way too far, which means the jump to the anchor doesn't work properly. Do you have a fix for that?
    Thanks a lot, Benjamin

  10. Hi Benjamin,
    This is in fact fixed in the current DocBook trunk. Have a look at it. Specifically, copy the first few lines of code that come right after the "$(document).ready(function()" line in trunk's template/common/main.js into your own main.js.

    https://docbook.svn.sourceforge.net/svnroot/docbook/trunk/xsl/webhelp/template/common/main.js

    It's commented, so you won't be able to miss it.

    Here's a preview of WebHelp as of this moment: http://vulture.gentoo.org/~kasun/docbook/docbook-webhelp-snapshot-current/content/ch01.html

    PS: I hope you will see my comment. I suggest that you, or any newcomers, subscribe to my comment feed so that you won't miss my replies. Alternatively, contact us via the docbook-apps list: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook

  11. And now we have word scoring for search: a word in a title, in bold, etc. gets a higher rank. The relevance of each search result is shown graphically on a 1-5 scale. It seems to work great! This was contributed by the Oxygen team.

    tinti asked about this a while ago; I guess others will find it useful too. We are rolling out a new DocBook release soon, so... stay tuned!
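    The idea of ranking words found in titles or bold text higher, and showing relevance on a 1-5 scale, can be sketched as follows. This is not the code Oxygen contributed; the element weights (title 5, bold 2, everything else 1), the input shape, the function name rankPages, and the way raw scores are mapped onto the 1-5 scale are all assumptions chosen for illustration.

```javascript
// Hypothetical weights: where a word occurs determines how much it counts.
const WEIGHTS = { title: 5, bold: 2, text: 1 };

// pages: [{ url, occurrences: [{ word, context }] }], where context is
// "title", "bold" or "text". Returns matching pages, best first, each with
// a raw score and a 1-5 star rating relative to the best match.
function rankPages(pages, queryWord) {
  const word = queryWord.toLowerCase();
  const scored = pages
    .map((page) => ({
      url: page.url,
      score: page.occurrences
        .filter((o) => o.word.toLowerCase() === word)
        .reduce((sum, o) => sum + (WEIGHTS[o.context] ?? WEIGHTS.text), 0),
    }))
    .filter((p) => p.score > 0)
    .sort((a, b) => b.score - a.score);

  const best = scored.length > 0 ? scored[0].score : 0;
  // Scale relative to the best match so the top result always gets 5 stars.
  return scored.map((p) => ({
    ...p,
    stars: Math.max(1, Math.round((p.score / best) * 5)),
  }));
}
```

    Scaling relative to the best hit keeps the 1-5 display meaningful regardless of document length; an absolute scale would be the other obvious choice.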

  12. Bill Burns (bburns@hp.com), November 30, 2011 at 1:52 AM

    Hi, Kasun.

    We have implemented your stemmers in our DocBook environment in a custom web-based help solution. We seem to be having a little trouble, though, with some of the terms that are returned by the tokenizer (always returning alwai, array returning arrai, using returning us, and similar problems). Can you point me to any current work that might resolve some of these issues?

    Thanks,

  13. Hi Bill,

    The tokens you mentioned are quite normal. Stemming does NOT always produce English words, and it doesn't need to: it only produces a string that can be treated as the root word. This isn't supposed to be a problem, because the client-side stemmer stems the words in user queries as well.

    Basically, the Java stemmer and the client-side JavaScript stemmer meet on common ground through these root words (like alwai). This does NOT affect the user experience, but there could be some minor errors here and there, since there are small incompatibilities between the two stemmers.

    If you still want real words in the index, you could start by looking at the link below. Currently there is no plan to implement this in DocBook WebHelp, because the current implementation is more than sufficient.

    What's your rationale for wanting this functionality?

    http://en.wikipedia.org/wiki/Stemming#Additional_algorithm_criteria
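    To illustrate why index-side and query-side stemming meet on common ground, here is a minimal sketch of a lookup that applies the same stemmer to the indexed text and to the query. toyStem is a deliberately crude stand-in (it only strips "ing" or a plural "s"), not the Snowball/Porter stemmers WebHelp uses, and buildIndex and search are hypothetical names.

```javascript
// Crude stand-in stemmer (NOT Porter/Snowball): strips "ing" or a plural "s".
function toyStem(word) {
  const w = word.toLowerCase();
  if (w.endsWith("ing") && w.length > 4) return w.slice(0, -3);
  if (w.endsWith("s") && !w.endsWith("ss") && w.length > 3) return w.slice(0, -1);
  return w;
}

// Build an inverted index: stem -> set of page ids (done once, at build time).
function buildIndex(pages, stem) {
  const index = new Map();
  for (const [pageId, text] of Object.entries(pages)) {
    for (const token of text.split(/\W+/).filter(Boolean)) {
      const key = stem(token);
      if (!index.has(key)) index.set(key, new Set());
      index.get(key).add(pageId);
    }
  }
  return index;
}

// At query time, stem the query word with the SAME function, then look it up.
function search(index, queryWord, stem) {
  return [...(index.get(stem(queryWord)) ?? [])].sort();
}
```

    Here "using" is indexed under the non-word "us", yet a query for "using" still finds the page because the query is stemmed the same way. If the query side used a different stemmer (say, plain lowercasing), that lookup would return nothing, which is the kind of minor incompatibility mentioned above.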

  14. Bill Burns (bburns@hp.com), December 1, 2011 at 5:55 AM

    Well, the problem is that in some cases the search is missing important terms. I work with an organization that produces servers, so not being able to search on the term "array" is a problem. Some of the issues with gerunds and other fixed word categories aren't such a problem. However, we would like to minimize the cases in which endings are arbitrarily removed. I don't see how -ay wound up on the list of postpositional morphemes to be removed; I can't think of any context in which it's used as an internal morphemic change. So the removal of this particular morpheme bugs me... and makes me a little wary of other possible gotchas.

    Overall, I'm happy with the performance. It was a great feature to add to DocBook XSL. Thanks.

  15. I see. I've had a look at the issue. According to the rationale on the Tartarus site, -ay should be converted to -ai. It specifically says:

    /* step2() turns terminal y to i when there is another vowel in the stem. */ at http://tartarus.org/martin/PorterStemmer/java.txt

    The issue here is in the Java English stemmer, which apparently does not implement the specified algorithm correctly. It seems to me that the Snowball-generated code from Tartarus (which we use, from http://snowball.tartarus.org/download.php) is a little different from the dedicated English stemmer (at the first link).

    We had better contact the upstream Snowball project and use the most correct implementation.
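    For reference, the rule quoted above (turn a terminal y into i when there is another vowel in the stem, Step 1c in Porter's original algorithm) fits in a small function. This sketch covers that single step only, not a full stemmer, and terminalYToI is a hypothetical name; as in Porter's definition, y counts as a vowel when it follows a consonant.

```javascript
// Porter's definition: a, e, i, o, u are vowels; y is a vowel only when the
// preceding letter is a consonant (a leading y is a consonant).
function isConsonant(word, i) {
  const c = word[i];
  if ("aeiou".includes(c)) return false;
  if (c === "y") return i === 0 ? true : !isConsonant(word, i - 1);
  return true;
}

// True if word[0..end) contains at least one vowel.
function stemHasVowel(word, end) {
  for (let i = 0; i < end; i++) {
    if (!isConsonant(word, i)) return true;
  }
  return false;
}

// Turn a terminal y into i when the stem before it contains a vowel.
function terminalYToI(word) {
  const w = word.toLowerCase();
  if (w.endsWith("y") && stemHasVowel(w, w.length - 1)) {
    return w.slice(0, -1) + "i";
  }
  return w;
}
```

    By this rule "array" becomes "arrai", while "sky" is left alone because "sk" has no vowel. In the full algorithm an earlier step removes the final s of "always", so "alway" then becomes "alwai", consistent with the tokens Bill reported.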

  16. Bill Burns (bburns@hp.com), December 1, 2011 at 7:44 PM

    Thanks, Kasun. That helps a lot.

  17. Great! :) I'll see what I can do from my end.

  18. Thanks for your work, Kasun. It's superb!

    Let me ask something: where can I find which DocBook features your converter supports? I mean, does it support DocBook 4 or 5? Article or book? Does it support XInclude? I ask because I'm having some problems getting correct WebHelp output when using a DocBook 5 book with XInclude and XML files spread across several directories...

  19. Hi Paco,

    David has just committed XInclude support to the build.xml in SVN. You could give it a try using the DocBook snapshots, but the snapshot build is currently broken. A new release of DocBook-XSL (including WebHelp) will come out any time now, so you can use that for production once it's available.

    http://docbook.svn.sourceforge.net/viewvc/docbook/trunk/xsl/webhelp/build.xml?r1=9110&r2=9170

    You could also just use Docbkx, since XInclude is supported there too.

    DocBook 4.x and 5.x are both supported already. If you're using 5.x, the -ns- versions of the XSLs are better, but the regular ones will also work, because they strip the namespace first.

    This answer was compiled with the help of David Cramer.

  20. Thank you, Kasun.

    I posted my problem in the Oxygen Forum (http://www.oxygenxml.com/forum/topic6528.html), and they noted that my book had no title. After adding a title, everything went smoothly.

    By the way, I suppose that Oxygen is using your latest converter. Do you have any contact with them?

    Thank you again for your great work.

  21. Good to hear that your issue is fixed. I hope your experience with WebHelp goes smoothly.

    Oxygen uses the latest _released_ version of the WebHelp XSLs, which means the development on trunk is not included there. I will write another blog post when the current trunk gets released (with new snapshots!).

    We are in contact with Oxygen; we worked with them on integrating this into OxygenXML. What do you have in mind?

    Please share your experience with WebHelp with us as well when you are ready. It'll help us improve!

  22. We are trying to write the help for some academic software. We have investigated several formats, and so far WebHelp seems the most suitable.

    I will let you know the project webpage as soon as it becomes available.

  23. That'll be great. Keep an eye on the docbook-apps mailing list for the release of the new version as well.

  24. I'm new to DocBook and WebHelp, and I would like to know if it is possible to use DocBook+WebHelp to easily set up a "context-sensitive" help system for a web application using your contribution to the project.
    Thanks

    Replies
    1. See "Cross platform, web-based WebHelp from DocBook" for contextual help, and much more.

      www.janetswisher.com/?itemid=273

  25. Perhaps you could clarify something for me: looking at the WebHelp docs on SourceForge, it appears that to replace the jQuery UI theme and/or positioning.css, I'll need to copy webhelp.xsl, make the changes in that copy, and then point to the new file, as opposed to just setting a parameter somewhere. Is that correct?

    Replies
    1. Hi,
      I just saw your comment; somehow I missed it before. Sorry for the late reply.
      Currently, that is the case. We will parameterize the jQuery themes. In the meantime, you can simply edit the current positioning.css file if that's OK for you. Since WebHelp is already a highly customized output, this need arises sometimes.

  26. I'm converting all the docs for the Phing project (http://www.phing.info/trac/) and I was wondering about one thing. We have a fairly long index, and when you tap a title and the corresponding page loads in the right frame, the TOC in the left frame scrolls back to the top. It seems that the left frame is reloaded and shown from the beginning.

    Is there any chance of getting the left frame to stay put and not reload/scroll to the top? It would make much more sense when navigating a long document if you didn't lose your position each time you click a TOC entry.

    Otherwise this is really great. In Phing we used to have a home-grown solution that did the left/right TOC/content frame layout (but without search).

    Cheers
    Johan

    Replies
    1. Hi Johan,

      We hadn't noticed the TOC scrolling behavior. This is an issue in WebHelp that should be fixed. The layout is fully based on CSS, so a few tweaks (maybe using cookies) should fix the problem. We don't use the outdated frame concept for WebHelp because of its many drawbacks.
      And since the TOC is embedded in every HTML page, it is reloaded with each page.

      Why are frames bad? FuseSource, which uses WebHelp, says (http://goo.gl/4akeZ):
      "Some people complained that the frames didn't work in their browsers. We got complaints that it was hard to bookmark pages or to get links to send in e-mail. There were also complaints that the documentation looked very 1999."

      Maybe you should consider filing a bug in the DocBook bug tracker, just to make sure this doesn't go unnoticed in the next release.

      --Kasun

    2. I will file a bug. In the meantime, you can view an example of the issue here:

      http://aditus.nu/phing-doc/output/webhelp/content/

      which is an example of the phing-manual rendered with WebHelp.

      Just expand the menu (for example "Optional tasks") so that it grows beyond the window height. Scroll down and then click some section. The section shows correctly, but the TOC is scrolled back to the top, which is a nuisance.

      (The rendering was done by the help of plain, non-customized, Oxygen built-in stylesheets)

    3. I have added this as issue "3490902" in SourceForge DocBook tracker.

      Cheers!
      Johan

    4. Thanks for reporting the issue.
      In the meantime, if you are interested in patching your code, you may use the following snippets as appropriate.

      var x = document.getElementById('leftnavigation').scrollTop //x = how far toc has gone down. [1]

      $('#leftnavigation').animate({scrollTop: x}, 200); //scrolls down by x [2]
      NOTE: In the next release, #leftnavigation should be replaced by #treeDiv.

      Add an onclick handler to the TOC links that stores the value from [1] in a cookie. Then, in the document.ready handler in main.js, run [2] with the stored cookie value.
      This is probably a little vague, so you may prefer to wait for the next release. :)
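      Spelled out, the cookie approach above could look like the following sketch. It has not been tested against WebHelp itself; the cookie name tocScrollTop and the functions readCookie, saveTocScroll, and restoreTocScroll are hypothetical. The pane id is a parameter so it can be "leftnavigation" now or "treeDiv" after the next release, and the sketch sets scrollTop directly instead of using jQuery's animate().

```javascript
// Hypothetical cookie name used to remember the TOC scroll position.
const TOC_SCROLL_COOKIE = "tocScrollTop";

// Return the value of one cookie from a "name=value; other=value" string, or null.
function readCookie(cookieString, name) {
  for (const part of cookieString.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return decodeURIComponent(rest.join("="));
  }
  return null;
}

// [1] Call when a TOC link is clicked: store how far the TOC pane is scrolled.
function saveTocScroll(doc, paneId) {
  const pane = doc.getElementById(paneId);
  if (pane) {
    doc.cookie = TOC_SCROLL_COOKIE + "=" + Math.round(pane.scrollTop) + "; path=/";
  }
}

// [2] Call once the next page is ready: scroll the TOC back to the stored position.
function restoreTocScroll(doc, paneId) {
  const pane = doc.getElementById(paneId);
  const saved = readCookie(doc.cookie || "", TOC_SCROLL_COOKIE);
  const y = saved === null ? NaN : parseInt(saved, 10);
  if (pane && !Number.isNaN(y)) {
    pane.scrollTop = y;
  }
}

// Browser wiring (skipped when run outside a browser, e.g. under Node).
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    restoreTocScroll(document, "leftnavigation");
    document.querySelectorAll("#leftnavigation a").forEach((link) => {
      link.addEventListener("click", () => saveTocScroll(document, "leftnavigation"));
    });
  });
}
```

      Passing the document in as a parameter keeps the two functions easy to try out with a stand-in object; in main.js the two calls would more naturally go inside the existing $(document).ready handler, as suggested above.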

  27. Hi,
    I have an issue when compiling the indexer with WebHelp that I can't get rid of. The error is:

    java.lang.NoClassDefFoundError: com/nexwave/nquindexer/IndexerMain
    [java] Caused by: java.lang.ClassNotFoundException: com.nexwave.nquindexer.IndexerMain

    I am using Ant 1.8.2 and JDK 1.6.0. My classpath contains webhelpindexer.jar, lucene-analyzers-3.0.0.jar, lucene-core-3.0.0.jar, xercesImpl.jar, and xml-apis.jar, and I am invoking the class com.nexwave.nquindexer.IndexerMain through a java task.

    I would really appreciate it if you could help me with this, because I have spent a hell of a lot of time on it without result!

    Riad

    Replies
    1. Which version of the DocBook XSLs are you using? Is it 1.76.1 or a snapshot? I recommend using the latest snapshot if you are in the development stage, because a release is coming soon.

      And, well, 1.76.1 doesn't have the com.nexwave.nquindexer.IndexerMain class; it was added after that release. Just open up webhelpindexer.jar and check whether the class is there.

      Just to be sure: are you compiling the webhelpindexer source, or just using the jar to generate the search contents? As a user, there's no need to compile the source.

    2. Thanks a lot for the quick answer.

      I am using 1.76.1, but I added the java task instead of using the indexertask (which was not working either), and I am only using the webhelpindexer jar file.

      Where can I get the latest snapshot, and how do I compile the webhelpindexer source or obtain a jar file containing the class com.nexwave.nquindexer.IndexerMain?

    3. http://snapshots.docbook.org/

      You can simply replace the webhelpindexer.jar in 1.76.1 with extensions/webhelpindexer.jar from the extracted snapshot. Also, carefully copy the relevant parts of the snapshot's build.xml file.

      I haven't tested this myself. Let me know how it goes.

    4. It works perfectly with the snapshot version. For this, I had to:
      - Add webhelpindexer.jar and tagsoup.jar to my indexer-classpath
      - Copy some content (java parameters) to the index target in the build.xml file
      - Replace the webhelp.xsl file and the Search folder under template/contents/ with the new ones from the snapshot

      Thank you for the precious help :D

  28. Do you plan to incorporate the following additional features, now or in the future?

    Allow phrase searching as well as word matches
    Support use of wildcards
    Support for case-sensitive and case-insensitive Search
    Standard Boolean Search using AND / OR/ NOT
    Support for related terms or synonyms
    Support for auto-complete for search terms
    Enable proximity search (terms located near each other; e.g., within 2 words, not just exact matches)
    Provide fuzzy AND (for ranking)

  29. Has anyone implemented context-sensitive help with DocBook+WebHelp using a header / map file rather than hard-coded URLs? Or with some equivalent functionality that provides an intermediate step between the application and the help?

    Note that a CSH map file is different from a DITA map file. Here's a good explanation of CSH map files:

    http://www.webworks.com/Documentation/Reverb/index.html#page/03.Preparing%2520and%2520Publishing%2520Content/Preparing%2520DITA%2520Files.3.11.htm
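    One way to get the intermediate step described above, sketched under assumptions: a small map from context IDs to WebHelp topic pages, so the application only ever refers to IDs. The map entries, the page names, and the helpUrlFor function are hypothetical illustrations, not something WebHelp provides.

```javascript
// Hypothetical CSH map: context IDs used by the application -> WebHelp topic pages.
// In practice this could be generated at build time instead of written by hand.
const cshMap = new Map([
  ["LOGIN_DIALOG", "ch02s01.html"],
  ["USER_PREFERENCES", "ch03.html"],
]);

// Resolve a context ID to a help URL, falling back to the help start page.
// baseUrl should end with "/" so that topic pages resolve inside it.
function helpUrlFor(contextId, baseUrl, map = cshMap, fallback = "index.html") {
  const page = map.get(contextId) ?? fallback;
  return new URL(page, baseUrl).toString();
}
```

    Because the application knows only the IDs, writers can move or rename topics by regenerating the map, without touching application code; unknown IDs degrade to the start page instead of a broken link.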

  30. I had this working for a while on Mac OS 10.8; then one day I got the error "No more DTM IDs are available." There is a fix here:

    http://habrahabr.ru/company/alawar/blog/193726/

    Long story short, edit docbook-xsl/webhelp/build.xml and change

    classpath="${xercesImpl.jar}"

    to

    classpath="${xslt-processor-classpath}"

  31. Can anyone point me to the part of webhelp/xsl/webhelp-common.xsl that controls chunking? I can't relate what I find in this doc to the code I see in that file:

    http://www.sagehill.net/docbookxsl/Chunking.html

    I want to modify the XSL so that it chunks at instead of .

    Replies
    1. I believe you found a solution using the suggestions in http://stackoverflow.com/questions/20980281/docbook-xsl-chunking

  32. This comment has been removed by a blog administrator.

  33. Does profiling for WebHelp work with the 1.76.1 stylesheets? Is xhtml/profile-chunk.xml the right template to start with?

    Replies
    1. Hi Michael,

      DocBook XSL 1.78.1 supports profiling. You can check the build script here: http://sourceforge.net/p/docbook/code/HEAD/tree/trunk/xsl/webhelp/build.xml

    2. Hi Kasun,

      I am working with DocBook 1.78.1 through the Maven Docbkx plugin and cannot get profiling to work (it works perfectly for FO and HTML).

      Can you point me to the necessary steps in the Maven and/or XSL customization setup to fix it? I searched for a solution, but nothing works for me (maybe I am applying instructions meant for Ant incorrectly to the Maven setup).

  34. It took me 20 hours to render an entire project database (XML generated by a code crawler), and the end result blows me away (please see http://red1.org/idempiere). Many awesome kudos to you and your team; I am willing to stand as a goodwill reference and ambassador for your wonderful work. (Feel free to quote me, or point me to where I can post further reviews.) High ten!
