DocBook WebHelp Project : Kasun's Tech Thoughts

Saturday, August 21, 2010

DocBook WebHelp Project

DocBook WebHelp was the project I worked on for the Google Summer of Code 2010 program. The pencils-down date was 16th August, 2010, which means coding officially finished on that day. My mentor David Cramer and I completed all the planned requirements and wrote all the needed documentation. Results were announced today, 21st August, by the Google Open Source Program team; I successfully finished the project :)

A demo of the output produced by the WebHelp XSL customization is available at the following link. The demo is the documentation of DocBook WebHelp itself.
http://docbook.sourceforge.net/release/xsl/current/webhelp/docs/ch01.html
~~http://www.thingbag.net/docbook/gsoc2010/doc/content/ch01.html~~

The latest output in the snapshots has a lot more features and looks quite beautiful compared to the released version. Do have a look:
http://snapshots.docbook.org/xsl/webhelp/docs/index.html
~~http://vulture.gentoo.org/~kasun/docbook/docbook-webhelp-snapshot-current/content/ch01.html~~

WebHelp Output

WebHelp Search tab

You can download the DocBook installation from:
http://sourceforge.net/projects/docbook/

The WebHelp customization is available under docbook-xsl-ns-1.76.1/webhelp. Following are some brief details about the DocBook WebHelp customization.

A common requirement for technical publications groups is to produce a Web-based help format that includes a table of contents pane, a search feature, and an index similar to what you get from the Microsoft HTML Help (.chm) format or Eclipse help. If the content is help for a Web application that is not exposed to the Internet or requires that the user be logged in, then it is impossible to use services like Google to add search.

DocBook WebHelp provides a browser-independent, platform-independent documentation “Web Help” output format for DocBook files. WebHelp provides a sophisticated but inexpensive web publishing option for DocBook.

Features

Full text search.

Stemming support for English, French, and German. Stemming support can be added for other languages by implementing a stemmer.

Support for Chinese, Japanese, and Korean using code from the Lucene search engine.

Search highlighting shows where the searched for term appears in the results. Use the H button to toggle the highlighting on and off.

Search results can include brief descriptions of the target.

Table of Contents (TOC) pane with collapsible toc tree.

Autosynchronization of content pane and TOC.

TOC and search pane implemented without the use of a frameset.

An Ant build.xml file to generate output. You can use this build file by importing it into your own or use it as a model for integrating this output format into your own build system.
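
To give a rough feel for how full text search can work without a server, here is a minimal Python sketch of an inverted index that is built once and then queried. This is not the WebHelp indexer: the page names, the sample text, and the `build_index` and `search` helpers are all hypothetical, and the real implementation also applies stemming and highlighting.

```python
import re
from collections import defaultdict

# Hypothetical pages: file name -> (brief description, body text).
PAGES = {
    "ch01.html": ("Introduction", "WebHelp provides full text search for DocBook output."),
    "ch02.html": ("Building", "Use the Ant build file to generate the output and the search index."),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of pages that contain it (an inverted index)."""
    index = defaultdict(set)
    for name, (_description, body) in pages.items():
        for word in tokenize(body):
            index[word].add(name)
    return index

def search(index, pages, query):
    """Return (page, description) pairs for pages containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return [(name, pages[name][0]) for name in sorted(hits)]

INDEX = build_index(PAGES)
print(search(INDEX, PAGES, "search index"))  # [('ch02.html', 'Building')]
```

The point of the sketch is that all the heavy work happens at build time; the lookup itself is a few set operations, which is why it can run in the browser.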

So, what do you think of the output? Are you interested in giving it a try?

Follow-up posts related to DocBook WebHelp are here
Follow-up posts related to DocBook are here

PS: For discussions, please subscribe to my comment feed so that you won't miss my replies. Alternatively, contact us via the docbook-apps list - http://www.oasis-open.org/mlmanage/index.php

50 comments:

Josep Rivera, August 29, 2010 at 2:48 PM
Hi Kasun. Congratulations on your work! I belong to Learning Technologies at the Open University of Catalonia (Barcelona, Spain), and we work on projects also related to DocBook, like translators to EPUB and Mobipocket and so on. If you are interested in any international collaboration, please contact us at jrivera@uoc.edu.
Best regards

Unknown, September 25, 2010 at 12:42 PM
Excellent, excellent! I've been thinking of something like this for years, and now you've done it!

tinti, October 1, 2010 at 7:45 PM
Hello Kasun. Do you have in mind to make the word search 'smarter'? I mean adding scoring to the search. For example:
if you look for 'test' word and this word is in a that this word will have a better scoring of the same word found in a <p></p >
That way the list of found words would be sorted by score.

Unknown, October 1, 2010 at 7:57 PM
Hi tinti,
Thanks for your interest in the DocBook WebHelp project.
Currently, the only smartness it has is the stemming support and international language support. I suppose you know what I mean by this.
Yes, it would be great to have scoring for the search terms like Lucene. But it'll take a little more time. The limitation in our case is that the solution should be compatible with both Java and JavaScript, which is hard to achieve. Let's see.

BTW, I heard that the oXygen editor is implementing a search with scoring. If there's a chance to use it (no licensing issues!) and it is compatible with ours, we'll include it in WebHelp.
http://www.mail-archive.com/oxygen-user@oxygenxml.com/msg02757.html

Bruno, October 15, 2010 at 7:06 PM
Nice work.
Are you aware of a project that would allow editing DocBook via a web interface?
Because there are many export formats, but very few editors for DocBook.

Unknown, November 12, 2010 at 7:22 PM
Thanks for creating this nice DocBook output!

Duke, March 15, 2011 at 1:43 AM
Hi Kasun,
This is great. I've been trying it out.
Two questions:
* Search: Can it limit search result to a specific multi-word search like "Java 1.6"?
* Handling Includes and Entities with Saxon: My XML files use includes and entities. I can generate the files from the command line using Saxon, etc. as shown here: http://www.sagehill.net/docbookxsl/Xinclude.html. However, I'm wondering if you know of an easy way to modify the build.xml to handle these. Not sure if you can use Ant's xslt or java tags for this purpose.

Thanks for your help, and great job!

Anonymous, May 19, 2011 at 7:52 PM
To answer the question from Bruno V.: as someone who has been through testing all commercially available XML editors, my recommendation (for what it is worth) is to try the Oxygen XML editor. It's Java-based, so it is available on all platforms. As far as I'm concerned, that is the best DocBook editor available. YMMV

Benjamin, November 8, 2011 at 8:41 PM
First of all, thanks for your great work!
I've been playing around with DocBook and WebHelp. My problem is that when I click in the left navigation tree, sometimes #content runs way too far. That means the jump to the anchor doesn't work properly. Sometimes #content scrolls too much... Do you have a fix for that?
Thanks a lot, Benjamin

Unknown, November 9, 2011 at 4:02 PM
Hi Benjamin,
This is in fact fixed in the current DocBook trunk. Have a look at it. Specifically, add the first few lines of code just after the "$(document).ready(function()" line in template/common/main.js

https://docbook.svn.sourceforge.net/svnroot/docbook/trunk/xsl/webhelp/template/common/main.js

It's commented, so you won't be able to miss it.

Here's a preview of webhelp as of this moment - http://vulture.gentoo.org/~kasun/docbook/docbook-webhelp-snapshot-current/content/ch01.html

PS: I hope you will get my comment. I suggest that you and any newcomers subscribe to my comment feed so that you won't miss my replies. Alternatively, contact us via the docbook-apps list. http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook

Unknown, November 9, 2011 at 4:18 PM
And now we have word scoring for search. So a word in a title, a bold word, etc. will get a higher rank. The relevance of each search result is shown graphically on a 1-5 scale. It seems to work great! Contributed by the Oxygen guys.

tinti has asked about this a while ago, guess others will find this useful. We are rolling out a new Docbook release soon. So... stay tuned!
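
For readers wondering what this kind of scoring might look like, here is a minimal Python sketch. It is only an illustration under assumptions: the weights in `WEIGHTS`, the `score_page`, `to_stars`, and `rank` helpers, and the sample pages are all hypothetical, and this is not the code contributed for WebHelp.

```python
# Hypothetical weights: where a word occurs decides how much it counts.
WEIGHTS = {"title": 5, "bold": 3, "body": 1}

def score_page(occurrences):
    """Sum the weights of all places where the search word occurs on a page."""
    return sum(WEIGHTS[place] * count for place, count in occurrences.items())

def to_stars(score, best_score):
    """Map a raw score onto a 1-5 scale relative to the best-scoring page."""
    if best_score <= 0:
        return 1
    return max(1, round(5 * score / best_score))

def rank(pages):
    """Return (page, stars) pairs sorted from most to least relevant."""
    scores = {name: score_page(occ) for name, occ in pages.items()}
    best = max(scores.values(), default=0)
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return [(name, to_stars(scores[name], best)) for name in ordered]

RESULTS = rank({
    "ch01.html": {"title": 1, "body": 2},  # word appears in the title
    "ch02.html": {"body": 3},              # body text only
    "ch03.html": {"bold": 1},              # one bold occurrence
})
print(RESULTS)  # [('ch01.html', 5), ('ch02.html', 2), ('ch03.html', 2)]
```

A page with the word in its title ends up on top even though another page mentions the word more often in plain body text.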

Bill Burns (bburns@hp.com), November 30, 2011 at 1:52 AM
Hi, Kasun.

We have implemented your stemmers in our DocBook environment in a custom web-based help solution. We seem to be having a little trouble, though, with some of the terms that are returned by the tokenizer (always returning alwai, array returning arrai, using returning us, and similar problems). Can you point me to any current work that might resolve some of these issues?

Thanks,

Unknown, November 30, 2011 at 2:24 AM
Hi Bill,

Those tokens you mentioned are quite normal cases. That's because stemming does NOT always produce English words, and it doesn't need to. It only produces a word that can be treated as the root word. But this isn't supposed to be a problem, because the client-side stemmer stems the words in user queries as well.

Basically, the Java stemmer and the client-side JavaScript stemmer come to a common ground through these root words (like alwai etc.). This does NOT affect the user experience. But there could be some trivial errors here and there since there are minor incompatibilities between these two stemmers.

If you are still interested in having the real words in the index, you may start by looking at the link below. Currently, there isn't any plan to implement this in DocBook WebHelp because the current implementation is more than sufficient.

What's the rationale for the functionality you describe?

http://en.wikipedia.org/wiki/Stemming#Additional_algorithm_criteria
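
The "common ground" idea can be sketched in a few lines of Python. The `toy_stem` function below applies only three simplified Porter-style rules and is an assumption made for illustration, not the stemmer WebHelp ships; the page names and text are hypothetical too. What matters is that the same function runs at index time and at query time, so an odd-looking root such as "arrai" still matches.

```python
VOWELS = set("aeiou")

def has_vowel(stem):
    """True if the stem contains at least one plain vowel."""
    return any(ch in VOWELS for ch in stem)

def toy_stem(word):
    """Apply three simplified Porter-style rules; not a complete stemmer."""
    word = word.lower()
    # Drop a plural "s" (but leave "ss" endings and very short words alone).
    if word.endswith("s") and not word.endswith("ss") and len(word) > 2:
        word = word[:-1]
    # Drop "ing" when the remaining stem still contains a vowel.
    if word.endswith("ing") and has_vowel(word[:-3]):
        word = word[:-3]
    # Turn a terminal "y" into "i" when the stem contains another vowel.
    if word.endswith("y") and has_vowel(word[:-1]):
        word = word[:-1] + "i"
    return word

def build_index(pages):
    """Index side: store each page under the stems of its words."""
    index = {}
    for name, text in pages.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(name)
    return index

def search(index, query):
    """Query side: stem the query word the same way before looking it up."""
    return sorted(index.get(toy_stem(query), set()))

INDEX = build_index({
    "raid.html": "configuring disk arrays",
    "intro.html": "always using defaults",
})
print(toy_stem("always"), toy_stem("array"), toy_stem("using"))  # alwai arrai us
print(search(INDEX, "array"))  # ['raid.html']
```

The index never contains the word "array", only "arrai", yet a search for "array" or "arrays" finds the page because the query goes through the same rules.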

Bill Burns (bburns@hp.com), December 1, 2011 at 5:55 AM
Well, the problem is that in some cases, the search is missing important terms. I work with an organization that produces servers, so not being able to search on the term "array" is a problem. Some of the issues with gerunds and other fixed word categories aren't such a problem. However, we would just like to minimize the cases in which endings are arbitrarily removed. I don't see how -ay wound up on the list of postpositional morphemes to be removed. I can't think of any context in which it's used as an internal morphemic change. So the removal of this particular morpheme bugs me.... and makes me a little wary of other possible gotchas.

Overall, I'm happy with the performance. It was a great feature to add to DocBook XSL. Thanks.

Unknown, December 1, 2011 at 4:47 PM
I see. I've had a look at the issue. Based on the rationale made in the Tartarus site, -ay should be converted to -ai. It specifically says,

/* step2() turns terminal y to i when there is another vowel in the stem. */ at http://tartarus.org/martin/PorterStemmer/java.txt

The issue here is in the Java English stemmer, which apparently does not implement the specified algorithm correctly. It seems to me that the Tartarus Snowball generated code (which we use from http://snowball.tartarus.org/download.php) is a little different from the dedicated English stemmer (in the first link).

We had better reach out to the upstream Snowball project and use the most correct one.
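
To make the difference concrete, here is a small Python sketch of the terminal-y rule as I read the two published algorithm descriptions: the original Porter stemmer turns y into i when the stem contains a vowel, while the Snowball English (Porter2) description only does so when the y follows a non-vowel that is not the first letter. Both functions are simplified, hypothetical helpers covering this one step only, and the sketch does not settle which variant the WebHelp indexer actually runs.

```python
def porter_y_rule(word):
    """Original Porter step 1c, simplified: y -> i if the stem has a vowel."""
    stem = word[:-1]
    if word.endswith("y") and any(ch in "aeiou" for ch in stem):
        return stem + "i"
    return word

def porter2_y_rule(word):
    """Snowball English (Porter2) step 1c: y -> i only after a non-vowel
    that is not the first letter of the word."""
    if len(word) > 2 and word.endswith("y") and word[-2] not in "aeiouy":
        return word[:-1] + "i"
    return word

for sample in ("array", "alway", "happy", "say"):
    print(sample, porter_y_rule(sample), porter2_y_rule(sample))
```

Under the first rule "array" becomes "arrai"; under the second it stays "array", because the letter before the y is a vowel.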

Bill Burns (bburns@hp.com), December 1, 2011 at 7:44 PM
Thanks, Kasun. That helps a lot.

Unknown, December 1, 2011 at 10:54 PM
Great! :) I'll see what I can do from my end.

Paco, December 18, 2011 at 10:00 PM
Thanks for your work, Kasun. It's superb!

Let me ask something. Where can I find the DocBook features supported by your converter? I mean, does it support DocBook 4 or 5? Article or book? Does it support XInclude? I ask because I'm having some problems viewing the WebHelp output correctly when using a DocBook 5 book with XInclude and XML files spread across several directories...

Anonymous, December 19, 2011 at 10:43 AM
Hi Paco,

David has just committed support for XInclude to the build.xml that's in svn. You could have given it a try using the DocBook snapshots, but currently that is broken. A new release of DocBook-XSL (including WebHelp) will come anytime now. So, you may use it for your production purposes when it's available.

http://docbook.svn.sourceforge.net/viewvc/docbook/trunk/xsl/webhelp/build.xml?r1=9110&r2=9170

You could also just use Docbkx since xinclude is supported there too.

DocBook 4.x and 5.x are both supported already. If you're using 5.x,
the -ns- version of the xsls are better, but the regular ones will
work too by stripping the namespace first.

This answer was compiled with the help of David Cramer.

Paco, December 20, 2011 at 3:58 PM
Thank you, Kasun.

I have posted my problem in the Oxygen Forum (http://www.oxygenxml.com/forum/topic6528.html) and they noted that my book had no title. After including a title, everything went smooth.

By the way, I suppose that Oxygen is using your latest converter. Do you have any contact with them?

Thank you again for your great work.

Anonymous, December 20, 2011 at 6:03 PM
Good to hear that your issue is now fixed. Hope your experience with WebHelp goes smoothly.

Oxygen is using the latest _released_ version of the WebHelp XSLs. That means the development on trunk is not there. I will write another blog post when the current trunk gets released (with new snapshots!).

We are in contact with Oxygen; we worked with them on integrating this into OxygenXML. What do you have in mind?

Please share your experience with WebHelp with us as well when you are ready. It'll help us improve!

Paco, December 21, 2011 at 2:39 PM
We are trying to write the help for a piece of academic software. We have investigated several formats, and it seems that WebHelp is the most suitable for now.

I will let you know the project webpage as soon as it becomes available.

Unknown, December 21, 2011 at 2:43 PM
That'll be great. Keep an eye on the docbook-apps mailing list for the release of the new version as well.

Tony, January 24, 2012 at 12:47 AM
I'm new to DocBook and WebHelp, and I would like to know if it is possible to use DocBook+WebHelp to easily set up a "context-sensitive" help system for a web application using your contribution to the project.
Thanks

Anonymous, February 1, 2012 at 2:46 AM
Perhaps you could clarify something for me -- looking at the webhelp docs on sourceforge, it appears that to replace the jqueryui theme and/or positioning.css I'll need to copy webhelp.xsl and make the changes in that copy. Then I'll need to point to that new webhelp.xsl, as opposed to just setting a parameter somewhere. Is that correct?

johan162, February 22, 2012 at 3:30 AM
I'm converting all the docs for the Phing project (http://www.phing.info/trac/) and I was wondering about one thing. We have a fairly long index, and when you tap a title and the corresponding page is loaded in the right frame, all the TOC stuff in the left frame is scrolled to the top. It seems that the left frame is reloaded and shown from the beginning.

Is there any chance of getting the left frame to stay put and not reload/scroll to the top? It would make much more sense when navigating in a long document if you don't lose the position each time you click a TOC entry.

Otherwise this is really great. In Phing we used to have a home-grown solution that does the left/right toc/content frame stuff (but without the search)

Cheers
Johan

Unknown, March 16, 2012 at 8:11 PM
Hi,
I have an issue while compiling the indexer with WebHelp that I can't get rid of. The error is:

java.lang.NoClassDefFoundError: com/nexwave/nquindexer/IndexerMain
[java] Caused by: java.lang.ClassNotFoundException: com.nexwave.nquindexer.IndexerMain

I am using ant 1.8.2, jdk 1.6.0. In my Classpath I have: webhelpindexer.jar, lucene-analyzers-3.0.0.jar, lucene-core-3.0.0.jar, xercesImpl.jar, xml-apis.jar. And I am using the class com.nexwave.nquindexer.IndexerMain by using a java task for the implementation.

I would really appreciate it if you could help me with this because I have spent a hell of a lot of time without a result!

Riad

Samir Desai, November 22, 2013 at 4:26 PM
Do you plan to incorporate the following additional features as part of the current or a future plan?

Allow phrase searching as well as word matches
Support use of wildcards
Support for case-sensitive and case-insensitive Search
Standard Boolean Search using AND / OR/ NOT
Support for related terms or synonyms
Support for auto-complete for search terms
Enable proximity search (terms located near each other; e.g., within 2 words, not just exact matches)
Provide fuzzy AND (for ranking)

Robert Lauriston, January 7, 2014 at 12:21 AM
Has anyone implemented context-sensitive help with DocBook+WebHelp using a header / map file rather than hard-coded URLs? Or with some equivalent functionality that provides an intermediate step between the application and the help?

Note that a CSH map file is different from a DITA map file. Here's a good explanation of CSH map files:

http://www.webworks.com/Documentation/Reverb/index.html#page/03.Preparing%2520and%2520Publishing%2520Content/Preparing%2520DITA%2520Files.3.11.htm

Unknown, January 7, 2014 at 3:43 AM
I had this working for a while on Mac OS 10.8, then one day I got an error, "No more DTM IDs are available." There is a fix here:

http://habrahabr.ru/company/alawar/blog/193726/

Long story short, edit docbook-xsl/webhelp/build.xml and change

classpath="${xercesImpl.jar}"

to

classpath="${xslt-processor-classpath}"

Robert Lauriston, January 8, 2014 at 8:06 AM
Can anyone point me to the part of webhelp/xsl/webhelp-common.xsl that controls chunking? I can't relate what I find in this doc to the code I see in that file:

http://www.sagehill.net/docbookxsl/Chunking.html

I want to modify the XSL so that it chunks at instead of .

Unknown, March 14, 2014 at 2:49 PM
This comment has been removed by a blog administrator.

Unknown, April 10, 2014 at 4:13 PM
Does profiling for WebHelp work with the 1.76.1 style sheets? Is xhtml/profile-chunk.xml the right template to start with?

Redhuan D. Oon, January 24, 2015 at 1:09 AM
It took me 20 hours to render an entire project database (XML generated by a code crawler) and the end result blows me away (please see http://red1.org/idempiere). Many awesome kudos to you and your team, and I am willing to stand as a goodwill reference and ambassador for your wonderful work. (Feel free to quote me or point me to where I can post further reviews.) High ten!