Ruby JavaDoc Doclets with JRuby

While sitting in front of the fireplace during long, dark winter nights, you have probably wondered, as most minds have at some point in their existence, or even dreamed: would humanity live to see the day? The day we come so close to ascension that we would be able to join the wonders of the JavaDoc API with the fun of Ruby.

Wait, what? No?

Well, in the Arquillian project we have some interesting ideas which lead to some interesting solutions. One of them is to automate documentation. One part of that automation is the "No Duplication" rule.

One of the issues that has been outstanding for some time is extracting the documentation found in the Java source code itself. This is especially interesting for the Configuration JavaBean objects for Containers and Extensions. While it would be easy enough to manually copy and paste the documentation from the source to the website, we would break the "No Duplication" rule. And let’s be honest, as this would not be the most desirable task for someone to do, the code and documentation would come out of sync faster than you can say: automation, automation, automation.

We need some way to automate it.

Parse the JavaDoc output

The obvious solution is to just configure Maven to run JavaDoc on the modules. Simple, they are published to Maven Central and we could fetch them from there. One problem is that the output of JavaDoc is HTML and not usable for what we want. And I don’t want to smoke whatever the person who created the HTML output did, because it’s close to impossible to parse anything sensible out of it.

And we want to avoid forcing build-time plugins on our community builds. Instead we want to do some post-processing of the repository data when we generate the website. This also allows us to update the data structure and parser as we please and simply rerun it on the old source.

Grab the Source

It felt like it would be a fairly simple task to just grab the Java source, run it through a few regexps and be done with it. This is how the backend data for the Reference Dictionary was extracted. Now there is a reason why this is still in staging, and not pushed upstream. The seemingly simple solution ended up very fragile and became an unmaintainable ball of mud with more holes and edge cases than Swiss cheese.

There seems to be some truth in this statement after all:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
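
To make the fragility a bit more concrete, here is a minimal sketch of the regexp approach. The pattern, the helper name extract_field_docs and the two sample snippets are made up for illustration; this is not the code behind the Reference Dictionary. It handles the plain case, then silently misses the same field as soon as the declaration picks up modifiers the pattern did not anticipate.

```ruby
# Naive extractor: a /** ... */ block followed by a simple field declaration.
# Pattern and sample sources are illustrative assumptions only.
JAVADOC_FIELD = %r{/\*\*(.*?)\*/\s*(?:private|public|protected)\s+[\w<>\[\]]+\s+(\w+)\s*[;=]}m

def extract_field_docs(source)
  source.scan(JAVADOC_FIELD).map do |comment, name|
    # Strip the leading " * " decoration from each comment line
    text = comment.lines
                  .map { |line| line.sub(/^\s*\*\s?/, '').strip }
                  .reject(&:empty?)
                  .join(' ')
    [name, text]
  end.to_h
end

SIMPLE = <<~JAVA
  /**
   * The Query parameter to echo
   */
  public String text;
JAVA

# Same comment, but "static final" is not something the pattern knows about
EDGE_CASE = <<~JAVA
  /**
   * The Query parameter to echo
   */
  public static final String TEXT_PARAM = "text";
JAVA

p extract_field_docs(SIMPLE)    # the field is found
p extract_field_docs(EDGE_CASE) # nothing is found, and nothing complains
```

Every such miss means another alternation in the pattern, and annotations, generics with spaces and comments inside strings are still waiting in line.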

Ruby and a Lexer

At some point using pure Ruby with a Java Language Lexer on the source code came up as an option. But the Lexer for the Java Language I could find in Ruby only parsed the Java Statements and ignored the bits I was looking for, the comments. That day I didn’t feel like creating a Lexer myself, so the idea was scrapped.

Java and JavaDoc

Another alternative is to rely on the JavaDoc framework and create special Doclets for what we need. The advantage here is that you leave all parsing and file handling to the JavaDoc framework. We used this strategy to extract the backing report text for the Container TCK Report. While this works, it doesn’t come without some problems of its own.

Our site generation tool, Awestruct, is based on Ruby. JavaDoc and Doclets are Java. While we could call out to the command line to execute the JavaDoc command and fetch the result, we would also need to compile the Java source of the Doclet and include it in the site generation somehow. This seemed like too much hassle.


Enter the beautiful world of JRuby.

I couldn’t find a way to dynamically load Ruby classes in Java using the normal Class.forName, which is what JavaDoc uses to load Doclets, but we can always just go down a level and create our own JavaDoc Starter class.

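require 'java'

module Java
  module Doc

    # NOTE: the fully qualified names below are the JDK-internal javac/javadoc
    # classes from tools.jar (JDK 6/7 era). They are reconstructed here and
    # may differ between JDK versions.
    def self.parse(source_path, &block)
      context = com.sun.tools.javac.util.Context.new

      options = com.sun.tools.javac.util.Options.instance context
      options.put '-sourcepath', source_path

      com.sun.tools.javadoc.Messager.preRegister context, "javadoc"
      tool = com.sun.tools.javadoc.JavadocTool.make0 context

      # Scan everything below the 'org' and 'com' packages
      sub_packages = com.sun.tools.javac.util.List.of 'org', 'com'
      options_list = com.sun.tools.javac.util.List.nil
      empty = com.sun.tools.javac.util.List.nil
      filter = com.sun.tools.javadoc.ModifierFilter.new(com.sun.tools.javadoc.ModifierFilter::ALL_ACCESS)

      root = tool.getRootDocImpl('en', 'utf-8', filter, empty, options_list, false, sub_packages, empty, false, false, false)

      # Hand the RootDoc to the caller's block, if one was given
      yield root if block
      return root
    end

  end
end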


API Usage

Call the method with the location of the source and an optional Ruby block. The RootDoc is passed to the block and also returned, and from there on you interact with the normal JavaDoc Doclet API.

require 'javadoc'

path = '/home/aslak/dev/source/testing/arquillian-tck/container/src/test/java/'

Java::Doc.parse path do |root|

  root.classes.each do |c|
    puts "= Class: #{c.name}"
    if c.comment_text.length > 0
      puts "****"
      puts c.comment_text
      puts "****"
    end

    c.fields.each do |f|
      puts "== Field: #{f.name}"
      if f.comment_text.length > 0
        puts "****"
        puts f.comment_text
        puts "****"
      end
    end

    c.methods.each do |m|
      puts "== Method: #{m.name}#{m.signature}"
      if m.comment_text.length > 0
        puts "****"
        puts m.comment_text
        puts "****"
      end
    end
  end
end

The only downside is that you need to add tools.jar to the CLASSPATH when executing the program. This is needed because the JavaDoc APIs ship with the JDK and are not part of the runtime JVM.

CLASSPATH=$JAVA_HOME/lib/tools.jar jruby test.rb
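
Forgetting the CLASSPATH shows up as a rather unfriendly class loading error, so a small guard at the top of the script can help. This is only a sketch: tools_jar_path is a hypothetical helper name, and it assumes JAVA_HOME points at a JDK that still ships lib/tools.jar. Under JRuby, requiring a jar file adds it to the classpath at runtime, which is what the last branch relies on.

```ruby
# Hypothetical helper: path to tools.jar under the given JDK home,
# or nil when it is not there (for example on a JRE-only install).
def tools_jar_path(java_home = ENV['JAVA_HOME'])
  return nil if java_home.nil? || java_home.empty?

  candidate = File.join(java_home, 'lib', 'tools.jar')
  File.file?(candidate) ? candidate : nil
end

jar = tools_jar_path
if jar.nil?
  warn 'tools.jar not found; point JAVA_HOME at a JDK that ships lib/tools.jar'
elsif RUBY_PLATFORM == 'java'
  # Only attempted under JRuby; plain Ruby cannot load a jar
  require jar
end
```

With the guard in place the script can be started with a plain jruby test.rb, as long as JAVA_HOME is set.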

Console Output

= Class: EchoServlet
Simple Servlet that echo the given @text@ request parameter
== Field: serialVersionUID
== Field: TEXT_PARAM
The Query parameter to echo
== Method: doGet(HttpServletRequest, HttpServletResponse)
Echo the given text.
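
Printing to the console is nice for a demo, but for the website we want data we can feed into the site generation. Below is a minimal sketch of that step. collect_docs is a hypothetical helper, not part of Awestruct or the Doclet API, and it only calls the methods already used above (classes, fields, methods, name, signature, comment_text). The Struct stand-ins exist purely so the sketch runs without JRuby; their values mirror the console output.

```ruby
# Hypothetical helper: turn anything that quacks like a RootDoc into
# plain Ruby hashes that the site generator can work with.
def collect_docs(root)
  root.classes.map do |c|
    {
      name: c.name,
      comment: c.comment_text,
      fields: c.fields.map { |f| { name: f.name, comment: f.comment_text } },
      methods: c.methods.map do |m|
        { name: "#{m.name}#{m.signature}", comment: m.comment_text }
      end
    }
  end
end

# Stand-ins for the Doclet objects (assumptions for the demo only)
Member = Struct.new(:name, :signature, :comment_text)
Klass  = Struct.new(:name, :comment_text, :fields, :methods)
Root   = Struct.new(:classes)

servlet = Klass.new(
  'EchoServlet',
  'Simple Servlet that echo the given @text@ request parameter',
  [Member.new('TEXT_PARAM', '', 'The Query parameter to echo')],
  [Member.new('doGet', '(HttpServletRequest, HttpServletResponse)', 'Echo the given text.')]
)

collect_docs(Root.new([servlet])).each { |doc| p doc }

# Under JRuby the same helper should work on the real thing:
#   Java::Doc.parse(path) { |root| collect_docs(root) }
```

Keeping the result as plain hashes means the rest of the site generation never has to know that JavaDoc was involved.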