Ruby JavaDoc Doclets with JRuby

Parse the JavaDoc output

The obvious solution is to just configure Maven to run JavaDoc on the modules. Simple: they are published to Maven Central and we could fetch them from there. One problem is that the output of JavaDoc is HTML and not usable for what we want. And I don’t want to smoke whatever the person who created the HTML output did, because it’s close to impossible to parse anything sensible out of it.

And we want to avoid forcing build-time plugins on our community builds. Instead we want to do some post-processing of the repository data when we generate the Arquillian.org website. This also allows us to update the data structure and parser as we please and simply rerun it on the old source.

Grab the Source

It felt like it would be a fairly simple task to just grab the Java source, run it through a few regexps and be done with it. This is how the backend data for the Reference Dictionary was extracted. Now there is a reason why this is still in staging, and not pushed upstream. The seemingly simple solution ended up very fragile and became an unmaintainable ball of mud with more holes and edge cases than Swiss cheese.

There seems to be some truth in this statement after all:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
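
To make the fragility concrete, here is a minimal sketch of the kind of pattern one might start with. The regexp, the method name and the two Java snippets are made up for illustration and are not the code used for the Reference Dictionary. The pattern handles the plain case, but a single annotation between the comment and the class declaration is enough to make it miss the class entirely.

```ruby
# Naive pattern (hypothetical): a JavaDoc comment directly followed by a
# public class declaration.
NAIVE_CLASS_DOC = %r{/\*\*(.*?)\*/\s*public\s+class\s+(\w+)}m

# Returns { name:, comment: } for the first match, or nil when nothing matches.
def naive_class_doc(source)
  match = NAIVE_CLASS_DOC.match(source)
  return nil unless match

  # Strip the leading " * " decoration from each comment line
  { name: match[2], comment: match[1].gsub(/^\s*\*\s?/, '').strip }
end

SIMPLE = <<~JAVA
  /**
   * Echoes a request parameter.
   */
  public class EchoServlet extends HttpServlet {
  }
JAVA

# Same class, but with an annotation between the comment and the declaration.
ANNOTATED = <<~JAVA
  /**
   * Echoes a request parameter.
   */
  @WebServlet("/echo")
  public class EchoServlet extends HttpServlet {
  }
JAVA

simple_doc    = naive_class_doc(SIMPLE)     # finds name and comment
annotated_doc = naive_class_doc(ANNOTATED)  # nil: the annotation breaks the match
```

Every fix for a case like this (annotations, modifiers, interfaces, nested classes) adds another branch to the pattern, which is how the ball of mud grows.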

Ruby and a Lexer

At some point using pure Ruby with a Java Language Lexer on the source code came up as an option. But the Lexer for the Java Language I could find in Ruby only parsed the Java Statements and ignored the bits I was looking for, the comments. That day I didn’t feel like creating a Lexer myself, so the idea was scrapped.

Java and JavaDoc

Another alternative is to rely on the JavaDoc framework and create special Doclets for what we need. The advantage here is that you leave all parsing and file handling to the JavaDoc framework. We used this strategy to extract the backing report text for the Container TCK Report. While this works, it doesn’t come without some problems of its own.

Our site generation tool, Awestruct, is based on Ruby. JavaDoc and Doclets are Java. While we could call out to the command line to execute the JavaDoc command and fetch the result, we would also need to compile the Java source and include it in the site generation somehow. This seemed like too much hassle.

RubyDoclets

Enter the beautiful world of JRuby.

I couldn’t find a way to dynamically load Ruby classes in Java using the normal Class.forName, which is what JavaDoc uses to load Doclets, but we can always just go down a level and create our own JavaDoc Starter class.

require 'java'

module Java
  module Doc

    # Parses the Java source under source_path and returns the resulting
    # RootDoc. The RootDoc is also passed to the block, if one is given.
    def self.parse(source_path, &block)
      context = com.sun.tools.javac.util.Context.new

      # Tell the tool where to look for source files
      options = com.sun.tools.javac.util.Options.instance context
      options.put '-sourcepath', source_path

      com.sun.tools.javadoc.Messager.preRegister context, "javadoc"
      tool = com.sun.tools.javadoc.JavadocTool.make0 context

      # Scan the 'org' and 'com' package trees, including non-public members
      sub_packages = com.sun.tools.javac.util.List.of 'org', 'com'
      options_list = com.sun.tools.javac.util.List.nil
      empty = com.sun.tools.javac.util.List.nil
      filter = com.sun.tools.javadoc.ModifierFilter.new com.sun.tools.javadoc.ModifierFilter::ALL_ACCESS

      root = tool.getRootDocImpl('en', 'utf-8', filter, empty, options_list, false, sub_packages, empty, false, false, false)

      # Hand the RootDoc to the caller's block, then return it as well
      block.call(root) if block
      return root
    end

  end
end

API Usage

Call the method with the location of the source and an optional Ruby block. The block receives the RootDoc, and from there on you can interact with the normal JavaDoc Doclet API.

require 'javadoc'

path = '/home/aslak/dev/source/testing/arquillian-tck/container/src/test/java/'

Java::Doc.parse path do |root|

  root.classes.each do |c|
    puts "= Class: #{c.name}"
    if c.comment_text.length > 0
      puts "****"
      puts c.comment_text
      puts "****"
    end

    c.fields.each do |f|
      puts "== Field: #{f.name}"
      if f.comment_text.length > 0
        puts "****"
        puts f.comment_text
        puts "****"
      end
    end

    c.methods.each do |m|
      puts "== Method: #{m.name}#{m.signature}"
      if m.comment_text.length > 0
        puts "****"
        puts m.comment_text
        puts "****"
      end
    end
  end
end
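
Since the goal is to post-process the data rather than print it, the same traversal can just as well collect the comments into a plain Ruby structure. The following is only a sketch: the helper name `documented` and the `FakeDoc` stand-in are hypothetical, and it assumes the objects yielded by `root.classes`, `c.fields` and `c.methods` respond to `name` and `comment_text` as they do in the example above.

```ruby
# Hypothetical helper: maps each documented element to its comment text.
# `docs` is any collection that supports each_with_object and whose
# elements respond to `name` and `comment_text`.
def documented(docs)
  docs.each_with_object({}) do |doc, result|
    text = doc.comment_text.to_s.strip
    # Elements without a comment are skipped
    result[doc.name] = text unless text.empty?
  end
end

# Stand-in for a Doc object so the sketch runs without JRuby or tools.jar.
FakeDoc = Struct.new(:name, :comment_text)

fields = [
  FakeDoc.new('serialVersionUID', ''),
  FakeDoc.new('TEXT_PARAM', 'The Query parameter to echo')
]

field_docs = documented(fields)
```

Inside the `Java::Doc.parse` block this would be called as `documented(c.fields)` or `documented(c.methods)`, replacing the repeated length checks.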

Execution

The only downside is that you need to add tools.jar to the CLASSPATH when executing the program. This is needed because the JavaDoc APIs are not part of the runtime JVM.

CLASSPATH=$JAVA_HOME/lib/tools.jar jruby test.rb
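
If JAVA_HOME points at a JRE rather than a JDK, the jar will simply not be there and the failure shows up later as a missing class. A small guard can catch this early. This is a sketch only: the helper name `tools_jar_path` is made up, and it assumes the same `lib/tools.jar` layout as the command above.

```ruby
# Hypothetical helper: returns the path to tools.jar under the given JDK
# home, or nil when java_home is unset or the file does not exist.
def tools_jar_path(java_home)
  return nil if java_home.nil? || java_home.empty?

  candidate = File.join(java_home, 'lib', 'tools.jar')
  File.file?(candidate) ? candidate : nil
end

jar = tools_jar_path(ENV['JAVA_HOME'])
if jar
  # Print the command line to use for running the script
  puts "CLASSPATH=#{jar} jruby test.rb"
else
  warn 'tools.jar not found; check that JAVA_HOME points to a JDK'
end
```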

Console Output

= Class: EchoServlet
****
Simple Servlet that echo the given @text@ request parameter
****
== Field: serialVersionUID
== Field: TEXT_PARAM
****
The Query parameter to echo
****
== Method: doGet(HttpServletRequest, HttpServletResponse)
****
Echo the given text.
****