[clug] Watercooler - was Open Source Developers in CBR

Brenton Ross rossb at fwi.net.au
Sun Jan 17 05:29:18 UTC 2021


An update to this thread we started a few months back.

For the last few months I have been working on another part of my grand
project. One of the components that it will probably require is a
"natural language generator" to turn some small fragments of an RDF
knowledge base into something a bit more English-like.
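
To give a rough idea of the kind of conversion meant here, the sketch below turns one subject/predicate/object statement into a sentence. The Triple struct and the realise() function are invented for this illustration; they are not taken from the Java library or from the RDF library, and a real generator also has to deal with agreement, tense, articles and so on.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for one RDF statement, with the URIs already
// reduced to plain labels.
struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Join the three parts into a sentence: capitalise the first letter
// and add a full stop. No grammar handling at all.
std::string realise(const Triple& t) {
    std::string s = t.subject + " " + t.predicate + " " + t.object + ".";
    s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
    return s;
}
```

Calling realise() on a triple such as {"the cat", "chases", "the mouse"} simply glues the labels together, which is about as far from Shakespeare as it gets.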

I found a Java library that seems to be a good fit. It won't generate
Shakespeare but is probably good enough for what I have in mind. Of
course I had to have a look at its internals since it was immediately
obvious that it was going to need a few enhancements. Two things became
clear: First I was going to have to study it very closely to understand
how to extend it, and secondly the authors had created a Java program
in the style of a C program - completely ignoring things like virtual
functions.
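
As a minimal sketch of what using virtual functions can look like for this sort of code, the classes below let each element type realise itself instead of having one function test a type tag and branch. The names Element, Word and Phrase are assumptions made up for the example and are not the classes of the Java library or of the rewrite.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every element knows how to produce its text.
class Element {
public:
    virtual ~Element() = default;
    virtual std::string realise() const = 0;
};

// A leaf element that holds a single word.
class Word : public Element {
public:
    explicit Word(std::string text) : text_(std::move(text)) {}
    std::string realise() const override { return text_; }
private:
    std::string text_;
};

// A composite element: realises its children in order, separated by spaces.
class Phrase : public Element {
public:
    void add(std::unique_ptr<Element> child) {
        children_.push_back(std::move(child));
    }
    std::string realise() const override {
        std::string out;
        for (const auto& c : children_) {
            if (!out.empty()) out += ' ';
            out += c->realise();
        }
        return out;
    }
private:
    std::vector<std::unique_ptr<Element>> children_;
};
```

With this shape, adding a new kind of element means adding a class, not hunting down every place that switches on the element type.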

Hence a C++ rewrite ensued. Rewriting would force me to get a much
better understanding of how it worked, and I could use a much more
conventional object oriented design while preserving the language
processing logic. After more time than I really wanted to spend I now
have it working so that it produces the same output as the Java version
for about 400 test cases. Those enhancements will come later, but
should be easy enough to add.

If you want to know more about what an NLG does, do a web search for
SimpleNLG. My version doesn't do all that the Java one does, just the
core functionality.

My C++ RDF library seems to be working quite nicely. It has acquired a
couple of interesting enhancements - submodels and a catalogue. The
catalogue is a small RDF database of all the RDF databases used in a
project. It allows you to disconnect all the details of a database from
the programs that use it. Hence it no longer matters if the database is
in a file, on the web, or stored in a relational database.
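
The idea of the catalogue can be sketched roughly as a lookup from the name a program uses to the details of where the data really lives. This is only an illustration under assumptions: the actual catalogue is itself an RDF database, whereas the sketch uses a std::map, and the names Catalogue, StorageInfo, add() and lookup() are invented here and are not the library's API.

```cpp
#include <map>
#include <string>
#include <utility>

// Where a database lives. These fields are invented for the sketch.
struct StorageInfo {
    std::string kind;      // e.g. "file", "web" or "sql"
    std::string location;  // path, URL or connection string
};

// Maps the name a program uses to the details of the actual storage.
class Catalogue {
public:
    void add(const std::string& name, StorageInfo info) {
        entries_[name] = std::move(info);
    }
    // Returns the storage details, or nullptr if the name is unknown.
    const StorageInfo* lookup(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : &it->second;
    }
private:
    std::map<std::string, StorageInfo> entries_;
};
```

A program would then ask only for a name such as "parts", and moving that database from a file to a web server would mean changing the catalogue entry, not the program.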

The second enhancement is submodels. This allows knowledge bases to be
decomposed into smaller reusable components - something that makes RDF
a lot more useful. It was not a simple enhancement since submodels are
not supported by the underlying Redland C library.
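
One way to picture submodels is a model that holds its own statements plus references to other models, with queries looking through both. The sketch below is an assumption about the general idea only; it does not use Redland and does not show how the library actually implements submodels, and the Model class with its add(), include() and contains() functions is hypothetical.

```cpp
#include <memory>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// A statement as (subject, predicate, object), using plain labels.
using Statement = std::tuple<std::string, std::string, std::string>;

class Model {
public:
    // Add a statement to this model itself.
    void add(const Statement& s) { statements_.insert(s); }

    // Attach another model as a reusable component of this one.
    void include(std::shared_ptr<const Model> sub) {
        submodels_.push_back(std::move(sub));
    }

    // True if the statement is in this model or in any included submodel.
    // The sketch assumes the submodels do not form a cycle.
    bool contains(const Statement& s) const {
        if (statements_.count(s) > 0) return true;
        for (const auto& sub : submodels_) {
            if (sub->contains(s)) return true;
        }
        return false;
    }
private:
    std::set<Statement> statements_;
    std::vector<std::shared_ptr<const Model>> submodels_;
};
```

A shared model of, say, units of measurement could then be included by several project models without copying its statements into each of them.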

Next task is to create the interface from RDF to the NLG. I am starting
by extending the test harness I created for testing NLG so that it
generates the RDF test data.

Links:
https://sourceforge.net/p/ocratato-sassy/nlg/code/ 

https://sourceforge.net/p/ocratato-sassy/rdfxx/code/


Brenton






