SAX parser – nasty behaviour

The other day an office colleague was hunting a strange error in his SAX-parsing class: every now and then the data he got in his endElement() method was truncated, resulting in conversion problems.

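A few searches revealed that the SAX API has a nasty behaviour: it does not guarantee that all the text of an element is delivered in a single call to characters(). The parser may split contiguous character data into several chunks, and buffering them is left to the client application.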

Instead of relying on complete data being delivered to the characters method, my colleague had to buffer the data himself. Using a bit of sample code it was an easy fix… but you first have to realize that this is the cause of an otherwise seemingly unrelated problem…

This page shows the sample code that we recycled:

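package some.pkg;
// Fragment of a handler class (e.g. one extending DefaultHandler); textBuffer is a StringBuffer field.
public void characters(char buf[], int offset, int len)
throws SAXException
{
  // The parser may deliver one run of text in several calls, so append each chunk.
  String s = new String(buf, offset, len);
  if (textBuffer == null) {
    textBuffer = new StringBuffer(s);
  } else {
    textBuffer.append(s);
  }
}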

The textBuffer can be reset (set back to null) in the startElement method, so that endElement() sees the complete text of the current element.
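
To show how the pieces fit together, here is a minimal sketch of a complete handler, assuming a simple document where text only appears in leaf elements. The class name BufferingHandler, the values list and the parse helper are made up for illustration and are not part of the sample code we recycled; only the buffering idea is the same.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every element that contains text.
class BufferingHandler extends DefaultHandler {
  private StringBuffer textBuffer;
  private final List<String> values = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    // A new element starts: discard whatever was buffered before.
    textBuffer = null;
  }

  @Override
  public void characters(char[] buf, int offset, int len) {
    // May be called several times for one run of text, so always append.
    if (textBuffer == null) {
      textBuffer = new StringBuffer();
    }
    textBuffer.append(buf, offset, len);
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    // Only now is the text guaranteed to be complete.
    if (textBuffer != null) {
      values.add(textBuffer.toString());
      textBuffer = null;
    }
  }

  // Hypothetical convenience method: parses an XML string and returns the collected texts.
  static List<String> parse(String xml) {
    try {
      BufferingHandler handler = new BufferingHandler();
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
      return handler.values;
    } catch (Exception e) {
      throw new IllegalStateException("Parsing failed", e);
    }
  }
}
```

Calling BufferingHandler.parse(...) on a small document returns one string per text-carrying element, no matter how many characters() calls the parser needed for it. Entity references such as &amp; are a typical place where a parser may split the text, so they make a handy test case.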