Discussion:
How to write UTF-8 encoded XML data to XML column?
(too old to reply)
Nigel Charman
2005-07-19 09:29:25 UTC
Permalink
I am having trouble writing XML data to an XML column using a
PreparedStatement, when the XML data contains the XML declaration
<?xml version="1.0" encoding="UTF-8"?>.

As an example, the following java code succeeds:

Statement stmt = con.createStatement();
stmt.execute("Insert into xmltab VALUES('<?xml version=\"1.0\"
encoding=\"UTF-8\"?><root><a/></root>')");

while this version fails:

PreparedStatement ps = con.prepareStatement("Insert into xmltab
VALUES(?)");
ps.setString(1, "<?xml version=\"1.0\"
encoding=\"UTF-8\"?><root><a/></root>");
ps.execute();

with the exception:
com.microsoft.sqlserver.jdbc.SQLServerException: XML parsing: line 1,
character 38, unable to switch the encoding


The PreparedStatement succeeds if either:
i) the encoding in the XML declaration is set to "UTF-16", or
ii) the encoding attribute in the XML declaration is omitted (in my
understanding, the XML document encoding should then default to UTF-8.
It seems strange that I am seeing different results between having no
encoding attribute and having the encoding set to UTF-8).


In the bigger picture, I am trying to read the XML data from file, or
other source, and would like to be able to pass the XML data as a stream
to the PreparedStatement. I am getting the same "unable to switch the
encoding" exception when using PreparedStatement.setCharacterStream()
method, and am getting the exception "Operand type clash: image is
incompatible with xml" when using PreparedStatement.setBinaryStream(
).

Can anyone shed any light on how I can write UTF-8 encoded XML data to
SQL Server when using a PreparedStatement?


I am using the SQL Server Express June CTP with default setup, and the
JDBC driver with the URL
"jdbc:sqlserver://serverName/trustedAuthentication=true".

Regards,
Nigel Charman
Nigel Charman
2005-07-22 12:14:44 UTC
Permalink
In the absence of any response, how can I file this as a potential fault
with the JDBC driver?

It appears that there may be 2 faults:

1) The exception "XML parsing: line 1, character 38, unable to switch
the encoding" when writing UTF-8 data to the database using prepared
statements.

2) XML documents with no BOM and no explicit encoding is being treated
differently than documents with an explicit encoding of UTF-8. (This is
inferred since data with an explicit encoding fails and data with no
encoding succeeds).
Michael Rhys covers a similar issue in his posting "Heads up on upcoming
change to passing UTF-16 encoded XML via SNCLI and ADO.Net" on
microsoft.private.sqlserver2005.xml.

Regards
Nigel
Post by Nigel Charman
I am having trouble writing XML data to an XML column using a
PreparedStatement, when the XML data contains the XML declaration
<?xml version="1.0" encoding="UTF-8"?>.
Oliver Geisser
2005-07-25 14:48:15 UTC
Permalink
Hi.
Post by Nigel Charman
I am having trouble writing XML data to an XML column using a
PreparedStatement, when the XML data contains the XML declaration
<?xml version="1.0" encoding="UTF-8"?>.
Statement stmt = con.createStatement();
stmt.execute("Insert into xmltab VALUES('<?xml version=\"1.0\"
encoding=\"UTF-8\"?><root><a/></root>')");
PreparedStatement ps = con.prepareStatement("Insert into xmltab
VALUES(?)");
ps.setString(1, "<?xml version=\"1.0\"
encoding=\"UTF-8\"?><root><a/></root>");
ps.execute();
com.microsoft.sqlserver.jdbc.SQLServerException: XML parsing: line 1,
character 38, unable to switch the encoding
[...]

I can't give you any help with your specific JDBC driver problem.
But maybe I can give you a hint regarding XML encoding.

IMHO you are mixing two layers - character encoding and strings.

If you have bytes (byte[] or InputStream) you will need to decode
the bytes to get text, e.g. a String.

If you have a String then there is nothing like an "encoding".
OK, under the hoods the JVM encodes every String in memory as UTF-16
but this is an implementation detail.

So something like:

String foo = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

is a contradiction. You allready have String (= sequence of unicode
characters) and do not need an encoding any more.

The XML Specification defines the encoding as a help for XML parsers
which parses _binary_ data. There a a lot of XML parser APIs which allow
to parse Strings, but in most cases the behaviour is undefined or
simply broken.

So to be sure I would allways recommend to not parse XML Strings.

Back to the JDBC problem. It seems that the driver is parsing the string
value and it stumbles over the exact problem - a mismatch between the
XML declaration and the String (or the internal String encoding).

Have you tried to use a byte[] with the text UTF-8 encoded instead of
the String?

Ciao, Olli
Nigel Charman
2005-09-05 22:53:54 UTC
Permalink
Post by Oliver Geisser
Hi.
Thanks for your response. Sorry I've been off the air for a while.
Post by Oliver Geisser
Have you tried to use a byte[] with the text UTF-8 encoded instead of
the String?
This would be my preferred option, but the driver fails with the
exception "Operand type clash: image is incompatible with xml" when
using PreparedStatement.setBinaryStream().

Cheers
Nigel

Loading...