Author |
Message
|
EricCox |
Posted: Thu May 03, 2012 8:15 am Post subject: Finding Low Values in Invalid XML |
|
|
Master
Joined: 08 Apr 2011 Posts: 292
|
To all,
I'm finding low values in some XML which I believe may be preventing the successful INSERT'ion into an oracle XMLTYPE column via SPROC.
Here is what the original developer has coded:
Code: |
DECLARE bBody BLOB ASBITSTREAM (InputRoot.XML CCSID
InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);
--Getting the Message body and Convert in to Character Format
SET MESSAGE_CONTENT_TXT = CAST (bBody AS CHARACTER CCSID InputRoot.Properties.CodedCharSetId
ENCODING InputRoot.Properties.Encoding);
SET Environment.Variable.OriginalMessage = MESSAGE_CONTENT_TXT; --v1.1 --Overwrite
CALL CALLDB(SERVICE_TYPE_DESC,CREATE_BY_NM,CREATE_BY_TS,LAST_UPDATE_BY_NM,LAST_UPDATE_BY_TS,
MESSAGE_TYPE_DESC,REQUEST_UU_ID,EMPLOYEE_NUM,SEQUENCE_NUM,REVERSAL_SEQUENCE_NUM,
TRANSACTION_TS,DEBIT_ACCOUNT_ID,CREDIT_ACCOUNT_ID,MESSAGE_CONTENT_TXT); |
I'm wondering what the effect of the first two statements will be, if that is correct and/or if it could be producing the .. = 46 46 = 2E 2E in hex which is a numeric value in hex and may be preventing the INSERT.
What do you guys think about the way he coded the first two statements to prepare the data to be inserted?
Could this be the culprit?
Thanks,
EMC |
|
Back to top |
|
 |
kimbert |
Posted: Thu May 03, 2012 10:22 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
The effect of the first two statements is
1. Get hold of the original bitstream as a BLOB. Assuming that the message tree has not yet been changed by the flow, this will simply return the bytes that arrived on the input queue.
2. Convert the bytes to a stream of characters.
The resulting stream of characters should be identical to the original XML document, so I don't think this will cause any problems. My guess is that the incoming XML really does contain these characters.
On an unrelated subject, but still quite important:
If you are modifying this message flow then you should consider switching to a non-deprecated XML domain. XMLNSC is the recommended choice, but if you're moving from the old XML domain then XMLNS is a valid choice, and somewhat simpler for migration. |
|
Back to top |
|
 |
mqjeff |
Posted: Thu May 03, 2012 10:24 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
kimbert wrote: |
On an unrelated subject, but still quite important:
If you are modifying this message flow then you should consider switching to a non-deprecated XML domain. XMLNSC is the recommended choice |
And if you switch to XMLNSC and enable validation on the input, you will immediately know if the incoming message has bad characters, as they won't be processed and instead backed out. |
|
Back to top |
|
 |
kimbert |
Posted: Thu May 03, 2012 12:33 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
And if you switch to XMLNSC and enable validation on the input, you will immediately know if the incoming message has bad characters |
That's true. But it's not dependent on validation. If the characters are not legal for XML 1.0 then the document is not well-formed. XMLNS would reject it, and XMLNSC would reject it even if validation was not enabled. |
|
Back to top |
|
 |
EricCox |
Posted: Fri May 04, 2012 5:45 am Post subject: |
|
|
Master
Joined: 08 Apr 2011 Posts: 292
|
The original XML transaction and Msg Flow works. If that .. occured in that main flow the transaction would fail with invalid xml. But it doesn't it works fine. That Msg Flow puts the original request to a queue. That logging queue then has a seperate flow that picks up the msg and INSERTs it the message to a logging table via a Stored Proc. When that logging flow tries to put occasionally we are getting the .. in the XML. |
|
Back to top |
|
 |
mqjeff |
Posted: Fri May 04, 2012 5:49 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
So you might be running into issues where the XML domain, rather than the XMLNSC domain, is improperly keeping line endings in the data.
So, again, if you switch to properly using the XMLNSC domain in all your flows, you may find that this error goes away. |
|
Back to top |
|
 |
EricCox |
Posted: Fri May 04, 2012 6:06 am Post subject: |
|
|
Master
Joined: 08 Apr 2011 Posts: 292
|
Out of 218 messages waiting to be logged 11 have the spurious .. and don't get logged to the Oracle XMLType column. |
|
Back to top |
|
 |
mqjeff |
Posted: Fri May 04, 2012 6:48 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Yeah, so look at the messages on the queue using amqsbcg and see if there are bad characters in the message there.
More accurately, see if you can identify what the bad characters are. |
|
Back to top |
|
 |
EricCox |
Posted: Fri May 04, 2012 7:20 am Post subject: |
|
|
Master
Joined: 08 Apr 2011 Posts: 292
|
The bad characters show as two ..
When I look at it in a hex editor I see 2E 2E which is hex for 46 46 and is visible as ..
This is not a valid value in XML. XML can only hold data values of string/character. |
|
Back to top |
|
 |
rekarm01 |
Posted: Sat May 05, 2012 12:51 pm Post subject: Re: Finding Low Values in Invalid XML |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
EricCox wrote: |
I'm finding low values in some XML which I believe may be preventing the successful INSERT'ion into an oracle XMLTYPE column via SPROC. |
What does "low values" mean? For which version of WMB? Not all versions support the Oracle XMLTYPE datatype. Or does the stored procedure handle the conversion to XMLTYPE?
EricCox wrote: |
Code: |
--Getting the Message body and Convert in to Character Format
SET MESSAGE_CONTENT_TXT = CAST (bBody AS CHARACTER CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding); |
|
A usertrace would display the contents of MESSAGE_CONTENT_TXT. Does it have the ".."?
EricCox wrote: |
When I look at it in a hex editor I see 2E 2E which is hex for 46 46 and is visible as ..
This is not a valid value in XML. XML can only hold data values of string/character. |
"." is a character, and ".." is a string. It is a legal XML character. |
|
Back to top |
|
 |
mqsiuser |
Posted: Mon May 07, 2012 12:53 am Post subject: Re: Finding Low Values in Invalid XML |
|
|
 Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
|
You should really switch from "XML" domain to "XMLNSC"... there are strange things happening on the "XML" domain (where we used to have non intuitive workarounds for).
rekarm01 wrote: |
What does "low values" mean? |
You mean low "ASCII" values, so 46 is "low" within the range from 0 to 255 !?... then the 'problematic' "low" control characters are between 0 and 31.
rekarm01 wrote: |
EricCox wrote: |
Code: |
--Getting the Message body and Convert in to Character Format
SET MESSAGE_CONTENT_TXT = CAST (bBody AS CHARACTER CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding); |
|
A usertrace would display the contents of MESSAGE_CONTENT_TXT. Does it have the ".."? |
"." and ".." are suspicious in the context of codepage problems. If e.g. RFH-Util can't display a byte (as ASCII... since it would always just display ASCII, independent of CCSID and encoding) it displays a dot "." (thats for bytes between 0 and 31).
rekarm01 wrote: |
EricCox wrote: |
When I look at it in a hex editor I see 2E 2E which is hex for 46 46 and is visible as ..
This is not a valid value in XML. XML can only hold data values of string/character. |
"." is a character, and ".." is a string. It is a legal XML character. |
I guess somehow the original (unknown by a/the codepage converter) low (or control, e.g. LF (10), CR (13), ETX (3), but also DEL(127)) characters where replaced by "." and/or high (e.g. ä ö ü in UTF-8) by ".." and they actually managed to make it into the message (replacing the original character(s)).
Probably its the (CR and) LF which occur in between the tags... then the solution is: Put your xml into a single line :-)
So the OP's problem (as mqjeff points to) might be that the XML looks like
<xml>
<tag1>data</tag1>
</xml>
And the improperly "converted":
<xml>..<tag1>data</tag1>..</xml>
So "." and ".." are valid strings, but not (necessarily) if they have been created by replacing "CR" and "LF" with "." ("..") in between the xml-tags.
There certainly/should also be/are also other approaches/solutions: E.g. (probably) write a "replace"-function on CHAR-level (not on xml-level) to fix the xml: e.g. replace ">..<" with "><". _________________ Just use REFERENCEs |
|
Back to top |
|
 |
EricCox |
Posted: Mon May 07, 2012 5:31 am Post subject: Need to confirm Oracle Error on INSERT |
|
|
Master
Joined: 08 Apr 2011 Posts: 292
|
Oracle is reporting a PK Violation when trying to INSERT from the Stored Proc. Today I am going to confirm with the DBA through a Trace that these characters are the culprit of the messages not being INSERT'ed.
We'll go from there once we confirm the true nature of the root cause the SPROC can't INSERT the row.
I agree with all you guys are saying. My thoughts also go to developing a routine to replace/remove these characters.
Thanks for all your thoughts. |
|
Back to top |
|
 |
EricCox |
Posted: Mon May 07, 2012 5:57 am Post subject: Seeing the dots |
|
|
Master
Joined: 08 Apr 2011 Posts: 292
|
I was copying an error for someone and thought I'd show you the way I see it:
<?xml version="1.0"?>..<abc-env:Envelope xmlns="http:// |
|
Back to top |
|
 |
mqjeff |
Posted: Mon May 07, 2012 6:00 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
And that looks like someone built an xml document like
Code: |
<?xml version="1.0"?>
<abc-env:Envelope xmlns="http:// |
Notice the line feed?
And then whatever has processed or shown you this document has converted the <CR> and <LF> to "..".
And I'm also not sure that it's legal to have mixed content outside the root of the document. |
|
Back to top |
|
 |
mqsiuser |
Posted: Mon May 07, 2012 6:11 am Post subject: |
|
|
 Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
|
mqjeff wrote: |
And I'm also not sure that it's legal to have mixed content outside the root of the document. |
Its not allowed: "Content is not allowed in prolog."
"<xml>..<tag1>data</tag1>..</xml>" is actually valid ("mixed content")
But: Both are probably not with intention. _________________ Just use REFERENCEs |
|
Back to top |
|
 |
|