prasadpav |
Posted: Wed Nov 17, 2010 5:26 pm Post subject: Joining multiple CDATA sections into one |
|
|
 Centurion
Joined: 03 Oct 2004 Posts: 142
|
Hi all,
I have a requirement to audit incoming messages and other key points within a series of message flows. While auditing, I store the input message (mostly XML) as a character string within a CDATA section of an audit message element. Ex:
Quote: |
<AuditMessage>
<MsgFromClient><![CDATA[<ClientMsg><Element1>Some data</Element1></ClientMsg>]]>
</AuditMessage> |
But sometimes the client message itself has CDATA sections within it. So I end up with nested CDATA sections, which XML does not allow. I overcome this by splitting the nested CDATA sections into individual CDATA sections, i.e. I check for ']]>' and replace it with ']]><![CDATA['. Ex:
If this is nested CDATA message:
Quote: |
<AuditMessage>
<MsgFromClient><![CDATA[<ClientMsg><Element1><![CDATA[Some data]]></Element1></ClientMsg>]]>
</AuditMessage> |
is converted to:
Quote: |
<AuditMessage>
<MsgFromClient><![CDATA[<ClientMsg><Element1><![CDATA[Some data]]><![CDATA[</Element1></ClientMsg>]]>
</AuditMessage>
|
A separate message flow reads these audit messages and loads the data into a database.
My questions are:
1) Is it possible to recombine the split CDATA sections into one? At the moment, if I use XMLNSC or XMLNS and read "InputBody.AuditMessage.MsgFromClient", then I get
Quote: |
<ClientMsg><Element1><![CDATA[Some data</Element1></ClientMsg> |
Which is not what I want. I want
Quote: |
<ClientMsg><Element1><![CDATA[Some data]]></Element1></ClientMsg> |
2) I have two workarounds which work, but I want to know if there is a more elegant way of dealing with this.
Quote: |
Workaround #1 - When I split the CDATA section, I include comments which I replace later. Ex: "<!-- SPLIT END -->]]><![CDATA[<!-- SPLIT END -->" |
Quote: |
Workaround #2 - Use XMLNSC opaque parsing and specify "AuditMessage/MsgFromClient" as the opaque element. But the value of "InputBody.AuditMessage.MsgFromClient" in this case is returned as "<![CDATA[<ClientMsg><Element1><![CDATA[Some data]]><![CDATA[</Element1></ClientMsg>]]>". I then replace the outer CDATA with blanks and inner ]]>< |
mqjeff |
Posted: Wed Nov 17, 2010 6:40 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Uhm.
I'd just base64 encode the whole message and stick that in the audit message. |
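A minimal sketch of that idea, written in Python rather than broker ESQL. The client message is Base64-encoded before it goes into the audit element, so it cannot collide with XML markup or CDATA delimiters. The function names `wrap_for_audit` and `unwrap_from_audit` are hypothetical; the element names come from the example in the first post.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_for_audit(client_msg: str) -> str:
    """Build an audit message whose payload is the Base64-encoded client message."""
    root = ET.Element("AuditMessage")
    payload = ET.SubElement(root, "MsgFromClient")
    payload.text = base64.b64encode(client_msg.encode("utf-8")).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unwrap_from_audit(audit_xml: str) -> str:
    """Recover the original client message from the audit message."""
    encoded = ET.fromstring(audit_xml).findtext("MsgFromClient")
    return base64.b64decode(encoded).decode("utf-8")


client_msg = "<ClientMsg><Element1><![CDATA[Some data]]></Element1></ClientMsg>"
audit_xml = wrap_for_audit(client_msg)
restored = unwrap_from_audit(audit_xml)
```

The trade-off is that the stored payload is only readable after decoding.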
|
prasadpav |
Posted: Wed Nov 17, 2010 7:07 pm Post subject: |
|
|
 Centurion
Joined: 03 Oct 2004 Posts: 142
|
The requirement for us is to make the message human readable. |
|
kimbert |
Posted: Thu Nov 18, 2010 2:31 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Your general approach is correct, but I think you have implemented it incorrectly. The algorithm should be:
- create a CDATA section extending from the start of the message to the start of the first nested CDATA section
- add the first nested CDATA section
- create a CDATA section extending from the end of the first nested CDATA section to the start of the second nested CDATA section
- add the second nested CDATA section
...
- create a CDATA section extending from the end of the final nested CDATA section to the end of the input message
So the correct result for your example would be:
Quote: |
<AuditMessage>
<MsgFromClient><![CDATA[<ClientMsg><Element1>]]><![CDATA[Some data]]><![CDATA[</Element1></ClientMsg>]]>
</AuditMessage> |
Quote: |
1) Is it possible to recombine the split CDATA sections into one? At the moment, if I use XMLNSC or XMLNS and read "InputBody.AuditMessage.MsgFromClient", then I get... |
If you correct your algorithm as suggested, you should get three CDATA sections under the MsgFromClient element. If you ask for the value of InputRoot.XMLNSC.AuditMessage.MsgFromClient you should get the concatenated content of all three CDATA sections. |
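The steps above could be sketched roughly like this, in Python rather than ESQL. It assumes the client message is well-formed XML, so ']]>' only ever appears as the end of a CDATA section. The function name `split_into_cdata_sections` is hypothetical, and the closing </MsgFromClient> tag is added so that the audit message parses.

```python
import re
import xml.etree.ElementTree as ET

# One complete nested CDATA section, delimiters included.
NESTED_CDATA = re.compile(r"(<!\[CDATA\[.*?\]\]>)", re.DOTALL)


def split_into_cdata_sections(client_msg: str) -> str:
    """Wrap the text around each nested CDATA section in its own CDATA section."""
    out = []
    # re.split with a capturing group returns text and matches alternately:
    # even indexes are plain text, odd indexes are nested CDATA sections.
    for index, piece in enumerate(NESTED_CDATA.split(client_msg)):
        if index % 2 == 1:
            out.append(piece)  # nested section: copy unchanged
        elif piece:
            out.append("<![CDATA[" + piece + "]]>")
    return "".join(out)


client_msg = "<ClientMsg><Element1><![CDATA[Some data]]></Element1></ClientMsg>"
payload = split_into_cdata_sections(client_msg)
audit_xml = "<AuditMessage><MsgFromClient>" + payload + "</MsgFromClient></AuditMessage>"

# A parser reports the concatenated content of the three sections.
value = ET.fromstring(audit_xml).findtext("MsgFromClient")
print(payload)
print(value)
```

Note that the delimiters of the nested section are consumed as markup by the parser, so the concatenated value does not contain them.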
|
prasadpav |
Posted: Thu Nov 18, 2010 6:55 am Post subject: |
|
|
 Centurion
Joined: 03 Oct 2004 Posts: 142
|
Thanks for correcting the algorithm. I read online that XML parsers look for ']]>' and that it is sufficient to replace this with ']]><![CDATA['. But clearly it didn't work in my case.
I tried your algorithm of replacing:
Quote: |
nested '<![CDATA[' with ']]><![CDATA[' |
and
Quote: |
the end tag of the above nested CDATA ']]>' with ']]><![CDATA[' |
After doing this, I get the result as:
Quote: |
<AuditMessage>
<MsgFromClient><![CDATA[<ClientMsg><Element1>]]><![CDATA[Some data]]><![CDATA[</Element1></ClientMsg>]]>
</AuditMessage> |
However, after parsing the output message in the XMLNS/XMLNSC domains, if I read the value of "InputBody.MsgFromClient.ClientMsg", I get the concatenated string of all CDATA sections (as you have said). But the original CDATA sections (i.e. the nested CDATAs) are not restored.
This is what I get:
Quote: |
<ClientMsg><Element1>Some data</Element1></ClientMsg> |
But this is what I'm expecting:
Quote: |
<ClientMsg><Element1><![CDATA[Some data]]></Element1></ClientMsg> |
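A commonly used variant of the replacement keeps the ']]>' itself by splitting it across two sections: replace ']]>' with ']]]]><![CDATA[>', so the ']]' ends up at the end of one section and the '>' at the start of the next. Inside a CDATA section the string '<![CDATA[' is ordinary text, so only the end marker needs this treatment. The Python sketch below (not ESQL) compares the two replacements. The function names are hypothetical, and the closing </MsgFromClient> tag is added so that the audit message parses.

```python
import xml.etree.ElementTree as ET


def naive_split(client_msg: str) -> str:
    """Replacement from the first post: the ']]>' text itself is lost."""
    return client_msg.replace("]]>", "]]><![CDATA[")


def escape_for_cdata(client_msg: str) -> str:
    """Split every ']]>' so that ']]' and '>' land in adjacent CDATA sections."""
    return client_msg.replace("]]>", "]]]]><![CDATA[>")


def audit_and_read_back(payload: str) -> str:
    """Wrap the payload in one outer CDATA section, parse, and read the value."""
    audit_xml = (
        "<AuditMessage><MsgFromClient><![CDATA["
        + payload
        + "]]></MsgFromClient></AuditMessage>"
    )
    return ET.fromstring(audit_xml).findtext("MsgFromClient")


client_msg = "<ClientMsg><Element1><![CDATA[Some data]]></Element1></ClientMsg>"
naive_result = audit_and_read_back(naive_split(client_msg))
escaped_result = audit_and_read_back(escape_for_cdata(client_msg))
print(naive_result)
print(escaped_result)
```

With this escaping the value read back from the parsed audit message is the original client message, nested CDATA markers included, so no marker comments are needed.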
|
fjb_saper |
Posted: Thu Nov 18, 2010 7:01 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
So in view of what you get, you can detach what is below element1, use ASBITSTREAM to create the BLOB. Cast it to CHAR and change the type of element1 to be a CDATA type, and finally assign your CHAR variable to the element1... right?
In other words when you assign the combined value of the CDATA sections did you first change the type of element1 to be a CDATA type?
 _________________ MQ & Broker admin |
|
prasadpav |
Posted: Thu Nov 18, 2010 7:45 am Post subject: |
|
|
 Centurion
Joined: 03 Oct 2004 Posts: 142
|
Quote: |
In other words when you assign the combined value of the CDATA sections did you first change the type of element1 to be a CDATA type? |
No, I did not change it. Quite frankly, it is not possible to assign the CDATA type to "element1", because by the time I assign the value, I have lost the information about which element was originally encased in a CDATA section. |
|
mqjeff |
Posted: Thu Nov 18, 2010 8:08 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
You really need to push back against the requirement that the message be human readable.
Anyone who is looking at these messages is going to be doing so through some kind of display program. This program can then easily incorporate presentation logic to make the contents understandable.
XML is not intended or designed to be human readable; it is intended to provide programs with a structured and programmatically readable format for data. |
|
kimbert |
Posted: Thu Nov 18, 2010 8:30 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
However, after parsing the output message in the XMLNS/XMLNSC domains, if I read the value of "InputBody.MsgFromClient.ClientMsg", I get the concatenated string of all CDATA sections (as you have said). But the original CDATA sections (i.e. the nested CDATAs) are not restored. |
Then do not simply 'read the value'. If the syntax element has child nodes of type 'PCDataValue' or 'CDataValue' then you should copy them over to the output, instead of just reading the value. |
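As a loose analogy only, here is the same distinction in Python's xml.dom.minidom, which is not the broker's message tree and whose node types do not correspond exactly to 'PCDataValue' and 'CDataValue'. Reading a concatenated value discards the section boundaries, whereas walking the child nodes keeps each CDATA section as a separate node that can be copied to the output. The closing </MsgFromClient> tag is added so that the message parses.

```python
from xml.dom import minidom

audit_xml = (
    "<AuditMessage><MsgFromClient>"
    "<![CDATA[<ClientMsg><Element1>]]>"
    "<![CDATA[Some data]]>"
    "<![CDATA[</Element1></ClientMsg>]]>"
    "</MsgFromClient></AuditMessage>"
)

doc = minidom.parseString(audit_xml)
element = doc.getElementsByTagName("MsgFromClient")[0]

# Each CDATA section is a child node of its own.
cdata_sections = [
    node.data
    for node in element.childNodes
    if node.nodeType == node.CDATA_SECTION_NODE
]

# Copying the child nodes, rather than reading a value, keeps the delimiters.
copied = "".join(node.toxml() for node in element.childNodes)
print(cdata_sections)
print(copied)
```

In this sketch all three children are CDATA nodes, so the copy preserves the sections but does not by itself say which one was nested in the client message.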
|
prasadpav |
Posted: Sat Nov 20, 2010 2:34 am Post subject: |
|
|
 Centurion
Joined: 03 Oct 2004 Posts: 142
|
Quote: |
Then do not simply 'read the value'. If the syntax element has child nodes of type 'PCDataValue' or 'CDataValue' then you should copy them over to the output, instead of just reading the value. |
I'm reading the value because I'm planning to store it in a database, so I have to read the value instead of doing a tree copy.
Quote: |
You really need to push back against the requirement that the message be human readable.
Anyone who is looking at these messages is going to be doing so through some kind of display program. |
I'll keep this in mind. At the moment I have "Workaround #1", which is working fine, but if I run into further problems I might go for the safer approach of storing the data as a BLOB or Base64 encoded.
Thanks for all your suggestions.
Prasad |
|