Author |
Message
|
ruturajw |
Posted: Tue May 29, 2012 10:30 pm Post subject: How to handle unescaped control characters |
|
|
Newbie
Joined: 05 Jan 2010 Posts: 8
|
Hi,
I'm dealing with data that was migrated to a database from a legacy source. The data wasn't scrubbed during migration and now contains control characters, e.g. RS and SUB, without < > around them.
I've a message flow which reads from this database and presents as XML output. The XML parser throws an exception when it encounters this data. The error is BIP5117 - Text = XMLHandler::error reported from the Xerces parser.196.Null pointer.1.1170.Invalid character (Unicode: 0x1A).
I've tried casting using CCSID 1208 (UTF-8) but no luck.
I would appreciate any ideas you can offer.
Using WMB v7.0.0.3.
Cheers,
Ruturaj. |
smdavies99 |
Posted: Tue May 29, 2012 10:46 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
So the data you are working with isn't clean?
Think of it this way:
You are on a raft heading down the Niagara River towards the falls. No matter how hard you paddle and how close you get to shore, there is always a new obstacle getting in your way and stopping you from reaching safety.
RS & SUB? It sounds like the legacy data was in ASCII (possibly even 7-bit)
and was inserted into the DB in some random CCSID.
You could read the data and treat it as a blob, work out which bad (non-printable) characters are in it, and remove them. But unless you go through the whole DB you can never be sure that you have got everything. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
kimbert |
Posted: Wed May 30, 2012 12:51 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
I've a message flow which reads from this database and presents as XML output |
So what's wrong with the data that is in the database? It looks to me as if the data is fine - but your message flow is putting that data, including the illegal-for-XML characters, into an XML message. That's a problem that your message flow needs to solve.
I think you need to put some code into your message flow that checks each string before assigning it to OutputRoot.XMLNSC. Remove or replace any characters that are not legal for XML. |
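A minimal sketch of the kind of check described above, written in Python only for illustration (in the flow itself the same logic would go into ESQL or a JavaCompute node). The function name `strip_illegal_xml_chars` is made up for this example. The legal ranges are taken from the XML 1.0 `Char` production; both SUB (0x1A) and RS (0x1E) fall outside them.

```python
import re

# Anything NOT matched by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text, replacement=""):
    """Remove (or replace) every character that XML 1.0 does not allow.

    Hypothetical helper: call it on each string before it is assigned
    to the output XML tree.
    """
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    dirty = "ACME\x1a Ltd\x1e"
    print(repr(strip_illegal_xml_chars(dirty)))
    print(repr(strip_illegal_xml_chars(dirty, "?")))
```

Whether to drop the character or substitute a visible placeholder such as `?` depends on whether downstream consumers need to know that something was removed.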
ruturajw |
Posted: Wed May 30, 2012 7:29 pm Post subject: |
|
|
Newbie
Joined: 05 Jan 2010 Posts: 8
|
kimbert wrote: |
Quote: |
I've a message flow which reads from this database and presents as XML output |
So what's wrong with the data that is in the database? It looks to me as if the data is fine - but your message flow is putting that data, including the illegal-for-XML characters, into an XML message. That's a problem that your message flow needs to solve.
I think you need to put some code into your message flow that checks each string before assigning it to OutputRoot.XMLNSC. Remove or replace any characters that are not legal for XML. |
And that's what I'm struggling with, i.e. checking for illegal-for-XML characters. My last resort (I think) is to check whether each character lies in the a-z, A-Z etc. ranges and, if not, drop it. This is cumbersome and I'm not sure it will work. |
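A whitelist of a-z, A-Z would also throw away digits, punctuation and accented letters. A narrower test, sketched here in Python on the assumption that the target is XML 1.0, is to flag only code points outside the XML 1.0 `Char` ranges. The helper names are hypothetical. This version reports the offenders instead of dropping them, which may help when listing the affected rows first.

```python
def is_legal_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # TAB, LF, CR
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_xml_chars(text):
    """Return (index, code point) pairs for every illegal character."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(text)
        if not is_legal_xml_char(ch)
    ]


if __name__ == "__main__":
    # SUB (0x1A) and RS (0x1E) are reported; ordinary text is not.
    print(find_illegal_xml_chars("ab\x1acd\x1e"))
    print(find_illegal_xml_chars("Zo\u00eb, 10% off"))
```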
ruturajw |
Posted: Wed May 30, 2012 7:49 pm Post subject: |
|
|
Newbie
Joined: 05 Jan 2010 Posts: 8
|
smdavies99 wrote: |
You could read the data and treat it as a blob |
Hi, tried this and it failed too. Casting to blob throws an exception. |
smdavies99 |
Posted: Wed May 30, 2012 10:41 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
ruturajw wrote: |
smdavies99 wrote: |
You could read the data and treat it as a blob |
Hi, tried this and it failed too. Casting to blob throws an exception. |
What I meant was that you read the message as a BLOB, work on that to remove the bad characters, and then parse the result into something that can be output as XML. |