Author |
Message
|
sudaltsov |
Posted: Mon May 31, 2021 7:17 am Post subject: Bad chars : reason to fail |
|
|
Voyager
Joined: 02 May 2012 Posts: 82
|
Is there any way to force failure on FileInput node if the characters are not ASCII?
I set charset to 367 on that node in the message flow, I have default CCSID 367 in the message set - but still the file with non-ascii chars got read without exception (and has 367 in the message properties!).
Anything else I can do to cause the failure on bad chars? |
|
timber |
Posted: Wed Jun 02, 2021 3:23 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
A few questions:
- what do you mean by 'bad chars'?
- is your message flow a simple test flow which only tests this 'character validation' feature?
- what do you mean by 'message set'. Are you really using a message set? If yes, then which version of IIB are you using? |
|
sudaltsov |
Posted: Wed Jun 02, 2021 3:31 pm Post subject: |
|
|
Voyager
Joined: 02 May 2012 Posts: 82
|
1. Bad chars = Non-ASCII chars. Anything that is not in 367
2. No, it is a real flow used in production. It failed on non-ASCII chars late in processing - and I want it to detect that case as early as possible, preferably in the FileInput node
3. Yes Message Set, IIB v10. Why does it matter? Not everything is DFDL yet.
Thank you |
|
abhi_thri |
Posted: Wed Jun 02, 2021 9:36 pm Post subject: |
|
|
 Knight
Joined: 17 Jul 2017 Posts: 516 Location: UK
|
sudaltsov wrote: |
2. No, it is a real flow used in production. It failed on non-ASCII chars late in processing - and I want it to detect that case as early as possible, preferably in the FileInput node
|
hi...the fact that non-ASCII chars are ending up in the file suggests that the source system is using a different CCSID. Ideally you should use the source system's CCSID in whichever input node receives the message, then think about how best to convert it to the target system's CCSID. Conversion works fine in some directions (e.g. ASCII 367 to UTF-8 1208), but it will not work the other way around for any character that has no ASCII equivalent.
So you need to think of different approaches:
- get the source system to send the data in ASCII, which may not be practical as they will have the same challenge at their end.
- let messages with non-ASCII chars fail, as is happening now, with some manual process in place to correct/resend the data.
- as you are already using message sets, rely on the MRM parser feature of replacing any unsupported char with a substitution character (e.g. X'1A').
- put some explicit translation logic in place via ESQL to convert chars from the source to the target code page (not the cleanest option, as it may not cover all chars). |
|
sudaltsov |
Posted: Thu Jun 03, 2021 12:40 am Post subject: |
|
|
Voyager
Joined: 02 May 2012 Posts: 82
|
abhi_thri wrote: |
- let messages with non-Ascii chars fail like what is happening now with some manual process in place to correct/resend the data.
|
That is exactly what we need. But that does not answer my question - how to make it fail early, preferably on the FileInput node, with the help of the message set. It seems there is no way... |
|
abhi_thri |
Posted: Thu Jun 03, 2021 2:25 am Post subject: |
|
|
 Knight
Joined: 17 Jul 2017 Posts: 516 Location: UK
|
sudaltsov wrote: |
That is exactly what we need. But that does not answer my question - how to make it fail early, preferably on the FileInput node, with the help of the message set. It seems there is no way... |
hi...yes, I don't think there is a way to force a failure at the FileInput node itself, but why does it matter if it fails at the point of parsing to the target message?
Source system --> Receive message in source CCSID --> Transform --> Parse/convert to target CCSID (fails here) --> Target system
You are not really saving any time by doing extra processing to detect the issue up front, as processing shouldn't take that long unless you are dealing with very large files. Also, if the input message had non-ASCII chars only in fields that are not used for the target mapping, wouldn't it be better to let it through? So I would leave it as-is and put some process in place to deal with the error. |
|
sudaltsov |
Posted: Thu Jun 03, 2021 2:29 am Post subject: |
|
|
Voyager
Joined: 02 May 2012 Posts: 82
|
abhi_thri wrote: |
hi...yes, don't think there is way to force a failure at the FileInput node itself but |
That is what I think too. I just wanted to spare a bit of esql code but that is not so critical. |
|
abhi_thri |
Posted: Thu Jun 03, 2021 2:42 am Post subject: |
|
|
 Knight
Joined: 17 Jul 2017 Posts: 516 Location: UK
|
sudaltsov wrote: |
That is what I think too. I just wanted to spare a bit of esql code but that is not so critical. |
hi...have you tried setting Parse timing to Immediate or Complete on the FileInput node, with the CCSID set to 367? |
|
sudaltsov |
Posted: Thu Jun 03, 2021 2:44 am Post subject: |
|
|
Voyager
Joined: 02 May 2012 Posts: 82
|
abhi_thri wrote: |
sudaltsov wrote: |
That is what I think too. I just wanted to spare a bit of esql code but that is not so critical. |
hi...have you tried setting Parse timing to Immediate or Complete on the FileInput node, with the CCSID set to 367? |
Of course. That was my first idea. It parsed the entire message. And in the debugger I see something surprising: the properties show 367, yet the characters in the message are non-ASCII (using a bad file to test). |
|
timber |
Posted: Thu Jun 03, 2021 7:16 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Just a thought...if I take this literally
Quote: |
Source system --> Receive message in source CCSID --> Transform --> Parse/convert to target CCSID (fails here) --> Target system |
...then no parsing happens until after you have done the Transform stage. What format is your input message, and what domain are you using in the FileInput node? |
|
rekarm01 |
Posted: Sun Jun 06, 2021 1:12 pm Post subject: Re: Bad chars : reason to fail |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
sudaltsov wrote: |
Is there any way to force failure on FileInput node if the characters are not ASCII? |
No. The FileInput node just copies bytes from the file to assemble the source message. Some downstream component has to trigger a conversion from the source ccsid to a target ccsid. For example, the MRM parser could do this, when constructing character elements from the source message.
sudaltsov wrote: |
I set charset to 367 on that node in the message flow, I have default CCSID 367 in the message set - but still the file with non-ascii chars got read without exception (and has 367 in the message properties!). |
Please post one or more small excerpts from a Trace node, illustrating this.
The broker may rely on the underlying platform to convert data, which could either try to preserve round trip integrity, (from source to target and back), or irreversibly replace characters with similar characters, (for example, stripping diacritical marks from alphabetical characters), or substitute a generic character to indicate loss, such as a SUB control character, instead of throwing an exception. The message flow may not always have direct control over how strict or lax the platform is when converting data.
sudaltsov wrote: |
It failed on non-ASCII chars late in processing - and I want it to detect that case as early as possible, preferably in FileInput node |
So, it does fail then; just not soon enough? Please post the complete error message.
sudaltsov wrote: |
Anything else I can do to cause the failure on bad chars? |
One simple option is to parse the source message body as a BLOB, use the ESQL TRANSLATE() function to delete bytes with 7-bit values, and if any bytes remain, then generate an error. |
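A minimal sketch of that TRANSLATE() check, for illustration only. It assumes the input node is switched to the BLOB domain (so the payload is at InputRoot.BLOB.BLOB) and that TRANSLATE() accepts BLOB operands as described above; the module name, the variable names, and the exception text are hypothetical.

```esql
-- Hypothetical Compute node module: rejects any message that contains
-- a byte outside the 7-bit range X'00' to X'7F'.
CREATE COMPUTE MODULE RejectNonAscii_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- All 128 seven-bit byte values, written out as eight 16-byte rows
    DECLARE sevenBit BLOB
         X'000102030405060708090A0B0C0D0E0F'
      || X'101112131415161718191A1B1C1D1E1F'
      || X'202122232425262728292A2B2C2D2E2F'
      || X'303132333435363738393A3B3C3D3E3F'
      || X'404142434445464748494A4B4C4D4E4F'
      || X'505152535455565758595A5B5C5D5E5F'
      || X'606162636465666768696A6B6C6D6E6F'
      || X'707172737475767778797A7B7C7D7E7F';

    -- With no replacement argument, TRANSLATE removes every byte
    -- that appears in sevenBit, so only non-ASCII bytes are left
    DECLARE leftover BLOB TRANSLATE(InputRoot.BLOB.BLOB, sevenBit);

    IF LENGTH(leftover) > 0 THEN
      -- Message number and text are illustrative
      THROW USER EXCEPTION MESSAGE 2951
        VALUES('Input contains non-ASCII bytes', LENGTH(leftover));
    END IF;

    SET OutputRoot = InputRoot;
    RETURN TRUE;
  END;
END MODULE;
```

Because the check works on raw bytes, it does not depend on how the platform's code page converter handles characters it cannot map.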
|
sudaltsov |
Posted: Tue Jun 08, 2021 7:34 am Post subject: |
|
|
Voyager
Joined: 02 May 2012 Posts: 82
|
timber wrote: |
What format is your input message, and what domain are you using in the FileInput node? |
FileInput is using MRM domain, Text format |
|
sudaltsov |
Posted: Tue Jun 08, 2021 7:38 am Post subject: Re: Bad chars : reason to fail |
|
|
Voyager
Joined: 02 May 2012 Posts: 82
|
rekarm01 wrote: |
So, it does fail then; just not soon enough? Please post the complete error message.
|
It fails later, when trying to insert that data into a database that cannot handle those characters. I would like it to fail earlier...
What actually works is this small ESQL:
Code: |
SET contents = ASBITSTREAM(InputRoot.MRM
    CCSID InputRoot.Properties.CodedCharSetId
    ENCODING InputRoot.Properties.Encoding
    SET InputRoot.Properties.MessageSet
    TYPE InputRoot.Properties.MessageType
    FORMAT InputRoot.Properties.MessageFormat);
-- Validation - fails if not ASCII
DECLARE l_BodyAsChar CHAR CAST(contents AS CHAR CCSID InputRoot.Properties.CodedCharSetId);
|
That generates an exception. |
|
rekarm01 |
Posted: Tue Jun 08, 2021 2:41 pm Post subject: Re: Bad chars : reason to fail |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
sudaltsov wrote: |
What actually works is this small ESQL:
Code: |
SET contents = ASBITSTREAM(InputRoot.MRM
    CCSID InputRoot.Properties.CodedCharSetId
    ENCODING InputRoot.Properties.Encoding
    SET InputRoot.Properties.MessageSet
    TYPE InputRoot.Properties.MessageType
    FORMAT InputRoot.Properties.MessageFormat);
-- Validation - fails if not ASCII
DECLARE l_BodyAsChar CHAR CAST(contents AS CHAR CCSID InputRoot.Properties.CodedCharSetId);
|
That's probably the easiest way to force a conversion exception, then. The parse timing shouldn't matter for ASBITSTREAM() though, since it reads the whole message anyway; on-demand parsing should work just as well.
The ASBITSTREAM() function also has an EmbeddedBitStream option, which can do the same thing, but more concisely:
Code: |
-- Validation - fails if not ASCII
DECLARE l_BodyAsChar CHAR
    CAST(ASBITSTREAM(InputBody OPTIONS EmbeddedBitStream)
         AS CHAR CCSID InputProperties.CodedCharSetId);
|
|
sudaltsov |
Posted: Wed Jun 09, 2021 7:47 am Post subject: Re: Bad chars : reason to fail |
|
|
Voyager
Joined: 02 May 2012 Posts: 82
|
rekarm01 wrote: |
That's probably the easiest way to force a conversion exception, then. The parse timing shouldn't matter for ASBITSTREAM() though, since it reads the whole message anyway; on-demand parsing should work just as well.
The ASBITSTREAM() function also has an EmbeddedBitStream option, which can do the same thing, but more concisely:
Code: |
-- Validation - fails if not ASCII
DECLARE l_BodyAsChar CHAR
    CAST(ASBITSTREAM(InputBody OPTIONS EmbeddedBitStream)
         AS CHAR CCSID InputProperties.CodedCharSetId);
|
Thank you, I will try that too! |
|