longng
Posted: Mon Mar 04, 2013 8:24 pm Post subject: TDS Fixed Length message but it's not fixed length!
Apprentice
Joined: 22 Feb 2013 Posts: 42
I've come across a rather perplexing problem and may have to pursue it with a PMR. In the meantime, I just want to know whether anyone has come across the same thing.
For simplicity, I have trivialized the actual issue, but it may still be a bit complex for some readers. You've been warned!
Here's an XML input that has a single element containing text in German:
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<Data1Msg>
<Data1>
<E1>Software Maint. fÃ¼r GPFS</E1>
</Data1>
</Data1Msg>
|
Just in case, here are the contents of the E1 element in hex (I purposely inserted a space between every four bytes for visual convenience), 27 bytes in total:
Code: |
"536F6674 77617265 204D6169 6E742E20 66C383C2 BC722047 504653"
|
Someone who knows German may be able to tell me the meaning of the above, but I am not particularly interested in the meaning right now.
Again, the above data is 27 bytes long.
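As a quick cross-check of that count (a standalone Python sketch using only the standard library, not part of the flow), decoding the posted hex as UTF-8 shows that the 27 bytes hold only 25 characters. The two extra bytes come from the characters U+00C3 ('Ã') and U+00BC ('¼') sitting where 'ü' was expected:

```python
# Hex of the E1 element as posted above (spaces are only for readability)
E1_HEX = "536F6674 77617265 204D6169 6E742E20 66C383C2 BC722047 504653"

raw = bytes.fromhex(E1_HEX)   # bytes.fromhex skips the spaces between groups
text = raw.decode("utf-8")    # read the bytes as UTF-8, as the XML declaration says

print(len(raw))     # -> 27 (bytes)
print(len(text))    # -> 25 (characters)
print(ascii(text))  # -> 'Software Maint. f\xc3\xbcr GPFS'
```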
An input message, Data1InMsg, and an output message, Data1OutMsg, have been defined (WMB v7 with a message set!) for the above data, using the same data type:
Code: |
Data1Type
** sequence of TDS Data Element Separation=Fixed Length
Data1 (Local complexType) Render=XMLElement, XML Name = 'Data1'
{Local complexType} TDS Data Element Separation = Fixed Length
E1 Render=XMLElement, XML Name = 'E1', TDS Length = 40
|
To test the above definition, I have a flow with:
1. An MQInput node with Input Message Parsing set to the XML1 format of the above definition
2. A Compute node, fed by the MQInput node, that simply sets the output to the TDS format of the input XML (note the explicit CodedCharSetId setting of 1208, i.e. UTF-8):
Code: |
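-- Serialize the output using the Text1 (TDS) format of the message set
SET OutputRoot.Properties.MessageSet = 'DEG1BMK002001';
SET OutputRoot.Properties.MessageType = 'Data1OutMsg';
SET OutputRoot.Properties.MessageFormat = 'Text1';
-- Write the output bitstream in UTF-8 (CCSID 1208)
SET OutputRoot.MQMD.CodedCharSetId = 1208;
-- Copy the single field across; TDS then pads it to its fixed length
SET OutputRoot.MRM.Data1.E1 = InputRoot.MRM.Data1.E1;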
|
3. An MQOutput node, just for the sake of completeness.
Running the above flow with the above input produces the following output:
Code: |
00000000 Software Maint. 536F6674 77617265 204D6169 6E742E20
00000016 f├â┬╝r GP 66E2949C C3A2E294 ACE2959D 72204750
00000032 FS 46532020 20202020 20202020 202020
|
If you're counting, then that's 47 bytes against the TDS Length of 40 as in the definition!
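The 47 can be reproduced with simple arithmetic under two assumptions: that the value being serialized is the 27-character string visible in the dump above, and that the 40 is counted in characters. This Python sketch only mimics that padding rule; it is not the TDS parser:

```python
# Field value as the output dump shows it: the garbled 'ü' has become the four
# characters U+251C, U+00E2, U+252C, U+255D, so the value is 27 characters long.
value = "Software Maint. f\u251c\u00e2\u252c\u255dr GPFS"
TDS_LENGTH = 40  # TDS Length from the message definition

padded = value.ljust(TDS_LENGTH)   # pad with spaces to 40 *characters*
encoded = padded.encode("utf-8")   # then serialize as UTF-8 (CCSID 1208)

print(len(value), len(padded), len(encoded))  # -> 27 40 47
```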
Using the same data definitions and the same flow, but substituting 'pure' English characters (X's in this case) for every German character in the input:
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<Data1InMsg>
<Data1>
<E1>Software MaintXXXXXXXGPFS</E1>
</Data1>
</Data1InMsg> |
I now get the expected output:
Code: |
00000000 Software MaintXX 536F6674 77617265 204D6169 6E745858
00000016 XXXXXGPF S 58585858 58475046 53202020 20202020
00000032 20202020 20202020 |
The above is exactly 40 bytes long, as specified in the TDS definition!
Even with the trivial example above, I hope you will agree that the contents of the fixed-length message have been shifted. Imagine that this single field is part of a big message with many fields: every field defined after it would be shifted as well.
Please share your input!
rekarm01
Posted: Tue Mar 05, 2013 2:42 am Post subject: Re: TDS Fixed Length message but it's not fixed length!
Grand Master
Joined: 25 Jun 2008 Posts: 1415
longng wrote: |
Here's an XML input that has a single element containing text in German
Code: |
...<E1>Software Maint. fÃ¼r GPFS</E1>... |
|
This isn't really German. It's garbled text, most likely due to a bad input ccsid (windows-1252?). It should probably look more like:
Code: |
...<E1>Software Maint. für GPFS</E1>... |
If the input message is UTF-8, the input ccsid should be, too.
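The windows-1252 theory is easy to check with Python's standard codecs (exactly how RFHUtil and the clipboard were involved is an assumption). Encoding the intended text as UTF-8, misreading those bytes as windows-1252, and encoding the result as UTF-8 again reproduces the 27-byte hex value from the first post exactly:

```python
correct = "Software Maint. f\u00fcr GPFS"    # what the text should be ("für")

utf8_bytes = correct.encode("utf-8")          # 'ü' -> C3 BC
misread = utf8_bytes.decode("cp1252")         # C3 BC misread as 'Ã' '¼'
double_encoded = misread.encode("utf-8")      # 'Ã' '¼' -> C3 83 C2 BC

print(ascii(misread))                         # -> 'Software Maint. f\xc3\xbcr GPFS'
print(double_encoded.hex(" ", -4).upper())    # -> 536F6674 77617265 204D6169 6E742E20 66C383C2 BC722047 504653
```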
longng wrote: |
An input message, Data1InMsg, and an output message, Data1OutMsg, have been defined (WMB v7 with a message set!) for the above data, using the same data type:
Code: |
Data1Type
** sequence of TDS Data Element Separation=Fixed Length
Data1 (Local complexType) Render=XMLElement, XML Name = 'Data1'
{Local complexType} TDS Data Element Separation = Fixed Length
E1 Render=XMLElement, XML Name = 'E1', TDS Length = 40 |
|
TDS Length = 40 what? bytes? characters? "Fixed Length" depends on how the message set defines the element's 'Length Units'.
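To make the distinction concrete, here is a Python sketch of the two possible meanings of a length of 40. `pad_fixed` and its `units` argument are hypothetical names used only for illustration, not part of the message set or the broker API. The two interpretations agree only when every character encodes to a single byte:

```python
def pad_fixed(value: str, length: int, units: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: space-pad a text field to a fixed length.

    units="characters" counts characters before encoding;
    units="bytes" counts bytes after encoding.
    Values that are already too long are returned unpadded (no overflow handling).
    """
    if units == "characters":
        return value.ljust(length).encode(encoding)
    if units == "bytes":
        return value.encode(encoding).ljust(length, b" ")
    raise ValueError(f"unknown units: {units!r}")


ascii_value = "Software MaintXXXXXXXGPFS"      # the all-X test input
german_value = "Software Maint. f\u00fcr GPFS"  # correctly decoded "für"

for v in (ascii_value, german_value):
    print(len(pad_fixed(v, 40, "characters")), len(pad_fixed(v, 40, "bytes")))
# -> 40 40   (every character is one byte)
# -> 41 40   ('ü' takes two bytes in UTF-8)
```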
longng wrote: |
Running the above flow with the above input produces the following output:
Code: |
00000000 Software Maint. 536F6674 77617265 204D6169 6E742E20
00000016 f├â┬╝r GP 66E2949C C3A2E294 ACE2959D 72204750
00000032 FS 46532020 20202020 20202020 202020
|
|
The 'ü' just keeps growing ... this looks like another bad input ccsid (msdos-437?) |
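The growth can also be reproduced, assuming the double-encoded bytes were read as IBM437 (Python codec `cp437`) at some point: each of the four bytes C3 83 C2 BC becomes its own character, and those four characters take 11 bytes once written out as UTF-8, which is exactly the run between 'f' (66) and 'r' (72) in the output dump:

```python
# 'f', the four double-encoded bytes C3 83 C2 BC, and 'r' (from the input hex)
double_encoded = bytes.fromhex("66 C383C2BC 72")

as_cp437 = double_encoded.decode("cp437")   # IBM437: every byte becomes one character
print(ascii(as_cp437))                       # -> 'f\u251c\xe2\u252c\u255dr'

rewritten = as_cp437.encode("utf-8")         # what a UTF-8 (1208) output then contains
print(rewritten.hex(" ", -4).upper())        # -> 66E2949C C3A2E294 ACE2959D 72
```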
kimbert
Posted: Tue Mar 05, 2013 2:45 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Your first big mistake is to use MRM XML in a new message flow. I would be interested to know why you have done this.
You have provided lots of useful info, but we need more (as always). What CCSID does the input message use? What CCSID does the output message use? What is 'Length Units' set to in the TDS message definition for this field?
longng
Posted: Tue Mar 05, 2013 5:53 am Post subject:
Apprentice
Joined: 22 Feb 2013 Posts: 42
@rekarm01 & @kimbert: It comes down to me being sloppy: the combination of RFHUtil and Windows cut & paste messed up the actual contents of the field. At least I provided the hex values!
Anyway, after posting the original query, I tinkered further with the message definition and was able to get the expected output of 40 BYTES:
Code: |
00000000 Software Maint. 536F6674 77617265 204D6169 6E742E20
00000016 für GPF S 66C3BC72 20475046 53202020 20202020
00000032 20202020 20202020
|
The expected output above was achieved by setting the field's Length Units to Bytes instead of Characters. If this is how things work, then I am reluctantly OK with the changes that need to be made. On the other hand, our legacy message sets have hundreds (if not thousands) of fields that would need to be changed to accommodate languages other than just English and German! @kimbert: I may have indirectly answered your question about the rationale for using MRM!
Technically, I believe the parser should truncate the data to conform to the definition, throw an exception indicating that the length has been exceeded, or both. It's a fixed-length setting after all. As things currently stand, it's not acceptable for the parser simply to grow the field beyond its setting and shift everything else out. I will open a PMR.
For defensive programming, it sounds like we should set Length Units to Bytes regardless of whether the data is binary or character. Care to comment?
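For reference, the behaviour asked for here (reject or truncate rather than grow) can be sketched in Python. `write_fixed_bytes` and its `on_overflow` parameter are hypothetical, and this is not how the TDS parser behaves. It counts the length in bytes and, when truncating, drops whole characters so that a multi-byte UTF-8 sequence is never cut in half:

```python
def write_fixed_bytes(value: str, length: int, on_overflow: str = "error",
                      encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: encode `value` into exactly `length` bytes, space-padded.

    Assumes an ASCII-compatible encoding, so a space pad is one byte.
    on_overflow="error"    -> raise ValueError if the encoded value is too long
    on_overflow="truncate" -> drop trailing characters until the encoded value fits
    """
    if on_overflow not in ("error", "truncate"):
        raise ValueError(f"unknown on_overflow: {on_overflow!r}")
    data = value.encode(encoding)
    if len(data) > length and on_overflow == "error":
        raise ValueError(f"field needs {len(data)} bytes, only {length} allowed")
    # Remove whole characters (never single bytes), so no UTF-8 sequence is split
    while len(data) > length:
        value = value[:-1]
        data = value.encode(encoding)
    return data.ljust(length, b" ")


print(write_fixed_bytes("f\u00fcr", 4))                        # -> b'f\xc3\xbcr'
print(write_fixed_bytes("\u00fc\u00fc\u00fc", 5, "truncate"))  # -> b'\xc3\xbc\xc3\xbc '
```

Truncating at a raw byte offset instead would, in the 'üüü' case, leave half of a 'ü' (a lone C3 byte) at the end of the field.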
kimbert
Posted: Tue Mar 05, 2013 6:00 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote: |
Technically, I believe the parser should truncate the data to conform to the definition, throw an exception indicating that the length has been exceeded, or both. It's a fixed-length setting after all. As things currently stand, it's not acceptable for the parser simply to grow the field beyond its setting and shift everything else out. I will open a PMR. |
I suggest that you put the PMR on hold. The MRM parser is doing *exactly* what you have asked. The output is 40 *characters* in length. The problem is that characters do not always occupy a fixed number of bytes. Your COBOL application was probably originally designed for single-byte EBCDIC characters, in which case the distinction between characters and bytes would not matter. It is now being expected to handle UTF-8 data, and it is breaking. This is not IBM's problem - it is a problem that crops up continually all over the world when programmers fail to take into account the facts explained here: http://www.joelonsoftware.com/articles/Unicode.html |
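A small Python illustration of that point (standard codecs only; IBM037 is used here as a representative single-byte EBCDIC code page, which is an assumption about the original COBOL environment): in EBCDIC one character is one byte, while in UTF-8 a character takes one to four bytes:

```python
word = "f\u00fcr"  # "für"

# A single-byte EBCDIC code page (IBM037, Python's "cp037") vs UTF-8
for codec in ("cp037", "utf-8"):
    print(codec, len(word), len(word.encode(codec)))
# -> cp037 3 3
# -> utf-8 3 4

# In UTF-8 a single character can take anywhere from 1 to 4 bytes
for ch in ("A", "\u00fc", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")))
# -> U+0041 1
# -> U+00FC 2
# -> U+20AC 3
# -> U+1F600 4
```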
kimbert
Posted: Tue Mar 05, 2013 6:01 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
I note that you still have not given any good reason why you are using the MRM parser to output XML. |
Vitor
Posted: Tue Mar 05, 2013 6:05 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
longng wrote: |
@kimbert: I may have indirectly answered your question about the rationale for using MRM! |
Not to me. So you've got a lot of legacy message sets; that's nice. What's that got to do with not using XMLNSC?
_________________
Honesty is the best policy.
Insanity is the best defence.
longng
Posted: Tue Mar 05, 2013 7:13 am Post subject:
Apprentice
Joined: 22 Feb 2013 Posts: 42
Vitor wrote: |
longng wrote: |
@kimbert: I may have indirectly answered your question about the rationale for using MRM! |
Not to me. So you've got a lot of legacy message sets; that's nice. What's that got to do with not using XMLNSC? |
I hear you, I hear you! Who am I to argue? Sometime in the past (before my time), it was decided to have a common message flow serve as a single entry point to a portfolio of hundreds of downstream flows. And yes, the development effort started back in V6. Among other things, the common flow also sets up the RFH2 and explicitly sets the domain to 'mrm'... Is there a magic wand I can wave to make wholesale changes to all the downstream flows?
Vitor
Posted: Tue Mar 05, 2013 7:36 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
longng wrote: |
Is there a magic wand I can wave to make wholesale changes to all the downstream flows? |
Yes - it's called "money".
All these flows should have been identified & migrated to a more modern model as you moved versions. This is the cheap & easy way of doing it. The longer you stay with this outmoded model, the more problems you're going to hit going forwards and the more money you'll need to spend in a fire-fighting timeframe.
Mention this to your budget holder.
longng
Posted: Tue Mar 05, 2013 7:37 am Post subject:
Apprentice
Joined: 22 Feb 2013 Posts: 42
kimbert wrote: |
Quote: |
Technically, I believe the parser should truncate the data to conform to the definition, throw an exception indicating that the length has been exceeded, or both. It's a fixed-length setting after all. As things currently stand, it's not acceptable for the parser simply to grow the field beyond its setting and shift everything else out. I will open a PMR. |
I suggest that you put the PMR on hold. The MRM parser is doing *exactly* what you have asked. The output is 40 *characters* in length. The problem is that characters do not always occupy a fixed number of bytes. Your COBOL application was probably originally designed for single-byte EBCDIC characters, in which case the distinction between characters and bytes would not matter. It is now being expected to handle UTF-8 data, and it is breaking. This is not IBM's problem - it is a problem that crops up continually all over the world when programmers fail to take into account the facts explained here: http://www.joelonsoftware.com/articles/Unicode.html |
Thanks, Kimbert, for your input and perspective. I don't intend to defend something that was done in the past, but I still maintain that the parser should observe the definition rather than 'silently' expanding a fixed-length field.
kimbert
Posted: Tue Mar 05, 2013 7:44 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote: |
I still maintain that the parser should observe the definition rather than 'silently' expanding a fixed-length field. |
Two points here:
1. It *is* observing the definition. It is outputting a fixed number of characters. There is no defect here, and this behaviour is actually required by some users.
2. The TDS parser could have tried harder to explain what it was doing, and why. This is one reason why DFDL is a better choice going forward - DFDL is pretty good at explaining its actions. |
longng
Posted: Tue Mar 05, 2013 7:51 am Post subject:
Apprentice
Joined: 22 Feb 2013 Posts: 42
Vitor wrote: |
longng wrote: |
Is there a magic wand I can wave to make wholesale changes to all the downstream flows? |
Yes - it's called "money".
All these flows should have been identified & migrated to a more modern model as you moved versions. This is the cheap & easy way of doing it. The longer you stay with this outmoded model, the more problems you're going to hit going forwards and the more money you'll need to spend in a fire-fighting timeframe.
Mention this to your budget holder. |
We are thinking the same way! |
mqjeff
Posted: Tue Mar 05, 2013 8:01 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
Put in a mediator flow that deletes the RFH. |
kimbert
Posted: Tue Mar 05, 2013 8:02 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Just to avoid any misunderstandings...
- just because you upgrade, that does not mean that you need to rewrite all your flows to use XMLNSC. In fact I would advise against it unless there is a pressing need to exploit the improved performance/standards compliance of XMLNSC.
- migration to XMLNSC can be expensive and difficult in some cases
but...
- writing new flows that use MRM XML is not good practice. It sounded as if the OP was doing that.
- if you are using MRM XML for writing XML then it should be pretty simple to switch to using XMLNSC instead. |
Vitor
Posted: Tue Mar 05, 2013 8:42 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
kimbert wrote: |
writing new flows that use MRM XML is not good practice. It sounded as if the OP was doing that. |
But if you are changing an existing flow, that's a great time to think about embracing new technologies. For instance, if you have a flow reading a file using MRM, you might still want to consider changing to DFDL.
kimbert wrote: |
- if you are using MRM XML for writing XML then it should be pretty simple to switch to using XMLNSC instead. |