DFDL - possible to end sequence by next elements initiator? |
Svipdag |
Posted: Tue Aug 12, 2014 6:33 am Post subject: DFDL - possible to end sequence by next elements initiator? |
|
|
Newbie
Joined: 14 Jul 2014 Posts: 6
|
Hi,
Because of an escape character DFDL serialisation bug I think I have found, I am trying to find a temporary workaround. For that, I need to move a newline that was the terminator of a comma-separated sequence of strings (maxOccurs=unbounded, occursCountKind=implicit) to the initiator of the next element. But I get parsing problems when I do that: it seems that in an unbounded list, the parser only looks for the terminators in scope, but not for the following element's initiator. For example, in T_ACCONT_V7 below, the element RFF is followed by FII.
Code: |
<xsd:complexType name="T_ACCONT_V7">
  <xsd:sequence dfdl:initiator="%NL;ACCONT" dfdl:terminator="%NL;ACCONT%WSP;END" maxOccurs="1" minOccurs="1">
    <xsd:element dfdl:occursCountKind="implicit" maxOccurs="2" minOccurs="0" ref="RFF"/>
    <xsd:element dfdl:occursCountKind="implicit" maxOccurs="3" minOccurs="1" ref="FII"/>
    <xsd:element dfdl:occursCountKind="implicit" maxOccurs="2" minOccurs="1" name="ACCNAD" type="T_ACCNAD"/>
    <xsd:element dfdl:occursCountKind="implicit" minOccurs="0" ref="DTM"/>
    <xsd:element dfdl:occursCountKind="implicit" maxOccurs="unbounded" minOccurs="0" ref="FTX"/>
    <xsd:element dfdl:occursCountKind="implicit" minOccurs="0" name="CONDIT" type="T_CONDIT"/>
  </xsd:sequence>
</xsd:complexType>
|
I need to move the terminating newline from RFF to the initiator of FII, like this (note that the terminator is now empty, which it was not originally):
Code: |
<xsd:element dfdl:initiator="%NL;RFF%NL;" dfdl:terminator="" name="RFF" nillable="false" type="T_Segment_Unbounded_List"/>
<xsd:element dfdl:initiator="%NL;FII%NL;" dfdl:terminator="" name="FII" nillable="false" type="T_Segment_Unbounded_List"/> |
The T_Segment_Unbounded_List looks like this:
Code: |
<xsd:complexType name="T_Segment_Unbounded_List">
  <xsd:sequence dfdl:separator="," dfdl:separatorPosition="infix" dfdl:separatorSuppressionPolicy="trailingEmpty" maxOccurs="1" minOccurs="1">
    <xsd:element default="%ES;" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;"
                 dfdl:occursCountKind="implicit" dfdl:useNilForDefault="no"
                 ibmDfdlExtn:sampleValue="" maxOccurs="unbounded" minOccurs="1"
                 name="DATA" nillable="true" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType> |
I did this on the assumption that if the parser finds the next initiator in the data stream, it will understand that the comma-separated list of strings has ended.
But it seems that it doesn't. If I test this model with this data stream (extract):
ACCONT
RFF
AGK,SHADDA
FII
OR,17345516,,CNY,ESSECNS0XXX,17,,,,,,,C
...it includes everything in the data stream after SHADDA up to the next comma character. In other words, it doesn't seem to use the initiators in scope to end the sequence of DATA strings.
info: Offset: 224. Found delimited value: 'SHADDA%LF;FII%LF;OR' for element 'DATA'. The delimiter was ','.
[dfdl = /SEB_EDI_Inhouse/EIF_simplified.xsd, scd = #xscd(/type::T_Segment_Unbounded_List/model::sequence/schemaElement::DATA), 109]
...and after that it of course gets parsing errors, as it can no longer find the initiator of the mandatory FII element, which has been consumed from the data stream.
Now, is it possible to do what I want, or doesn't DFDL support this way of doing things? |
|
smdavies99 |
Posted: Tue Aug 12, 2014 7:30 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
What version of Broker are you using?
Recent Fixpacks have included a number of DFDL fixes.
Even with 9.0.0.2, the Toolkit has an interim fix (released a few days ago) and at least one fix in that release is related to DFDL Serialization. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
kimbert |
Posted: Tue Aug 12, 2014 8:35 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
Because of an escape character DFDL serialisation bug I think I have found, I am trying to find a temporary workaround |
You have piqued my curiosity. What is the bug?
Quote: |
it seems that in an unbounded list, the parser only looks for the terminators in scope, but not for the following element's initiator |
That's correct - this is how DFDL is designed to work. In simple cases it would be quite useful to terminate an element based on the initiator of the next element. But it would not work so well when the 'next' element or group could be any of 50 choice branches. Or when all of the next 10 things in the model are optional, with initiators. Or when 8 of them have initiators and two do not.
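The scanning behaviour described here can be illustrated with a small Python sketch. It is not the IBM DFDL implementation; `scan_delimited` is a hypothetical helper that simply looks for the nearest of the delimiters it is given. The point is that the initiator of the following element (`\nFII\n`) is never in that list, so it is read as ordinary data, which reproduces the value reported in the trace above.

```python
def scan_delimited(data, pos, delimiters):
    """Return (value, matched_delimiter, next_pos) for the text starting at pos.

    Only the delimiters passed in are considered. An initiator of a following
    element is not among them, so it is treated as ordinary data.
    """
    best = None
    for d in delimiters:
        i = data.find(d, pos)
        if i == -1:
            continue
        # Prefer the earliest match; on a tie prefer the longest delimiter.
        if best is None or i < best[0] or (i == best[0] and len(d) > len(best[1])):
            best = (i, d)
    if best is None:
        return data[pos:], None, len(data)
    i, d = best
    return data[pos:i], d, i + len(d)


# Extract from the thread, with newlines written as "\n".
stream = "\nACCONT\nRFF\nAGK,SHADDA\nFII\nOR,17345516,,CNY"

# Assumed in-scope delimiters: the "," separator and the enclosing group terminator.
in_scope = [",", "\nACCONT END"]

first = scan_delimited(stream, stream.index("AGK"), in_scope)
second = scan_delimited(stream, first[2], in_scope)
print(repr(first[0]), repr(second[0]))
```

The second DATA value swallows the FII initiator because nothing in scope tells the scanner to stop there.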
On balance, I think DFDL got this one right. _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too. |
|
Svipdag |
Posted: Wed Aug 13, 2014 1:01 am Post subject: |
|
|
Newbie
Joined: 14 Jul 2014 Posts: 6
|
I'm still using 9.0.0.1, as the ClearCase plugin in the Eclipse toolkit stopped working when I tried FP2 (for the runtime).
Kimbert, that was exactly the kind of clear answer I wanted. I suspected there could be a good reason for not supporting this initiator-terminating strategy. This means I can stop thinking about this way of solving my problem.
OK, I will explain my escape scheme problem. Brace yourself... (Hopefully you will say: that is not a bug, it's the settings that are wrong. But my hopes are quite low.)
My EDI inhouse model uses ; as the escape character, and this is meant to be inserted during serialization into the data strings in my data segments (the unbounded lists of strings separated by commas) if they contain, yes exactly, a comma character. This is all a quite standard scenario, I suspect.
My DFDL escape scheme is defined as follows:
Code: |
<dfdl:defineEscapeScheme name="EIFEscapeScheme">
  <dfdl:escapeScheme escapeBlockEnd="&quot;" escapeBlockStart="&quot;" escapeCharacter=";" escapeEscapeCharacter=";" escapeKind="escapeCharacter" extraEscapedCharacters="%#x0D; %#x0A;" generateEscapeBlock="whenNeeded"/>
</dfdl:defineEscapeScheme>
|
As you can see, it also contains some more details. One interesting setting that I will return to later is the escapeKind setting. This is naturally set to escapeCharacter, not to escapeBlock, which suits this scenario.
To understand the bug symptoms, you need some understanding of the message protocol. Below is an example message (actually, this message contains the extra-character bug, but we leave that for later):
Code: |
ZIFIHD
ZSD
044046059
ZII
V3,HELXDDIS,STOIID01,,20140811,204358,20140811,184358,HELXDDIS20140811204358651383,,,0,0,INTRTXATX
ZIFIHD END
ZIFMHD
UNH
TC201408112043586513830001,ACCINF,V7,INTRTXATX
ZIFMHD END
ACCONT
RFF
AAGK,HELDDAH
FII
OR,33010001245679,FLICKFLAFCK AFB. SE,EUR,ESSEFIHXXXX,17,,,,,,N,C
ACCNAD
NAD
AAAAO,5560646167,CIF,SWEEPKUND;, FLICKFLASSCK ASSB ÅÄÖ%&@?!,FLICKFLAFFCK AFFB. SE,SWEEPKUND;, FLICKFLASSCK ASSB,RÖR EJ...,GYM
NAGGSTIKVÄGEN 1,106 40 STOCKHOLM
ACCNAD END
CONDIT
DTM
194,20051208
INBAND
ZIN
ZRU,0.0000,,E
MOA
180,0.000,EUR
INBAND END
INBAND
ZIN
ZRIZ,0.0000,,E
MOA
180,0.000,EUR
INBAND END
LIMITS
RFF
ZLZ1,N/AN
MOA
ZLZAZ,500000.000,EUR
LIMITS END
LIMITS
RFF
ZLZ2,N/AN
MOA
ZLZAZ,500000.000,EUR
LIMITS END
CONDIT END
ACCONT END
ZIFMTR
UNT
23,TC201408112043586513830001
ZIFMTR END
ZIFITR
UNZ
1,HELXDDIS20140811204358651383
ZIFITR END
|
This protocol contains groups, each initiated by a six-letter word (for example ZIFIHD) and ended by the same word + END (for example ZIFIHD END). Both of them need to be on a separate line.
Groups can contain other groups, and segments. A segment is initiated by a three-letter word (for example ZSD), followed on a new line by a list of comma-separated strings, which are the data items of the segment. You have already seen examples of this earlier in this topic.
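As a rough illustration of that structure, here is a minimal Python sketch that reads such a message into nested groups and segments. It is only a sketch under assumptions taken from the sample above: group and segment names are upper-case letters, each segment has exactly one data line, and `;` makes the next character literal. The names `split_data` and `parse_message` are hypothetical and have nothing to do with the DFDL model itself.

```python
import re


def split_data(line, sep=",", esc=";"):
    """Split one data line on sep, treating esc as 'take the next character literally'."""
    items, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == esc and i + 1 < len(line):
            current.append(line[i + 1])
            i += 2
        elif ch == sep:
            items.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    items.append("".join(current))
    return items


def parse_message(lines):
    """Build a tree of groups (six letters) and segments (three letters)."""
    root = {"group": None, "children": []}
    stack = [root]
    it = iter(lines)
    for line in it:
        if re.fullmatch(r"[A-Z]{6} END", line):
            if len(stack) == 1 or stack[-1]["group"] != line[:6]:
                raise ValueError(f"unexpected group end: {line!r}")
            stack.pop()
        elif re.fullmatch(r"[A-Z]{6}", line):
            group = {"group": line, "children": []}
            stack[-1]["children"].append(group)
            stack.append(group)
        elif re.fullmatch(r"[A-Z]{3}", line):
            # The line after a segment name holds its comma-separated data items.
            data_line = next(it, "")
            stack[-1]["children"].append({"segment": line, "data": split_data(data_line)})
        else:
            raise ValueError(f"unexpected line: {line!r}")
    if len(stack) != 1:
        raise ValueError(f"group not closed: {stack[-1]['group']}")
    return root["children"]


tree = parse_message(["ZIFIHD", "ZSD", "044046059", "ZII", "V3,HELXDDIS", "ZIFIHD END"])
print(tree)
```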
The flow in question maps from a different format to this EDI inhouse format. My DFDL message model is used when the flow serializes the logical model to the output MQ queue.
So, let me describe the actual problem. Consider some data strings that are mapped into the target EDI message, shown as they look before they are modified by the assumed bug and as they look in the result above. (I'll let you find the different FLICKFLACK strings yourself, as they appear in several places with different characters added.)
AGK -> AAGK
HELDDA -> HELDDAH
FLICKFLACK AB. SE
AO -> AAAAO
SWEEPKUND, FLICKFLACK AB ÅÄÖ%&@?!
SWEEPKUND, FLICKFLACK AB
ZRI -> ZRIZ
ZL1 -> ZLZ1
ZL2 -> ZLZ2
N/A -> N/AN
ZLA -> ZLZAZ
It took some time for me to understand the pattern. As you can see, only data strings are affected, not initiators and terminators. For the affected data strings, the extra letter(s) added are always the starting letter of the data string. You can see that an 'H' is added in HELDDA and a 'Z' is added in ZL1. What took me even longer to understand was where and why these letters are added. Here it comes:
One extra character (the first character of the data string) is added to a data string if the data string contains any character that is the starting character of any terminator in scope. The extra letter is always added after the character in the data string that matches the starting character of the terminator(s).
For example AGK. The only group terminator in scope is 'ACCONT END' (the segments are ended by newline; we'll disregard these in this discussion). It begins with 'A', and when an A is found in AGK, an extra A (as AGK starts with A) is inserted.
An even better example: ZLA. The terminators in scope are 'LIMITS END', 'CONDIT END' and 'ACCONT END'. For the 'L' in ZLA a Z is added because of 'LIMITS END', and for the 'A' a Z is added because of 'ACCONT END', resulting in ZLZAZ.
Now, isn't this a very interesting bug?
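The pattern can be written down as a few lines of Python. This only models the symptom as described in this post; it says nothing about how the serializer is actually implemented, and `observed_output` is a hypothetical name. It assumes one extra copy of the string's first character per in-scope terminator that starts with the character just written, and it ignores the (correct) `;` escaping of commas.

```python
def observed_output(value, terminators):
    """Model of the observed symptom, not of the real serializer."""
    if not value:
        return value
    first = value[0]
    out = []
    for ch in value:
        out.append(ch)
        # One extra copy of the string's first character for every
        # in-scope terminator that begins with this character.
        hits = sum(1 for t in terminators if t.startswith(ch))
        out.append(first * hits)
    return "".join(out)


print(observed_output("AGK", ["ACCONT END"]))
print(observed_output("ZLA", ["LIMITS END", "CONDIT END", "ACCONT END"]))
print(observed_output("FLICKFLACK AB. SE", ["ACCNAD END", "ACCONT END"]))
```

With two terminators starting with 'A' in scope (inside ACCNAD), each 'A' picks up two extra letters, which matches the FLICKFLAFFCK string in the NAD segment of the sample message.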
This behaviour goes away if I set the escapeScheme to "", making the serializer not use any escaping at all. The strings are of course unchanged in debug mode, that is, before serialization. And if I serialize to XML, no extra characters are added. This is a DFDL escape scheme problem.
My interpretation is that the following statement about the escapeKind setting in the documentation has a, well, slightly faulty implementation in the IIB9 DFDL parser/serializer.
Quote: |
When ‘escapeCharacter’: On unparsing a single character of the data is escaped by adding an dfdl:escapeCharacter before it. The following are escaped if they are in the data
- Any in-scope terminating delimiter by escaping its first character.
- dfdl:escapeCharacter (escaped by dfdl:escapeEscapeCharacter)
- any dfdl:extraEscapedCharacters
|
My expectation is that it should insert a ';' before the first character of a terminator in scope if it finds the whole terminator string in a data string. It should NOT do this if it finds only the first letter! And it shouldn't insert an extra copy of the data string's starting letter. That isn't my escape character.
The implementation of this escape behaviour seems unstable.
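For comparison, the expected behaviour can be sketched in Python as follows. This is a simplified reading of the quoted documentation text, not the DFDL specification's full algorithm, and `escape_value` is a hypothetical name. It assumes the separator counts as an in-scope terminating delimiter (which is consistent with the `;,` seen in the output above) and that a delimiter is only escaped when the whole delimiter string occurs in the data.

```python
def escape_value(value, delimiters, esc=";", esc_esc=";", extra=("\r", "\n")):
    """Escape a data string the way the post expects escapeKind=escapeCharacter to work."""
    out = []
    for i, ch in enumerate(value):
        if ch == esc:
            # The escape character itself is escaped by the escape-escape character.
            out.append(esc_esc + ch)
        elif ch in extra or any(d and value.startswith(d, i) for d in delimiters):
            # Escape only the first character of a complete delimiter match,
            # or any of the extra escaped characters.
            out.append(esc + ch)
        else:
            out.append(ch)
    return "".join(out)


in_scope = [",", "\nACCONT END"]
print(escape_value("SWEEPKUND, FLICKFLACK AB", in_scope))
print(escape_value("AGK", in_scope))
```

Under this reading, AGK is written unchanged because it contains no complete delimiter, and only the comma in the SWEEPKUND string is escaped.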
Surely this must be a bug? But how can it be that no one else has had this problem? (I have tried it on IIB9.0.0.2, with the same result.)
What do you think? |
|
kimbert |
Posted: Wed Aug 13, 2014 1:44 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
Surely this must be a bug? But how can it be that no one else has had this problem? (I have tried it on IIB9.0.0.2, with the same result.)
What do you think? |
I think you should switch careers and become a detective.
Obviously a defect. You should probably open a PMR for this. I will open an internal defect anyway. |
|
Svipdag |
Posted: Wed Aug 13, 2014 4:08 am Post subject: |
|
|
Newbie
Joined: 14 Jul 2014 Posts: 6
|
Thank you, it was as I expected. A PMR it will be.
If you get any idea of an easy workaround that I can implement while the wheels turn in the IBM machinery, I would be happy if you shared it with me.
I have thought of writing code to take care of the escaping during serialization, and turning off escaping in the schema. But that would not work in the parsing step: I need the schema escaping during parsing. I could make two schemas, one used during parsing and one used during serialization. But heck, that is really ugly. I would get name conflicts, which I can of course solve one way or another. However, I'm not sure how easy it is to go down this road.
And as you saw at the beginning of this topic, I tried to move the newlines to the first characters of the terminators (which also made it necessary to move newlines to the first characters of the initiators). As I think newlines aren't present in any data strings, this would circumvent the escape character bug. But as you saw, I got problems with the unbounded data segments.
So if anyone comes up with a new idea, I'll be grateful. |
|
kimbert |
Posted: Wed Aug 13, 2014 5:40 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
I am almost sure that this advice is not necessary - but please make sure that you reference this thread in your PMR problem description. It is likely to save time for both you and IBM. |
|