DFDL : Parsing Other text or binary message |
sankritya |
Posted: Thu Jun 15, 2017 1:48 am Post subject: DFDL : Parsing Other text or binary message |
|
|
Centurion
Joined: 14 Feb 2008 Posts: 100
|
Toolkit Version : 9.0.0.6
I am trying to create a DFDL schema that parses repeating fixed-length records.
Two of the three elements in each record are optional. If I provide all the fields, the message parses correctly, but the parser does not recognize when the optional elements are absent. It treats <CR><LF> as part of the fixed length and carries on parsing the missing element. The elements do not have any initiator or terminator.
The sender will not pad with spaces when the elements are not present.
DFDL Used
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions"
            xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions"
            xmlns:recFixLengthFieldsFmt="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat">
  <xsd:import namespace="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" schemaLocation="IBMdefined/RecordFixLengthFieldFormat.xsd"/>
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format encoding="{$dfdl:encoding}" escapeSchemeRef="recFixLengthFieldsFmt:RecordEscapeScheme" occursCountKind="fixed" ref="recFixLengthFieldsFmt:RecordFixLengthFieldsFormat"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:element dfdl:lengthKind="delimited" ibmSchExtn:docRoot="true" name="SAMPLE_DFDL">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element dfdl:lengthKind="implicit" dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" maxOccurs="unbounded" name="body">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element dfdl:length="10" name="body_elem1" type="xsd:string"/>
              <xsd:element dfdl:length="10" dfdl:lengthKind="explicit" dfdl:occursCountKind="implicit" minOccurs="0" name="body_elem2" type="xsd:string"/>
              <xsd:element dfdl:length="10" dfdl:lengthKind="explicit" dfdl:occursCountKind="implicit" minOccurs="0" name="body_elem3" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
|
Message which was successful
Quote: |
123456789112345678911234567891
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
|
Message which failed
Quote: |
12345678911234567891
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
|
Error returned in DFDL Test Log is
********************* DFDL Parser Setup Starting *********************
Schema = /SAMPLE/SAMPLE_DFDL.xsd
**********************************************************************
********************* DFDL Parser Starting *********************
Data = TestData.txt
Message = SAMPLE_DFDL (/SAMPLE/SAMPLE_DFDL.xsd)
****************************************************************
Jun 15, 2017, 5:42:35 AM info: Offset: 0. Parsing will start from root element 'SAMPLE_DFDL'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL), 70]
Jun 15, 2017, 5:42:35 AM info: The default value of '%LF;' was assigned to variable 'outputNewLine' in namespace 'http://www.ogf.org/dfdl/dfdl-1.0/'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = , 133]
Jun 15, 2017, 5:42:35 AM info: Offset: 0. Starting to process element 'SAMPLE_DFDL'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL), 61]
Jun 15, 2017, 5:42:35 AM info: Offset: 0. Up to '-1' occurrences of element 'body' will be expected because occursCountKind='implicit' and maxOccurs='-1'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body), 131]
Jun 15, 2017, 5:42:35 AM info: Offset: 0. Starting to process element 'body'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body), 54]
Jun 15, 2017, 5:42:35 AM info: Offset: 0. Starting to process element 'body_elem1'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 60]
Jun 15, 2017, 5:42:35 AM info: Offset: 0. Found fixed length value: '1234567891' for element 'body_elem1'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 83]
Jun 15, 2017, 5:42:35 AM info: Offset: 10. Finished processing element 'body_elem1'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 61]
Jun 15, 2017, 5:42:35 AM info: Offset: 10. Optional element 'body_elem2' encountered. The DFDL parser will return to this position if the element is not present in the data stream.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 157]
Jun 15, 2017, 5:42:35 AM info: Offset: 10. Starting to process element 'body_elem2'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 61]
Jun 15, 2017, 5:42:35 AM info: Offset: 10. Found fixed length value: '1234567891' for element 'body_elem2'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 84]
Jun 15, 2017, 5:42:35 AM info: Offset: 20. Finished processing element 'body_elem2'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 61]
Jun 15, 2017, 5:42:35 AM info: Offset: 20. Optional element 'body_elem3' encountered. The DFDL parser will return to this position if the element is not present in the data stream.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 157]
Jun 15, 2017, 5:42:35 AM info: Offset: 20. Starting to process element 'body_elem3'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 61]
Jun 15, 2017, 5:42:35 AM info: Offset: 20. Found fixed length value: '%CR;%LF;ABCDEFGH' for element 'body_elem3'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 90]
Jun 15, 2017, 5:42:35 AM info: Offset: 30. Finished processing element 'body_elem3'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 61]
Jun 15, 2017, 5:42:35 AM info: Offset: 30. Did not find terminator for 'body'. Expected terminator list is '%CR;%LF;'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body), 96]
Jun 15, 2017, 5:42:35 AM error: CTDP3042E: Terminator '%CR;%LF;' not found at offset '30' for element '/SAMPLE_DFDL[1]/body[1]'.
Jun 15, 2017, 5:42:35 AM fatal: CTDP3042E: Terminator '%CR;%LF;' not found at offset '30' for element '/SAMPLE_DFDL[1]/body[1]'. |
mqjeff |
Posted: Thu Jun 15, 2017 2:58 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
I can't see any difference between the successful message and the unsuccessful message... but maybe my glasses are just smudged.
The error is complaining about a missing CRLF (Windows line ending).
If this is your terminator, then that would explain why there doesn't appear to be any difference between the two: the CRLF is not visible.
Examine the two messages with something that shows hidden characters. Perhaps you have an LF instead (Unix line endings).
Also, your model won't be able to discriminate between the last two body element types. There could be one body2 and a bunch of body3, or 0 body2 and a bunch of body3, or a bunch of body2 and 0 body3. Unless there's something else in your model, or my glasses are smudged, the model can't tell when you stop having body2 elements.
sankritya |
Posted: Thu Jun 15, 2017 5:02 am Post subject: |
|
|
Centurion
Joined: 14 Feb 2008 Posts: 100
|
The difference between the successful message and the failed message is the absence of the 3rd element.
Successful message
Quote: |
123456789112345678911234567891<CR><LF>
ABCDEFGHIJKLMNOPQRSTUVWXYZ <CR><LF>
abcdefghijklmnopqrstuvwxyz <CR><LF>
|
Error message (missing 3rd element)
Quote: |
12345678911234567891<CR><LF>
ABCDEFGHIJKLMNOPQRSTUVWXYZ <CR><LF>
abcdefghijklmnopqrstuvwxyz <CR><LF>
|
The issue is that even though the third element is declared with minOccurs = 0, the parser still copies data from the 2nd row, including the CRLF, into the 3rd element. The <CR><LF> terminator set on body does not end the occurrence.
sankritya |
Posted: Thu Jun 15, 2017 5:36 am Post subject: |
|
|
Centurion
Joined: 14 Feb 2008 Posts: 100
|
Quote: |
Jun 15, 2017, 5:42:35 AM info: Offset: 20. Found fixed length value: '%CR;%LF;ABCDEFGH' for element 'body_elem3'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 90]
|
Why is the CRLF being included in the 3rd element? When the parser encounters <CR><LF> it should have terminated the body, since that is set as the terminator for body.
mqjeff |
Posted: Thu Jun 15, 2017 6:40 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
So, again. Your model right now won't be able to tell the difference between one record2 followed by one record3, two record2s, and two record3s.
If you are using CR/LF to terminate both records 1, 2 and 3 AND the sequence, then you'd need *two* CRLFs at the end of the file: one to terminate the inner sequence and one to terminate the outer sequence.
Given that your records 1, 2 and 3 are all fixed length, and you *don't* want them to include the CR/LF, you need to account for the CR/LF in your model separately from the records (another element in your sequence, probably making record 1 a sequence, record 2 a sequence and record 3 another sequence, each with the record data and another element for the CRLF).
If you *do* want the CR/LF in your data, then you need to increase the length to cover it.
sankritya |
Posted: Thu Jun 15, 2017 9:37 am Post subject: |
|
|
Centurion
Joined: 14 Feb 2008 Posts: 100
|
Sorry, mqjeff, if I confused you with the sample message. My message is something like
Element101Element102Element103<CR><LF>
Element201Element202Element203<CR><LF>
.
.
So in XML terms something like
<Sample>
<body>Element101Element102Element103</body>
<body>Element201Element202Element203</body>
.
.
</Sample>
Here body repeats, with a minimum occurrence of 1 and an unbounded maximum. Each body line has 3 elements of fixed width, 10 bytes each. The 1st element is mandatory and the last 2 are optional. If present, they will be 10 characters long; otherwise nothing is sent in their place, not even spaces. Each body line is terminated by CRLF. With the DFDL schema as defined, the parser is unable to move on to the next body line if the 2nd and 3rd elements, or just the 3rd element, are missing. In the given example, it tries to assign the CRLF and the first 8 characters of the next body line to the missing 3rd element. The body is not ended by the CRLF that is defined as its terminator.
Hope it clarifies the issue. |
mqjeff |
Posted: Thu Jun 15, 2017 12:07 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Yes.
That's what I meant by "unable to distinguish between the three records".
If you are going to have records like this, then you either need to make sure that both records are always there, even if one of them is blank, or give the parser some way to tell them apart.
If you can do some additional parsing of the contents of each of records 1, 2 and 3, then you can use that to help decide whether you got a record 2 or a record 3.
For example, if record 2 has an integer and then a string, and record 3 is only a string, then you can ask it to parse record 2 first, and if that fails it will backtrack and try to parse record 3. Or, even better, if they start with different fixed values (a tag), then you can use those as indicators (discriminators?).
But if you're just going to treat them all as 10-character strings, with no other structure, then it's not clear why you need records 1, 2 or 3: just use one 10-character string that repeats.
sankritya |
Posted: Thu Jun 15, 2017 4:32 pm Post subject: |
|
|
Centurion
Joined: 14 Feb 2008 Posts: 100
|
It is a sample which I created for raising the question. In the real message, the mandatory element can be followed by many elements of different lengths. If they come, all of them will be present; otherwise none. This can be done in MRM (that's how it is currently running). I wanted to know whether it can be done in DFDL or not.
timber |
Posted: Fri Jun 16, 2017 3:16 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
why CRLF is getting included in the 3rd element. When parser encounters <CR><LF> it should have terminated the body as it is set as terminator for body. |
That is not correct. The third element is a fixed-length element and it will consume a fixed number of characters/bytes (depending on the lengthUnits). If there happens to be a CR/LF pair in the middle of those characters/bytes then it won't care. This is correct behaviour.
One solution would be to put an assert on each optional field to check whether it starts with CR/LF. It will then throw a DFDL parsing error if the field starts with CR/LF and the parser will gracefully back out of parsing the optional element(s). |
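To make the fixed-length behaviour concrete, here is a rough Python model of it. This is only a sketch, not the DFDL parser: `read_fixed` is a hypothetical helper, and the data is the failing message from the first post with the second line cut short. It reproduces the offsets seen in the test log.

```python
CRLF = "\r\n"

def read_fixed(data, offset, length):
    """Consume exactly `length` characters; terminators in scope are ignored."""
    end = offset + length
    if end > len(data):
        raise ValueError(f"not enough data at offset {offset}")
    return data[offset:end], end

# First record has only two of its three 10-character fields.
data = "12345678911234567891" + CRLF + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + CRLF

offset = 0
fields = []
for _ in range(3):  # body_elem1, body_elem2, body_elem3
    value, offset = read_fixed(data, offset, 10)
    fields.append(value)

# The third read swallowed the CRLF plus 8 characters of the next line,
# so the body terminator is no longer where the parser looks for it.
terminator_found = data.startswith(CRLF, offset)

print(repr(fields[2]))    # '\r\nABCDEFGH'
print(offset)             # 30
print(terminator_found)   # False
```

The third field comes back as CRLF plus `ABCDEFGH`, and the terminator check at offset 30 fails, which matches the CTDP3042E error in the log.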
fjb_saper |
Posted: Fri Jun 16, 2017 4:38 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
If CRLF is made the record terminator and the field is optional, would that tell the parser that, because it encountered the CRLF, the field is not present and it should move on to the next record?
timber |
Posted: Fri Jun 16, 2017 4:52 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
That would only work if lengthKind was set to 'delimited'. If a field is fixed-length then it will consume exactly N bytes/characters regardless of any in-scope terminators or delimiters.
This is a very simple rule but it does not always correspond to our natural expectations. In this case, both you and the OP expected that DFDL would 'notice' that the enclosing structure has a terminator of CR/LF. You expected that the bitstream available to the innermost structure would be constrained by the CR/LF. That would be one way to specify the behaviour, but it would not be as flexible as the bottom-up approach that DFDL actually uses ( it would not allow truly fixed-length fields within a delimited structure ). |
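The contrast between the two length kinds can be sketched in a few lines of Python. This is an illustrative model only, with hypothetical helper names (`read_fixed`, `read_delimited`) and made-up data in the style of the OP's records; it is not how the DFDL runtime is implemented.

```python
CRLF = "\r\n"

def read_fixed(data, offset, length):
    """Fixed length: take exactly `length` characters, whatever they are."""
    return data[offset:offset + length], offset + length

def read_delimited(data, offset, terminator):
    """Delimited: scan forward and stop at the in-scope terminator."""
    end = data.find(terminator, offset)
    if end == -1:
        end = len(data)
    return data[offset:end], end

# One record with only its mandatory field, followed by the next record.
data = "Element101" + CRLF + "Element201"

# Both reads start at offset 10, where an optional second field would begin.
fixed_value, fixed_end = read_fixed(data, 10, 10)
delimited_value, delimited_end = read_delimited(data, 10, CRLF)

print(repr(fixed_value), fixed_end)          # '\r\nElement2' 20
print(repr(delimited_value), delimited_end)  # '' 10
```

The delimited read stops immediately and yields an empty value, so the field can be recognised as missing. The fixed-length read runs straight through the CRLF into the next record.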
sankritya |
Posted: Fri Jun 16, 2017 6:27 am Post subject: |
|
|
Centurion
Joined: 14 Feb 2008 Posts: 100
|
Thanks timber. And how will I write the assert expression? I have never used it before. |
timber |
Posted: Mon Jun 19, 2017 12:25 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
It's an XPath expression. If it evaluates to true then nothing happens. If it evaluates to false then the DFDL parser behaves exactly as if there was a parsing error. It will back out to the nearest enclosing 'point of uncertainty' ( which will be the optional element itself) and try the next item in the model. |
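The back-out behaviour can be modelled roughly like this in Python. Everything here is an assumption made for illustration: the names `parse_body`, `assert_not_line_break` and `ParseError` are hypothetical, the check is written as plain Python rather than as the XPath expression that would go in the schema, and the sample data follows the OP's "Element101..." layout.

```python
CRLF = "\r\n"

# (name, length, optional) for each field of one body record.
FIELDS = [("body_elem1", 10, False),
          ("body_elem2", 10, True),
          ("body_elem3", 10, True)]

class ParseError(Exception):
    """Stands in for a DFDL processing error."""

def read_fixed(data, offset, length):
    end = offset + length
    if end > len(data):
        raise ParseError(f"not enough data at offset {offset}")
    return data[offset:end], end

def assert_not_line_break(value):
    # Model of the assert: a false result is treated as a parse error.
    if value.startswith(CRLF):
        raise ParseError("field starts with CR/LF")

def parse_body(data, offset):
    record = {}
    for name, length, optional in FIELDS:
        saved = offset  # point of uncertainty for an optional element
        try:
            value, offset = read_fixed(data, offset, length)
            if optional:
                assert_not_line_break(value)
        except ParseError:
            if not optional:
                raise
            offset = saved  # back out and treat the element as absent
            continue
        record[name] = value
    if not data.startswith(CRLF, offset):
        raise ParseError(f"terminator not found at offset {offset}")
    return record, offset + len(CRLF)

def parse_message(data):
    offset, records = 0, []
    while offset < len(data):
        record, offset = parse_body(data, offset)
        records.append(record)
    return records

data = ("Element101Element102Element103" + CRLF +
        "Element201Element202" + CRLF +
        "Element301" + CRLF)
records = parse_message(data)
for record in records:
    print(record)
```

With the assert in place, the second record is parsed with two fields and the third with one, and each body ends at its own CRLF. Note that this only works because missing fields are always at the end of the record, as the OP described.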
sankritya |
Posted: Wed Jun 21, 2017 7:02 am Post subject: |
|
|
Centurion
Joined: 14 Feb 2008 Posts: 100
|
Thanks timber. It worked finally. |