
MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » DFDL : Parsing Other text or binary message

DFDL : Parsing Other text or binary message
sankritya
Posted: Thu Jun 15, 2017 1:48 am

Centurion

Joined: 14 Feb 2008
Posts: 100

Toolkit Version : 9.0.0.6

I am trying to create a DFDL schema that parses repeating fixed-length records.
Two of the three elements in each fixed-length record are optional. If I provide all the fields the message parses correctly, but the parser does not recognize when the optional elements are absent. It counts <CR><LF> towards the fixed length and carries on parsing the missing element. The elements do not have any initiator or terminator.

The sender will not pad with spaces when the elements are not present.

DFDL Used
Code:
<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recFixLengthFieldsFmt="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat">

    <xsd:import namespace="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" schemaLocation="IBMdefined/RecordFixLengthFieldFormat.xsd"/>
    <xsd:annotation>
      <xsd:appinfo source="http://www.ogf.org/dfdl/">
         <dfdl:format encoding="{$dfdl:encoding}" escapeSchemeRef="recFixLengthFieldsFmt:RecordEscapeScheme" occursCountKind="fixed" ref="recFixLengthFieldsFmt:RecordFixLengthFieldsFormat"/>
      </xsd:appinfo>
   </xsd:annotation>

   <xsd:element dfdl:lengthKind="delimited" ibmSchExtn:docRoot="true" name="SAMPLE_DFDL">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element dfdl:lengthKind="implicit" dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" maxOccurs="unbounded" name="body">
               <xsd:complexType>
                  <xsd:sequence>
                     <xsd:element dfdl:length="10" name="body_elem1" type="xsd:string"/>
                     <xsd:element dfdl:length="10" dfdl:lengthKind="explicit" dfdl:occursCountKind="implicit" minOccurs="0" name="body_elem2" type="xsd:string"/>
                     <xsd:element dfdl:length="10" dfdl:lengthKind="explicit" dfdl:occursCountKind="implicit" minOccurs="0" name="body_elem3" type="xsd:string"/>
                  </xsd:sequence>
               </xsd:complexType>
            </xsd:element>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>


</xsd:schema>


Message which was successful
Quote:

123456789112345678911234567891
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz


Message which failed
Quote:

12345678911234567891
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz


Error returned in DFDL Test Log is


********************* DFDL Parser Setup Starting *********************
Schema = /SAMPLE/SAMPLE_DFDL.xsd
**********************************************************************



********************* DFDL Parser Starting *********************
Data = TestData.txt
Message = SAMPLE_DFDL (/SAMPLE/SAMPLE_DFDL.xsd)
****************************************************************

Jun 15, 2017, 5:42:35 AM info: Offset: 0. Parsing will start from root element 'SAMPLE_DFDL'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL), 70]

Jun 15, 2017, 5:42:35 AM info: The default value of '%LF;' was assigned to variable 'outputNewLine' in namespace 'http://www.ogf.org/dfdl/dfdl-1.0/'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = , 133]

Jun 15, 2017, 5:42:35 AM info: Offset: 0. Starting to process element 'SAMPLE_DFDL'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL), 61]

Jun 15, 2017, 5:42:35 AM info: Offset: 0. Up to '-1' occurrences of element 'body' will be expected because occursCountKind='implicit' and maxOccurs='-1'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body), 131]

Jun 15, 2017, 5:42:35 AM info: Offset: 0. Starting to process element 'body'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body), 54]

Jun 15, 2017, 5:42:35 AM info: Offset: 0. Starting to process element 'body_elem1'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 60]

Jun 15, 2017, 5:42:35 AM info: Offset: 0. Found fixed length value: '1234567891' for element 'body_elem1'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 83]

Jun 15, 2017, 5:42:35 AM info: Offset: 10. Finished processing element 'body_elem1'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 61]

Jun 15, 2017, 5:42:35 AM info: Offset: 10. Optional element 'body_elem2' encountered. The DFDL parser will return to this position if the element is not present in the data stream.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 157]

Jun 15, 2017, 5:42:35 AM info: Offset: 10. Starting to process element 'body_elem2'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 61]

Jun 15, 2017, 5:42:35 AM info: Offset: 10. Found fixed length value: '1234567891' for element 'body_elem2'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 84]

Jun 15, 2017, 5:42:35 AM info: Offset: 20. Finished processing element 'body_elem2'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 61]

Jun 15, 2017, 5:42:35 AM info: Offset: 20. Optional element 'body_elem3' encountered. The DFDL parser will return to this position if the element is not present in the data stream.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 157]

Jun 15, 2017, 5:42:35 AM info: Offset: 20. Starting to process element 'body_elem3'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 61]

Jun 15, 2017, 5:42:35 AM info: Offset: 20. Found fixed length value: '%CR;%LF;ABCDEFGH' for element 'body_elem3'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 90]

Jun 15, 2017, 5:42:35 AM info: Offset: 30. Finished processing element 'body_elem3'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 61]

Jun 15, 2017, 5:42:35 AM info: Offset: 30. Did not find terminator for 'body'. Expected terminator list is '%CR;%LF;'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body), 96]

Jun 15, 2017, 5:42:35 AM error: CTDP3042E: Terminator '%CR;%LF;' not found at offset '30' for element '/SAMPLE_DFDL[1]/body[1]'.

Jun 15, 2017, 5:42:35 AM fatal: CTDP3042E: Terminator '%CR;%LF;' not found at offset '30' for element '/SAMPLE_DFDL[1]/body[1]'.
mqjeff
Posted: Thu Jun 15, 2017 2:58 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

I can't see any difference between the successful message and the unsuccessful message... but maybe my glasses are just smudged.

The error is complaining about a missing CRLF (Windows line ending).

If this is your terminator, then that would explain why there doesn't appear to be any difference between the two - the CRLF is not visible.

Examine the two messages with something that shows hidden characters. Perhaps you have an LF instead (Unix line endings).

Also, your model won't be able to discriminate between the last two body element types. There could be one body2 and a bunch of body3, or zero body2 and a bunch of body3, or a bunch of body2 and zero body3. Unless there's something else in your model, or my glasses are smudged, the model can't tell when you stop having body2 elements.
_________________
chmod -R ugo-wx /
sankritya
Posted: Thu Jun 15, 2017 5:02 am

Centurion

Joined: 14 Feb 2008
Posts: 100

The difference between the successful message and the failing message is the absence of the 3rd element.

Successful message
Quote:

123456789112345678911234567891<CR><LF>
ABCDEFGHIJKLMNOPQRSTUVWXYZ <CR><LF>
abcdefghijklmnopqrstuvwxyz <CR><LF>


Failing message (missing 3rd element)
Quote:

12345678911234567891<CR><LF>
ABCDEFGHIJKLMNOPQRSTUVWXYZ <CR><LF>
abcdefghijklmnopqrstuvwxyz <CR><LF>


The issue is that even though the third element has minOccurs = 0, the parser still copies data from the 2nd row, including the CRLF, into the 3rd element. The <CR><LF> terminator set on body does not end the occurrence.
sankritya
Posted: Thu Jun 15, 2017 5:36 am

Centurion

Joined: 14 Feb 2008
Posts: 100

Quote:

Jun 15, 2017, 5:42:35 AM info: Offset: 20. Found fixed length value: '%CR;%LF;ABCDEFGH' for element 'body_elem3'.
[dfdl = /SAMPLE/SAMPLE_DFDL.xsd, scd = #xscd(/schemaElement::SAMPLE_DFDL/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 90]


Why is the CRLF being included in the 3rd element? When the parser encounters <CR><LF> it should have terminated the body, since that is set as the terminator for body.
mqjeff
Posted: Thu Jun 15, 2017 6:40 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

So, again. Your model right now won't be able to tell the difference between one record2 followed by one record3, two record2s, and two record3s.

If you are using CR/LF to terminate both the record1,2 and 3, AND the sequence, then you'd need *two* CRLFs at the end of the file. One to terminate the inner sequence and one to terminate the outer sequence.

Given that your record1, 2 and 3 are all fixed length, and you *don't* want them to include the CR/LF, you need to account for the CR/LF in your model separately from the records (another element in your sequence, probably making record1 a sequence, record2 a sequence and record3 another sequence, each with the record data and another element for the CRLF).

If you *do* want the CR/LF in your data, then you need to increase the length to cover it.
_________________
chmod -R ugo-wx /
sankritya
Posted: Thu Jun 15, 2017 9:37 am

Centurion

Joined: 14 Feb 2008
Posts: 100

Sorry mqjeff if I confused you with the sample message. My message looks something like this:
Element101Element102Element103<CR><LF>
Element201Element202Element203<CR><LF>
.
.

So in XML terms something like
<Sample>
<body>Element101Element102Element103</body>
<body>Element201Element202Element203</body>
.
.
</Sample>
Here body repeats with a minimum occurrence of 1 and an unbounded maximum. Each body line has 3 elements of fixed width (10 bytes each). The 1st element is mandatory and the last 2 are optional. If present they are 10 characters long; if absent, no padding spaces are sent. Each body line is terminated by CRLF. With the DFDL as defined, the parser cannot move on to the next body line if the 3rd element (or both the 2nd and 3rd elements) is missing. In the given example it assigns the CRLF and the first 8 characters of the next body line to the missing 3rd element. The body is not ended by the CRLF that is defined as its terminator.

Hope it clarifies the issue.
mqjeff
Posted: Thu Jun 15, 2017 12:07 pm

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Yes.

That's what I meant by "unable to distinguish between the three records".

If you are going to have records like this, then you either need to make sure that both records are always there, even if one of them is blank, or do some additional parsing of the contents of each record1, 2 and 3 and use that to help decide whether you got a record 2 or a record 3.

For example, if record 2 has an integer and then a string, and record 3 is only a string, then you can ask it to parse record 2 first, and if that fails it will backtrack and try to parse record 3. Or, even better, if they start with a different fixed value (a tag) then you can use those as indicators (discriminators?).

But if you're just going to treat them all as 10 character strings, with no other structure - then it's not clear why you need record1 2 or 3 - just use one 10 character string that repeats.
_________________
chmod -R ugo-wx /
sankritya
Posted: Thu Jun 15, 2017 4:32 pm

Centurion

Joined: 14 Feb 2008
Posts: 100

It is a sample which I created for raising the question. In reality there are a lot of elements of different lengths that can follow the mandatory element. If they come, all of them will be present; otherwise none. This can be done in MRM (that's how it is currently running). I wanted to know whether it can be done in DFDL or not.
timber
Posted: Fri Jun 16, 2017 3:16 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

Quote:
why CRLF is getting included in the 3rd element. When parser encounters <CR><LF> it should have terminated the body as it is set as terminator for body.
That is not correct. The third element is a fixed-length element and it will consume a fixed number of characters/bytes (depending on the lengthUnits). If there happens to be a CR/LF pair in the middle of those characters/bytes then it won't care. This is correct behaviour.

One solution would be to put an assert on each optional field to check whether it starts with CR/LF. It will then throw a DFDL parsing error if the field starts with CR/LF and the parser will gracefully back out of parsing the optional element(s).
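
A minimal sketch of such an assert, written against the body_elem3 declaration from the schema posted above. It rests on two assumptions that are not confirmed in this thread, so check both in your toolkit: that the fn prefix is bound to http://www.w3.org/2005/xpath-functions (the posted schema does not declare it), and that the DFDL character entities %CR; and %LF; are accepted inside a string literal in a DFDL expression. The namespace declarations are repeated on the element only to keep the fragment self-contained; normally they sit on xsd:schema.

```xml
<!-- Sketch only: body_elem3 from the posted schema with an assert added. -->
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
             xmlns:fn="http://www.w3.org/2005/xpath-functions"
             name="body_elem3" type="xsd:string" minOccurs="0"
             dfdl:lengthKind="explicit" dfdl:length="10"
             dfdl:occursCountKind="implicit">
   <xsd:annotation>
      <xsd:appinfo source="http://www.ogf.org/dfdl/">
         <!-- Evaluated after the 10 characters are read; '.' is the parsed value.
              Assumption: %CR;%LF; is valid inside an expression string literal. -->
         <dfdl:assert message="body_elem3 starts with CR/LF, so it is treated as absent"
                      test="{ fn:not(fn:starts-with(., '%CR;%LF;')) }"/>
      </xsd:appinfo>
   </xsd:annotation>
</xsd:element>
```

With the failing sample, body_elem3 reads '<CR><LF>ABCDEFGH', the test evaluates to false, and the parser should return to offset 20, where it can find the body terminator. The same assert would go on body_elem2.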
fjb_saper
Posted: Fri Jun 16, 2017 4:38 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

If CRLF is made the record terminator and the field is optional, would that tell the parser that the field is not present when it encounters the CRLF, so that it moves on to the next record?
_________________
MQ & Broker admin
timber
Posted: Fri Jun 16, 2017 4:52 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

That would only work if lengthKind was set to 'delimited'. If a field is fixed-length then it will consume exactly N bytes/characters regardless of any in-scope terminators or delimiters.

This is a very simple rule but it does not always correspond to our natural expectations. In this case, both you and the OP expected that DFDL would 'notice' that the enclosing structure has a terminator of CR/LF. You expected that the bitstream available to the innermost structure would be constrained by the CR/LF. That would be one way to specify the behaviour, but it would not be as flexible as the bottom-up approach that DFDL actually uses (it would not allow truly fixed-length fields within a delimited structure).
sankritya
Posted: Fri Jun 16, 2017 6:27 am

Centurion

Joined: 14 Feb 2008
Posts: 100

Thanks timber. And how will I write the assert expression? I have never used it before.
timber
Posted: Mon Jun 19, 2017 12:25 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

It's an XPath expression. If it evaluates to true then nothing happens. If it evaluates to false then the DFDL parser behaves exactly as if there was a parsing error. It will back out to the nearest enclosing 'point of uncertainty' (which will be the optional element itself) and try the next item in the model.
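
For the all-or-none case described earlier in the thread, one way to use this back-out behaviour is to wrap the optional fields in a single optional element, so that the wrapper becomes the point of uncertainty. This is only a sketch, not the schema the original poster ended up with. The wrapper name body_rest is hypothetical and adds a level to the parsed tree. It also assumes that the fn prefix is bound to http://www.w3.org/2005/xpath-functions and that %CR;%LF; is accepted inside an expression string literal; neither is confirmed in this thread.

```xml
<!-- Sketch only: 'body_rest' is a hypothetical wrapper for the all-or-none fields. -->
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
             xmlns:fn="http://www.w3.org/2005/xpath-functions"
             name="body_rest" minOccurs="0"
             dfdl:lengthKind="implicit" dfdl:occursCountKind="implicit">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name="body_elem2" type="xsd:string"
                      dfdl:lengthKind="explicit" dfdl:length="10">
            <xsd:annotation>
               <xsd:appinfo source="http://www.ogf.org/dfdl/">
                  <!-- A false result is handled like a parsing error, so the parser
                       backs out of the whole optional 'body_rest' element.
                       Assumption: %CR;%LF; is valid inside the string literal. -->
                  <dfdl:assert message="optional fields are absent"
                               test="{ fn:not(fn:starts-with(., '%CR;%LF;')) }"/>
               </xsd:appinfo>
            </xsd:annotation>
         </xsd:element>
         <xsd:element name="body_elem3" type="xsd:string"
                      dfdl:lengthKind="explicit" dfdl:length="10"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```

Only the first field inside the wrapper needs the assert here, because the remaining fields are required whenever the first one is present.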
Back to top
View user's profile Send private message
sankritya
Posted: Wed Jun 21, 2017 7:02 am

Centurion

Joined: 14 Feb 2008
Posts: 100

Thanks timber. It worked finally.