Author |
Message
|
TKIY |
Posted: Wed Oct 10, 2018 11:37 am Post subject: Creating DFDL with variable segment terminators |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
I'm creating a DFDL to handle a pretty standard hierarchical data structure, and have hit a bit of a snag.
The data segments, elements, etc. all parse correctly, but this interface will be handling messages from dozens of vendors, each of whom formats the data slightly differently. I've been able to handle all of the wrinkles so far, but this one has me stumped.
The data is basically:
Code: |
<HEADER_SEGMENT><SEG_TERMINATOR>
<GROUP_SEGMENT><SEG_TERMINATOR>
<DATA_SEGMENT_1><SEG_TERMINATOR>
<DATA_SEGMENT_2><SEG_TERMINATOR>
...<DATA_SEGMENT_n> |
Not a big deal, but some vendors are including a newline character after the segment terminators. I can pretty easily clean the files of newlines before passing them through to the DFDL, but this doesn't really work if the goal is to produce a DFDL library for use throughout the organization.
It seems like I should be able to ignore whitespace after the segment terminator, but I don't seem to have the option to.
Is there a way around this? |
|
timber |
Posted: Wed Oct 10, 2018 1:50 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
If it was ordinary whitespace, you could use the DFDL mnemonic %WSP*;. But that will not match newline characters.
Fortunately, DFDL does have a solution. All delimiters (terminators and separators) can be specified as a space-separated list of allowed values. So specify the terminator as a two-element list. The segment terminator without the newline is the first option, then a space, then the segment terminator followed by %NL; (or a more specific newline character, ideally). DFDL will automatically test for the longest match first, so it should work reliably. |
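A minimal sketch of such a terminator list, assuming a hypothetical segment terminator `~` and a hypothetical element name `DataSegment`. It uses the more specific `%CR;` and `%LF;` entities rather than `%NL;`, so the list has three entries instead of two. All other format properties are assumed to come from a default `dfdl:format` defined elsewhere, so this is not a complete DFDL schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- '~' is a placeholder for the real segment terminator.
       Entries in dfdl:terminator are separated by spaces;
       the parser accepts any one of them:
         ~            terminator with no newline
         ~%CR;%LF;    terminator followed by CR LF
         ~%LF;        terminator followed by LF only -->
  <xs:element name="DataSegment" type="xs:string"
              dfdl:lengthKind="delimited"
              dfdl:terminator="~ ~%CR;%LF; ~%LF;"/>

</xs:schema>
```

Per the DFDL specification, the first entry in the list is the one written on output, so with this ordering serialized messages would carry no newline after the terminator.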
|
TKIY |
Posted: Thu Oct 11, 2018 4:16 am Post subject: |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
Ah perfect. I'll give this a shot then.
Thank you!
edit:
%NL; didn't work, IIB 9 throws a fit if you specify it as a terminator, so I just included %CR; and %LF; instead and it works great. Thanks! |
|
timber |
Posted: Thu Oct 11, 2018 5:01 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Glad it's working.
Quote: |
%NL; didn't work, IIB 9 throws a fit if you specify it as a terminator |
The CSV wizard uses %NL; as a terminator in all CSV models, so I don't think it can be as bad as that. What specific type of 'fit' did IIB throw when you tried it? |
|
TKIY |
Posted: Thu Oct 11, 2018 5:03 am Post subject: |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
CTDV1515
DFDL property 'outputNewLine' must only contain characters that are allowed for DFDL entity %NL;. |
|
timber |
Posted: Thu Oct 11, 2018 9:04 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Sounds as if the property Output Newline was set to an incorrect value. Hard to say more than that, as you didn't mention what that property was set to. |
|
TKIY |
Posted: Thu Oct 11, 2018 9:06 am Post subject: |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
It's actually blank. I can only assume there is a reference somewhere in there that uses the default terminator for the Output Newline property.
It's all working at the moment though, so I'm not going to complain. |
|
timber |
Posted: Fri Oct 12, 2018 12:31 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
For the benefit of future readers...
The fix is to set the Output Newline property to a valid value. You can't leave it empty when %NL; is being used in the Terminator or Separator property. The %NL; mnemonic matches *any* newline, but when writing the message DFDL needs to be told which newline character(s) to output. It doesn't like guessing. |
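For illustration, a hedged sketch of that combination: `%NL;` in the terminator together with an explicit Output Newline (`dfdl:outputNewLine` in the schema source). The `~` terminator and the element name `DataSegment` are hypothetical, and `%LF;` is only one valid choice; `%CR;%LF;` would be another. Other format properties are assumed to be defined elsewhere, so this is not a complete DFDL schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- '~' is a placeholder for the real segment terminator.
       %NL; matches any newline when parsing; outputNewLine tells
       DFDL which newline character(s) to write in its place. -->
  <xs:element name="DataSegment" type="xs:string"
              dfdl:lengthKind="delimited"
              dfdl:terminator="~ ~%NL;"
              dfdl:outputNewLine="%LF;"/>

</xs:schema>
```

In a real model the Output Newline setting would more commonly live on the schema-wide default format than on a single element; it is placed on the element here only to keep the sketch short.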
|
TKIY |
Posted: Fri Oct 12, 2018 4:10 am Post subject: |
|
|
Novice
Joined: 23 Aug 2016 Posts: 19
|
That makes sense. I went into this process sort of blind, and we didn't really have any documentation on it, so information like this is extremely valuable.
Thanks for all of the advice, timber! |
|