Author |
Message
|
TheYodas |
Posted: Wed Sep 24, 2014 1:58 pm Post subject: Escaping a Character in DFDL |
|
|
Novice
Joined: 28 Apr 2014 Posts: 15
|
Hi,
I have a DFDL parser for a message that is ","(comma) delimited. In one of the fields a comma can appear within the value, so I have to escape the comma character for that field. However, I need that comma to be present in the value when it is output. I have defined the escape scheme as below, and it is removing the escape character from the value. Can anyone suggest how to define the parser so that the escape character is not removed?
Code: |
<dfdl:defineEscapeScheme name="CommaPropertiesEscapeScheme">
  <dfdl:escapeScheme escapeBlockEnd="" escapeBlockStart="" escapeCharacter="," escapeEscapeCharacter="," escapeKind="escapeCharacter" extraEscapedCharacters="" generateEscapeBlock="whenNeeded"/>
</dfdl:defineEscapeScheme>
|
|
|
kimbert |
Posted: Wed Sep 24, 2014 2:26 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Please quote all of the following:
- the input message
- the value in the message tree ( ideally from a Trace node ) and
- the value in the output message _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too. |
|
TheYodas |
Posted: Fri Sep 26, 2014 2:13 pm Post subject: |
|
|
Novice
Joined: 28 Apr 2014 Posts: 15
|
Sorry for the delay.
Input Message:
Quote: |
16,115,10006218,S,0,9362257,643961,,0012345,NBR OF ITEMS: 11,PROCESS SITE: LONDON
|
Trace Node:
Quote: |
(0x01000000:Name):Transactions = (
(0x01000000:Name):TransactionDetailRecord = (
(0x01000000:Name )btmu:bai:TransactionTypeCodes = (
(0x03000000:NameValue)btmu:bai:TypeCode = '115' (CHARACTER)
(0x03000000:NameValue)btmu:bai:Amount = '10006218' (CHARACTER)
(0x03000000:NameValue):TranFundsTypeS = 'S' (CHARACTER)
(0x01000000:Name ):TransactionFundTypesS = (
(0x01000000:Name):SFundsType = (
(0x03000000:NameValue):ImmediateAvailabilityAmount = '0' (CHARACTER)
(0x03000000:NameValue):OneDayAvailabilityAmount = '9362257' (CHARACTER)
(0x03000000:NameValue):MorethanDayAvailabilityAmount = '643961' (CHARACTER)
)
)
)
(0x03000000:NameValue)btmu:bai:CustomerReferenceNumber = '0012345' (CHARACTER)
(0x03000000:NameValue):Text = 'NBR OF ITEMS: 11PROCESS SITE: LONDON' (CHARACTER)
)
)
|
Output Message:
Quote: |
16,115,10006218,S,0,9362257,643961,,0017368,NBR OF ITEMS: 11,PROCESS SITE: LONDON
|
In the above structure, "Text" field is having comma which needs to be escaped. The comma is present when the message is written to the wire, but it is missing from the value in the message tree. Because I am mapping to a different internal format, the mapped value is missing the comma.
Any ideas how to make sure the escape character is not removed when the message is parsed? |
|
kimbert |
Posted: Sat Sep 27, 2014 1:04 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Are you sure that you quoted the input message accurately? I would have expected this:
Code: |
16,115,10006218,S,0,9362257,643961,,0012345,NBR OF ITEMS: 11,,PROCESS SITE: LONDON |
So the first comma is the escape character, and it prevents the second comma from being interpreted as a separator. |
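As a rough illustration of the behaviour described above, here is a minimal Python sketch of escape-character processing. It is a simplified model, not the IBM DFDL parser: the function name `unescape` is hypothetical, and it assumes the only rule is "drop the escape character and keep the next character literally". Under that assumption, a single comma in the Text field is consumed as an escape character, which matches the value seen in the Trace node, while a doubled comma yields one literal comma.

```python
def unescape(text: str, escape_char: str = ",") -> str:
    """Drop each escape character and keep the character after it literally.

    Simplified model only: a trailing escape character with nothing
    after it is kept unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape_char and i + 1 < len(text):
            # The escape character itself is removed; the next character
            # is taken as data, even if it is another comma.
            out.append(text[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Single comma: treated as an escape character and removed.
single = unescape("NBR OF ITEMS: 11,PROCESS SITE: LONDON")

# Doubled comma: the first escapes the second, so one comma survives.
doubled = unescape("NBR OF ITEMS: 11,,PROCESS SITE: LONDON")
```

With this model, `single` loses its comma and `doubled` keeps exactly one, which is why the doubled form was the expected input for an escape scheme whose escape character is a comma.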
|
TheYodas |
Posted: Sun Sep 28, 2014 6:09 pm Post subject: |
|
|
Novice
Joined: 28 Apr 2014 Posts: 15
|
Hi kimbert,
I have the right input message. In my case there is no indicator to show that the next comma should not be interpreted as a delimiter.
May I know the reason behind your doubt? |
|
kimbert |
Posted: Mon Sep 29, 2014 12:32 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
So the final comma looks exactly like a delimiter. How does any reader ( DFDL or otherwise) know that it is not a delimiter?
I can guess the answer - but only because I've worked with data formats for a long time. This is what you said:
Quote: |
In the above structure, "Text" field is having comma which needs to be escaped |
I think that is a misleading way to describe your data format - unless my guess is wrong.
So...what's the answer? |
|
TheYodas |
Posted: Mon Sep 29, 2014 8:38 am Post subject: |
|
|
Novice
Joined: 28 Apr 2014 Posts: 15
|
Hi Kimbert,
My requirement is that any comma in the Text field should not be treated as a delimiter. For this reason, this is how I defined my DFDL:
Quote: |
<xsd:element dfdl:lengthKind="delimited" dfdl:occursCountKind="implicit" maxOccurs="unbounded" minOccurs="0" name="Transactions">
  <xsd:complexType>
    <xsd:sequence dfdl:separator="">
      <xsd:element dfdl:emptyValueDelimiterPolicy="none" dfdl:initiator="16," dfdl:lengthKind="delimited" dfdl:terminator="%CR;%LF;" ibmSchExtn:docRoot="true" minOccurs="1" name="TransactionDetailRecord">
        <xsd:complexType>
          <xsd:sequence dfdl:initiator="" dfdl:separatorPolicy="suppressedAtEndLax" dfdl:terminator="">
            <xsd:element maxOccurs="1" ref="ns0:TransactionTypeCodes"/>
            <xsd:element dfdl:emptyValueDelimiterPolicy="none" dfdl:nilValueDelimiterPolicy="none" dfdl:occursCountKind="implicit" minOccurs="0" ref="ns0:BankReferenceNumber"/>
            <xsd:element dfdl:emptyValueDelimiterPolicy="none" dfdl:nilValueDelimiterPolicy="none" dfdl:occursCountKind="implicit" minOccurs="0" ref="ns0:CustomerReferenceNumber"/>
            <xsd:sequence dfdl:separator="">
              <xsd:element dfdl:emptyValueDelimiterPolicy="none" dfdl:escapeSchemeRef="ns0:CommaPropertiesEscapeScheme" dfdl:lengthKind="delimited" dfdl:nilValueDelimiterPolicy="none" dfdl:occursCountKind="implicit" minOccurs="0" name="Text" type="xsd:string"/>
            </xsd:sequence>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:annotation>
  <xsd:appinfo source="http://www.ogf.org/dfdl/">
    <dfdl:format documentFinalTerminatorCanBeMissing="yes" encoding="{$dfdl:encoding}" escapeSchemeRef="csv:CSVEscapeScheme" ref="csv:CommaSeparatedFormat"/>
    <dfdl:defineEscapeScheme name="CommaPropertiesEscapeScheme">
      <dfdl:escapeScheme escapeBlockEnd="" escapeBlockStart="" escapeCharacter="," escapeEscapeCharacter="," escapeKind="escapeCharacter" extraEscapedCharacters="" generateEscapeBlock="whenNeeded"/>
    </dfdl:defineEscapeScheme>
  </xsd:appinfo>
</xsd:annotation>
|
My understanding was that an escape scheme would solve my problem. It looks like my understanding is wrong.
Kimbert, can you point me to the right approach here? Thanks. |
|
mqjeff |
Posted: Mon Sep 29, 2014 8:42 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Again, ask yourself.
What criteria can DFDL, *or any parser*, use to know that the comma is not a delimiter?
Is the "Text" field surrounded by quotes? Is the "Text" field supposed to occupy a fixed length? Is the "Text" field supposed to be the last element in the record?
To put it another way - tell me how many fields are contained in this string:
alpha,beta,gamma,delta,epsilon
Is it five? Or one? or three? or two? or six? or thirty? |
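The question can be made concrete with a small Python sketch. The three function names below are hypothetical and each encodes a different assumption about the format; none of them is "the" correct reading, which is exactly the point: the data alone does not say which rule applies.

```python
RECORD = "alpha,beta,gamma,delta,epsilon"


def every_comma_separates(record: str) -> list[str]:
    # Assumption: every comma is a field separator.
    return record.split(",")


def last_field_takes_rest(record: str, leading_fields: int) -> list[str]:
    # Assumption: only the first `leading_fields` commas separate fields;
    # the final field runs to the end of the record, commas included.
    return record.split(",", leading_fields)


def no_separator(record: str) -> list[str]:
    # Assumption: commas are plain data and the record is one field.
    return [record]


five = every_comma_separates(RECORD)
three = last_field_takes_rest(RECORD, 2)
one = no_separator(RECORD)
```

Here `five` has five fields, `three` has three (the last being `gamma,delta,epsilon`), and `one` has a single field, all from the same input string.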
|
TheYodas |
Posted: Mon Sep 29, 2014 8:45 am Post subject: |
|
|
Novice
Joined: 28 Apr 2014 Posts: 15
|
The Text field is the last field in the record "TransactionDetailRecord", as described in the schema i posted in my last post. Sorry, I should have mentioned that before. |
|
mqjeff |
Posted: Mon Sep 29, 2014 8:47 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
TheYodas wrote: |
The Text field is the last field in the record "TransactionDetailRecord", as described in the schema i posted in my last post. Sorry, I should have mentioned that before. |
So it's delimited by the end of the record, and not delimited by a comma. |
|
TheYodas |
Posted: Mon Sep 29, 2014 8:51 am Post subject: |
|
|
Novice
Joined: 28 Apr 2014 Posts: 15
|
Actually, the terminator:
Quote: |
dfdl:terminator="%CR;%LF;" |
will signal the end of the "TransactionDetailRecord" record. The elements within TransactionDetailRecord are all delimited by "comma". |
|
kimbert |
Posted: Mon Sep 29, 2014 11:46 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
For future reference, it helps to provide all of the information up front. Otherwise we just end up asking for it anyway.
Your solution ( putting the final field into its own sequence group with the separator set to the empty string ) was a good idea. Shame you did not mention it before. However, you got the structure wrong. It should look like this:
Quote: |
<xsd:element name="Transactions" maxOccurs="unbounded" minOccurs="0">
<xsd:complexType>
<xsd:sequence dfdl:separator="">
<xsd:element name="TransactionDetailRecord" minOccurs="1"
dfdl:initiator="16," dfdl:lengthKind="implicit" dfdl:terminator="%CR;%LF;" >
<xsd:complexType>
<xsd:sequence>
<xsd:sequence dfdl:separator=",">
<xsd:element maxOccurs="1" ref="ns0:TransactionTypeCodes"/>
<xsd:element minOccurs="0" ref="ns0:BankReferenceNumber"/>
<xsd:element minOccurs="0" ref="ns0:CustomerReferenceNumber"/>
</xsd:sequence>
<xsd:sequence dfdl:separator="">
<xsd:element name="Text" type="xsd:string"/>
</xsd:sequence>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
|
The important changes are:
- The comma-separated fields are inside a nested sequence group of their own.
- the lengthKind of the complex element is changed to 'implicit' ( just believe me on this one - it's the correct option for almost any complex element in DFDL ).
- I removed some properties that should really be located in the default format block. Mainly to make the schema more readable for this forum. |
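The idea behind the restructured schema (comma-separated leading fields, then a final field that runs to the end of the record) can be sketched in Python. This is only an illustration of the parsing rule, not of how the DFDL parser works internally. The function name `parse_record` is hypothetical, and the count of eight leading fields is an assumption that fits the sample message posted earlier in this thread; the real schema has optional elements, so the count is not fixed in general.

```python
def parse_record(line: str, initiator: str = "16,", leading_fields: int = 8):
    """Split one record into its leading fields and a trailing Text field.

    Assumptions (illustrative only): the record starts with `initiator`,
    may end with CR LF, and has exactly `leading_fields` comma-separated
    fields before the free-text field.
    """
    if not line.startswith(initiator):
        raise ValueError("record does not start with the expected initiator")
    body = line[len(initiator):]
    if body.endswith("\r\n"):
        body = body[:-2]  # strip the record terminator
    # Split on at most `leading_fields` commas; whatever remains,
    # commas included, becomes the Text field.
    parts = body.split(",", leading_fields)
    if len(parts) == leading_fields + 1:
        return parts[:-1], parts[-1]
    return parts, ""  # record has no Text field


sample = ("16,115,10006218,S,0,9362257,643961,,0012345,"
          "NBR OF ITEMS: 11,PROCESS SITE: LONDON\r\n")
fields, text = parse_record(sample)
```

For the sample record, `text` keeps its embedded comma because no separator applies once the leading fields have been consumed.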
|
kimbert |
Posted: Mon Sep 29, 2014 11:47 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
@moderators: What happened to the leading white space in my code block? All of the indentation is lost - and indentation is one of the main reasons for using a code block. |
|
mqjeff |
Posted: Mon Sep 29, 2014 11:52 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
kimbert wrote: |
@moderators: What happened to the leading white space in my code block? All of the indentation is lost - and indentation is one of the main reasons for using a code block. |
You appear to have put in a [ q u o t e ] block rather than a [ c o d e ] block. |
|
kimbert |
Posted: Mon Sep 29, 2014 12:07 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Definitely need more tea. |
|