DFDL conversion from XML to CSV |
6FA |
Posted: Tue Mar 08, 2016 9:42 pm Post subject: DFDL conversion from XML to CSV |
|
|
Novice
Joined: 08 Jan 2016 Posts: 21
|
Hi guys,
I'm converting an XML message to CSV format using DFDL, but the last value on each line is being written within double quotes (").
I'm getting CSV like this:
20150506,D,2,STK,"5126L01500 "
20150506,AM,6,STK,"6666L01500 "
But it should look like this:
20150506,D,2,STK,5126L01500
20150506,AM,6,STK,6666L01500
Can anyone suggest how I can achieve this?
Thanks  |
martinb |
Posted: Tue Mar 08, 2016 11:03 pm Post subject: |
|
|
Master
Joined: 09 Nov 2006 Posts: 210 Location: UK
|
In order to help you, we would need to know what input XML you are providing and what you have currently implemented in your message flow to do the conversion. |
maurito |
Posted: Tue Mar 08, 2016 11:40 pm Post subject: Re: DFDL conversion from XML to CSV |
|
|
Partisan
Joined: 17 Apr 2014 Posts: 358
|
It looks like you have a blank space at the end of the values; that's why they are being enclosed in double quotes. Trim the values and you will get the desired output.
If the blank space at the end is needed, then there is nothing wrong with the double quotes. Look at the CSVEscapeScheme in your CommaSeparatedFormat. |
timber |
Posted: Wed Mar 09, 2016 3:49 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
If you look in CommaSeparatedFormat.xsd you will see this definition of the escape scheme:
Code: |
<dfdl:defineEscapeScheme name="CSVEscapeScheme">
    <dfdl:escapeScheme escapeKind="escapeBlock"
        escapeBlockStart='"' escapeBlockEnd='"' escapeCharacter='"'
        escapeEscapeCharacter='"' generateEscapeBlock="whenNeeded"
        extraEscapedCharacters=", %#x0D; %#x0A;">
    </dfdl:escapeScheme>
</dfdl:defineEscapeScheme>
|
This is what controls the presence or absence of quotes around a field.
Look at the extraEscapedCharacters attribute. Yes, it contains spaces, but it is a space-separated list, so the spaces only separate the entries (a comma, %#x0D; and %#x0A;) and are not part of the value. So a space character in a field value will not cause it to be escaped (surrounded by quotes).
I suspect that you have one or both of the following characters in your field value:
- a carriage return
- a line feed
I can't prove it, of course. But I don't think that a space would be escaped. |
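To make the point about the space-separated list concrete, here is a rough Python sketch. It is not how the DFDL unparser is implemented; the helper names `parse_extra_escaped` and `needs_escape_block` are made up for illustration, and the sketch models only the extraEscapedCharacters list, ignoring any other conditions that can trigger an escape block.

```python
# Hypothetical model of the "whenNeeded" decision, based only on the
# extraEscapedCharacters value quoted above. Not the real DFDL unparser.

def parse_extra_escaped(attr):
    """Split the space-separated list and decode %#xNN; character entities."""
    chars = set()
    for token in attr.split():  # the spaces are only list separators
        if token.startswith("%#x") and token.endswith(";"):
            chars.add(chr(int(token[3:-1], 16)))  # e.g. %#x0D; -> carriage return
        else:
            chars.add(token)  # a literal character such as the comma
    return chars


TRIGGERS = parse_extra_escaped(", %#x0D; %#x0A;")


def needs_escape_block(value):
    """True if the value contains any character from the list."""
    return any(ch in value for ch in TRIGGERS)


print(needs_escape_block("5126L01500 "))   # False: a trailing space is not in the list
print(needs_escape_block("5126L01500\r"))  # True: a carriage return is in the list
```

Under these assumptions the list holds exactly three characters (comma, CR and LF), so a trailing space on its own would not produce the quotes.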
maurito |
Posted: Wed Mar 09, 2016 4:30 am Post subject: |
|
|
Partisan
Joined: 17 Apr 2014 Posts: 358
|
You are right, Tim; I missed that. But shouldn't the CR LF be defined as the record terminator in the DFDL? That should remove them from the output as well.
Or maybe the terminator is defined as %LF; and the records contain %CR;%LF;, so the %LF; is removed and the %CR; remains in place? |
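A minimal Python sketch of that second scenario, assuming the data is CRLF-terminated but gets split on LF only. The sample values are borrowed from the first post (the real message was never shown in this thread), and the function names `last_fields` and `strip_line_ends` are just for illustration.

```python
# Illustration only: what happens to the last field when CRLF-terminated
# records are split on LF alone. SAMPLE is made up from the values in the
# first post; it is not the poster's actual data.

SAMPLE = "20150506,D,2,STK,5126L01500\r\n20150506,AM,6,STK,6666L01500\r\n"


def last_fields(raw):
    """Split records on LF only and return the last field of each record."""
    records = [r for r in raw.split("\n") if r]
    return [r.split(",")[-1] for r in records]


def strip_line_ends(value):
    """Remove trailing carriage returns and line feeds from a field value."""
    return value.rstrip("\r\n")


fields = last_fields(SAMPLE)
print([repr(f) for f in fields])             # each value still ends with \r
print([strip_line_ends(f) for f in fields])  # ['5126L01500', '6666L01500']
```

Printing the `repr` of a value is a quick way to see an invisible CR or LF that a text viewer may render as a blank.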
timber |
Posted: Wed Mar 09, 2016 5:03 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
maurito said:
Quote: |
shouldn't the CR LF be defined as record terminators in the DFDL ? that should remove them from the output as well |
A CSV format can be simply described thus:
- a message is a sequence of records delimited by line breaks
- a record is a sequence of fields delimited by commas
- a field value that contains a line break or a comma must be escaped by surrounding it with quotes
So if a field contains a delimiter (CR, LF or comma) then it must be surrounded by quotes. But the delimiter is never removed from the output by the DFDL parser, because that might change the meaning of the data (some countries use a comma as a decimal separator, for instance). |
maurito |
Posted: Wed Mar 09, 2016 5:14 am Post subject: |
|
|
Partisan
Joined: 17 Apr 2014 Posts: 358
|
Yes, correct. I used the term "removed" loosely. The delimiter will still be there, but rather than a plain CRLF the user may now be seeing
"5126L01500%CR;"%LF; |
timber |
Posted: Wed Mar 09, 2016 8:37 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Just to avoid confusion...
- a comma that appears between two field values is a delimiter
- a comma that appears within the value of a field is part of the field value. It is not a delimiter.
Similarly,
- a linefeed that appears between two records is a delimiter
- a linefeed that appears within the value of a field is part of the field value. It is not a delimiter.
The escape scheme tells the parser
a) how to recognise the difference between those two situations when *reading*.
b) how to decide whether a field needs to be escaped when *writing*.
I know you know this stuff. But others reading this thread may not, so it's important to be very clear in how it is explained. |
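For readers who want to see both directions in action, here is a small sketch using Python's standard csv module. This is only an analogy: the csv module is not DFDL, but its default minimal quoting follows the same general CSV convention. The function names `parse_csv` and `write_csv` and the sample data are assumptions for illustration.

```python
import csv
import io


def parse_csv(text):
    """Reading: a linefeed inside quotes stays part of the field value."""
    return list(csv.reader(io.StringIO(text, newline="")))


def write_csv(rows):
    """Writing: quote a field only when it contains a special character."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\r\n").writerows(rows)
    return buf.getvalue()


# The first record's last field contains a linefeed inside the quotes.
text = '20150506,D,2,STK,"5126L01500\n"\r\n20150506,AM,6,STK,6666L01500\r\n'
rows = parse_csv(text)
print(len(rows))            # 2: the quoted linefeed did not end the record
print(repr(rows[0][-1]))    # '5126L01500\n'

print(repr(write_csv([["STK", "5126L01500 "]])))   # 'STK,5126L01500 \r\n'
print(repr(write_csv([["STK", "5126L01500\r"]])))  # 'STK,"5126L01500\r"\r\n'
```

The last two lines show the writing side: a trailing space is written as-is, while a carriage return inside the value causes the field to be quoted.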