visasimbu
Posted: Thu Sep 07, 2017 11:14 pm    Post subject: Need to parse more than 500 fields in CSV file using DFDL
Disciple
Joined: 06 Nov 2009    Posts: 171
I'm looking for an effective way to parse 500 fields in a CSV file using DFDL.

My input file has 500 fields per record, but my mapping uses only 30 of them; the rest of the fields are never used in the mapping. I'm looking for the best way to parse the incoming fields using DFDL, and at the moment I am creating a DFDL model with all 500 fields.

After going through the documentation, I found the parse timing option, which can be set to 'On Demand', so that a field is parsed only when it is used in code. Is this the only way to avoid parsing the rest of the fields? Or are there any other things I should take care of to make the best use of memory and processing?

Note: I can't ask the source system to send only the 30 fields that I need in IIB.
mqjeff
Posted: Fri Sep 08, 2017 3:04 am
Grand Master
Joined: 25 Jun 2008    Posts: 17447
You can model the rest of the fields as blobs.

If they are all at the end, you can model them as one big blob. If the fields are always filled with spaces, or are otherwise of a constant length, you can read the input as a blob, truncate it, and then parse that. A sketch of the constant-length case is below.

If they are at the front, you can model them as one big blob if you know how to find the first one you need. Likewise, if they are all fixed lengths, you can truncate the blob.

If they are mixed in, you can model the mixed-in elements as blobs, and then use your current on-demand parsing.

But note that On Demand parsing will parse everything *up to* the record you want - but only once. I think. Unless I'm wrong.
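For example, a minimal sketch of the constant-length tail modelled as one element (the element name and the length of 940 characters are invented for illustration, and the usual xs/dfdl namespace prefixes are assumed):

Code:
<!-- consumes the whole constant-length tail in a single element -->
<xs:element name="unusedTail" type="xs:string"
    dfdl:lengthKind="explicit" dfdl:length="940"
    dfdl:lengthUnits="characters"/>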
timber
Posted: Fri Sep 08, 2017 5:02 am
Grand Master
Joined: 25 Aug 2015    Posts: 1292
On Demand parsing will not help you. That applies to the entire message, not to individual records within the message.

Something like this should work:
1. Model the 30 fields using the CSV wizard. This will define the comma as a separator and the line-end character(s) as a terminator for each record.
2. Open the model in the DFDL editor.
3. Edit the generated model as follows:
- wrap a new sequence group around the existing sequence group that contains the 30 fields.
- remove the terminator from the original (now the inner) sequence group and put it onto the new, outer sequence group.
- within the *outer* sequence group, add one more string field. Call it 'remainingFields' or something like that. Set its lengthKind property to 'delimited'.
A sketch of the resulting structure is below.

Personally, I find it easiest to do this kind of structural change using an XSD editor, but you may prefer to do it using the DFDL editor.
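To make that concrete, here is a rough sketch of the edited structure (element names and the terminator value are illustrative, the usual xs/dfdl namespace prefixes are assumed, and a wizard-generated model will carry more properties via its default format):

Code:
<xs:element name="record">
  <xs:complexType>
    <!-- new outer sequence group: now owns the record terminator -->
    <xs:sequence dfdl:terminator="%CR;%LF; %LF;">
      <!-- original inner sequence group: comma-separated, terminator removed -->
      <xs:sequence dfdl:separator=",">
        <xs:element name="field1"  type="xs:string" dfdl:lengthKind="delimited"/>
        <!-- ... fields 2 to 29 ... -->
        <xs:element name="field30" type="xs:string" dfdl:lengthKind="delimited"/>
      </xs:sequence>
      <!-- swallows everything after field30, up to the record terminator -->
      <xs:element name="remainingFields" type="xs:string"
          dfdl:lengthKind="delimited"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>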
mqjeff
Posted: Fri Sep 08, 2017 6:18 am
Grand Master
Joined: 25 Jun 2008    Posts: 17447
timber wrote:
1. Model the 30 fields using the CSV wizard. This will define the comma as a separator and the line-end character(s) as a terminator for each record.

Does this assume that the 30 fields are located next to each other, and not dispersed across the entire record?

I.e.: <bunch of unneeded fields or none><30 fields><bunch of unneeded fields or none>
instead of
<bunch of unneeded fields or none><field X><bunch of unneeded fields><field Y> ... etc.?
timber
Posted: Fri Sep 08, 2017 8:18 am
Grand Master
Joined: 25 Aug 2015    Posts: 1292
@mqjeff: That's a very good point.

@visasimbu: You will need to model all fields up to and including the last field that you need to map. If you're lucky, that will be the 30th field. If you're very unlucky, it will be the 500th.

For each unmapped field that you need to model, you can refer to a single global string element 'unmappedField'. If there is a sequence of N unmapped fields, then you can set maxOccurs=N to consume them all. A sketch is below.
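A minimal sketch of that pattern (the run of 5 unmapped fields and the field name are invented for illustration; occursCountKind 'fixed' is one way to pin the count when minOccurs equals maxOccurs):

Code:
<!-- one global element, reused for every run of unmapped fields -->
<xs:element name="unmappedField" type="xs:string" dfdl:lengthKind="delimited"/>

<!-- inside the record's sequence: consume 5 unmapped fields, then map the next one -->
<xs:element ref="unmappedField" minOccurs="5" maxOccurs="5"
    dfdl:occursCountKind="fixed"/>
<xs:element name="mappedField6" type="xs:string" dfdl:lengthKind="delimited"/>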
urufberg
Posted: Fri Sep 08, 2017 11:18 am
Apprentice
Joined: 08 Sep 2017    Posts: 28
@visasimbu:
I think @timber's and @mqjeff's answers are pretty much the best way to approach your situation.

I just want to add that the worst-case scenario would be if you have one (or more) unneeded fields in between each of the fields you're going to use. In that case you would end up with around 60 fields in the model (I know it's a lot, but it's way better than 500).

I've found myself in this situation before, and what I always do is insert a dummyField (1, 2, 3, ...) and set its min and max occurrences to the specific number I need. With this solution you avoid creating more than one dummy field for each run of fields you're not going to use.

Hope it works for you.
rekarm01
Posted: Fri Sep 08, 2017 4:30 pm    Post subject: Re: Need to parse more than 500 fields in CSV file using DFDL
Grand Master
Joined: 25 Jun 2008    Posts: 1415
visasimbu wrote:
I'm looking for an effective way to parse 500 fields in a CSV file using DFDL.

What is the actual goal here? To improve performance, by reducing the memory, CPU, or other resources required to parse a message? Or just to simplify the message model?

If the goal is to improve performance, then simplifying the message model alone probably won't help; the parser still has to count delimiters to get to the last field of interest. On Demand parsing might help, though, if the rest of the message has a significant number of fields that don't need parsing.