MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Need to parse more than 500 fields in CSV file using DFDL

visasimbu
PostPosted: Thu Sep 07, 2017 11:14 pm    Post subject: Need to parse more than 500 fields in CSV file using DFDL

Disciple

Joined: 06 Nov 2009
Posts: 171

I'm looking for an efficient way to parse a 500-field CSV file using DFDL.

My input file has 500 fields per record, but my mapping uses only 30 of them; the remaining fields are never referenced. What is the best way to parse the incoming records with DFDL? Do I really have to create a DFDL model with all 500 fields?

Going through the documentation, I found the parse timing option, which can be set to
Quote:
On Demand
so that a field is parsed only when it is referenced in code. Is this the only way to avoid parsing the rest of the fields, or are there other things I should take care of to make the best use of memory and processing?

Note: I can't ask the source system to send only the 30 fields that I need in IIB.
mqjeff
PostPosted: Fri Sep 08, 2017 3:04 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

You can model the rest of the fields as blobs.

If they are all at the end, you can model them as one big blob. If the fields are always padded with spaces, or otherwise of a constant length, you can read the input as a blob, truncate it, and then parse the result.

If they are at the front, you can model them as one big blob if you know how to find the first field you need. Likewise, if they are all fixed lengths, you can truncate the blob.
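As a rough sketch of the "one big blob at the end" variant (the element name and byte count are invented here, and this only works if the unneeded tail really is a constant length):

Code:
<!-- hypothetical: the unneeded trailing fields always occupy exactly 3760 bytes -->
<!-- placed as the last child of the record's sequence group -->
<xs:element name="unusedTail" type="xs:hexBinary"
            dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes"
            dfdl:length="3760"/>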

If they are mixed in, you can model the mixed-in elements as blobs, and then use your current on-demand parsing.

But note that On Demand parsing will parse everything *up* to the record you want - but only once. I think. Unless I'm wrong.
_________________
chmod -R ugo-wx /
timber
PostPosted: Fri Sep 08, 2017 5:02 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

On Demand parsing will not help you. That applies to the entire message, not to individual records within the message.

Something like this should work:

1. Model the 30 fields using the CSV wizard. This will define the comma as a separator and the line-end character(s) as a terminator for each record.

2. Open the model in the DFDL editor

3. Edit the generated model as follows:
- wrap a new sequence group around the existing sequence group that contains the 30 fields.
- remove the terminator from the original (now the inner) sequence group and put it onto the new, outer sequence group.
- within the *outer* sequence group, after the inner group, add one more string field. Call it 'remainingFields' or something like that. Set its lengthKind property to 'delimited'.

Personally, I find it easiest to do this kind of structural change using an XSD editor, but you may prefer to do it using the DFDL editor. The sketch below shows roughly what the edited model ends up looking like.
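Minimal sketch (the field names, the %NL; terminator, and the wizard-generated format defaults are assumptions, not copied from a real generated model):

Code:
<xs:element name="record" maxOccurs="unbounded">
  <xs:complexType>
    <!-- new outer sequence group: now carries the line-end terminator -->
    <xs:sequence dfdl:terminator="%NL;">
      <!-- original inner sequence group: keeps the comma separator,
           terminator removed -->
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
        <xs:element name="field01" type="xs:string" dfdl:lengthKind="delimited"/>
        <!-- ... field02 through field29 ... -->
        <xs:element name="field30" type="xs:string" dfdl:lengthKind="delimited"/>
      </xs:sequence>
      <!-- swallows ',field31,...,field500' up to the line end -->
      <xs:element name="remainingFields" type="xs:string"
                  dfdl:lengthKind="delimited" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Because the outer sequence group has no separator of its own, 'remainingFields' is delimited only by the line end, so it consumes everything after the 30th field (including the leading comma).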
mqjeff
PostPosted: Fri Sep 08, 2017 6:18 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

timber wrote:
1. Model the 30 fields using the CSV wizard. This will define the comma as a separator and the line-end character(s) as a terminator for each record.


Does this assume that the 30 fields are located next to each other, and not dispersed across the entire record?

I.e.: <bunch of unneeded fields or none><30 fields><bunch of unneeded fields or none>

instead of
<bunch of unneeded fields or none><field X><bunch of unneeded fields><field Y> ... etc.?
_________________
chmod -R ugo-wx /
timber
PostPosted: Fri Sep 08, 2017 8:18 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

@mqjeff: That's a very good point.

@visasimbu: You will need to model all fields up to and including the last field that you need to map. If you're lucky, that will be the 30th field. If you're very unlucky, that will be the 500th field.
For each unmapped field that you need to model, you can refer to a single global string element 'unmappedField'. If there is a sequence of N unmapped fields then you can set maxOccurs=N to consume them all.
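A minimal sketch of that pattern (the mapped field names and the run lengths are invented for illustration; the separator and terminator come from the wizard-generated model):

Code:
<!-- one global filler element, referenced for every run of unmapped fields -->
<xs:element name="unmappedField" type="xs:string" dfdl:lengthKind="delimited"/>

<!-- record layout: 3 unmapped, 1 mapped, 6 unmapped, 1 mapped, ... -->
<xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
             dfdl:terminator="%NL;">
  <xs:element ref="unmappedField" minOccurs="3" maxOccurs="3"
              dfdl:occursCountKind="fixed"/>
  <xs:element name="customerId" type="xs:string" dfdl:lengthKind="delimited"/>
  <xs:element ref="unmappedField" minOccurs="6" maxOccurs="6"
              dfdl:occursCountKind="fixed"/>
  <xs:element name="orderTotal" type="xs:string" dfdl:lengthKind="delimited"/>
  <!-- ...and so on, up to and including the last mapped field -->
</xs:sequence>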
urufberg
PostPosted: Fri Sep 08, 2017 11:18 am

Apprentice

Joined: 08 Sep 2017
Posts: 28

@visasimbu:

I think @timber's and @mqjeff's answers are pretty much the best way to approach your situation.
I just want to add that the worst-case scenario is when you have one (or more) unneeded fields in between those you're going to use. In that case you will end up with around 60 fields in the model (I know it's a lot, but it's way better than 500).

I've found myself in this situation before, and what I always do is insert a dummy field (dummyField1, dummyField2, ...) and set its min and max occurrences to the specific number I need. With this solution you avoid creating more than one dummyField for each run of fields you're not going to use.

Hope it works for you
rekarm01
PostPosted: Fri Sep 08, 2017 4:30 pm    Post subject: Re: Need to parse more than 500 fields in CSV file using DFDL

Grand Master

Joined: 25 Jun 2008
Posts: 1415

visasimbu wrote:
I'm looking for an efficient way to parse a 500-field CSV file using DFDL.

What is the actual goal here? To improve performance, by reducing the memory, CPU, or other resources required to parse a message? Or just to simplify the message model?

If the goal is to improve performance, then merely simplifying the message model probably won't help; the parser still has to count delimiters to get to the last field of interest. On-demand parsing might help, though, if the rest of the message has a significant number of fields that never need parsing.