MQSeries.net :: View topic - Opaque parsing in DFDL

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Opaque parsing in DFDL

Opaque parsing in DFDL



ghoshly

Posted: Tue Aug 15, 2017 10:23 am Post subject: Opaque parsing in DFDL

Partisan

Joined: 10 Jan 2008
Posts: 333

Hello,

My question is: is something like opaque parsing possible when using DFDL, or do I need to use on-demand parsing to avoid a complete parse?

I'll elaborate to make the question as clear as I can.

The input will be flat files that I need to parse dynamically, because several different formats are possible. I can require the end system to produce a first line containing metadata, and I want to parse only that metadata first so that I can identify the proper message set / schema dynamically. Do I need to define two separate schemas, one for the metadata and one for the actual data, or can I do this with a single schema?

Please let me know if any more details are required to provide a suggestion.

mqjeff

Posted: Tue Aug 15, 2017 12:35 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

DFDL is a backtracking parser. Within a single model, it will attempt to parse the first thing you give it until it fails. It will then backtrack and try the next thing.

It will stop when it parses something successfully.
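
To make the backtracking behaviour concrete, here is a minimal Python sketch of the idea (not DFDL itself). The two record formats and the function names are hypothetical, made up purely for illustration: each branch is tried in order from the same starting point, and the first one that parses without error wins.

```python
# Hypothetical branch parsers: each returns a dict on success
# or raises ValueError when the data does not fit its format.
def parse_csv_record(line: str) -> dict:
    fields = line.split(",")
    if len(fields) != 3:
        raise ValueError("expected exactly 3 comma-separated fields")
    return {"format": "csv", "fields": fields}


def parse_fixed_record(line: str) -> dict:
    if len(line) != 10:
        raise ValueError("expected exactly 10 characters")
    return {"format": "fixed", "id": line[:4], "amount": line[4:]}


# Order matters: branches are attempted in this sequence.
BRANCHES = [parse_csv_record, parse_fixed_record]


def parse_choice(line: str) -> dict:
    """Try each branch on the same input; stop at the first success."""
    errors = []
    for branch in BRANCHES:
        try:
            return branch(line)
        except ValueError as exc:
            # "Backtrack": discard this attempt and try the next branch.
            errors.append(f"{branch.__name__}: {exc}")
    raise ValueError("no branch matched: " + "; ".join(errors))
```

With these assumed formats, `parse_choice("a,b,c")` is handled by the first branch, while `parse_choice("0001123.45")` fails the first branch and falls through to the second.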

It sounds like your model is pretty straightforward. There's likely no good reason to use metadata to decide which format/schema is being used.

But that is a choice, as is how you construct your model.
_________________
chmod -R ugo-wx /

timber

Posted: Wed Aug 16, 2017 1:50 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Opaque parsing is not what you need - that's for skipping parts of the model that you *never* want to parse. What you are asking about is discrimination - how to guide the parser to the correct branch of the model.

As mqjeff implies, you could use a single model with a choice. DFDL can use various techniques for working out which choice branch to take (all documented in the specification). So if the different formats can be distinguished then you don't even need the metadata line at the start, although it might be good practice to include it anyway.
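
As a rough illustration of discrimination (again in Python, not DFDL), the sketch below picks a branch directly from something in the data that distinguishes the formats. The three-character tags `ORD` and `INV` and the field separators are assumptions; the thread does not say how the real formats differ.

```python
# Hypothetical record tags and layouts, for illustration only.
PARSERS = {
    "ORD": lambda body: {"type": "order", "fields": body.split("|")},
    "INV": lambda body: {"type": "invoice", "fields": body.split(";")},
}


def parse_by_tag(record: str) -> dict:
    """Choose the branch from the leading tag instead of trying each in turn."""
    tag, body = record[:3], record[3:]
    try:
        parser = PARSERS[tag]
    except KeyError:
        raise ValueError(f"unrecognised record tag {tag!r}") from None
    return parser(body)
```

If the data carries a marker like this, no separate metadata line is needed to pick the branch.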

Did you consider using a choice, and decide against it?

ghoshly

Posted: Wed Aug 16, 2017 3:51 am Post subject:

Partisan

Joined: 10 Jan 2008
Posts: 333

Hello -

Thanks for your responses.

I wanted to skip the actual data and parse only the metadata first, so that I can segregate the files for different actions (different processing / transformation for different interfaces).

I am new to DFDL modelling, and I'll try to use a choice within a single model as you suggested. Could this technique become a performance bottleneck when I can expect 25 - 30K files per day, each up to a few MB, with mixed types? I hope there will be other remedies if it does turn out to be an issue.

fjb_saper

Posted: Wed Aug 16, 2017 4:21 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

You could use 2 schemas:
one that defines the metadata plus a "generic" record type whose details will not get parsed, and one that parses everything.
You could then use the first one for routing and the second one for processing.

Have fun

_________________
MQ & Broker admin

mqjeff

Posted: Wed Aug 16, 2017 4:32 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

... If you're reading from files and you tell it to use the BLOB parser, then you could create a small model that only matches your metadata, substring out that part, and parse just that.
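
A minimal Python sketch of that idea, outside the broker: treat the whole file as raw bytes, cut off the first line, and parse only that. The metadata layout (`key=value` pairs separated by `;`) and the function name are assumptions for illustration; the thread never specifies what the metadata line looks like.

```python
from typing import Dict, Tuple


def split_metadata(payload: bytes) -> Tuple[Dict[str, str], bytes]:
    """Parse only the first line; return it with the untouched remainder."""
    # Everything after the first newline stays as opaque bytes.
    header, _, body = payload.partition(b"\n")

    # Assumed header layout: key=value pairs separated by ';'.
    meta: Dict[str, str] = {}
    for pair in header.decode("ascii").strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            meta[key.strip()] = value.strip()
    return meta, body
```

For example, `split_metadata(b"format=orders;version=2\nA,B,C\n")` yields the two metadata fields and leaves `b"A,B,C\n"` unparsed for a later step.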

But it's probably a lot easier to either a) put files for different interfaces into different folders, b) use file names that indicate the type of processing, or c) do both, and then use that data, which is easily available in the local environment, to make your decisions.
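
Sketched in Python, routing on folder or file name could look like the following. The folder names, file-name prefixes, and flow names are all hypothetical; in the broker the equivalent decision would be made from the file details in the local environment rather than from a path string.

```python
from pathlib import PurePosixPath

# Hypothetical conventions, for illustration only.
ROUTES_BY_FOLDER = {"orders": "OrdersFlow", "invoices": "InvoicesFlow"}
ROUTES_BY_PREFIX = {"ORD_": "OrdersFlow", "INV_": "InvoicesFlow"}


def choose_route(file_path: str) -> str:
    """Pick a processing route from the folder first, then the file name."""
    path = PurePosixPath(file_path)

    # a) the containing folder identifies the interface
    route = ROUTES_BY_FOLDER.get(path.parent.name)
    if route is not None:
        return route

    # b) otherwise fall back to a file-name prefix
    for prefix, target in ROUTES_BY_PREFIX.items():
        if path.name.startswith(prefix):
            return target

    raise ValueError(f"no route for {file_path!r}")
```

Neither check opens the file, so the routing decision costs nothing in parsing.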

But you should also have separate flows for each "group" of processing. Then you can get reliable parallel processing of the files, and a slowdown or mistake in one flow won't affect the others.

Given the above, I don't really see the value of the metadata.
_________________
chmod -R ugo-wx /
