ghoshly (Partisan; joined 10 Jan 2008; 333 posts)
Posted: Tue Aug 15, 2017 10:23 am    Post subject: Opaque parsing in DFDL
Hello,
My question is: is something like opaque parsing possible when using DFDL, or do I need to use on-demand parsing to avoid a complete parse?
I'll elaborate on my question to make it as clear as I can.
The input will be flat files that I need to parse dynamically, because several different formats are possible. I can require the end system to produce a first line containing metadata, and I want to parse only that metadata first so that I can identify the proper message set / schema dynamically. Do I need to define two separate schemas, one for the metadata and one for the actual data, or can I do it with one?
Please let me know if any more details are needed to make a suggestion.
mqjeff (Grand Master; joined 25 Jun 2008; 17447 posts)
Posted: Tue Aug 15, 2017 12:35 pm
DFDL is a backtracking parser. Where your model offers alternatives, it will try to parse the data with the first one until that fails. It will then backtrack and try the next one.
It stops when one of them parses successfully.
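To make the backtracking idea concrete, here is a small Python analogy (not DFDL syntax, and not how IBM's DFDL parser is implemented): each alternative is tried from the same starting offset, and a failure simply rewinds to that offset before the next alternative is tried. The helper `parse_choice` and the two record formats `csv_record` and `fixed_record` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A branch parser takes (data, pos) and returns (value, new_pos) on success,
# or None when the data at pos does not fit that branch.
BranchParser = Callable[[str, int], Optional[Tuple[object, int]]]


def parse_choice(data: str, pos: int, branches: List[Tuple[str, BranchParser]]):
    """Try each branch in order, always starting from the same offset.

    A failed branch leaves `pos` untouched, so the next branch starts from
    the original offset again: that rewind is the backtracking step.
    The first branch that succeeds wins.
    """
    for name, branch in branches:
        result = branch(data, pos)
        if result is not None:
            value, new_pos = result
            return name, value, new_pos
    raise ValueError(f"no branch matched at offset {pos}")


def csv_record(data: str, pos: int):
    # Hypothetical format 1: exactly three comma-separated fields up to end of line.
    end = data.find("\n", pos)
    end = len(data) if end == -1 else end
    fields = data[pos:end].split(",")
    return (fields, end) if len(fields) == 3 else None


def fixed_record(data: str, pos: int):
    # Hypothetical format 2: a fixed-width record of exactly ten digits.
    chunk = data[pos:pos + 10]
    return (chunk, pos + 10) if len(chunk) == 10 and chunk.isdigit() else None


BRANCHES = [("csv", csv_record), ("fixed", fixed_record)]

# csv_record fails, the parser rewinds to offset 0, fixed_record matches.
print(parse_choice("0123456789", 0, BRANCHES))  # ('fixed', '0123456789', 10)
```

Because the first successful branch wins, order matters: a permissive branch listed first can hide a more specific one listed after it.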
It sounds like your model is pretty straightforward. There's likely no good reason to use metadata to decide which format/schema is being used.
But that is a choice, as is how you construct your model.
_________________
chmod -R ugo-wx /
timber (Grand Master; joined 25 Aug 2015; 1292 posts)
Posted: Wed Aug 16, 2017 1:50 am
Opaque parsing is not what you need; that's for skipping parts of the model that you *never* want to parse. What you are asking about is discrimination: how to guide the parser to the correct branch of the model.
As mqjeff implies, you could use a single model with a choice. DFDL can use various techniques to work out which branch of a choice to take (all documented in the specification). So if the different formats can be distinguished from one another, you don't even need the metadata line at the start, although it might be good practice to include it anyway.
Did you consider using a choice and decide against it?
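A rough Python analogy of discrimination, assuming every format starts with a distinguishing tag (the `ORD|` and `INV|` prefixes are hypothetical): a cheap test picks the branch, and once it has matched, the parser is committed to that branch. That is the useful property of a discriminator: a bad record is reported as an error in the branch it belongs to, rather than silently falling through to the next alternative. The real DFDL mechanisms (for example `dfdl:discriminator` annotations and initiators) are described in the specification; this is only a sketch of the idea.

```python
from typing import Callable, List, Tuple


class ParseError(Exception):
    pass


# (branch name, discriminator test, body parser)
Branch = Tuple[str, Callable[[str], bool], Callable[[str], object]]


def parse_with_discriminators(record: str, branches: List[Branch]):
    for name, discriminator, body in branches:
        if not discriminator(record):
            continue  # not this branch; still free to try the next one
        # The discriminator matched, so the choice is resolved: a failure in
        # the body is now a real error instead of a cue to backtrack.
        try:
            return name, body(record)
        except ValueError as exc:
            raise ParseError(f"{name} record is malformed: {exc}") from exc
    raise ParseError("no branch's discriminator matched")


# Hypothetical formats: each record starts with a tag that identifies its type.
BRANCHES: List[Branch] = [
    ("order", lambda r: r.startswith("ORD|"), lambda r: int(r[4:])),
    ("invoice", lambda r: r.startswith("INV|"), lambda r: float(r[4:])),
]

print(parse_with_discriminators("INV|12.5", BRANCHES))  # ('invoice', 12.5)
```

With this shape, `"ORD|abc"` fails as a malformed order instead of being tried as an invoice, which usually gives far clearer error messages than pure backtracking.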
ghoshly (Partisan; joined 10 Jan 2008; 333 posts)
Posted: Wed Aug 16, 2017 3:51 am
Hello,
Thanks for your responses.
I wanted to skip the actual data part and parse only the metadata first, so that I can separate the files for different actions (different processing / transformation for different interfaces).
I am new to creating DFDL models, and I'll try to use a choice and do it with a single model, as you suggested. Could this technique become a performance bottleneck? I expect 25-30K files per day, each up to a few MB, with mixed types. I hope there will be other remedies if it does turn out to be an issue.
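For scale, the stated daily volume works out to a modest average arrival rate (bursts can of course be much higher than the daily average):

```latex
\frac{86\,400\ \text{s/day}}{30\,000\ \text{files/day}} \approx 2.9\ \text{s per file},
\qquad
\frac{86\,400\ \text{s/day}}{25\,000\ \text{files/day}} \approx 3.5\ \text{s per file}
```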
fjb_saper (Grand High Poobah; joined 18 Nov 2003; 20756 posts; location: LI,NY)
Posted: Wed Aug 16, 2017 4:21 am
You could use 2 schemas:
one that defines the metadata plus a "generic" record type whose details will not get parsed, and one that parses everything.
You could then use the first one for routing and the second one for processing.
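The two-schema approach, sketched in Python as an analogy: the first pass reads only the metadata line and keeps the rest as raw bytes (the "generic" record), and the second pass applies the full parser chosen from that metadata. The `key=value;key=value` metadata format, the `interface` key, and the `PAYROLL` parser are all hypothetical, and in this simplified version the second pass only receives the body rather than re-parsing the whole file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RoutingView:
    metadata: Dict[str, str]
    body: bytes  # left unparsed, like the "generic" record in the routing schema


def parse_for_routing(data: bytes) -> RoutingView:
    """Pass 1: parse only the first line (hypothetical 'key=value;key=value' format)."""
    header, sep, body = data.partition(b"\n")
    if not sep:
        raise ValueError("no metadata line found")
    pairs = (item.split("=", 1) for item in header.decode("ascii").split(";"))
    return RoutingView(metadata=dict(pairs), body=body)


def parse_payroll(body: bytes) -> List[List[str]]:
    """Pass 2 for one hypothetical interface: the full parse of the detail rows."""
    return [row.split(",") for row in body.decode("ascii").splitlines()]


# Hypothetical mapping from the routing value to the full parser for that interface.
FULL_PARSERS = {"PAYROLL": parse_payroll}


def process(data: bytes):
    view = parse_for_routing(data)
    full_parser = FULL_PARSERS[view.metadata["interface"]]
    return full_parser(view.body)


sample = b"interface=PAYROLL;version=2\nemp1,100\nemp2,200\n"
print(process(sample))  # [['emp1', '100'], ['emp2', '200']]
```

The routing pass never touches the detail rows, so its cost depends on the size of the metadata line, not on the size of the file.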
Have fun
_________________
MQ & Broker admin
mqjeff (Grand Master; joined 25 Jun 2008; 17447 posts)
Posted: Wed Aug 16, 2017 4:32 am
... If you're reading from files and you tell it to use the BLOB parser, then you could create a small model that matches only your metadata, substring out that part, and parse just that.
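A Python sketch of the same idea outside the broker: treat the file as raw bytes, cut out only the first line, and hand just that slice to whatever small metadata parser you use. The function name `read_metadata_line` and the 4096-byte cap are assumptions for illustration.

```python
def read_metadata_line(path: str, max_len: int = 4096) -> bytes:
    """Return the first line of the file without its line ending.

    readline(max_len) stops at the first b"\n" or after max_len bytes,
    so at most max_len bytes are returned, however large the file is.
    """
    with open(path, "rb") as f:
        line = f.readline(max_len)
    if not line.endswith(b"\n"):
        raise ValueError("metadata line missing or longer than max_len bytes")
    # Strip both "\n" and a Windows-style "\r\n" ending.
    return line.rstrip(b"\r\n")
```

The returned bytes can go to the small metadata-only model; the rest of the file is not parsed at this stage.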
But it's probably a lot easier to a) put files for different interfaces into different folders, b) use file names that indicate the type of processing, or c) both, and then use that data, which is easily available in the local environment, to make your decisions.
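If the folder and the file name carry the routing information, no file content needs to be read at all. A minimal Python sketch of option c, assuming a hypothetical convention where files land in a per-interface folder and the file name starts with the interface name; the two signals are cross-checked against each other:

```python
from pathlib import PurePosixPath


def interface_for(path: str) -> str:
    """Derive the interface from where the file landed and what it is called.

    Hypothetical conventions: files arrive in <root>/<interface>/ and are
    named <INTERFACE>_<anything>.
    """
    p = PurePosixPath(path)
    from_folder = p.parent.name.upper()
    from_name = p.name.split("_", 1)[0].upper()
    if from_folder != from_name:
        raise ValueError(f"folder says {from_folder!r} but file name says {from_name!r}")
    return from_folder


print(interface_for("/data/in/payroll/PAYROLL_20170816_001.txt"))  # PAYROLL
```

Cross-checking is optional; using either signal alone corresponds to options a and b.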
You should also have a separate flow for each "group" of processing. Then you get reliable parallel processing of the files, and a slowdown or mistake in one flow won't affect the others.
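An in-process Python analogy of "one flow per group", with hypothetical group names and handlers: each group gets its own worker pool. This only illustrates the isolation argument; the point above is that separately deployed flows give you this isolation without any code like this.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class GroupedDispatcher:
    """One independent worker pool per processing group (hypothetical design).

    Each group's work queues only in its own pool, so a backlog in one group
    does not hold up the others, and an exception raised by one group's
    handler is captured in that task's Future instead of stopping other pools.
    """

    def __init__(self, handlers: Dict[str, Callable[[bytes], object]], workers_per_group: int = 2):
        self._handlers = handlers
        self._pools = {
            group: ThreadPoolExecutor(max_workers=workers_per_group, thread_name_prefix=group)
            for group in handlers
        }

    def submit(self, group: str, data: bytes) -> Future:
        return self._pools[group].submit(self._handlers[group], data)

    def shutdown(self) -> None:
        for pool in self._pools.values():
            pool.shutdown(wait=True)


dispatcher = GroupedDispatcher({"payroll": len, "hr": bytes.upper})
print(dispatcher.submit("payroll", b"abc").result())  # 3
print(dispatcher.submit("hr", b"abc").result())  # b'ABC'
dispatcher.shutdown()
```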
Given the above, I don't really see the value of the metadata.