MQSeries.net :: View topic - Parsing a CSV file with variable number of columns

MQSeries.net

Tech Exchange

Education

Certifications

Library

Info Center

SupportPacs

FAQÂ Â

Usergroups

RSS Feed - WebSphere MQ Support

RSS Feed - Message Broker Support

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Parsing a CSV file with variable number of columns

Parsing a CSV file with variable number of columns

« View previous topic :: View next topic »

Author

Message

KJCB

Posted: Thu Feb 01, 2018 2:38 am Post subject: Parsing a CSV file with variable number of columns

Apprentice

Joined: 09 Dec 2017
Posts: 26

Hi all,

I need to parse and process a CSV File downloaded from an external system which contains a variable number of columns, in order to send the information to another system through a web service.
As mentioned, it has a variable number of columns, and I just need to send some of them to the service. Moreover, they are allocated in different positions.

Let me explain myself. For example, the first time I download the file it has the following appearance:

"Column 1","Column 2", "Column 3"

And I just need to send Columns 1 and 3 through the web service.

The second time I download the file it has the following appearance:

"Column 1","Column 3","Column 4".

In this case, I don't need column 4 information for anything.

Any idea on how to archive this requirement? It's seems that creating a DFDL model for CSV files is not the best option...

[/u]

souciance

Posted: Thu Feb 01, 2018 3:24 am Post subject:

Disciple

Joined: 29 Jun 2010
Posts: 169

Why wouldn't DFDL work? It has support for managing variable number of columns if you there is a proper delimiter. Then just extract the columns you want and send those.

Off course, how will you keep track of a file has been sent or not and then make a decision next time a file has come?

KJCB

Posted: Thu Feb 01, 2018 5:57 am Post subject:

Apprentice

Joined: 09 Dec 2017
Posts: 26

souciance wrote:

Hi,

I'm using IIB v9.0 and when creating the DFDL it asks me for the number of columns. I can't see how to fit this to my requirement.

Regards,

Vitor

Posted: Thu Feb 01, 2018 6:04 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

KJCB wrote:

I'm using IIB v9.0 and when creating the DFDL it asks me for the number of columns.

So the CSV wizard expects a fixed number of columns, because that's the most common case for a CSV file.

This has nothing to do with DFDL's ability to model such a file, it just means the wizard isn't appropriate and you'll have to build it yourself. Using proper delimiters and (probably) discriminators.
_________________
Honesty is the best policy.
Insanity is the best defence.

KJCB

Posted: Thu Feb 01, 2018 6:17 am Post subject:

Apprentice

Joined: 09 Dec 2017
Posts: 26

Vitor wrote:

KJCB wrote:

I'm using IIB v9.0 and when creating the DFDL it asks me for the number of columns.

Hi,

I understand I must create a new empty DFDL schema...or how?

I'm lost

Vitor

Posted: Thu Feb 01, 2018 6:38 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

KJCB wrote:

I understand I must create a new empty DFDL schema...or how?

It's one of the other options to "use a wizard"

KJCB wrote:

I'm lost

Once you've used the wizard, you are dropped into the DFDL editor with a pre-populated DFDL model based on the answers you gave the wizard. If you don't use that, you are dropped into the DFDL editor with a blank model.

One possible option (and I don't claim this to be the best one) would be to have the wizard create a CSV model with the maximum number of columns, and then edit it the model to deal with the optional columns. This really depends how variable the file layout is; if it's highly variable I'd be inclined to build the model long hand.

But that's just me.
_________________
Honesty is the best policy.
Insanity is the best defence.

fjb_saper

Posted: Thu Feb 01, 2018 7:07 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

An other alternative would be to look at the actual layout. If you always have at a minimum 3 columns ....
What you really need here is a first line or "header" row describing the column content. You can then choose the content that is relevant for you and only extract the columns for which that content is relevant....

_________________
MQ & Broker admin

KJCB

Posted: Thu Feb 01, 2018 7:30 am Post subject:

Apprentice

Joined: 09 Dec 2017
Posts: 26

fjb_saper wrote:

What you really need here is a first line or "header" row describing the column content.

I can't configure the header row in my DFDL as I can't predict the column order.

fjb_saper wrote:

You can then choose the content that is relevant for you and only extract the columns for which that content is relevant....

How can I archive this?

Vitor

Posted: Thu Feb 01, 2018 8:08 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

KJCB wrote:

How can I archive this?

I suggested achieving this with discriminators a bit further up.

Other solutions are undoubtedly possible and may be better.
_________________
Honesty is the best policy.
Insanity is the best defence.

Vitor

Posted: Thu Feb 01, 2018 8:14 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

One other tack you could take is asking whoever's producing this file what controls which columns are included and their order (seriously - "Column 1" can be either before or after "Column 3??) and if there's any way this can be determined or included - perhaps a prefix to the file name?

Another tack would be to look at it as a human - when you look at the file, how do you determine which column is where and which ones you need? From that, how would you encode that into a DFDL model?

One final point is to remember that the DFDL is not the silver bullet of data parsing. There may come a point where DFDL can only go so far and to finally achieve your end you may need to introspect the message with code. But I would urge you to go as far as you can with DFDL; it's far and away the best place to do this, and it sounds to me like DFDL can handle it, albeit with a fairly extensive model.
_________________
Honesty is the best policy.
Insanity is the best defence.

KJCB

Posted: Thu Feb 01, 2018 8:50 am Post subject:

Apprentice

Joined: 09 Dec 2017
Posts: 26

Vitor wrote:

Hi All,

Thanks for your help.

Just an scenario:
- I have these three columns:
ProjectTitle, TargetDepartment and TargetCompany

Column "TargetDepartment" can be present or not. If not present, it doesn't appear as empty in the CSV (it is removed when exported, so the other two columns are together).

Question 1: How can I configure my DFDL to include this scenario? I guess I must set property minOccurs to 0, and which other property do I have to change?

Question 2: I only want to extract columns ProjectTitle and TargetCompany. The others are not important for me. Is it possible to model this scenario too? How?

Thanks again for your help.

fjb_saper

Posted: Thu Feb 01, 2018 9:25 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

If the order of the columns is not predictable, your header row should just have something like
column_name , occurs 1 to n for the fields.

So you would need to read the header row and analyze it to find which columns you want to extract...

Have fun

_________________
MQ & Broker admin

timber

Posted: Fri Feb 02, 2018 9:30 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Quote:

- I have these three columns:
ProjectTitle, TargetDepartment and TargetCompany

Column "TargetDepartment" can be present or not. If not present, it doesn't appear as empty in the CSV (it is removed when exported, so the other two columns are together).

If you or some other engineer looked at a random example of this file, how you you know whether the TargetDepartment field was present or not? If you can answer that question, you will know how to solve the problem.

Based on what you've said so far, one option would be to parse each line as an array of columns and take the first and last columns from each row. That would be simple enough in DFDL and in ESQL.

Vitor

Posted: Fri Feb 02, 2018 9:44 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

timber wrote:

Quote:

Vitor wrote:

Another tack would be to look at it as a human - when you look at the file, how do you determine which column is where and which ones you need?

_________________
Honesty is the best policy.
Insanity is the best defence.

KJCB

Posted: Thu Feb 15, 2018 8:50 am Post subject:

Apprentice

Joined: 09 Dec 2017
Posts: 26

Hi All,

Finally we applied another approach: a map/reduce implementation that extracts the information we want from the "unpredictable" CSV file received, generating a fixed XML as output with the information we want, which is configured dynamically based in a properties file.

That way we gained independence from the system generating the CSV files and the Integration Bus got its closed schema.

Thank you.

Display posts from previous:

Page 1 of 1

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Parsing a CSV file with variable number of columns

Jump to:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

Protected by Anti-Spam ACP