ASG
IBM
Zystems
Cressida
Icon
Netflexity
 
  MQSeries.net
Search  Search       Tech Exchange      Education      Certifications      Library      Info Center      SupportPacs      LinkedIn  Search  Search                                                                   FAQ  FAQ   Usergroups  Usergroups
 
Register  ::  Log in Log in to check your private messages
 
RSS Feed - WebSphere MQ Support RSS Feed - Message Broker Support

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Parsing a CSV file with variable number of columns

Post new topic  Reply to topic
 Parsing a CSV file with variable number of columns « View previous topic :: View next topic » 
Author Message
KJCB
PostPosted: Thu Feb 01, 2018 2:38 am    Post subject: Parsing a CSV file with variable number of columns Reply with quote

Apprentice

Joined: 09 Dec 2017
Posts: 26

Hi all,

I need to parse and process a CSV File downloaded from an external system which contains a variable number of columns, in order to send the information to another system through a web service.
As mentioned, it has a variable number of columns, and I just need to send some of them to the service. Moreover, they are allocated in different positions.

Let me explain myself. For example, the first time I download the file it has the following appearance:

"Column 1","Column 2", "Column 3"

And I just need to send Columns 1 and 3 through the web service.

The second time I download the file it has the following appearance:

"Column 1","Column 3","Column 4".

In this case, I don't need column 4 information for anything.


Any idea on how to archive this requirement? It's seems that creating a DFDL model for CSV files is not the best option...

[/u]
Back to top
View user's profile Send private message
souciance
PostPosted: Thu Feb 01, 2018 3:24 am    Post subject: Reply with quote

Disciple

Joined: 29 Jun 2010
Posts: 169

Why wouldn't DFDL work? It has support for managing variable number of columns if you there is a proper delimiter. Then just extract the columns you want and send those.

Off course, how will you keep track of a file has been sent or not and then make a decision next time a file has come?
Back to top
View user's profile Send private message
KJCB
PostPosted: Thu Feb 01, 2018 5:57 am    Post subject: Reply with quote

Apprentice

Joined: 09 Dec 2017
Posts: 26

souciance wrote:
Why wouldn't DFDL work? It has support for managing variable number of columns if you there is a proper delimiter. Then just extract the columns you want and send those.

Off course, how will you keep track of a file has been sent or not and then make a decision next time a file has come?


Hi,

I'm using IIB v9.0 and when creating the DFDL it asks me for the number of columns. I can't see how to fit this to my requirement.

Regards,
Back to top
View user's profile Send private message
Vitor
PostPosted: Thu Feb 01, 2018 6:04 am    Post subject: Reply with quote

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

KJCB wrote:
I'm using IIB v9.0 and when creating the DFDL it asks me for the number of columns.


So the CSV wizard expects a fixed number of columns, because that's the most common case for a CSV file.

This has nothing to do with DFDL's ability to model such a file, it just means the wizard isn't appropriate and you'll have to build it yourself. Using proper delimiters and (probably) discriminators.
_________________
Honesty is the best policy.
Insanity is the best defence.
Back to top
View user's profile Send private message
KJCB
PostPosted: Thu Feb 01, 2018 6:17 am    Post subject: Reply with quote

Apprentice

Joined: 09 Dec 2017
Posts: 26

Vitor wrote:
KJCB wrote:
I'm using IIB v9.0 and when creating the DFDL it asks me for the number of columns.


So the CSV wizard expects a fixed number of columns, because that's the most common case for a CSV file.

This has nothing to do with DFDL's ability to model such a file, it just means the wizard isn't appropriate and you'll have to build it yourself. Using proper delimiters and (probably) discriminators.


Hi,

I understand I must create a new empty DFDL schema...or how?

I'm lost
Back to top
View user's profile Send private message
Vitor
PostPosted: Thu Feb 01, 2018 6:38 am    Post subject: Reply with quote

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

KJCB wrote:
I understand I must create a new empty DFDL schema...or how?


It's one of the other options to "use a wizard"

KJCB wrote:
I'm lost


Once you've used the wizard, you are dropped into the DFDL editor with a pre-populated DFDL model based on the answers you gave the wizard. If you don't use that, you are dropped into the DFDL editor with a blank model.

One possible option (and I don't claim this to be the best one) would be to have the wizard create a CSV model with the maximum number of columns, and then edit it the model to deal with the optional columns. This really depends how variable the file layout is; if it's highly variable I'd be inclined to build the model long hand.

But that's just me.
_________________
Honesty is the best policy.
Insanity is the best defence.
Back to top
View user's profile Send private message
fjb_saper
PostPosted: Thu Feb 01, 2018 7:07 am    Post subject: Reply with quote

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

An other alternative would be to look at the actual layout. If you always have at a minimum 3 columns ....
What you really need here is a first line or "header" row describing the column content. You can then choose the content that is relevant for you and only extract the columns for which that content is relevant....
_________________
MQ & Broker admin
Back to top
View user's profile Send private message Send e-mail
KJCB
PostPosted: Thu Feb 01, 2018 7:30 am    Post subject: Reply with quote

Apprentice

Joined: 09 Dec 2017
Posts: 26

fjb_saper wrote:
What you really need here is a first line or "header" row describing the column content.


I can't configure the header row in my DFDL as I can't predict the column order.

fjb_saper wrote:
You can then choose the content that is relevant for you and only extract the columns for which that content is relevant....


How can I archive this?
Back to top
View user's profile Send private message
Vitor
PostPosted: Thu Feb 01, 2018 8:08 am    Post subject: Reply with quote

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

KJCB wrote:
How can I archive this?


I suggested achieving this with discriminators a bit further up.

Other solutions are undoubtedly possible and may be better.
_________________
Honesty is the best policy.
Insanity is the best defence.
Back to top
View user's profile Send private message
Vitor
PostPosted: Thu Feb 01, 2018 8:14 am    Post subject: Reply with quote

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

One other tack you could take is asking whoever's producing this file what controls which columns are included and their order (seriously - "Column 1" can be either before or after "Column 3??) and if there's any way this can be determined or included - perhaps a prefix to the file name?

Another tack would be to look at it as a human - when you look at the file, how do you determine which column is where and which ones you need? From that, how would you encode that into a DFDL model?

One final point is to remember that the DFDL is not the silver bullet of data parsing. There may come a point where DFDL can only go so far and to finally achieve your end you may need to introspect the message with code. But I would urge you to go as far as you can with DFDL; it's far and away the best place to do this, and it sounds to me like DFDL can handle it, albeit with a fairly extensive model.
_________________
Honesty is the best policy.
Insanity is the best defence.
Back to top
View user's profile Send private message
KJCB
PostPosted: Thu Feb 01, 2018 8:50 am    Post subject: Reply with quote

Apprentice

Joined: 09 Dec 2017
Posts: 26

Vitor wrote:
One other tack you could take is asking whoever's producing this file what controls which columns are included and their order (seriously - "Column 1" can be either before or after "Column 3??) and if there's any way this can be determined or included - perhaps a prefix to the file name?

Another tack would be to look at it as a human - when you look at the file, how do you determine which column is where and which ones you need? From that, how would you encode that into a DFDL model?

One final point is to remember that the DFDL is not the silver bullet of data parsing. There may come a point where DFDL can only go so far and to finally achieve your end you may need to introspect the message with code. But I would urge you to go as far as you can with DFDL; it's far and away the best place to do this, and it sounds to me like DFDL can handle it, albeit with a fairly extensive model.



Hi All,

Thanks for your help.

Just an scenario:
- I have these three columns:
ProjectTitle, TargetDepartment and TargetCompany

Column "TargetDepartment" can be present or not. If not present, it doesn't appear as empty in the CSV (it is removed when exported, so the other two columns are together).

Question 1: How can I configure my DFDL to include this scenario? I guess I must set property minOccurs to 0, and which other property do I have to change?

Question 2: I only want to extract columns ProjectTitle and TargetCompany. The others are not important for me. Is it possible to model this scenario too? How?

Thanks again for your help.
Back to top
View user's profile Send private message
fjb_saper
PostPosted: Thu Feb 01, 2018 9:25 am    Post subject: Reply with quote

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

If the order of the columns is not predictable, your header row should just have something like
column_name , occurs 1 to n for the fields.

So you would need to read the header row and analyze it to find which columns you want to extract...

Have fun
_________________
MQ & Broker admin
Back to top
View user's profile Send private message Send e-mail
timber
PostPosted: Fri Feb 02, 2018 9:30 am    Post subject: Reply with quote

Grand Master

Joined: 25 Aug 2015
Posts: 1280

Quote:
- I have these three columns:
ProjectTitle, TargetDepartment and TargetCompany

Column "TargetDepartment" can be present or not. If not present, it doesn't appear as empty in the CSV (it is removed when exported, so the other two columns are together).
If you or some other engineer looked at a random example of this file, how you you know whether the TargetDepartment field was present or not? If you can answer that question, you will know how to solve the problem.

Based on what you've said so far, one option would be to parse each line as an array of columns and take the first and last columns from each row. That would be simple enough in DFDL and in ESQL.
Back to top
View user's profile Send private message
Vitor
PostPosted: Fri Feb 02, 2018 9:44 am    Post subject: Reply with quote

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

timber wrote:
Quote:
- I have these three columns:
ProjectTitle, TargetDepartment and TargetCompany

Column "TargetDepartment" can be present or not. If not present, it doesn't appear as empty in the CSV (it is removed when exported, so the other two columns are together).
If you or some other engineer looked at a random example of this file, how you you know whether the TargetDepartment field was present or not? If you can answer that question, you will know how to solve the problem.


Vitor wrote:
Another tack would be to look at it as a human - when you look at the file, how do you determine which column is where and which ones you need?

_________________
Honesty is the best policy.
Insanity is the best defence.
Back to top
View user's profile Send private message
KJCB
PostPosted: Thu Feb 15, 2018 8:50 am    Post subject: Reply with quote

Apprentice

Joined: 09 Dec 2017
Posts: 26

Hi All,

Finally we applied another approach: a map/reduce implementation that extracts the information we want from the "unpredictable" CSV file received, generating a fixed XML as output with the information we want, which is configured dynamically based in a properties file.

That way we gained independence from the system generating the CSV files and the Integration Bus got its closed schema.

Thank you.
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic  Reply to topic Page 1 of 1

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Parsing a CSV file with variable number of columns
Jump to:  



You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Protected by Anti-Spam ACP
 
 


Theme by Dustin Baccetti
Powered by phpBB © 2001, 2002 phpBB Group

Copyright © MQSeries.net. All rights reserved.