MQSeries.net :: View topic - Splitting a large file into smaller chunks with a count





petervh1

Posted: Thu Oct 28, 2021 12:16 am Post subject: Splitting a large file into smaller chunks with a count

Centurion

Joined: 19 Apr 2010
Posts: 123

Hi

IIB 10.0.0.14

Can anyone help? I'm trying to split a large (2.5+ GB) CSV file of small records (roughly 100 bytes each) into output files, each of which must not exceed 1 GB.
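For scale: assuming "1 GB" means 10^9 bytes and each record averages about 100 bytes including its line terminator, one output file holds roughly 10,000,000 records (about 10.7 million if the limit is 1 GiB, 1,073,741,824 bytes). Because record lengths vary, a record count only approximates the size limit. The following is a minimal sketch of a byte-bounded split rule in plain Java, not IIB API code, assuming newline-terminated records and UTF-8 output with `\n` line endings; `ByteBoundedSplitter` and the `openChunk` callback are hypothetical names. It starts a new chunk whenever the next record would push the current one past the limit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.IntFunction;

// Plain Java illustration of a byte-bounded split rule (hypothetical names; not IIB API code).
final class ByteBoundedSplitter {
    private ByteBoundedSplitter() {}

    /**
     * Copies newline-terminated records from {@code in} into successive chunks.
     * {@code openChunk} is called with 0, 1, 2, ... to obtain each chunk's Writer.
     * A new chunk starts when the next record would push the current chunk past
     * {@code maxBytes}, counting UTF-8 bytes plus one byte for the line terminator.
     * Returns the number of chunks written.
     */
    static int split(Reader in, long maxBytes, IntFunction<Writer> openChunk) {
        try (BufferedReader reader = new BufferedReader(in)) {
            int chunkIndex = -1;
            long chunkBytes = 0;
            Writer out = null;
            try {
                String record;
                // readLine() strips "\n" or "\r\n"; every record is re-written with "\n".
                while ((record = reader.readLine()) != null) {
                    long size = record.getBytes(StandardCharsets.UTF_8).length + 1L;
                    if (size > maxBytes) {
                        throw new IllegalArgumentException("record longer than maxBytes: " + size);
                    }
                    if (out == null || chunkBytes + size > maxBytes) {
                        if (out != null) {
                            out.close(); // finish the full chunk before opening the next one
                        }
                        out = openChunk.apply(++chunkIndex);
                        chunkBytes = 0;
                    }
                    out.write(record);
                    out.write('\n');
                    chunkBytes += size;
                }
            } finally {
                if (out != null) {
                    out.close();
                }
            }
            return chunkIndex + 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For real files, `openChunk` could return `Files.newBufferedWriter(dir.resolve("part_" + i + ".csv"), StandardCharsets.UTF_8)`, wrapping the `IOException` it may throw. Only one record is held in memory at a time, so the 2.5 GB input never has to be loaded whole.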

I suspect someone will tell me that this is a job for MFTE rather than IIB. If so, I understand, but I would still like to pursue a solution in IIB.

I propose to read the data, parse each record using DFDL, and write one message per record to a queue. Once that has finished, a second flow reads each message from the queue and writes it to a FileOutput node (as discussed elsewhere on this forum).

The only way I could think of to keep a count is to write a count value to a DB table every time I write a record to the queue, and then read the count from the table before I GET each message. Once it reaches, say, 10,000,000, I start writing to a new output file.

This probably sounds clunky, and I know it means a lot of DB reads and writes, but I can't think of a better way to do it.
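A minimal sketch of the counting rule itself, assuming a single consumer reads the records in order: the chunk index is the number of records already written divided by the chunk size (10,000,000 in the example above), so one counter that survives between messages is enough to pick the output file. Here the counter is an in-memory field standing in for the DB row; `ChunkRouter` and the `prefix_0000.csv` naming scheme are hypothetical, and this is plain Java rather than IIB code.

```java
// Plain Java illustration; the in-memory counter stands in for the DB row (hypothetical).
final class ChunkRouter {
    private final long recordsPerFile;
    private long recordsWritten; // in a real flow this would have to persist between messages

    ChunkRouter(long recordsPerFile) {
        if (recordsPerFile <= 0) {
            throw new IllegalArgumentException("recordsPerFile must be positive");
        }
        this.recordsPerFile = recordsPerFile;
    }

    /** Zero-based chunk index for the next record; advances the counter. */
    long nextChunkIndex() {
        long index = recordsWritten / recordsPerFile;
        recordsWritten++;
        return index;
    }

    /** True if the record most recently routed was the last one of its chunk. */
    boolean chunkJustFilled() {
        return recordsWritten > 0 && recordsWritten % recordsPerFile == 0;
    }

    /** Hypothetical naming scheme: prefix_0000.csv, prefix_0001.csv, ... */
    static String fileName(String prefix, long chunkIndex) {
        return String.format("%s_%04d.csv", prefix, chunkIndex);
    }
}
```

With a chunk size of 3, seven records map to chunk indexes 0, 0, 0, 1, 1, 1, 2, and `chunkJustFilled()` is true after the third and sixth records, which is the point at which the current file could be finished.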

TIA

abhi_thri

Posted: Fri Oct 29, 2021 3:50 am Post subject:

Knight

Joined: 17 Jul 2017
Posts: 516
Location: UK

Hi, I assume the 2.5 GB file contains independent records? What happens if DFDL parsing fails for a record; is the flow configured to back out the whole transaction? If the records are independent, you could log the error and continue processing the remaining records.
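A minimal log-and-continue sketch in plain Java, assuming each record is a simple comma-separated line with a fixed number of fields and no quoted commas. A field-count check stands in for DFDL parsing here, and `TolerantRecordParser` is a hypothetical name; the point is only that a failure on one record is logged and counted instead of rolling back the whole file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Plain Java illustration: a field-count check stands in for DFDL parsing (assumption).
final class TolerantRecordParser {
    private static final Logger LOG = Logger.getLogger(TolerantRecordParser.class.getName());

    private final int expectedFields;
    private final List<String[]> parsed = new ArrayList<>();
    private final List<Long> failedLines = new ArrayList<>();

    TolerantRecordParser(int expectedFields) {
        this.expectedFields = expectedFields;
    }

    /** Parses one record; a bad record is logged and skipped instead of failing the whole file. */
    void accept(long lineNumber, String record) {
        // limit -1 keeps trailing empty fields, so "a,b," yields three fields
        String[] fields = record.split(",", -1);
        if (fields.length != expectedFields) {
            LOG.warning(() -> "Skipping line " + lineNumber + ": expected "
                    + expectedFields + " fields, got " + fields.length);
            failedLines.add(lineNumber);
            return;
        }
        parsed.add(fields);
    }

    List<String[]> parsed() {
        return parsed;
    }

    List<Long> failedLines() {
        return failedLines;
    }
}
```

Keeping the failed line numbers makes it possible to report or reprocess just those records afterwards.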

Regarding keeping track of the count in the second flow: is it really necessary to keep the output files at around 1 GB? As the records are independent, could you use a much lower count (say 1,000 or 10,000 records per file), which would also make the files easier for downstream programs to process? Either way, you could use a shared variable to keep track of the count: use a UDP (user-defined property) to set the required record count (say 1,000), track progress in a shared variable, and send a finish signal when the count reaches 1,000.
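A minimal sketch of the shared-counter idea in plain Java, assuming the threshold is read-only configuration (the role the UDP plays) and the counter is mutable state that must outlive individual messages (the role of an ESQL SHARED variable). `SharedChunkCounter` is a hypothetical name; `AtomicLong.updateAndGet` makes the increment-and-reset a single atomic step, loosely comparable to guarding the update with a BEGIN ATOMIC block in ESQL.

```java
import java.util.concurrent.atomic.AtomicLong;

// Plain Java illustration of the shared-counter idea (hypothetical names; not IIB API code).
final class SharedChunkCounter {
    private final long recordsPerChunk;                  // read-only limit: the role a UDP plays
    private final AtomicLong inChunk = new AtomicLong(); // mutable, long-lived count: like a SHARED variable

    SharedChunkCounter(long recordsPerChunk) {
        if (recordsPerChunk <= 0) {
            throw new IllegalArgumentException("recordsPerChunk must be positive");
        }
        this.recordsPerChunk = recordsPerChunk;
    }

    /**
     * Counts one record. Returns true when this record completes the chunk,
     * i.e. when the finish signal should be sent; the count then restarts at 0.
     */
    boolean countRecord() {
        long after = inChunk.updateAndGet(c -> c + 1 >= recordsPerChunk ? 0 : c + 1);
        return after == 0;
    }

    /** Records counted so far in the chunk currently being written. */
    long recordsInCurrentChunk() {
        return inChunk.get();
    }
}
```

An atomic counter keeps the count consistent across threads, but it does not order the writes: if the flow runs with additional instances, records can still reach the output file in a different order from the input.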

petervh1

Posted: Fri Oct 29, 2021 5:21 am Post subject:

Centurion

Joined: 19 Apr 2010
Posts: 123

Quote:

What happens if the DFDL parsing fails for a record for some reason, is the flow configured to back out the whole transaction?

Thanks for that - I'll need to put logic in to handle that.

Quote:

Anyway, you could consider using a shared variable to keep track of the count. i.e. use a UDP variable to set the total count required (say 1000) and keep track of the processing using a shared variable, send a finish signal when the count reaches 1000.

UDP values can't be modified by ESQL, so I think I'm going to have to use a DB.

timber

Posted: Fri Oct 29, 2021 8:43 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1280

UDP != shared variable.

I strongly suggest that you read this: https://www.ibm.com/docs/en/integration-bus/10.0?topic=overview-long-lived-variables

abhi_thri

Posted: Sat Oct 30, 2021 1:37 am Post subject:

Knight

Joined: 17 Jul 2017
Posts: 516
Location: UK

petervh1 wrote:

UDP values can't be modified by ESQL, so I think I'm going to have to use a DB.

Hi, no, I meant using the UDP to set the total desired record count at the flow level and then comparing it against the incremented shared variable, i.e. shared-var <= UDP value. If you don't want to use a UDP, you can define a constant instead. And have a look at the shared-variable link timber shared.
