MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support

Splitting a large file into smaller chunks with a count
petervh1
Posted: Thu Oct 28, 2021 12:16 am

Centurion

Joined: 19 Apr 2010
Posts: 123

Hi

IIB 10.0.0.14

Can anyone assist me here? I'm trying to split a large (2.5+ GB) CSV file containing small records (roughly 100 bytes per record) into output file chunks that are not allowed to be larger than 1 GB.

I suspect someone will tell me that this is a job for MFTE rather than IIB. If so, I understand that, but would still like to pursue the solution in IIB.

I propose to read the data, parse each record using DFDL and then write one message per record to a queue. Once this process has finished, a second flow would read each message from this queue and write it to a FileOutput node (as discussed elsewhere on this forum).

The only thing I could think of for keeping a count was to write a count value to a DB table every time I write a record to the queue. Then, before I GET each message, read the count from the DB table. If it equals, say, 10,000,000, then start writing to a new output file.

This probably sounds clunky, and I know that's a lot of DB reads/writes, but I can't think of a better way to do it.
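
For comparison, here is a minimal Python sketch of the same rollover idea outside IIB, keeping the count in memory rather than in a DB table. It is not IIB code. The function names `records_per_chunk` and `split_by_count` and the `chunk_NNNN.csv` file naming are hypothetical, and it assumes one record per line (no quoted fields with embedded newlines) and no header row.

```python
import os


def records_per_chunk(max_chunk_bytes, avg_record_bytes):
    """Estimate how many records fit in one chunk (hypothetical helper)."""
    return max_chunk_bytes // avg_record_bytes


def split_by_count(src_path, out_dir, max_records, prefix="chunk"):
    """Copy src_path line by line into files of at most max_records lines.

    Assumes one record per line. Returns the list of output paths.
    """
    paths = []
    out = None
    count = 0
    with open(src_path, "rb") as src:
        for line in src:
            # Open the first file, or roll over once the current one is full.
            if out is None or count == max_records:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"{prefix}_{len(paths) + 1:04d}.csv")
                out = open(path, "wb")
                paths.append(path)
                count = 0
            out.write(line)
            count += 1
    if out is not None:
        out.close()
    return paths


if __name__ == "__main__":
    # 1 GB limit (taken as 10**9 bytes) with records of about 100 bytes each.
    print(records_per_chunk(1_000_000_000, 100))  # 10000000
```

The estimate is where a figure like 10,000,000 comes from. Because the records are only approximately 100 bytes, a lower count leaves some headroom under the 1 GB limit.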

TIA
abhi_thri
Posted: Fri Oct 29, 2021 3:50 am

Knight

Joined: 17 Jul 2017
Posts: 516
Location: UK

hi...I assume the 2.5GB file contains independent records? What happens if the DFDL parsing fails for a record for some reason? Is the flow configured to back out the whole transaction? If the records are independent, you could opt for logging the error and continuing with the next record.
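
As an illustration of the log-and-continue approach, here is a small Python sketch. It stands in for the DFDL parse with the standard `csv` module. The field count `EXPECTED_FIELDS` and the function names are assumptions made for the example, not anything taken from the actual file layout.

```python
import csv
import logging

logger = logging.getLogger("splitter")

EXPECTED_FIELDS = 3  # hypothetical record layout, for illustration only


def parse_record(line):
    """Parse one CSV line; raise ValueError if the field count is wrong."""
    fields = next(csv.reader([line]))
    if len(fields) != EXPECTED_FIELDS:
        raise ValueError(f"expected {EXPECTED_FIELDS} fields, got {len(fields)}")
    return fields


def parse_all(lines):
    """Parse every line, logging failures instead of aborting the whole run.

    Returns (good, bad): parsed records, and (line_number, raw_line) pairs
    for the records that failed.
    """
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        try:
            good.append(parse_record(line))
        except (ValueError, csv.Error) as exc:
            logger.warning("record %d rejected: %s", number, exc)
            bad.append((number, line))
    return good, bad
```

Keeping the rejected lines with their line numbers makes it possible to reprocess or report them later without rereading the whole input.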

Regarding keeping track of the count in the following flow, is it really required to keep the output files around 1GB? As the records are independent, isn't it possible to use a much lower count (say 1000 or 10000 records), making it a bit easier for downstream programs to process? Anyway, you could consider using a shared variable to keep track of the count, i.e. use a UDP to set the total count required (say 1000), keep track of the processing using a shared variable, and send a finish signal when the count reaches 1000.
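
The counter logic described here can be sketched in Python as follows. This is an analogy rather than ESQL: the `limit` argument plays the role of the UDP (fixed when the flow is configured), the private `_count` plays the role of the shared variable, and the lock stands in for whatever serialization the flow would need if more than one instance updates the count. The class name `ChunkCounter` is hypothetical.

```python
import threading


class ChunkCounter:
    """Counts written records and signals when the current chunk is full."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self._limit = limit            # read-only, like a UDP
        self._count = 0                # mutable, like a shared variable
        self._lock = threading.Lock()  # guards concurrent updates

    def record_written(self):
        """Increment the count; return True when the chunk should be finished."""
        with self._lock:
            self._count += 1
            if self._count >= self._limit:
                self._count = 0  # start counting for the next chunk
                return True
            return False
```

With `ChunkCounter(1000)`, every 1000th call to `record_written()` returns `True`, which is the point at which the finish signal would be sent and the next output file started.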
petervh1
Posted: Fri Oct 29, 2021 5:21 am

Centurion

Joined: 19 Apr 2010
Posts: 123

Quote:
What happens if the DFDL parsing fails for a record for some reason? Is the flow configured to back out the whole transaction?



Thanks for that - I'll need to put logic in to handle that.

Quote:
Anyway, you could consider using a shared variable to keep track of the count, i.e. use a UDP to set the total count required (say 1000), keep track of the processing using a shared variable, and send a finish signal when the count reaches 1000.


UDP values can't be modified by ESQL, so I think I'm going to have to use a DB.
timber
Posted: Fri Oct 29, 2021 8:43 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

UDP != shared variable.

I strongly suggest that you read this: https://www.ibm.com/docs/en/integration-bus/10.0?topic=overview-long-lived-variables
abhi_thri
Posted: Sat Oct 30, 2021 1:37 am

Knight

Joined: 17 Jul 2017
Posts: 516
Location: UK

Quote:
UDP values can't be modified by ESQL, so I think I'm going to have to use a DB.

hi...No, I meant to use the UDP to set the total desired record count at the flow level and then compare it against the incremented shared variable, i.e. shared-var <= UDP value. If you don't want to use a UDP, you can define a constant value instead. Also have a look at the shared variable link that timber posted.