Splitting a large file into smaller chunks with a count
petervh1
Posted: Thu Oct 28, 2021 12:16 am    Post subject: Splitting a large file into smaller chunks with a count

Centurion
Joined: 19 Apr 2010  Posts: 135
Hi
IIB 10.0.0.14
Can anyone assist me here? I'm trying to split a large (2.5+ GB) CSV file containing small records (+/- 100 bytes per record) into output file chunks that must not be larger than 1 GB.
I suspect someone will tell me that this is a job for MFTE rather than IIB. If so, I understand that, but I would still like to pursue a solution in IIB.
I propose to read the data, parse each record using DFDL, and then write one message per record to a queue. Once this process has finished, start another flow that reads each message from this queue and writes it to a FileOutput node (as discussed elsewhere on this forum).
The only thing I could think of for keeping a count was to write a count value to a DB table every time I write a record to the queue. Then, before I GET each message, read the count from the DB table. If it has reached, say, 10,000,000, start writing to a new output file.
This probably sounds clunky, and I know it's a lot of DB reads/writes, but I can't think of a better way to do it.
TIA
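[Editor's note: a rough ESQL sketch of the database-counter idea described above. The table CHUNK_COUNT, its column REC_COUNT, and the module/terminal names are invented for illustration, and error handling is omitted.]

```esql
-- Hypothetical Compute node in the queue-to-file flow.
-- Assumes a one-row table Database.CHUNK_COUNT with an INTEGER column REC_COUNT.
CREATE COMPUTE MODULE ChunkWriter_Compute
	CREATE FUNCTION Main() RETURNS BOOLEAN
	BEGIN
		DECLARE currentCount INTEGER;
		SET currentCount = THE(SELECT ITEM T.REC_COUNT FROM Database.CHUNK_COUNT AS T);
		IF currentCount >= 10000000 THEN
			-- Threshold reached: reset the counter and tell FileOutput to
			-- close the current file ('out1' wired to its Finish File terminal).
			UPDATE Database.CHUNK_COUNT AS T SET REC_COUNT = 0;
			PROPAGATE TO TERMINAL 'out1';
		ELSE
			UPDATE Database.CHUNK_COUNT AS T SET REC_COUNT = T.REC_COUNT + 1;
		END IF;
		-- The record itself continues via the Out terminal to the FileOutput node.
		RETURN TRUE;
	END;
END MODULE;
```

At roughly 100 bytes per record, 10,000,000 records keeps each chunk near the 1 GB limit, but this approach costs a database read and a database write for every record.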
abhi_thri
Posted: Fri Oct 29, 2021 3:50 am    Post subject:

Knight
Joined: 17 Jul 2017  Posts: 516  Location: UK
hi...I assume the 2.5 GB file contains independent records? What happens if DFDL parsing fails for a record for some reason? Is the flow configured to back out the whole transaction? If the records are independent, you could opt to log the error and continue processing the next record.
Regarding keeping track of the count in the following flow, is it really required to keep the output files around 1 GB? As the records are independent, isn't it possible to use a much lower count (say 1,000 or 10,000 records), making it a bit easier for downstream programs to process? Anyway, you could consider using a shared variable to keep track of the count, i.e. use a UDP to set the total count required (say 1,000), keep track of processing with a shared variable, and send a finish signal when the count is reached.
petervh1
Posted: Fri Oct 29, 2021 5:21 am    Post subject:

Centurion
Joined: 19 Apr 2010  Posts: 135
Quote:
What happens if DFDL parsing fails for a record for some reason? Is the flow configured to back out the whole transaction?
Thanks for that - I'll need to put logic in to handle that.
Quote:
Anyway, you could consider using a shared variable to keep track of the count, i.e. use a UDP to set the total count required (say 1,000), keep track of processing with a shared variable, and send a finish signal when the count is reached.
UDP values can't be modified by ESQL, so I think I'm going to have to use a DB.
timber
Posted: Fri Oct 29, 2021 8:43 am    Post subject:

Grand Master
Joined: 25 Aug 2015  Posts: 1292
abhi_thri
Posted: Sat Oct 30, 2021 1:37 am    Post subject:

Knight
Joined: 17 Jul 2017  Posts: 516  Location: UK
petervh1 wrote:
Quote:
UDP values can't be modified by ESQL, so I think I'm going to have to use a DB.
hi...No, I meant use the UDP to set the total desired record count at the flow level and then compare it against the incremented shared variable, i.e. shared-var <= UDP value. If you don't want to use a UDP you can define a constant instead. And have a look at the shared variable link timber shared.
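[Editor's note: a minimal ESQL sketch of the UDP-plus-shared-variable pattern suggested here. The module name, terminal wiring, and the 'out1' label are assumptions; note the UDP is only ever read, never assigned, which is why this compiles where modifying a UDP would not.]

```esql
CREATE COMPUTE MODULE ChunkWriter_Compute
	-- UDP: read-only in ESQL, overridable on the BAR file; used purely as a threshold.
	DECLARE recordsPerFile EXTERNAL INTEGER 10000000;
	-- SHARED variable: writable and persists across messages, unlike a UDP.
	DECLARE recordCount SHARED INTEGER 0;

	CREATE FUNCTION Main() RETURNS BOOLEAN
	BEGIN
		-- ATOMIC serialises access to the shared variable if
		-- additional flow instances are enabled.
		BEGIN ATOMIC
			SET recordCount = recordCount + 1;
			IF recordCount >= recordsPerFile THEN
				SET recordCount = 0;
				-- Finish signal, e.g. wired to the FileOutput node's Finish File terminal.
				PROPAGATE TO TERMINAL 'out1';
			END IF;
		END ATOMIC;
		-- The record itself continues via the Out terminal.
		RETURN TRUE;
	END;
END MODULE;
```

This keeps the per-record cost in memory rather than in the database, at the price of the count being scoped to one integration server.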