I think I read years ago that the design accounts for media such as tape or optical disks that are write-once, or at least very slow to seek back to the start, along with the complexity of figuring out how much space must be left free on such systems. I believe the format also allows spanning multiple physical disks as a multi-part archive.
travistrue said:
Although I do find it interesting that the ZIP format puts the index information last instead of at the beginning of the file, given that read scenarios will be at least as common as write scenarios, if not more so.
Even if reading is expected to be common, I think it's really just a slightly more complex bit of programming to read (probably more than made up for by the simpler programming to write). Other than maybe on tape (where I guess you lose half the time with either layout), I don't think reading blocks from the end carries a significant penalty, and in practice most of the time is spent reading and decompressing the actual entries.
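To make the "reading from the end" part concrete, here's a rough Python sketch (not production code) of how a reader might locate ZIP's End of Central Directory record. The signature and fixed-field layout are from the ZIP spec; the helper name is my own:

```python
import struct

def find_eocd(path):
    """Locate the End of Central Directory record by scanning
    backwards from the end of the file for its signature."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        # EOCD is at least 22 bytes; a variable-length comment can
        # push it up to roughly 64 KiB further from the end.
        window = min(size, 22 + 65535)
        f.seek(size - window)
        data = f.read(window)
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("no EOCD record found")
    # Fixed fields: disk numbers, entry counts, central directory
    # size and offset, comment length.
    (sig, disk, cd_disk, n_disk, n_total,
     cd_size, cd_offset, comment_len) = struct.unpack(
        "<IHHHHIIH", data[pos:pos + 22])
    return n_total, cd_size, cd_offset
```

So the "extra" work is one seek to the tail, one backwards signature scan, then a normal forward read of the central directory, which is cheap on anything seekable.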
The local headers I'm less sure about. One thing that comes to mind is that they allow a good degree of recovery of otherwise-intact data if the central directory at the end is lost or damaged.
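As a sketch of that recovery idea, assuming the entries themselves survived: each entry starts with its own local header, so you can rebuild a listing by scanning for the local header signature and skipping over each payload (entries written in streaming mode, where sizes live in a trailing data descriptor, need more care than this shows):

```python
import struct

def scan_local_headers(path):
    """Recover entry names and sizes by scanning for local file
    header signatures, ignoring the central directory entirely."""
    entries = []
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while True:
        pos = data.find(b"PK\x03\x04", pos)
        if pos < 0:
            break
        # 30-byte fixed header after the signature: version, flags,
        # method, mod time/date, CRC, sizes, name/extra lengths.
        (_, flags, method, _, _, crc, csize, usize,
         name_len, extra_len) = struct.unpack_from("<HHHHHIIIHH", data, pos + 4)
        name = data[pos + 30:pos + 30 + name_len].decode("utf-8", "replace")
        entries.append((name, method, csize, usize))
        # Jump past the header, name, extra field, and payload.
        pos += 30 + name_len + extra_len + csize
    return entries
```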
travistrue said:
Are there any known archive formats that can be written to in parallel? Maybe that could help with the large file sizes. I'm not sure how (or if) that could work in a practical sense.
I can't think of one. It's the sort of thing filesystems and databases do, generally by having a lot of extra space to play with, since they don't try to compact everything as much as possible. If you write blocks from multiple "streams" as they arrive, you might get good "sequential" write performance, but you will horribly fragment the contents, which would massively hurt reading single files back out, and would at best make efficiently extracting the entire archive difficult.
More practically, there are compression implementations that can use multiple threads, or you could make use of large amounts of RAM as a staging buffer. Maybe something SSD-specific would be possible, given the lower seek penalties and command queuing, but I've never seen anything.
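The multi-threaded compression route is roughly what tools like pigz do: split the input into independent chunks and compress them on separate threads. A simplified Python sketch of the idea (real tools share dictionary state between chunks to recover some compression ratio, which this doesn't attempt; zlib releases the GIL, so the threads genuinely overlap):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def parallel_deflate(data, chunk_size=128 * 1024, workers=4):
    """Compress fixed-size chunks of the input on a thread pool.
    Each chunk is an independent zlib stream."""
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))

def parallel_inflate(blocks):
    """Decompress and reassemble the per-chunk streams in order."""
    return b"".join(zlib.decompress(b) for b in blocks)
```

Note this only parallelises the CPU side; the writes themselves still land sequentially, which avoids the fragmentation problem above.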