
Explain why we're not using ZIP #45

Open
jyasskin opened this issue Apr 12, 2017 · 8 comments
Labels
needs spec The issue has agreement, but someone needs to add it to the specification.

Comments

@jyasskin
Member

jyasskin commented Apr 12, 2017

We have some hints in https://w3ctag.github.io/packaging-on-the-web/#intro, but it's not complete, and it needs to appear in the local explainer, not something remote.

Other considerations against zip:

We should probably also list reasons in favor of re-using zip so that proponents know we've considered their arguments:

  • A huge number of other formats are based on zip, so we're unlikely to run into something we can't express.
  • Existing tools would be able to extract packages.
@bmeck
Collaborator

bmeck commented May 30, 2017

We should make sure to mention that .zip duplicates metadata between each entry's local file header and the central directory, and that the two copies can disagree.
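A minimal Node.js sketch of where the two copies live, assuming a small archive with no ZIP64 records and no archive comment (so the end-of-central-directory record sits in the last 22 bytes):

// Compare the compressed-size field stored in each central directory
// entry against the copy stored in the corresponding local file header.
// Assumes no ZIP64 records and no archive comment.
const fs = require('fs')
const buf = fs.readFileSync(process.argv[2])

const eocd = buf.length - 22 // end-of-central-directory record
if (buf.readUInt32LE(eocd) !== 0x06054b50) throw new Error('EOCD not found')
const entryCount = buf.readUInt16LE(eocd + 10)
let cen = buf.readUInt32LE(eocd + 16) // central directory offset

for (let i = 0; i < entryCount; i++) {
  if (buf.readUInt32LE(cen) !== 0x02014b50) throw new Error('bad central header')
  const centralSize = buf.readUInt32LE(cen + 20) // compressed size, central copy
  const nameLen = buf.readUInt16LE(cen + 28)
  const extraLen = buf.readUInt16LE(cen + 30)
  const commentLen = buf.readUInt16LE(cen + 32)
  const localOffset = buf.readUInt32LE(cen + 42)
  const name = buf.toString('utf8', cen + 46, cen + 46 + nameLen)

  // The same field appears again in the local file header, and nothing
  // in the format forces the two copies to agree.
  const localSize = buf.readUInt32LE(localOffset + 18)
  if (localSize !== centralSize) {
    console.log(name + ': local header says ' + localSize + ', central says ' + centralSize)
  }
  cen += 46 + nameLen + extraLen + commentLen
}

Different extractors trust different copies, which is what makes the duplication more than a cosmetic wart.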

@rektide

rektide commented Oct 5, 2017

How many of these arguments also stand against lzo or lz4 or (best) zstd?

I'd also be interested in seeing how much of the wildly popular https://github.com/opencontainers/image-spec would make sense to use, versus how much doesn't match up.

@jyasskin
Member Author

@rektide I believe lzo, lz4, and zstd are better compression algorithms, and none of the arguments are about any quality issues in zip's compression, so ... all of them?

I haven't looked through the opencontainers spec and will do so. Thanks for the pointer.

@jyasskin
Member Author

Now that I've skimmed https://github.com/opencontainers/image-spec, it seems mostly inapplicable. https://github.com/opencontainers/image-spec/blob/master/layer.md doesn't appear to support random access, because it's a tar file. I'm having trouble finding the primary key of items in the image, but it seems to be a path, contrary to https://tools.ietf.org/html/draft-yasskin-webpackage-use-cases-00#section-3.1.1.

@skhameneh

lz4 preferred for speed, zstd preferred for ratio.

I am working on a similar packaging format and have a working tool, see
https://github.com/lbryio/lbry-format

I am very interested in cross adoption, @jyasskin 😃

@jyasskin jyasskin added the needs spec The issue has agreement, but someone needs to add it to the specification. label May 13, 2019
@jimmywarting

jimmywarting commented Dec 5, 2022

Having played with web bundles for a bit, I can't say I've grown particularly fond of CBOR or the way you are encoding things. You haven't made random access easy, and modifying a web bundle to add/remove files, or even to concatenate files together, isn't easy either.

Serving dynamically generated web bundles on the fly doesn't work well at all.

Encoding a web bundle requires you to build it in one go, with lots of RAM in use at once, because you have to read the content of every file in order to generate the CBOR. It isn't streaming-friendly at all.

It also requires a CBOR dependency, as opposed to something as simple as the built-in JSON, or even a plain DataView for reading/writing data in your own format.

I wish it were just a central directory like ZIP files have at the end, but placed at the beginning instead, so that I could simply write out the structure of everything and then concatenate my files after it.

I think it was a mistake to encode the response payloads as an array of [[headers, payload], ...] and then put that into a section that itself has to be encoded with CBOR. You then have to decode the whole CBOR in one go, and CBOR decoders aren't very stream-friendly in many libraries either.

It would be a heck of a lot easier if I could just generate a small CBOR directory, write how big it is, then write the CBOR directory, and afterwards pipe all my files to the end of the stream. Something like:

new Blob([size, cbor, fileA, fileB, fileC, ...])

And this doesn't even support compressed payloads either...

Honestly, I think I want to stick to regular separate HTTP requests with good caching, do more code splitting and lazy loading when needed, write more optimized code instead, and cache things on the fly as needed.
Or honestly, I think I would rather use regular ZIP files instead, populate the Cache Storage, and use a service worker.
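A minimal sketch of that ZIP-plus-service-worker fallback, assuming a hypothetical extractZip helper that yields { path, response } pairs from a fetched archive (any unzip library could play that role):

// sw.js: pre-populate the Cache Storage from a ZIP on install, then
// answer fetches from the cache. `extractZip` is a hypothetical helper.
self.addEventListener('install', event => {
  event.waitUntil((async () => {
    const cache = await caches.open('app-v1')
    const archive = await fetch('/app.zip')
    for await (const { path, response } of extractZip(archive)) {
      await cache.put(new Request(path), response)
    }
  })())
})

self.addEventListener('fetch', event => {
  // Serve from the pre-populated cache, falling back to the network.
  event.respondWith(
    caches.match(event.request).then(hit => hit ?? fetch(event.request))
  )
})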

@jimmywarting

jimmywarting commented Dec 5, 2022

FYI, if the central CBOR directory had just recorded the size of each payload instead of its offset, it would be much easier to simply concatenate multiple entries together, and you could even run more jobs in parallel, e.g. if you wanted to compress multiple files at the same time. You also wouldn't have to update the offset index, or even calculate how large an encoded CBOR payload is, which is necessary for computing the offsets: the head of a CBOR byte string can be anywhere from 1 to 9 bytes depending on the length, so you need some smart logic.

Something better could look more like this:

const cborSize = new Uint8Array(8)

const [ file1, file2 ] = input.files // files picked via an <input type="file">

const meta1 = {
  url: origin + '/' + file1.name,
  size: file1.size,
  headers: { ':status': 200, 'content-type': file1.type, 'content-length': file1.size },
  extraFields: { /* ... */ }
}
const meta2 = {
  url: origin + '/' + file2.name,
  size: file2.size,
  headers: { ':status': 200, 'content-type': file2.type, 'content-length': file2.size },
  extraFields: { /* ... */ }
}

const cbor = encode([ meta1, meta2 ]) // `encode` from a CBOR library, e.g. cbor-x
const dv = new DataView(cborSize.buffer)
dv.setBigUint64(0, BigInt(cbor.byteLength)) // 8-byte length prefix for the directory
const webBundle = new Blob([ cborSize, cbor, file1, file2 ])

☝️ In this example you:

  • won't even have to read the contents of the files.
    • No data ever has to be allocated for the files: the new Blob simply holds references to the files on the user's disk.
  • don't have to recalculate the byte offsets every time; it would be as simple as how tar concatenates multiple gzipped files together.
  • sidestep the fact that many CBOR parsers/generators aren't stream-friendly with regard to files, or even capable of partial CBOR decoding.
    • Polyfilling something like web bundles would be hard, because you would have to step into the CBOR parser manually, read one token at a time, and then decide what to do with it afterwards: should it continue reading the following bytes or not? Most encoders/decoders have just a single encode/decode function with no way to step manually; you would need the whole byte array and have to decode it in one go.
  • can get a response chunk as simply as webBundle.slice(start, end) (see the reader sketch below).
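A matching reader sketch, under the same assumptions (a decode function from the same CBOR library), shows that the recorded sizes are all a consumer needs to locate a payload:

// Hypothetical reader for the writer sketch above: derive each payload's
// byte offset from the sizes recorded in the directory, then slice it out.
async function readEntry(webBundle, url) {
  const head = new DataView(await webBundle.slice(0, 8).arrayBuffer())
  const dirLength = Number(head.getBigUint64(0))
  const dirBytes = new Uint8Array(await webBundle.slice(8, 8 + dirLength).arrayBuffer())
  const directory = decode(dirBytes) // `decode` from the same CBOR library

  let offset = 8 + dirLength // payloads start right after the directory
  for (const meta of directory) {
    if (meta.url === url) {
      // A Blob slice is just a view; no payload bytes are copied here.
      return webBundle.slice(offset, offset + meta.size)
    }
    offset += meta.size
  }
  throw new Error('no entry for ' + url)
}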

The file payloads should have been kept apart from the CBOR response structure. That would have made parsing the central CBOR structure easier, allowed easier random access, and made it possible to issue byte-range requests for just the parts you need.

Also, an option to have the central directory at either the end or the beginning would be neat, for when you want to dynamically generate a web bundle and don't yet know the number of requests.

@jimmywarting

I'm just wondering whether you gave any consideration to ZIP's custom extra fields? They offer a way to attach extra fields such as the response headers, the request URL, and the other things that are needed.
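For reference, an extra field is just a sequence of [id][size][data] records. A minimal sketch of packing a response header that way (0xE0E0 is a made-up placeholder ID, not a registered one):

// Build a single ZIP extra-field record: [header ID: uint16][data size: uint16][data],
// with both integers little-endian. 0xE0E0 is a made-up placeholder ID.
function extraField(id, text) {
  const data = new TextEncoder().encode(text)
  const field = new Uint8Array(4 + data.length)
  const view = new DataView(field.buffer)
  view.setUint16(0, id, true)          // header ID
  view.setUint16(2, data.length, true) // data size
  field.set(data, 4)
  return field
}

const headerField = extraField(0xE0E0, 'content-type: text/html')

Whether browsers could rely on unregistered IDs like that is an open question.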
