
Explain why we're not using ZIP #45

Open
jyasskin opened this issue Apr 12, 2017 · 8 comments
Labels
needs spec The issue has agreement, but someone needs to add it to the specification.

Comments

@jyasskin
Member

jyasskin commented Apr 12, 2017

We have some hints in https://w3ctag.github.io/packaging-on-the-web/#intro, but it's not complete, and it needs to appear in the local explainer, not something remote.

Other considerations against zip:

We should probably also list reasons in favor of re-using zip so that proponents know we've considered their arguments:

  • A huge number of other formats are based on zip, so we're unlikely to run into something we can't express.
  • Existing tools would be able to extract packages.
@bmeck
Collaborator

bmeck commented May 30, 2017

We should make sure to mention that .zip duplicates metadata between each entry's local file header and the central directory, and that the two copies can disagree.
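A minimal Node.js sketch of where the two copies live, assuming a small archive with no ZIP64 records and no archive comment (so the end-of-central-directory record sits in the last 22 bytes):

// Compare the compressed-size field stored in each central directory
// entry against the copy stored in the corresponding local file header.
// Assumes no ZIP64 records and no archive comment.
const fs = require('fs')
const buf = fs.readFileSync(process.argv[2])

const eocd = buf.length - 22 // end-of-central-directory record
if (buf.readUInt32LE(eocd) !== 0x06054b50) throw new Error('EOCD not found')
const entryCount = buf.readUInt16LE(eocd + 10)
let cen = buf.readUInt32LE(eocd + 16) // central directory offset

for (let i = 0; i < entryCount; i++) {
  if (buf.readUInt32LE(cen) !== 0x02014b50) throw new Error('bad central header')
  const centralSize = buf.readUInt32LE(cen + 20) // compressed size, central copy
  const nameLen = buf.readUInt16LE(cen + 28)
  const extraLen = buf.readUInt16LE(cen + 30)
  const commentLen = buf.readUInt16LE(cen + 32)
  const localOffset = buf.readUInt32LE(cen + 42)
  const name = buf.toString('utf8', cen + 46, cen + 46 + nameLen)

  // The same field appears again in the local file header, and nothing
  // in the format forces the two copies to agree.
  const localSize = buf.readUInt32LE(localOffset + 18)
  if (localSize !== centralSize) {
    console.log(name + ': local header says ' + localSize + ', central says ' + centralSize)
  }
  cen += 46 + nameLen + extraLen + commentLen
}

Different extractors trust different copies, which is what makes the duplication more than a cosmetic wart.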

@rektide

rektide commented Oct 5, 2017

How many of these arguments also stand against lzo or lz4 or (best) zstd?

I'd also be interested in seeing how much of the wildly popular https://github.com/opencontainers/image-spec would make sense to use, versus how much doesn't match up.

@jyasskin
Member Author

@rektide I believe lzo, lz4, and zstd are better compression algorithms, and none of the arguments are about any quality issues in zip's compression, so ... all of them?

I haven't looked through the opencontainers spec and will do so. Thanks for the pointer.

@jyasskin
Member Author

Now that I've skimmed https://github.com/opencontainers/image-spec, it seems mostly inapplicable. https://github.com/opencontainers/image-spec/blob/master/layer.md doesn't appear to support random access, because it's a tar file. I'm having trouble finding the primary key of items in the image, but it seems to be a path, contrary to https://tools.ietf.org/html/draft-yasskin-webpackage-use-cases-00#section-3.1.1.

@skhameneh

lz4 preferred for speed, zstd preferred for ratio.

I am working on a similar packaging format and have a working tool, see
https://github.com/lbryio/lbry-format

I am very interested in cross adoption, @jyasskin 😃

@jyasskin jyasskin added the needs spec The issue has agreement, but someone needs to add it to the specification. label May 13, 2019
@jimmywarting

jimmywarting commented Dec 5, 2022

Having played with web bundles for a bit, I can't say I've grown particularly fond of CBOR or the way you are encoding things. You haven't made random access easy, and modifying a web bundle to add/remove files, or even to concatenate files together, isn't easy either.

Serving dynamically generated web bundles on the fly doesn't work well at all.

Encoding a web bundle requires you to build it in one go, with lots of RAM in use at once, because you have to read the content of every file in order to generate the CBOR. It isn't streaming-friendly at all.

It also requires a CBOR dependency, as opposed to something as simple as the built-in JSON, or even a plain DataView for reading/writing data in your own format.

I wish it were just a central directory like ZIP files have at the end, but placed at the beginning instead, so that I could simply write out the structure of everything and then concatenate my files after it.

I think it was a mistake to encode the response payloads as an array of [[headers, payload], ...] and then put that into a section that itself has to be encoded with CBOR. You then have to decode the whole CBOR in one go, and CBOR decoders aren't very stream-friendly in many libraries either.

It would be a heck of a lot easier if I could just generate a small CBOR directory, write how big it is, then write the CBOR directory, and afterwards pipe all my files to the end of the stream. Something like:

new Blob([size, cbor, fileA, fileB, fileC, ...])

And this doesn't even support compressed payloads either...

Honestly, I think I want to stick to regular separate HTTP requests with good caching, do more code splitting and lazy loading when needed, write more optimized code instead, and cache things on the fly as needed.
Or honestly, I think I would rather use regular ZIP files instead, populate the Cache Storage, and use a service worker.
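A minimal sketch of that ZIP-plus-service-worker fallback, assuming a hypothetical extractZip helper that yields { path, response } pairs from a fetched archive (any unzip library could play that role):

// sw.js: pre-populate the Cache Storage from a ZIP on install, then
// answer fetches from the cache. `extractZip` is a hypothetical helper.
self.addEventListener('install', event => {
  event.waitUntil((async () => {
    const cache = await caches.open('app-v1')
    const archive = await fetch('/app.zip')
    for await (const { path, response } of extractZip(archive)) {
      await cache.put(new Request(path), response)
    }
  })())
})

self.addEventListener('fetch', event => {
  // Serve from the pre-populated cache, falling back to the network.
  event.respondWith(
    caches.match(event.request).then(hit => hit ?? fetch(event.request))
  )
})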

@jimmywarting

jimmywarting commented Dec 5, 2022

FYI, if the central CBOR directory had just recorded the size of each payload instead of its offset, it would be much easier to simply concatenate multiple entries together, and you could even run more jobs in parallel, e.g. if you wanted to compress multiple files at the same time. You also wouldn't have to update the offset index, or even calculate how large an encoded CBOR payload is, which is necessary for computing the offsets: the head of a CBOR byte string can be anywhere from 1 to 9 bytes depending on the length, so you need some smart logic.

Something better could look more like this:

const cborSize = new Uint8Array(8)

const [ file1, file2 ] = input.files // files picked via an <input type="file">

const meta1 = {
  url: origin + '/' + file1.name,
  size: file1.size,
  headers: { ':status': 200, 'content-type': file1.type, 'content-length': file1.size },
  extraFields: { /* ... */ }
}
const meta2 = {
  url: origin + '/' + file2.name,
  size: file2.size,
  headers: { ':status': 200, 'content-type': file2.type, 'content-length': file2.size },
  extraFields: { /* ... */ }
}

const cbor = encode([ meta1, meta2 ]) // `encode` from a CBOR library, e.g. cbor-x
const dv = new DataView(cborSize.buffer)
dv.setBigUint64(0, BigInt(cbor.byteLength)) // 8-byte length prefix for the directory
const webBundle = new Blob([ cborSize, cbor, file1, file2 ])

☝️ In this example you:

  • won't even have to read the contents of the files.
    • No data ever has to be allocated for the files: the new Blob simply holds references to the files on the user's disk.
  • don't have to recalculate the byte offsets every time; it would be as simple as how tar concatenates multiple gzipped files together.
  • sidestep the fact that many CBOR parsers/generators aren't stream-friendly with regard to files, or even capable of partial CBOR decoding.
    • Polyfilling something like web bundles would be hard, because you would have to step into the CBOR parser manually, read one token at a time, and then decide what to do with it afterwards: should it continue reading the following bytes or not? Most encoders/decoders have just a single encode/decode function with no way to step manually; you would need the whole byte array and have to decode it in one go.
  • can get a response chunk as simply as webBundle.slice(start, end) (see the reader sketch below).
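A matching reader sketch, under the same assumptions (a decode function from the same CBOR library), shows that the recorded sizes are all a consumer needs to locate a payload:

// Hypothetical reader for the writer sketch above: derive each payload's
// byte offset from the sizes recorded in the directory, then slice it out.
async function readEntry(webBundle, url) {
  const head = new DataView(await webBundle.slice(0, 8).arrayBuffer())
  const dirLength = Number(head.getBigUint64(0))
  const dirBytes = new Uint8Array(await webBundle.slice(8, 8 + dirLength).arrayBuffer())
  const directory = decode(dirBytes) // `decode` from the same CBOR library

  let offset = 8 + dirLength // payloads start right after the directory
  for (const meta of directory) {
    if (meta.url === url) {
      // A Blob slice is just a view; no payload bytes are copied here.
      return webBundle.slice(offset, offset + meta.size)
    }
    offset += meta.size
  }
  throw new Error('no entry for ' + url)
}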

The file payloads should have been kept apart from the CBOR response structure. That would have made parsing the central CBOR structure easier, allowed easier random access, and made it possible to issue byte-range requests for just the parts you need.

Also, an option to have the central directory at either the end or the beginning would be neat, for when you want to dynamically generate a web bundle and don't yet know the number of requests.

@jimmywarting

I'm just wondering whether you gave any consideration to ZIP's custom extra fields? They offer a way to attach extra fields such as the response headers, the request URL, and the other things that are needed.
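For reference, an extra field is just a sequence of [id][size][data] records. A minimal sketch of packing a response header that way (0xE0E0 is a made-up placeholder ID, not a registered one):

// Build a single ZIP extra-field record: [header ID: uint16][data size: uint16][data],
// with both integers little-endian. 0xE0E0 is a made-up placeholder ID.
function extraField(id, text) {
  const data = new TextEncoder().encode(text)
  const field = new Uint8Array(4 + data.length)
  const view = new DataView(field.buffer)
  view.setUint16(0, id, true)          // header ID
  view.setUint16(2, data.length, true) // data size
  field.set(data, 4)
  return field
}

const headerField = extraField(0xE0E0, 'content-type: text/html')

Whether browsers could rely on unregistered IDs like that is an open question.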
