Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy unboxed tuples / warn on unbanged strict patterns #35

Merged
merged 7 commits into from Sep 12, 2018

Conversation

goldfirere
Copy link
Contributor

@goldfirere goldfirere commented Jan 10, 2017

The proposal has been accepted; the following discussion is mostly of historic interest.


This proposes to make unboxed tuple patterns lazy-by-default, and to turn GHC's existing error about unbanged strict patterns (which are strict by virtue of binding a variable of an unlifted type) into a warning.

Rendered


is *not* an unlifted bind.

Change (A) says that unlifted binds, and only unlifted binds, are strict by default. (Before Change (A), unboxed tuple patterns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph is incomplete.

z = ()
where I# x = 4

This code is rejected by GHC 8.0 with an error. Change (B) makes this error into a warning. The binding is strict.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typesetting mistake. Unindent this paragraph by one or two character, I guess.

@rwbarton
Copy link

Does this mean that

f :: ()
f = let t = undefined :: (# () #)
    in ()

would still diverge (because t is of unlifted type), while

g :: ()
g = let (# u #) = undefined :: (# () #)
    in ()

would now evaluate to () (because u is of lifted type)?

@goldfirere
Copy link
Contributor Author

@rwbarton , yes, precisely.

It also means that

z = ()
  where _ = undefined :: Int#

will evaluate to () while

z = ()
  where _x = undefined :: Int#

will throw an exception.

However, this kind of silliness is present in GHC 8. If we have

data T = K Int# Int

and then say

y = ()
  where K _ _x = undefined

we can evaluate y to (). But if I say

y = ()
  where K _z _x = undefined

now I get an error saying I should bang the pattern. Under this proposal, I would get the unbanged-strict-patterns warning in this last example, but then y would throw an exception when evaluated.

My z example above makes me think we should require the bang even on bare variables, but that would break a lot of code I think.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jan 10, 2017

To be clear, an "unlifted id", in the definition of an "unlifted bind", is an freshly-bound identifier whose type is unlifted, right?

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jan 10, 2017

Also, might "unlifted pattern" be a better term than "unlifted bind"? IIUC the single where clause in

foo = ()
  where (# #) = undefined

would be affected by this (would evaluate to () instead of throwing an exception), yet contains no bindings.

@simonpj
Copy link
Contributor

simonpj commented Jan 10, 2017

I like both parts of this proposal. It's a dark corner, and we should strive for simplicity in the specification. This does exactly that.

@goldfirere
Copy link
Contributor Author

Clarified points as requested by @Ericson2314 , who is right on both counts.

@simonmar
Copy link

I actually like it that unboxed tuple bindings are strict (even though it was probably accidental) because it matches my intuition that every unlifted binding is strict. It feels like a special case to make unboxed tuple bindings lazy. Do we have a compelling use case for this, or is it just that (a) it's easy and (b) it's slightly more general? It might lead to less efficient code in some cases, and if someone is using unboxed tuples they probably care about performance.

It would be nice if ~ could be used to make an unboxed tuple binding lazy.

By all means make the error about nonbanged strict patterns into a warning. I'm not sure why it was made an error, but we did go through a series of changes where it was originally made a warning and then an error, and I recall a lot of churn particularly with making code work across multiple GHC versions.

@simonpj
Copy link
Contributor

simonpj commented Jan 11, 2017

This is tricky stuff to specify, and the proposal still isn't very clear.

The proposal say "Define an "unlifted pattern" to be any pattern that binds variable with an unlifted type.". That's good.

Then it uses the term "strict pattern"; I think that's precisely the same as "unlifted pattern", but I'm not sure. I'll stick to "unlifted pattern" alone.

Now, a binding p = e, where p is an unlifted pattern, must compile to a case expression. That means:

  • It must not be top-level
  • It must not be recursive
  • It must not be polymorphic or overloaded

So it's pretty restrictive.

Define a "banged binding" to be one of form !p = e. A banged binding cannot be top-level, but it can be recursive, polymorphic, or overloaded. Why? See the FORCE rule in Section 9.28.4 of the manual: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#dynamic-semantics-of-bang-patterns. To take a recursive example, the binding

let ![x,y] = f x y z in blah

desugars to

let t = case f x y z of [x,y] -> (x,y)
    x = case t of (x,y) -> x
    y = case t of (x,y) -> y
in t `seq` blah

This might diverge; but it also might not. The banged binding means that before evaluating blah we seq on a full match of the banged pattern.

OK, having set the scene, what about unboxed tuple bindings?

(# x,y #) = e

We could:

  1. Treat them uniformly (this is Richard's proposal)
  2. Treat them like a banged binding
  3. Treat them like a unlifted pattern binding.

Richard's proposal is (1). It's unclear whether Simon M is suggesting (2) or (3). I think (2) would be better: it is more expressive, but in the common case of a non-recursive binding you get the case expression you probably expected. But I suppose we could chose (3).

There are no technical issues at stake here; it's a free stylistic choice.

@simonmar
Copy link

Simon, in point (3) I think you meant "unlifted pattern" rather than "strict pattern", correct?

So yes, I was thinking of (3), I hadn't considered (2). (3) seems simpler (to me), because it desugars directly to the case expression rather than requiring subsequent optimisations, but (2) is more expressive. I don't feel very strongly about any of this, but it does seem intuitive to me to treat an unboxed tuple binding like other unlifted pattern bindings.

@simonpj
Copy link
Contributor

simonpj commented Jan 12, 2017

in point (3) I think you meant "unlifted pattern" rather than "strict pattern", correct?

Yes! I've updated my comment. Sorry.

@simonpj
Copy link
Contributor

simonpj commented Jan 12, 2017

It would be nice if ~ could be used to make an unboxed tuple binding lazy.

Since "~" cancels "!", that desire would be uniformly supported with (2); but not with (3). So that argues for (2)

@goldfirere
Copy link
Contributor Author

goldfirere commented Jan 13, 2017 via email

@simonmar
Copy link

Sorry for being a bit loose with the terms. (perhaps "unlifted pattern" is an unfortunate terminology since it doesn't mean that the pattern is unlifted. How about "unlifted-variable pattern"?)

ghc-mirror pushed a commit to ghc/ghc that referenced this pull request Jan 19, 2017
This commit implements the proposal in
ghc-proposals/ghc-proposals#29 and
ghc-proposals/ghc-proposals#35.

Here are some of the pieces of that proposal:

* Some of RuntimeRep's constructors have been shortened.

* TupleRep and SumRep are now parameterized over a list of RuntimeReps.
* This
means that two types with the same kind surely have the same
representation.
Previously, all unboxed tuples had the same kind, and thus the fact
above was
false.

* RepType.typePrimRep and friends now return a *list* of PrimReps. These
functions can now work successfully on unboxed tuples. This change is
necessary because we allow abstraction over unboxed tuple types and so
cannot
always handle unboxed tuples specially as we did before.

* We sometimes have to create an Id from a PrimRep. I thus split PtrRep
* into
LiftedRep and UnliftedRep, so that the created Ids have the right
strictness.

* The RepType.RepType type was removed, as it didn't seem to help with
* much.

* The RepType.repType function is also removed, in favor of typePrimRep.

* I have waffled a good deal on whether or not to keep VoidRep in
TyCon.PrimRep. In the end, I decided to keep it there. PrimRep is *not*
represented in RuntimeRep, and typePrimRep will never return a list
including
VoidRep. But it's handy to have in, e.g., ByteCodeGen and friends. I can
imagine another design choice where we have a PrimRepV type that is
PrimRep
with an extra constructor. That seemed to be a heavier design, though,
and I'm
not sure what the benefit would be.

* The last, unused vestiges of # (unliftedTypeKind) have been removed.

* There were several pretty-printing bugs that this change exposed;
* these are fixed.

* We previously checked for levity polymorphism in the types of binders.
* But we
also must exclude levity polymorphism in function arguments. This is
hard to check
for, requiring a good deal of care in the desugarer. See Note [Levity
polymorphism
checking] in DsMonad.

* In order to efficiently check for levity polymorphism in functions, it
* was necessary
to add a new bit of IdInfo. See Note [Levity info] in IdInfo.

* It is now safe for unlifted types to be unsaturated in Core. Core Lint
* is updated
accordingly.

* We can only know strictness after zonking, so several checks around
* strictness
in the type-checker (checkStrictBinds, the check for unlifted variables
under a ~
pattern) have been moved to the desugarer.

* Along the way, I improved the treatment of unlifted vs. banged
* bindings. See
Note [Strict binds checks] in DsBinds and #13075.

* Now that we print type-checked source, we must be careful to print
* ConLikes correctly.
This is facilitated by a new HsConLikeOut constructor to HsExpr.
Particularly troublesome
are unlifted pattern synonyms that get an extra void# argument.

* Includes a submodule update for haddock, getting rid of #.

* New testcases:
  typecheck/should_fail/StrictBinds
  typecheck/should_fail/T12973
  typecheck/should_run/StrictPats
  typecheck/should_run/T12809
  typecheck/should_fail/T13105
  patsyn/should_fail/UnliftedPSBind
  typecheck/should_fail/LevPolyBounded
  typecheck/should_compile/T12987
  typecheck/should_compile/T11736

* Fixed tickets:
  #12809
  #12973
  #11736
  #13075
  #12987

* This also adds a test case for #13105. This test case is
* "compile_fail" and
succeeds, because I want the testsuite to monitor the error message.
When #13105 is fixed, the test case will compile cleanly.
@bgamari
Copy link
Contributor

bgamari commented Jan 24, 2017

Note that this proposal is now half-way through its four week discussion period. At the end of this period we ask @goldfirere to summarize the discussion and bring the proposal to the GHC committee for consideration. Thanks!

@nomeata
Copy link
Contributor

nomeata commented Feb 27, 2017

This proposal has not attracted discussion for a while. Please consider submitting it to the Committee for discussion, if you think it is ready, or close it, if you have abandoned this proposal.

@nomeata
Copy link
Contributor

nomeata commented Apr 13, 2017

Another ping. @goldfirere, whither this proposal?

@goldfirere
Copy link
Contributor Author

Oops. I missed the first ping. I will update this proposal with respect to recent commentary and then submit for a decision.

@nomeata
Copy link
Contributor

nomeata commented May 7, 2017

@goldfirere Another ping about this proposal… :-)

@goldfirere
Copy link
Contributor Author

I have updated the proposal to incorporate feedback here. I would like to submit this for a decision, @nomeata.

@goldfirere goldfirere added the Pending committee review The committee needs to evaluate the proposal and make a decision label Jun 5, 2017
@Ericson2314
Copy link
Contributor

For the intermediate proposal, can we also have let 3# = 4# in () diverge, but let ~3# = 4# in () evaluate to ()? Then the intermediate proposal only depends on whether the type of the pattern is lifted, not anything to do with the binding of variables.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jun 9, 2017

Philosophically, I see as the lazy matching of unlifted types as a sort of "static laziness". With this static laziness, the compiler is allowed to delay evaluation (or perhaps more broadly rewrite the term just worrying about preserving semantics in the case that the bound expression terminates) as long as no new thunking or boxing is introduced.

@goldfirere's variable binding rule is somewhat a special case of this principle where, because of the lack of unlifted bindings with the pattern, thunks are "already available" for every possible data dependency with the potential to force the match.

@nomeata
Copy link
Contributor

nomeata commented Jul 11, 2017

Oh, I missed that this was submitted for review… will act immediately.

@nomeata nomeata assigned nomeata and rrnewton and unassigned nomeata Jul 11, 2017
@rrnewton
Copy link

rrnewton commented Jul 11, 2017

I tend to agree with @simonmar that we should do what is best for performance, which is where we need explicit use of unboxed tuples. But ... I guess there's some performance risk to going too strict by default as it might introduce extra "thunk-evaluated?" checks in scenarios where we know the thunks are already evaluated?

I wish there were a simple rule like if a pattern variable binds Int#, it needs !. Or that they never do. Personally, I'm not really a fan of having bangs on unlifted-bindings, because I wanted to think of bangs, operationally, as forcing thunks just like seq. In the case of unlifted-primitive-bindings they're not doing that, they're just documenting something you already know: that Int# is unlifted.

@rrnewton
Copy link

rrnewton commented Jul 12, 2017

@goldfirere, I'd like to reply here to some of the text in the proposal:

requiring bangs on unboxed tuple patterns that bind a variable of an unlifted type

Wait, does this language mean that we could need double bangs to avoid warnings? !(# !a, !b #)?

Again, I believe we want to generate the strict behavior by default most of the time. That is, that we don't want to make it easy for the user to accidentally generate this code shown in the manual:

f x = let t = case h x of { (# p,q #) -> (p,q) }
          p = fst t
          q = snd t
      in ..body..

Almost always, when I'm mucking with unboxed tuples because I'm explicitly trying to squeeze the allocation out of a loop. This terrible code above would reintroduce mystery thunk allocations (for p and q) and destroy the performance of my loop in a way that causes me to do that much more digging around in the -ddump'd Core and STG.

I'm using unboxed tuples to get away from boxed tuples -- reintroducing boxed tuples implicitly is cruel!

So that leads me to want to leave laziness as "opt in" rather than "opt out". But there are two distinct ways to accomplish that. One is the "option (2)" above, which is described in the Alternatives section of the proposal as:

There is also a middle ground for (A) around unboxed tuples: we could pretend they always have a bang on them.

Ok, that's fine. And @simonmar and @simonpj seem to be at-least-not-opposed to it.

An add-on to that idea seems to be to issue warnings when any unboxed tuple binding lacks the bang, just like we do for other unlifted-type variable bindings in patterns.

Uh, except this bumps up against the fact that we are not uniform in requiring "pattern variables of unlifted types should have a bang" (e.g. !x :: Int#), because we require that for nested fields but not for top-level bindings. So doing the "same" thing as we do for Int# would mean that we issue a warning if there is no bang on "inner" unboxed-tuple patterns, but not at the top-level? (But we would still then interpret the top-level one as having an implicit bang?)

The good thing is that the meaning with the ! or with the ~ is clear, its just the interpretation of patterns without them that's very confusing.

@rrnewton
Copy link

rrnewton commented Jul 13, 2017

@goldfirere - Do all the discussed designs satisfy these two "theorems"?:

  • If there is textually a ! in a pattern, then the pattern binding introduces at least some evaluation (at least one case expr, though it may be nested).
  • If there is textually a ~ in a pattern, then the pattern binding introduces code for at least one thunk allocation (though it may be optimized away).

@goldfirere
Copy link
Contributor Author

I will number your theorems (A) and (B).

(A) is not true today, and this proposal does not change that fact. Counterexample: let (!x, !y) = (undefined, undefined) in () evaluates to ().

(B) is true today and with this proposal.

I don't believe my proposal gets near to these theorems. The more interesting space (which my proposal does affect) is what happens when there is no bang or twiddle.

@simonmar
Copy link

I agree with @rrnewton, making an unboxed tuple binding lazy by default seems to be intuitively the wrong choice. I guarantee I would get tripped up by this! Giving unboxed tuples an implicit bang seems reasonable to me.

@goldfirere goldfirere mentioned this pull request Dec 6, 2017
@nomeata
Copy link
Contributor

nomeata commented May 14, 2018

I am also worried about existing code. People that do careful tricks with unboxed tuples probably require the particular semantics that GHC currently does (and they probably didn’t read the manual :-)). Since evaluation order is not something that is captured by our types, changing something here might cause hard-to-track-down bugs somewhere. I would expect (without looking) that ghc-heap-view would break.

Is there a good example why the documented behavior is actually better than the implementation? I.e. a use-case?

Otherwise I’d be inclined to just update the manual to reflect reality that suited our users for several years, and consider it a bug in the manual, not the code :-)

@simonpj
Copy link
Contributor

simonpj commented May 15, 2018

Catching up on this again, I mis-read the proposal. Could you add to the proposal something about desugaring? That is, you propose that

(# a, Just b #) = e

desguars to

t = case e of (# a, Just b #) -> (a,b)
a = fst t
b = snd t

I was worried about having a thunk with unboxed-tuple type; but in the desugaring we see that all thunks have lifted types.

As to an implicit bang on outermost unboxed tuples, I defer to users. It's a free choice. I agree that making an unboxed tuple lazy (and making the desugaring create a boxed tuple) might be unexpected.

@nomeata nomeata assigned bravit and unassigned rrnewton Aug 4, 2018
@nomeata nomeata added Accepted The committee has decided to accept the proposal and removed Pending committee review The committee needs to evaluate the proposal and make a decision labels Sep 12, 2018
@nomeata nomeata merged commit d87c296 into ghc-proposals:master Sep 12, 2018
@barci2
Copy link

barci2 commented Dec 22, 2022

Since this is implemented, should this PR have the "Implemented" label added?

@nomeata nomeata added the Implemented The proposal has been implemented and has hit GHC master label Dec 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted The committee has decided to accept the proposal Implemented The proposal has been implemented and has hit GHC master
Development

Successfully merging this pull request may close these issues.

None yet

10 participants