Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutable constructor fields #8

Open
wants to merge 37 commits into
base: master
Choose a base branch
from

Conversation

simonmar
Copy link

This is a proposal to add mutable constructor fields to GHC.

Rendered

@simonmar
Copy link
Author

@Ericson2314
Copy link
Contributor

Do the constructors need to be magic like that? I rather have a magic way to construct a Ref# so all exposed variants are non-magical.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Sep 20, 2016

Perhaps we constrain the RuntimeReps in -> so whatever magic constructs Ref# is always inlinable / not exists on its own post compilation?


The GC write barrier for a mutable constructor may be a little less
efficient than the write barrier for a ``MutVar#``, but this is more
than compensated for by losing a layer of indirection.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has the effect of "losing a layer of indirection" been measured and benchmarked? Instinctively, this seems correct to me, that removing a layer of indirection is more beneficial than the performance hit of the GC write barrier for a mutable constructor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other people have experimented with hacks to eliminate this layer of indirection (e.g. @ekmett's Struct and @fryguybob's TStruct) and found it worthwhile. Remember that you only pay the performance hit of the write barrier if you ask for it, and it's only likely to be a tiny bit more expensive than the MutVar# write barrier (which you would be paying anyway), if at all.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance benefits from my TStruct can be seen on slide 26 here:

http://www.cs.rochester.edu/u/ryates/files/ryates-IFL2016-slides.pdf

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonmar can this bit be extended to say why the GC barrier would be less efficient, since it's not obvious to me (so probably not obvious to lots of people).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcoutts I think it is the granularity of the dirty header that is different. A mutable pair made of three heap objects will mark exactly which pointer is dirty while a single heap object mutable pair traverses both pointers even if only one changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons: one is the lack of accuracy that @fryguybob mentioned, and the other is more mundane: we know statically the name of stg_MUT_VAR_CLEAN_info and its DIRTY equivalent, but for an arbitrary mutable constructor we will have to fetch these from the info table.

Copy link
Contributor

@yav yav May 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonmar: do you think a layout where we store the number of mutable pointers in the payload might work? Then we could have just two new closure types: MUT_CTR_CLEAN and MUT_CTR_DIRTY, and the layout of the payload would be: [ number of mutable pointers | mutable pointers | non-mutable pointers | non-pointers ]. If we allowed ourselves an extra word in the payload, we may even have some bits for a mask to specify exactly which fields are actually dirty?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my limited understanding, would it be possible to lay out DIRTY and CLEAN info tables in a way that the offset between them is statically known? e.g. MutPair_CLEAN_info+0x420 == MutPair_DIRTY_info.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I successfully implemented that by having a field in each info table that pointed to the other info table. They do, as far as I can tell, end up statically offset from each other but it was simple enough to add a field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very interesting! Ideally, I'd hope we can get rid of that additional field, by allocating CLEAN and DIRTY tables next to each other, offset by what I hope is a (info table agnostic) constant. Quite similar to TABLES_NEXT_TO_CODE.

@cartazio
Copy link

Edward K has a package , Structs, that effectively allows unboxed
references in a potentially recursive setting. It's worth looking at as a
point of comparison perhaps

On Tuesday, September 20, 2016, ramonkeypr notifications@github.com wrote:

@ramonkeypr commented on this pull request.

In proposals/0000-mutable-fields.rst
#8 (review)
:

+generate code for other constructors, taking care to add the Void
+argument for the State#, and generating an info table with the
+correct information about the mutable fields.
+
+Unpacking constructors with mutable fields
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+UNPACK must be a no-op on constructors with mutable fields. There's
+no sensible way to make UNPACK work with mutable fields.
+
+Drawbacks
+---------
+
+The GC write barrier for a mutable constructor may be a little less
+efficient than the write barrier for a MutVar#, but this is more
+than compensated for by losing a layer of indirection.

Has the effect of "losing a layer of indirection" been measured and
benchmarked? Instinctively, this seems correct to me, that removing a layer
of indirection is more beneficial than the performance hit of the GC write
barrier for a mutable constructor.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#8 (review),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAAQwoIf5HDBgszUTJGoFyziBLBKqnq_ks5qr3nggaJpZM4KBKtX
.

@cartazio
Copy link

Cc @ekmett , this seems very related to your structs package / codes.

@goldfirere
Copy link
Contributor

From a surface syntax standpoint, I don't like the magical use of IOField. What if we introduce a new keyword mutable and require the GADT syntax at the bottom of the proposal?

data MutPair :: Type -> Type -> Type where
  MutPair :: mutable a -> mutable b -> IO (MutPair a b)

Differentiating between an IOField and a STField could be done by looking at the monad mentioned in the return type.

@adamgundry
Copy link
Contributor

Can I declare a mutable field with record syntax? If so, what are the typing rules for record selection and update?

@rwbarton
Copy link

Would it be feasible to support mutable unboxed fields? It would be nice to have an "IORef Int#". Presumably Ref would need to gain an argument to describe the representation of the referent.

@dcoutts
Copy link

dcoutts commented Sep 20, 2016

I very much like the fact that this involves the compiler properly knowing about the mutable fields and generating info tables, since this means profiling should work, whereas with @ekmett's cunning hack all your mutable types are the same as far as the profiler can tell.

Will there also be a Ref type, as well as Ref#? It'd be nice if error messages can mention Ref rather than scary # types :-)

@remiturk
Copy link

Something I've been working on in my own (unpublished, broken, incomplete & in eternal development hell) unboxed-records-with-first-class-labels-library is the freezing & thawing of records, analogous to the way it works for e.g. (im)mutable vectors. That way my main simulation core can run blazingly fast in ST while calculating statistics after a round can be done in a nicely pure interface.

Do you see (the value of) any way of extending the current proposal later to support something similar?

@rwbarton
Copy link

Will it be possible to compare such constructors for reference equality? How would that look?

@fryguybob
Copy link

I have a bunch of additional features that I want and don't need to be fully worked out yet, but might influence decisions about syntax now. One consideration would be including memory model information. In the GADT form this could be something like AtomicRef#. Another extension is arrays with ArrayRef#. I want this feature in the context of STM too which would likely add some metadata to the representation. This might be just using STM instead of IO in the GADT syntax. I also want sum types. All together I would like to be able to write this for a Hashed Array Mapped Trie:

data Node k v where
    Level :: Word -> ArrayRef# (Node k v) -> STM (Node k v)
    Leaf :: Hash -> ArrayRef# (k,v) -> STM (Node k v)

or

data Node k v where
    Level :: Word -> ArrayRef# (Node k v) -> STM (Node k v)
    Leaf :: Hash -> k -> v -> STM (Node k v)
    Leaves :: Hash -> ArrayRef# (k,v) -> STM (Node k v)
    Nil :: STM (Node k v)

or even better:

data Node k v where
    Level :: Word -> ArrayRef# (Node k v) -> STM (Node k v)
    Leaf :: Hash -> k -> v -> Node k v
    Leaves :: Hash -> ArrayRef# (k,v) -> STM (Node k v)
    Nil :: Node k v

Copy link
Contributor

@bgamari bgamari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a neat idea, but I have some questions.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UNPACK must be a no-op on constructors with mutable fields. There's
no sensible way to make UNPACK work with mutable fields.
Copy link
Contributor

@bgamari bgamari Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to say that we are unable to unpack even immutable fields within a constructor also containing mutable fields? It's not immediately obvious to me why this should be the case. It would be nice to clarify here.

Copy link

@rrnewton rrnewton Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this refers only to unpacking a mutable constructor into another mutable constructor. But it would be nice to unpack the reasoning there. I think it goes like this:

"While C++ or Rust allow unpacking a mutable product type into another, this requires a recursive notion of object construction/initialization and would not fit here. If we tried the following:

data MutPair2 = MP Int {-# UNPACK #-} MutPair

what is the type of the MP constructor? It cannot be this:

MP :: Int -> MutPair -> IO MutPair2

because the only way to make that work would be to copy the mutable record MutPair into the MP constructor. This is (1) not what we'd want for including inner objects as value types, and (2) ruins the guarantee that UNPACK is a performance hint rather than semantically important."

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rrnewton exactly; I've expanded that section with your text.


The ``MutPair`` constructor has the following type::

MutPair :: a -> b -> IO (MutPair a b)
Copy link
Contributor

@bgamari bgamari Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems very odd to me to change the type of the constructor like this; this feels like a very ugly exception where the moment you add a mutable field the behavior of your constructor changes in a very fundamental way. It almost feels as though there should be some syntactic indicator to mark these special constructors (e.g. data MutPair = mutable MutPair (IOField a) (IOField b), which could also be used in the GADT case).

I tried playing around with a few alternative approaches to provide the "constructor action" via a typeclass (such that we could just hide the constructor itself), but it sadly seems to get quite complicated. My initial thought was that you could identify a constructor with its promoted kind (e.g. 'MutPair), but the presence of the field arguments in its kind make this rather awkward. To make this fly we'd need to have a better way to represent constructor identity at the type level.

Another approach would be to build this on the existing Generics infrastructure, although this also seemed quite complicated and not necessarily safe.

Regardless, while changing the type of the constructor itself is awkward, it may be the least of the evils here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @RyanGlScott in case he has thoughts about Generics in this context.

@rrnewton
Copy link

What are the rules for what monad goes on the RHS of a constructor signature? It seems like there is a spectrum here between (1) only supporting raw State# types, and (2) giving special treatment to IO/ST/STM.

@simonmar
Copy link
Author

@adamgundry Record selection works smoothly I think - the record selector returns a Ref# s a, and the typing is exactly as if you'd written the record selector by hand. Record construction should also work nicely, although of course the record constructor has an IO or ST type. Record update should probably be disallowed for a record with mutable fields. I'll add some notes about this to the proposal.

@simonmar
Copy link
Author

@rwbarton It would be nice to allow mutable unboxed fields. In fact, there's no reason why Ref# shouldn't be able to take a representation-polymorphic type argument, since that doesn't affect the representation of the Ref# itself. However, we would need a family of primitives for reads and writes at the different unboxed types. I'll add a section about this to the proposal.

@simonmar
Copy link
Author

@goldfire yes GADT syntax gives us a great way to specify whether you want IO or ST (or later STM), so I agree this is the way to go. There is still magic though: mutable a turns into a for the constructor, but IOField a when pattern matching. I'm not crazy about adding a new keyword, but I don't have a better suggestion so I think we could go with this. Anyone else have thoughts before I update the proposal?

- Add a section on records
- Add a section on mutable unboxed fields
- Add various unresolved questions
- Move the unresolved GC section to a TODO
@cartazio
Copy link

I would hope that it's not hard coded to just ST and IO, at the very least I've my own third sibling ,
STE https://hackage.haskell.org/package/monad-ste that is valid in any use site of MonadPrim or or PrimBaseMonad along with ST and IO

Would this boil down to "is the monad thing a new type around the raw internal rep that we use for IO Or ST"?

Granted there's then several parallel yaks around fields that form valid mvars or stm variables I guess ?

@simonmar
Copy link
Author

@fryguybob why do you say we would need a different AtomicRef#? I imagined we would just have atomic primops that operate on the ordinary Ref#, in the same way that we have atomic primops for MutVar#.

@fryguybob
Copy link

@simonmar right, this choice would go along with having a memory model. C++11/14/17 whatever distinguishes atomic locations and concurrent accesses (with at least one a write) outside of those locations are undefined behavior. Other approaches may work but at a minimum we should be ensuring that enough information gets down to say the LLVM level, if it is doing the code generation. For instance we may know that an atomic release store on TSO is a normal store, but we don't want LLVM to reorder that store in a basic block. I imagine we are doing the right thing now for this particular case, but it may be very convenient to distinguish explicitly synchronization locations when it does come to having a memory model. In addition, I think volatile is considered orthogonality in LLVM preventing things like accumulating in a register from a value that may be accessed in the same thread from a signal handler.

@ggreif
Copy link
Contributor

ggreif commented Sep 21, 2016

Just leaving my reddit comment here for consideration...

Yes, I'd also prefer some syntactic clue, that something different is going on.

data MutPair a b in IO = MutPair (IOField a) (IOField b)

sounds like a possibility.

@wdanilo
Copy link

wdanilo commented Aug 17, 2018

@fryguybob I was writing a message earlier to thank you for such detailed answer but it seems I've never sent it, so thank you!

Btw, did anything change regarding this topic? I know it's not the highest priority but I'm waiting for it like crazy now! :)

@fryguybob
Copy link

@wdanilo I don't have any major updates at this point. I will probably have discussions with people about working toward an implementation that can be merged in at some point at Haskell Symposium.

@Ericson2314
Copy link
Contributor

If the linear types proposal is accepted, we could perhaps someday purely do this at the library level. I'd be interested to see that design fleshed out.

@winterland1989
Copy link
Contributor

winterland1989 commented Aug 18, 2018

I think it's worthwhile to describe the difference between a primitive mutable field and a lifted one in full detail because a closure with only primitive mutable fields do not need write barrier, while lifted fields require them. I even prefer the extension can be split into two variations: MutableFields and PrimMutableFields for different performance consideration.

@yav
Copy link
Contributor

yav commented Aug 20, 2018

@winterland1989 I would expect the compiler to just do "the right thing" when emitting code (e.g., if it does not need a write barrier it should not emit one). What would be the benefit of having two separate extensions?

@winterland1989
Copy link
Contributor

@yav The benefit of a separate PrimMutableFields is twofold:

  1. Currently, we don't have primitive references. Users simulate it with MutableByteArray#, but it carries an extra word (length which is always 1). PrimMutableFields can provide a good replacement.

  2. I believe PrimMutableFields have a different impact on implementation and performance, and some operations should be limited to primitive fields, such as fetch-and-add.

Of course, if we can state all these differences clearly and the compiler is smart enough to always do the right thing, a unified extension is OK.

@fryguybob
Copy link

@winterland1989 I think we should be fine with just one extension. In my implementation you can specify an unlifted type directly and it does what you would expect. Matching on an unlifted field gives a RefU# type which supports only unlifted types. Writing to a non-pointer field does not incur the GC write barrier. If you have a constructor with only non-pointer fields I do still generate two info tables. All the information is available at the right place for me to improve that, however.

I think some sort of levity polymorphism would be needed to handle what is normally done with strictness and unpacking. If you had something like:

data M a where
    MkM :: {-# UNPACK #_} mutable !a -> IO (M a)

When you match on an M Int do you get a Ref# or a RefU#? I haven't gone down that rabbit hole yet.

@winterland1989
Copy link
Contributor

@fryguybob The unpacking example probably shouldn't work since we don't allow unpack polymorphic field anyway. But we probably shouldn't allow unpacking arbitrary concrete type either.

Do you think it helps if we introduce another mutable# keyword? We can limit the types which mutable# accept (and unpack), matching on a mutable# field will always result in a RefU#, which may support exotic atomic operations.

@treeowl
Copy link
Contributor

treeowl commented Feb 16, 2021

Is someone still working on this?

@Boarders
Copy link

@treeowl I've wondered the same thing, I think this would open up lots of interesting Haskell performance which didn't rely on strange aspects of ArrayArray# and the like.

@fryguybob
Copy link

I have worked on it some in the last year when I had a little time.

https://github.com/fryguybob/ghc/tree/wip/ryates/mutable-fields-9

It is sort of in a state of minimal working example for only IO based mutable constructor fields.

@ramirez7
Copy link

ramirez7 commented Feb 16, 2021

Now that we have -XLinearTypes, I think there's room in this proposal for a third Data.Mutable API as well. Maybe something like:

module Data.Mutable.Linear where
  type Mutable a = ??? -- Unsure what this should be
  readRef :: Mutable a %1-> (Ur a, Mutable a)
  writeRef :: a -> Mutable a %1-> Mutable a

I'd imagine if we wanted to leverage LTs for this proposal, more changes would also be needed.

@sgraf812
Copy link
Contributor

I would be very happy to see progress here! Thanks for your efforts, @fryguybob.

I'm not convinced considering linear types adds any value to this proposal. Sure, the API will be unsafe, but then you can always expose a safe, linearly-typed layer over it, as is the case for mutable arrays, for example. The primitives are ST/IO-based (e.g. state-token passing based), and it works well enough. Unless I see a use case that really needs specific treatment for linear types, I don't see why we should complicate the proposal even more.

@ramirez7
Copy link

@sgraf812 If I understand this proposal correctly, that route would not work very well. Correct me if I'm wrong:

The proposed extension generates Mutable fields in IO or ST, and when you pattern match on the record, you're now forced to use either of those. So the benefit of adding LTs as a third option would be to avoid having the end-user (i.e. the person who defined their records) perform unsafe operations. It's not really worth it if you have to write the unsafe IO to LT wrapper for each of your records. This is unlike mutable arrays, which are a single type and therefore can hide the IO completely internally in a single module. Mutable record fields otoh give you that IO/ST everywhere you define them.

More concretely:

I don't think you can safely write this shim:

module Data.Mutable.Linear where
  type Mutable a = Data.Mutable.IO.Mutable a
  readRef :: Mutable a %1-> (Ur a, Mutable a)
  writeRef :: a -> Mutable a %1-> Mutable a

because that would allow you to break referential transparency. There's no guarantee that that IO.Mutable a is used linearly throughout the program. With the linear mutable arrays you mention, you can use the module system to get this guarantee.

So I think that would be the value of including a LT option to this extension. We would need a different Mutable type for LTs that is distinct, and we would then know that all Linear.Mutable values are used linearly, which in turn allows for this safe, linear, pure API. And frees the end-programmer from unsafe boilerplate.

Whether or not that should go in this extension or a future incremental one 🤷‍♂️ This proposal has plenty of value as-is and great work already towards implementation thanks to @fryguybob So a future incremental one may make more sense, especially since -XLinearTypes is so new and we are still learning our way around it as a programming community.

@sgraf812
Copy link
Contributor

Correct me if I'm wrong: The proposed extension generates Mutable fields in IO or ST, and when you pattern match on the record, you're now forced to use either of those.

Basically, yes. I'm not familiar with the implementation, but I'm pretty sure that internally we won't have "implementations in IO or ST", but just one that uses state-token passing, e.g. State# s -> (# State# s, t #), which is representationally equal to ST s t and to IO t if s is specialised to RealWorld#. Note that the arrow there is basically a linear arrow for all intends and purposes (it seems that linear-base even redefines it as such), but it is not really checked by GHC (which has led to bugs in the past) and it doesn't matter for surface language semantics, because it's an implementation detail of the compiler. (Whether the compiler should make more use of linear types in its PrimOps is an entirely unrelated discussion.)

"Make it compatible with Linear Types" isn't a demand that makes sense; You can always use an unsafe non-linear arrow and coerce it into a linear one, offering a safe API over that. You restrict the programs the user can write, thereby making an API that previously allowed unsafe operations into one that only allows safe programs by construction.

With that in mind...

So the benefit of adding LTs as a third option would be to avoid having the end-user (i.e. the person who defined their records) perform unsafe operations. It's not really worth it if you have to write the unsafe IO to LT wrapper for each of your records.

I think you want new syntactic sugar for linear types and mutable record fields. E.g. if you have

data MutPair a b where
  MutPair :: { mutFst :: mutable a
             , mutSnd :: mutable b
             } -> IO (MutPair a b}

You want the corresponding safe, linear API to be "autogenerated" (to the extent that I understand -XLinearTypes):

newtype LMutPair a b = LMutPair' (MutPair a b) -- constructor is internal!
pattern LMutPair :: a -> b -> LMutPair a b
get_mutFst :: LMutPair a b %1-> (a, LMutPair a b) 
get_mutSnd :: LMutPair a b %1-> (b, LMutPair a b)
set_mutFst :: LMutPair a b %1 -> a %1-> LMutPair a b 
set_mutSnd :: LMutPair a b %1 -> b %1-> LMutPair a b

set_mutFst (LMutPair' p) a = unsafePerformIO (writeRef (mutFst p) a) `pseq` LMutPair' p

The other functions are defined similarly to set_mutFst, using an abundance of unsafePerformIO. I already see a bunch of new concerns:

  • Should the functions operate in a Linear state monad directly?
  • Is get_mutFst :: LMutPair a b -> a OK?
  • get_ and set_ is ugly, I want record accessors and update syntax! Can't we integrate it with one of the record field proposals?

Etc. Finally, I don't see why you couldn't just use TH to derive this API from the MutPair data type. Heck, even some Generic-based implementation should do it if we can have generic-lens and you prefer that.

This is unlike mutable arrays, which are a single type and therefore can hide the IO completely internally in a single module. Mutable record fields otoh give you that IO/ST everywhere you define them.

I don't think it's different at all. A data type mit mutable fields that is meant to be used linearly should define a corresponding linear API. linear-base defines its own shim over mutable arrays, for example. If you want to retro-fit a linear API on a mutable data constructor, use the corresponding TH function. Another point for this derivation business not to end up in a language extension.

@Ericson2314
Copy link
Contributor

Frankly, I wish this and the linear types extension were co-designed in the past, rather than trying to retrofit one onto the other after the fact. Being able to tie together the type system changes to low level FFI stuff like this would have been wonderful.

Copy link
Contributor

@sgraf812 sgraf812 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal was probably written before the general proposal skeleton was added. Nevertheless, I think it would benefit from being adpated to fit that skeleton.

- ``Data.Mutable.IO.Mutable a``, if the constructor has an ``IO``
return type
- ``Data.Mutable.ST.Mutable s a``, if the constructor has an ``ST s``
return type, or a ``State# s -> (# State# s, t #)`` return type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very crude. Ideally, we'd have something like PrimMonad as a builtin thing and infer PrimMonad m => m t here.

I don't think we'll see that anytime soon, so an alternative would be a sort of "PrimMonad light":

module Data.Mutable where
  class PrimMonadLight m where -- feel free to bike-shed
    type Mutable m :: Type -> Type -- or `forall r. TYPE r -> Type` to accomodate runtime-rep poly
    readRef :: Mutable a -> m a
    writeRef :: Mutable a -> a -> m ()

And corresponding instances for IO and ST.

- The new ``mutable`` keyword declares a mutable field
- The return type is in ``IO``: a constructor with any ``mutable``
fields *must* have a return type that has one of the forms ``IO t``,
``ST s t``, or ``State# s -> (# State# s, t #)``, where ``t`` takes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment on 1.2.2, it would be much nicer to require a a -> b -> (forall m. PrimMonad m => m (MutPair a b)) return type instead of this quite arbitrary list of permitted types. Although that is not really simple and begs the question whether we can rid ourselves of the constraint in all cases...

fields *must* have a return type that has one of the forms ``IO t``,
``ST s t``, or ``State# s -> (# State# s, t #)``, where ``t`` takes
the form of the normal return type for the constructor. In the
latter two cases, the type variable ``s`` must appear in ``t``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
latter two cases, the type variable ``s`` must appear in ``t``.
latter two cases, the type variable ``s`` must not appear in ``t``.


A ``Ref#`` represents a mutable field of a constructor. Although ``Ref#``
appears to be a normal first-class primitive type, its *runtime
representation* will be ``(# Any, Int# #)``, that is, an unboxed pair
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why invent an entirely new runtime-rep Ref then? Hopefully that is just a synonym for TupleRep [ BoxedRep Lifted, IntRep ]. Also shouldn't this proposal allow mutable fields in unlifted data types (the proposal of which has been accepted, #265)? In that case, Ref# should take an argument l :: Levity. E.g.

Ref# :: forall (l :: Levity). * -> * -> TYPE (TupleRep [ BoxedRep l, IntRep ])`

But this proposal appears to rely on Any, which must have lifted rep today, so maybe supporting unlifted data types isn't possible.

I assume the reason to store the offset separately is to be able to insert the write barrier for the object that contains the field. Is that correct? I'd appreciate a forward reference if so.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is talking about the implementation here, how the reference itself can be represented by pairing a pointer to a heap object with a mutable field, with the offset to the field. My implementation started before levity polymorphism was completed so I actually have types for lifted and unlifted values in the field that the references points too. I couldn't avoid making a Ref# type as we need to know rather late how to generate code for accesses to references.


So, ``{-# UNPACK #-} !T`` cannot do anything if ``T`` is a type with
mutable constructors. However, ``UNPACK`` annotations can be used as
normal on immutable fields in the definition of a mutable constructor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean UNPACK on a field with a mutable constructor is an error? Or is it just ignored?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer an error.


`Eq` is supported by using ``reallyUnsafePtrEquality#`` to compare
mutable constructors, but we must ensure that the constructors are
evaluated strictly in the same way as we do for ``dataToTag#``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't want that kind of derived Eq instance. I find it very misleading; derived Eq instances normally determine structural equality, not referential equality. For example, users might add a mutable field to some pre-existing record and then wonder why the semantics of the derived Eq instance change completely.

I don't even see a reason why the Eq instance couldn't just compute structural equality! A field being mutable doesn't mean reading it is impossible. I certainly would advocate for deriving to have the same semantics as today if it doesn't lead to an error.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't even see a reason why the Eq instance couldn't just compute structural equality! A field being mutable doesn't mean reading it is impossible. I certainly would advocate for deriving to have the same semantics as today if it doesn't lead to an error.

That would be ideal, and is what Rust does already, but it unfortunately violates referential transparency: reading from a mutable field must be done in a monad, and none is available in a call to Eq. Perhaps deriving Eq should be an error, with a separate type class for pointer equality?


- Its constructor has the declared type
- It has identity, and equality is implemented using pointer equality
(see "Deriving" above).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's why you want that weird Eq instance. I don't like it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are really no alternative implementations I can think of that do not violate referential transparency. One could make this an error, though.

operation creates a new one that should compare equal only to itself.
Therefore the usual optimisation for nullary constructors whereby we
share a single global copy of the constructor does *not* apply to
mutable constructors, and we must always allocate them in the heap.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have Unique for that use case. It performs better and is less quirky wrt. evaluatedness.

If you want to somehow optimise Eq instances for pointer equality, I'd much rather support a design like https://arxiv.org/pdf/2003.01685.pdf. E.g., export a primitive

-- | Semantically equivalent to `\a b k -> k`, but GHC is free to return `True`
-- if `a` and `b` are referentially equal and omit evaluating `k`.
GHC.Exts.withPtrEq :: a -> a -> Bool -> Bool

instance Eq (MutPair a b) where
  MutPair a1 b1 == MutPair a2 b2 = withPtrEq a1 a2 (a1 == a2) && withPtrEq b1 b2 (b1 == b2)

= ui, otherwise

Primitives
^^^^^^^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still be nested under "Core"?

* Can we add a way to include mutable arrays in a constructor?
* It would be great to allow STM as an option in addition to IO and
ST. The constructor will need to store extra metadata, because
TVar# is more complex than MutVar#.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like another argument for PrimMonadLight. I'm having a hard time imagining what the intended semantics would look like for mutable fields in STM though...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STM with mutable fields was the central part of my thesis and why I made an implementation. The difficulty with pushing that to a high level with something like PrimMonad is we need extra fields in STM heap objects (at least that is a desirable design choice).

@treeowl
Copy link
Contributor

treeowl commented Feb 17, 2021

@sgraf812 my concern about PrimMonad specifically is that it is tied into GHC's internal representations of IO and ST. Exposing that in a language feature seems quite wrong. I wouldn't be opposed to using some internal version that doesn't expose that. I am curious whether some linearity-based approach could unify these in a more principled way, but I don't know nearly enough to guess.

@sgraf812
Copy link
Contributor

sgraf812 commented Feb 17, 2021

I don't think we need to expose the methods of PrimMonad (although we could in a GHC-specific module), it suffices to specify the instances. We do the same for Coercible.

Also note that the proposal already exposes the internal State# s -> (# State# s, t #) type by offering it as one of 3 alternative return types in the constructor signature. If anything, hiding all 3 types behind the rather abstract PrimMonad constraint makes the proposal less implementation specific.

As I said above, I don't see how linear types help. They are just in the type system and have no operational nature. You'd need unsafe primops and hide them behind a linear API (unless you want to extend Core with new LT-specific primitive operations...), which I did in the comment above, using primops this proposal provides.

@simonmar
Copy link
Author

Is someone still working on this?

I still think this proposal is important, but my bandwidth to work on it is very limited so I'd be more than happy if someone else wanted to pick it up and drive it.

@sgraf812
Copy link
Contributor

From the perspective of GHC, it would be so good to have some form of this proposal implemented. I believe that we could have significant savings in allocations in GHC by using a transient (e.g. locally mutable and selectively frozen) IntMap data structure backing UniqFM/VarEnv/InScopeSet/whatnot. Each update/insertion into these structures cost us a full re-allocation of the spine. Very costly if they are huge! I think we are suffering death by a thousand papercuts.

Unfortunately, a good source level representation of opt-in mutability eludes me.

@santiweight
Copy link

Fwiw: we would use this at work if it were implemented! We need it because we have a system in which we have many small components which are being mutated at high rates. There is an overhead to StateT that is becoming too large for us, and IORef is also not great.

It's unclear to me whether Simon's original proposal or a linear-types-focused version would have better usability for our use case. The nice thing for us about a linear-types version is that it would the mutation would be able to live outside of IO.

possible to eliminate a layer of indirection in mutable data
structures in a type-safe way, where currently this can only be done
by using ``unsafeCoerce`` tricks; see for example `the structs
package <http://hackage.haskell.org/package/structs>`_.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package <http://hackage.haskell.org/package/structs>`_.
package <https://hackage.haskell.org/package/structs>`_.

As a special case, this proposal also provides for the declaration of
constructors with *identity*. That is, constructors that are created
by an explicitly effectful operation, and that can be compared using
pointer-equality. Currently only a few built-in types like ``IORef``

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the pointer equality need to be a primitive, or can it be implemented with a magic type class and reallyUnsafePtrEquality#?

Comment on lines +86 to +88
- mutable fields *must not* have a strictness annotation. (we
anticipate that support for strictness annotations on mutable fields
will be a future proposal).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend having the default be strict, as that is more likely to be what people who care about performance actually want.


So, ``{-# UNPACK #-} !T`` cannot do anything if ``T`` is a type with
mutable constructors. However, ``UNPACK`` annotations can be used as
normal on immutable fields in the definition of a mutable constructor.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer an error.


`Eq` is supported by using ``reallyUnsafePtrEquality#`` to compare
mutable constructors, but we must ensure that the constructors are
evaluated strictly in the same way as we do for ``dataToTag#``.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't even see a reason why the Eq instance couldn't just compute structural equality! A field being mutable doesn't mean reading it is impossible. I certainly would advocate for deriving to have the same semantics as today if it doesn't lead to an error.

That would be ideal, and is what Rust does already, but it unfortunately violates referential transparency: reading from a mutable field must be done in a monad, and none is available in a call to Eq. Perhaps deriving Eq should be an error, with a separate type class for pointer equality?

Comment on lines +196 to +200
because the only way to make that work would be to copy the mutable
record ``MutPair`` into the ``MP`` constructor. This is (1) not what
we'd want for including inner objects as value types, and (2) ruins
the guarantee that ``UNPACK`` is a performance hint rather than
semantically important.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an imperative language background, it seems that this is what one would want, at least assuming that GHC can avoid the allocation in most cases. UNPACK might be the wrong pragma, though.


- Its constructor has the declared type
- It has identity, and equality is implemented using pointer equality
(see "Deriving" above).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are really no alternative implementations I can think of that do not violate referential transparency. One could make this an error, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs revision The proposal needs changes in response to shepherd or committee review feedback
Development

Successfully merging this pull request may close these issues.

None yet