Wednesday, September 29, 2004

Jasmine Status

Andreas and I are working toward a release of "Jasmine" for Saturday. The major difference is that the TeaTime architecture will not be part of it. Why? This is a very good and deep question. The major reason is that Solar works so incredibly well, and everything we have done so far that uses a TeaTime/TeaParty-like architecture is surprisingly slow, except for David Reed's version, which isn't quite done. So what we are doing is the following:

Moving Solar to the Squeak 3.6 image. This gives us a better
environment to develop in. Andreas wants to skip 3.7 and move the
system directly to 3.8 when that is available.

Upgrade the OpenGL rendering model. This is a much nicer interface.
Also, the OpenGL object is only passed to the rendering object at
render time, hence there is no need for global updates of it when it
changes. Overall, a big win. This also means that the initialize
methods are greatly simplified, and since #initialize is now always
called by default inside of #new, this cleans up the code as well.

Keep the current Solar #step model. This is the major difference
between Solar Croquet and Mad Hatter (as far as the component code is
concerned).

Upgrade the remote object construction. Currently, this does not allow
nested calls to metaConstruct. For that matter, any nested meta is a
bad thing. This change will allow you to call any meta inside of any
other, hopefully without too much trouble. This SHOULD lead to making
the 3D CAD system I built (see below) truly collaborative.

Synchronization of TObjects. This is everything from copying your
avatar over to the remote machine to copying entire spaces. This may be
in a later release, if we can't quite get it done this week.

Multiple TeaParties. This depends upon the synchronization. Currently,
Solar can only deal with a single TeaParty at a time.

Morphic, Linux, Windows, Mac 2D collaboration. This will be done using
the remote frame buffer (RFB) courtesy of Ian Piumarta. The way this
works is you have a single master component that renders the 2D buffer
and ships it over to the slave components on the remote machines.

From a component view there is one MAJOR difference between Jasmine and
Mad Hatter. In Jasmine, the programmer will have to explicitly deal
with synchronization. That is, the system will not automatically ensure
that messages are sent and executed at the right time; for that matter,
it doesn't guarantee that they are sent at all. It is totally up to the
programmer via the meta messaging architecture I did. There are pluses
and minuses to this approach. On the one hand, the programmer has a lot
more control over what is communicated and when between replicants. On
the other, the programmer has to take control of this and must have a
much deeper understanding of the nature of how the object should stay
synchronized. I did a search for "meta" and "metaSend" and was
surprised by how few of these messages actually get sent. But of
course, it isn't the number of them, but their strategic location in
the code. Further, we won't support multiple TeaParties to start, but
should have them quite soon.

I am not certain about this, but I believe that components written in
Jasmine will port to Mad Hatter with no changes and will work
essentially identically. However, there are a number of components that
employ a different strategy between these two models. First, there is a
push update model, as employed by the Mars Rover. This is a
master/slave architecture, where the master broadcasts the current
rover state to the slaves. In this case, the master is determined by
whoever interacted with it last. The second is a pull model, not yet
implemented, that could be employed by the flag (a 3D flag waving in
the breeze). The way this would work is the flag would not update
unless it was being rendered. When you first see it, it will send a
message to all of the peers in the TeaParty asking who last touched it,
and take that peer's value. There are some other different
approaches to this as well. Also, time synchronization is much less of
a priority in Jasmine. Messages are still time stamped and sorted, but
they are executed immediately. I plan to implement my simple time model
on top of this, which should dramatically improve robustness.
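
The push and pull strategies described above can be sketched roughly as
follows. This is an illustration in Python, not Croquet's Squeak code:
the Replica class, its methods, and the integer time stamps are all
hypothetical names chosen for the example.

```python
class Replica:
    """One machine's copy of a shared object (all names are hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.stamp = -1      # logical time of the last interaction seen
        self.peers = []      # the other replicas of the same object

    def touch(self, now, **changes):
        """A local user interacts with this copy."""
        self.state.update(changes)
        self.stamp = now

    # Push: the copy that was touched last acts as master and broadcasts.
    def push(self):
        for peer in self.peers:
            if self.stamp > peer.stamp:          # skip peers with newer state
                peer.state, peer.stamp = dict(self.state), self.stamp

    # Pull: when first rendered, ask every peer and keep the newest value.
    def pull(self):
        newest = max(self.peers, key=lambda p: p.stamp, default=self)
        if newest.stamp > self.stamp:
            self.state, self.stamp = dict(newest.state), newest.stamp


def connect(*replicas):
    """Make every replica a peer of every other one."""
    for r in replicas:
        r.peers = [p for p in replicas if p is not r]


# Push, as with the rover: mastership follows the last interaction.
rover_a, rover_b = Replica("a"), Replica("b")
connect(rover_a, rover_b)
rover_a.touch(now=1, heading=90)
rover_a.push()
rover_b.touch(now=2, heading=45)
rover_b.push()

# Pull, as proposed for the flag: nothing is sent until someone looks.
flag_a, flag_b = Replica("a"), Replica("b")
connect(flag_a, flag_b)
flag_b.touch(now=5, phase=0.25)
flag_a.pull()
```

In the push case every interaction costs a broadcast; in the pull case
the cost is paid only by a machine that actually needs the value.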

Tuesday, September 28, 2004

Possible Change

Andreas is coming out today to help work on Jasmine. He and I spoke
about a different approach to synchronization that is more along the
lines of the Solar Croquet architecture. Solar uses a keyword called
"meta" that is used to send messages remotely. The programmer is much
more responsible for how an object acts collaboratively.

The previous example of how a window works demonstrates the difference.
In the new Mad Hatter TeaTime model, the entire TeaParty, or container
of all the objects in a shared environment, is really the basis of
collaboration. That is, any message sent via the TeaParty is
replicated. These tend to be much higher-order messages, like events
(pointerDown, keyDown, etc.). This gives the programmer less control,
but it also allows him to not worry about managing the replication of
state. The objects all run lock-step across all of the users - totally
deterministic.

In the Solar model, the object itself is totally responsible for
guaranteeing synchronization. This means that the programmer needs to
determine which key pieces of state need to be shared. When an event
occurs, the object can act on the event and simply share the result, or
the object can in turn send another message that replicates the
computation. The programmer decides. The advantage is that the
programmer can design an object that is quite efficient. The
disadvantage is that the programmer needs to have a very good
understanding of the nature of the communication - what is important
and what is not.
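
The two choices the programmer has in the Solar model (share the result,
or replicate the computation) might look something like this. It is a
Python sketch, not the actual meta implementation: Network, Slider, and
meta_send are hypothetical stand-ins, and the real "meta" keyword lives
in the Squeak code.

```python
class Network:
    """Stands in for the transport between replicas (hypothetical)."""

    def __init__(self):
        self.replicas = []
        self.sent = 0

    def meta_send(self, sender, selector, *args):
        """Run `selector` on every remote copy of the sender."""
        for replica in self.replicas:
            if replica is not sender:
                self.sent += 1
                getattr(replica, selector)(*args)


class Slider:
    """A made-up shared object with a value between 0 and 1."""

    def __init__(self, net):
        self.net = net
        self.value = 0.0
        self.hover = False
        net.replicas.append(self)

    def pointer_enter(self):
        self.hover = True            # local feedback only, never replicated

    # Choice 1: act on the event locally, then share only the result.
    def drag_to(self, x):
        self.set_value(max(0.0, min(1.0, x)))
        self.net.meta_send(self, "set_value", self.value)

    # Choice 2: replicate the computation instead of its result.
    def nudge(self, step):
        self.apply_nudge(step)
        self.net.meta_send(self, "apply_nudge", step)

    def set_value(self, value):
        self.value = value

    def apply_nudge(self, step):
        self.value = max(0.0, min(1.0, self.value + step))


net = Network()
mine, yours = Slider(net), Slider(net)
mine.pointer_enter()     # stays on my machine
mine.drag_to(1.7)        # clamped locally, result 1.0 is shared
mine.nudge(-0.25)        # each copy computes 1.0 - 0.25 itself
```

Only two messages cross the network here, which is the efficiency
argument; the price is that the programmer had to decide that hover
state does not matter to anyone else.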

So the question is - do we trust the programmer?

If we do move forward on the Solar/meta approach, we would need to
seriously upgrade the world synchronization capability of the system.
There are two issues: one is replication of construction (new objects),
and the other is replication of world state (existing objects, upon
joining a TeaParty). Currently, our model of replication of
construction is pretty bad. I think I have a way to make this work
nicely, though. We have no replication of world state at all - though I
think this too can be handled nicely.

This would mark a major divergence between Jasmine and Mad Hatter. I
think it is a good idea to build Jasmine this way if only to explore
this further.

Sunday, September 26, 2004

Pointer/Avatar/Camera change

Much of this is quite technical and total gibberish to anyone who
doesn't know anything about the architecture of Croquet. As we get
closer to release, I intend to spend some time writing more useful
notes for people interested in developing for the system. For now, I am
just using this forum so that people can keep track of our progress.
Please feel free to comment, or ask questions. I will endeavor to reply
as quickly as I can.

-----

I have modified the system to generate a TPointer inside of the
replicated TAvatar. The way this works is the original TUserCamera
still has its own, non-replicated, pointer. This is used to determine
which object is being interacted with at render time. Once this object
is determined, when an event is triggered (pointerDown, pointerUp, etc.)
we send the TSelection object via the replicated TAvatar, as well as the
event itself. The TAvatar then transfers the TSelection to its
replicated TPointer and sends the requested event to it.

There are a number of reasons to NOT just put this pointer into the
array of TRays that the TSpace manages.

First, there is a need to traverse portals. The TCamera manages this
nicely, and it makes little sense to do such a complex transform
multiple times, with the additional overhead of querying the target to
determine if it is a portal and acting accordingly. This may change, as
I may want objects to be able to fall through portals that might be on
the ground, but for now, this is not a priority.

Second, the TPointer can only act on objects that are visible to the
camera, hence there is a significant paring of the tree to a small
subset of the objects in the environment that we need to deal with.

The replicated TPointer has its own replicated TCamera, which is a
non-functioning camera. It maintains a number of key values including
the bounds of the user's screen, the viewAngle, and the current camera
transform. These are values that are used for some of the actions that
occur in the world such as the TAvatar jumping to a window. The
distance from the window is determined by these two values and could be
different for each user.

There is still a problem with the avatar after it enters a remote space
that is not fixed by this change set, but I think we are closer to
fixing it with this change.

I also moved the TRay tests into a separate teatime based iteration.
This fixes a big problem that we have glossed over. Since the ray tests
were only being performed at render time, if we have different machines
rendering, this could cause divergent behaviors of things that depended
upon these rays, such as the Mars Rover.
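
A toy model of why render-time ray tests diverge. This is a Python
sketch under made-up assumptions (a rover that moves one unit per tick
toward a wall, and machines that render once every N ticks); none of it
is Croquet code.

```python
def run(frame_every, test_on_tick, ticks=20, wall_at=10):
    """Advance a rover one unit per simulation tick until a ray test
    sees the wall within one unit; return where the rover ends up.

    frame_every  -- this machine renders once every N ticks
    test_on_tick -- True: the ray test runs on every shared tick
                    False: the ray test runs only when a frame is drawn
    """
    x, moving = 0, True
    for tick in range(1, ticks + 1):
        if moving:
            x += 1
        rendering = tick % frame_every == 0
        if (test_on_tick or rendering) and wall_at - x <= 1:
            moving = False
    return x


# Ray tests tied to rendering: a fast and a slow machine disagree.
fast_render, slow_render = run(1, False), run(4, False)   # 9 and 12

# Ray tests tied to the shared clock: frame rate no longer matters.
fast_tick, slow_tick = run(1, True), run(4, True)         # 9 and 9
```

The slow machine in the first case drives the rover through the wall
before it ever tests the ray, which is exactly the kind of divergence
that moving the tests out of the render loop avoids.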

Saturday, September 25, 2004

Interesting problem

David and I are putting the TPointer into the same TeaParty as the
TAvatar. The reason for this is that currently the TPointer is NOT
replicated, hence we are sending far more messages than we should if it
were. Once it is replicated, then the vast majority of its messages
will be sent locally, hence will be quite efficient. This in itself is
a good thing and will be part of both Mad Hatter and Jasmine.

In making this change of putting the TPointer into the same TeaParty as
the TAvatar, a problem I had ignored until now becomes a bit more
apparent. In Solar, all "selections" of objects occur locally. That is,
when I select a window, or for that matter generate a pointerEnter
message, this only occurs on the local version of the TWindow. Hence,
it is only highlighted locally. In some ways, this simplifies the idea
of selection and manipulation, as only the local guy is involved. Of
course, if any deep changes occur, then the rest of the world needs to
know about this, which is why we did the meta sends.

Of course, Mad Hatter and Jasmine don't work this way. Instead - every
message that is sent to a replicated object is itself replicated among
all of them. This means that when I have a pointerEnter event, this
message is sent to all of the target objects. For a TWindow to work
properly now, it must be aware that multiple users may be able to work
with it, and now IT must manage all of the users' state, where before
it needed to manage only a single
pointerEnter/pointerOver/pointerLeave sequence and a single
pointerDown/pointerMove/pointerUp sequence.

Why is this a problem? We now need to redesign the window (and
virtually all of the other TObjects) to either service events on a
first-come, first-served basis, locking out everyone else until
released, or keep track of all of the users currently interacting with
the object. For example, in Solar, if I select a window's drag area on
one machine and you select it on another, this
works nicely because we are just sending updated location information
to each of the windows. We have a nice tug-of-war demonstrating
robustness. With the new model, the only way I can see this working in
this case is whoever gets to the window first controls it completely.
Some events can still be handled properly, like key presses, but most
act over time, like dragging, drawing a line, etc. In the case of the
TWindow, to drag, it keeps track of the camera's normal vector so that
it can drag perpendicular to this. Multiple cameras from different
locations and orientations would make this fail, possibly dramatically.

So what to do?

Some objects would require a lock-out to work properly, usually between
a pointerDown and pointerUp. The pointerEnter and pointerLeave should
NOT require a lock-out because these are usually not deep modifications,
but they would require some kind of reference count. That is, we need
to track how many pointerEnters have occurred and match these with the
pointerLeaves. In the case of the window, we hilite on the first
pointerEnter, and unhilite on the last pointerLeave. No matter what,
this is more complex code.

The lockout may not be that big of a problem as long as everyone knows
that that is the situation. For ICE, what I did was when a lockout
occurred, I hilited the object in red to show that you could not touch
it. This is actually something that can be done locally with something
like the following:

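render: ogl
    "downPointer is assumed to hold the pointer that currently has this
    object locked. Show the go color to that user, the stop color to others."
    downPointer = ogl camera pointer
        ifTrue: [self hilite: go color]
        ifFalse: [self hilite: stop color].

    "render object..."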

Not pretty, but it was quite effective in ICE from a UI point of view.
A very clear indication of what the state of the object is, which
avoids confusion. The fact is, my perspective on the world IS different
from yours. What I can and can't do IS different from what you can and
can't, and this MUST be made explicit.

The pointerEnter/pointerLeave reference count and the
pointerDown/pointerUp lockout is a pretty simple pattern to implement.
If anyone has a better idea, feel free to let me know.
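
For what it is worth, here is roughly what that pattern could look like.
This is a Python sketch rather than TWindow code; SharedWindow and its
method names are hypothetical, and a set of pointers is used in place of
a bare counter so that a duplicated pointerEnter cannot throw the count
off.

```python
class SharedWindow:
    """Enter/leave reference count plus a down/up lockout (hypothetical)."""

    def __init__(self):
        self.hovering = set()    # pointers currently over the window
        self.owner = None        # pointer holding the down/up lockout
        self.hilited = False
        self.position = None

    def pointer_enter(self, pointer):
        self.hovering.add(pointer)
        self.hilited = True                   # hilite on the first enter

    def pointer_leave(self, pointer):
        self.hovering.discard(pointer)
        self.hilited = bool(self.hovering)    # unhilite on the last leave

    def pointer_down(self, pointer):
        if self.owner is None:
            self.owner = pointer              # first come, first served
        return self.owner == pointer

    def pointer_move(self, pointer, position):
        if self.owner != pointer:
            return False                      # locked out
        self.position = position
        return True

    def pointer_up(self, pointer):
        if self.owner == pointer:
            self.owner = None                 # release the lockout

    def hilite_color_for(self, pointer):
        """Local, per-user choice made at render time."""
        if self.owner is None or self.owner == pointer:
            return "go"
        return "stop"


window = SharedWindow()
window.pointer_enter("alice")
window.pointer_enter("bob")
window.pointer_leave("alice")          # still hilited, bob is over it
alice_has_it = window.pointer_down("alice")
bob_has_it = window.pointer_down("bob")
bob_moved = window.pointer_move("bob", (3, 4))
bob_sees = window.hilite_color_for("bob")
```

Everything except hilite_color_for would run identically on every
replica; the color choice is the one piece that depends on who is
looking.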

Friday, September 24, 2004

Unfortunate Formatting

I have been posting to this via email for the most part which has an
unfortunate side effect of adding line breaks in the wrong place. I am
pretty sure that it is because the email program I use - or the one
that receives on the other end, forces line breaks for a certain width
line, and of course the blogger app does as well. There seems to be a
slight mismatch of the algorithms used however. I suspect that the
email uses something like <= x characters, where the blog uses < x. If
the line happens to be exactly x characters long, an extra word wrap
will be forced. I don't expect people to read this, so I am not going
to worry about it - but just in case you were wondering ... there it
is.
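
The suspected off-by-one can be reproduced with a toy word wrapper. This
is a Python sketch of the guess above, not the actual algorithm of
either program; the wrap function and the width of 19 are made up for
the example, and it assumes the blog keeps the hard line breaks that the
email program inserted.

```python
def wrap(text, width, inclusive):
    """Greedy word wrap that keeps existing line breaks.

    inclusive=True  allows lines of up to `width` characters (<= width)
    inclusive=False allows lines of up to `width - 1` characters (< width)
    """
    limit = width if inclusive else width - 1
    lines = []
    for hard_line in text.split("\n"):
        current = ""
        for word in hard_line.split():
            candidate = word if not current else current + " " + word
            if not current or len(candidate) <= limit:
                current = candidate
            else:
                lines.append(current)
                current = word
        lines.append(current)
    return lines


message = "aaaa bbbb cccc dddd eeee ffff"

# The email program wraps first; its first line is exactly 19 characters.
emailed = wrap(message, 19, inclusive=True)

# The blog then re-wraps those lines with a limit that is one shorter.
posted = wrap("\n".join(emailed), 19, inclusive=False)
```

Here emailed comes out as two lines, while posted has three: the word
"dddd" is pushed onto a short line of its own, which is the stray break.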

Thursday, September 23, 2004

Mad Hatter Fixes

There were two bugs that we found. The first was related to the fact
that the OpenGL texture ids were being copied to the other machine.
This meant that when the TForm (the object that holds the glID)
attempted to render, it assumed that the id was a valid number. Since
this texture had in fact never been instanced on the new machine, the
render would fail. This was fixed (as well as the problems with
TPrimitives) by adding a fixup on the receiving side to nil out these
OpenGL reference values. So now the textures are properly visible.
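
The idea behind the fixup, sketched in Python. This is not the Mad
Hatter code: the TForm class below, the FakeGL context, the LOCAL_ONLY
list, and the receive function are all hypothetical and only model the
situation described above.

```python
import copy


class FakeGL:
    """A per-machine context: texture ids are only valid where created."""

    def __init__(self, first_id):
        self.next_id = first_id
        self.textures = {}

    def upload(self, pixels):
        self.next_id += 1
        self.textures[self.next_id] = pixels
        return self.next_id

    def draw(self, gl_id):
        return self.textures[gl_id]      # KeyError for a foreign id


class TForm:
    """Hypothetical stand-in for the object that holds the texture id."""

    LOCAL_ONLY = ("gl_id",)              # handles that must not travel

    def __init__(self, pixels):
        self.pixels = pixels
        self.gl_id = None

    def render(self, gl):
        if self.gl_id is None:           # not yet instanced on this machine
            self.gl_id = gl.upload(self.pixels)
        return gl.draw(self.gl_id)


def receive(obj):
    """Receiving-side fixup: copy the object, nil out machine-local handles."""
    clone = copy.deepcopy(obj)
    for name in getattr(clone, "LOCAL_ONLY", ()):
        setattr(clone, name, None)
    return clone


sender_gl, receiver_gl = FakeGL(100), FakeGL(0)
form = TForm(b"rgba")
form.render(sender_gl)                   # instanced on the sender as id 101

remote = receive(form)                   # arrives with gl_id reset to None
drawn = remote.render(receiver_gl)       # re-instanced locally, renders fine
```

Copying the form without the fixup and rendering it against receiver_gl
raises a KeyError in this model, since id 101 was never created there.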

David never actually saw this problem because he was either running
multiple versions of Croquet in the same image (hence a guarantee that
they would use the same OpenGL id), or running one machine as a headless
Croquet server that never actually rendered anything, with the other
machine rendering. In that case the OpenGL texture id was never even
set, so when it was copied from the server to the rendering machine,
the id value was indeed nil.

The other problem has to do with the fact that the TPointer object (the
thing we use to determine what we are pointing at) is not actually
replicated. This causes some nasty issues because the TPointer queries
virtually every object at render time to determine if it is selected
or not. What we decided to do here was move all of the TRay tests out
of the rendering loop, which is actually a good thing, and move the
TPointer into the same TeaParty as the TAvatar. This leads to one other
slight complexity, and that is that the TOverlaySpaces also need a
TPointer and it really can't be the same one as the replicated version,
hence every overlaid TSpace will also have its own TPointer. The job
of the TUserCamera will be to arbitrate which of these pointers gets
the events that are sent. It is actually a bit simpler this way, though
it took a bit of thought to figure out what to do. The problem is
understanding how TeaTime and TeaParties really work.

A funny thing I found was that TWindow had a reference to an instance
variable called pointerXY (which is the x,y location of the 2D cursor on
the screen), which it grabbed from the TPointer on a #pointerDown. This
was actually never used anywhere inside the TWindow, and it was the
only place that the pointerXY was ever asked for from the TPointer. I
can't imagine what I was trying to do with that value, but it is
interesting how this kind of cruft can find its way into a system.

Fixing Mad Hatter

Today I am visiting with David R to fix up Mad Hatter. David thinks
that these bugs should be quite easy to fix. Hope he is right.

Currently David is working on a modification to the Squeak garbage
collector parameters. What is neat is that the GC actually has these
parameters available to modify. It turns out (according to David) that
these values were set based upon extremely small memory footprints and
relatively slow machines. Since we have significantly more RAM to work
with these days (none of my machines has less than 512 Meg) and good
virtual memory, we can actually perform GCs less often. What David is
doing is adjusting a max allocation counter, which sets the number of
allocations allowed before a GC, based upon the time it takes to
perform a GC, with the goal of keeping each GC under 10 milliseconds.
This is the value that is required to ensure that multimedia
applications, like sound and video, are not interrupted. It does
seem to have a good impact on the performance of Croquet.
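
One way such a feedback rule could work, as a Python sketch. This is not
David's code and these are not the Squeak VM's parameter names; the
function, the clamping to a factor of two, and the floor and ceiling
values are all assumptions, as is the idea that pause time grows roughly
in proportion to the allocations since the last GC.

```python
def next_allocation_limit(limit, pause_ms, target_ms=10.0,
                          floor=1_000, ceiling=10_000_000):
    """Return how many allocations to allow before the next GC.

    The limit is scaled by target/pause, so a pause that ran long shrinks
    the limit and a short one grows it. The step is clamped to a factor
    of two either way so one noisy measurement cannot swing it wildly.
    """
    ratio = target_ms / max(pause_ms, 0.001)
    ratio = min(2.0, max(0.5, ratio))
    return int(min(ceiling, max(floor, limit * ratio)))


grown = next_allocation_limit(4000, pause_ms=5.0)     # half the budget used
shrunk = next_allocation_limit(4000, pause_ms=20.0)   # twice the budget used
capped = next_allocation_limit(4000, pause_ms=1.0)    # growth capped at 2x
floored = next_allocation_limit(1500, pause_ms=40.0)  # never below the floor
```

With plenty of RAM the limit drifts upward until pauses approach the 10
millisecond target, which means fewer collections overall.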

Tuesday, September 21, 2004

Mad Hatter Progress

I tried out the newly named "Mad Hatter" version of Croquet again
today. This is the name that David R and I have given the TeaTime
version of the system that he has been working on. The good news is
that much of it seems to be working. The transfer of simple worlds
between my PC and Mac seemed to go well. I found a number of problems,
none of which appear to be serious, but show-stoppers for the time
being nonetheless.

- When my avatar entered the remote world the remote machine received
an error in the TStandIn for the avatar. A TStandIn is just a simple
representative of the object that is created as soon as a message is
sent to that non-existent object on the remote machine. The TStandIn is
placed inside of the world and both fields the incoming messages to the
object and begins the process of loading the object from its originator
machine. I would have thought that by definition the TStandIn could not
signal an error because it is supposed to be a universal message
handler. I guess I was wrong about that.
- For fun, I tried running a small world on my Mac and a huge world on
my PC. This failed on the PC somewhere inside of the morphic objects
that are hung inside the world. In fact, I don't think that these
objects are supposed to be sent, so I may have some obsolete code here.
- When I removed the morphs in the world, I got another error. This was
with just a TFlag and a TMyCube. Not much to transfer at all. This may
be a message synchronization problem as the TFlag is constantly sending
future messages to itself.
- The other problem, and one that I am quite a bit more concerned about,
is with the speed of data transfer between machines. I do have some
strategies for dealing with this (see below with the imposters) but I
am not quite certain just how much of a problem we have to overcome
until I get the whole thing working.
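
The TStandIn idea from the first item above (field any incoming message
while the real object loads) can be sketched like this. It is a Python
illustration, not the TStandIn implementation: the StandIn class, its
fetch callable, and the Avatar class are hypothetical.

```python
class StandIn:
    """Placeholder for an object that has not arrived yet (hypothetical)."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that returns the real object
        self._queue = []         # messages received while loading
        self._real = None

    def __getattr__(self, selector):
        # Python calls this only for names it cannot find normally,
        # so every ordinary "message" lands here.
        if selector.startswith("_"):
            raise AttributeError(selector)

        def send(*args, **kwargs):
            if self._real is not None:
                return getattr(self._real, selector)(*args, **kwargs)
            self._queue.append((selector, args, kwargs))
            return None

        return send

    def load(self):
        """Finish loading, then replay queued messages in arrival order."""
        self._real = self._fetch()
        queued, self._queue = self._queue, []
        for selector, args, kwargs in queued:
            getattr(self._real, selector)(*args, **kwargs)
        return self._real


class Avatar:
    def __init__(self):
        self.log = []

    def move_to(self, x, y):
        self.log.append(("move_to", x, y))


stand_in = StandIn(Avatar)
stand_in.move_to(1, 2)        # queued, the avatar is not here yet
stand_in.move_to(3, 4)        # queued
avatar = stand_in.load()      # real object arrives, queue is replayed
stand_in.move_to(5, 6)        # forwarded straight through
```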

I will be seeing David tomorrow and already sent him the error report.

I also sent a message to Andreas about the current speed issues with
the TParty object (not to be confused with the TeaParty object - though
of course they will be).

I also received a message from Mark McCahill that he should have a
version of the object caching system done tomorrow. This will be
essential for both Mad Hatter and Jasmine.

Not much I can do on the Mad Hatter front, so back to Jasmine...

Monday, September 20, 2004

Croquet Status

Things are looking up for a release very soon. First, David Reed looks
like he has a real candidate for the developer release. I just spoke
with him and his major concern is that we beat on it a bit before
launching just to make sure there is nothing major. I will be doing
that tonight and through the rest of the week. Also, Andreas just sent
me a change set for his TParty class that should fix the freeze-up I
saw with the previous version.

I am going to Boston tomorrow and will probably have lunch with David
at Mary Chung's (my favorite Chinese restaurant). Also, I will be
spending all of Thursday with him working on Croquet. This week looks
quite promising.

Sunday, September 19, 2004

Croquet: Things to do

Things to do:
Jasmine
Jasmine is a simpler version of the Croquet architecture intended to
act as a stopgap until the TeaTime version is released. It may be
unnecessary, but I think a lot of the ideas in it will be recycled into
the TeaTime version anyway. I worked hard to make it so that both
versions use the same object models, so any changes in one should be
easily replicated in the other.

- Get the TParty architecture to work properly. Currently waiting for Andreas Raab to fix a bug that causes the system to freeze up. The TParty is actually what the E Language people refer to as a Vat. That is, a collection of objects in a computational subworld that you can only have indirect access to. There are a number of very nice things that Andreas did to make this work. First, every Vat has its own independent process, which should make it quite easy to add my lightweight TTime model to it. Second, the entire Vat is easily checkpointed, or streamed out, and resurrected.
- When we do a checkpoint of a TParty, we need to replace the external
TParty references with something useful when we ultimately resurrect
the TParty. Currently, there are three kinds of objects that we need
to deal with that exist in external TParties. The first is the TAvatar
- or the representation of the users in the space. These objects are
transient, hence they need to be simply removed upon a checkpoint. The
second are the portals leading to external spaces. We will be replacing
these with an imposter TSpace object inside its own TParty that looks
like the original, but is much lighter weight. Think of it as a 3D
snapshot. The third kind of object are the TForms inside of the
TTextures. These need to be replaced by the TForm thumbnails when we
checkpoint.
- The imposter space maintains a reference to the real space on
another machine. When the imposter is forced to be rendered (or
touched in any way by the user), it begins the process of downloading
the real object space using its internal reference. Once the real
space is loaded, the imposter space will replace itself with the real
version. The imposter space lives in its own TParty.
- The thumbnails work the same way. They live in their own TParty.
When they are used in any way, either rendered as an object, or used as
a texture on another object, the thumbnails begin the process of
loading the real texture.
- Since textures are static objects, we can cache them in a global
server somewhere. That is, when you construct a space and use a new
texture in that space, this texture can be written to an external
server. When the space that uses this texture is shared, a reference to
the original texture on this server is also shared. When a texture is
downloaded from the server, it is placed both into the space replacing
the thumbnail and onto a local cache on the disk.
- The next big thing is the actual TTime model. This is well described
in a white paper I wrote, which is unpublished. I need to reread it
before posting anywhere, but this will proceed without the rest of the
system.
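
The texture caching item in the list above could be sketched along these
lines. It is a Python illustration only: the TextureCache class is
hypothetical, a plain dict stands in for the global server, and keying
textures by a content hash is an assumption of this sketch, not
something decided above.

```python
import hashlib
import tempfile
from pathlib import Path


class TextureCache:
    """Local disk cache in front of a shared server (both hypothetical)."""

    def __init__(self, server, cache_dir):
        self.server = server                  # dict-like: key -> bytes
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def key_for(data):
        # Textures are static, so a content hash makes a stable reference.
        return hashlib.sha256(data).hexdigest()

    def publish(self, data):
        """Write a new texture to the server; return the shareable key."""
        key = self.key_for(data)
        self.server.setdefault(key, data)
        (self.cache_dir / key).write_bytes(data)
        return key

    def fetch(self, key):
        """Resolve a key, preferring the local disk cache."""
        path = self.cache_dir / key
        if path.exists():
            return path.read_bytes()
        data = self.server[key]               # stands in for a download
        path.write_bytes(data)
        return data


server = {}
builder = TextureCache(server, tempfile.mkdtemp())
visitor = TextureCache(server, tempfile.mkdtemp())

reference = builder.publish(b"brick texture")   # shared along with the space
first = visitor.fetch(reference)                # downloaded, then cached
server.clear()                                  # server goes away
second = visitor.fetch(reference)               # still served from disk
```

The thumbnail in the space would hold only the reference; swapping in
the real texture is then a matter of calling fetch when it is first
used.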

TeaTime
This is the "real" version of Croquet. It looks like it may see the
light of day before I actually complete Jasmine, which would be
wonderful - though I intend to continue work on Jasmine if only because
it is a simpler model of the system.
- TeaTime requires the same texture caching mechanism as described
above for Jasmine.
- Need to test out a more massive environment to determine the cost of
synchronization.
- Fix up all of the demos. Rework them into the appropriate
TeaParties. Each demo space really needs to be in its own TeaParty.
- Debugging of course.

Documentation
The current documentation for developing in Croquet is in a very poor
state. I need to completely rethink this - though I would like to have
something in the next three or four weeks.

Walkthrough in Croquet

One of the things I have been thinking about is how I might develop a true 3D design tool inside of Croquet. When I wrote Virtus Walkthrough, I had a rather constrained set of capabilities dictated by the 2D user interface paradigm introduced by the Mac and by MacPaint and MacDraw. The goal became that of figuring out how to extend the existing interface to include a realtime 3D space. This was accomplished by having the tools and the design view be all in 2D, which looked very much like MacDraw, and adding a 3D first person view inside of another window. The user would design the space in the 2D view, and explore it in the 3D view. This wasn't bad, but it meant that you always had to jump back and forth between 2D and 3D, which was a pain.

Here is a picture of it:



We started adding some direct manipulation in the 3D world in later versions of the system. In fact, when we developed OpenSpace, the intended successor of WalkThrough, we left out the 2D manipulation and design entirely. This also proved to be suboptimal. When I first described the idea of removing the 2D view entirely, a friend of mine working at Disney Imagineering said that would be a huge mistake. The 2D view acted as a map that was quite easy to understand and work with. This proved to be the case with OpenSpace. Using it as simply a layout animation tool was great for small non-environment based projects, but it didn't really lend itself to world building. Of course, it really didn't have any design tools either, so this might not have been an issue. But with Croquet - everything is 3D. I think that the key idea here is not to abandon the 2D, but figure out a nice way to interleave it with the 3D. That is, the 2D becomes a true working surface that can be viewed either flat or in context with the extruded 3D data set. The first efforts on this were the Wicket demo I built:



I like this a lot, but the big problem is that I don't really have a tool palette to work with here. This is a more subtle problem than you might think, as the palette needs to live somewhere, without cluttering the collaborative design of the model. Further, everything in Croquet is an independent object. This means that any tool developed needs to communicate with the model via its messages. But I need to be able to extend the capabilities of the tools independently of the capabilities of the model that I am working on. This problem leads to some pretty crufty approaches in the code. I have not seen anything that works the way I would want.

Saturday, September 18, 2004

Croquet Personal Blog - A Start

Hi there. This is just an experiment to see if I can sustain a blog about the Croquet Project - or anything else I feel like writing about. I decided to give this a try because a friend of mine, Takashi Yamamiya, has started a blog using blogger and it seemed pretty easy to start up.

The main thrust of this blog will be technical discussions about Croquet. For more information about Croquet, see:
Croquet Project.