Wednesday, December 29, 2004

Alive and well and living in Bangalore

Bangalore was, fortunately, unaffected by the recent disasters, as it is not close to the coast of India. Things are calm here, though not joyful.

Rupali and I signed a lease last night for a nice 2-bedroom apartment in a quiet area, less than a kilometer from the office. Lots of other things are falling into place as well, and I've finally assembled the paperwork for foreign registration. Tonight our furniture should be shipped from Raipur--I just hope it gets here soon.

My initial assignment at Sun is to become familiar with portals (and related components). I've seen a lot of this before, but that was a few years ago when I was working with Anil.

Portals are interesting, as they are the inverse implementation of what is intended with CompStrm. A lot of the features and processes have analogs (or intended analogs) in CompStrm.

With internet access available only from the office for the time being, I'm inclined to put off further development of CompStrm for a while yet.

Finally, I'd like to note that CompStrm itself is an attempt to build a computing platform that meets the base requirements of a former client. His blog can be found at http://ontologist.blogspot.com/

Tuesday, December 21, 2004

Assignment of Copyright

I have confirmed with Thelma Nanappa in Human Resources here that when publishing, I must assign the copyright to Sun Microsystems.

So the copyright for all entries in this blog, and all work done on CompStrm, starting with 20-DEC-2004, is hereby assigned to Sun Microsystems Pvt Ltd.

Monday, December 20, 2004

First day on the new job

First day here. Everything is very nice, everyone very helpful.

Signed non-disclosure and inventor agreements. Very simple--everything I do belongs to them.

Friday, December 17, 2004

Busy in Raipur, contact info

Between the jet lag, packing for the move to Bangalore and visits to relatives, I've managed to keep moderately busy here in Raipur.

Our phone here (when dialing from the US) is 011-91-771-242-8133, an upstairs line (wireless local loop, actually) shared with my brother Anil.

Sunday we will be flying down to Bangalore, and will stay at the Richmond Hotel for about 2 weeks. 011-91-802-223-3666

Wednesday, December 15, 2004

Alive and well in Raipur

I am very happy to once again be in Raipur. It has been 4 months, and it is very good to see my wife Rupali again.

Rupali and I will be traveling to Bangalore on Sunday. Until then, we are staying with her folks. I arrived Monday, but hardly even remember Tuesday. Today, finally, I am just about recovered from the trip and the 10.5-hour time change.

Saturday, December 11, 2004

rls 0450a--a simple .dmp file analyzer

Well, it's been over a week, but there it is, rls 0450a:

New command added to base: shDmp.

Sharing intentions as well as Changes

Sharing changes doesn't always work. Suppose user A deletes something in a shared dataset. The changes are passed to user B in the form of a .dmp file and everything is updated. Or not!

The problem is that user B may have an unshared dataset that contains references to what user A deleted. Now this unshared dataset will have broken links (i.e. non-symbolic references). What to do?

When loading a file, we should look for evidence that a file has been destroyed. This intention is likely indicated by the destruction of a parent property (removal of a file from a directory). Once we see that, all references to the file should be destroyed.
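
To make the idea concrete, here is a minimal Python sketch of that cleanup pass. The Change structure and the find_references/destroy_reference helpers are assumptions for illustration, not actual TKCS code.

```python
from dataclasses import dataclass

@dataclass
class Change:
    file_name: str   # TKS record the change applies to (assumed field)
    prop: str        # property being changed, e.g. "parent"
    destroyed: bool  # True when the property value was destroyed

def files_destroyed(changes):
    """Infer which files were destroyed from destroyed parent properties."""
    return {c.file_name for c in changes if c.prop == "parent" and c.destroyed}

def clean_up_references(changes, find_references, destroy_reference):
    """Drop references to files we infer were destroyed, including
    references held by unshared datasets."""
    for name in files_destroyed(changes):
        for ref in find_references(name):
            destroy_reference(ref)
```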

Friday, December 10, 2004

All set to go

Things have been happening quickly. I got the documents I was waiting for, applied for and was issued an Indian employment visa. Leaving Saturday for Mumbai.

The down side is that I've been a bit too busy to write code. And once I leave for India, it will likely be quite a while before I have any time again.

Wednesday, December 08, 2004

In Philadelphia, time to code?

Well, I'm now on the first leg of my trip back to India. Yesterday I left State College, where I had stayed with my father for 4 months. I'm in Philadelphia now, staying with my daughter Genny, and waiting for a shipment of documents from Sun, India.

I'll go to NYC to apply for an employment visa once I have the documents in hand. Meanwhile I have a quiet place to work and not much else to do. So it might be a good time to work up that dump file analyzer.

Tuesday, December 07, 2004

Creating well known files

Creating a file using TKI just means running the "cr" command on the appropriate directory and providing a file name. But when using the web interface, we may want to do something different.

Directories are really just a type designation, so we might not want to have a user create command tied to the directory. Instead, it seems more appropriate to have it on a DataSet file. Indeed, we may want to require that all WellKnown files are members of one or more DataSets.

But now a circularity issue--if DataSet files are also WellKnown, how do we create DataSets? Hmm. If a DataSet can hold other DataSet files, then we have a partial answer. Now we only need to have a top-level DataSet file. And we could create the top-level DataSet file using TKI.

(Remember, we want to support TKI operations on WellKnown files.)

OK then! We put the create command for WellKnown files on DataSets. (Oh. Guess we haven't defined how we will define web commands on files. More blogging yet to do.) This new create command would require the WellKnownName (preferably unique within the dataset, but not required to be) and the type (a directory of WellKnown files).
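
A rough sketch of what such a create command might look like, in Python. Everything here (uuid-based file names, the set_property/add_property_value helpers) is illustrative only; TKCS may well do this differently.

```python
import uuid

def create_well_known(dataset, well_known_name, type_directory):
    """Create a WellKnown file of the given type and add it to the dataset."""
    file_name = uuid.uuid4().hex                  # generated, unique file name
    new_file = type_directory.create(file_name)   # like running "cr" on the directory
    new_file.set_property("WellKnownName", well_known_name)
    dataset.add_property_value("DataSetItem", new_file)
    return new_file
```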

Now, do we want to constrain the types of files a DataSet can hold? Possibly. But that can be difficult to enforce with shared data, unless we put that constraint on a dataset descriptor. And that is basically a "logic" change that requires changes to metadata and would be harder to share than ordinary user data. So I'm inclined to avoid having this kind of constraint.

Monday, December 06, 2004

Analyzing dump files

I'm thinking that a .dmp file analyzer would be a great tool. Probably implement it as a command on the Base capability, so it could be run from anywhere.

A first draft should address those security concerns I've been blogging about by listing all the directories containing the files that the dump file modifies.

Beyond that, there's plenty of room for extra features. Counts of the number of files that are changed in each directory would be a great second draft.

Of course, it should list all the (classifier) information in the (somewhat obscured) dump header. The idea of course is to give the administrator a good idea of what a particular dump file will do to the database. Right now, you gotta just guess based on the file name (or try to look at the strange content).
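
A rough sketch of such an analyzer, assuming the dump has already been parsed into a header dict and a list of transactions (both of which are assumptions about the .dmp format, not the real one):

```python
from collections import Counter

def analyze_dump(header, transactions):
    """Summarize what a .dmp file would do to the database."""
    print("Dump header (classifier information):")
    for key, value in header.items():
        print(f"  {key}: {value}")

    txn_counts = Counter()
    files_by_dir = {}
    for txn in transactions:
        txn_counts[txn.directory] += 1
        files_by_dir.setdefault(txn.directory, set()).add(txn.file_name)

    print("Directories touched:")
    for directory in sorted(txn_counts):
        print(f"  {directory}: {len(files_by_dir[directory])} files, "
              f"{txn_counts[directory]} transactions")
```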

Sunday, December 05, 2004

sharing vs .dmp files (Security Issues)

TKS (Time Knowledge Store) is the backend of TKCS, and keeps a transaction log of property changes, rather than property values, where each transaction deals with the changes for a single file (TKS record). Using these changes and related change indexes, TKS can quickly calculate the value(s) of a property for any given time.
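
A minimal sketch of the idea, ignoring multi-valued properties and the real index structures (the names below are illustrative only):

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class PropertyChange:
    time: float    # when the change was logged
    value: object  # value assigned at that time (None for a deletion)

def value_at(changes, when):
    """Return the value a property held at the given time.
    `changes` must be sorted by time; a change index plays this role in TKS."""
    times = [c.time for c in changes]
    i = bisect_right(times, when)
    return changes[i - 1].value if i else None

# Example: the property was set at t=1 and changed at t=5.
log = [PropertyChange(1, "draft"), PropertyChange(5, "published")]
assert value_at(log, 3) == "draft"
assert value_at(log, 6) == "published"
```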

TKCS dump files (.dmp) are little more than a collection of transactions. Restoring a .dmp file is done by reprocessing the transactions to update the TKS indexes. But there is virtually no validation, so .dmp files are useful only when the source is trusted.

Dump file loading is almost good enough for distributing applications (scripts and metadata), which really can't be validated very effectively in any case. I would only add a mechanism to restrict a load operation to files in selected directories, with these directories declared in the file header and viewable before starting the load.
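
The restriction itself could be as simple as this sketch, where the header's declared directories and the administrator's approved list are both assumptions about how the mechanism might be surfaced:

```python
def check_load(header, transactions, approved_dirs):
    """Refuse a load that would touch directories outside what the
    administrator approved after viewing the dump header."""
    declared = set(header["directories"])   # declared in the file header
    if not declared <= set(approved_dirs):
        raise PermissionError("dump declares unapproved directories")
    for txn in transactions:
        if txn.directory not in declared:
            raise PermissionError(
                f"transaction touches undeclared directory: {txn.directory}")
```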

But will this be good enough for sharing data between users? Here the element of trust may be entirely lacking. How well can we validate a .dmp file? I have a mixed answer.

We can restrict load processing to specific directories, as mentioned above. And TKCS has been carefully designed to clearly separate metadata from ordinary data, as metadata changes made by an untrusted source are far more risky. However, a user might use TKI to make changes that an application can't handle. And there is currently no validation of state in TKCS. So sharing data by way of .dmp files could result in strange data that causes unexpected behavior. Is this good enough?

I think it is good enough. Scripts and other logic must be prepared for bad data, which can arise from any number of causes, including bugs. But it is still a long way from perfect. Not every user will be able to use TKI to clean up bad data. But then, you can always dump the good data and restore it to a new database, or an advanced user can share a .dmp file that fixes the problem.

Now a better solution might be to share user-level commands rather than low-level changes. These commands can be validated as they are executed. Data fixes can still be handled by sharing .dmp files with a trusted user, of course. But this all starts to get a bit messy when the data is built by a combination of user commands, TKI commands, conversion scripts and shared .dmp files.

My conclusion here is that .dmp files are a reasonable, if not perfect, means of sharing data between users--so long as we can restrict updates to selected directories.

The DataSet Capability

I would like to introduce a DataSet capability to replace the Realm capability.

The idea of a DataSet is to be able to define a collection of WellKnown files, possibly of different types. Included in this capability would be the DataSetItem property, which would reference a set of WellKnown files.

There are several differences between Realms, as currently used, and DataSets.
  1. Realms were subordinate to an application, while a DataSet may reference files from more than one application.
  2. DataSets may not include Metadata.

The DataSet capability, like the WellKnown capability, is intended to support the sharing of data between users.

I'll also note that files with the DataSet capability can also have the WellKnown capability, facilitating the easy creation of a hierarchy of data collections. Also, a WellKnown file can belong to more than one DataSet.
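
To illustrate how the two capabilities might combine, here is a small sketch; these classes are only illustrative, not TKCS code.

```python
import uuid

class WellKnownFile:
    def __init__(self, well_known_name):
        self.file_name = uuid.uuid4().hex        # generated, unique
        self.well_known_name = well_known_name   # need not be unique

class DataSet(WellKnownFile):
    """A DataSet is itself WellKnown, so data sets can form a hierarchy."""
    def __init__(self, well_known_name):
        super().__init__(well_known_name)
        self.items = []            # plays the role of the DataSetItem property

    def add(self, item):
        self.items.append(item)    # a file may belong to several DataSets

# A top-level DataSet holding a nested DataSet and an ordinary WellKnown file.
top = DataSet("Top")
projects = DataSet("Projects")
top.add(projects)
projects.add(WellKnownFile("Notes"))
```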

Saturday, December 04, 2004

WellKnown vs Role

My intention is to drop the Role capability in favor of a new WellKnown capability. There are several differences:
  1. The WellKnownName property will allow for duplicate names.
  2. WellKnown files will have unique (generated) file names.
  3. WellKnown files will not include metadata.

The reason for these differences is that WellKnown files are intended to be shared. Files in the /Applications directory are not shared (as user data), as a different trust model is needed, so they should not have the WellKnown capability. Files in the /Loaded directory should also not be WellKnown, as they reflect the local system state and should not be shared.


The WellKnown Capability

Metadata (like TKCS Descriptors, Capabilities, Operators, Properties and Scripts) is very different from data, though I am always surprised when I learn of another way in which they differ. Data can be validated, but it is difficult or impossible to write code to validate metadata. The trust models for data and metadata are quite different. And metadata can be made to conform to simple hierarchies, but this is often not the case for data.

In TKCS, there is a requirement that every file have a unique name within a directory. (I call this unique by type, as each TKCS directory holds files of only one type.) This restriction works fine for metadata. But for data, especially when combining data from multiple sources, there may be no way to ensure that names are unique. And when dealing with collections of files that may be of more than one type, uniqueness by type is simply not interesting.

So I think it would be helpful to introduce a new capability, WellKnown. WellKnown files would have a user-assigned name that is distinct from the file name, and the WellKnown name need not be unique. The existing TKCS examples would need to be reworked to use WellKnown names, with generated (assured unique) names used as the file names. Of course, the examples might now need to handle duplicate names, but that is a complication that can be overlooked for all but the Wiki example.
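
So a lookup by WellKnownName becomes a query that may return several files. A tiny sketch, assuming files are simple property maps (the representation is an assumption):

```python
def find_by_well_known_name(files, name):
    """Return every file whose WellKnownName matches; callers, like the
    reworked Wiki example, must be prepared for more than one match."""
    return [f for f in files if f.get("WellKnownName") == name]

files = [{"FileName": "a1b2", "WellKnownName": "Home"},
         {"FileName": "c3d4", "WellKnownName": "Home"}]
assert len(find_by_well_known_name(files, "Home")) == 2
```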

When reworking the Wiki, the WikiTopic property can be dropped in favor of the WellKnownName property.

Friday, December 03, 2004

active wiki

Wikis are very interesting, as they allow for the submission and organization of content. But they are largely passive.

TKCS supports type definitions and operations, allowing for the development of a very active system.

Building an extensible Wiki over TKCS then gives us facilities for defining an active system and easily managing its content. The key, I believe, is to allow for different types of Wiki pages, with various operations (and content) being defined for each type of page.
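
A speculative sketch of what typed pages with per-type operations might look like; none of these names come from TKCS.

```python
class WikiPage:
    operations = {}                               # per-type operations

    def __init__(self, name, content=""):
        self.name, self.content = name, content

    def run(self, op_name, *args):
        return self.operations[op_name](self, *args)


class TaskPage(WikiPage):
    """A page type that is more than passive content."""

    def close(self, note):
        self.content += "\n\nClosed: " + note

    operations = {"close": close}


# The same wiki can hold plain pages alongside active TaskPage instances.
task = TaskPage("Release checklist")
task.run("close", "done")
```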

Moving back to India

Ah, the first blog entry. What to say? What not to say?

I'm getting ready to move to Bangalore, India, as a staff engineer for Sun Microsystems. Anyway, they have given me an offer and I have accepted. It should be a nice change of pace--no programming, but lots of mentoring.

Before this I've always worked as a programmer and sometimes project lead/manager. Hopefully this job will still give me time to work on my current SourceForge project:
http://compstrm.sourceforge.net