Friday, January 27, 2006

off-line for two weeks, latest updates

Going to Raipur, India for two weeks, and I will likely be off-line until I return.

Meanwhile, I've posted the latest version of AgileWiki3 at:
http://agilewiki.org/AgileWiki3/

The API for the cachefile package is shaping up, but it still has a ways to go. It will be a while till I'm ready to do the javadocs for this package. I expect to have it, and wdb, completed on my return. But it's slow going, as any reasonable API takes time.

I'm also thinking of putting AgileWiki3 under subversion when I get back. (Comments? Suggestions?)

Till then,
Bill

Why I am so excited about doing a complete rewrite

I think of the AgileWiki as an application that runs on a rolonic framework. But that framework is TKCS, and it leaves a lot to be desired.

The biggest problem here is that TKCS is far less rolonic than AgileWiki. So as my understanding of how to program in rolonics has grown, I've had to "make do" with a less than desirable framework. Which was fine--you need an application before you can understand what is needed in the framework.

TKCS is also a monster. A megabyte of metadata just to configure AgileWiki. No one will ever use this framework, so it doesn't advance rolonic programming except for enabling me to write AgileWiki. It's also not very agile. :-)

My hope then is to write a much better framework based on the tables concept I've introduced. (Call them proto-rolons, if you will.) And with a much more appropriate API to program to, AgileWiki will hopefully become much more light-weight and agile too.

My goals then are:
  • Reimplement AgileWiki to support ordering more pervasively--in the ordering of Cabinets, Drawers and Folders--as well as much more fine-grained control over applicative context.
  • AgileWiki should run much, much faster. The final versions were much lighter on disk, so I can now concentrate on reducing the CPU load.
  • With a more appropriate API, the AgileWiki should be easier to understand/code/maintain. And be more flexible.
  • The new tables API should be a reasonable framework for programming rolonically and should itself be relatively lightweight.

Meanwhile, right now I'm working on the CacheFile and related classes. I've got the functionality, so I'm in API polishing mode. (Actually, I'm carefully pulling down selected functionality from higher levels where it makes sense.)

The new thing is working with the Java TreeMap class. The code using it is pretty clean. I'm really looking forward to seeing some timing tests--I expect some very fine numbers--but this will need to wait until I've got the wdb class working.

Bill

Thursday, January 26, 2006

first draft of CacheFile in Java

You can find the first draft of CacheFile at http://agilewiki.org/AgileWiki3/src/aw3/wdb/

It's starting to shape up nicely. I've also decided to try using java.util.TreeMap (a red-black tree implementation) to hold the content of a block. This should work a whole lot faster than doing inserts into a list.
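
Something like this is what I have in mind for a block's contents--just a sketch with illustrative names, not the actual AgileWiki3 code:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: a block's key/value entries held in a TreeMap (a red-black tree).
// Inserting is an O(log n) rebalance instead of an O(n) list shift.
public class BlockContent {
    private final TreeMap<String, byte[]> entries = new TreeMap<String, byte[]>();

    public void put(String key, byte[] value) {
        entries.put(key, value); // keys stay sorted; no list insertion needed
    }

    public byte[] get(String key) {
        return entries.get(key);
    }

    // Largest key strictly before the given key--the kind of lookup a
    // btree-style "prior" operation needs.
    public String priorKey(String key) {
        SortedMap<String, byte[]> head = entries.headMap(key);
        return head.isEmpty() ? null : head.lastKey();
    }

    // A key range, e.g. for splitting a full block in two.
    public SortedMap<String, byte[]> range(String fromKey, String toKey) {
        return entries.subMap(fromKey, toKey);
    }
}
```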

One of the pain points here was creating/parsing the byte arrays for disk I/O. But at least it's done with.

I have my hopes on a database that is light on both CPU and disk I/O. But then, I'm a dreamer.
It should at least use less CPU than the Python code, with no additional disk I/O.

I've also worked out some nice caching logic for the new table concepts--caching in TKCS was very hit or miss, and a lot of things were simply not cached.

Bill

first draft completed for pool, cache

I've reworked the code and done the javadocs. It's pretty fast--almost 2 million allocations per second on my laptop.

You can find it all now at http://agilewiki.org/AgileWiki3/

aw3 pool and cache

Well, I've got some fast code working, but it's only pool and cache.

Pool is a mechanism for reference counting. Something Python does for you, but in Java, if you want to avoid spending too much time in GC, you gotta do it yourself. I took the lazy way out here and made aw3.pool.Poolable a base class. And aw3.pool.Factory is a generic component factory which also maintains the pool of recycled components--'cause you also want to avoid creating new objects.

http://agilewiki.org/aw3/pool/
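
To give the flavor of it--the real code is at the link above; the method names and internals in this sketch are just my shorthand for the idea:

```java
import java.util.Deque;
import java.util.LinkedList;

// Sketch of the pooling idea; the class names Poolable and Factory match
// the aw3.pool package, but the methods here are only illustrative.
abstract class Poolable {
    private int refCount;
    private Factory factory;

    void setFactory(Factory factory) { this.factory = factory; }

    void incrementRefCount() { refCount++; }

    void decrementRefCount() {
        if (--refCount == 0) {
            reset();               // clear state before recycling
            factory.recycle(this); // back to the pool instead of the GC
        }
    }

    protected abstract void reset();
}

abstract class Factory {
    private final Deque<Poolable> pool = new LinkedList<Poolable>();

    // Reuse a recycled component when possible; create one otherwise.
    Poolable fetch() {
        Poolable p = pool.poll();
        if (p == null) {
            p = create();
            p.setFactory(this);
        }
        p.incrementRefCount(); // handed out with the count already bumped
        return p;
    }

    void recycle(Poolable p) { pool.push(p); }

    protected abstract Poolable create();
}
```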

In the cache package, we've got Cachable, Cache and CacheMonitor. CacheMonitor is just a helper class for Cache. Cachable is a base class for cachable components (and extends Poolable). And Cache is a generic component cache manager which extends Factory.

http://agilewiki.org/aw3/cache/
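
Roughly, the shape is this--again just a sketch, not the real aw3.cache code; CacheMonitor is omitted, and an LRU map stands in for the actual eviction logic:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Cachable components are poolable components indexed by a key.
abstract class Cachable extends Poolable {
    Object key; // identifies the cached component, e.g. a block number
}

// A component cache that extends Factory: hits return the cached component,
// misses pull one from the pool; the eldest entry is evicted when full.
abstract class Cache extends Factory {
    private final int capacity;
    private final LinkedHashMap<Object, Cachable> cached;

    Cache(int capacity) {
        this.capacity = capacity;
        // access-ordered LinkedHashMap gives simple LRU eviction
        this.cached = new LinkedHashMap<Object, Cachable>(capacity, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<Object, Cachable> eldest) {
                boolean evict = size() > Cache.this.capacity;
                if (evict) {
                    eldest.getValue().decrementRefCount(); // drop the cache's reference
                }
                return evict;
            }
        };
    }

    Cachable fetch(Object key) {
        Cachable c = cached.get(key);
        if (c == null) {
            c = (Cachable) fetch(); // from the pool; that reference is the cache's
            c.key = key;
            cached.put(key, c);
        }
        c.incrementRefCount();      // and this one is the caller's
        return c;
    }
}
```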

A lot of work, but not too much to show for it. But at least I'm back to Java. And speed seems OK--my test loop runs about 10 lakh (10,00,000) a second. (That's a million for you folks not in India.)

Usage note: when you fetch a component from the pool or cache, its reference count has already been incremented for you--remember to decrement it!
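
In terms of the sketch above, that convention looks like this:

```java
// Hypothetical usage: every fetch is paired with exactly one decrement.
Poolable component = factory.fetch(); // count already incremented for you
try {
    // ... use the component ...
} finally {
    component.decrementRefCount();    // recycled once the count hits zero
}
```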

I'll note that there is still some cleanup (and javadocs) left to do before I move on to wdb.

Bill

Wednesday, January 25, 2006

on the bright side

Well, I guess I can still, after a fashion, code in Java. My half-written cache works.

See http://agilewiki.org/aw3/wdb/

Bill

Python is more fun

After 5 years of Python without typed variables, casting or templates, going back to Java isn't any fun.

I wanted to write a little cache logic. Got it half done, but haven't tested it yet. But it compiles cleanly!

Bill

Monday, January 23, 2006

design decisions for AW2

1. single writer, multiple readers

This keeps the transaction logic really simple. No locking. No file locks. And as the front ends are unaware, it can be changed later.

2. each interaction with the back end is a transaction

The advantage here is that transactions are confined to the back end--mucho reliability and stability.

The cost is that application logic must be installed on the back end.

Correction:

As genjnl wants to be a single transaction, the logic must be on the back end. But we'll have an operator front end for requesting genjnl, dump, snap, restore, dumpcabinet, snapcabinet and installcabinet.

Another front end: RSS reader?

Also for consideration: notifier for changes, published either as RSS feeds or email.

(Hey, I can dream, can't I?)

And you know, this all does NOT sound like a one man project. I sure could use some help.

Bill

High level architecture, AW2

Not everything need be in one program, if we've got a good dividing point, and I think I've got one.

On the back end:
  • wdb,
  • tables,
  • resolution engine,
  • dump/snap/restore,
  • command interpreter and
  • single writer, many reader flow control.

Front ends (more than one):

  • email
  • genjnl
  • user interface(s)

Make the back end Java. Define the interface well. And the front end can be any language or web server you find most appropriate.

Each interaction between the back end and a front end would be a transaction or a query.

Bill

How do you build a rolon?

A rolon (or a role) isn't an easy thing to build. I've been trying for 4+ years. AW topics are close but, for example, they lack fine control over applicative order. They're good enough to do a lot of things, but they build on TKCS, which is a bit of a mess.

A structure of tables seems to be what is needed, where a table is both a dictionary and an ordered list of key/value pairs. One of the tables in a rolon would be the classifiers, which would hold the parents, includes, tags, color, names and the DescriptorUnit. Entries in a classifier table would be distinguished by the key prefix. And yes, the list is ordered, not sorted. (There may actually be two classifier tables, with a separate one for non-"-" tags and color, as these do not impact the applicative order.) I've been calling these the normal tables. All the other tables referenced by a classifier table would be the top-level table of other rolons.

Another table would be the instance descriptors. Instance descriptors would, for example, control display and determine whether a given type of child is ordered or sorted.

There would also be tables for the various types of children--I've been calling these tables the external tables, as they hold links to other rolons (i.e. the top-level table of other rolons) and define the non-cyclic graph of rolons.

Then there would be the top-level table (the root table for the rolon) which references all the other tables in the rolon. I've been calling this the internal table, because all its links to other tables are internal to the rolon. This is also where, for implementation purposes, I would keep the rolon name, though the name is really a classifier.
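
To make the table idea concrete, here is a minimal sketch: keyed access like a dictionary, plus an explicit, user-controlled order--the one thing a plain sorted map can't give you. The method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A table: both a dictionary and an ordered (not sorted) list of
// key/value pairs.
public class Table {
    private final Map<String, Object> byKey = new HashMap<String, Object>();
    private final List<String> order = new ArrayList<String>();

    // Dictionary access.
    public Object get(String key) {
        return byKey.get(key);
    }

    // New entries append at the end of the list.
    public void put(String key, Object value) {
        if (!byKey.containsKey(key)) {
            order.add(key);
        }
        byKey.put(key, value);
    }

    // Explicit reordering, for fine control over applicative order.
    public void moveTo(String key, int position) {
        if (order.remove(key)) {
            order.add(position, key);
        }
    }

    // List access, in applicative order.
    public List<String> keysInOrder() {
        return order;
    }
}
```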

Of course, all this would be change-based, where the actual state of the table depends on the given time (current or past). So the whole thing gets built on something like TKS.

And ipso facto, we have a pretty fair implementation of basic rolonic theory. And the perfect thing to implement an AgileWiki (or just about anything you can imagine) on top of.

What did I leave out? The journal! This is a hard one to describe, as it is built on the transactional mechanism used by TKS. Basically, every transaction/JournalEntry is (see the sketch after this list):
  1. a set of changes, which are applied to various tables in one or more rolons, as well as
  2. the command which invoked those changes,
  3. the rolon which was the context for the command,
  4. the time of the change,
  5. a unique transaction identifier (a UUID), and
  6. the user who entered the command.
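
As a rough sketch, with field names of my own choosing:

```java
import java.util.List;
import java.util.UUID;

// One possible shape for a transaction/JournalEntry, following the six
// parts listed above.
public class JournalEntry {
    final List<Change> changes;   // 1. changes applied to tables in one or more rolons
    final String command;         // 2. the command which invoked those changes
    final String contextRolon;    // 3. the rolon which was the context for the command
    final long time;              // 4. the time of the change
    final UUID transactionId;     // 5. a unique transaction identifier
    final String user;            // 6. the user who entered the command

    JournalEntry(List<Change> changes, String command, String contextRolon,
                 long time, UUID transactionId, String user) {
        this.changes = changes;
        this.command = command;
        this.contextRolon = contextRolon;
        this.time = time;
        this.transactionId = transactionId;
        this.user = user;
    }
}

// Placeholder for a single table change; the real representation is
// whatever the change-based storage layer uses.
class Change { }
```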

By George, I think I've got it! A one-pager describing (at a high level, I'll grant you) how to implement a rolonic system.

I guess this wasn't such a bad day, after all.

Bill

Java?

It occurs to me that moving to Java might be better than porting code to C/C++. It means the loss of iterators, which have been a great help. But the iterators are mostly used in TKCS (pervasively) and the reference resolution engine. If the former is replaced by TableDB, things look possible. And the latter is more recursive than iterative.

This is a big change in attitude on my part, as a result of considering TableDB--there is no way TKCS could have been developed in Java.

Of course, things will be more difficult in Java. I've been spoiled by a higher-level language!

Bill

AW Database issues

Jerry, it's good to hear from you again. Looks like I'm moving in the wrong direction from your perspective.

I've worked with relational databases, 3rd normal form and all that, though it has been a while. I'm also slightly familiar with object mappings--Sun provides some high-level tools to do the same. But the element that is missing from all this is time.

AW has three big aspects: a change-based database, an object model (TKCS) and the wiki user interface. I've been working with this stuff for 4+ years now and only slowly have I come to realize these distinct aspects.

For the database, I just don't see how it can be standardized. The mapping to bsddb really had way too much overhead. The biggest problem with the current implementation is that it (WDB) really needs to be ported to C or C++. (I've just finished the profiling. There are a few things I can do that will help, but I don't expect too much.)

The object model currently stinks. The new tables proposal would be a big help, while also reducing the database load for small wikis.

The wiki code itself is pretty rude and crude. My first priority is to clean up the transactional code, so I can gain some real robustness. But it is the name resolution engine which probably needs the most work. Plus we really need the master interface to make some of the more interesting features of AW accessible to more users.

And then there are the advanced features, which I don't even want to think about for a few months. But a lot depends here on how well the journal is implemented.

Too much methinks, for one person. As for the element of time, subversion probably handles it better than anything else, but it is still too simple a model for AW.

We're breaking a lot of new ground here, and I don't think needlessly. I am very happy with WDB, at least from the perspective of disk use. If we can fix the high CPU overhead, we would see some remarkable results.

More and more, I'm thinking I need to put the AW code under subversion.

On a more personal front, I may need to go to Raipur. There's been a death in the family and I feel that I am needed there.

Bill

Sunday, January 22, 2006

a fly in the ointment

Well, the AgileWiki IS faster... on my laptop, which has a 4500 RPM disk drive. It's also a lot more CPU intensive, while making far less use of disk. But my ISP is apparently running with a very slow CPU.

AgileWiki.org isn't doing so well.

:-(

Saturday, January 21, 2006

Late night thoughts

TKCS is a first-generation "do anything" database for intrinsically persistent objects. AgileWiki is a "do everything" application which sits on this database. TKCS offers simple text properties, simple link properties, text set properties, link set properties, documents, lists of links and dictionaries. It offers internal links, normal links and external links, and an innovative persistent memory management system built on these link types. Plus it is change-based. And it takes about a megabyte of metadata to configure it for AgileWiki. But then, it's a first generation that doesn't extend too much beyond graph theory.

I've been thinking of a second-generation approach that would be more powerful, retain the persistent memory management system, but require only a minimum of metadata. It would be based on 2-column tables, a key and a value.

A table is interesting. You can use the key to access the value, like a dictionary. But you can also order it, like a list. I would suggest having normal, internal and external tables, with internal tables having an associated text document (a blob).

Table names would be prefixed with a high-order token, >chr(127), to distinguish a text value from a link to a table. (Every table can contain a mix of links and text values.) Further, that token prefix would identify the type of table (normal, internal or external). And hey, that's the end of the metadata. :-)

Structures are composed of a single internal table and the other (non-internal) tables referenced by it. And external tables can only reference internal tables. This keeps structures pretty simple, which is important 'cause they are user-level objects! (Well, not every structure needs to be accessible by every user--some would be configuration/descriptor data to drive the application logic.)

Every table must be referenced by either an internal or external table, except for the root table. Further, if you look at the links of all internal and external tables taken together, it forms a non-cyclic graph.
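
A tiny sketch of the token convention; the particular token values here are arbitrary, not settled:

```java
// A value beginning with a token above chr(127) is a link to a table,
// and the token itself says which kind of table.
public final class TableTokens {
    public static final char NORMAL_TABLE   = (char) 128;
    public static final char INTERNAL_TABLE = (char) 129;
    public static final char EXTERNAL_TABLE = (char) 130;

    public static boolean isTableLink(String value) {
        return value.length() > 0 && value.charAt(0) > 127;
    }

    public static String makeLink(char tableType, String tableName) {
        return tableType + tableName;
    }
}
```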

Let me know if you're interested in working on this. It sure would be nice to work with a group again. And it would be a good excuse to learn subversion. :-)

Oh yes, note that none of this would depend on compstrm or co-routines. But like TKS, it would be change-based.

Bill

things are lookin' mighty fine!

I'm ready to release 2.2.2.0. The de command, cr_ commands, logins and email processing are ever so much faster.

It's starting to look like a real product!

(I'll release it after dinner tonight, I figure.)

Bill

a cabinet journal

There are only a few commands which will be made faster by 2.2.2: cr_ and de. Additionally, the login events and email processing will benefit. However, the transaction context property will make the journal more helpful if it is applied to all user commands which update the database.

We will also want a cabinet journal property which identifies the cabinet and date of a transaction. We can then implement a cjnl command to list the transactions for a cabinet for any given date.

For now I'm going ahead and updating those few commands/activities which will benefit from the increased speed, and I'll worry about the cabinet journal later. I expect this will be release 2.2.2.0.

Bill

everything looks like it's working

I've been running lots of dumps/dumpCabinets/restores/restoreCabinets in various combinations to shake out the little glitches in 2.2.2, and it looks like I've got them all.

At this point I want to add another property to hold the pathname of the transaction context. And then include that property in the jnl display. This will go a long way toward making the journal more useful, as a transaction/JournalEntry is listed under all the Topics which it affects.

Friday, January 20, 2006

2.2.2 progressing nicely

Well, I got journals working again. (A bug I introduced in TKS.) I tried snap/restore and it works great.

But dump/restore does not work. Hopefully this will not be a biggie. If I can get this to work, then I start marching through the commands and speeding them up. (Including login events and email processing.)

Hmm. There's also a bug in the jnl see also's, as I recall. Gotta work on that, too.

moving on, faster and better

I'm calling 2.2.1.1 beta, but it could probably still use some testing. Meanwhile, I've started on 2.2.2. Now I'm wrapping the entire genjnl as a single transaction and it is pleasantly fast. :D Still a bunch of debugging to do--I've broken journals at the moment. But all the low-level code is written.

Once 2.2.2 is working, I'll need to update every AW command that modifies the database. Fortunately they don't all need to be done at the same time. I expect to see the crc command speed up a bit more. (I'm already seeing it in genjnl.)

After 2.2.2, I've some ideas on how to make destroy much faster by enhancing TKI. This then will make it more reasonable to destroy large structures. (A very slow operation at the moment.)

Bill

2.2.1 ready for alpha release

Well, it's all working, I just need to update a few html and script files for the release.

This will be an alpha release, 'cause there were just so many changes. But you gotta use it. If no one finds any problems, it will become beta. (I'm considering keeping the alpha designation for 2.2.2 as well, as that involves more deep changes.)

Note that 2.2.2 will speed updates a bit more, as well as increasing reliability, by moving transactions to the command level where they belong.

Bill

Thursday, January 19, 2006

on a more positive note

I'm testing genjnl and it really looks good.

Methinks destroy may be the last thing I need to fix before doing an alpha release. :-)

Bill

destroy

Well gosh, I recoded getPropertiesForRecord, tested it with snap and all is right with the world. Then I tested destroy. No go.

Destroy is one of the more complex parts of TKCS, and it's been a while since I looked into it. I guess the time has come. :-]

I still expect to be able to release this weekend--there simply is not that much left that could go wrong!

Bill

good progress

I had changed the meaning of getPropertiesForRecord. This needs to be fixed.

Previously, getPropertiesForRecord returned all properties with a value or document assigned. At the moment, it only returns properties which have an assigned value. This is what caused problems with snap.

Now I've fixed snap by having it call getAllPropertiesForRecord--it works and it is faster. However, for things like destroy this will not work.

Snap is now working just fine. However, I still need to fix getPropertiesForRecord. And then there are a few more things I want to test. But we are getting close to a release--likely this weekend.

Bill

Wednesday, January 18, 2006

snap is looking guilty

I did a dump/restore and the content was fine. Then I did a snap/restore and lots of stuff is missing, including all up links.

So it looks like I'll be digging into snap. But I'm tired--might go to bed early tonight.

I'll be glad when this is all working.

Bill

more problems

I've been up and down so much on this project, you'd think it was a stock market investment!

All was fine, so I decided to change the read cache to 1024 buffers (at 16K bytes per buffer, that's 16MB--modest for today's computers), while leaving the write cache at 1MB (64 buffers).

So I did a dump/restore/snap/restore. Just playing with it. And then I noticed that much of the content is no longer accessible!

Hmm. More digging on the schedule, it looks like! Well, at least I had a happy morning. :-/

Bill

Finally!

Found a very "minor" error in btree.index. Fixed it. Everything is fine.

I've even done a dump/restore/run. It's all great.

I should be able to release this tonight.

Bill

Tuesday, January 17, 2006

this just keeps getting bigger

Well, I found out why dump was aborting--when I fixed one old problem, another one popped up. So I fixed that one too. Now load is more correct than ever...

But now when I do a dump, it's not just the user file that is way too small; the data file has also shrunk. So whatever is happening, it's more consistent now. Is that good?

Enough for now. I'll be fresh in the morning and work on it then.

This is all very strange.

Bill

more problems

Well, I fixed the TKCS /Transactions directory. This was an old problem, and I'm glad to have found it. More, I now include code to automatically repair the data when the problem is detected, thus protecting existing data. (It does however increase the size of the .dmp file slightly.)

But now dump aborts with an error--perhaps a diagnostic I've added. The user.dmp file is still bad, but I'm now getting partial output for some reason.

So the user.dmp problem was likely unrelated to the TKCS /Transactions directory problem. Guess I've got a lot more digging to do. :-(

Bill

Monday, January 16, 2006

found something

Had a bit of time this morning and dug into TKCS. The reason the dump of user is failing is that the TKCS /Transactions directory is foobar. Perhaps the problem is in the initial load.

At least it is a big problem, even if it is isolated. (Big problems are usually easier to find.)

Bill

Sunday, January 15, 2006

problems

I decided to test snap. Woops! The user .dmp file is broken.

Zee alpha release will be delayed. :-(

Bill

about ready for an alpha release

Made 2 fixes:

1. In WDB, items (the get iterator) now works across puts by recreating its cursor when necessary.

2. In TKS, destroyProperty now handles document properties.

I've also changed blksize from 8K to 16K. Updates run a bit slower, but queries run faster. This means that both queries and updates run as well as or better than in the old TKS. (In the case of updates, much better!)

Things just seem to work, but it needs banging on. Guess that means it's going to be an alpha release.

Bill

upgrading was easy, and TKS2 runs faster, too

I don't know why I had put it off for so long; it was so easy to upgrade Python, though I also had to go with a different download for Twisted...

With the new AgileWiki, Python 2.4 will be required. (I've been using 2.4.1 for some time now at AgileWiki.org.) And yes, it runs much better with the new version!

Here's step by step for windows users:

1. Download Python 2.4.2 from http://python.org/download/ -- it then installs in the python24 directory. USE THE DEFAULT!

2. Change the Windows PATH environment variable to reference python24 instead of python23. Bring up a command box, run python and verify that you are using the new version.

3. Go to http://twistedmatrix.com/products/download to get Twisted 1.3.0. Download the Windows Installer for Python 2.4. And install.

With a broadband connection (yes, I've even got one in Bangalore), it takes less than 5 minutes to upgrade. Do it! :-)

first impressions

OK, here are the results of my first blush using the new TKS with AgileWiki.

Right out of the box, the guest interface appears to be fully functional, but then it is pretty limited.

I fixed getDocument--it had a bug when there was no document. This fixed a lot of things, including login. Still there are bugs--delete, for example, does not work correctly.

Updates are faster, queries are slower. I'm obviously spending a lot of time in the bisect routines, but that can likely be fixed just by moving to Python 2.4--I'm currently running 2.3.5, and in 2.4 both bisect and heapq are coded in C. :-)

So it's looking good so far, but with qualifications.

One drastic fix for queries would be to restrict references to only visible topics. (Currently the children of visible topics are also used to resolve references.) The problem with this is that Norm previously found this unworkable. However, that was before we added -tags for context inclusion. But this would be a last resort, and might still be unworkable. As it is, namespace calculations are pretty heavy.

Another possibility would be improved caching of namespace results. This is already done to some extent. But let's not cross this bridge until we get there.

Saturday, January 14, 2006

some work ahead

I just fixed a bunch of errors in the TKS queries and, wow! The server displayed a page! Then I tried to log in. It hung.

This thing is going to need a good shakedown. I guess the easy part is done.

Bill

most wonderful progress, some stats

No, it's not all working yet. The web server is experiencing a few bugs. BUT!

I'm running with a blksize of 8K.
I restored my test environment in 7 min.
And then installContent took another 2 min.

Three files were produced:
  • AgileWiki.dmp --the new log file, 8.9MB,
  • AgileWiki.wdb --the database, 51.8MB and
  • AgileWiki.wdbb --the before-image file, 1.1MB.

Now the file sizes are just wonderful. And I'll note that the .dmp file is in the same form as other .dmp files--we could use it to rebuild the database with a modified restore. (But not good for migration to a new release, as it includes the AwDocs and AwiDocs cabinets.)

Timings are also looking good. But the real test, of course, comes after I get the wiki server running.

Now I really do need a break. I'll confess that I was quite nervous while the restore ran this first time.

Bill

better news, another milestone

I misreported--assigning a property/value takes .005 seconds, not .05. :D

And the new milestone is that I've finished a first cut of tks2! In the process, I've greatly simplified the code and may have made some queries a bit faster, too.

What's next? Integration! To date, things have been moving ahead much faster than I had anticipated.

But at this point, I'm in serious need of a break. All those combinatorics in TKS! Sheesh!

a milestone and some very fine news

I've now completed the transaction portion of tks2, so the next step is to recode all those queries in tks.

Meanwhile, a single value assignment transaction is taking less than .05 seconds--so we're still looking at an order of magnitude speed improvement in the AgileWiki!

I would have thought those extra 6 indexes would have slowed things down much more, but so far we're in luck. Things are looking really good.

You will find wdb.py and tks2.py attached to the latest email in the AgileWiki journal.

Bill

Friday, January 13, 2006

good news--bisect

WDB now uses the bisect library. Performance no longer degrades with larger blocksizes.

Meanwhile, I've just posted the latest to the AgileWiki journal.

Bill

bad news, but not too bad

I realize now that we'll need 6 additional indexes: RP, RV, PV, PR, VR and VP. These are needed to answer queries like: get me all the properties of a given record.

The good news is that these indexes will have no assigned values, whereas in g1 the value was the time/count of the last assignment. So updates will be far less frequent. And fortunately the queries are quite fast, so there will not be too much impact on update time. Mostly it just adds a tad to the complexity of tks2.
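
To illustrate, index entries might be keyed something like this--the delimiter and layout are just for illustration:

```java
// Composite keys for the new value-less indexes: an index tag plus the
// components, so a range scan over "RP" + record yields all the
// properties of that record.
public final class IndexKeys {
    private static final char SEP = '\u0001'; // sorts below ordinary text

    public static String rp(String record, String property) {
        return "RP" + SEP + record + SEP + property;
    }

    // Range-scan bounds for "all properties of a given record".
    public static String rpScanStart(String record) {
        return "RP" + SEP + record + SEP;
    }

    public static String rpScanEnd(String record) {
        return "RP" + SEP + record + SEP + '\uffff';
    }
}
```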

Bill

compression, wdb testing completed, weekend

I've now added compression (zlib, level 6) for long values (> blksize). That should help with large email attachments.

I've also completed testing wdb. The last two areas were long values and recovery. I'll likely post the (stable?) version to AgileWiki later today.

And with the weekend coming up, I hope to crank out a large chunk of tks2.py, especially as it is turning out to be ever so much shorter than the original. But it's going to be interesting when I start working on all the supported queries--hopefully the four indexes I'm reimplementing (RPTC, RVTC, PVTC and TC) will prove adequate.

Bill

a faster AgileWiki

I'm now able to create 500 transactions per second, where each transaction contains a document. I figure we will get an order of magnitude increase in speed in AgileWiki with tks2.

I'm quite encouraged.

Thursday, January 12, 2006

started tks2

Did some additional work on wdb, reworking the logic that determines when a before image is written. Then finally started playing with tks, g2.

Didn't get to far, of course. :-) But I did the code to write the .dmp file header and wrote transaction and commit methods, complete with logging. Tested and working.

I expect to focus initially on document properties, as they are unindexed. This will also test long values in wdb--one area I have not yest tested.

But it's late for me. Time to hit the hay. I'll post this code in the morning, IST. (IST = India Standard Time)

Bill

Wednesday, January 11, 2006

tks on wdb will be much simpler

I'm looking at TKI, and so much of the design was either to overcome limitations of bsddb, or to avoid deadlock--and AgileWiki is single writer, so that's not an issue either.

We need only 3 indexes, I believe, and all 3 can be combined in one database by using a prefix character.

Looks like the footprint on disk will be 1% of the current version's (due to the log files), and with fewer indexes there will be fewer inserts, so it should run a lot faster (but likely not 100 times faster).

And yes, methinks it is time to start playing with a TKI replacement. :-)

Bill

wdb, third cut

http://agilewiki.org/wiki/uuid/aI-DsBkGtGYhd1CKY8mPbBEw

Spent the whole day on this. Looking good, if not fully tested.

I've also reworked it quite a bit to work well with longer keys. All in all, it's pretty solid.

I did find that larger block sizes slow it down. 512 is a good number for very short keys (which is what I've done the initial tuning with). I think the slowdown (in insert speed) may have a lot to do with list insertion, as this version avoids working with strings as much as possible.

Bill

wdb, second cut

Good progress on the worm database. Here's the second cut:
http://agilewiki.org/wiki/uuid/HdGvXx1XL6sRxDya4EUVYGGy

Seems pretty snappy too. I can do about 3,000 transactions/sec on my laptop with a slow disk (one insertion per transaction).

Now that the basic logic is working, I also want to add a write cache to reduce multiple writes of the same block within the scope of a transaction.
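
A sketch of that write cache, with a hypothetical BlockStore standing in for the disk layer:

```java
import java.util.HashMap;
import java.util.Map;

// Writes within a transaction land in a dirty map; each distinct block is
// physically written once, at commit.
public class WriteCache {
    private final Map<Long, byte[]> dirty = new HashMap<Long, byte[]>();
    private final BlockStore store;

    public WriteCache(BlockStore store) {
        this.store = store;
    }

    // Repeated writes to the same block just replace the buffered copy.
    public void write(long blockNo, byte[] block) {
        dirty.put(blockNo, block);
    }

    // Reads must see buffered writes first.
    public byte[] read(long blockNo) {
        byte[] b = dirty.get(blockNo);
        return b != null ? b : store.readBlock(blockNo);
    }

    // One physical write per distinct dirty block.
    public void commit() {
        for (Map.Entry<Long, byte[]> e : dirty.entrySet()) {
            store.writeBlock(e.getKey(), e.getValue());
        }
        dirty.clear();
    }
}

// Hypothetical disk-level interface.
interface BlockStore {
    byte[] readBlock(long blockNo);
    void writeBlock(long blockNo, byte[] block);
}
```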

I also need to add some thread locking on get and prior. (Updates are single-threaded, so no problem with transactions.)

Tuesday, January 10, 2006

progress on wdb

worm: write once, read many
wdb: worm database

My new btree is starting to work. Should be through the initial debugging phase soon.

Then I've got to make sure this thing is reasonably fast for a reasonably large dataset.
Say 100,000 random numbers?

It's been a lot of fun so far.

Bill

a crude worm btree

I've written a crude transactional write-once-read-many btree for use by tki. Here's the first cut:

http://agilewiki.org/wiki/uuid/0lloVIKPgOvU8OsfnyjuAXjV

It does no logging, but it does use a before-image file. It has only get, prior and put operations, and put is append only.
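
The draft itself is Python; here's its whole surface, sketched as a Java interface just for clarity:

```java
// Write once, read many: get and prior for lookups, and an append-only put.
public interface WormBtree {
    byte[] get(byte[] key);             // exact lookup

    byte[] prior(byte[] key);           // value at the largest key before the given key

    void put(byte[] key, byte[] value); // append only: existing data is never rewritten
}
```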

The thought here is that tki will do its own logging, in the form of a .dmp file, so we can use the restore script to rebuild a broken database from the log file.

But I have yet to do any testing.

I like Python--you can do a lot in less than 400 lines of code. :-)

Sunday, January 08, 2006

Its own kind of fun!

I've been working on documenting the RuntimeEnvironment of the AgileWiki. This will be the RuntimeEnvironment Drawer in AwiDocs. Using the AgileWiki for this is its own kind of fun. :-}

It's interesting because, as with any Wiki, I'm having to develop a vocabulary of WikiWords to pull it all together. And with multiple, interconnected layers, I suspect it will keep on being fun.

Having another kind of fun, too. The AgileWiki project has a SourceForge mailing list, compstrm-wiki@lists.sourceforge.net, which I'm now journaling (along with the forum posts) in the AgileWiki Cabinet. (See https://lists.sourceforge.net/lists/listinfo/compstrm-wiki to subscribe.) Gosh, now I've got yet another means of publishing! Overload! Overload!

Bill

2.2.0.1--a small start

With the release of 2.2.0.1, we now have a place for internal documentation--AwiDocs. Now the serious work begins.

One thing I've never documented is the 2-stage bootstrap needed to create the initial database. I also want to cover the utilities briefly, as the gettingstarted page is more of a howto than a reference.

And then there's all that old documentation that could be integrated a bit better. Hmm.

Saturday, January 07, 2006

AwiDocs

AgileWiki 2.1 is complete and the directories have been reorganized. Now before diving into a rewrite of tks, it seems appropriate to rework the internal documentation a bit, with a careful eye on tks itself.

How much time on internal documentation? Perhaps a week? Let's see how it goes.

Now I want to keep AwDocs for AgileWiki user documentation. So I'm creating a new Cabinet, AwiDocs, for internal documentation. This new cabinet will be excluded from the snap and dump scripts (like AwDocs), and loaded by the installContent script as well.

I also want to spend some time with the higher-level documentation. I'm sure the directory reorganization has ravaged parts of it.

You know, it's been a while since I spent time with the internal docs. Been having too much fun with AgileWiki, methinks!

Friday, January 06, 2006

plans for the weekend

My expectation is to finish email filters and then complete the reorganization of the directories. This will make it easier to move internal documentation into the AgileWiki itself.

I'm also considering a second cabinet, AwiDocs, as the place to put internal documentation--I want to clearly distinguish between user and internal documentation. This second cabinet then would be shipped with each release and installed by the installContent script.

Bill

the moment I've been waiting for

Comment from Jerry Spicklemire:

OK then, this is the moment I've been waiting for. Using the Python DB API is the most flexible way to open up the storage layer. That will leave the door open for folks who want to use alternative DBMS backends to leverage all the existing drivers. Can you point me to the TKCS modules that deal with the bsddb piece?

Actually, what I've done is integrated bsddb as a service into twistd:
  1. First I created a tiered service in twcs/services/tieredservice.py.
  2. Then I created a bsddb service in twcs/services/bsddbservices.py.
  3. Finally I configured bsddb for tks in tks/dbconfig.py.

Note that bsddb is NOT a relational database. Indeed, it's used as a backend for MySQL. I use it to build a natively persistent object model in tks and tkcs.

Note also that I'm building a streaming system, where responses to a query may be quite long--but only calculated incrementally as needed and often discarded after the first few parts. This is quite different from a relational DB interface, which delivers the entire response in one go.

Bill

Wednesday, January 04, 2006

email filters for AgileWiki

We have a large part of an email client (receive only) in AgileWiki. The biggest thing missing (other than sending) is email filtering.

This should be added to 2.1.4.

Bill

reorganizing directories

The directory structure of AgileWiki has been stable for some time now. But I want to change a few things. In fact, I want to turn it inside out! AgileWiki is a web server, so why not put all the source and internal documentation in its htdocs directory?

So here's the change:
  • AgileWiki2.1/tkcs/tests becomes the top-level directory;
  • AgileWiki2.1 then is placed under tests/htdocs;
  • AgileWiki2.1 becomes the src directory; and
  • tests becomes the AgileWiki directory.

One consequence of this is that we can eliminate the crark script.

Another consequence is that we can set the PYTHONPATH variable within the various scripts, allowing the same system to host multiple versions of AgileWiki.

But a more interesting consequence is that we then have the option of migrating, over time, the internal documentation into the AwDocs cabinet.

Tuesday, January 03, 2006

with some excitement

Norm and I have both now posted test journal entries to AgileWiki.org. And we feel very, very good about it all. We now have journals we can "blog" to, as it were. But more, these are wiki pages which can, via tags and the use of -Match, easily define a specific vocabulary of wiki words for each email posting.

More to come, of course. The next release should include the ability to specify those tags within the body of the email itself.

Bill

posting email to AgileWiki.org

To post an email to AgileWiki.org, just mail to ark@agilewiki.org. Adding a {wikipathname} to the subject then allows you to post to any topic in that Ark.

Note that the ark@compstrm.org account no longer exists.

Bill

Monday, January 02, 2006

email routing today

Currently the way incoming email is routed by the AgileWiki is pretty simple. It just looks for a WikiPathname {in braces}. If it can find the referenced topic, THAT is where it puts the email. Otherwise it journals it at Etc/$cd/Email.

To this we want to add an exception--if the destination is a cabinet which has a calendar, put the email in $cw relative to that cabinet.
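
Putting the current rule and the planned exception together, the routing would look roughly like this; Topic, its methods and the lookup are stand-ins, not the actual code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Route an incoming email by the {WikiPathname} in its subject.
public abstract class EmailRouter {
    private static final Pattern WIKI_PATH = Pattern.compile("\\{([^}]+)\\}");

    public String route(String subject) {
        Matcher m = WIKI_PATH.matcher(subject);
        if (m.find()) {
            String path = m.group(1);
            Topic topic = lookup(path);
            if (topic != null) {
                if (topic.isCabinet() && topic.hasCalendar()) {
                    return path + "/$cw"; // the exception: file under the current week
                }
                return path;              // the referenced topic itself
            }
        }
        return "Etc/$cd/Email";           // default: journal under today's date
    }

    // Resolve a WikiPathname to a topic, or null if it can't be found.
    protected abstract Topic lookup(String path);
}

interface Topic {
    boolean isCabinet();
    boolean hasCalendar();
}
```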

Meanwhile, it looks like I took today off from writing AgileWiki code. But this small change should not take too long to implement.

Bill

Sunday, January 01, 2006

back to the future?

I guess the next step is to add journaling and tag support for emails. I know Norm is excited about this--journaling is the way he prefers to operate, and he loves using email as a means of input.

For myself, I'm tired and I miss my wife. So I got another book to read.