Wednesday, March 21, 2007

Proposed collectable data interchange format

I see Jay has made an interesting proposal to create a new mailing list for the specific purpose of designing a standard interchange format. I am in full agreement with this idea and hope to see this new list formed soon.

Until then I will respond to his opening technical points here.

On Wed, 21 Mar 2007 15:25:50 -0500, Jay West wrote:
Each site who wants to participate would (in an automated fashion)
periodically (nightly?) run a program or process that takes data about their
collection items and puts it into this standard record format. The format
has things like document name, title, description, document type, owner
site, http vs. ftp, URL, FTP address, email contact, date, key words,
categories, distribution allowed, etc. For example, a program could pretty
easily be created that would take all the documents on bitsavers and puts
them into this standard record format. Then the source systems send this
record dump (or deltas from a previous dump) to the classiccmp server.

The classiccmp server takes all these records from all the various sites and
puts them together into a single database internally, and also provides a
seamless mechanism (http, and yes... gopher, archie?) for people to search
the database or browse based on given criteria. It looks like one database.
But when a user tries to pull up one of the specific entries it is actually
redirected to the sponsoring systems server to get the data.

This way each system can keep their classic data (jpegs, pdfs, disk images,
whatever) in the format they are already using without changing anything.
They just need to have something that takes their format/sources and puts it
into the standard format which is then sync'd to

Provisions could be made to the standard record format to address all types
of media, allow some items to be listed as "present" but "unavailable" for
things that can't be released due to copyright issues (but at least people
would know it had been preserved).
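
To make the quoted proposal a little more concrete, here is a minimal sketch of what one record and the "deltas from a previous dump" step might look like. The field names, the `doc_id` key, and the `diff_dumps` helper are my own assumptions for illustration; nothing here has been agreed on by anyone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    doc_id: str             # hypothetical unique key; the proposal does not name one
    title: str
    doc_type: str
    owner_site: str
    url: str
    available: bool = True  # False would mean "present" but "unavailable"


def diff_dumps(old, new):
    """Return (op, record) pairs that turn the old dump into the new one."""
    old_by_id = {r.doc_id: r for r in old}
    new_by_id = {r.doc_id: r for r in new}
    deltas = []
    for doc_id, rec in new_by_id.items():
        if doc_id not in old_by_id:
            deltas.append(("add", rec))
        elif old_by_id[doc_id] != rec:
            deltas.append(("update", rec))
    for doc_id, rec in old_by_id.items():
        if doc_id not in new_by_id:
            deltas.append(("delete", rec))
    return deltas
```

A site would keep last night's dump around, build tonight's, and send only what `diff_dumps` returns.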

The first thing that stands out about this idea is its reliance on a central repository. I think we would be better off looking at a more distributed system, where add, update, and delete transaction records are polled from one's peers, more like an RSS feed or, for that matter, a news feed.
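
As a rough illustration of the peer-polling idea, the sketch below has each node publish an ordered log of add/update/delete transactions and pull whatever is new from a peer. The `Node` class and its method names are hypothetical, the "network" is just a direct method call, and there is no conflict handling or relaying of a peer's transactions to third parties. It is a toy under those assumptions, not a design.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.items = {}  # item id -> description
        self.log = []    # transactions this node has published, in order
        self.seen = {}   # peer name -> count of that peer's transactions applied

    def publish(self, op, item_id, data=None):
        """Record a local add/update/delete and apply it to our own items."""
        tx = {"op": op, "id": item_id, "data": data}
        self.log.append(tx)
        self._apply(tx)

    def _apply(self, tx):
        if tx["op"] == "delete":
            self.items.pop(tx["id"], None)
        else:  # "add" and "update" both store the latest data
            self.items[tx["id"]] = tx["data"]

    def poll(self, peer):
        """Apply the peer's transactions we have not seen yet; return how many."""
        start = self.seen.get(peer.name, 0)
        new = peer.log[start:]
        for tx in new:
            self._apply(tx)
        self.seen[peer.name] = start + len(new)
        return len(new)
```

Each node only needs to remember how far it has read in each peer's feed, which is what a news or RSS reader does anyway.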

More as things develop ...

Computer Collectables

As a Computer Collector, a dedicated lurker, and an occasional OT (off topic) Supporter of the Classic Computing Mailing List, I have seen a lot of chatter about the future of Classic Computers and about both the recent and impending loss of collections and archives thought by some to be important to the hobby and its history. Gallery was an attempt to build such a site and was based on Gallery 1.x. The nice thing about the subject of Collectables is that the material never ages: while it may not always be fresh and new, what has been saved will always be current, no matter its age, until it is updated, much like a Wiki.

I have moved my discussion of this subject here for two primary reasons: first, to help Jay get the cctalk mailing list back under control and on topic, and second, to continue my personal experimentation with public Blogs and their ease of use for something more than a bedpan in a mental hygiene ward. That's not to say this one is or will be any different; only time will tell for sure.

For those readers not technically minded, feel free to skip ahead; I get technical and use a lot of computer jargon in this Muttering, without shame. So where to start ....

A response to Mike Stein's last OT: question/comment to me on the list:
>Any thoughts yet about how to organize it?

As a matter of fact, yes, I have thought a lot about it lately. I think it will be "Lead, Follow, Or Get Out Of My Way" for quite a while, until there is a workable model or a budget established. I have a vision of a system where each contributor or collector will build and maintain their own gallery of items. These items can take many forms, both physical and as virtual images of all sorts. One thing they have in common is that they primarily consist of a picture or image along with descriptive text associated with each item. When a physical object is pictured, its primary location, status, and availability will be associated with the item. Each item can be associated with one or more Blog entries.

From a development point of view, what is easy to implement will most likely be hard to maintain, while an approach that might be hard to implement should be easily maintained.

I have a good idea of what I want to see and how I want it to work. The question is whether I can afford to take the time required to build my vision, or whether I should wait for others to get closer with their attempts at their visions and see if I can live with them.

The current and most-likely-to-succeed plan is based on my hacking and merging several open source tools/projects. Most of the grander designs require a business plan, tax exempt status, or at the least a budget; a paycheck to work with would be nice.

How to build it to be scalable, mirrorable, and distributed in nature with the limited resources at hand is the current question.

I am currently working on the underlying database and portability concerns.

My current DB short list includes MySQL, IBM DB2 (on an AS400), and Codebase, a classic DB3/Clipper/FoxPro compliant library. MySQL is free, the AS400 approach requires a donation and a lot of outside help, and I already own a developer's copy of Codebase.

If I use PHP and MySQL, I expect to find a lot of qualified help in setting it up and hacking it into shape.

If I use LISP and Codebase, then the chances of coding help are greatly reduced. This project would become my pet application for years to come. It would take a lot more eBay click-throughs to support a NEP (never ending project) without a client or paycheck. Not to say I am not considering it.

I am currently building, mostly in my head this week, a Wiki that incorporates much of the Gallery project's Gallery/Album editing structure along with a yet-to-be-chosen Blog interface. I can do this by cutting and pasting a lot of open source stuff into a MySQL heap, but mirroring and distributed processing might be a real chore to make work.

Whereas if I take the "BDUM" (Brain Dead User Mode) approach and use near-flat files with indexes in DB3 style, it is easy to build a transactional system that mirrors nicely. It is just a lot of work to code it from scratch.
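
For the technically curious, here is a small sketch of why the flat-file route mirrors so easily. Records are fixed width and only ever appended, a leading `*` marks a deletion (borrowed loosely from the DB3 habit of flagging deleted records), and the index is just a key-to-offset table rebuilt by reading the file. The `FlatStore` class, the field widths, and the ASCII-only restriction are my own assumptions for illustration; this is not the real DB3 file layout and not how Codebase works.

```python
import io

KEY_W, TEXT_W = 12, 40
REC_LEN = 1 + KEY_W + TEXT_W + 1  # flag + key + text + newline


class FlatStore:
    def __init__(self, fh):
        self.fh = fh      # any seekable binary file object
        self.index = {}   # key -> byte offset of the latest live record
        self._rebuild_index()

    def _rebuild_index(self):
        """Replay the whole file in order; later records win."""
        self.index.clear()
        self.fh.seek(0)
        offset = 0
        while True:
            rec = self.fh.read(REC_LEN)
            if len(rec) < REC_LEN:
                break
            key = rec[1:1 + KEY_W].decode("ascii").rstrip()
            if rec[:1] == b"*":
                self.index.pop(key, None)
            else:
                self.index[key] = offset
            offset += REC_LEN

    def _append(self, flag, key, text):
        # Pad or truncate each field to its fixed width (ASCII only).
        line = f"{flag}{key:<{KEY_W}.{KEY_W}}{text:<{TEXT_W}.{TEXT_W}}\n"
        self.fh.seek(0, io.SEEK_END)
        offset = self.fh.tell()
        self.fh.write(line.encode("ascii"))
        return offset

    def put(self, key, text):
        key = key[:KEY_W]
        self.index[key] = self._append(" ", key, text)

    def delete(self, key):
        key = key[:KEY_W]
        self._append("*", key, "")
        self.index.pop(key, None)

    def get(self, key):
        offset = self.index.get(key[:KEY_W])
        if offset is None:
            return None
        self.fh.seek(offset + 1 + KEY_W)
        return self.fh.read(TEXT_W).decode("ascii").rstrip()
```

Because nothing is ever rewritten in place, a mirror only has to copy the bytes appended since its last visit and rebuild its index, which is the same replay the store does on startup.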

I am convinced the key to its success and usefulness is in the cross-indexing and mirroring of distributed data in a structure accessible and maintainable by the collective, each responsible for their own contributions. The Users will simply have to ignore the short guy behind the curtains pulling the strings until there are bots to do all the grunt work.

Who, What, When, Why, and Wow
That is for further study and comment...