[ oxygene - zen and the art of data ]


unstructured storage for unstructured data

In reality, most data we have is unstructured. We may try to impose structure on it (for example, by storing it in a database table), but in doing so we lose the bits of information that don't fit the requirements of the structure.

Oxygene approaches the problem differently. Instead of defining strict tables within a database, records may contain any number of arbitrary 'attributes'. All attributes are indexed, and retrieval of records is similar in speed to traditional structured databases.

unstructuring your data

Take, for example, an Oxygene database of a website with one image, one HTML file and a movie:

record one:
    Title: "exampleImage.jpeg"
    Keywords: "example image jpeg keywordOne keywordTwo"
    Type: "JPEG"
    Width: 400
    Height: 200
    DPI: 72
    Size: 30276

record two:
    Name: "examplePage.html"
    Title: "An example web page"
    Text: "This is an example web site with an example image and movie."
    Size: 92
    Type: "HTML"
    Date Created: "11th October 2003"

record three:
    Name: "exampleMovie.mov"
    Type: "MOV"
    Copyright: "Copyleft 2004 William Cannings"
    Keywords: "example movie mov copyleft william cannings keywordOne keywordTwo"
    Width: 640
    Height: 480
    Size: 267983

Although the three records have entirely different structures, Oxygene is able to store and retrieve them as easily as a structured database. A search for "example" would match all three records, whilst "web page" would match only record two. A search for "movie" would match records two and three, since the Text of record two also mentions a movie.

where it's at

At version 0.1, Oxygene is already mostly complete. Adding, retrieving, removing, editing and searching of records are implemented. Record attributes may be strings or integers. All string attributes are full-text indexed, which makes Oxygene useful for web site indexing or for searching over documents on a disk. In reality, Oxygene is perfectly usable right now; only a few minor additions (sorting and ranking) need to be made for it to be version one complete.
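
The record model and the searches described above can be sketched in a few lines of Python. This is only an illustration, not Oxygene's code or API: the TinyStore class and its add, search and find methods are hypothetical names, plain dictionaries stand in for the b-tree index, and treating a multi-word query as "every word must appear" is an assumption.

```python
import re
from collections import defaultdict


class TinyStore:
    """Hypothetical in-memory stand-in for a schemaless store (not Oxygene's API)."""

    def __init__(self):
        self.records = {}               # record id -> attribute dict
        self.words = defaultdict(set)   # word -> record ids (full-text index)
        self.values = defaultdict(set)  # (attribute, value) -> record ids
        self.next_id = 1

    def add(self, attributes):
        """Store a record with arbitrary attributes and index every one of them."""
        rid = self.next_id
        self.next_id += 1
        self.records[rid] = dict(attributes)
        for name, value in attributes.items():
            self.values[(name, value)].add(rid)
            if isinstance(value, str):
                # Strings are also split into lowercase words for full-text search.
                for word in re.findall(r"[a-z0-9]+", value.lower()):
                    self.words[word].add(rid)
        return rid

    def search(self, query):
        """Return ids of records containing every word of the query (assumed AND)."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        hits = [self.words.get(term, set()) for term in terms]
        return set.intersection(*hits)

    def find(self, name, value):
        """Return ids of records whose attribute `name` equals `value` exactly."""
        return set(self.values.get((name, value), set()))


store = TinyStore()
store.add({"Title": "exampleImage.jpeg",
           "Keywords": "example image jpeg keywordOne keywordTwo",
           "Type": "JPEG", "Width": 400, "Height": 200,
           "DPI": 72, "Size": 30276})
store.add({"Name": "examplePage.html", "Title": "An example web page",
           "Text": "This is an example web site with an example image and movie.",
           "Size": 92, "Type": "HTML", "Date Created": "11th October 2003"})
store.add({"Name": "exampleMovie.mov", "Type": "MOV",
           "Copyright": "Copyleft 2004 William Cannings",
           "Keywords": "example movie mov copyleft william cannings keywordOne keywordTwo",
           "Width": 640, "Height": 480, "Size": 267983})

print(sorted(store.search("example")))   # [1, 2, 3]
print(sorted(store.search("web page")))  # [2]
print(sorted(store.search("movie")))     # [2, 3]
print(sorted(store.find("Width", 400)))  # [1]
```

Because every attribute goes through the same two indexes, a record needs no schema: adding a new kind of attribute is just another key in the dictionary passed to add.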
A listing of other planned features (for after version one) is below.

where it's going

Future features planned: moving from a b-tree index to something faster, adding other types of attributes (floating-point, image, list etc.), compression of records and indexes, implementation of stemming and collocations, topic boundary segmentation, automatic query expansion (possibly using Probabilistic Latent Semantic Indexing), ranked results, sorting of results and an improved query language (possibly a modified SQL variant).

licence

Using Oxygene in your own projects or on its own requires no credit or copyright notice, as the project is in the public domain. True freedom doesn't limit anybody from using anything any way they need.