Trans-Jurisdictional Well Types and the Concept of Translation
The project is at the stage now where it’s necessary to mesh data structures that were never designed to be meshed. Specifically, the mechanism used to code various characteristics of the well in each provincial jurisdiction varies significantly because there was little if any dialogue between the jurisdictions when the coding was created. But then again, you shouldn’t really expect there to be; each jurisdiction has their own particular agenda and population to serve. Whether other jurisdictions or populations are well served was at the very best would have been a secondary consideration, or was, more likely, of absolutely no concern at all, to those creating the codes.
It is therefore necessary to implement a structure that permits the mapping of one set of codes to the others. It occurred to me that this is really the same problem as the translation of human languages. In that problem domain (and setting aside the issue of different character sets, for the time being) a translation is a mapping of one string of words to another string of words. In the case of well coding, you are mapping one string of characters to another string of characters. While it may seem like overkill, I created a series of four tables that fully implement the translation concept. Interestingly enough, I believe it to generic enough to be used for the trans-jurisdiction coding problem which is the subject of this post, and the maintenance of human language translations as well.
base.Translation This table contains the strings of characters which form the basis of the translation; fromTxt contains the text from which you are translating, and toTxt is the text to which you are translating. Both are NVARCHAR(100). In the case of well codes, for example, the BC code for multi-zone gas is MGAS and would therefore be the fromTxt, and this is going to map to a generic substance code GAS, which in turn, will be the toTxt. But of course, these strings of characters are simple enough that overlap with other problem domains is fairly likely. Hence…
base.TranslationCollection An instance (ie. record) within this table is the parent of zero or more base.Translation records, and the two tables are related through Translation.id_TranslationClass >> TranslationClass.id. The combination of fromTxt and toTxt must be unique for a given instance of TranslationCollection. Continuing with the well code example from above, the TranslationCollection.nm would be BC. A second instance of TranslationCollection.nm will be MB. So any overlap between coding schemes of the two jurisdictions has been eliminated through the TranslationCollection instances to which a given Translation is related. The problem which results, though, is that BC and MB are likely values in a wide variety of scenarios, and thence…
base.TranslationCollectionClass This table groups of TranslationCollection instances through TranslationCollection.id_TranslationCollectionClass >> TranslationCollectionClass.id. Again, using the well code example, the two instances of TranslationCollection (BC, MB) would be the children of an instance of TranslationCollectionClass.nm of Produced Substance. TranslationCollection.nm must be unique for a given TranslationCollectionClass. There is still a slight possibility Produced Substance concept may conflict across problem domains. To address this, the table…
base.Schema identifies the problem domain to which a given TranslationCollectionClass belongs through TranslationCollectionClass.id_Schema >> Schema.id. To complete the example from above, the Schema.nm to which the Produced Substance instance of TranslationCollectionClass belongs would be Energy. The Schema could also be addressed through the its lbl, which is simply E. TranslationCollectionClass.nm must be unique for a given Schema, and both Schema.nm and Schema.lbl must be unique.
Sharp-eyed readers will notice that the foreign keys for TranslationCollection, TranslationCollectionClass and Schema are all based on an INT key id. However, a decision was made to use a UNIQUEIDENTIFIER uid primary key for the base.Translation table, because it’s felt that records in this table could well be developed on a distributed basis and brought together at some subsequent time. This will be particularly true if the structure described above is used for human languages, as opposed to short lists of codes.
Posted on 30th April 2008
Under: Developers' Journal | 1 Comment »