Archive for June, 2008

Traction

Of course, every ‘eureka‘ tends to be followed by a few things which bring you back to earth.  The first attempt to load the entire set of ST1s, right through to present, went deceptively well.  Turns out, though, the use of T-SQL TRY...CATCH syntax without a RAISERROR statement in the CATCH block was swallowing up errors.  Including ones which needed attention.  On the bright side, though, ERCB.dtsx still read a vast majority of the ST1s, and at the end of it all, there were something like 150,000+ unique wells in the database.  But most importantly, it really felt like there was traction on the problem — this Alberta data is going to be on Google Earth sooner, rather than later.

There were also a few surprises with the source data downloaded from the ERCB.  The big one was the ‘reiterated’ records described in That Light at the End of the Tunnel… turned out to be a passing trend.  The last one showed up was 2004-03-19 (reiterationEndDt).  After this, the ERCB went back to the earlier approach of stating the amendment information at the bottom of the ST1 only.  Actually, there was exactly one more, on 2007-08-01, but something funny was going on because there were other parts of the same record which appeared to have been edited ‘by hand’.  In all of the ST1s, there was no other reference to the license number (0380046), so the August oddball was simply edited from AMENDMENT to NEW.   Similarly, there was exactly one situation where a given license (0024639) referenced two different wells.  Well, two different producing zones, but they have different unique identifiers.    These bits of ERCB whimsy almost brings a tear to my eye.  Or maybe it just makes we want to cry, I’m not sure.

Along the same lines, in addition to NEW, AMENDMENT, RESUMPTION and RE-ENTRY identified as possible values for the WELL PURPOSE field on the ST1, another one showed up; DEEPENING, which appeared in 51 ST1s.  TRAINING showed up as a well type exactly once, as well.  And there was some truly duplicate data (license 0302041), a few six digit license numbers and some incomplete well identifiers, but nothing that couldn’t be sorted out, one way or another.

On another note, the primary key for E._WellDetail is uid_Well, at the end of the table, which seems like a break from the standard.  It’s not, really, and thus warrants a little explanation.  There’s a one-to-one relationship between E.Well and E._WellDetail, so uid_Well is both the foreign key back to E.Well, and the primary key.  Instances of E._WellDetail exist solely at the pleasure of E.Well.  For that reason, the trigger E.WellInsertTrg creates a stub instance of E._WellDetail when an instance of E.Well is created.  The foreign key is configured so a deletion of an E.Well instance cascades onto E._WellDetail.  It should never be necessary to INSERT or DELETE an instance of E._WellDetail directly.  So why have two tables, then?  Simple — keeping normalized and denormalized data separate.

Code Shavings  In addition the above, a few other miscellaneous items;  The seq column was added was added to ERCB.LicenseRecord table, and was made a computed column.  It looks at statusCd, and if it’s an AMD (amended) , it changes seq to 1.  If it’s CNC, it changes seq to 2. Otherwise, it’s zero.  It is used to ensure the amendment and cancellation records are sequenced correctly in the ERCB.LicenseRecordRecent.  Speaking of which, the latter is the VIEW which looks at ERCB.LicenseRecord and filters out all but the most recent (hence the name) instance for each well license.  The scalar-valued, user-defined function ERCB.getLicenseRecordAmt() returns the total number of records which need to be handled by ERCB.putLicenseRecord.  If it returns zero, there’s nothing to do, and the rest of the logic in the stored procedure can be skipped.

Posted on 27th June 2008
Under: Developers' Journal | No Comments »

Inspiration

Actually, this is a link to a UNESCO site, but really, it was just the image I was after.  It seemed like a good presentation of the 'eureka' moment described in the post.  But, by all means, go and take a look at the linked site -- there's some pretty interesting stuff there.As I drove home last night, it suddenly struck me I was pursuing two objectives which are somewhat in conflict with each other.  One is to read, parse and normalize (RPN) the ST1 into a series of discrete events which would form part of a comprehensive history of the well.  The other — and in some respects, more important in the short term — was to know the current state of the well, as a result of those events having occurred.  The derived, current state of the well could then be used to create KML, spreadsheets or whatever format of data is required.   To this point, I had always assumed the latter would be built once the former is complete.  However, that doesn’t necessarily have to be case.  It’s possible to skip directly to the desired end result — the light goes on.

The stored procedure ERCB.putLicenseRecord reads the ST1, and creates records in the ERCB.LicenseRecord table.  What’s Old is New Again talked about not emptying out ERCB.LicenseRecord after each ST1 is read.  Rather, all license records extracted from the ST1 are accumulated in ERCB.LicenseRecord.  With a few minor modifications (such as incorporating license cancellation events), all that is required is to simply retrieve the most current license record (ie. MAX(dt)) in LicenseRecord, and you have the current state of the well.  In other words, the denormalized table E._WellDetail  — which was being used to reflect the current state of the well to aid in the RPN process — can be populated directly from ERCB.LicenseRecord.  The RPN process can be skipped, for now.  The temporary downside is lack of the discrete historical events which brought the well to its current state.  But this is something the initial round of users of the data won’t care about that much.  And, of course, the history can be built up after the fact.

E._WellDetail was first introduced as a concept at the beginning of this month in Finishing Off the Load of (Most) 2003 ST1s… where it was employed to dramatically improve the speed at which the KMLs were generated.  So how perfect is that?  In nearly one fell swoop, it will be possible to go from the raw ST1 right to the process of generating the KML, simply by using ERCB.LicenseRecord to populate E._WellDetail directly.  So, as we speak, the ERCB.dtsx package has been reconfigured slightly so it reads each ST1 in turn, and calls E.putLicenseRecord.  A new stored procedure — ERCB.put_WellDetail — then grabs the most recent license record for each well, and creates instances of E._WellDetailEureka.

Posted on 27th June 2008
Under: Developers' Journal | 2 Comments »

Alberta ERCB ST1 Load, Round 5

Every time it seems like the end is in sight with respect to the load of ERCB ST1s, something comes up to set things back a little.  But each time, a little more is learned about the structure the ST1, particularly as one relates to the next.  Round 5, Session 2 of the ST1 load is currently in progress –  for ST1s issued in 2002 — with statistics to this point shown in the chart at the left.  Each round of loads involves purging the previous data using the E.purgeSessionContent stored procedure, and then starting the load over again.  This stored procedure examines each table, and using the standard uid_Session_create column, determines if a given instance of a table relates to a particular instance of base.Session.  If it does, it gets deleted.   Similar to createUdt and modifyUdt (based on the standards established in www.butzi.ca/tech*), each table in the Intellog database has a uid_Session_create and uid_Session_modify column that relates a given instance of a given table to an instance of base.Session.  Of course, this all hinges on a base.Session instance (or instances) being created for each round of loads.

One of the significant accomplishments since the last post is the implementation of detailed logging so it’s possible to analyze the impact of loading each individual ST1.  The stored procedure ERCB.putLog, and the VIEW ERCB.LogDetail work together to provide this functionality.  putLog creates an XML (validated with iSentence) with instance counts for various tables and views.   ERCB.LogDetail extracts the individual elements in the XML into a columnar format which can be analyzed with a spreadsheet, or similar tool.

Each attempt to load the entire set of ST1s (2,731 as of this writing) has resulted in one of two outcomes, at least so far.  One is degradation of performance once the volume of data increases, or acceptable performance with some sort of logical flaw in the data.  In either case, the outcome is usually not known until a significant amount of data gets loaded.  The logging process described above is intended to provide additional information to spot and resolve these two types of problems that arise, so the next attempt to load can begin. 

Specifically, performance challenges seem to crop up with either ERCB.putLocus and/or ERCB.putEvent which are two of the more complex stored procedures.  When a problem with either one of them is spotted, it’s an opportunity to open them up and see what can be done to improve their performance.  This usually is resolved through the caching of data in some sort of temporary table, or reviewing indexing of the tables.

Previously, E.WellIdentified() and E.WellAffected() were discussed.  These are table-valued, user-defined functions which return identification and event information respectively.  A similar function was created — E.WellLocated() — which returns the geographic location (elevation, latitude and longitude) of a well.  Currently, there is only one type of location defined (SFC, or surface), but it will work just as well once other location types have been established, and populated.  These three functions work great, and seem to have decent performance, with one significant exception; JOINing uid_Well to uid_Well between two of these function results in very poor performance, something which will have to be remedied in the future.

As each round of loads is in progress, there is certainly time to review the data successfully loaded to that point.  One significant area of interest is whether wells where geographic locations are amended are ‘relocated’ correctly.  At the point they were checked, something like 375 locations had been amended in this manner.  These were projected onto Google Earth, and each well reviewed in this context.  About 70 did not appear to map through to any underlying artifact.  The possibility exists the lat/long calculations associated with these wells is flawed, but there are other explanations as well.  For the time being, however, it’s reasonably safe to say the lat/long calculation incorporated into the Intellog database appears to be working reasonably well.

*In the interest of full disclosure the author of the www.butzi.ca/tech is the same guy who writes this blog.

Posted on 25th June 2008
Under: Developers' Journal | No Comments »

Overhauling E.putEvent, the Creation of E.WellAffected(), and the Data Doughnut

After the changes described in the previous post were implemented, the performance of ERCB.putEvent stored procedure deteriorated.  This was the last holdout when it came to achieving decent performance for the load of the entire ST1.  So putEvent was pulled apart and re-implemented, making liberal use of the new E.WellIdentified() table-valued, user-defined function.  The other problem with ERCB.putEvent was it couldn’t be re-run without creating duplicate events.  So, while the putEvent stored procedure was open, it was decided to fix this, as well.

The success of the E.WellIdentified() function for returning the current identities of the well led to the creation of E.WellAffected();  this table-valued, user-defined function is similar in that it takes the name of the TypeCollection as an input parameter, and returns a table with the current state of the well with respect to the specified TypeCollection.  It also returns along with the date it achieved that state, and a uid_Well back to the well itself.  What WellIdentified() is to identities, WellAffected() is to states.

Hence, the new definition of a well is a confluence of it’s current identities, it’s current states, and the geographic location associated with it.  Or stated another way, in conventional database design, the Well table would have been the centre of the diagram, containing most of the information about the well.  Ancillary tables arranged around this table’s logical perimeter would provide subordinate functionality such as integrity checks and the like.  In the Intellog database design, the Well table contains little more than the uid, and the previously subordinate tables actually take over in terms of the definition of the well.  It’s a data doughnut…lots of good stuff around the outside, and not much (if anything) in the middle.

In order to make them more robust, TRY...CATCH syntax was implemented for all of the stored procedures.  At least those which require multiple statements.  This implies that the entire transaction succeeds or fails, and never leaves the database in an inconsistent state.  For example, in the example illustrated in What Happens When You Break Your Own Rules?,  instances created in E.Well and base.Identity that are not linked through an instance of E.identifies are orphans.  So you really want all three INSERT statement to succeed — or fail — together.

Now, back to testing the complete sequence of load events associated with the ST1, to ensure everything is hanging together.

Posted on 23rd June 2008
Under: Developers' Journal | 1 Comment »

Reworking ERCB.putWell to Reflect Identity-identifies-Well Structure

What appeared to be a minor problem with ERCB.relabelWell resulted in a reasonably significant change, as described in the previous post, and likely leads to the early retirement of relabelWell.  ‘Relabelling’ the well will now consist of simply inserting a new Identity instance, and associating it to the well through E.identifies.  Because there is no definitive structure within E.Well to visibly identify the well to human eyes, two columns were added to help; dsc and xml (which is validated with iSentence). 

As a result of the change, above, it was necessary to rework ERCB.putWell fairly significantly; it now does everything necessary to create instances of E.Well, E.identifies and base.Identity for the license number (licenseLbl) identity for the well.  Furthermore it rolls these actions together as a single database TRANSACTION, controlled with TRY-CATCH T-SQL syntax.  What results, when ERCB.putWell completes, is a combination of Identity-identifies-Well instances which uniquely identifies the well, and the license number under which it was created and/or amended.

The rest of the work involved going through the ERCB.dtsx package to ensure it was still working properly after the change described above.  However, as a result of the changes, the procedures are now running too slowly again, and the next thing on the agenda will be to rework them again to get the speed necessary to load all of the ST1s.

To assist with the testing, a simple stored procedure called base.putSession was created.  It simply closes the current open session and updates the description (if the @oldDsc parameter is populated), and opens a new session, and give is the description found in @newDsc.

Also, a new table, called ATS._SectionBenchmark, was created to contain a snapshot of the ATS.SectionBenchmark(’NE’) table-valued, user-defined function.  This function had been used to retrieve benchmark latitude/longitudes from the ATS.CoordinateFile table, and the large number of records was significantly slowing down the performance of the ERCB.Point view.

Finally, the E.WellIdentified() table-valued, user-defined function was created.  It takes the nm of the Identity type, and returns a table containing uid_Well and the most current txt for that particular type of Identity.  For example, the syntax…

SELECT
     *
FROM
     E.WellIdentified('licenseLbl')

results in a table consisting of the licenseLbl and uid_Well, but only the most recent version for each well.

Posted on 20th June 2008
Under: Developers' Journal | 1 Comment »

What Happens When You Break Your Own Rules?

I’ve learned with database design if a problem seems particularly intractable, there is a pretty good chance there is some sort of fundamental flaw in the underlying design which has been overlooked.  This turned out to be the case with the stored procedure ERCB.relabelWell, and the VIEW on which it is built, ERCB.Well.  These structures certainly represented evolutionary thinking, but in the end, they could not paper over a fundamental flaw in the approach I was taking with respect to the label (UWI) being associated with the well.

To understand the problem, it’s worth revisiting an aspect of the fundamental structure of the Intellog database design, as illustrated in the diagram to the left.  The E.Well and base.Identity tables are related through E.identifies, which resolves a many-to-many relationship between the two*.  Many wells can be associated with a given identity, and many identities can be associated with a given well.  But if you take a closer look, you will also see E.identifies contains udt (UTC date/time), and id_Type columns.  udt enables date effective identities, and id_Type enables an unlimited number of identity types.  For example, well names can change — in fact, they can change many times.  A given instance of E.Well is related through instances of E.identifies to all of those different names found in the base.Identity table.  Want to know what the current name for the well is?  Find the most recent identifies.udt for the well in question, and now you know.  Want to know what the name of the well was a year ago, prior to the last three name changes?  Look for the most recent identifies.udt which is also more than a year old, and there it is.  Decide you need to maintain a ‘familiar’ name for the well, in addition to its legal name?  No problem.  Simply identify a new group of E.identifies instances through a new id_Type.

The fundamental flaw, mentioned above, was this approach wasn’t applied consistently to the three different identities being maintained for each well; name (wellNm), well label or UWI (lbl), and the license number (licenseLbl).  The structure shown in the diagram was being applied to wellNm and licenseLbl, but not to lbl.  In the case of the latter, I simply could not get my head around a structure for E.Well which did not include at least one column which identified the well in some way.  Hence, the original existence of the E.Well.lbl column.  But the truth is, the problems which kept cropping up when the lbl column changed with a UNIQUE ID: type amendment, were a direct result of not following the standard structure.  So, today was the day to rework things slightly so the three identities for each well are handled in precisely the same way.  If you look closely in the E.Well table in the diagram, you will see the lbl column has been renamed to lblx.  This is to determine what breaks when lbl no longer exists within E.Well.  Once those issues have been resolved, lblx gets removed entirely, to be replace with entries in E.identifies and base.Identity.

*But you knew that, because the naming of this table is based on a transitive verb, right?

Posted on 20th June 2008
Under: Developers' Journal | 1 Comment »

More Tweaking, Tuning and Testing of the ERCB ST1 Load

Continuing the work from the previous post, it was determined the approach to the amendment of geographic locations — as implemented with the ERCB.putAmendment stored procedure — was probably obsolete in light of the new method of accumulating ST1 records in ERCB.LicenseRecord.  Amendments to geographic locations needs to be implemented in ERCB.putLocus, not ERCB.putAmendment.  This is because ERCB.putLocus doesn’t just create instances of base.Locus as its name implies, but also associates them with instances of base.Point.  The BFO* was to check for the existence of a Locus using Locus.nm, prior to doing anything else.  If the Locus exists, then delete any existing instance of base.delimits linking that particular Locus to base.Point, and then re-create it.  The logic in putLocus will JOIN to the correct, current Point instances, leaving the old Points as orphans, which can subsequently be deleted (and are).

A small problem with ERCB.relabelWell had to be resolved by changing the ERCB.putWell stored procedure.  The latter has to load the well using its old label, first, and then let ERCB.relabelWell do it’s thing giving the well the new lbl.  Otherwise, ERCB.relabelWell results in a duplication error in the unique index for Well.lbl.  Also, looks like the population of E.identifies.udt somehow got overlooked in the ERBC.putIdentifies stored procedure, so that was fixed.

With these issues dealt with, a return to methodical testing was in order.   Each cycle consists of loading an ST1 as far as the ERCB.putLicenseRecord stored procedure, and then looking at what amendments are pending for the current ST1.  These are copied to an Excel spreadsheet (Testing.xls).  Then the entire load is run, and each well that has an amendment applied check manually to see that it’s correct.

Even taking all of the above into account, a successful load through the end of 2001-01 was completed, and all the amendments checked out correctly.  Given there is at least a chance the development of the logic is finished, it was decided to purge all Alberta data right back to 2001-01-01, and start the load over, this time for 2001-01, 2001-02 and 2001-03.  When that is complete, 2001-02 and 2001-03 amendments will be checked before proceeding any further.

Also, a funny thing happened with the load of the WELL0115.txt ST1.  Unlike those issued prior to (and hopefully unlike those, afterward), this particular file appears to have either been prepared by hand or some alternative means altogether.  It contained embedded tabs and other significant formatting changes which could only be resolved by manually editing it from top to bottom.  It would be nice to think this will be the only occurrence of this, because the fix is super labour intensive — a couple of hours per file, at least.

*Blinding Flash of the Obvious

Posted on 18th June 2008
Under: Developers' Journal | No Comments »

Testing, Tweaking and Tuning

Recent work has involved taking the new approach described in the previous post and slightly reworking the load logic so it’s fed from the ERCB.LicenseRecord table instead of ERCB.LicenseIssued and other VIEWs.  These changes were relatively minor, and primarily involved the ERCB.putWell, ERCB.putIdentity, and ERCB.putPoint stored procedures, and the VIEW ERCB.Point.

ERBC.LicenseRecord contains (or will contain) an instance for each and every license record which appears in the main body of each and every ST1.  However, there is still some logic which depends on knowing which ST1 was loaded most recently — the ‘current’ ST1, that is — and these instances were isolated in the view ERCB.LicenseRecordCurrent.  It simply looks at the dt column in ERCB._DailyActivityList, and SELECTs only those instances for that particular date.

One significant change was to recode ERCB.Point (the VIEW) into ERCB.Point() (the user-defined, table-valued function).  This was to enable multiple steps and temporary tables to be used to improved performance.  ERCB.Point() does not take any input parameters, which in my mind, makes the use of a user-defined, table-valued function in this situation a bit of a stretch.  Because of this, simply merging the logic of ERCB.Point() directly into ERCB.putPoint was considered, until it was discovered ERCB.putLocus was also using the previously-defined VIEW.

One other visible change to the logic was the addition of the ERCB.relabelWell stored procedure which changes Well.lbl in the event an amendment of type UNIQUE ID: is encountered.  It’s pretty simple; it uses ERCB.Well.  This VIEW does all the heavy lifting when it comes to determining the current Well.lbl, and what the new one should be.  ERCB.relabelWell simply looks at this VIEW, and then builds an UPDATE statement with it.  It’s fairly certain ERCB.relabelWell will be merged in with ERCB.putEvent at some point in the future, but for the time being it was isolated as a separate task, and added to the ERCB.dtsx package.

SQLQuery27.sql was then used to take the calculated well lat/longs, generate some KML, and project them onto the terrain with Google Earth (GE).  A check of a statistically significant sample of the wells on GE indicated nothing significant had broken with the calculation as a result of the changes described above, and in the previous post.

It then became an exercise of methodical testing.  Loading the first day for which there were records (2001-01-02), check the tables after the load is complete, then moving onto the next day, and so on.  Approximately three iterations of load, purge and starting over were completed, the most recent of which got to 2001-01-12.  It was on this date the first change of a well’s geographic location occurred due to an amendment.   The next job is to ensure the old location has been dealt with correctly, and the new location has been assigned to the well.  The specific example in question was 107/14-04-093-12W4M/00, and needs to be audited prior to loading more data.

Posted on 17th June 2008
Under: Developers' Journal | 1 Comment »

What’s Old is New Again

To this point, the design principal was to load an ST1, transform & normalize the data, and then populate the Intellog database with it.  Then, purge the previous ST1, load the next one, and start all over again.  Good theory, but there were problems in determining the precise state of the data when it was was partially loaded from the ST1.  A better approach, as it turns out, was to load, transform & normalize the data, load it into a table (ERCB.LicenseRecord), but not throw it away once the load is complete.  ERCB.LicenseRecord, in turn, will be used as a source of data to populate the Intellog database.  In this way, if it becomes necessary to determine what the previous state of the data was, it’s all there for the taking.  Surprisingly, this was an approach used in an early (early) version of this logic, and eventually abandoned.

But there was also a new idea, inspired by the newly ‘discovered’ reiterated records found in the five line version of the ST1.  It was to create an emulation of the reiterated records in the older, four line version of the ST1.  This can be done so long as an amendment relates to a well for which the original license was issued on or after 2001.  The previous version of the license record can be used as a basis for creating a homebrew amendment record which works pretty much like the amendment record in the five line ST1.  The previous version is available in ERCB.LicenseRecord because — ta da! — the records don’t get thrown away after each ST1 is loaded.

With the basic strategy in hand, it became necessary to create an ERCB.putLicenseRecord stored procedure, which takes the ST1, and puts records in ERCB.LicenseRecord.  In ST1s from the four line era, this stored procedure also creates the emulated amendment records described above, as well. 

The final job was to build a VIEW called ERCB.Well, which uses a series of nested subqueries to isolate the new well label (AKA the UWI) and the previous well label.  Both are needed; the previous label to populate the WHERE clause of the UPDATE statement, and the new label to populate the new value associated with the SET clause of the UPDATE.

With all of these changes, it was decided it was time to purge all the Alberta-related records (again!) using the E.purgeSessionContent stored procedure, and start the load again from scratch.  Also, and even though it is somewhat minor in scope, the base.getConstantDt() function was created which takes a nm parameter, and returns a DATETIME value retrieved from the base.Constant table and converted to the correct data type.  It was used to retrieve the constant called fiveLineDt which is the cut over date from the four line, to the five line ST1.  This approach is considered preferable to hard-coding the value into the database objects which depend on it.

The next thing to do is revamp ERCB.putWell, and the other stored procedures which will populate the Intellog database from ERCB.LicenseRecord.  ERCB.putWell will also employ the ERCB.Well view to rename wells, if necessary.

Posted on 13th June 2008
Under: Developers' Journal | 2 Comments »

Response to Minister Knight’s Letter (2008-06-10)

Recently, Alberta Energy Minister Mel Knight provided a letter in response to an email sent to him in which he was urged to make Alberta petroleum well identification information freely available to the general public.  In this author’s opinion, Minister Knight’s letter fell well short of a satisfactory response.  This new letter to Minister Knight provides additional information and a series of questions for the Minister.  A copy of the letter is provided here, or by clicking the PDF icon at the bottom right.  Your comments and questions are welcome and encouraged, and thank you very much for your support.Click here for PDF version.

Posted on 12th June 2008
Under: Business Development, Data Sources, ERCB | No Comments »