Archive for May, 2008

The SSIS Foreach Loop Container, Logging and ERCB.putAmendment Debugging

As it turns out, it was possible to handle GROUND ELEVATION:  and SURFACE CO-ORDINATES: type amendments using a slight modification to the logic used to handle SURFACE LOCATION: amendments — which makes sense, because they all ultimately result in a change the base.Point instance to which the well is linked.  As previously mentioned, any amendment which involves changing the location of the well requires a recall of the original surface location and offsets.  The logic for generating the iSentence XML to capture all of this information was stored in ERCB.getSurfaceCoordinatesXml() function.  Also, given these types of amendments involve deleting existing and then adding new instances from/to a variety of tables, BEGIN TRANSACTION...COMMIT TRANSACTION syntax was used to ensure the whole change rolled in.  Or didn’t. 

That done, an SQL Server Integration Services (SSIS) Foreach Loop Container was used to control the looping of the load logic and process each of the ST1 files.   A pretty good primer on the subject can be found  on the Microsoft site.  Essentially, it’s just a matter of creating the Foreach container itself, and dragging-n-dropping all of the previously-defined steps into the container.  Then, a variable is set up to capture the name of the current file being processed and feed it to the connection string.  Of course, as groups of ST1s were being loaded, small problems cropped up with ERCB.putAmendment (and related) procedures, but nothing took more than a few minutes to resolve.

In order to monitor the loading of the 2600+ ST1s, logging capability was determined to be important; hence the creation of the base.Log table, and the ERCB.putLog stored procedure to populate it.  This was another use for the iSentence XML namespace.   As a bi-product of this work, the function base.getNullXml() was created.  It’s job is to simply return an iSentence-compliant XML that doesn’t contain anything, other than the namespace declaration, of course.  base.putAmt() and base.putTxt() functions were used to add elements to it, and — voila! — you have a parsable log record which ERCB.putLog can put into base.Log.

The statistics above were used to measure the length of time each ST1 takes to load into the Intellog database, over time.  That was averaging about a minute per ST1 in the early tests, but it should be monitored as the data rolls in.  The view ERCB.WellLicensesIssuedLoad  was created, and used to generate an Excel spreadsheet (LoadStatistics.xls) which charts average load time.  The output of this application can be seen to the immediate left.  So far, the performance is a pretty flat curve, which is good. 

Finally, ABtest.kmz has been refreshed with approximately 13,000 unique locations, for wells licensed during the period 2001-01-01 through the end of 2001-10 inclusive.  It can be viewed with Google Earth.  Its just one, big monolithic file which uses the standard yellow pushpin placemark and the well label.  It can still be used to review the quality of the lat/long conversions, and as a general indicator of the progress being made on the load of Alberta data.

Posted on 30th May 2008
Under: Developers' Journal | 1 Comment »

Implementing Surface Location Amendments with ERCB.putAmendment

The logic for the application of WELL NAME: and UNIQUE ID: amendments was incorporated into the ERCB.putAmendment stored procedure, and discussed in a previous post.  Turning to SURFACE LOCATION: amendments, the logic will necessarily be more complex; it involves disassociating the well from an existing location, creating a new one, and then relating the well to the newly-created location.

It’s also a little fuzzier as to what precisely an amended surface locations implies.  For example, license 0248463 was issued on 2001-01-05, and then amended on 2001-01-08 with a different surface location.  Do the surface co-ordinates from 2001-01-05 still apply?   To fully understand the implications of the question, it helps to think about why surface locations might be amended in the first place.  One reason would be conditions on the ground necessitating a change to the spotting of the well.  If this is the case, a re-survey of the location would be necessary, which would very likely result in new surface offsets.  The only other possible reason for an amended surface location would be the LSD submitted on the original application was simply recorded incorrectly — but, realistically, how often is this likely to happen?

I was feeling lucky, so I submitted the question above to the ERCB Help Desk.  Their response will be reported when it is received, but until then, it is assumed the surface co-ordinates still apply.   Once complete, the newly developed logic for SURFACE LOCATION: was tested for ST1s issued on 2001-01-01 through 2001-02-28.  This resulted in just a handful of SURFACE LOCATION: amendments.  When they were presented on Google Earth, roughly half of them seemed to line up over some artifact on the ground. 

On a positive note, the logic developed for SURFACE LOCATION: will only require slight adaptation to handle GROUND ELEVATION:  and SURFACE CO-ORDINATES:  amendment types, also.   However, the process of running the load logic for a group of ST1s is becoming onerous, so implementing some sort of logic to load multiple files needs to be investigated.

Posted on 28th May 2008
Under: Developers' Journal | No Comments »

When Events Spin Out of Control…(The iSentence XML Namespace)

It’s amazing when sometimes the simplest of problems crops up, and despite the belief it’s going to take five minutes to solve, the day still pretty much slips away dealing with it.  This was the case when I tackled the problem of amendments to surface locations.  I realized I wasn’t capturing the surface offset information;  it was being used to calculate the original lat/long and then thrown away.  Here’s the problem; if a surface location is amended, the surface offsets are needed again to a do a new lat/long calculation.   Whoops.

There were three options; 1) Simply add the necessary columns to base.Point.   This seemed to bloat the fairly clean definition of this table.  Surface offsets also seemed a little too industry-specific for a table associate with the base schema, which is intended for objects which are industry-agnostic.  2) Create a complementary E.Point table that captured the energy-specific information.  Again, this didn’t seem elegant, as there would be a lot of JOINs required between base.Point and E.Point to get anything useful done.  3) The final, and best, alternative was to flesh out the use of the base.Point.xml column (which is of the XML data type) so it could be employed to store surface offset information in those situations where it is required.  Way back during the conversion of UTMs, a brief note was made regarding base.Point.xml, with vague mention of its future use to store data of indeterminate type.   Surface offset information storage is precisely the scenario for which base.Point.xml was intended to be used.

Actually, to achieve the objective of storing surface offset information, not much more was required than to wrap those parameters in some well-formed XML and put it in the base.Point.xml column.  However, past experience indicates a world of hurt ensues when steps are not taken to keep the XML consistent.  Hence the development of the iSentence namespace, which dictates a simple and flexible XML structure for storing numeric and text values.  SQL Server permits the embedding of namespace information in the database using the CREATE XML SCHEMA COLLECTION syntax.  Once this is done, the XML schema can be referenced in object definitions to ensure XMLs meet the standard set out in the namespace.  Specifically, iSentence was associated with the base.Point.xml column, which means that forever more, if XML is stored in that column, it must be iSentence compliant.

With that in hand, it was a matter of coming up with some general purpose functions in input and output values from the XML.  These were base.putAmt(), base.getAmt(), base.putTxt() and base.getTxt() scalar-valued functions, and the base.amt() and base.txt() table-value functions on which they are depend.

Now, I can carry on with ERCB.putAmendment.

What’s in a name?  So what’s up with the iSentence name — where did that come from?  Well, one of the long term objectives of Intellog is to foster an intelligent dialogue of information between industrial communities.  While the concept is not yet fully fleshed out, it’s reasonable to expect any sort of dialogue — human, systematic or otherwise — consists of sentences that are exchanged between parties.  An Intellog iSentence is simply a package of one or more values pairs; each value pair consisting of a name, and an associated value.  In the foggy, not-quite-certain future, it’s possible to visualize an ‘intellog’ consisting of iSentences flowing back and forth between parties.  The use of the lowercase ‘i’ as a prefix is not exactly original, but it seemed like a better choice than simply Sentence.

For the moment, the iSentence is limited to two data types; values and text strings, but will eventually be expanded to include other types such as universally unique identifiers, dates, and the like.    Also, if there are iSentences, is there room in this world for iParagraphs consisting of one or more iSentences?  Quite possibly, but the latter will simply be allowed to evolve if and when necessary.

Also, on a completed unrelated note; it seems like it may have possible to use the iSentence concept to store the UTM-related information as well.  Whether to keep the existing UTM-related structure in base.Point, or replace it with iSentence, is an issued that remains to be resolved.

Posted on 27th May 2008
Under: Developers' Journal | 3 Comments »

There Are Amendments, and Then There Are Amendments

Previous posts talked about amendment events as well-related events which don’t impact the state of the well.  Actually, that’s not exactly true, because there are also amendments which will drive specific changes to the Intellog database.  For example, if the amendment indicates the unique well identifier for the well has changed (and yes, contrary to popular belief, unique well identifiers do change), a change has to occur to the lbl column of E.Well at some point.

The spreadsheet AmendmentAnalysis.xls was used to separate the amendments found on the ST1 into the two categories described above.  The specific amendments which result in a change to the database were then codified in the view ERCB.LicenseAmendmentDetail, which is a view built on top of ERCB.LicenseAmendment.  It creates a series of  columns which, when paired with the LicenseAmendmentDetail.typeTxt column, return information related to the specific database update required.  For example, if typeTxt is populated with WELL NAME:, then the column wellNm can be assumed to contain the amended name for the well.  In the event typeTxt is populated with SURFACE CO-ORDINATES:, eastWestBoundaryCd, eastWestOffsetAmt, northSouthBoundaryCd, northSouthOffsetAmt will be populated with the new values (and wellNm and other detail columns will be NULL).

The coding of the LicenseAmendmentDetail view involved some fairly involved parsing of text, which resulted in the collateral development of the base.getMonospaceTxt() and base.getNumericTxt() user-defined functions.  Also, over the course of the recent development effort, the entire month of January, 2001 was test loaded and reloaded a couple of times.  It was found on the load of the 2001-01-15 ST1, an error was returned in the ERCB.DailyActivityList view.  This was caused by the wacky folks over at ERCB spontaneously embedding an ASCII 9 in the date field.  The resolution to this problem was handled by increasing the robustness of the date parsing logic.

The ERCB.putAmendment stored procedure was created, and it uses ERCB.LicenseAmendmentDetail to apply amendments to the database.  The logic for applying amendments for WELL NAME: and UNIQUE ID: type amendments was implemented and tested.

On an unrelated note, and in preparation for the eventual load of all ST1s, I submitted a request to the ERCB for the ST1s for 2003-12-07 and 2005-06-30, which appear to be missing from the archives they have on their website.  Their comment back was "If there were wells issued on those days, the data should be there."  I thought about it for a moment, and then decided that really wasn’t an answer, and resubmitted the question.  Keep in mind ST1s published on days where no licenses were issued (as is the case on most Saturdays and Sundays, for example) a report still comes out.  It only has a heading a no records, but is still comes out.  But in the case of the two dates above, these are a Wednesday and Thursday, respectively, so it’s reasonable to assume licenses were issued on those days.  So they really are missing, I believe.

Posted on 27th May 2008
Under: Developers' Journal | 1 Comment »

Improving the Performance of ERCB.Point View using a Common Table Expression

In the previous post to the Developers’ Journal, there was lots of talk of surface and bottom lat/longs, but the net of it was a change to ERCB.Point which involved comparing the two fields on the ST1 containing location information and acting accordingly.  No problem, except performance of the view deteriorated dramatically.  This resulted in the ERCB.dtsx package taking hours to run, instead of a minute or two, which was what was being enjoyed previously.

The ERCB.Point view was recoded using a common table expression (SELECT with the WITH clause), and the time for the view to return complete results went from over 37 minutes to about seven seconds.  So, anyone who thinks the SQL Server query optimizer is infallible — it’s not.  It’s good, just not perfect.  In theory, the two different approaches to coding the view should get you the same results, but clearly they do not. 

To be entirely frank, I was so stunned by the difference between the two, I copied the result sets from each, and compared them in Excel to make absolutely sure they were returning the same thing.  They were.  As a result, the ERCB.dtsx package was back to taking much less than a minute to run from start to finish.  ABtest.kmz using the results from the entire 2001-01 period — 2000+ surface locations.

Posted on 26th May 2008
Under: Developers' Journal | No Comments »

Bottom Location vs. Surface Location on the ERCB’s ST1

Something which has been consciously overlooked in the ERCB load logic, to this point, is the situation where the surface location and the bottom hole location of a well have different lat/longs associated with them.  Such is the case with deviated or horizontal wells, for example, and there are lots of them.  The ERCB’s ST1 contains this information but it requires the examination and/or comparison of two fields, and taking one or another course of action based on what the fields contain.

The physical proximity of the UNIQUE IDENTIFIER and SURFACE CO-ORDINATES fields on the ST1 — they are right next to each other — leads to the assumption they are always used together.  This is correct in the case of a vertical well.  However, with deviated or horizontal wells, the SURFACE LOCATION field of the ST1, which is way over on right on the last line of the license record, must be used in conjunction with the SURFACE CO-ORDINATES field to calculate the lat/long of the surface location of the well.

If the SURFACE LOCATION field is not populated on the ST1, then the data in the UNIQUE IDENTIFIER field of the ST1 is used, along with the SURFACE CO-ORDINATES, to calculate the lat/long of the surface location.  On the other hand, if the SURFACE LOCATION field on the ST1 is populated, it must be used in conjunction with the contents of SURFACE CO-ORDINATES field to calculate the lat/long of the surface location.

Up until the great The Great ST1 Format Change of 2003, the SURFACE LOCATION was not populated if it was the same as the UNIQUE IDENTIFIER field.  On and after 2003-09-28, however, it appears as though the SURFACE LOCATION field is always populated, even if it is the same as the UNIQUE IDENTIFIER.  No big deal, but it requires slightly different code to handle this situation before and after the cut over date between the two formats.

The ERCB.Point view had to change so it compared the wellLbl and surfaceWellLbl columns in the ERCB.LicenseIssued view.  Depending on their contents, one or the other is used in the calculation of the lat/long of the surface location.  Pretty easy, except after the change, the performance of the ERCB.Point view seemed to fall off a cliff, even if the results it produced were seemingly accurate.  Test data was generated for the period 2001-01-01 through 2001-01-25, and can be viewed, using Google Earth, at ABtest.kmz.

Posted on 26th May 2008
Under: Developers' Journal | No Comments »

An Amendment Surprise

What with the denormalization and well event stuff out of the way, attention finally turned to the application of well amendments to the database.  Almost immediately, something unexpected came up; instead of a two-line format for each amendment detail, there can be three (or more?) lines.  On the ST1 for 2001-01-05, for example, the amendment to license 0245929 related to its PURPOSE: is two lines.  Unfortunately, this drove a truck through the logic by which amendment information was being parsed, which was based on an odd-line, even-line type idea.

This necessitated a recode of ERCB.LicenseAmendment so it is a little more sophisticated in determining which parts of the amendment detail represent the token (ie. the PURPOSE: part), and which parts represent the new information associated with the token.  The basic idea was to isolate the part of the ERCB._DailyActivityList containing amendment information, filter out everything but the values, then use the newly-created ERCB.getLicenseAmendmentTxt() function to roll all the lines of value information into a single string of text.

It’s intended to feed the ERCB.putEvent stored procedure the amendment information and have it create a AMD type event (id = 113).  The only thing missing was information expressing all of the amendment information.  Similar to ERCB.getLicenseAmendmentTxt() function above,  the ERCB.getLicenseAmendmentDsc() function was created, and used to populate the newly-added dsc column in the view ERCB.LicenseAmended.   getLicenseAmendmentDsc() rolls together all of the amendments for a given well license into one string of text.

Posted on 23rd May 2008
Under: Developers' Journal | No Comments »

Stateful and Stateless Well Events

The development of the ERCB load logic furthered the evolution of three different (but related) collections of well event types; statusCd, substanceCd and typeCd.  When events of any of these types occur, they reflect a change in the state of the well.  Therefore, interest is not only in the events themselves, but the order in which they occur.  In particular, the ‘most-recent-event-not-in-the-future’ is of specific interest, as it represents the current state.   These collections of types are grouped together as one TypeCollectionClass called Well Events - Stateful (id = 6).

When tackling amendments to well licenses, however, a new class of well event was required which is different in a small but important way.  These are events which do not necessarily impact the state of the well.  For example, you want to keep track of the fact the well license was amended, but this does not necessarily change the state of the well.  To capture this concept, a new instance (record) of base.TypeCollectionClass was created called Well Events - Stateless (id = 7).  An instance of base.TypeCollection was also created, called documentationCd (id = 27), which was related to the newly created Stateless class.  Finally, a new instance of base.Type was created (AMD, id = 113) which was related to the new documentationCd collection of types.  In the future, other events of this type might be ’service program requested’ and ’service program received’, for example.

To help keep track of the entire TypeCollectionClass-TypeCollection-Type hierarchy , the base.TypeDefinition view was created which pre-JOINs these tables and presents both the names and the identifiers associated with them.  Also, the E.WellEvent view was created to bring all of the well event-related information into one table-like structure.

Posted on 23rd May 2008
Under: Developers' Journal | No Comments »

Denormalization for Performance Improvement

To achieve improved load performance, the table ERCB._LicenseIssued was created as a denormalized analogue for the view of the same name*.  The stored procedure ERCB.refresh_LicenseIssued was then created to refresh the table from the view.    However, in rethinking this decision a little bit, it actually made more sense to apply this logic to the ERCB.DailyActivityList, as it is the table the view ERCB.LicenseIssued is built on, along with other views like ERCB.LicenseCancelled, ERCB.LicensedAmended and the like.  So, by denormalizing ERCB.DailyActivityList, you achieve improved performance with all of the views, as opposed to just LicenseIssued.

So, the denormalized ERCB._DailyActivityList table was created, along with the stored procedure ERCB.refresh_DailyActivityList which refreshes the table from the ERCB.DailyActivityList view.  To keep things clean, ERCB._LicenseIssued and ERCB.refresh_LicenseIssued were then deleted.  This broke a fair number of the stored procedures (such as ERCB.putIdentity), but these were fixed quickly and easily by running ERCB.dtsx in debug mode.  This change to the denormalization logic got the ST1 load procedure down from five or more minutes to about 40 seconds.  A worthwhile investment of time and energy, it would seem.

*The underscore character is used in this situation to differentiate between tables and views that are identical to each other.  The underscore denotes the denormalized table.

Posted on 23rd May 2008
Under: Developers' Journal | 1 Comment »

New Well Symbols, and Further Development of the ERCB Load Logic

Time to grind back into gear after the long weekend, and finish up the load logic for Alberta data, right through to displaying the correct well symbols on Google Earth (GE).  Some inspection of the data in the early, four-line version of the ERCB’s ST1 seemed to indicate neither the substance or the type of well can be precisely determined.  I suppose this is why the five-line format for the ST1 was used from 2003-09-28 onwards.

The decision, therefore, was to add a family of three new symbols; LIC-UND-(null).png, CNC-UND-(null).png, and REE-UND-(null).png.  The white dot*, which is the foundation of these symbols, reflects the fact the substance is undefined, and that’s what will get displayed on GE.  Decorations were added to reflect the three states of licensing; issued, cancelled and re-entered, respectively.  Once completed, these symbols were uploaded to the Intellog web server, and added  to the Well Symbol Library.

Instances (records) reflecting the new symbols were created in the base.Document, E.WellSymbol and E.elucidates tables, the structure and relationship of which were described previously.  In addition, a decision was made to handle E.WellSymbol.substanceCd slightly differently than statusCd and typeCd.  It was made NOT NULL, so it must always be populated with a value, even if that value is UND (undefined) as symbolized by the white dot.  It just seemed to reflect the fact the substance (ie. colour) of the well is pretty fundamental to it’s presentation on any sort of map.  Therefore, you always need to say something about it.  statusCd and typeCd remain nullable, as it’s deemed acceptable for this information simply to be missing.  These codes only govern decorations that may appear on the base symbol, and are therefore ‘less important’ than the substance.

The big part of the job was modifying the ERCB.putEvent stored procedure so that it created an ‘event’ which defines the substance.  I realize it sounds like an odd way to do it, but it does allow the substance to be modified, and to keep track of when the change was made.  This thinking may be a little longer term than absolutely necessary, but seemed reasonable, in the context of everything else.

Somewhere along the way, the E.purgeSessionContent stored procedure was hacked together.  It takes a @uid_Session parameter, and purges all content created within the context of the session; which is to say any instance (record) of any table where uid_Session_create points to the the @uid_Session provided as a parameter to the procedure.  E.purgeSessionContent was then used to purge out all the data loaded for Alberta on a test basis, so the methodical loading of each ST1 can commence.  There may be a few more false starts, in which case the procedure can simply be re-employed.

The spreadsheet ERCBDataLoadTesting.xls was created, and used to keep track of the Alberta data as it is being loaded.  The spreadsheet is populated with some ad hoc queries temporarily stored in SQLQuery11.sql.  Managed to get up to and including 2001-01-08 loaded, which generated 415 unique well locations.  These well locations were used to generate a refresh of MultiSurfaceOffsetTest.kmz (using the queries found SQLQuery27.sql).  The results were uploaded to the server and are available for viewing.

The test loads were stopped at this point, as I hit the first amendment for which there is some action required — for license 0248449 — and logic for handling amendments is not yet in place.

*This is based on the symbol used for an early load of the BC data, which was a white dot with a question mark it in.  It was felt at this time, however, that the question mark wasn’t really necessary, and only contributed to the visual clutter.

Posted on 22nd May 2008
Under: Developers' Journal | No Comments »