Archive for April, 2008

Trans-Jurisdictional Well Types and the Concept of Translation

The project is at the stage now where it’s necessary to mesh data structures that were never designed to be meshed.  Specifically, the mechanism used to code various characteristics of the well in each provincial jurisdiction varies significantly because there was little if any dialogue between the jurisdictions when the coding was created.  But then again, you shouldn’t really expect there to be; each jurisdiction has their own particular agenda and population to serve.  Whether other jurisdictions or populations are well served was at the very best would have been a secondary consideration, or was, more likely, of absolutely no concern at all, to those creating the codes.

It is therefore necessary to implement a structure that permits the mapping of one set of codes to the others.  It occurred to me that this is really the same problem as the translation of human languages.   In that problem domain (and setting aside the issue of different character sets, for the time being) a translation is a mapping of one string of words to another string of words.  In the case of well coding, you are mapping one string of characters to another string of characters.  While it may seem like overkill, I created a series of four tables that fully implement the translation concept.  Interestingly enough, I believe it to generic enough to be used for the trans-jurisdiction coding problem which is the subject of this post, and the maintenance of human language translations as well.

base.Translation  This table contains the strings of characters which form the basis of the translation; fromTxt contains the text from which you are translating, and toTxt is the text to which you are translating.  Both are NVARCHAR(100).    In the case of well codes, for example, the BC code for multi-zone gas is MGAS and would therefore be the fromTxt, and this is going to map to a generic substance code GAS, which in turn, will be the toTxt.  But of course, these strings of characters are simple enough that overlap with other problem domains is fairly likely.  Hence…

base.TranslationCollection  An instance (ie. record) within this table is the parent of zero or more base.Translation records, and the two tables are related through Translation.id_TranslationClass >> TranslationClass.id.  The combination of fromTxt and toTxt must be unique for a given instance of TranslationCollection.  Continuing with the well code example from above, the TranslationCollection.nm would be BC.  A second instance of TranslationCollection.nm will be MB.  So any overlap between coding schemes of the two jurisdictions has been eliminated through the TranslationCollection instances to which a given Translation is related.  The problem which results, though, is that BC and MB are likely values in a wide variety of scenarios, and thence…

base.TranslationCollectionClass  This table groups of TranslationCollection instances through TranslationCollection.id_TranslationCollectionClass >> TranslationCollectionClass.id.  Again, using the well code example, the two instances of TranslationCollection (BC, MB) would be the children of an instance of TranslationCollectionClass.nm of Produced SubstanceTranslationCollection.nm must be unique for a given TranslationCollectionClass.  There is still a slight possibility Produced Substance concept may conflict across problem domains.  To address this, the table…

base.Schema  identifies the problem domain to which a given TranslationCollectionClass belongs through TranslationCollectionClass.id_Schema >> Schema.id.  To complete the example from above, the Schema.nm to which the Produced Substance instance of TranslationCollectionClass belongs would be Energy.  The Schema could also be addressed through the its lbl, which is simply ETranslationCollectionClass.nm must be unique for a given Schema, and both Schema.nm and Schema.lbl must be unique.

Sharp-eyed readers will notice that the foreign keys for TranslationCollection, TranslationCollectionClass and Schema are all based on an INT key id.  However, a decision was made to use a UNIQUEIDENTIFIER uid primary key for the base.Translation table, because it’s felt that records in this table could well be developed on a distributed basis and brought together at some subsequent time.  This will be particularly true if the structure described above is used for human languages, as opposed to short lists of codes.

Posted on 30th April 2008
Under: Developers' Journal | 1 Comment »

Optimizing the STEM.putIdentifies Stored Procedure

In a previous Developers Journal post, the extremely slow execution of the STEM.putIdentifies stored procedure was identified as a significant problem.  I picked up yesterday’s work from the file Working Queries 2008-04-29.sql and started by reapplying the index to the WELL_NAME column in the STEM source data.  Also, I tried various other new indexes, all of which made little or no difference to the performance of the query.

Finally, I stumbled on an interesting fact.  The primary key of the E.identifies table was badly fragmented (97%), so I simply rebuilt it, and suddenly, the portion of the query which was causing the fundamental issue was running much more quickly.  But likely still not quick enough to be totally satisfactory.  I made the decision to query required columns and records out of E.IdentityCurrent into a temporary table (#tbl), and subsequently use this table in the two queries that actually insert the new records in the E.identifies table.

This ‘caching’ approach resulted in a stored procedure that runs in about 15 seconds, as opposed to over 2.5 hours for just the first of two queries, noted previously.  Job done.  The only thing left to do at this point was save all the temporary work in Working Queries 2008-04-30.sql, and incorporate STEM.putIdentifies stored procedure into the STEM.dtsx package.  Of course, it will also be necessary to migrate the changes made in STEM.putIdentifies back to the OGC.putIdentifies, but that can wait until the next load of BC data.

Posted on 30th April 2008
Under: Developers' Journal | No Comments »

Getting Started on Alberta Data Load

While waiting for some of the longer queries to run, and waiting on the answer to my question to STEM, I thought it would be a good time to get started on the load of Alberta data.  The ERCB publishes daily license and drilling reports, and these area available right back to the beginning of 2001.  I downloaded all of these, unzipped them and put them in a folder hierarchy, and did some manual inspection of data.  I also resurrected some old code used to parse these text files, and began to update it to get it up to the current coding standards.  By loading these files in order, and apply the updates methodically, it will be possible to have a perfectly acceptable table of Alberta data at least for any well licensed since 2001-01-01.

Posted on 30th April 2008
Under: Developers' Journal | No Comments »

Continuing to Automate the Loading of Manitoba Data

Continuing the work from the previous post, which involved methodically going through the SQL Server Integration Services (SSIS) package that was developed for BC (called OGC.dtsx), and building the equivalent steps for the Manitoba data.  The STEM.putLocates stored procedure was implemented and incorporated into STEM.dtsx, and the work on STEM.putIdentifies was started.  However, there was a problem with the two queries which make up this latter stored procedure; they are running really slowly.  The first query completed successfully, but it took it took two hours and 36 minutes to run!  Actually, in the grand scheme of things, this is acceptable, as it can run unattended, and will probably only be triggered once per week.  But it makes testing it a real bugger, so it has to be speeded up.

Started the optimization work by picking apart the query, and it turns out that it’s the OUTER JOIN to E.IdentityCurrent that is causing all the grief.   Comment that portion of the query out, and it runs lickety-split.  Uncomment it, and it’s back to over two hours, or at the very least an unreasonably long time.  Thought that the problem could be solved by adding an index to the WELL_NAME column in STEM.Well, because it was involved in the JOIN, but there was no impact.   Then decided to take more drastic action, and created a table variable the cached a subset of E.IdentityCurrent, and used that in the JOIN.  Stored the work to that point in the file in Working Queries 2008-04-29.sql.

Also, ran into what might be a gap in the Manitoba data that was sent — it doesn’t seem to contain the actual substance being produced (oil, gas, etc.).  This particular question was addressed to the contact at STEM.

Posted on 30th April 2008
Under: Developers' Journal | 1 Comment »

Utilizing the base.toLatLong Stored Procedure for Bulk Conversions

In the previous post to the Developers’ Journal, a stored procedure called base.toLatLong was created (in C#, no less), which called the TerraServer web service to convert UTM co-ordinates into lat/long.  Now it was time to use it to convert the 6000+ distinct sets of co-ordinates provided by  STEM into their lat/long equivalent.   Some initials tests indicated performance was pretty good.  1000 UTMs where converted in 129 seconds, which is ~7.5 conversions per second — certainly acceptable in light of the fact there will only be a relatively small number of conversions per update from source data.   The conversion of the full set of 6,128 locations to lat/long took 51 minutes and 32 seconds, or an average of nearly two conversions per second.  Although obviously not quite the ~7.5 experienced earlier, it is still perfectly acceptable.

The logic to do the conversion was wrapped up in a stored procedure called STEM.putPoint.  It looks at the base.Point table and determines which are new points and creates a SQL cursor which iterates once for each new point.  In each iteration, it calls the base.toLatLong stored procedure, which does the conversion to lat/long, and then inserts both the UTM and the lat/long in the base.Point table.  The logic for storing both was explained in an earlier post to the Developers’ Journal.  As part of this development, a user-defined function, base.getAmtTxt, was developed that does a standard conversion of a float value to text, with lots of decimal precision.

At this point, it’s simply a matter of methodically working through the conversion in a manner that parallels the work done for BC.  In this session, this involved creation the STEM.Well view — which takes the spreadsheet data provided by STEM, JOINs it, and renames the columns so they are in keeping with standard Intellog column names.  The theory being, if anything changes with respect to how STEM provides the data, all that should need to be recorded is the STEM.Well view. 

Following this, the STEM.putWell and STEM.putIdentity stored procedures were completed, and the STEM.putLocus stored procedure was started, although not completely finished.

Posted on 29th April 2008
Under: Developers' Journal | 1 Comment »

Well Data Provider Update #4

This is the fourth in a series of updates (see #1, #2 and #3, posted previously) regarding arrangements with the government agencies responsible for the collection and dissemination of oil & gas data in the Western Sedimentary Basin (WSB) and other areas.  From west to east;

British Columbia — No update from #1, #2 and #3, however, this work is pretty much done.

Alberta — In the previous update, it said a meeting was scheduled with the Level III* manager on Friday, 2008-04-18.  It was disappointing to find it was essentially a reiteration of the previously stated position.  It would appear further discussion with ERCB staffers in the chain of command is not likely to bear fruit in a reasonable period of time.  Although it’s more effort, a good workaround for the testing period is to build the Alberta portion of the database using well license information, which is already in the public domain, back to 2000-01-01.  Intellog has built the logic to parse these text files and turn them into the standardized format for redistribution.  This will provide more than sufficient data during the testing period, during which the work toward a resolution of the issue with the ERCB can continue.

Saskatchewan — Update received today from our contact at the Saskatchewan Ministry of Energy & Resources indicates they are still on track to implement a new FTP server from which the required data can be downloaded.

Manitoba — Currently in progress, and is written up in lots of detail in current posts to the Developers Journal.

and, for the first time…

Newfoundland and Labrador — The Canadian-Newfoundland and Labrador Offshore Petroleum Board publishes weekly reports, and an initial enquiry has been made as to the possibility of adding this activity to the Intellog database.

Please feel free to comment below.  Getting a solid database of well identification data into the public domain really is in everybody’s best interest, and we really appreciate the help!

One other point; there is no specific reason why Intellog is limiting itself to the WSB, other than that’s where we’re based and with which we are most familiar.  If you know of another jurisdiction that has well data available and would value the Intellog approach to presentation and re-distribution, by all means, get in touch, and thank you, again, for your help.

*These is not an ERCB designation, but rather a method for readers to understand how many levels of management are being engaged in this discussion.  Level I is where our enquiry entered the system, which reports to Level II, and so on.

Posted on 28th April 2008
Under: Business Development | 1 Comment »

Calling a TerraServer Web Service from a Stored Procedure

Rather than design, implement and maintain logic for the conversion of UTM coordinates to lat/long, it makes sense to take advantage of the many web services that provide this functionality.  The one chosen for this application was from TerraServer, and the specific web wethod was ConvertUtmPtToLonLatPt.  The challenge was to make this web service accessible from within SQL Server as a stored procedure.  This implies the use of C# to implement the stored procedure, which Microsoft turbidly refers to as CLR IntegrationThere’s a pretty good Microsoft article on the subject, and there is also an article by David Hayden that provides pretty good guidelines as well.  The latter, not only for the stored procedure part of it, but also, for calling a web service.  Co-incidentally, Mr. Hayden’s example uses TerraServer as well, making it particularly well suited to this work 

To start off, the .NET Framework Tools include a dandy tool, wsdl.exe, that generates a C# source file which is a proxy class for the web service itself.  In simple terms,  it creates code which emulates the behaviour of the web service which executes on the local host.  Borrowing heavily from Mr. Hayden’s example, modifying is slightly for this particular application, the command-line syntax is as follows;

wsdl.exe /o:TerraServer.cs /n:TerraServer http://www.terraserver-usa.com/TerraService2.asmx

which takes the definition of the web service found at the url, generates a .cs file as specified with the /o: option, and incorporating the  namespace of TerraServer, specified with the /n: option.  The resulting .cs file then needs to be turned into a .dll with the C# compiler, as follows;

csc.exe /target:library TerraServer.cs

The resulting  TerraServer.dll was added as a reference to the C# project within Visual Studio 2005, and the line

using TerraServer

was added to the C# source code. It’s now possible to build source referencing the web service, right down to getting all that good IntelliSense stuff, just as if it were a local class.  Here’s the source code which illustrates the mechanism to make the C# code accessible from SQL Server, and also, how the web service is called;

using System;
using System.Data;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using TerraServer;

public class HelloWorldProc
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void toLatLong(SqlDouble eastingAmt,
		out SqlDouble latitudeAmt, out SqlDouble longitudeAmt,
		SqlDouble northingAmt, SqlString zoneLbl)
    {
        TerraService ts = new TerraService();
        UtmPt utmPt = new UtmPt();
        utmPt.Zone = 14;
        utmPt.X = (Double)eastingAmt;
        utmPt.Y = (Double)northingAmt;
        LonLatPt lonLatPt = new LonLatPt();
        lonLatPt = ts.ConvertUtmPtToLonLatPt(utmPt);
        latitudeAmt = lonLatPt.Lat;
        longitudeAmt = lonLatPt.Lon;
    }
}

This source code was given the entirely cliché name HelloWorld.cs and then compiled with the following syntax;

csc.exe /target:library HelloWorld.cs /reference:TerraServer.dll

Which churns out HelloWorld.dll.  The final step, prior to going over to SQL Server, is to use the sgen.exe  utility (also provided as part of the .NET Framework Tools) which "creates an XML serialization assembly …in order to improve the startup performance…when it serializes or deserializes objects of the specified types"  Now that sounds kind of optional, but it seems as though it’s mandatory when accessing a web service from within SQL Server.  In any event, the syntax for the utility was;

sgen.exe /a:TerraServer.dll

which automatically creates TerraServer.XmlSerializers.dll.  It’s important to note all three .dll files created in the steps above were located in one folder.  It’s not clear whether that’s mandatory, convenient, or necessary at all.  But that’s the way this particular exercise was set up.  The Visual Studio/C# part of the exercise is now done.

There are a couple of things that need to be checked on the instance of SQL Server used.  CLR Integration is disabled by default, presumably for security reasons.  Microsoft has a good, simple article on how this is enabled.  Secondly, you have configure your database so that it’s deemed to be trustworthy, which is done with the Transact-SQL (T-SQL)  syntax;

ALTER DATABASE [myDatabase] SET TRUSTWORTHY ON

In a database which is accessible to the public, implementing the steps noted above may not be a good idea.  But, if the database is accessible, there’s a reasonable chance the management of the database is in the hands of a DBA, anyway, so the problem can simply be assigned to them to solve.  Now you have to issue the T-SQL commands to create the link between the stored procedure and the .dlls, as shown below;

USE [myDatabase]
CREATE ASSEMBLY HelloWorld
	FROM 'C:\HelloWorld.dll' WITH PERMISSION_SET = UNSAFE
CREATE ASSEMBLY [TerraServer.XmlSerializers]
	FROM 'C:\TerraServer.XmlSerializers.dll' WITH PERMISSION_SET = SAFE 

It’s looks really scary to have PERMISSION_SET=UNSAFE, but all it’s indicating is the stored procedure will be reaching outside of the local SQL Server to accomplish its objective, which again, appears to be for security reasons.  You can get around this with certificates and the like, but for the purposes of this development exercise, it was chosen to avoid this route.  The next step is to create the procedure object within SQL Server, as follows, using T-SQL;

CREATE PROCEDURE toLatLong
	(
	@eastingAmt FLOAT
	,@latitudeAmt FLOAT OUTPUT
	,@longitudeAmt FLOAT OUTPUT
	,@northingAmt FLOAT
	,@zoneLbl NVARCHAR(3)
	)  AS EXTERNAL NAME HelloWorld.HelloWorldProc.toLatLong

The setup steps required are now complete.  There should now be a stored procedure accessible like any other stored procedure within SQL Server.  The only differences, of course, are the fact the procedure is written in C#, and the heavy lifting of the stored procedure is actually implemented by an external web service.  Here’s a sample of how the stored procedure would be called with T-SQL;

DECLARE @longitudeAmt FLOAT
DECLARE @northingAmt FLOAT

EXEC toLatLong
	@eastingAmt = 368437.78
	,@latitudeAmt = @latitudeAmt OUTPUT
	,@longitudeAmt = @longitudeAmt OUTPUT
	,@northingAmt = 5430829.5
	,@zoneLbl = '14U'

SELECT
	@latitudeAmt
	,@longitudeAmt 

Sharp-eyed readers will notice the zone is being passed as an NVARCHAR, and yet, the C# code ignores it and supplies its own value for the UTM zone.  It’s simply an implementation detail that got overlooked prior to writing up this post. The C# code should, of course, parse out the numeric portion of the UTM label and pass that through to the web service, rather than hard-coding. 

Finally, if and when the time comes the stored procedure is no longer required, here’s how to remove it from the database.

USE [myDatabase]
DROP PROCEDURE toLatLong
DROP ASSEMBLY [TerraService.XmlSerializers]
DROP ASSEMBLY HelloWorld

Posted on 27th April 2008
Under: Developers' Journal | 4 Comments »

Beginning the Automation of Load Procedures for Manitoba Data

The workflow to process the raw, spreadsheet-based data provided by Manitoba Science, Technology, Energy and Mines (STEM) into the format ready for the Intellog website is being facilitated with SQL Server Integration Services (SSIS).  As a step in the process is identified and coded, it is incorporated into a workflow in SSIS.  To this end, a new SSIS package was created STEM.dtsx.  The first step in processing the STEM data was to automate the import of the Excel spreadsheets into tables that are identical in structure in SQL Server.  This was done with an SSIS Data Flow Task for each of the spreadsheets.   These tasks were proceeded and followed by the stored procedures called STEM.purge and STEM.cleanup respectively.  They clean out any previous contents of the tables, and then do a little post processing of the imported data to clean it up a little.

The next step was to create a unique record in base.Point for each unique physical location described in the STEM data.  The Intellog database requires a contemporaneous conversion to lat/long when a Universal Transverse Mercator (UTM) location is being stored, which is what STEM provides.  To this point, this conversion had been accomplished with a spreadsheet, but it was determined a mechanism better suited to scripting would be required.  The desired objective was a stored procedure or user defined function that could be accessed from T-SQL, and hence incorporated into a stored procedure which can subsequently be incorporated into STEM.dtsx.

Some searching turned up a web service hosted by TerraServer.  In theory, it should be possible to wrap a call to this web service in C#, which in turn can be used to build a stored procedure with managed code.   There was already a Point class in the Intellog class library, so a new constructor was added to it that takes eastingAmt, northingAmt and zoneLbl.  At this point, a call to the web service is initiated, and the latitudeAmt and longitudeAmt properties are then populated with the converted values.  The step beyond this is to incorporate that logic  into a stored procedure written in C#.

A comparison of locations converted using this approach with those done by the spreadsheet indicate they are very, very close.  However, I have sent an email to the author the spreadsheet to get his thoughts as to the accuracy of his own conversion as compared to the TerraServer conversion to help understand where the discrepancy may be coming from.

Posted on 25th April 2008
Under: Developers' Journal | No Comments »

Inspection of Manitoba Data to Validate Conversion Logic

Some inspection of the converted Manitoba data described in the previous post revealed mixed results.  Some of the plotted symbols seem to be located over reasonable artifacts on the terrain, and a significant number appear to be in the middle of nowhere.  This would indicate that either there aren’t any artifacts on the ground, or the Universal Transverse Mercator (UTM) to lat/long conversion logic was faulty. 

There was obviously a need to do a more selective extract from the well table sent by Manitoba Science, Technology, Energy and Mines (STEM), which I was able to do based on the column well.CURRENT_STATUS.  The coding of this column is quite a bit different from BC, which was to be expected.  With the assistance of the contact at STEM (well_status_types.xls), the code Capable of Oil Production (COOP) was chosen as the criteria for extraction.  This resulted in a list of 2,646 wells.  These were put through the now-standard conversion from UTM to lat/long base on Prof Dutch’s spreadsheet, and the results were put in Test UTM Conversions.kml.  This enabled visual inspection with Google Earth (GE).  Good news…much higher correlation between the plotted symbols and artifacts on the ground.  That’s a subjective assessment indicating the conversion is basically on the right track.  The link on the website;

http://www.intellog.com/E/data/well/v0.0/kml/Test UTM Conversions.kmz

has been updated, so if you have a mind to look over the resulting conversion, have at it.  For the time being, the wells are simply identified with a non-specific sequential number and the same green dot for everything.  Putting a more meaningful label and more selective symbology on the well is the next logical step.

Posted on 23rd April 2008
Under: Developers' Journal | No Comments »

Email to Alberta Energy Minister Knight (2008-04-18)

After four months of intense effort to obtain basic well identification information from the ERCB – described in the Well Data Provider Update series of posts – it appeared unlikely that success would be achieved through dialogue exclusively with the ERCB.  As a result, an email was drafted and sent to Alberta Energy Minister Mel Knight, to whom the ERCB is ultimately responsible.  The email outlined the case for making this data available freely available to the public and can be found here, or by clicking the PDF icon below.  Any response from the Minister will be provided in a future post.  You are welcome and encouraged to provide your questions or comments below.Click here for PDF version.

Posted on 19th April 2008
Under: Business Development, Data Sources, ERCB | 2 Comments »