Continuing with Development of the ERCB Load Logic

Continuing with the development of the logic to load data from the ERCB’s daily ST1, the ERCB.putInstant procedure was created.  It simply queries ERCB._LicenseIssued.dt column, and creates an instance of base.Instant, if necessary.

It was then time for the ERCB.putEvent stored procedure.  This creates an instance of base.Event, as you would expect, from the name of the procedure.  This is linked to the previously created base.Instant instance though base.defines (’Instant-defines-Event‘…get it?).  ERCB.putEvent then creates an instance of E.affects, which links the event to E.Well (’Event-affects-Well’).   The presence of a well in the ERBC.LicenseIssued view triggers the creation of a license event (id_Type = 92), and the presence of a well in ERCB.LicenseCancelled triggers the creation of cancellation event (id_Type = 84).

I was under a slightly incorrect impression of how the re-entry region of the ST1 works.  I had assumed it was referring to a previously issued well license.  But it turns out, it refers to a license in the same ST1.  This led to the conclusion there are really two types of licenses issued; ‘normal’ ones, and re-entry licenses.  To handle this, a new instance of base.Type was created (id = 111), to reflect the re-entry license.  Logic was then added to ERCB.putEvent which LEFT OUTER JOINs ERBC._LicenseIssued and ERCB.LicenseReentered.  If there is a ‘hit’ on LicenseReentered, id_Type = 111 is used, otherwise id_Type = 92.   The steps above were then edited into ERCB.dtsx.

This all tested OK, so the ST1 for 2001-01-03 was used as a target for the logic, and it seemed to work just fine.  It resulted in a combined total of 187 wells.  SQLQuery27.sql was modified to JOIN E.Well, E.locates, base.Locus, base.delimits and base.Point and generate the necessary KML tags.  The results were placed in MultiSurfaceOffsetTest.kmz.  The lat/long conversion logic seem to be holding up well, with lots of them lining up right over artifacts on the ground when viewed with Google Earth.

Next up, implementation of events for typeCd and substanceCd.  Sadly, the ERCB didn’t start specifically identifying the substance on the ST1 until 2003-09-28.  On that date, they went from a four line format to a five line format, and started to include substance, as well as drilling operation, well purpose, and well type.  For the period from 2001-01-01 though 2003-09-27, therefore, something else will have to be used to determine these characteristics for the well.  I think it will be possible to get this from ERCB.LicenseIssued.laheeClassificationDsc.

Posted on 16th May 2008
Under: Developers' Journal | No Comments »

Well Data Provider Update #5

This is the fifth in a series of updates (see #1, #2, #3, and #4 posted previously) regarding arrangements with the government agencies responsible for the collection and dissemination of oil & gas data in the Western Canadian Sedimentary Basin and other areas.  From west to east;

British Columbia — No update from #1, #2, #3 and #4, however, this work is pretty much done.  The test data can be viewed at BC.kmz.

Alberta — There are lots of current posts in the Developers’ Journal describing the ongoing work to load Alberta data from the daily Well Licenses Issued (ST1) report.  A small test sample can be viewed at MultiSurfaceOffsetTest.kmz.  It doesn’t have any of the symbology or other detail, but clearly demonstrate’s Intellog’s capability to load the ST1 information into our database and subsequently onto Google Earth

At the same time, the effort continues to establish a working relationship with the ERCB which is equivalent to that which is enjoyed with the other jurisdictions.  Given the stagnation of discussions with ERCB staffers noted previously, a letter has been sent requesting the assistance of Alberta’s Minister of Energy, Mr. Mel Knight.  The letter itself will be tabled in the Alberta Legislature, and we’re also receiving the assistance of Mr. Harry Chase’s and Ms. Alana DeLong’s constituency offices in pursuing this matter.  Please contact us if you would like a copy of the letter.

Saskatchewan — The new FTP server mentioned in the previous update is up and running, and the data for Saskatchewan has been downloaded.  The work to load it into the Intellog database will commence immediately following the Alberta test load, described immediately above, is complete.

Manitoba — Similar to BC, this work is pretty much done for the time being, and the test data can be viewed at MB.kmz.

Newfoundland and Labrador — The data from the Canadian-Newfoundland and Labrador Offshore Petroleum Board has been obtained, and will be loaded into the Intellog database shortly.

Please feel free to comment below.  Getting a solid database of well identification data into the public domain really is in everybody’s best interest, and we really appreciate the help!

One other point; there is no specific reason why Intellog is limiting itself to the WSB, other than that’s where we’re based and with which we are most familiar.  If you know of another jurisdiction that has well data available and would value the Intellog approach to presentation and re-distribution, by all means, get in touch, and thank you, again, for your help.

Posted on 16th May 2008
Under: Business Development | No Comments »

Reverse Engineering of OGC and STEM Logic to ERCB

The ERCB.LicenseIssued, ERCB.LicenseCancelled and ERCB.LicenseAmended views all query the ERCB.DailyActivityList to return rationalized, consistently-name tables for their respective regions of the ST1.  These will be used when the OGC and STEM load logic is reverse-engineered onto the ERCB data.  There was just one other region in the ERCB.DailyActivityList which needed to be parsed out, which relate to well re-entries.  Hence the new view ERCB.LicenseReentered, which retrieves this information.  In addition to these four, one other view was created, ERCB.LicenseAmendment containing the details of the amendments, given there can be more than one amendment to a license on a given ST1.

The general idea for the loading of ERCB ST1 data into the Intellog database is to import the first ST1, fully process it, dump it, load the next one, process it, and to do so chronologically from oldest to newest.  This was all going along swimmingly until it was discovered the ERCB.LicenseIssued view was taking about five minutes to complete, and it is called multiple times during the load.  This problem was solved by creating an analogue table for the ERCB.LicenseIssued view, called ERCB._LicenseIssued.  (The leading underscore is the method used to identify the special relationship between the view and the table.)  The ERCB.refresh_LicenseIssued stored procedure flushes out any previous content of ERCB._LicenseIssued and reloads it from ERCB.LicenseIssued.  Retrieving results from ERCB._LicenseIssued is virtually instantaneous.

OK, now — for sure! — all of the preliminary work is done, and it’s possible to methodically go through the stored procedures created from OGC and STEM and adapt them to the ERCB load.  This was just a matter of referring to OGC.dtsx and STEM.dtsx diagrams, and creating the equivalent code in a new package, ERCB.dtsx.  The stored procedures ERCB.putWell, ERCB.putPoint, ERCB.putIdentity, ERCB.putLocus, ERCB.putLocates and ERCB.putIdentifies were all created using this methodology.

Somewhere along the line, the table base.Constant table was created.  It simply stores constants which are used throughout the application.  Its first use was to store the conversion factor for  miles to kilometres, and was employed in the ERCB.Point view.

Posted on 16th May 2008
Under: Developers' Journal | No Comments »

Finishing Up the Conversion of Lat/Long from Surface Coordinates with the ERCB.Point View

The final step to convert the ERCB’s ST1 surface co-ordinates to lat/longs was to wrap all the logic discussed in previous posts so it can subsequently be used in a stored procedure similar to OGC.putPoint.  putPoint stored procedures create an instance in base.Point for each unique point in the source data. 

To this end, the newly-created ERCB.Point view takes surface co-ordinate information from the ERCB.LicenseIssued view, and combines it with a JOIN to ATS.SectionBenchmark() to look up the lat/long of the northeast corner of section.  It then returns the lat/long after calculating their values with the getLatitudeAfterOffsetAmt() and getLongitudeAfterOffsetAmt() user-defined functionsElevation information is also provided in the ST1, so it is also returned in ERCB.Point, as is the wellLbl.  

After some fiddling with units and the sign of the offsets, ERCB.Point finally worked as advertised above, and a test file was generated.  It can be be viewed at MultiSurfaceOffsetTest.kmz.  The script to generate the test file was temporarily stored in SQLQuery27.sql.

Posted on 16th May 2008
Under: Developers' Journal | No Comments »

Lat/Longs from Surface Offsets…a Blinding Flash of the Obvious

OK, it’s one of those things where a good (or even a so-so) night’s sleep leaves you with quite a different view of the problem than you had the previous day.  In yesterday’s post, I talked a lot about great circle distances, averaging lat/longs and bunch of other things which may well be unnecessary.  It then dawned on me (in the dawn’s early morning light, I suppose); sections are a constant size — one mile square, regardless of latitude.

Duh.

Therefore, for the purposes of this initial round of calculated lat/longs, I am assuming sections are exactly one mile on each side.   If surface offsets are provided from the north and eastern boundaries of the section, and you have the lat/long for the NE corner (virtually all sections have such a benchmark), you can do a fairly simply calculation down and over from the NE corner, and there’s the lat/long.  It’s the same if the offsets are from the other boundaries, except you have to subtract the offset from a mile, first.  (And do the imperial to metric conversion, of course).

Two C# user-defined functions were created for SQL Server; getLatitudeAfterOffsetAmt(latitudeAmt, offsetAmt) and getLongitudeAfterOffsetAmt(latitudeAmt, longitudeAmt, offsetAmt)   These return lat/long values with the offsets applied.  The apparent lack of symmetry between the two is based on the fact latitudes can be assumed to be a constant width apart from equator to pole, whereas distances between lines of longitude get shorter towards the pole.  So the getLongitudeAfterOffsetAmt() requires both the longitude and the latitude, in order to produce an accurate calculation.  By convention, positive offsets are considered to be north and east, and negative offsets are considered to be south and west.

To isolate benchmarks at specific positions within the section, the table-value function ATS.SectionBenchmark() was created.  It can be passed the code for the particular benchmark, and it returns only those specified.  For example, offsets calculated from the benchmark in the northeast corner of the section would use the SELECT ... FROM ATS.SectionBenchmark('NE') syntax.

Using these new functions, I roughed up a single well test, and the results appear to be pretty good.  The test file can be viewed here, if you’re interested. It does require Google Earth (GE) be installed.  In the test file, the placemark named NE is taken directly from the Alberta Township System (ATS) Coordinate file, whereas the one marked Offset is one calculated using the logic described above.  It overlays a GE artifact on the ground pretty well.  This isn’t a statistically significant sample, obviously, but definitely seems to demonstrate being on the right track.

So rather than have fuzzy-headed thinking cluttering up the database design, the table ATS.Section table, the ATS.SectionBoundary & ATS.SectionDimension views, and the ATS.putSection stored procedure are like headed for retirement –  just one day after starting work.  That is, just as soon as it’s been established they really serve no useful purpose.

On a complete separate note, I’m pretty sure I found some errors in the ATS file.  For example, if you look at the benchmarks associated with section 12-035-29W5M, the benchmarks appears to have the wrong lat/longs associated with them.

Posted on 14th May 2008
Under: Developers' Journal | No Comments »

Calculation of Lat/Longs from Surface Offsets

Surface latitude and longitude for Alberta wells needs to be calculated from the surface offset information provided by the ERCB on their daily Well Licenses Issued (ST1) report.  I did manage to get this done, once before, when working on a preliminary version of this application.  However,  it was unbearably slow, if not pretty accurate when it finally finished running.  This new effort will centred on producing equal-or-better accuracy of results at a much better clip, hopefully.

Surface offsets on the ST1 are reported in metres from one of the four section boundaries.  So the basic method is to get accurate lat/longs for the boundaries, and then treat the section as a Cartesian co-ordinate system.  The initial work, therefore, was to obtain the boundary lat/longs.  AltaLis describes their Alberta Township System (ATS) Coordinate file as "an ASCII data file containing geographical coordinates (latitude and longitude in degrees and decimals thereof) for every governing quarter section corner in the province of Alberta."  Exactly what is needed to calculate accurate section boundaries, and it’s available for free download.

A load of ATS information into a newly-created table called ATS.CoordinateFile reveals 823,012 lat/longs for benchmarks for each section of the province.  For the vast majority of sections, there is a lat/long for the northeast corner, and then a lat/long for the quarter section boundary due south, and due west.  For benchmarks on the eastern boundary of a section, for example, the average of the one-or-more longitude values was calculated, and stored it in the eastLongitudeAmt column of the newly-created ATS.Section table.  This same method was used to calculate as many boundaries as possible directly from the ATS.CoordinateFile.  After this process was complete, any residual NULL value in the eastLongitudeAmt, northLatitudeAmt, southLatitudeAmt, westLongitudeAmt columns for a given instance of ATS.Section indicates a section boundary which will need to be calculated, in the event an ERCB-supplied surface offset is measured from the unknown boundary.

The base.getGreatCircleDistanceAmt() SQL Server user-defined function, written in C#, implements Chris Veness’ logic published on Movable Type.  The function is supplied with two sets of lat/longs, and it returns the great circle distance — in kilometres — between them.  Combinations of benchmark lat/longs are fed to this function, and the averaged results are store in ATS.Section.eastWestDistanceAmt and northSouthDistanceAmt.  As a result, ATS.Section contains the known boundaries of a given section, and the distance to the opposing boundary.  This makes it possible to calculate the lat or long of an unknown boundary using the known boundary plus the calculated dimensions of the section.  Like most things, it’s not perfect, but probably close enough for the job at hand.  The logic to populate ATS.Section as described above was bundled up and put in the stored procedure ATS.putSection

Posted on 14th May 2008
Under: Developers' Journal | No Comments »

Alberta Bound…Migrating Structure from Old EUB Schema to New ERCB Schema

Quite some time ago, I made an attempt to load well license information from the ERCB Well Licenses Issued (ST1) report into a preliminary version of the Intellog database.  I assumed the code related to that effort would be obsolete, and it would be necessary start again, more-or-less.  Fortunately, this didn’t turn out to be the case, and I was able to re-use much of what was written previously, with only minor changes.

The job of loading Alberta well data from the ST1 is quite different from what was undertaken for BC and Manitoba.  These were both reformatting exercises from other tabular structures.  The ST1 is text-based, and while it is reasonably uniform in its formatting, it has some idiosyncrasies which make parsing it, and loading it into tables somewhat problematic.  Also, the reports must be applied in sequence, from oldest to newest, because a well licensed one day could be cancelled the next.  If the report were to be loaded in reverse chronological order, for example, it would attempt to cancel a license that didn’t officially exist yet.

To facilitate the review of the existing code, I created a new schema; ERCB.  Turns out this is in keeping with a name change the ERCB itself has gone through in the last few months, but also serves as a visual reminder of what’s been reviewed and moved into the new structure, and what remains.  Once the ERCB load logic is complete, any objects left over in the old schema (EUB) will be deemed obsolete and removed from the database.  Along with the new schema, a similarly named database diagram was also created to document the ERCB-related structure, as well as a similarly named SSIS package.

The following objects were migrated to the new ERCB schema; ERCB.purge (stored procedure), ERCB.Text, ERCB.TextRegion, ERCB.TextRegionBoundary, ERCB.TextTarget (tables), ERCB.DailyActivityList, ERCB.TextRegionDefinition, ERCB.LicenseIssued and ERCB.LicenseCancelled (views).

Despite the implication of its name — Well Licenses Issued –  the ST1 doesn’t contain just new licenses; it also contains amendments, cancellations and well re-entries, and each region of the report has different formatting.  Each region is defined with an instance of the ERCB.TextRegion table.  To handle two regions not dealt with previously, two new instances (records) were added to the ERCB.TextRegion table, one for each of the RE-ENTRY WELL LICENCES, and AMENDMENTS OF WELL LICENCES regions of the ST1.

Unfortunately, each region has its own pattern of repeating groups and record layouts.  In the case of amendments, for example, the license is identified, and is followed by one or more amendments.  To handle this, the ERCB.TextRegion.repeatingRowAmt column was modified to permit NULLs, which indicates "this is not a repeating group".  The ERCB.DailyActivityList view had to be modified to be sensitive to this fact, and return correct values for its  groupRowSeq and groupSeq columns.

With all of the above sorted out, it was possible to create a new view, ERCB.LicenseAmended, which (like ERCB.LicenseIssued and ERCB.LicenseCancelled) turns the text format of the report into tabular representation of the data, with columns that follow the naming convention.   The logic to extract the data from the ST1 and populate the standard Intellog tables can be built against these views.

As and when time allows, the ERCB-related tables will be brought up to the new coding standard.  To this end,  createUdt, modifyUdt, uid_Session_update, uid_Session_modify to ERCB.TextRegionBoundary, and added ERCB.TextRegionBoundaryInsertUpdateTrg, so I could keep track of new data, vs. old.  This isn’t just me being pedantic — those four columns are very useful in keeping track of changes to the data, particularly when different sources of information are being blended together.

Posted on 13th May 2008
Under: Developers' Journal | No Comments »

E.getStyleKml and Other Details Prior to Release of Manitoba Data

Like last time, the day leading up to another release of early test data was a bit of a blur, and while an attempt was made to keep track of the steps taken, a lot got left until after the fact.  So, as best I can, here are some details with respect to this past Friday’s work, leading up to the release of the Manitoba data.

The first job was create a new stored procedure — E.getStyleKml — and use it to generate style information for each zoom level.  The resulting files were names Z1_Style.kml through Z6_Style.kml.   These .kmls contain an iteration of enhanced balloon style for each of the symbols in E.WellSymbol.  This seems a little inelegant to my way of thinking, but it will do, for now.  It’s also necessary to have one of the .kmls for each of the zoom levels because they also control the scaling of the symbols and labels, which are different at each zoom level.  Also inelegant, but acceptable, for now.  The main benefit of using these style .kmls is they enable the hundreds of .kml files containing placemark information to simply reference a style name in one of the style .kmls, rather than repeating style information over and over again.

The mysteriously named Untitled2.sql was then used to generate a sqlcmd call to E.getWellGroupKml for each file for all zoom levels, which was something like 700+ files.  Untitled2.sql makes use of the equally mysteriously named dbo.temp1() table-valued function.  In turn, dbo.temp1() queries the E.WellDetail() table-valued function and base.ZoomLevel table to generate the input parameters for the E.getWellGroupKml stored procedure.  The script generated by Untitled2.sql was then edited into regenerate.bat, and set to run over lunch hour.  It took to more than a couple of hours to generate all of the .kml page files.   Not bad, for now, but it probably has to be improved, still.

Continuing the unbroken tradition of nondescript names referenced in this post,  Untitled3.sql was used to generate a <Folder> tag for each of the .kmls produced by regenerate.bat.  The <Folder> tags are used to control at what zoom level a given well symbol is displayed, and the characteristics of same.  These tags were subsequently edited into MB.kml, which forms the entry-point for the user.

Z.kml was hacked together, and a reference to it added to MB.kmlZ.kml is a level of detail displayed at the lowest (loosest) zoom, all the way out to infinity.  It gives the user an indication there is something in southwestern Manitoba — at least enough of an indication that it’s worth zooming in.  When they do, the progressively-more populated levels of detail are displayed.  The procedure around the creation of Z.kml is pretty ragged and has to be documented and cleaned up the next time around.  My feeling is it can probably be generated along with all of the other style .kmls, but that will be clearer the next time I run through all of the steps above.

Finally, it was determined that one .kmz, containing everything described above, might be worthwhile trying, in the interest of making the protocol a little less chatty, and hence deliver better performance in slower network environments.  It was worth trying, at least.  Creating the .kmz was surprisingly easy to do; it was a matter of using the File > Save As… from the Google Earth (GE) main menu.  This allows .kmz format to be selected.  Very cool.  The only thing it seems to miss is the references to the style .kmls.  All that was necessary to solve this problem, was to open the .kmz with 7-Zip, and manually drag the style .kmls into the files folder of the .kmz.  The resulting .kmz was a shade over two megabytes, which is still a pretty reasonable size, and should not impose an unreasonable burden on users.

Posted on 12th May 2008
Under: Developers' Journal | No Comments »

Manitoba Data Available for Preliminary Review

The initial round of work to load Manitoba data is complete, which enables it to be viewed on Google Earth (GE).   Similar to the BC data that was loaded a while back, this is early, development work so it should not be taken as gospel.  However, it will give you a very good idea of how the data looks when presented on the GE platform.  There are a couple of additional things which are improvements over previous efforts;

Improved Symbols  There is a wider range of symbols that represent a wider variety of information for each well.  With the BC data, it was oil, gas or undefined — that was it.  In the new data, there are a  total of 20 different combinations of well symbols.  They show the product of the well, it’s status and a variety of other information.  These symbols are also described in a little more detail in the Well Symbol Library.

KMZ format vs. KML format  The Manitoba data is bundled into a single, compressed file, of about two megabytes.  This means that once it is downloaded, application performance will be increased, because fewer round trips back to the Intellog server are required.  Users with slower network connections, in particular, will appreciate this.  The slight downside is that you do have to do  fairly large, initial download before the data is available for viewing.

If you haven’t done so already, you will need to install Google Earth (GE) — complete directions for downloading and installing can be found here, and it’s entirely free.  You only have to install GE once, before you look at the data the first time.  When you have successfully installed GE, click here to initiate viewing of the Manitoba data.  If you need help with GE itself, the User Guide is very helpful, along with the video tutorials.

If there are others that you feel would be interested in taking a look, by all means, send them the link to this post, or if you prefer, you can just send them the following link (but make sure they know they’ll need to install GE, first);

http://www.intellog.com/E/data/well/v0.0/kml/MB.kmz

As always, you are encouraged, and we appreciate you leaving your comments in response to this post.  You can be sure that any issues or concerns you raise will be promptly followed up.  Thank you very much for your interest!

Posted on 9th May 2008
Under: Business Development | 1 Comment »

Chris Anderson on ‘Free’

Intellog’s business plan calls for a fairly significant chunk of value to be given away, at no charge.  Not as a loss leader, or some sort of gimmick, or even some sort of oddly-focused act of charity.  Intellog will give away reference data — petroleum well information, initially — for free, because it makes good business sense, and it’s aligned with a fundamental shift in the way goods and services are bought and sold in information economies.

Chris Anderson, editor of Wired magazine, and author of the The Long Tail, has recently been talking a lot about Free.  I first got wind of this when Guy Kawasaki put a post on his blog regarding a video of Mr. Anderson presenting at Nokia World 2007 — saying it was ‘must-watch video’.  Watch, I did.  Mr. Anderson, amongst a number of topics, talked about a time when it was thought nuclear energy would produce electricity so cheaply, it wouldn’t be worth metering.  Alas, that never did happen, but that wasn’t his point. 

Rather, he invited us to speculate about what the world would have been like if there really had been free electricity.  How about nothing but electric cars, for example?  How about no burning of fossil fuels to heat our homes?  How about no global warming?  It doesn’t necessarily mean nuclear energy was a good idea — or maybe just an idea whose time has not yet come — but one thing is absolutely certain; free electricity would have had a profound effect on just about everything in the world as we know it today.

But Chris Anderson’s real point, of course, is there are some things which really are free, or getting there, very quickly.  Almost every aspect of information technology — certainly the trinity of storage,  computing power and network bandwidth — are all on seemingly irreversible curves of more, for less (and eventually for nothing).   If computing is free, could it be profound changes are inevitable, with every bit as much impact as those described above?  Anderson thinks so.

The obvious example of a business thoroughly walloped by Free was the recording industry, and, surprisingly, it has absolutely nothing to do with illegal downloading.  Increasing file compression efficiency and decreasing cost of bandwidth combined to reduce the cost of manufacturing & distribution to effectively nothing.   It made it inevitable there would be a sea change in how music (and subsequently video), was bought, sold and delivered.   Furthermore, this trend could easily have been spotted 10 years ago.  Yet, the recording industry missed it, Apple didn’t, and the rest — as they say — is history.  What you couldn’t have predicted is how two minute videos of stupid pet tricks would turn out to be a multi-billion dollar idea which would shake traditional media to its very foundation.  This is the profound and capricious nature of Free.

Free computing could well be the free electricity of the 21st Century.  Except that it’s already here, and it’s already having a profound impact on just about everything.

While it’s undoubtedly a bi-product of the book tour phenomenon, Mr. Anderson’s thoughts seem to be entering the mainstream media, like the recent article in The Globe and Mail.   With The Long Tail, Chris Anderson demonstrated a knack for wrapping a simple, concrete phase around an abstract macroscopic concept.  It defines and explains it at the same time.  I’m willing to bet Free is on the same track.  Something that has been quietly be going on for a while — and having a profound impact on the world in which we live — could succinctly be described as FreeIntellog is tapping into this exciting and dramatic change with our approach to freely downloadable reference data.

Posted on 8th May 2008
Under: Business Development | No Comments »