Archive for October, 2008

XSL to Achieve Relational(-like) JOINs

The plan is to use Amazon’s SimpleDB to host the millions of rows that will result from the application of the concordance ideas described previously.  Keywords gathered on the form inputSearchCriteria.php will be handed to SimpleDB, and it will hand back the Document.uid for the documents in which those keywords appear.  But what of the other information the user is going to want to know, other than the cryptic uid?  The URL where the document can be found is a good example of what they’ll want to know, but also the source of the information, perhaps the author, when it was modified last — the list goes on and on.   SimpleDB does not claim to be a relational database, and would simply recommend bundling these additional values into the domain* with the uid and modifying the query to return not only the uid, but the other values the users want to see when they hit a document.

There are a couple of problems with this approach.  First, there is the potential for it to be costly — well, more accurately, not ‘cost free’.  SimpleDB is a pay as you go system, so if you have to store the verbose URL 4000 times because there are 4000 keywords indexed in the related document, you are, in effect, paying for each character of each redundant copy of the URL to be stored.  The amount paid per character is extremely small, but not zero.  However, a bigger issue would be longer term maintenance of the SimpleDB-resident keywords.  If at some point in the future, a new piece of information is associated with each indexed document, you potentially have to go back and touch each record in the SimpleDB domain — this could be millions of records.  So, a much better approach would be to maintain a lookup table from which the other values can be retrieved based on the uid handed back from SimpleDB.  Fairly traditional relational thinking, actually.

Wanting to stay away from an instance of MySQL for the time being for a variety of reasons, the choice was made to investigate whether an XSL could be used to take the uid and retrieve values from an XML-based version of the document library.  Much to my surprise, and pleasure, it can.  What’s more, it would appear as though the xsl:key syntax even makes it an indexed lookup.  The XML immediately below is an analogue for the result set coming back from SimpleDB;

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<Result>
	<Hit>
		<uid>FFF4A3CB-BCF2-485F-942A-25200BA7AAB1</uid>
	</Hit>
	<Hit>
		<uid>8730192F-B35F-422B-9EA5-5FEC5ED491CE</uid>
	</Hit>
	<Hit>
		<uid>0008BA12-B476-4F42-8B11-7584271A2843</uid>
	</Hit>
	<Hit>
		<uid>906A68F6-7F5F-4A19-A026-A65CE8D21194</uid>
	</Hit>
	<Hit>
		<uid>906EB740-C62C-414E-BF57-49B9FD09B053</uid>
	</Hit>
	<Hit>
		<uid>906ED85B-5FFF-4474-861A-9417D3C6547D</uid>
	</Hit>
</Result>

And the following XML is a fragment of the table from which values will be looked up, in this case the urlTxt element.

<Library>
	<Document>
		<uid>0008BA12-B476-4F42-8B11-7584271A2843</uid>
		<urlTxt>http://E.intellog.com/data/from/alberta/st1/2007/08/23.txt</urlTxt>
	</Document>
	<Document>
		<uid>000DA073-1E71-41C4-BD9A-61F063DF1497</uid>
		<urlTxt>http://E.intellog.com/data/from/alberta/st1/2001/07/26.txt</urlTxt>
	</Document>
	<Document>
		<uid>0013F4EF-333C-4E38-9BD2-10080307CD64</uid>
		<urlTxt>http://E.intellog.com/data/from/alberta/st49/2002/09/03.txt</urlTxt>
	</Document>
	<Document>
		<uid>0014F12E-BD48-474A-9FB9-14ABC0BF9808</uid>
		<urlTxt>http://E.intellog.com/data/from/alberta/st1/2002/02/27.txt</urlTxt>
	</Document>
	<Document>
		<uid>002440D7-E977-48F9-A031-D39135DE48CC</uid>
		<urlTxt>http://E.intellog.com/data/from/alberta/st1/2003/02/28.txt</urlTxt>
	</Document>
	<Document>
		<uid>0055402B-551F-41F6-9674-839F0FC5ADBB</uid>
		<urlTxt>http://E.intellog.com/data/from/alberta/st49/2008/05/19.txt</urlTxt>
	</Document>
	<Document>
		<uid>005B9E6A-AFC4-4532-B1CA-EB5D76715EA9</uid>
		<urlTxt>http://E.intellog.com/data/from/alberta/st1/2003/01/05.txt</urlTxt>
	</Document>
	.
	.
	.
</Library>

And then the following XSL;

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
	version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	>

	<xsl:output
		method="text"
		omit-xml-declaration="yes"
	/>

	<xsl:key name="uid_Document" match="Document" use="uid"/>

	<xsl:variable name="uid" select='document("Library.xml")/Library'/>

	<xsl:template match="Hit">

		<xsl:text>The uid </xsl:text>
		<xsl:value-of select="uid"/>
		<xsl:text> represents the URL </xsl:text>

		<xsl:apply-templates select="$uid">
			<xsl:with-param name="currentHit" select="."/>
		</xsl:apply-templates>

	</xsl:template>

	<xsl:template match="Library">

		<xsl:param name="currentHit"/>
		<xsl:value-of select="key('uid_Document', $currentHit/uid)/urlTxt"/>
		<xsl:text>. </xsl:text>

	</xsl:template>

</xsl:transform>

returns a result set consisting of the uid from the first XML above, followed by the urlTxt from the second XML.  Currently, there are about 5000 Document elements in the lookup table XML, and to this point, the performance of the join appears to be quite good.
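
For anybody more at home in JavaScript than XSL, here is a rough sketch of the same idea.  It is only an analogy, not what the XSLT processor actually does internally: the Map plays the role of xsl:key (built once, keyed on uid) and the get() call plays the role of key().  The function names buildIndex and joinHits are made up for the illustration, and the data is a small subset of the two XML fragments above.

```javascript
// Two of the hit uids from the Result XML above.
const hits = [
  "FFF4A3CB-BCF2-485F-942A-25200BA7AAB1",
  "0008BA12-B476-4F42-8B11-7584271A2843",
];

// Two Document entries copied from the Library fragment above.
const library = [
  { uid: "0008BA12-B476-4F42-8B11-7584271A2843",
    urlTxt: "http://E.intellog.com/data/from/alberta/st1/2007/08/23.txt" },
  { uid: "000DA073-1E71-41C4-BD9A-61F063DF1497",
    urlTxt: "http://E.intellog.com/data/from/alberta/st1/2001/07/26.txt" },
];

// Build the index once (the role played by xsl:key).
function buildIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    index.set(doc.uid, doc);
  }
  return index;
}

// Resolve each hit with a keyed lookup (the role played by key()).
function joinHits(hitUids, index) {
  return hitUids.map(function (uid) {
    const doc = index.get(uid);
    return { uid: uid, urlTxt: doc ? doc.urlTxt : null };
  });
}

const joined = joinHits(hits, buildIndex(library));
for (const row of joined) {
  const urlTxt = row.urlTxt === null ? "(not in this fragment)" : row.urlTxt;
  console.log("The uid " + row.uid + " represents the URL " + urlTxt + ".");
}
```

A hit whose uid is missing from the lookup table simply comes back with no URL, which is also what the XSL does: key() returns an empty node set and the value-of writes nothing.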

Code Shavings  Much thanks and much of the credit goes to Uche Ogbuji at Fourthought for his great article on IBM’s developerWorks website.  It provided the technical insight, and a clear example which served as a template upon which my work was heavily based.

*This is SimpleDB terminology that is roughly analogous to the table concept in a relational database.

Posted on 28th October 2008
Under: Developers' Journal | 1 Comment »

Cross Browser XML Horror Show

It’s like I woke up and Bill Clinton was still president, and Al Gore had just invented the Internet.  Or maybe it was just some cruel Hallowe’en prank.  But my first hands-on experience with client-side, cross browser XML seemed to harken back to the dark ages of 56K dial-up — what a horror show.  Or maybe there is something really obvious which is getting missed, and this is a whole lot easier than it seems to be.

The specific objective was to embed a little XML island in the HTML page.  This XML will contain a short list of parameters, which will subsequently be used to populate some of the elements in the HTML.  The latter is accomplished by the bodyOnLoad() JavaScript which executes when each page loads.  For example, rather than hard-code the TITLE element on each page, why not simply dedicate a tag or two in ApplicationDefinition.xml to the value for TITLE, and have bodyOnLoad() pick the value out of the embedded XML and pump it into TITLE?

Long story short, Internet Explorer (IE) does its own thing with respect to instantiating the XMLDOM object, whereas ROW (rest-of-world) uses an alternative syntax, as exemplified by the following;

try { //Internet Explorer
	xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
}
catch(e) {
	try { //Firefox, Mozilla, Opera, Safari etc.
		xmlDoc = document.implementation.createDocument("", "", null);
	}
	catch(e) {
		alert(e.message)
	}
}

try {
	xmlDoc.async = false;
	xmlDoc.load("test.xml");
	document.write(xmlDoc.getElementsByTagName("nm")[0].childNodes[0].nodeValue);
}
catch(e) {
	alert(e.message)
}

This would appear to accomplish the objective.  Running IE or Firefox, it runs just fine, picking out the first occurrence of the nm element and displaying its contents.  However, Safari gets hung up, but not for the reason first imagined; it would seem that the load() method is not supported…oh brother! 
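
One possible way around the missing load() method, sketched here but not tried on the Intellog pages, is to get hold of the XML as a string first and then parse it, using DOMParser where the browser offers it and the loadXML() method of the Microsoft.XMLDOM object otherwise.  The function name parseXmlString is made up, and the env parameter is just a stand-in for the browser's window object so the two branches can be exercised outside a browser.

```javascript
// Hypothetical helper: parse an XML string into a DOM document.
// `env` is expected to be the browser's window object (or a stand-in for it).
function parseXmlString(xmlTxt, env) {
  if (typeof env.DOMParser !== "undefined") {
    // Firefox, Safari, Opera
    return new env.DOMParser().parseFromString(xmlTxt, "text/xml");
  }
  if (typeof env.ActiveXObject !== "undefined") {
    // Internet Explorer
    var xmlDoc = new env.ActiveXObject("Microsoft.XMLDOM");
    xmlDoc.async = false;
    xmlDoc.loadXML(xmlTxt);
    return xmlDoc;
  }
  throw new Error("No XML parser available in this environment");
}
```

In a page it would be called as parseXmlString(xmlTxt, window), after which getElementsByTagName("nm") works on the result just as in the snippet above.  Where the XML string comes from (an XMLHttpRequest, or text already embedded in the page) is left open here.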

At this point, it was necessary to get on and make a living, so parameterDiv.xsl was created, which was utilized by getPageElement.php to populate parameterIfrm on each page, in a very similar manner to other page elements.  Once loaded parameterIfrm contains a series of DIVs for which the IDs reflect the names of the parameters being passed to the page.  The getParameterTxt() function was added to Intellog.js, which parses values out of the nested DIVs.  The parsed-out values are then used to populate the TITLE element, and other elements in the future.  The DIV element is fully supported across the three target browser environments, so it was determined to go this route for the time being, and move on.

Code Shavings   This isn’t the first such horror show.  Similar issues cropped up with CSS, a few months back, which was described in Cross Browser Menu Integration Horror Show.

Posted on 27th October 2008
Under: Developers' Journal | No Comments »

Web Application Page Naming Standards

Before the application directories described previously become too cluttered, some standards regarding the naming of application pages are needed.  For the time being, these are limited to HTML (static) and PHP (dynamic) pages, and are as follows;

[verb][Noun].[htm|php]  The standard format is [verb], all lower case, followed by [Noun], with an initial capital letter, followed by one of the two three-letter extensions denoting the file type.  Standard verbs are as follows, in alphabetical order;

  • get  Retrieves information, but not from the screen (eg. getPageElement.php)
  • input  Retrieves information from the user of the application using a screen form.
  • output  Returns information back to the user, on the screen.
  • put  Sends information to a process or a data store (eg. putLogout.php)

The list of nouns will potentially be a lot longer, but here are some to start, each of which is intended to describe a discrete, atomic entity, described in as few words as possible — hopefully just one.  As with most of the other naming standards, they are always singular.  In alphabetical order;

  • BusinessAssociate  A person, organization or company with which the user has some sort of relationship.  For example, both customers and suppliers fit this definition.  At a minimum, a business associate should have a unique identifier and a unique name.
  • Login  The workflow and data associated with logging in to the application, and initiating a session.
  • Logout  The workflow and data associated with logging out of the application.  inputLogout.php, for example would collect information about the logout process (such as a comment or feedback, as well as a confirmation).  By way of comparison,  putLogout.php would physically process the logout information provided by inputLogout.php.
  • OpenID  In practical terms, a string of characters which is going to look a lot like a URL, without the http:// at the beginning.  It contains information regarding the site to which the user authenticates, along with a means for identifying the specific user to be authenticated.  See prior post for further information.
  • PageElement  An artifact on a web page, as described in Standard Page Elements on Intellog Web Application Pages.
  • SearchCriteria  One or more criteria used to extract information from a database.  The simplest example would be the keywords the target information will contain. 
  • SearchResult  The data which meet the criteria specified in the SearchCriteria, described immediately above.
  • UserIdentity  Email address, first and last name, other ‘tombstone’ data associated with a user.

The [Noun] should be held in common between pages which implement different parts of the same workflow.  For example, inputOpenID.php would allow for the input of information regarding OpenID, whereas outputOpenID.php would report some information back out, also about OpenID.  Note that when multiple nouns are required to fully describe an entity, then camel case should be used to differentiate between these multiple nouns.
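
The standard is simple enough that it can be checked mechanically.  What follows is a minimal sketch, not something that exists in the Intellog code base: parsePageName and STANDARD_VERBS are made-up names, and the regular expression is just my reading of the [verb][Noun].[htm|php] rule above.

```javascript
// The four standard verbs listed above.
const STANDARD_VERBS = ["get", "input", "output", "put"];

// Split a page name into verb, noun and extension, or return null
// if the name does not follow the [verb][Noun].[htm|php] standard.
function parsePageName(fileNm) {
  // Lower-case verb, then a noun starting with a capital letter
  // (camel case allowed), then one of the two extensions.
  const match = /^([a-z]+)([A-Z][A-Za-z0-9]*)\.(htm|php)$/.exec(fileNm);
  if (match === null) {
    return null;
  }
  const verb = match[1];
  if (STANDARD_VERBS.indexOf(verb) === -1) {
    return null;
  }
  return { verb: verb, noun: match[2], ext: match[3] };
}

console.log(parsePageName("inputOpenID.php"));
console.log(parsePageName("index.htm"));
```

Pages such as inputOpenID.php and outputOpenID.php parse to the same noun, which is the property the standard relies on to tie the parts of a workflow together.  Note that index.htm is deliberately rejected, since it sits outside the standard.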

Posted on 24th October 2008
Under: Developers' Journal | No Comments »

Krakatoa, Part Deux

In yesterday’s post, a strange bug with Internet Explorer (IE) was described.  When populating an IFRAME from a URL, IE corrupted the HTML a little — it hacked off an opening FORM tag, leaving the closing tag intact.  It also did some other strange (but less consequential) things, like omitting the quote characters used on some of the tag attributes.  It really looks like something got forgotten in the IE code here*.   A little more testing revealed that both Safari and Firefox were well-behaved in this regard, with IE being the only problem child.  Chrome and Opera were not tested.

In the code snippet below, the getElementById() method is used to suck the HTML out of the inputIfrm DIV, from whence it is — or was — blown directly into the inputDiv DIV.  Now, the HTML is held in the variable txt momentarily, gets examined for the buggy content, gets fixed if necessary, and then gets blown into inputDiv.  This is ugly in the extreme, and pretty fragile, but most importantly, it is simply ridiculous that it’s necessary at all.  But believe it or not, the code hack shown actually works.  All of the problems with implementing the form reset functionality were completely resolved, and development was able to carry on.

var obj = document.getElementById("inputDiv");
if (obj != null) {
	var txt = document.getElementById("inputIfrm").contentWindow.document.body.innerHTML;
	if (txt.substr(0, 17) == '<div id="inputFrm">')
		obj.innerHTML = '<div id="inputFrm"><form id="frm">' + txt.substr(17, txt.length - 17);
	else
		obj.innerHTML = txt;
}

After removing this particular obstruction, things flowed fairly well.  The effort to separate and rationalize the JavaScript was successful, and the bodyOnLoad(), resetFrm() and submitFrm() functions were placed in a library file called, for now, Intellog.js, located in [siteRoot]js.  The header in several of the .php files was modified to eliminate the reference to the old, discrete bodyOnLoad.js file, and everything carried on its merry way.

Code Shavings  Thanks to </dream.in.code> for their tip on how to present source code in the neat little scrolling box.  Whodathunk you could do all that with one little, styled DIV element.  ♦  Even with modern implementations of JavaScript, true method overloading is still not supported.  But some of the effect of overloading can be achieved by recognizing that parameters not passed in the method call will show up in the executing method as undefined (which a loose == null comparison also catches).  For example, the Intellog library function resetFrm(fieldLbl, lbl) can be called as resetFrm(), in which case the parameters fieldLbl and lbl will both be undefined.  Call it as resetFrm('firstNm'), and fieldLbl will be populated with firstNm, and lbl will be undefined.  Call it as resetFrm('firstNm', 'employeeFrm'), and both the parameters will be populated.
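
Here is a small sketch of the optional-parameter trick.  It is not the real resetFrm(); describeArgs is a made-up function which only reports what it was handed.  Strictly speaking, a parameter left out of the call arrives as undefined rather than null, but a loose comparison against null catches both cases.

```javascript
// Hypothetical stand-in with the same parameter list as resetFrm(fieldLbl, lbl).
function describeArgs(fieldLbl, lbl) {
  return {
    // `== null` is true for undefined (argument omitted) and for an explicit null.
    fieldLbl: fieldLbl == null ? "(not supplied)" : fieldLbl,
    lbl: lbl == null ? "(not supplied)" : lbl,
    // What the omitted first argument really is under the covers.
    rawFieldLblType: typeof fieldLbl
  };
}

console.log(describeArgs());
console.log(describeArgs("firstNm"));
console.log(describeArgs("firstNm", "employeeFrm"));
```

The one thing to watch with this approach is that the optional parameters have to come last; there is no way to skip fieldLbl and still supply lbl, short of passing null explicitly in the first position.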

*Of course, if somebody out there can provide a rational explanation as to why this is happening, then please, post a comment below and let us all know!

Posted on 24th October 2008
Under: Developers' Journal | 1 Comment »

Krakatoa, East of JavaScript

The eruption of Krakatoa, photographed on August 26-27, 1883.

As described in Standard Page Elements on Intellog Web Application Pages, the Intellog application pages are defined in ApplicationDefinition.xml*, and the process of modifying this file was continuing with the new inputLogout.php page.  Each application action button on the fixed panel at the bottom of the screen can be associated with some JavaScript, using the .../btn/javascriptTxt element.  However, the seemingly innocent task of adding a ‘clear’ button to the inputLogout.php page, to enable the clearing of the input form of all data, pretty much blew apart the plans for the day.

The desired clearing action seemed to work just fine with Safari, but when the same page was launched with IE, no joy — the clearing of the form field did not take place.  On this particular page, there is only one field (which permits the entry of comments) so it’s really not a big deal.  But every time there’s some sort of discrepancy between the way browsers work, it’s important to take a closer look to see what’s going on below the covers, to make sure the problem is not symptomatic of a bigger issue.  As explained in excruciating detail in the post above, the HTML starts life as a definition in XML, gets transformed to HTML with an XSL, gets loaded into an IFRAME when the main page loads, and is then transferred to its final location using a few lines of JavaScript in the bodyOnLoad() function.  Complicating things still further, the detailed JavaScript found in ApplicationDefinition.xml was being relocated into a library-type .js file in [siteRoot]js. This would enable the button to make a succinct call to the library code, rather than have all the code embed itself in the rendered form.  So, unfortunately, there were quite a few places to look to find the source of the problem.

But here’s what it netted out to, basically; Safari doesn’t seem to have a problem with the fact there are two identical <DIV ID="inputFrm">s on the same page.  There is one within the <DIV ID="inputIfrm"> where the HTML initially loads, and one in the location to where the HTML is transferred with bodyOnLoad(), inside <DIV ID="inputDiv">.  IE does seem to have a problem with this, and what’s more, it manifests itself in an extremely strange way.  If the outerHTML property of <DIV ID="inputDiv"> is examined closely, IE seems to hack off the opening <FORM> tag contained within it!  The closing tag is there, but believe it or not, the opening tag is gone.  It would seem this is something to do with the IE page load logic confusing the first occurrence of the <FORM> tag — within <DIV ID="inputIfrm"> — with the second, within <DIV ID="inputDiv">.  But really, it’s anybody’s guess at this point, and not worth spending any more time on.

The solution will be to reorganize the many nested DIVs (that was on the agenda, anyway), along with finding a way to load the post-transformation HTML directly to its final location without the interim step of loading to an IFRAME.  And I was wondering what I was going to do today.

*Or appDefinition.xml, as it was called ‘back then’.

Posted on 23rd October 2008
Under: Developers' Journal | 1 Comment »

More on getPageElement.php

Previously, getPageElement.php was described as being able to transform appDefinition.xml into standard page elements based on XSL transformations.  The XSLs were named after the page elements they create, hence applicationActionBstp (application action button strip), breadCrumbWkflw (bread crumb workflow) and poweredByImg (’powered by’ image) page elements are created with the active participation of applicationActionBstp.xsl, breadCrumbWkflw.xsl and poweredByImg.xsl files, respectively.  Attention then turned to inputFrm.xsl, which was intended to take definition information found in appDefinition.xml and generate a standard data input form using a TABLE element with embedded INPUT, SELECT, TEXTAREA elements and the like.  To start, it was intended to implement inputOpenID.php and inputSearch.php.  The former provides OpenID login functionality which is common to all Intellog applications.  The latter is specific to the Roundabout application — Intellog’s industry-specific search and document exchange engine.

According to the standard, therefore, inputOpenID.php would be located in [siteRoot]php, whereas inputSearch.php would be located in [appRoot]/php; that is, /php/inputOpenID.php and /E/app/Roundabout/php/inputSearch.php, respectively.  And yet, there was nothing fundamentally different about the forms they implement, so it made sense to have them share the same inputFrm.xsl.  To accomplish this, getPageElement.php was modified to implement a limited notion of inheritance.  For example, /php/inputOpenID.php looks for inputFrm.xsl in /xsl, which is to say the same level of the application hierarchy.  Makes perfect sense.   However, the page /E/app/Roundabout/php/inputSearch.php also requires inputFrm.xsl.  getPageElement.php is (now) smart enough to know that if it can’t find inputFrm.xsl in /E/app/Roundabout/xsl, it should look up one level in the hierarchy.  If it can’t find one in /E/app/xsl, then it looks up one more level.  If it doesn’t find it in /E/xsl, then finally, it looks in /xsl, where it finds /xsl/inputFrm.xsl — the same one used to do the transformation for inputOpenID.php.  Now, when implementing a new application, it’s only necessary to create an XSL in cases where the HTML resulting from the transformation is unique to a specific market, application suite, or application.
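
The look-up-one-level search can be sketched in a few lines.  The real logic lives in getPageElement.php and is written in PHP; what follows is only an illustration in JavaScript, with made-up names (xslCandidates, resolveXsl) and an exists() callback standing in for whatever file check the PHP actually performs.  It also assumes the calling page always sits in a php subdirectory one level below its root, as in the two examples above.

```javascript
// List the places to look for an XSL, nearest level first.
// launchPathTxt is the path of the calling page,
// e.g. "/E/app/Roundabout/php/inputSearch.php".
function xslCandidates(launchPathTxt, xslNm) {
  const parts = launchPathTxt.split("/").filter(function (p) { return p !== ""; });
  // Drop the file name and the directory which holds it (php).
  const levels = parts.slice(0, -2);
  const candidates = [];
  for (let depth = levels.length; depth >= 0; depth--) {
    const base = levels.slice(0, depth).join("/");
    candidates.push((base === "" ? "" : "/" + base) + "/xsl/" + xslNm);
  }
  return candidates;
}

// Return the first candidate which exists, or null if none does.
function resolveXsl(launchPathTxt, xslNm, exists) {
  const candidates = xslCandidates(launchPathTxt, xslNm);
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  return null;
}

console.log(xslCandidates("/E/app/Roundabout/php/inputSearch.php", "inputFrm.xsl"));
```

For the Roundabout search page this produces the same four locations walked through above, ending at /xsl/inputFrm.xsl.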

Similarly, some ’spot’ inheritance regarding data was also implemented in getPageElement.php.  For instance, company directory information was deemed either to be ‘global’ in nature (shared by all applications, all markets), or ‘market-oriented’ in nature.  The name and address of Revenue Canada, for example, is (sadly) of interest to all markets, all applications, whereas the name and address of a drilling company would only be of interest to those working in and around the energy industry.  Therefore, all applications in the energy-related suite of applications would share the same company reference information.  As a result, CompanyDirectory.xml is found in [siteRoot]data or [marketRoot]/data; that is, /data/CompanyDirectory.xml and /E/data/CompanyDirectory.xml, respectively.  In the case of a page which implements a company drop-down list, getPageElement.php is (now) smart enough to know which of these two XMLs to use, depending on what level of the application hierarchy is requesting the control.

This is all predicated on the notion that somehow, magically, getPageElement.php ‘knows’ where it is being called from.  The better part of last Saturday morning was spent attempting to figure out how to do this, but all roads led back to either examination of browsing history (requiring the implementation of session logic) or worse yet, the use of the information in $_SERVER['HTTP_REFERER'].  It was eventually determined that simply tacking on a reference to $_SERVER['PHP_SELF'] in each call to getPageElement.php was the best route to go.  This is now the case, and the information it contains is passed to getPageElement.php through the launchPathTxt parameter.  getPageElement.php code examines the parameter, and from it, makes a determination which level of the hierarchy the call originates from, and decides which XSL and data to use.
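
To give a flavour of what can be deduced from launchPathTxt, here is a rough JavaScript sketch (the real code is PHP, and this is not a copy of it).  It simply counts directories: the function name hierarchyLevel is made up, the level names are borrowed from the directory standards, and it assumes every page lives in a subdirectory such as php or htm exactly one level below its root.

```javascript
// Level names, shallowest first.
const LEVELS = ["siteRoot", "marketRoot", "appSuiteRoot", "appRoot"];

// Work out which level of the hierarchy a page was launched from,
// given a path such as "/E/app/Roundabout/php/inputSearch.php".
function hierarchyLevel(launchPathTxt) {
  const parts = launchPathTxt.split("/").filter(function (p) { return p !== ""; });
  // Ignore the file name and the directory which holds it.
  const depth = parts.length - 2;
  if (depth < 0 || depth >= LEVELS.length) {
    return null;
  }
  return LEVELS[depth];
}

console.log(hierarchyLevel("/php/inputOpenID.php"));
console.log(hierarchyLevel("/E/app/Roundabout/php/inputSearch.php"));
```

Counting directories is about as simple as it gets, and it would break as soon as a page is placed somewhere other than one level below its root, so treat it as an illustration of the idea rather than a recommendation.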

Code Shavings  The name chosen to store the application definition seemed a little inconsistent with other naming decisions, hence appDefinition.xml has become ApplicationDefinition.xml, and is now located in the xml subdirectory of [siteRoot] and each of the [appRoot] directories.  ♦  Right now, applications are defined (with the aforementioned ApplicationDefinition.xml) in [siteRoot]xml and [appRoot]/xml.  In the future, however,  there could also be ApplicationDefinition.xml files in [marketRoot] and [appSuiteRoot], if there were applications defined at those levels of the hierarchy.  ♦  Thanks to W3Schools for the valuable assistance with PHP syntax.

Posted on 21st October 2008
Under: Developers' Journal | No Comments »

Directory and Object Structure Standards for Web Applications

With application development well underway, it was necessary to delve into and describe the structure that implements the Intellog application organization described previously.  This is a restatement of the information found in the various taxonomy-related posts, but this time, the site taxonomy is described from the point of view of a web application, as opposed to a simple directory hierarchy. There are four basic levels of directories;

  • [siteRoot]  When the application is in production, this is most likely the root directory of the website or what, in Apache parlance, is known as the DocumentRoot.  [siteRoot] contains all of the [marketRoot] directories, and may also contain one or more of the Standard Application Objects (SAO) described below.  The application objects found in this directory implement the Universally Consistent properties of the Intellog applications.
  • [marketRoot]  All of the [marketRoot] directories are located exactly one level below the [siteRoot].  By convention, they are named using short, one or two capital letter identifiers.  Example: [siteRoot]/E is the [marketRoot] for the energy industry.  [marketRoot] may contain one or more of the SAO described below.  The application objects found in [marketRoot], implement the Market Consistent properties of the application suite.  [marketRoot] will also contain the following objects;
    • app  This is the [appSuiteRoot], described immediately below.
    • data  This directory contains data which is directly related to the market being served by the application suite, and its structure will likely vary by market.  It may also be complemented with a market-specific subdomain for management of very large objects, a very large number of objects, or both.  XML data for which the primary role is the storage of data would be located in this directory.  Example: CompanyDirectory.xml, which contains company ‘tombstone’ information, subsequently used to populate drop-down lists.
    • index.htm  This is the default page if a user types in the URL for [marketRoot] verbatim.  It should provide a redirect to the Intellog home page, because there is no way of telling which specific application the user is looking to start — once there is more than one, of course*.
  • [appSuiteRoot]  This is the app directory, located directly under [marketRoot].  It will contain one or more of the SAO described below, which will implement the Application Consistent properties.  In addition, this directory will contain;
    • index.htm  Similar to the [marketRoot]/index.htm above, this is the default page if the user types the URL for [appSuiteRoot] verbatim, and should provide a redirect to the Intellog home page.
  • [appRoot]  This directory is located directly under [appSuiteRoot], and is named using the Intellog product names, as shown immediately below.  Each [appRoot] will contain the logic which implements the Application Unique properties of the respective application.
    • Roundabout   Intellog market-specific document search and exchange engine.

The relative relationship of the directories described above is significant.  If a particular application needs to access an object in its parent [appSuiteRoot], for example, the object should be able to reference it with the  ../../[object] syntax, if necessary.  If an application object in [marketRoot] needs to access an object in [siteRoot], it should only be necessary to specify ../[object].

Standard Application Objects  The four types of directories described above — [siteRoot], [marketRoot], [appSuiteRoot] and [appRoot] — may contain one or more of the following directories and files.  Note that in the following, the extensions for files found within the directories should be the same as the name of the directory itself (except where noted, in square brackets, at the end of the bullet);

  • css  Cascading style sheets which are used to govern the appearance of HTML elements rendered by the browser.
  • htm  HTML files, which contain strictly static application content.  Example: contact.htm, which contains the logic for the Contact page.  
  • image  The directory containing image files.   Note the use of the singular, which makes the name of this directory consistent with the other directories at this level of the hierarchy.  [.png, .jpg, .gif, .ico]
  • index.htm  Similar to [marketRoot]/index.htm, and [appSuiteRoot]/index.htm, described above.
  • js  JavaScript files, which are used to provide client-side scripting for web pages.
  • php  PHP files, which contain dynamic application content.
  • xml  This is XML data for which the primary role is configuration information.  Example: ApplicationConfiguration.xml.  Note that XML data can also be located in [marketRoot]/data directory, but in the latter case, it would be  providing primarily database functionality.
  • xsl  XML style sheets used to perform XSL transformation (XSLT).

Finally, some examples, to fully illustrate how the structure is used;

  • http://www.intellog.com/E/app/Roundabout/inputContact.htm  The form which collects information related to the ‘contact us’ functionality.  The .htm extension implies this is a static web page.
  • http://www.intellog.com/E/App/Roundabout/outputContact.php  The page which processes the parameters prepared and passed by inputContact.htm.  The .php extension indicates it is a dynamic web page.

The connection between the two parts of the application interaction is made through the use of a shared noun.

Code Shavings  To date, placeholders have been shown enclosed in the less than and greater than symbols (’<’  ‘>’).  It seems I have spent one too many minutes unsnarling the use of these from their legitimate use in XML syntax.  Henceforth, therefore, the square brackets ‘[’ and ‘]’ will be used in the placeholder role.  ♦  Speaking of placeholders, they are always singular, to avoid the awkward looking [appRoot]s.  ♦  Installing a new version of an application, ideally, should involve simply recursively copying [appRoot] from the development server to the equivalent directory on a production server.

*It might actually be necessary to implement the redirects using .htaccess, rather than the deprecated ‘meta refresh’ method.

Posted on 21st October 2008
Under: Developers' Journal | 4 Comments »

Turning the ERCB’s ST104A into XML

It was time to implement the Company drop down list on the inputConfirm.php page.  This implied the need for a fairly solid list of companies to work from, both on the producer and service company sides of the equation.  The ERCB publishes a free monthly update to their list of ‘business associates’, called the ST104A, and it was determined this would be a fairly decent starting point for the data to populate the drop down list.  It was turned into XML using the following steps completed within SQL Server 2005;

  • ERCB.st104a  A TABLE was created to emulate the structure found in the CSV formatted ST104A file downloaded from the ERCB website.  As usual, getting SQL Server 2005 to do a simple import of simple data turned out to be a royal pain.  Some of the latter was alleviated by loading it into Excel, first, and then doing the import.  I stand by my opinion, however, that bulk imports of simple data are way too complicated in SQL Server.
  • ERCB.BusinessAssociate  A more-or-less arbitrary decision was made that in cases where the name recorded by the ERCB was duplicated verbatim in ERCB.st104a, then both of the offending instances would be filtered out.  This VIEW accomplishes this objective, and  also renames columns to standard.
  • ERCB.Company  This second VIEW was built on top of ERCB.BusinessAssociate and filters out instances so what remains are fairly likely to be current, up-to-date, operating companies.  For instance, the VIEW looks for strings of text like inc, limited and similar within the nm. It also suppresses any instances where either the address or phone number is null, or where the leading characters of the company name are numeric.  This cuts the list down from 13,000 or so, to around 5,000.
  • Intellog - base - Team  A ‘company’, at least as far as it can be determined right now, is a collection of people.  It therefore fits the definition of the base.Team TABLE.  Every Team instance will have two base.Identity instances linked to it through base.identifies, and lbl and nm will be of base.identifies.id_Type of  139 and 140, respectively.  These design decisions were documented through the creation of the Intellog - base - Team database diagram.
  • ERCB.putCompany  The PROCEDURE ERCB.putCompany was created — it takes the instances of ERCB.Company and deconstructs them into instances of  base.Team and base.Identity, and then links these with instances of base.identifies.  Most importantly, this is where the uid is assigned to the instance of base.Team.  Team.uid is the permanent, unique-in-the-known-universe, never-to-be-duplicated, never-changing identifier for the company — important stuff.  In addition, one final filtering task is completed, which is to eliminate all but the first instance of companies where the first word of the name was the same.   Once again, this is somewhat arbitrary, but was intended to eliminate the long list of aliases which seem to show up on a long-in-the-tooth list such as the ST104A.  This knocked the list down to about 3,800 distinct companies.
  • base.getCompanyDirectory  And finally, this procedure produces the content for CompanyDirectory.xml.  The only minor struggle in this regard was trying to get the XML declaration at the beginning.  But it was eventually resolved by simply editing it in when the file is created.  Good enough for now.
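
The filtering rules described in the list above can be sketched in a few lines of JavaScript.  To be clear, the real work is done by the SQL Server VIEWs and the PROCEDURE named above; this is only an illustration of the logic.  The field names address and phone, the function name filterCompanies, and the ltd and corp entries in the suffix pattern are assumptions made for the example (the post only mentions "inc, limited and similar").

```javascript
// Sketch only: the original logic lives in SQL Server VIEWs and a PROCEDURE.
// The suffix list and the address/phone field names are assumptions.
const SUFFIX_PATTERN = /\b(inc|limited|ltd|corp)\b/i;

function filterCompanies(rows) {
  // Count each name so verbatim duplicates can be dropped entirely
  // (the ERCB.BusinessAssociate rule: both offending instances go).
  const nameCounts = new Map();
  for (const row of rows) {
    nameCounts.set(row.nm, (nameCounts.get(row.nm) || 0) + 1);
  }

  const seenFirstWords = new Set();
  const result = [];
  for (const row of rows) {
    if (nameCounts.get(row.nm) > 1) continue;               // verbatim duplicate
    if (!SUFFIX_PATTERN.test(row.nm)) continue;             // does not look like a company
    if (row.address == null || row.phone == null) continue; // missing contact details
    if (/^\d/.test(row.nm)) continue;                       // name starts with a digit

    // The ERCB.putCompany rule: keep only the first instance per first word.
    const firstWord = row.nm.trim().split(/\s+/)[0].toLowerCase();
    if (seenFirstWords.has(firstWord)) continue;
    seenFirstWords.add(firstWord);
    result.push(row);
  }
  return result;
}
```

Each `continue` corresponds to one of the filters described above, applied in the same order as the VIEWs and the PROCEDURE are layered.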

So, in the end, the raw data downloaded from the ERCB ST104A has been transformed into a tidy XML called CompanyDirectory.xml.  It will be located in [marketRoot]*/data, at least for the time being, as it is a company directory for the energy industry.  In the future, other CompanyDirectory.xmls will be located in other [marketRoot]/data directories, or even the [siteRoot]/data for a universally applicable company directory.

This doesn’t necessarily reflect all of the logic which will need to be executed each time there is an ST104A update from the ERCB.  But rather, it’s a good, solid start from which to work, and the results can certainly be used to populate the drop down box on inputConfirm.php page.  This data will continue to be augmented and refined over time, and users of the application will be encouraged to provide feedback on the quality and currency of the list.

Code Shavings  The column naming standards above are taken from SQL Server 2005 Object Naming Standards - II: Columns, on http://butzi.ca/tech.  Full disclosure though; I write that blog, as well.  ♦  To facilitate the deconstruction described in ERCB.putCompany, the ERCB.Company.lbl and .nm columns were used to create an iSentence, which was subsequently stored in base.Team.xml.  There was no pressing reason to null out Team.xml after this was complete, but it could be, without impacting the integrity of the data.  ♦  Doesn’t CompanyDirectory.xml represent some sort of new naming standard?  Well, sort of.  By convention, it is named the same as the root element contained within it.  Within the root element are multiple instances of the Company element, each of which contains the nm, uid, and eventually other information about each individual company.  And it’s just that I have this natural aversion to the use of the plural in identifiers; hence the use of the collective (and singular) CompanyDirectory.

*[marketRoot] and [siteRoot] are terms described in the future post entitled Directory and Object Structure Standards for Web Applications.

Posted on 16th October 2008
Under: Developers' Journal | 1 Comment »

Standard Page Elements on Intellog Web Application Pages

In the previous post to the Developers Journal, getPageElement.php was introduced as the code used to transform appDefinition.xml into HTML using an XSL.  The resulting HTML can then be dropped into a standard, Intellog page, and styled using a linked Cascading Style Sheet (CSS).  This technique has subsequently been used with the poweredByImg and applicationActionBstp page elements to work out the general mechanics of the approach.  This same technique will now be used to generate a bread crumb trail element, so a detailed description of all of the steps in the process can be documented. 

Generally described, a bread crumb trail is a page element used to orient users as they navigate through chains of application pages.  With Intellog applications, this element will be located immediately to the right of the poweredByImg element, on the fixed footer.  It’s important to stress there’s really no magic as to how the actual steps in the bread crumb trail are determined.  The code immediately below shows a fragment of the XML found in appDefinition.xml;

<breadCrumbWkflw>
   <lnk>
      <lbl>Intellog</lbl>
      <seq>-1</seq>
      <tipTxt>Intellog home page.</tipTxt>
      <txt>http://www.intellog.com</txt>
   </lnk>
   <lnk>
      <lbl>inputOpenID</lbl>
      <seq>0</seq>
      <tipTxt>This page.</tipTxt>
      <txt>/inputOpenID.php</txt>
   </lnk>
   <lnk>
      <lbl>[OpenID Provider]</lbl>
      <seq>1</seq>
      <tipTxt>Third-party password authentication.</tipTxt>
      <txt/>
   </lnk>
   <lnk>
      <lbl>confirmOpenID</lbl>
      <seq>2</seq>
      <tipTxt>Confirm information and continue.</tipTxt>
      <txt>/confirmOpenID.php</txt>
   </lnk>
</breadCrumbWkflw>

It’s simply a series of sequenced HTML link definitions.  Whether or not they bear any relationship to the page on which the breadCrumbWkflw element is hosted is entirely under the control of the developer.  Also, by convention, the lnk where seq is 0 is assumed to be the current step in the workflow.  Where seq is -1, this is the step immediately prior to the current step, and where seq is 1, this is the step immediately following, and so on.  After the transformation with breadCrumbWkflw.xsl, the following HTML results;

<DIV ID="breadCrumbWkflw">
   <TABLE CLASS="wkflw">
      <TR>
         <TD>
            <A
               HREF="http://www.intellog.com"
               TITLE="Intellog home page.">
               Intellog
            </A>
         </TD>
         <TD CLASS="dividerCell">::</TD>
         <TD CLASS="currentCell">
            <A TITLE="This page.">inputOpenID</A></TD>
         <TD CLASS="dividerCell">::</TD>
         <TD>
            <A
               HREF=""
               TITLE="Third-party password authentication.">
               [OpenID Provider]
            </A>
         </TD>
         <TD CLASS="dividerCell">::</TD>
         <TD>
            <A
               HREF="/confirmOpenID.php"
               TITLE="Confirm information and continue.">
               confirmOpenID
            </A>
         </TD>
      </TR>
   </TABLE>
</DIV>
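
For readers who think more easily in script than in XSL, the mapping from the lnk definitions to the table above can be sketched in JavaScript.  This is not breadCrumbWkflw.xsl, just a stand-in that mirrors its visible output: sort by seq, give the seq 0 cell the currentCell class and no HREF, and put a dividerCell between neighbours.  The function names and the plain-object shape of each lnk are assumptions for the example, and the output is written on one line rather than indented.

```javascript
// Sketch only: the original transformation is done by breadCrumbWkflw.xsl.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderBreadCrumb(links) {
  // Sort a copy numerically by seq, so document order does not matter.
  const ordered = [...links].sort((a, b) => a.seq - b.seq);

  const cells = ordered.map((lnk) => {
    const title = escapeHtml(lnk.tipTxt);
    const label = escapeHtml(lnk.lbl);
    if (lnk.seq === 0) {
      // Current step: highlighted, and deliberately not clickable.
      return `<TD CLASS="currentCell"><A TITLE="${title}">${label}</A></TD>`;
    }
    return `<TD><A HREF="${escapeHtml(lnk.txt)}" TITLE="${title}">${label}</A></TD>`;
  });

  // Joining with the divider puts "::" between cells, but not at either end.
  const divider = '<TD CLASS="dividerCell">::</TD>';
  return `<DIV ID="breadCrumbWkflw"><TABLE CLASS="wkflw"><TR>${cells.join(divider)}</TR></TABLE></DIV>`;
}
```

The numeric comparison in the sort matters: sorting seq as text would place -1 after 0.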

At this point, the heavy lifting is complete — the definition of the element has been extracted from the XML, and the resulting HTML can be rendered with any popular web browser.  To get the right look-n-feel, however, it is necessary to style it with CSS, such as the following;

DIV.breadCrumbDiv {
   FONT-SIZE: 10px;
   LEFT: 110px;
   POSITION: absolute;
   TOP: 5px;
}

DIV.breadCrumbDiv TD.currentCell {
   FONT-WEIGHT: bold;
}

DIV.breadCrumbDiv TD.dividerCell {
   COLOR: silver;
}

Sharp-eyed readers may be wondering why the breadCrumbDiv does not appear in the HTML, but is referenced within the CSS itself.  breadCrumbDiv is the container found in the page which hosts the element.  To more fully explain, it helps to understand the sequence of steps involved in loading a standard Intellog web page.  The host page (inputOpenID.php in this case) contains one or more iterations of the following code (one per standard element);

<IFRAME
   CLASS="ifrm"
   ID="breadCrumbIfrm"
   SRC="/getPageElement.php?lbl=breadCrumbWkflw&pageLbl=inputOpenID">
</IFRAME>

These IFRAMEs are styled with;

IFRAME.ifrm {
   BORDER: 0px;
   HEIGHT: 0px;
   WIDTH: 0px;
}

which renders the IFRAME invisible.  It’s there simply because it’s permissible for an IFRAME to have a SRC attribute, whereas with a DIV, it’s not.  When the host page is loaded, the IFRAME is populated automatically by the URL found in SRC.  In turn, the innerHTML property of breadCrumbDiv is populated by a couple of lines of JavaScript which are executed when the host page is loaded, as follows;

function bodyOnLoad() {
   // Copy the HTML loaded into the hidden IFRAME over to the visible host DIV.
   var obj = document.getElementById("breadCrumbDiv");
   obj.innerHTML =
      document.getElementById("breadCrumbIfrm").contentWindow.document.body.innerHTML;
}
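
Since a host page can carry several of these IFRAME and DIV pairs, the same copy step can be generalised.  What follows is a minimal sketch, not the code running on the Intellog pages: the function name copyFramesToDivs and the idea of passing in a list of ID pairs are assumptions, and it presumes the IFRAME content comes from the same site as the host page.

```javascript
// Sketch only: the function name and the list-of-pairs argument are assumptions.
function copyFramesToDivs(doc, pairs) {
  const copied = [];
  for (const [frameId, divId] of pairs) {
    const frame = doc.getElementById(frameId);
    const target = doc.getElementById(divId);
    // Skip quietly if either element is missing or the frame has no body yet.
    const body = frame && frame.contentWindow && frame.contentWindow.document.body;
    if (!target || !body) continue;
    target.innerHTML = body.innerHTML;
    copied.push(divId);
  }
  // Return the IDs that were actually filled, which is handy when debugging.
  return copied;
}
```

On the host page this would be called from the load handler with something like copyFramesToDivs(document, [["breadCrumbIfrm", "breadCrumbDiv"]]), adding one pair per standard element.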

So in summary, the standard page element on an Intellog web page starts life as a definition within  appDefinition.xml.  It is transformed with an XSL to produce HTML.  The resulting HTML finds its way onto the web page by first being loaded into an invisible IFRAME, and then transferred to the host DIV using JavaScript triggered when the page is loaded.  Finally, the HTML is styled with a CSS to provide the final look and feel of the page element on the rendered page. 

At this point, it’s intended the Intellog web application pages will be generated dynamically, as described immediately above, when they are requested.  The main benefit of this approach, of course, is the ability to update the bread crumb trail and other standard elements of the application pages simply by editing appDefinition.xml, and have those changes show up immediately for the user.

Code Shavings  The only noteworthy technical challenge encountered with the implementation of breadCrumbWkflw was ensuring the lnk elements were presented in the correct sequence.  A significant hint as to how to accomplish this was found at xml.com; thanks for the help!  ♦  One interesting quirk of the IFRAME construct above: it seems like it must be written exactly as shown.  Substituting the shorthand for ‘empty element’ (ie. the / character inside the >, instead of the closing tag) seems to negate the influence of the SRC attribute.  In other words, if you use the shorthand notation, the HTML you expect to find in the innerHTML property of the IFRAME, as a result of the SRC attribute, isn’t there!

Posted on 14th October 2008
Under: Developers' Journal | 3 Comments »

getPageElement.php

This short PHP code file generates a standard HTML element based on application definition information which is transmogrified with an XSL transformation (XSLT).  It’s assumed the application definition will always be found in the current directory, in a file called appDefinition.xml.  When this PHP code is called, it is mandatory to pass it two additional parameters in the URL;

  • lbl  Determines what kind of standard element is going to be produced.  By convention, lbl will be assigned to the ID of the resulting HTML container element.  Also by convention, it will be the base name (ie. no extension) of the XSL used to define the specific nature of the transformation.   For example, if lbl=applicationActionBstp is passed to getPageElement.php, the resulting <DIV> element will have an ID=applicationActionBstp, and it can be assumed the XSL file used for the transformation will have been applicationActionBstp.xsl.
  • pageLbl  The identifying label for the page where the element will eventually be located.  For example, with the applicationActionBstp (the strip of buttons on the lower right, fixed region of the standard Intellog page layout), the number and function of the buttons will vary from page to page.  It is therefore necessary to identify the specific page for which the buttons are destined.  Also by convention, the pageLbl is the base name (no extension) of the related PHP code file.  For example, if pageLbl=inputOpenID, this would be the element definition related to the file called inputOpenID.php.

In other words, when you call getPageElement.php?lbl=applicationActionBstp&pageLbl=inputOpenID, this is effectively saying; "Please give me the HTML element for the strip of Application Action Buttons as required by the inputOpenID.php page."
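
The naming conventions tied to these two parameters can be summarised in a small JavaScript helper.  This is only an illustration of the conventions described above, not part of getPageElement.php itself; the function name describePageElementRequest and the shape of the object it returns are assumptions.

```javascript
// Sketch only: the function name and the returned object shape are assumptions.
function describePageElementRequest(lbl, pageLbl) {
  // Both parameters are mandatory, per the description above.
  if (!lbl || !pageLbl) {
    throw new Error("Both lbl and pageLbl must be supplied.");
  }
  const query = `lbl=${encodeURIComponent(lbl)}&pageLbl=${encodeURIComponent(pageLbl)}`;
  return {
    url: `/getPageElement.php?${query}`, // what goes in the IFRAME's SRC
    elementId: lbl,                      // ID of the resulting HTML container
    xslFile: `${lbl}.xsl`,               // XSL used for the transformation
    pageFile: `${pageLbl}.php`,          // page the element is destined for
  };
}
```

Calling describePageElementRequest("applicationActionBstp", "inputOpenID") ties together the four names that the convention keeps in step: the request URL, the container ID, the XSL file, and the host page.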

Posted on 8th October 2008
Under: Developers' Journal | 2 Comments »