Using uHTML

Introduction

To make a long story short, uHTML is an elegant way to extend HTML by defining new tags according to the requirements of a particular website. It connects HTML tags with chunks of code and inserts function results into tag attributes. It is irrelevant whether a tag is part of the regular HTML tag set or an extension of it. A function connected to a tag can leave it unchanged, modify it, replace it with generated content or skip it completely. The library that performs the connection consists of two packages: uHTML and uHTMLnode. This documentation refers to version 1.2 of the uHTML library.

package uHTML

The first package, uHTML, connects tag names with the appropriate functions and invokes the processing of the uHTML code. It loads modules to bind website-specific tags. The loaded modules have to meet one of three naming conditions. First, they can be located in a subdirectory of the script directory (cgi) named uHTML and match the pattern uHTML/*.pm. Second, they can be placed in the script directory itself and match the pattern *-uHTML.pm. Third, they can be placed in a uHTML subdirectory of any global perl library directory. The first option is mainly meant for locally installed libraries that are used across many websites (projects), e.g. the uHTML::std library. The second option is meant for project-specific libraries, while the third option is to be preferred on servers running uHTML for several websites.

Library modules can be put into the uHTML subdirectory of the script (cgi) directory when there is no access to the system library directories. In this case strictly project-bound modules should match the pattern *-uHTML.pm and be located in the script (cgi) directory itself to avoid confusion. Keeping to this convention eases the sharing of uHTML modules among several projects.
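
The three locations can be illustrated with a short perl sketch that lists the files matching each pattern. It is not part of uHTML: the function name CandidateModules and the keys shared, project and global are made up for this illustration, and the sketch says nothing about the order in which uHTML itself loads the modules.

```perl
use strict ;
use warnings ;
use File::Glob ':bsd_glob' ;   # glob() without splitting patterns on whitespace

# List candidate module files for the three documented locations.
# $ScriptDir is the script (cgi) directory; @LibDirs defaults to @INC.
sub CandidateModules
{
  my( $ScriptDir,@LibDirs ) = @_ ;
  @LibDirs = grep { !ref $_ } @INC unless @LibDirs ;

  return (
    shared  => [ glob( "$ScriptDir/uHTML/*.pm" ) ],               # uHTML/*.pm
    project => [ glob( "$ScriptDir/*-uHTML.pm" ) ],               # *-uHTML.pm
    global  => [ map { glob( "$_/uHTML/*.pm" ) } @LibDirs ],      # perl library directories
  ) ;
}
```

Called as my %Found = CandidateModules( '/srv/cgi-bin' ) ; it returns one list of file names per location, which is handy for checking that a new module has been placed where it is expected.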

package uHTMLnode

The second package, uHTMLnode, provides the interface to a tag structure. This structure is passed to the functions that uHTML binds to a particular tag.

Example

The following complete working example shows the basic integration of uHTML into a website. It does not show the use of functions in attributes or request initialisation, but adding either of these is straightforward. Please note that uHTML, unlike HTML, is case sensitive.

Adding an <include> tag to uHTML

The <head> data usually remains mostly unaltered across all HTML files of a website. The advantage of keeping the constant part in a separate file is obvious. To do this, we create an <include> tag that inserts the content of one file called head-data into several html files. We suppose that we have an include directory in our document root where we keep the chunks of html shared between pages. Our uhtml file could then look like this:

index.uhtml
  <html>
    <head>
      <include file="/include/head-data">
      ...
    </head>
  ...
  </html>

The functionality of the <include> tag is an asset in many projects. Following the naming convention proposed above, the perl code is placed in a file named include.pm located in the subdirectory uHTML of the script directory. The file include.pm then looks like this:

include.pm
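  use uHTML ;

  # Replace an <include file="..."> tag with the content of the named file.
  sub Include($) {
    my $Node = shift;
    my $File = $Node->attr('file') or return;

    # The path is relative to the document root; skip the tag if it is unreadable.
    open my $FH,'<',$ENV{'DOCUMENT_ROOT'}.$File or return;
    $Node->map(join('',<$FH>),'');
    close $FH;
  }

  uHTML::registerTag('include',\&Include);

  1;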

Using with CGI and Apache

To link this small library into a website a cgi hook is needed. We place the necessary code in a file called hook.pl in the script directory:

hook.pl
  #!/usr/bin/perl

  use uHTML;

  # The requested *.uhtml path arrives in PATH_INFO, relative to the document root.
  open my $FILE,'<',"$ENV{'DOCUMENT_ROOT'}$ENV{'PATH_INFO'}"
   or die "File: $ENV{'PATH_INFO'} not found";
  print "Content-type: text/html\n\n";
  print uHTML::recode(join('',<$FILE>));

To get the functionality “magically” into all *.uhtml files, we add some lines into the .htaccess file:

.htaccess
  DirectoryIndex index.uhtml
  RewriteEngine on
  RewriteRule ^(/?)(.*\.uhtml) $1cgi-bin/hook.pl/$2 [L,QSA]

Using with FCGI, plack and nginx

Nginx serves only static files and expects an external server to handle dynamic content. uHTML works well with Plack over the FCGI interface. For this purpose uHTML must be installed system wide.

To link the small library above with our website, an FCGI hook is needed. We place the necessary code in a file called uHTML.psgi in the script directory:

uHTML.psgi
 use uHTML;
 use Encode;

 sub
 {
   my $env  = shift ;
   my( $FILE,$DATA,$HTML,@HEAD,$LEN ) ;

   if( open( $FILE,'<',$env->{'PATH_TRANSLATED'} ) and
       $LEN = -s $FILE and
       read( $FILE,$DATA,$LEN ) == $LEN )
   {
     $HTML = uHTML::recoded_list( $DATA,$env ) ;
     $LEN  = 0 ;
     $LEN += length Encode::encode( 'UTF-8',$_ )
        foreach @{$HTML} ;

     push @HEAD,'Content-Type','text/html; charset=UTF-8' ;
     push @HEAD,'Content-Length',$LEN ;
     push @HEAD,'x-powered-by','uHTML' ;
     return [ 200,\@HEAD, $HTML ] ;
   }

   return [ 404,
            [ 'Content-Type' => 'text/plain' ],
            [ 'File Not Found' ]
          ] ;
 }

Now we can start the FCGI server with:

  plackup -s FCGI -S /tmp/uHTML -a /srv/cgi-bin/uHTML.psgi

In the nginx configuration we add the section redirecting uHTML requests to our plackup server:

server.conf
location ~ \.uhtml$
{
  try_files $uri /index.uhtml =404 ;

  fastcgi_keep_conn on ;
  fastcgi_split_path_info ^()(.*uhtml)$;

  fastcgi_pass   unix:/tmp/uHTML ;
  fastcgi_index  index.uhtml;

  fastcgi_param  URI                  $uri;
  fastcgi_param  SCRIPT_FILENAME
      $document_root$fastcgi_script_name;
  fastcgi_param  SCRIPT_NAME          $fastcgi_script_name;
  fastcgi_param  PATH_TRANSLATED
      $document_root$fastcgi_path_info;
  fastcgi_param  QUERY_STRING         $query_string;
  fastcgi_param  REQUEST_METHOD       $request_method;
  fastcgi_param  CONTENT_TYPE         $content_type;
  fastcgi_param  CONTENT_LENGTH       $content_length;
  fastcgi_param  REQUEST_URI          $request_uri;
  fastcgi_param  REQUEST              $request;
  fastcgi_param  REQUEST_SCHEME       $scheme ;
  fastcgi_param  REQUEST_FILE         $request_filename;
  fastcgi_param  DOCUMENT_URI         $document_uri;
  fastcgi_param  DOCUMENT_ROOT        $document_root;
  fastcgi_param  SERVER_PROTOCOL      $server_protocol;
  fastcgi_param  GATEWAY_INTERFACE    CGI/1.1;
  fastcgi_param  SERVER_SOFTWARE      nginx/$nginx_version;
  fastcgi_param  REMOTE_ADDR          $remote_addr;
  fastcgi_param  REMOTE_PORT          $remote_port;
  fastcgi_param  SERVER_ADDR          $server_addr;
  fastcgi_param  SERVER_PORT          $server_port;
  fastcgi_param  SERVER_NAME          $server_name;
  fastcgi_param  HOST_NAME            $hostname;
  fastcgi_param  HTTPS                $https;
  fastcgi_param  HTTP_COOKIE          $http_cookie;
  fastcgi_param  HTTP_ACCEPT_LANGUAGE $http_accept_language;
  fastcgi_param  REDIRECT_STATUS      200;
  fastcgi_param  SCRIPT_ROOT          /srv/cgi-bin ;
  fastcgi_param  DATA_ROOT            /srv/cgi-bin ;
}

location /
{
}

Short uHTML Interface Reference

All methods and variables defined in the packages uHTML and uHTMLnode at a glance.

package uHTML

The uHTML package loads all modules from the script directory that match uHTML/*.pm or *-uHTML.pm. It provides methods that assign code to tags and attributes, and it invokes the uHTML to HTML translation.

Methods

      uHTML::registerTagCode( $TagName,$Code ) ;

Bind the function $Code to the tags named $TagName. The function is called with a reference to the uHTML node corresponding to the tag: $Code( $Node ). It is expected to alter and adjust the tag attributes and content. The modified tag is automatically inserted into the HTML output.

If more than one function is bound to one tag, the functions are daisy-chained. The execution order of those functions is undefined.
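
As an illustration, a handler bound with registerTagCode might look like the following sketch. The handler name SafeLinks and the rule it implements are assumptions made up for this example; only testAttr() and setAttr() from the uHTMLnode reference are used. The registration is guarded so that the file also compiles where uHTML is not installed.

```perl
use strict ;
use warnings ;

# Hypothetical handler: add rel="noopener" to links that carry a target attribute.
# The node itself is not written out here; the modified tag is inserted automatically.
sub SafeLinks
{
  my $Node = shift ;

  return unless $Node->testAttr( 'target' ) ;
  $Node->setAttr( 'rel','noopener' ) unless $Node->testAttr( 'rel' ) ;
}

# Register the handler only where the uHTML package is available.
uHTML::registerTagCode( 'a',\&SafeLinks ) if eval { require uHTML ; 1 } ;
```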


      uHTML::registerTag( $TagName,$Code ) ;

Bind the function $Code to the tags named $TagName. The function is called with a reference to the uHTML node corresponding to the tag: $Code( $Node ). It is expected to insert the necessary data using the appropriate uHTMLnode methods $node->map( $HeadText,$TailText ) or $node->insert().
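
A minimal sketch of a tag bound with registerTag and written out with map() could look like this. The tag name today and its optional format attribute are hypothetical and exist only for this example.

```perl
use strict ;
use warnings ;
use POSIX 'strftime' ;

# Hypothetical <today> tag: replaced by the current date.
# The optional "format" attribute is an assumption of this sketch.
sub Today
{
  my $Node   = shift ;
  my $Format = $Node->attr( 'format' ) || '%Y-%m-%d' ;

  # The head text takes the place of the opening tag; the tail text stays empty.
  $Node->map( strftime( $Format,localtime ),'' ) ;
}

# Register the handler only where the uHTML package is available.
uHTML::registerTag( 'today',\&Today ) if eval { require uHTML ; 1 } ;
```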


      uHTML::registerAttrCode( $VarName,$Code ) ;
      uHTML::registerVar( $VarName,$Code ) ;

Bind the function $Code to the attribute variable called $VarName. Both functions are identical. The attribute variable gets replaced by the return value of the function. The function $Code is called with a reference to the node representing the tag, the name of the attribute containing the function and the function name, followed by the function arguments: $Code( $Node,$Attribute,$Function,$Value1,$Value2, ... ).
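
The call convention can be sketched with a small attribute function. The name upper and its behaviour are made up for this example; how the function and its arguments are written inside an attribute in the uhtml file is not covered by this sketch.

```perl
use strict ;
use warnings ;

# Hypothetical attribute variable "upper": returns its arguments in upper case.
# Arguments follow the documented order: node, attribute name, function name, values.
sub Upper
{
  my( $Node,$Attribute,$Function,@Values ) = @_ ;

  # The return value replaces the attribute variable.
  return uc( join( ' ',@Values ) ) ;
}

# Register the function only where the uHTML package is available.
uHTML::registerVar( 'upper',\&Upper ) if eval { require uHTML ; 1 } ;
```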


      uHTML::register( $Name,$Code ) ;

Bind the function $Code to the attribute variable called $Name and to a tag called $Name simultaneously. The tag or attribute variable gets replaced by the return value of the function. The function $Code is called with a reference to the node representing the tag, the name of the attribute containing the function and the function name, followed by the function arguments: $Code( $Node,$Attribute,$Function,$Value1,$Value2, ... ). If the function is called for a tag, $Attribute and $Function are not defined. In this case the function has to obtain the values $Value1, $Value2, ... itself, if necessary, from the attributes of the tag using $Node->attr( $Name ).
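
The two call variants can be handled in one function, as in the following sketch. The name greeting and the attribute name are hypothetical; the sketch only relies on $Function being undefined when the function is called for a tag.

```perl
use strict ;
use warnings ;

# Hypothetical "greeting", usable both as a tag and as an attribute variable.
sub Greeting
{
  my( $Node,$Attribute,$Function,@Values ) = @_ ;

  # Called for a tag: no function arguments are passed, so the value is
  # read from the (assumed) "name" attribute of the tag instead.
  @Values = ( $Node->attr( 'name' ) ) unless defined $Function ;

  my $Name = ( defined $Values[0] and length $Values[0] ) ? $Values[0] : 'world' ;
  return "Hello, $Name!" ;
}

# Register the function only where the uHTML package is available.
uHTML::register( 'greeting',\&Greeting ) if eval { require uHTML ; 1 } ;
```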


      uHTML::Tags() ;

Returns a list of all tags that have a function assigned.


      uHTML::TestTag( $Name )

Check if some code is bound to the tag $Name.


      uHTML::TestVar( $Name )

Check if some code is bound to the attribute variable $Name.


      uHTML::FileStart

Set the current file name for debug output. Ignored in production mode.


      uHTML::FileEnd

Reset the current file name for debug output to the previous name. Ignored in production mode.


      uHTML::parse( $text,$env ) ;

Parses $text into a uHTML tree. Returns a reference to a uHTMLnode node. $env provides a reference to the environment. If not given, the current environment is used.


      uHTML::recoded_list( $uhtml,$env )

Translates uHTML data $uhtml into HTML. Returns a reference to an array of HTML chunks containing the final HTML code. $env provides a reference to the environment. If not given, the current environment is used.


      uHTML::recode( $uhtml,$env ) ;

Translates uHTML data $uhtml into HTML. Depending on the context, returns a scalar or an array of strings containing the final HTML code. $env provides a reference to the environment. If not given, the current environment is used.

Debug Mode and Production Mode

uHTML produces some (sparse) error codes. It is advisable to switch them off in production mode. In production mode HTML comments are removed and the code is slightly compacted as well. Production mode is activated by setting $uHTML::FileName = '' ; prior to the translation of uHTML to HTML.

package uHTMLnode

The package uHTMLnode provides the hierarchical structure for the uHTML code and, after the translation, contains the HTML data.

Data Structure

The uHTMLnode data structure is only remotely related to the HTML nodes in DOM. It is intended to be manipulated only through its methods.

  • FirstChild: - first child node
  • LastChild: - last child node
  • Parent: - parent node
  • Prev: - previous node (Null for the first node in a hierarchy level)
  • Next: - following node (Null for the last node in a hierarchy level)
  • Name: - name of the node (tag name)
  • End: - true if the node has a closing counterpart (e.g. <div> ... </div>)
  • XMLClose: - true if the node has no closing counterpart but is noted in XML manner with a "/" before the closing bracket (e.g. <img ..... />)
  • Attributes: - reference to a HASH containing the attributes of a tag
  • Text: - text within a node till the first child node or end of the node (corresponds to the first text node in DOM if the first DOM child node is a text node)
  • Trailer: - text following a node (corresponds in DOM to the first text node following the node if the first following node is a DOM text node)
  • tainted: - recursive processing of the node necessary
  • HTML: - final HTML code
  • ENV: - pointer to the current environment, decisive in FCGI environments

Methods

       uHTMLnode->new( $Name,$Text,$Prev,$env ) ;

    Create a new node with the name $Name, a trailing text $Text and the preceding node $Prev. This method is called by the uHTML package and is seldom needed outside of it.

    
          $node->name() ;

    The name of a node. It equals the name of the uHTML tag represented by the node. By passing an argument, $node->name( $NewName ), the tag can be renamed.

    
          $node->parent() ;

    The parent node.

    
          $node->prev() ;

    The preceding node.

    
          $node->next() ;

    The following node.

    
          $node->copy() ;

    Copies a node. This function is useful to generate lists. The copy of the node is not hooked into the structure of the original uHTML file, although the parent node is correctly assigned. All child nodes are copied as well. The trailing text of the node is not included in the copy.

    
          $node->prepend( $Node ) ;

    Insert a node into the uHTML tree before the current node.

    
          $node->append( $Node ) ;

    Insert a node into the uHTML tree after the current node.

    
          $node->embed( $Name ) ;

    Creates a new node $Name and embeds the current node in it. In effect the current node gets replaced by the new node $Name while the current node becomes the only child of the new node.

    
          $node->firstChild() ;

    First subordinated node.

    
          $node->lastChild() ;

    Last subordinated node.
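
The navigation methods are enough to walk over the direct children of a node, as the following sketch shows. The helper name ChildNames is made up for this example, and the sketch assumes that next() returns a false value after the last sibling.

```perl
use strict ;
use warnings ;

# Collect the names of all direct children of a node, in document order.
# Assumption: next() yields a false value after the last child.
sub ChildNames
{
  my $Node = shift ;
  my @Names ;

  for( my $Child = $Node->firstChild() ; $Child ; $Child = $Child->next() )
  {
    push @Names,$Child->name() ;
  }
  return @Names ;
}
```

Inside a tag handler, ChildNames( $Node ) would return the tag names of the children of the tag being processed.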

    
          $node->addChild( $Child,$PrevChild ) ;

    Add a child node after the child node $PrevChild. If $PrevChild is not defined, $Child is added as the new first child node. If $PrevChild equals $node->lastChild(), the new node becomes the new last child.

    The node $Child mustn't be a child of $node. If $Child has its parent node set, it will be correctly moved within the uHTML document.

    
          $node->appendChild( $Child ) ;

    Add a child node as new last child.

    The node $Child mustn't be a child of $node. If $Child has its parent node set, it will be correctly moved within the uHTML document.

    
        $node->adoptChildren( $From,$Child ) ;

    Transfer the children of one node to another.

    The children of the node $From are moved to $node. If $Child is given, they are inserted after $Child; otherwise they are inserted ahead of all children of $node.

    
          $node->findChild( $Name,$Child ) ;

    Find a child node by name.

    Find a child node of $node named $Name after the child $Child. If $Child is undefined, find first child node named $Name.

    
          $node->replace( $New,$KeepTrailer ) ;

    Replace a node.

    Replaces a node in the uHTML structure. Normally the trailing text is replaced in the process too. To keep it, $KeepTrailer must be true. Returns the detached original node if successful.

    
          $node->detach( $KeepTrailer ) ;

    Detaches a node from the uHTML structure. Normally the trailing text is deleted in the process. To keep it, $KeepTrailer must be true.

    
          $node->delete() ;

    Deletes a node from the uHTML structure.

    
          $node->attr( $Name ) ;

    The value of a single attribute as a string. Attribute functions, if present, are interpreted. If more than one attribute with the same name exists, the values are concatenated. If a value is provided ($node->attr( $Name,$Value );), the attribute is set to this value. If the attribute does not exist, it is created.

    
          $node->rawAttr( $Name ) ;

    The original value of a single attribute as a string. Attribute functions, if present, are not interpreted. If more than one attribute with the same name exists, the values are concatenated. If a value is provided ($node->rawAttr( $Name,$Value );), the attribute is set to this value. If the attribute does not exist, it is created.

    
          $node->codeAttr( $Name ) ;

    The value of a single attribute as a string. Attribute functions, if present, are interpreted. If more than one attribute with the same name exists, the values are concatenated.

    
          $node->setAttr( $Name,$Value ) ;

    Sets the attribute $Name to $Value. If the attribute does not exist, it is created.

    
          $node->testAttr( $Name ) ;

    Tests the existence of the attribute $Name. This is necessary to test for attributes without any value provided.

    
          $node->testAnyAttr( $Name1,$Name2,$Name3,... ) ;

    Tests the existence of any of the attributes with the provided names.

    
          $node->testAllAttr( $Name1,$Name2,$Name3,... ) ;

    Tests the existence of all attributes with the provided names.

    
          $node->addAttr( $Name1,$Name2,$Name3,... ) ;

    Creates the attributes $Name1, $Name2, $Name3, ... without assigning a value to them.

    
          $node->deleteAttr( $Name1,$Name2,$Name3,... ) ;

    Deletes the attributes $Name1, $Name2, $Name3, ...

    
          $node->attributes()

    Reference to the attributes of a node. E.g. the style of a tag can be accessed by $node->attributes()->{'style'}. The methods above which access single attributes should be preferred.

    
          $node->text() ;

    The text inside a closed tag up to the first child tag. It corresponds to the first text node in DOM if the first DOM child node is a text node. Can be altered by passing an argument.

    
          $node->trailer() ;

    The text following a tag up to the next tag. It corresponds in DOM to the first text node following the node if the first following node is a DOM text node. Can be altered by passing an argument.

    
          $node->end() ;

    True if a tag is closed (the closing tag exists). If an argument is passed, the node becomes a closed node or an open node depending on the argument.

    
          $node->XMLClose() ;

    True if the tag is closed by a "/>" instead of a simple ">". Can be enforced or removed by passing a corresponding argument.

    
          $node->map( $HeadText,$TailText ) ;

    Map a node into the HTML output without its tags, preceding the node content with $HeadText and closing it with $TailText. If a node has no closing tag, $TailText directly follows $HeadText. In practice, this replaces the opening and closing tags with $HeadText and $TailText. This is the most common way to produce HTML output in functions hooked into uHTML using uHTML::registerTag( $TagName,$Code ) ;.

    
          $node->insert() ;

    Inserts a node's HTML code including the tags and attributes. It is meant to insert an altered node into the HTML output. This is the second way to produce HTML output in functions hooked into uHTML using uHTML::registerTag( $TagName,$Code ) ;.
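
A handler that alters a node and then writes it out with insert() might look like the following sketch. The handler name LazyImage and the attribute it sets are assumptions chosen for this example.

```perl
use strict ;
use warnings ;

# Hypothetical handler for <img>: add loading="lazy" unless the page author
# has already set a loading attribute, then write the tag with its attributes.
sub LazyImage
{
  my $Node = shift ;

  $Node->setAttr( 'loading','lazy' ) unless $Node->testAttr( 'loading' ) ;
  $Node->insert() ;
}

# Register the handler only where the uHTML package is available.
uHTML::registerTag( 'img',\&LazyImage ) if eval { require uHTML ; 1 } ;
```

Unlike a function bound with registerTagCode, a function bound with registerTag has to produce the output itself, which is why insert() is called explicitly here.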

    
          $node->HTML() ;

    The HTML code of a node after a map() or insert() was performed. It is empty before a map() or insert() on the node is done. It is possible to set this value directly by passing an argument $node->HTML( $html ). By setting it the resulting HTML code is replaced by $html.

    
          $node->appendText( $text ) ;

    Append $text to the existing HTML output.

    
          $node->env() ;

    Returns a reference to the current environment in which an HTTP request is performed.