ExpressionScript for developers

From LimeSurvey Manual

Revision as of 23:25, 21 September 2011 by TMSWhite

This wiki page is meant for the LimeSurvey development team and others wishing to contribute to LimeSurvey.  It provides details about how to work with, test, and extend Expression Manager (EM).

Getting Started

The best way to get started with EM is to:

EM Source Code Organization and Purpose

EM includes the following source files:

  1. /application/helpers/admin/expressions/em_core_helper.php
    • ExpressionManager class - implements a recursive descent parser, in PHP and JavaScript, that lets you create variables and securely expose a specified set of pre-defined functions.
    • Defines the set of available functions within EM ($this->amValidFunctions)
    • exprmgr_*() functions - custom PHP functions exposed to the user via EM.
    • Has built-in test cases:
      • UnitTestStringSplitter() - validates that EM properly extracts expressions (surrounded by curly braces) from longer strings
      • UnitTestTokenizer() - validates that strings are properly tokenized and assigned the right token classification
      • ShowAllowableFunctions() - shows a table of the 70+ available functions and their syntax
      • UnitTestEvaluator() - this should test all EM syntax, operators, and functions in PHP and JavaScript and confirm they generate accurate and identical results.  Currently, it contains 200+ test cases, but does not yet test every function.
  2. /application/helpers/admin/expressions/em_manager_helper.php
    • LimeExpressionManager class - this implements LimeSurvey-specific functions for accessing ExpressionManager. Initially, the goal was to keep the ExpressionManager completely separate (i.e. de-coupled) from LimeSurvey so that developers would never need to modify the ExpressionManager class itself.  However, there is now some tight coupling between LimeExpressionManager and ExpressionManager.
    • It is implemented as a singleton, so all access is via static functions.  Theoretically, this could be loaded as a library within CodeIgniter, but it isn't clear that that provides additional value, and it would require a significant rewrite (since ExpressionManager was initially designed for LimeSurvey 1.91+).
    • Has built-in test cases:
      • UnitTestProcessStringContainingExpressions() - validates that EM accurately evaluates multiple expressions within a string
      • UnitTestRelevance() - implements a 16+ question survey with cascading relevance, making questions and tailoring appear and disappear based upon answers provided.
      • Lacks a full integration test case - for that, one needs to load and test the ExpressionManager-Demo survey.
  3. /application/views/admin/expressions/test.php
    • Provides access to each of the Test cases, whose views are in /application/views/admin/expressions/test/*.php
      • Available Functions - shows the 70+ functions people can use via EM and their allowable syntax
      • Tokenizer - calls UnitTestTokenizer()
      • Unit Tests - calls UnitTestEvaluator() to test core EM functionality.  Cells in green are correct.  Cells in red are errors (except for dynamic functions, like rand() and date()).
      • String Splitter - calls UnitTestStringSplitter()
      • Integration Tests - calls UnitTestProcessStringContainingExpressions()
      • Unit Test Dynamic Relevance Processing - calls UnitTestRelevance()
      • Running Log - Source Data - shows a color-coded dump of the current instrument definition (e.g. $fieldmap[])
      • Running Log - Transactions on this Page - shows all of the EM-related translation requests on the current survey page (pretty-printed), and their results
  4. /scripts/admin/expressions/em_javascript.js
    • LEMval() - provides access to internal LimeSurvey variable values and attributes.  Allowable attributes are:
      • .shown - the answer as displayed to the user
      • .qid - the question ID
      • .mandatory - whether the question is mandatory
      • .question - the text of the question
      • .relevance - the relevance equation for the question
      • .relevanceStatus - whether or not the question is currently relevant
      • .type - the question type (the one character code)
      • .code (or no suffix) - the internal code value for the answer
      • .NAOK - the internal code value for the answer
    • Fix for Tab-based navigation
      • Purpose:  browsers do not properly handle tabs if form elements appear or disappear.  Without this fix, if a change to one question makes other questions appear immediately after it, the built-in tab functionality will tab past those new questions to whatever question happened to be next before they were inserted.  Users expect that if new questions appear, the browser will tab directly to them and not skip over them.
      • LEMsetTabIndexes()
        • sets tabindex for all potentially visible form elements
        • binds a keydown listener to each of them for managing TAB and SHIFT-TAB
          • calls ExprMgr_process_relevance_and_tailoring() to update question visibility and tailoring
          • calls LEMmoveNextTabIndex() to move to the next relevant form element.
          • cancels the default processing of TAB or SHIFT-TAB
      • LEMmoveNextTabIndex()
        • For TAB, uses complex jQuery to get the tabindexes for the set of relevant and active form elements following the current element:
          • Finds all questions that have tabindexes greater than the current tabindex (will include the current question if there are other visible elements available within the question)
          • Filters that set to only include relevant questions (the question's displaySGQA node has value="on")
          • Finds all relevant tabindexes within that set
          • Adds enabled button and submit buttons to that set
        • Iterates through that set to find the first relevant tabindex after the current tabindex
        • Cycles through the navigation buttons, and loops back to the top of the page if needed (skipping the browser search field)
        • For SHIFT-TAB, does the equivalent, but to support movement backwards.
    • JavaScript implementations of exposed EM functions, built to exactly mirror the server-side (PHP) functionality
      • LEM*() - functions built by LimeSurvey team - these are typically functions that are not natively supported within PHP
      • JavaScript equivalents of existing PHP functions (so they use the PHP names).  These are from phpjs.org.
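
The tab-navigation fix described above can be illustrated without a browser. The following is a minimal sketch, not the actual LEMmoveNextTabIndex() code: it assumes a hypothetical array of plain objects (tabindex, relevant) in place of the jQuery selection, and only shows the "first relevant tabindex after the current one, wrapping at the ends" logic for TAB and SHIFT-TAB.

```javascript
// Hypothetical stand-in for the jQuery selection: each entry describes one
// form element. "relevant" plays the role of the question's display status.
function nextTabIndex(elements, current, backwards = false) {
  // Keep only relevant elements and sort their tabindexes numerically.
  const candidates = elements
    .filter((el) => el.relevant)
    .map((el) => el.tabindex)
    .sort((a, b) => a - b);
  if (candidates.length === 0) return null;

  if (backwards) {
    // SHIFT-TAB: last candidate before the current one, else wrap to the end.
    const before = candidates.filter((t) => t < current);
    return before.length > 0
      ? before[before.length - 1]
      : candidates[candidates.length - 1];
  }
  // TAB: first candidate after the current one, else wrap to the top.
  const after = candidates.find((t) => t > current);
  return after !== undefined ? after : candidates[0];
}

const formElements = [
  { tabindex: 1, relevant: true },
  { tabindex: 2, relevant: false }, // currently hidden by its relevance equation
  { tabindex: 3, relevant: true },
];
console.log(nextTabIndex(formElements, 1)); // 3, because element 2 is not relevant
```

Because relevance is re-evaluated before the destination is computed, an element that has just become relevant is picked up by the same lookup without any special casing.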

How EM Works

What is an Expression?

Anything surrounded by curly braces is an Expression, with two exceptions:

  1. If there is whitespace after the opening brace or before the closing brace, it is ignored.
    • This is so that EM can ignore embedded JavaScript.
    • So, if you have JavaScript that might be parsed by EM, make sure to add a space or newline after the opening brace.
  2. Escaped curly braces are ignored (e.g. \{ and \})

Note that EM does support Expressions within strings.  Moreover, Expressions can contain nested strings, but not nested Expressions.  So, in each of the following examples, the outermost curly-brace section is a valid Expression and will cause a substitution to occur within the containing string.

  • <img src="images/mine_{Q1}.png"/>
  • <img src="images/mine_{if(Q1=="Y",'yes','no')}.png"/>
  • <img src="images/mine_{if(Q1=="Y",'single quote with {nested braces}',"double quote with {nested braces}")}.png"/>
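
To make the rules above concrete, here is a minimal sketch of a string splitter. It is not EM's actual implementation (EM's splitter is regular-expression based, and its test is UnitTestStringSplitter()); the function names splitExpressions and findClosingBrace are hypothetical, and the sketch simply scans character by character so that the whitespace rule, the escaped-brace rule, and quoted strings containing braces are each visible in the code.

```javascript
// Hypothetical helper: splits text into STRING and EXPRESSION parts.
function splitExpressions(src) {
  const parts = [];
  let text = ""; // accumulates literal text between expressions
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    // Escaped braces are treated as literal text.
    if (ch === "\\" && (src[i + 1] === "{" || src[i + 1] === "}")) {
      text += src.slice(i, i + 2);
      i += 2;
      continue;
    }
    if (ch === "{") {
      const end = findClosingBrace(src, i + 1);
      const body = end === -1 ? "" : src.slice(i + 1, end);
      // Whitespace just inside either brace means "not an Expression".
      if (end !== -1 && body.length > 0 && !/^\s|\s$/.test(body)) {
        if (text) parts.push({ type: "STRING", value: text });
        parts.push({ type: "EXPRESSION", value: body });
        text = "";
        i = end + 1;
        continue;
      }
    }
    text += ch;
    i += 1;
  }
  if (text) parts.push({ type: "STRING", value: text });
  return parts;
}

// Returns the index of the "}" that closes the expression starting at `from`,
// skipping quoted strings (which may contain braces). Returns -1 if another
// "{" is found first, since nested expressions are not supported.
function findClosingBrace(src, from) {
  let quote = null;
  for (let i = from; i < src.length; i++) {
    const ch = src[i];
    if (quote) {
      if (ch === "\\") i++; // skip an escaped character inside the string
      else if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "}") {
      return i;
    } else if (ch === "{") {
      return -1;
    }
  }
  return -1;
}
```

Calling splitExpressions('mine_{Q1}.png') yields a STRING part, an EXPRESSION part holding Q1, and a final STRING part, while '{ Q1 }' comes back as a single STRING part because of the whitespace inside the braces.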

What does EM do with text containing expressions?

  1. A regular expression divides the source line into STRING and EXPRESSION tokens
  2. Each EXPRESSION is parsed by ExpressionManager, a recursive descent parser.
    1. If there are syntax errors, EM returns an HTML string that syntax-highlights the equation and puts red-lined boxes around syntax errors
    2. If there are no syntax errors, EM returns the result of evaluating the expression
  3. EM re-joins the STRING and EM-evaluated EXPRESSION parts.
  4. EM optionally appends the translation activity to
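
The split, evaluate, and re-join cycle above can be sketched in a few lines. This is an illustration under stated assumptions, not the ExpressionManager code: the pattern is deliberately simplified (it does not handle braces inside quoted strings), the evaluate callback is a hypothetical stand-in for the parser, escaped braces are left untouched, the error markup is a placeholder for EM's syntax-highlighted output, and the log is just an array supplied by the caller.

```javascript
// Simplified pattern: either an escaped brace, or {body} where body neither
// starts nor ends with whitespace and contains no braces.
const EXPR = /\\[{}]|\{([^\s{}](?:[^{}]*[^\s{}])?)\}/g;

// evaluate: hypothetical callback that returns a value or throws on error.
// log: optional array that receives one entry per processed expression.
function processString(src, evaluate, log) {
  return src.replace(EXPR, (match, body) => {
    if (body === undefined) return match; // escaped brace: leave as-is
    let result;
    try {
      result = String(evaluate(body));
    } catch (err) {
      // Placeholder for EM's syntax-highlighted error output.
      result = '<span class="em-error">' + body + "</span>";
    }
    if (log) log.push({ expression: body, result: result });
    return result;
  });
}

// A toy evaluator that only knows how to look up variable names.
const vars = { name: "Ann" };
const lookupOnly = (expr) => {
  if (!Object.prototype.hasOwnProperty.call(vars, expr)) {
    throw new Error("Unknown variable " + expr);
  }
  return vars[expr];
};

const activity = [];
console.log(processString("Hello {name}, { not an expression }", lookupOnly, activity));
// Hello Ann, { not an expression }
```

The STRING parts never pass through the evaluator, which is why text that merely contains braces with inner whitespace (such as embedded JavaScript) survives unchanged.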

How can we be sure that EM accurately parses the equations?

EM was originally written in 1999-2000 by Dr. Tom White (TMSWhite) for a different project (Dialogix) in Java, using JavaCC, an open source compiler compiler (parser generator).  That Java-based project has been in production for nearly a decade, and has been fully vetted with unit and integration tests.

Since there is no production-grade parser generator for PHP and JavaScript (although ANTLR is coming close), TMSWhite created a custom recursive descent parser for LimeSurvey.  To ensure its accuracy, EM's logic is based upon the JavaCC source code for Dialogix.  The JavaCC syntax mirrors the functionality needed for a recursive descent parser.  JavaCC happens to build a state-based compiler, which is a little more efficient than a recursive descent parser.  However, state-based compilers are impossible to read, understand, or expand without JavaCC-like source code, so it did not make sense to try to port the JavaCC output directly to PHP.

Furthermore, there are comprehensive unit and integration test suites for EM. These make it easy to validate the accuracy of the EM system.  Each test suite includes dozens to hundreds of test cases, and it is trivial to add additional test cases.

How does the Recursive Descent Parser work?

EM must do the following:

  1. Tokenize the expression - separating strings, words (variable names vs. functions), and punctuation; and categorizing the types of each.
  2. Analyze the tokens to build a parse tree, checking for syntax errors along the way.
  3. Return the result of evaluating the expression (using PHP) (or return syntax-highlighted HTML if there are syntax errors).
  4. Create a safe JavaScript equivalent of the expression so that expressions can be dynamically re-computed client-side.
  5. Determine which variables are used in each expression (so that EM can make sure they are available client-side).
  6. Optionally record debug and/or audit-log information that can be retrieved by CI views.
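
As a rough illustration of steps 1, 2, 3, and 5, the sketch below tokenizes and evaluates a tiny arithmetic grammar (numbers, variables, + - * / and parentheses) with one function per grammar rule, which is the defining trait of a recursive descent parser. It is far smaller than EM's grammar and is not taken from em_core_helper.php; the function names and the thrown SyntaxError (in place of syntax-highlighted HTML) are assumptions made for the example. Generating the JavaScript equivalent (step 4) and logging (step 6) are omitted.

```javascript
// Step 1: tokenize into numbers, words (variable names), and punctuation.
function tokenize(expr) {
  const tokens = [];
  const re = /\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|([-+*\/()]))/y;
  let pos = 0;
  while (pos < expr.length) {
    re.lastIndex = pos;
    const m = re.exec(expr);
    if (!m) {
      if (/^\s*$/.test(expr.slice(pos))) break; // only whitespace remains
      throw new SyntaxError("Unexpected character near position " + pos);
    }
    if (m[1] !== undefined) tokens.push({ type: "NUMBER", text: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: "WORD", text: m[2] });
    else tokens.push({ type: "OP", text: m[3] });
    pos = re.lastIndex;
  }
  return tokens;
}

// Steps 2, 3, and 5: parse, evaluate, and record which variables were used.
function evaluate(expr, vars) {
  const tokens = tokenize(expr);
  const used = new Set();
  let i = 0;
  const isOp = (...ops) =>
    i < tokens.length && tokens[i].type === "OP" && ops.includes(tokens[i].text);

  function parseSum() { // sum := product (("+" | "-") product)*
    let left = parseProduct();
    while (isOp("+", "-")) {
      const op = tokens[i++].text;
      const right = parseProduct();
      left = op === "+" ? left + right : left - right;
    }
    return left;
  }
  function parseProduct() { // product := primary (("*" | "/") primary)*
    let left = parsePrimary();
    while (isOp("*", "/")) {
      const op = tokens[i++].text;
      const right = parsePrimary();
      left = op === "*" ? left * right : left / right;
    }
    return left;
  }
  function parsePrimary() { // primary := NUMBER | WORD | "(" sum ")"
    const tok = tokens[i++];
    if (!tok) throw new SyntaxError("Unexpected end of expression");
    if (tok.type === "NUMBER") return parseFloat(tok.text);
    if (tok.type === "WORD") {
      if (!Object.prototype.hasOwnProperty.call(vars, tok.text)) {
        throw new SyntaxError("Undefined variable " + tok.text);
      }
      used.add(tok.text);
      return vars[tok.text];
    }
    if (tok.text === "(") {
      const value = parseSum();
      if (!isOp(")")) throw new SyntaxError("Expected )");
      i++;
      return value;
    }
    throw new SyntaxError("Unexpected token " + tok.text);
  }

  const value = parseSum();
  if (i < tokens.length) throw new SyntaxError("Unexpected token " + tokens[i].text);
  return { value: value, varsUsed: [...used] };
}
```

For example, evaluate("5 + (2 * age) - 1", { age: 10 }) returns a value of 24 with age listed as the only variable used, while evaluate("four * / seven", { four: 4, seven: 7 }) throws, because parsePrimary() meets the "/" operator where an operand is required.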

How does EM integrate into LimeSurvey?

The LimeExpressionManager (LEM) class manages the integration of EM into LimeSurvey.  LEM must:

  1. Initialize all of the variables needed by LimeSurvey (e.g. for TOKENS, INSERTANS, and templatereplace())
  2. Know which Group and Question are being processed
  3. Record the results and metadata about all of the text that LimeSurvey asks it to process
  4. Output static HTML that reflects the results of that processing
  5. Output JavaScript that lets those results be dynamically re-computed if values on the page change.

The main LEM functions for integration are (in order of use):

  1. StartProcessingPage($debug=true,$allOnOnePage=false)
      • resets needed arrays
  2. StartProcessingGroup($groupNum=NULL,$anonymized=false,$surveyid=NULL)
      • If $groupNum is not NULL, then initialize all of the variables for that group and register them with EM
        • This is needed so that EM can detect when authors try to use variables before they are declared or answered
        • This is also used to detect which variables are set on the current page, as dynamic updates to those variables need to be managed.
    1. setVariableAndTokenMappingsForExpressionManager($forceRefresh=false,$anonymized=false,$allOnOnePage=false,$surveyid=NULL)
      • This private function initializes all of the needed variables within EM
      • LEM provides access to the variables by the following aliases:
        • The variable name (called question.code, or question.title)
        • INSERTANS:SGQA
        • SGQA (without needing to prefix the SGQA name with 'INSERTANS')
        • the javascript variable name (e.g. javaSGQA, answerSGQA, etc. according to question type).
      • LEM creates two JavaScript arrays:
        • alias2varName[] - this maps the four alias types to the canonical variable name
        • varNameAttr[] - this provides access to several attributes for each variable.
      • Attributes available for each variable:
        • .shown - the answer as displayed to the user
        • .qid - the question ID
        • .mandatory - whether the question is mandatory
        • .question - the text of the question
        • .relevance - the relevance equation for the question
        • .relevanceStatus - whether or not the question is currently relevant
        • .type - the question type (the one character code)
        • .code (or no suffix) - the internal code value for the answer
        • .NAOK - the internal code value for the answer
    2. ProcessRelevance($eqn,$questionNum=NULL,$jsResultVar=NULL,$type=NULL,$hidden=0)
      1. Computes the Boolean value of $eqn (server-side, using PHP).
        • If true, the question will be displayed
        • If false, the question will be output within the HTML, but start as hidden
      2. Creates a safe, JavaScript equivalent of $eqn that can be dynamically re-computed client-side
      3. Records which variables were used, plus other metadata, to streamline creation of consolidated JavaScript per page.
    3. ProcessString($string, $questionNum=NULL, $replacementFields=array(), $debug=false, $numRecursionLevels=1, $whichPrettyPrintIteration=1)
      1. Processes the $string through EM, allowing for recursive substitution of variables
      2. Each replacement is done as a named <span> node so that the JavaScript functions can dynamically replace the contents of those spans when needed
      3. Generates the JavaScript functions needed to do the dynamic <span> replacements
        • If $questionNum is not NULL, then EM can nest these replacements (tailorings) with the relevance equation.  That way tailoring only occurs for relevant questions.
    4. GetLastPrettyPrintExpression()
      • This is mostly called by the admin interface (to syntax-highlight the relevance equation and text that contains tailoring)
  3. FinishProcessingGroup()
    • This collects all of the tailoring and relevance information (e.g. results and JavaScripts) for the group
    • If there are multiple groups on a page (e.g. All-in-one survey style), each group's results are appended to an array
  4. FinishProcessingPage()
    • This collects all of the tailoring and relevance information and stores it in two $_SESSION variables so that it can be accessed by the test suite.
  5. GetRelevanceAndTailoringJavaScript()
    • This generates a single JavaScript block that performs all of the relevance and tailoring for the page
    • The function is called ExprMgr_process_relevance_and_tailoring().
      • It needs to be called any time a value changes, regardless of whether there are "conditions" attached to that variable.
      • So, it is called by both check_conditions() and noop_check_conditions()
      • It is also called by LEMmoveNextTabIndex() so that relevance-related changes are made before the TAB and SHIFT-TAB destinations are computed
    • It contains the following sections:
      • A processing block for each question that either contains tailoring or has a relevance equation that is not constitutively true (i.e. one that can evaluate to false).
        • If (relevance is true), then
          • process all tailoring for that question
          • show the question
          • set displayQ to 'on'
          • (if an equation) write the result of the computation to the appropriate javascript variable.
          • set relevanceQ to '1'
        • If (relevance is false), then
          • hide the question
          • set displayQ to
          • set relevanceQ to '0'
      • The alias and attribute mapping arrays
        • LEMalias2varName - maps each of the 4 alias types to the canonical variable name
        • LEMvarNameAttr - maps the canonical name to the set of available attributes (see above)
    • This function also detects which off-page variables are needed for relevance and/or tailoring, and creates hidden input nodes to store their values.
    • It also creates the following relevance-related hidden input nodes for each question:
      • RelevanceQ - stores the current relevance status
      • RelevanceQcodes - stores a pipe-delimited list of JavaScript variable names that are affected by that relevance
        • This is needed to NULL out irrelevant answers within the database

Where does EM integrate with LimeSurvey?

EM integrates with LimeSurvey in several places:

  1. dataentry.php
    • All LEM functions from StartProcessingPage through FinishProcessingPage
  2. Dtexts.php
    • Just calls ProcessString() for that question
  3. frontend_helper.php
    • checkgroupfordisplay() calls ProcessRelevance() for each group to determine whether there are any relevant questions within the group
  4. Group_format.php
    • Calls all LEM functions to determine relevance of each group and generate a working HTML page
  5. printanswers.php
    • Calls all LEM StartProcessing and FinishProcessing functions so that the questions and answers are appropriately tailored in the printed summary.
  6. question.php
    • preview() function calls StartProcessing functions so that question preview is properly tailored
    • there are {INSERTANS} and related replacements within preview() that can probably be removed
  7. Question_format.php
    • Calls all LEM functions so that the question-by-question view works.
    • FIXME - if a question is irrelevant, it will still be displayed.  Need to find a way to skip it.
    • FIXME - for hidden equations, need to ensure their value is stored to the database.
  8. questionbar_view.php
    • Calls GetLastPrettyPrintExpression() for question text and help text (so that both show syntax highlighting)
  9. questiongroupbar_view.php
    • Calls GetLastPrettyPrintExpression() so that the group description is syntax highlighted if it contains expressions
  10. replacements_helper.php
    • templatereplace() calls ProcessString() with depth=2 recursive substitution
      • It should be possible to cache computations from this function to improve performance
    • insertAnsReplace($line) returns $line - since EM will replace its values later
    • tokenReplace($line) returns $line for the same reason
    • ReplaceFields() and PassthruReplace() still allow direct replacements, bypassing EM
    • It should be possible to reduce the total number of calls to EM to improve performance. See discussion here.
  11. Survey_Common_Controller.php
    • _questionbar() calls GetLastPrettyPrintExpression() to syntax-highlight the relevance equation
    • _questiongroupbar() calls StartProcessingGroup() to ensure that question display within admin mode properly syntax-highlights questions.
  12. Survey_format.php
    • Calls all LEM functions so that it can render all questions and groups on a single page
  13. surveysummary.php
    • Calls GetLastPrettyPrintExpression() for survey Description and Welcome message so they can be syntax-highlighted if needed.

Extending EM

Adding Test Cases

Please do!  The testing frameworks are solid. You just need to add more tests following the examples in the code.

Adding Functions

Functions are stored in the amValidFunctions[] array of the ExpressionManager class ($this->amValidFunctions in em_core_helper.php). Some existing examples are:

The syntax for each function (func) is:

  • 'func' => array(details)

The details which must be included in the array are:

  1. PHP function name - this is the PHP function that will be called for that func.
  2. JavaScript function name - this is the JavaScript function that will be called for that function
  3. Meaning - this is a short description of what the function does
  4. Syntax - this shows the valid syntax for the function
  5. Reference - this is an optional URL showing more details about the syntax (e.g. a link to the PHP documentation)
  6. Number of required arguments
    • Multiple values are allowed at the end of the array.  For example, substr() above can take 2 or 3 arguments
    • Negative values mean that the function accepts a variable number of arguments
    • Negative values less than -1 mean that the function requires at least abs(N)-1 arguments (so -2 means it requires at least 1 argument)
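
The argument-count rule is easier to follow with a worked example. The sketch below uses a JavaScript object in place of the PHP array, and the three entries are hypothetical illustrations of the layout described above, not copies of the real definitions. The helper acceptsArgCount() is likewise an assumption about how the rule could be checked, not EM's own validation code.

```javascript
// Layout per entry: PHP name, JavaScript name, meaning, syntax, reference,
// followed by one or more allowable argument counts.
// Hypothetical entries, for illustration only.
const validFunctions = {
  substr: ["substr", "substr", "Return part of a string",
    "string substr(string, start [, length])",
    "https://www.php.net/manual/en/function.substr.php", 2, 3],
  max: ["max", "Math.max", "Find highest value",
    "number max(arg1, arg2, ...)", "", -3],
  join: ["exprmgr_join", "LEMjoin", "Join strings",
    "string join(arg1, ...)", "", -1],
};

// True if a call with n arguments is allowed for this entry.
function acceptsArgCount(details, n) {
  const counts = details.slice(5); // everything after the five descriptive fields
  return counts.some((c) =>
    // Non-negative: exact count. Negative: at least abs(c) - 1 arguments,
    // so -1 accepts any number and -3 requires at least 2.
    c >= 0 ? n === c : n >= Math.abs(c) - 1
  );
}

console.log(acceptsArgCount(validFunctions.substr, 3)); // true
console.log(acceptsArgCount(validFunctions.max, 1)); // false
```

Listing 2 and 3 for substr covers the optional third argument, while a single negative value covers open-ended argument lists without enumerating every allowed count.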

If you add a function that does not exist in PHP, then add it to the body of em_core_helper.php

  • FIXME:  Eventually we should separate out such add-on functions into their own php file

If you add a function that does not exist in JavaScript, then add it to the body of em_javascript.js

Make sure to add Unit tests for new functions to UnitTestEvaluator

  • Syntax is ExpectedResult~Expression, such as:
  • 212~5 + max(1,(2+3),(4 + (5 + 6)),((7 + 8) + 9),( (10 + 11), 12),(13 + (14 * 15) - 16) )

If your test case should return an error, use NULL as the expected value, like this:

  • NULL~four * / seven
  • NULL~(5 + 7) = 8
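
The ExpectedResult~Expression format can be read as follows. This is a minimal sketch, not the UnitTestEvaluator() code: parseTestCase() and runTestCase() are hypothetical names, the evaluator is passed in as a callback that either returns a value or throws, and comparing results as strings is an assumption made for the example.

```javascript
// Split on the first "~": the left side is the expected result and the
// right side is the expression. "NULL" marks a test that should fail.
function parseTestCase(line) {
  const sep = line.indexOf("~");
  if (sep === -1) throw new Error("Missing ~ separator: " + line);
  const expected = line.slice(0, sep);
  return {
    expectError: expected === "NULL",
    expected: expected === "NULL" ? null : expected,
    expression: line.slice(sep + 1),
  };
}

// Run one test line against an evaluator callback (hypothetical signature).
function runTestCase(line, evaluate) {
  const test = parseTestCase(line);
  let actual = null;
  let failed = false;
  try {
    actual = evaluate(test.expression);
  } catch (err) {
    failed = true;
  }
  const pass = test.expectError
    ? failed
    : !failed && String(actual) === test.expected;
  return { expression: test.expression, expected: test.expected, actual: actual, pass: pass };
}

console.log(parseTestCase("NULL~four * / seven"));
// { expectError: true, expected: null, expression: 'four * / seven' }
```

Splitting on the first "~" only keeps the format unambiguous as long as the expected value itself contains no "~".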