Actions

Code quality guide: Difference between revisions

From LimeSurvey Manual

→‎Concatenation: Added one more example of bad concatenation using placeholders
 
(653 intermediate revisions by 3 users not shown)
Line 3: Line 3:
== Prologue ==
== Prologue ==


# '''Be risk aware'''. Too perfect code can be a business risk (slow to write, maybe over-designed). Too sloppy code can also be a business risk (hard to maintain and understand). You have to find a balance that is adapted to the current situation and risk analysis, in which code quality becomes a '''risk mitigation''' technique.
# '''Be risk aware'''. Too good code can be a business risk (slow to write, maybe over-designed, [https://en.wikipedia.org/wiki/Gold_plating_(project_management) "gold plating"]). Too sloppy code can also be a business risk (hard to maintain and understand). You have to find a balance that is adapted to the current situation and risk analysis, in which code quality becomes a '''risk mitigation''' technique among others.
# '''Be humble'''. LimeSurvey was made by developers from all over the world, with different age, education and experience. Your code might be read by a completely different team ten years from now, in the same way you are now reading code by developers that no longer work with us.
# '''Be humble'''. LimeSurvey was made by developers from all over the world, with different age, education and experience. Your code might be read by a completely different team ten years from now, in the same way you are now reading code by developers that no longer work with us, but which work pays our rent.
# Design to '''isolate change''' and to '''prepare for change''' by making parts composable and interchangeable.
# '''Performance matters sometimes''', and shouldn't be disregarded completely. In particular, database queries using the ORM and ActiveRecord can be problematic. Some surveys have thousands of questions and hundreds of thousands of responses. Fast response time is also important for a fluid user experience.
# '''Performance matters sometimes''', and shouldn't be disregarded completely. In particular, database queries using the ORM and ActiveRecord can be problematic. Some surveys have thousands of questions and hundreds of thousands of responses. Fast response time is also important for a fluid user experience.
# '''It's harder to read code than to write code'''. Don't choose patterns that are easy or fast to write, but that are easy to read.
# '''It's harder to read code than to write code'''. Don't choose patterns that are easy or fast to write, but that are easy to read.
# Make your code '''communicate intent'''.
# It's easy to forget '''cross-cutting concerns''' like translation and security. Keep a mental note.
# It's easy to forget '''cross-cutting concerns''' like translation and security. Keep a mental note.
# '''Stress affects code quality'''. If your stress level is too high to write code with proper quality, take a step back and discuss it with your boss.
# '''Stress affects code quality''' and your risk behaviour. If your stress level is too high to write code with proper quality, take a step back and discuss it with your boss, or you'll push problems to the future.


== Quality ==
== Quality ==


What is quality? Which aspects of quality can be measured?
The purpose of this guide is to increase the quality of the LimeSurvey code-base. What is quality? Which aspects of quality can be measured? Which kind of company culture gives rise to appropriate code quality?


It's usually easier to get along what is "bad" code than what's "best" code. Blacklist instead of whitelist?
It's usually easier to get along what is "bad" code than what's "best" code. Blacklist instead of whitelist?
Line 23: Line 25:
* Performance
* Performance
* Security
* Security
* Reliability
* Composability<ref>Len Bass et al, Software Architecture in Practice</ref>


It's better to avoid emotional language when describing code quality, like "clean" or "dirty". Be precise.
Can not increase all at the same time - performance vs readability.
 
It's better to avoid emotional language when describing code quality, like "clean" or "dirty". Be precise, objective and suggest solutions<ref>Jonathan Boccara, The Legacy Code Programmer's Toolbox</ref>.


Idiomatic code is more readable than non-idiomatic code. What's idiomatic depends on which context or domain you work in. We work in PHP and web application development, and have other idioms than in, say, functional or hardware-close programming.
Idiomatic code is more readable than non-idiomatic code. What's idiomatic depends on which context or domain you work in. We work in PHP and web application development, and have other idioms than in, say, functional or hardware-close programming.


Some code are "hot spots" - changed often. Those parts should require higher quality than other code.
Some code are "hot spots" - changed often. Those parts should require higher quality than other code. In principle, code quality doesn't matter (much) for code that never changes.
 
Don't fall victim to the [https://en.wikipedia.org/wiki/Broken_windows_theory broken-window theory] when working on old parts of the code, but instead always leave code better than you found it ("the boy-scout rule")<ref>Mathew Skelton, Manual Pais; Team Typologies</ref>.
 
<references/>


== Risk ==
== Risk ==


todo
=== Why a risk approach? ===
 
Risk can be used together with an agile development style, where the risk assessments decide the amount of upfront design that's needed. A waterfall process might use more design than is called for; on the other hand, an agile approach can get too code-driven and in the end lack architectural integrity. Risk can be used to find a balance in an agile workflow<ref>George Fairbanks, "Just enough software architecture"</ref>.
 
A risk approach can be another "tool in the toolbox" during a sprint iteration, together with use-cases, scenarios, [https://en.wikipedia.org/wiki/MoSCoW_method MoSCoW] specifications.
 
Risk can also be part of an architecture decision pipeline. More about that below.
 
<references />
 
=== Definition of risk ===
 
Risk can be defined as a tuple of '''probability''' and '''severity'''. Probability can be expressed as a number, or more informally as high, medium or low. Severity can be expressed in terms of business value or technical problems<ref>Terje Aven, "The Science of Risk Analysis"</ref>.
 
Another definition is a triple of consequence, uncertainty and '''knowledge base''', (C, U, K). The knowledge base is important as evidence for the believes expressed in C and U, and to be able to compare different risks. Formally it can be expressed as <code>P(A|K)</code>, the probability P of event A given the knowledge K.
 
One example is to express risk in a '''risk matrix''':
 
{|
| || Severity high || Medium || Low
|-
| Probability high || || ||
|-
| Medium || || ||
|-
| Low || X || ||
|}
 
Above, an event is seen as having low probability but high severity.
 
Another way to model risk is using a '''fault tree''' or event tree.
 
[[File: eventtree1.png]]
 
The event tree highlights a '''risk source''' (basically any change to the code-base, but some changes have lower risk than others), and the '''value''' we want to protect, in this case data integrity.
 
A more informal approach is to just list the conceived risks in descending order:
 
* Risk A (most important)
* Risk B
* Risk C
 
And then the mitigations for those risks:
 
* Mitigation 1 for A
* ...
 
Or together:
 
* Risk A
** Mitigation 1 for A
** ...
* Risk B
* Risk C
 
==== Risk attributes ====
 
A risk can be said to have three attributes:
 
* Complexity
* Uncertainty
* Ambiguity
 
Complexity decides how easy it is to express a risk, from risk source to consequence. Examples of simple risks are smoking (lots of data, clear correlation). Examples of complex risks are terrorism or nuclear plan accident (lots of moving parts, hard to gain knowledge, lots of unknowns).
 
Uncertainty decides how easy it is to estimate the probability of the risk. High uncertainty makes it likely that the estimate is wrong, and vice versa.
 
Ambiguity means it's hard to express if the risk really is a risk. It might be hard to communicate, describe and reach consensus of.
 
Also risk traceability.
 
==== Examples ====
 
Examples of risk in the LimeSurvey project:
 
* Data loss
* Regressions
* Readability and maintainability
* Bad UX
 
Some of the risks will be acceptable, some unacceptable, depending on time and context.
 
There are also risks related to the project management:
 
* Conflicts in the team
* Churn rate in the company
* Missing or unstable requirements
 
There are different risks present when adding new features (lack of testing, wrong requirements, scope creep) compared to when refactoring (regressions). Even bug fixing itself can be risky if it's not reviewed and tested properly.
 
In a '''[https://en.wikipedia.org/wiki/Spiral_model spiral model]''', you start with the high-risk items rather than the items with highest business or customer value. The purpose is to apply '''mitigation''' techniques until the risk reaches an acceptable level.
 
Unit tests and manual QA are risk mitigation techniques. Note that tests can't prove the absence of bugs, since there still can be logical or conceptual errors in code and in specification. ''Too much'' testing can be a risk if it drains too much developer time related to what customers expect. It's always about finding a balance.
 
<references />
 
=== Risk and architecture ===
 
Risk has a tight connection with quality attributes (see section above); a given risk will result in applying a certain set of quality attributes to mitigate that risk. The wanted quality attributes also decides the direction of the architecture.
 
In the future, we will try to apply an '''architecture decision pipeline''' like this:
 
Risk --> quality attributes --> architectural decisions
 
For example, if regressions are considered the highest risk, based on some evidence, then testability is the quality attribute we want to improve. This is done by a number of techniques, e.g. dependency injection, lifting out side-effects, no static methods, keeping functions small and composable.
 
=== Risk culture ===
 
DRAFT
 
High reliability organization, see signals/warnings early, whistle blower culture
 
Hindsight bias, how to relate to mistakes
 
Resilience instead of risk.
 
=== Decision making theory ===
 
To make '''rational''' decisions, you need at least:
 
# A clear '''problem''' formulation
# '''Searching''' for different solutions (consult, brainstorming, etc)
# A way to '''compare''' alternatives meaningfully
# '''Implementation'''; and
# '''Evaluation''' of the implementation and decision
 
=== RAID log ===
 
A RAID log is a log of:
 
* Risks
* Assumptions
* Issues or impediments
* Dependencies
 
It can be used to track both risks and assumptions during project development.
 
Assumptions can be listed together with confidence, how to validate it, and how to act in case the assumption turns out to be wrong.
 
== Company culture ==
 
Company culture can be defined as the "beliefs and behaviours" of the company together with company values<ref name="future">Future-Proof Software-Systems, Frank J. Furrer</ref>
 
A lot of things in the company policy and culture affects code quality. One important point to improve code quality over time is an established ''trust'' between the business part and the IT part of the company<ref name="future"/>.
 
Other issues mentioned by Lavallee et al<ref name="lavallee">Why Good Developers Write Bad Code: An Observational Case Study of the Impacts of Organizational Factors on Software Quality, Lavallee, Robillard</ref> is:
 
* Documentation, lack of enforcement on company level
* Budget constraints, the team feels forced to apply quick fixes instead of proper long-term design
* Human resource planning, where knowledge can disappear when short-term consultants quit
* "Under pressure", when managers explicitly apply pressure in wording, assuming the developers are not working their hardest
 
todo, learning culture
 
todo, psychological safety
 
=== Company structure ===
 
DRAFT
 
Code quality and company culture? Hard to find.
 
Code quality and company structure:
 
* Number of employees<ref>The influence of organizational structure on software quality: An empirical case study, Nagappan et al</ref>
* Project and intellectual ownership
* Geographical distribution
* Volatility<ref>Organizational volatility and its effects on software defects, Mockus</ref>
* Workload per developer
* Budget constraints
* Office politics<ref>Why good developers write bad code: an observation case study of the impacts of organizational factors on software quality, Lavellee et al</ref>
 
Challenge: You have to somehow factor out the affect of the code quality itself. Trust the statistical method of the researches.
 
Fault density = Number of reported bugs per LOC (line of code).
Software artifact = Function, class or module.
 
Correlation between team structure and fault density. Correlation is not causality. Again trusting the researcher to cover internal threat to validity properly.
 
Found correlation:
 
* Intellectual ownership (compare with CPDB in LimeSurvey), conflict with bus factor and Scrum (partly)
* Team volatility, someone leaves
* Multiple people touching an artifact at once (split up artifacts in smaller parts; limit class inheritance chain)
 
One paper claims that fault density is 50% due to company structure, 50% due to code quality.
 
=== Change coupling and software architecture ===
 
Logic coupling
High change coupling = features are '''scattered''' in the architecture.
Change request (or modification request), like bug fix, new feature, refactor
The perfect architecture: 1-to-1 relation between a change request and affected software artefact
Impossible due to cross-cutting concerns, like logging, authentication, PHP 8 compatibility
Fault prediction, use as empirical evidence for best practice
 
Potential fallacy: Give to much weight to code metrics that are
# Easy to measure
# Easy to change
While their correlated fault density might not be that high.
Ambition: Measure which software artifacts are affected by bug reports the most.
 
Example: Expression manager has extremely high cyclomatic complexity, but very few bug reports are related to it, because it's never changed, and when changed, only changed by people who know it (Denis...).
 
'''References'''
<references/>
 
== Class cohesion ==
 
todo, code metrics in general
 
[[File:cohesion.png]]
 
[https://www.aivosto.com/project/help/pm-oo-cohesion.html Source]
 
== Coupling ==
 
[[File:coupling.png]]
 
[https://en.wikipedia.org/wiki/Coupling_(computer_programming) Source]
 
Clearer boundaries between classes and modules make it easier to isolate change.
 
[https://www.javatpoint.com/software-engineering-coupling-and-cohesion More info]
 
todo, function coupling? degrees of separation
 
https://www.educative.io/edpresso/what-are-the-different-types-of-coupling
 
== Code duplication ==
 
Code duplication is when a piece of code is copied from one place to another, and possible slightly changed in regards of white-space, variable naming, constants, etc. It's negative for the code health, since it makes it harder to fix bugs - you will have to change each duplicate. Commits get bigger, it increases the chances of mistakes and makes testing harder.
 
Some duplication is OK, some is less OK. The important thing to remember when ''removing'' a duplicate is: Will the clones always ''change together'', or will they ''diverge''? Sometimes it's obvious, sometimes less obvious. No tool can do that job for you, but the longer the duplicate, the stronger arguments you need to keep it instead of refactor it. It's a function of probability of divergence and clone length.
 
=== Definitions ===
 
There are four main types of code clones defined by researchers:
 
* Type 1 (exact, except for whitespace and comments)
* Type 2 (exact, except for renaming of identifiers)
* Type 3 (might include removed or added lines)
* Type 4 (different but semantically same)<ref>Liuqing Li et al, CCLearner: A Deep Learning-Based CloneDetection Approach</ref><ref>Andrew Walker et al, Open-Source Tools and Benchmarks for Code-CloneDetection: Past, Present, and Future Trends</ref>
 
The '''edit distance''' can be seen as the numbers of changes needed between two clones to make them identical.
 
A '''false positive''' is a clone detected by an algorithm, which is in fact not a clone. ''False negative'' is the opposite, where a clone is present in the code base, but the algorithm is not able to detect it. Both should be decreased as much as possible.
 
Some reports paint a picture that code clones are not harmful for changeability, or can not be proven to be harmful <ref name="elmar">Elmar Juergens et al, Do Code Clones Matter?</ref>.
 
An overview of available algorithms:
 
[[File:codeclonealgos.png|Code clone detection algorithms]]
 
* Textual, based on the code as-is
* Tokens, code is parsed into token (in PHP, like T_ECHO or T_OPEN_TAG)
* Syntactic, using abstract syntax-tree or similar data structure
* Semantic, program dependency graph
* Learning, machine learning and artificial intelligence
 
Overview for token-based algorithm:
 
[[File:codeclonealgo.png|Code clone algo]]
 
Sets of clone groups:
 
[[File:codeclonesets.png|Clone group sets]]
 
Common bugs in duplicated code:
 
* Missing null check (in PHP, maybe isset() or similar)
* Index out of bounds
* Faulty transaction handling
* Inconsistent default values
* Inconsistent validation behaviour<ref name="elmar" />
 
<references />
 
=== Examples of code duplication in LimeSurvey ===
 
<syntaxhighlight lang="php">
$this
    ->connection
    ->createCommand("SQL COMMAND")
    ->execute();
$this
    ->connection
    ->createCommand("OTHER SQL COMMAND")
    ->execute();
</syntaxhighlight>
 
This can be factored out to:
 
<syntaxhighlight lang="php">
public function executeCommand($command)
{
    $this
        ->connection
        ->createCommand($command)
        ->execute();
}
</syntaxhighlight>
 
Again, the assumption to make this change is that no single command will need any other function call besides connect(), createCommand() and execute(). If they do, you'd again have to de-refactor the code (or worse, add boolean arguments, a common short-cut which does not scale well).
 
<syntaxhighlight lang="php">
public function tableName()
{
    return '{{users}}';
}
</syntaxhighlight>
 
This snippet is present in all ActiveRecord classes, the only difference is the string constant, which will be ignored by detection tools. It's part of the ActiveRecord pattern as implemented in Yii 1, and can't reasonably be removed. Further more, this snippet will never change together with another class.
 
<syntaxhighlight lang="php">
if (\PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(false);
}
$config = simplexml_load_file($xmlPath);
$attributes = json_decode(json_encode((array) $config), true);
if (\PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}
</syntaxhighlight>
 
This is a snippet with a bit of history. First, it was only simplexml_load_file(). Later, there was a security issue about it, and we needed to add libxml_disable_entity_loader(). Then, for PHP 8, libxml_disable_entity_loader was deprecated, and it needed to be wrapped inside a version check. Now, this is definitely a snippet that ''always'' changes together with all other instances, and must be refactored to a function. It can be hard to detect for the tools, since libxml_disable_entity_loader can happen lines apart from simplexml_load_file.
 
Other snippets look more like data than logic:
 
<syntaxhighlight lang="php">
array(
    'header' => gT('First name') . $this->setEncryptedAttributeLabel(self::$sid, 'Token', 'firstname'),
    'name' => 'firstname',
    'value' => '$data->firstname',
    'headerHtmlOptions' => array('class' => 'hidden-xs'),
    'htmlOptions' => array('class' => 'hidden-xs name'),
),
array(
    'header' => gT('Last name') . $this->setEncryptedAttributeLabel(self::$sid, 'Token', 'lastname'),
    'name' => 'lastname',
    'value' => '$data->lastname',
    'headerHtmlOptions' => array('class' => 'hidden-xs'),
    'htmlOptions' => array('class' => 'hidden-xs name'),
),
</syntaxhighlight>
 
It could be factored out to a GridColumn class to remove some of the duplication, but both the probability and severity of bug here would probably be low. It's hard to tune an algorithm to ignore cases like these. Easiest might be comments to tell the tool to ignore it.
 
=== Tools ===
 
There are multiple algorithms for code clone detection, as seen above. Not all are available as tools, and not all tools are open-source or still maintained. The most common seems to be [https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm Rian-Karp], which is based on text analysis together with tokenization and hashing.
 
Command-line tools:
 
* [https://github.com/sebastianbergmann/phpcpd phpcpd] (PHP copy-paste detector)
* [https://github.com/kucherenko/jscpd jscpd] (JS copy-paste detector, but can analyze multiple formats including PHP)
 
Full-system tools:
 
* Scrutinizer
* SonarQube
 
Full-system tools can include code clones in a general "score", as in Scrutinizer A-F grading system, together with other metrics.
 
There are two cases for applying a clone detection tool:
 
* Single file analysis (assuming one class = one file)
* Whole-project analysis
 
For single file analysis, you tool can be made more sensitive, because you can ''assume'' that code in a single class or file will change together. Whole-project analysis cannot make such an assumption, and is thus tuned to be more lenient. The purpose of this difference in configuration is to reduce both false positives and false negatives without annoying the developers.
 
It might be hard to include code clone detection in Travis/CI if there are false positives. Another approach might be seeing it as red flags which must be investigated. But by whom?
 
== Empirical evidence of code quality guidelines ==
 
TODO links from stackexchange
 
== Modernizing legacy code ==
 
TODO
 
Based on book
 
== Modules ==
 
There's no inherit module functionality in PHP, but Yii has a module system. Libraries can be seen as a module system.
 
Examples of implicit modules in LimeSurvey:
 
* Front-end (survey taking)
* Back-end (survey design)
* Back-end (non-survey specific pages, like label sets, CPDB)
* Survey theme editing
* Expression manager
* Plugin system
 
In the future, they need to be separated and implemented more clearly as whatever Yii 3 module system is in use.
 
Some data models are common for all modules, e.g. surveys and users. Can have a "Common" module for such data.
 
todo, relation to Yii config system
 
todo, yii 3 program structure, by domain instead of class type


== Classes ==
== Classes ==
* Avoid empty OOP ceremony, like defining getters and setters for all properties. Might as well make them public, ''unless'' they have invariants that need to be upheld, e.g. "this number is always between 1 and 10" or "this string is maximum length 20".
* Almost always avoid static methods. They cannot be mocked. One exception is (possibly) factory methods (Foo::make(...)).
* Inject all dependencies. This will make sure the class can be unit-tested (unit-tests have several benefits over integrity tests - they can be run in parallel and don't require any setup and teardown)
* A second way to achieve testability is to factor out the methods that access the dependencies
* Should methods that do not access any state be put in separate functions?
todo, state vs behaviour, state vs dependencies
=== OOP design ===
Start from top-down, with domain, behaviour, interaction between objects, database design, pseudo-code if needed. Can also include scenarios, use-cases and UML diagrams.
Object reification - not only domain models, but also domain actions, relations and activities can be objects, e.g. a class ActivateSurvey or ExportSurvey.
Hierarchy depth should be kept reasonable shallow to avoid conflicts between methods, spooky [https://en.wikipedia.org/wiki/Action_at_a_distance_(computer_programming) action at a distance]. Compare with Yii 3, where controllers no longer inherit anything.
Some people advocate for composition instead of inheritance. Lots of blog posts available for more info.
Multiple inheritance is done with traits in PHP. Use-case: ActiveRecords where some but not all models support encryption. Other example, injection with traits that add property and setter/getter.
=== Interfaces ===
TODO by Borys
=== State ===
Heuristics to consider but not follow slavishly.
Keep a clear separation between classes with state and classes with no state.
Classes with state should not have dependencies with IO (it's OK to depend on classes in the same application layer, e.g. other DTO classes), and vice versa. ActiveRecord is an example which violates this principle. Repository pattern separates it.
    class Survey
      // Confusion - first params are domain state, last a resource
      __construct(id, language, dbConnection)
If a method does not call "this", should it instead be outside the class? A separate class or function, maybe.
todo, elaborate
=== Types of classes ===
Different categories of classes:
* Entity (domain class; has state and validation behaviour)
* Resource (database connection, file, curl, logger; state is in the resource (open, closed))
* Repository classes (database interaction, like splitting ActiveRecord in two parts: Entity class and database interaction)
* DAO, data access object (similar to resource)
* Builders (e.g. query builder, HTML message builder)
* Data-transfer object (no behaviour; pass to views to have better documentation and intent than associative arrays)
* Value classes (e.g. Email address to wrap email string around)
* Command object (e.g. SurveyActivator)
* Helper (no state; e.g. QuestionAttributeHelper to bake attributes into different formats used by the views)
* Wrapper
todo, immutability, possible with private properties that are set in the constructor,
todo: command-query separation for classes?
todo: dependencies between layers, dependencies to the framework
todo, pure vs effectful classes, where effect is anything that reads or writes to stdin/stdout/file/database etc.
=== Types of classes in LimeSurvey ===
* ActiveRecord, used to communicate with the database, and should not do ''anything'' else
* Controllers, used for glue code; should not contain business-logic (also CLI controllers)
* Helper functions (not classes, yet...), should be short and obey command-query separation (TODO: Long functions should be separated into command objects or service classes); todo: should not have any dependencies?
* Helper classes, similar to helper functions but have dependencies that need to be mocked during tests; should not have any state
* Service classes, contains logic related to a clear and separate task, like create or import survey
* Question render classes, contains logic related to how questions are rendered (TODO: Need better separation between state and logic and more clear design); similar to widgets
* Plugin classes, using the plugin event system
* Widgets, containing HTML logic for custom HTML widgets
* HTML helpers (Yii)
* Customizations to the Yii framework by inheritance
* Unit and functional test classes
A couple of classes contain only data, like LsDefaultDataSet.
todo, no difference between state property and dependency property in OOP
=== What is "glue code"? ===


todo
todo
Putting the pieces together, e.g. service class A need data from model B, fetch it and run.
todo, add example
=== "Where do I cut?" ===
Separation of concerns, but which concerns and which type of separation? Command-query (function core, imperative shell) is one type of separation. Other separation between layers, e.g. logic layer and data persistency layer (similar in thought).
Other separation, that one class should have a clear purpose. A design pattern should only implement ''one'' pattern, not multiple, and not half a pattern.
There's a problem with "helper" or "manager" classes that they can grow indefinitely, but might be better than nothing. Example when moving business logic out of ActiveRecord class to a helper class it can sometimes hard to be more specific than "helper".
Decisions like these should be related to risk analysis from the top of the document.
=== Size ===
A class should have a single and clear purpose; compare with [https://en.wikipedia.org/wiki/SOLID SOLID].
=== Class name ===
In general, whenever possible, follow PHP, Yii (Yii 3 preferably) and PSR naming schemes.
* Suffixes: Exception, Trait, Controller, Command.
* Prefixes: Abstract.
* ActiveRecord classes are named after the database table they are bound to, but with singular instead of plural. E.g. table "lime_users" has model "User".
* Command classes are named after what they ''do'' and imperatively, e.g. CreateSurveyCommand
* Helper classes should probably be named as specific as possible, to avoid them from growing too big
* Class properties of ActiveRecord classes will follow the naming of the database, using underscore instead of camel case (and ActiveRecord classes should have ''no'' other properties than those (excluding the database connection injected by the framework))


== Functions ==
== Functions ==
Line 46: Line 570:
Different types of functions in PHP:
Different types of functions in PHP:


* Functions
* Functions (not attached to a class)
* Methods
* Methods (attached to class)
* Anonymous functions
* Static methods (attached to class, but no $this)
* Short functions
* Anonymous functions (does not capture scope automatically)
* Short functions (does capture scope automatically)


<hr>
I'm using "function" as a word for all types of functions in PHP below.
 
=== Size ===


'''Guidelines:'''
* A function should be short enough to be completely readable on one screen, approx 60 lines at most
* A function should not have too many arguments; if you have five or more, consider making it a class instead
* If a function is growing too big, it might be better to create a class instead or split it into several functions (if both functions have '''common state''', a class might be better, unless you want to pass around '''explicit state''')


NB: I'm using "function" as a word for all types of functions in PHP.
=== Contract ===


* A function should be short enough to be completely readable on one screen
* A function should not have too many arguments; if you have five or more, consider make it a class instead
* A function has a contract with its surrounding environment:
* A function has a contract with its surrounding environment:
** Pre-condition: What needs to be true before the function is executed
** Pre-condition: What needs to be true before the function is executed
** Post-condition: What will be true after the function has returned
** Post-condition: What will be true after the function has returned
** The first line in the docblock can be one sentence describing the relation between function arguments and its output<br>Example<br><code>function foo() {}</code>
* The function contract can be checked with assertions (internal invariants) or exceptions (outside world); see below
* The function name should tell you '''what''' it does; docs should tell you '''why'''. '''How''' is described by the code itself (hopefully).
* Another good way is to use unit-tests to make sure the function's contract remains stable through changes
* Function name can often be verb + noun, like  
* The first line in the docblock should be one sentence describing the relation between function arguments and its output<br>Example<br><syntaxhighlight lang="php">
* If a function is growing too big, it might be better to create a class instead
/**
* Creates a random password of length $length (defaults to 12)
* @param int $length
* @return string
*/
function createPassword($lenght = 12)
 
/**
* Creates an HTML language drop down for survey with id $surveyId, marking $selectedLanguage as selected
* @param int $surveyId
* @param string $selectedLanguage
* @return string <select> HTML
*/
function createLanguageDropdown($surveyId, $selectedLanguage)
 
</syntaxhighlight>
 
=== Lift side-effects up in the stack trace ===
 
This one is a bit abstract but very useful to increase testability.
 
First, what is a '''side-effect'''? In short, it's anything that happens in a function that ''is not'' described by the ''relation'' between function arguments and output.
 
Another way to phrase it is, that a side-effect is anything that is not always the same for the given input to a function; a function without any side-effects will always return the same result if it gets the same input.
 
'''Examples:'''
 
<syntaxhighlight lang="php">
// gT accesses the file system and will return different result depending on what's in the translation files
function gT($sToTranslate, $sEscapeMode = 'html', $sLanguage = null);
 
// If $sText and $sEscapeMode is always the same, quoteText() will always return the same string.
function quoteText($sText, $sEscapeMode = 'html');
</syntaxhighlight>
 
A function with no side-effects is called '''pure'''. A function that is guaranteed to always return (no exceptions, no errors, no warnings, no division by zero) is called '''total'''.
 
It's a good thing if functions in a code-base ''either'' have side-effects, ''or'' return a result. Not both. This is called [https://en.wikipedia.org/wiki/Command%E2%80%93query_separation command-query separation].
 
Examples of side-effects:
 
* Database or file access
* Echo, or write to stdout or stderr
* Randomization
 
Why does this matter? Because side-effects make testing harder! '''If you have side-effects, you cannot create unit tests''', ''unless'' those side-effects can be mocked. If not, you need integrity tests with database setup or more. The more side-effects you have in a function, the more mocking is required to unit test it.
 
So, one solution to this is to move side-effects higher up in the stack trace, that is, move them to the calling code.
 
'''Example:'''
 
<syntaxhighlight lang="php">
function getSurveyInfo(int $surveyid, string $languagecode = '', boolean $force = false): array;
</syntaxhighlight>
 
Since this function takes $surveyid, we know it will probably query the database. Quering the database is a side-effect, and thus it requires a database scaffold for the test (or mocking, but the database connection is not injected, so there's no way to do it). One simple solution is to lift the side-effect out and pass in a Survey object instead:
 
<syntaxhighlight lang="php">
function getSurveyInfo(Survey $survey, string $languagecode = '', boolean $force = false): array;
</syntaxhighlight>
 
Much better! The side-effect of calling the database now happens ''before'' the function call. Now we can either mock a Survey object, or make an imitating stub for our test.
 
Reading the function body, there is in fact another database access, but since this is a relation to the Survey model, it could easily be moved out too. We've just made a function testable! Hooray!
 
This principle is rephrased in an architectural settings as [https://github.com/kbilsted/Functional-core-imperative-shell/blob/master/README.md functional core, imperative shell]. This architecture helps you get a good ratio between fast unit tests and slower integrity tests.
 
Moving side-effects up makes sure we can have a healthy ratio between integrity tests with scripted browser, and unit-tests; maybe around 20-80.
 
=== Function name ===
 
* The function name should tell you '''what''' it does; docs should tell you '''why''' (if this information is not obvious). '''How''' is described by the code itself (hopefully, but can in some cases be relevant to document).
* Function name can often be '''verb [+ adjective] + noun''', like "createFieldmap", "quoteText", "getRelativePath"
* Prefix with "is" for boolean functions
 
todo: difference between create, get, fetch, generate, ...
 
'''Good examples:'''
 
* createPassword() - One verb, one noun - clear
* isCaptchaEnabled() - Using "is" signifies it returns boolean; clear purpose
* getFullResponseTable() - Verb, adjective, noun - but is "full" needed here? Is there a non-full version of the same function?
* replaceExpressionCodes() - Pretty good, but it's obvious that more information is needed in the docblock to describe what "replace" means, and what an "expression code" is
* fixCKeditorText() - "fix" is a bit vague, but maybe best there is? Pray that docblock contains information!
 
'''Bad examples:'''
 
* languageDropdown() - No verb, doesn't tell you what it does
* rmdirr() - Two r's, looks like a typo, but the last "r" actually means "recursive"! Better: rmdir($recursive = false), rmdir or removeFolder.
* CSVUnquote() - Is CSV really the namespace? Why is the verb the second word?
* incompleteAnsFilterState() - One verb, but in the middle, or is "filter state" a noun?; why abbreviate "answer" to "ans"?
* reverseTranslateFieldNames() - Two verbs - did author mean "reverseTranslatedFieldNames"? Or "reverse and translate"? If so, it should be two functions, one for reverse and one for translate
* checkUploadedFileSizeAndRenderJson() - If you have an "and" in the function name, you should consider splitting it into two functions
 
'''Controversial'''
 
* gT() - An exception to descriptive function names; since this is used everywhere, the minimal name is motivated; maybe even name it "t()"? Still hard to read for someone who just started? But Yii uses t() too? Other frameworks use double underscore (Wordpress, Laravel) as function name.
 
=== Function filename and namespace ===
 
If your function has dependencies or side-effects, it should be put into a class instead, in which the dependencies can be injected. Another alternative is to put the dependencies as argument in the function, although this muffles the relationship between input and output. Again, this is to increase testability.
 
Keep in mind that functions outside classes cannot be mocked, so only use them when you ''know'' you won't need to mock them.
 
todo, autoloading of functions and folder location
 
=== Assertions ===
 
NB: This is not assertions as used in unit-tests, but assertions as used to check invariants in production code.
 
Assertions can be used for mental checks. They are only used for internal logic - failed assertions are not supposed to be seen by the end user. For interaction with the outside world, like database or browser, exceptions should be used instead.
 
Assertions can be disabled in production mode and have then no runtime cost (since PHP 7).
 
'''Example:'''
 
<syntaxhighlight lang="php">
/**
* Translation helper function
*
* @param string $sToTranslate
* @param string $sEscapeMode Valid values are html (this is the default, js and unescaped)
* @param string $sLanguage
* @return string
*/
function gT($sToTranslate, $sEscapeMode = 'html', $sLanguage = null)
{
    assert(is_string($sToTranslate));
    assert(strlen($sToTranslate) > 0);
    assert($sEscapeMode === 'html' || $sEscapeMode === 'js' || $sEscapeMode === 'unescaped');
    return quoteText(Yii::t('', $sToTranslate, array(), null, $sLanguage), $sEscapeMode);
}
</syntaxhighlight>


<hr>
<hr>


'''Assertions'''
=== Exceptions ===
 
* Exceptions are used to check inputs from the outside world (browser, database, file system)
* Don't use exceptions for control flow, only when fatal errors happen that make further execution impossible
* Exceptions should have clear error messages that make them ''actionable'' - there should be a clear "next step" for the user or developer
* Never, ever, do an empty catch-block
* At the moment, we do not have an exception hierarchy in LimeSurvey
 
Yii uses exception handler class. We use it for CHttpException, when denying permission and other cases.
 
==== Alternatives to exceptions ====
 
Some people consider exceptions being a hidden form of "goto" which confuses the control flow of the program. They propose different other solutions:
 
* Return a tuple, like <code>[$result, $errorMessage]</code> where $errorMessage is null at success
* Return <code>null</code> at failure
* Tagged union result type (not yet possible in latest PHP)
* Result object
 
The con being that you have to manually propagate the error, where an exception can happen from anywhere in the stack trace without any other function knowing about it.


todo
=== If-statements ===
 
If-statements should not be too long or hard to understand. If they are, consider factor out the checks into separate functions.
 
<syntaxhighlight lang="php">
if ((empty($instance->oOptions->{$attribute}))
    || (
        !empty($instance->oOptions->{$attribute})
        && (
            $instance->oOptions->{$attribute} === 'inherit'
            || $instance->oOptions->{$attribute} === 'I'
            // NB: Do NOT use === here, it won't work with Postgresql.
            || $instance->oOptions->{$attribute} == '-1'
        )
    )
) {
</syntaxhighlight>
 
Can be shortened by factoring out method <code>isInherit</code>:
 
<syntaxhighlight lang="php">
if (empty($instance->options->{$attribute})
    || $this->isInherit($instance-options->{$attribute})) {
}
/**
* Returns true if $value is set to inherit
* @param mixed|null $value
* @return boolean
*/
public function isInherit($value) {
    return !empty($value)
        && (
              $value === 'inherit'
            || $value === 'I'
            // NB: Do NOT use === here, it won't work with Postgresql.
            || $value == '-1'
        );
}
</syntaxhighlight>
 
And one more time by using a variable for the value:
 
<syntaxhighlight lang="php">
$value = $instance->options->{$attribute};
if (empty($value) || $this->isInherit($value)) {
}
</syntaxhighlight>
 
More clear, right? "If it's empty or set to inherit, do this and that."
 
=== Variables ===
 
Descriptive naming.
 
* Follow PSR-12 - no underscore, use camelCaseLikeThis
* Variables of ActiveRecord classes can have same name as the class, like "$survey" or "$question"
* Add -s for plural when it's a list of items: "$surveys", "$questions"
* Never include "array" or "list" in variable names - using plural is enough (for lists, not for associative arrays - see below)
* The ''context'' of the function helps with variable naming, e..g "$name" might be fine in a function about questions; this is extra helpful when functions are short!
* When a function is short, it's much easier to figure out variable meaning based on context
* $qid, $gid, and $sid are so established in the code-base that they could be considered OK
* The -data suffix means "blob of stuff", only use it when needed, like with $viewData
* Use $result when baking a return value; add its exact type as @var annotation (worst possible name, acceptable but anything with more meaning is better)
 
Not so good:
 
<syntaxhighlight lang="php">
$result = LabelSet::model()->findAll();
</syntaxhighlight>
 
Better with model name and -s for plural:
 
<syntaxhighlight lang="php">
$labelSets = LabelSet::model()->findAll();
</syntaxhighlight>
 
* Do not use Hungarian notation (prefix with "type"), but instead <code>/** @var */</code> annotations - these can be checked automatically and will then be enforced to be correct, unlike Hungarian notation which can "rot"
 
<syntaxhighlight lang="php">
/** @var string[] An array of strings with 0-n elements */
$allFields = [];
</syntaxhighlight>
 
Psalm can infer types if the function documentation is enough. The strictness can be adjusted. Not all variables will need @var annotation, the tool will warn you when it's confused.
 
* Data that is sent to views as associative arrays will be:
 
<syntaxhighlight lang="php">
/** @var array Can be basically anything :( Better would be data-transfer objects or more precise notation */
$data = [];
// Maybe more clear as $viewData? Discuss.
</syntaxhighlight>
 
You can be more precise when you specify which type of array you're writing:
 
<syntaxhighlight lang="php">
/** @var array<string, string> This means the array key is string, and the array value is also string */
$data = [];
$data['name'] = 'Olle';
$data['occupation'] = 'Developer';
$data['age'] = 38;  // ERROR! Value must be string.
</syntaxhighlight>
 
For more info about array type notation, see the [https://psalm.dev/docs/annotating_code/type_syntax/array_types/ Psalm documentation].
 
Some smaller examples:
 
<syntaxhighlight lang="php">
$theme = App()->request->getPost('theme');  // Should be $themeType - $theme looks like a model/AR
$gridid === 'questionthemes-grid';          // Should be $gridId with capital "I", since it's two words in English
$delete = Yii::app()->request->getParam('delete');  // Tuff one - $delete is ok? Thoughts?
$files = json_decode(stripslashes($sJSON), true);  // $file is a resource, $files an array of resources, this is $fileData perhaps?
$valid_extensions_array = explode(",", $aAttributes['allowed_filetypes']); // Should be $validExtensionTypes
$language = Yii::app()->session['survey_' . $surveyid]['s_lang']; // TODO: Should this be a value class? Then Language::make(...);
$surveyInfo = getSurveyInfo($surveyId, $language);  // OK, BUT should be a separate DTO!
$fieldMap = createFieldMap($survey, 'short', false, false, $language);  // Same, make a DTO
$cquestions = array();  // What's "c"? Method name is getChildQuestions and $questions is an argument - better with $childQuestions?
</syntaxhighlight>
 
Links:
 
[https://www.approxion.com/capital-offenses-how-to-handle-abbreviations-in-camelcase/ How to Handle Abbreviations in CamelCase]
 
[https://stackoverflow.com/questions/15526107/acronyms-in-camelcase Acronyms in CamelCase]
 
=== Looping ===
 
* The bodies of foreach-loops are often a good candidate to factor out into separate functions.
 
'''Example:'''
 
<syntaxhighlight lang="php">
foreach ($questions as $question) {
  processQuestion($question);
}
</syntaxhighlight>
 
Note the variable naming from plural to singular.
 
* Loop indexing should be named i, j, k etc.
 
'''Example:'''
 
<syntaxhighlight lang="php">
for ($i = 0; $i < $length; $i++) {
    for ($j = 0; $j < $otherLength; $j++) {
        // ...
    }
}
</syntaxhighlight>


<hr>
<hr>


'''Looping'''
=== Function composition ===


todo
todo, "doThisAndThat" --> "doThat(doThis())", possibly also pipe(doThis(), doThat()).
 
=== Cases ===
 
<syntaxhighlight lang="php">
/**
* This function generates an array containing the fieldcode, and matching data in the same order as the activate script
*
* @param Survey $survey Survey ActiveRecord model
* @param string $style 'short' (default) or 'full' - full creates extra information like default values
* @param boolean $force_refresh - Forces to really refresh the array, not just take the session copy
* @param bool|int $questionid Limit to a certain qid only (for question preview) - default is false
* @param string $sLanguage The language to use
* @param array $aDuplicateQIDs
* @return array
*/
function createFieldMap($survey, $style = 'short', $force_refresh = false, $questionid = false, $sLanguage = '', &$aDuplicateQIDs = array())
</syntaxhighlight>
 
'''Good:'''
 
* Name is verb and noun
* Each argument is documented (almost)
 
'''Bad:'''
 
* Returns "array", which can be anything. A separate class for "Fieldmap" would be better.
* Not clear if it should be "createFieldMap" or "createFieldmap" - is "fieldmap" one word or two?
* Very many boolean arguments. Should probably be a class with methods like <code>$fieldmapCreator->setForceRefresh(true);</code> instead.
* Too long.
* Doesn't elaborate on what "fieldmap" really is, and does not link to any documentation about it.
* Access super-global, which makes it hard to test. Should instead return a data-transfer object, like:<br><syntaxhighlight lang="php">
$_SESSION['fieldmap'] = createFieldMap(...);
</syntaxhighlight>
* Has no test
* Mixes Hungarian notation with non-Hungarian notation in the argument list


== Defensive programming and error handling ==
== Defensive programming and error handling ==


todo
* '''Fail hard, fail early'''. The earlier the program crashes when something is wrong, the lower is the probability that you'll end up with corrupt data in the database that will later require manual labor to correct.
* Fail with '''actionable information'''
* Check input to functions and throw <code>InvalidArgumentException</code> if it violates the function's pre-conditions
* Use assert for class-internal invariants
* Use assert to make assumptions explicit for the reader


== Documentation ==
== Documentation ==


todo
=== PHPDoc ===
 
* All functions must have a proper docblock
* The first line of the docblock should describe the relation between ''all'' the input arguments and the output of the function in a single sentence
* Each parameter must have the correct type
* Don't use <code>array</code> as type if it's a list of object - instead, use <code>[]</code> like so: <code>Survey[]</code> for a list of surveys


== OOP design ==
=== Technical documentation ===


todo
todo, UML, scenarios, use-cases, Mermaid


== The PHP of yesterday ==
== The PHP of yesterday ==
Line 95: Line 966:
Old idioms and habits that should be abandoned.
Old idioms and habits that should be abandoned.


Assoc arrays vs DTO
* Don't use associative arrays, almost never. Data-transfer objects have multiple benefits: they are clearly structured, can be documented, can enforce invariants. There's ''no'' performance overhead of objects compared to arrays in modern PHP.
* Objects are already passed by reference; arrays are value types
* Use <code>[]</code> instead of <code>array()</code>
* Hungarian notation was useful before static analysis existed for PHP. These days it's better to use something that can be checked automatically: docblock @param and inline @var annotations.
 
== The PHP of tomorrow ==
 
todo
 
Union and intersection types
 
Short functions
 
Enum


Objects are already passed by reference; arrays are value types
Match expressions
 
Attributes


== Security ==
== Security ==


todo
todo, map to value objects
 
XSS (ActiveRecord filters, encode/decode)


XSS
CSRF (always use POST with CSRF token for delete, add, etc)


Database injection
Database injection (prepared statements, bind params)


Permissions and roles
Permissions and roles (always check permission in controller actions)


== Performance ==
== Performance ==


todo
todo
Loading too many rows into PHP models
Must be possible to scale; test with large datasets.
== Localization ==
===Basics===
Currently we are using the gT(),eT(),ngT() and neT() functions for translations.
Since LimeSurvey is available in 60 languages it is very important that your original English string (which will be picked up automatically for translation by our translators) is done right from the start. Imagine if we have to correct only one string at a later time - then 60 strings become invalid across the project and need to be re-translated by the poor translator souls.
* gT() is the original translation function. It will return a translated version of the string in the currently selected language. ie: echo ''gT("Hello");''
* ngT() returns multiple translations of a sentence or phrase for which alternate plural forms may apply. ie: ''echo sprintf(ngT('Please select at least %s answer','Please select at least %s answers',iMinimumAnswers),$iMinimumAnswers)'';
* eT() echos the translation directly. (ie: instead of ''echo gT("Hello");'' you can use ''eT("Hello");''
* neT() echos the multiple translations of the sentence or phrase for which alternate plural forms may apply directly.
Please follow these important rules:
===Spacing===
*'''Do not embed margin spaces''' in your translation. Instead use proper CSS formatting
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">eT('Visible? ');</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">eT('Visible?');</syntaxhighlight>
===Punctuation===
*'''Punctuation is always part of the translation'''
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">echo gT('Summary').':<br>'.gT('This is a great summary.');</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">echo gT('Summary:').'<br>'.gT('This is a great summary.');</syntaxhighlight>
===Linebreaks===
*'''Do not embed system linebreaks''' in your translation, unless there is a good reason.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">eT('This is a very long text.
We hope that you will really read it.');</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">eT('This is a very long text. We hope that you will really read it.');</syntaxhighlight>
===Capitalization===
*'''Do not capitalize words''' except where grammatically correct (like at the beginning of a sentence or for a brand name). LimeSurvey is not a newspaper.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">eT('Create A New Label Set');
eT('Google Maps API Key');
</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">eT('Create a new label set');
eT('Google Maps API key');
</syntaxhighlight>
===Full sentences===
*'''Do not split sentences ''' across several translation strings to form a sentence. If it is one sentence it should be in one translation unit.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">echo gT('This is a very long sentence and');
echo '<br>';
echo gT('this is the rest of it. This will be confusing for the translator.');</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">echo gT('This is a very long sentence and this is the rest of it.');
echo '<br>';
echo gT('This will not be confusing for the translator.');</syntaxhighlight>
===Concatenation===
*'''Do not concatenate''' several translations to form a sentence or concatenate to an additional information (like a number or string). Instead use the sprint() or sprintf() function with placeholders.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">echo gT('The sum must not be bigger than ') . $maxSum;</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">echo sprintf(gT('The sum must not be bigger than %d'), $maxSum);</syntaxhighlight>
This will avoid the problem that in other languages the word positioning of a sentence can be very different and the information $maxSum from the example above might be needed at a different position in the sentence, not just at the end.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">echo gT('The user ') . $username . gT(' was deleted.');</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">echo sprintf(gT('The user %s was deleted.'), $username);</syntaxhighlight>
Also don't concatenate words or sentences using place holders.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">echo sprintf(gT('The user was %s.')), gT( $action=='deleted'?'deleted':'created' );</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">if ($action == 'deleted') {
    eT('The user was deleted.')
} else {
    eT('The user was created.');
}</syntaxhighlight>
===Plurals===
*'''Use the n*T functions where applicable'''. These functions show a different translation depending the number of items. Currently LimeSurvey only supports the numbers/situations 1 and >1.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">echo sprintf(gT('Please select at least %d answer(s).'), $minimumAnswers);</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">echo sprintf(
    ngT('Please select at least %s answer', 'Please select at least %d answers', $minimumAnswers),
    $minimumAnswers
);</syntaxhighlight>
===Embedded HTML===
* Text '''should not contain HTML'''. If you desperately need HTML in there use sprint and %s tags.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">eT('This is an <b>important</b> message.');</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">printf(gT('This is an %simportant%s message.'),'<b>','</b>');</syntaxhighlight>
===Variables===
* Don't embed a variable or constant instead of the real string. The text will not be picked up by our translation system.
<span style='color:#F00'>Wrong</span>: <syntaxhighlight lang="php">
$strMessage = 'Very important message, that needs to be translated';
eT($strMessage);</syntaxhighlight>
<span style='color:#060'>Right</span>: <syntaxhighlight lang="php">eT('Very important message, that needs to be translated');</syntaxhighlight>
===JavaScript===
The script that analyzes the PHP files for translation strings do not work on JavScript. That means that ''all'' translatable strings ''must'' be written in PHP, using gT() or eT() functions.
The solution to this is to register a global JavaScript variable via PHP, to register translations. Example code in a controller, using heredoc and regsiterScript:
<syntaxhighlight lang="php">
$undo    = gT("Undo (ctrl + Z)", "js");
$redo    = gT("Redo (ctrl + Y)", "js");
$find    = gT("Find (ctrl + F)", "js");
$replace = gT("Replace (ctrl + H)", "js");
App()->getClientScript()->registerScript(
    "SurveyThemeEditorLanguageData",
    <<<JAVASCRIPT
surveyThemeEditorLanguageData = {
    undo: "$undo",
    redo: "$redo",
    find: "$find",
    replace: "$replace"
};
JAVASCRIPT, CClientScript::POS_BEGIN
);
</syntaxhighlight>


== Testing ==
== Testing ==


todo
Testing can be split in two parts:
 
* Manual QA-testing
* Automatic testing
 
Manual testing is part of team culture and project process.
 
Automatic testing can further be split into at least three:
 
* '''Unit-testing''', without any fixture setup, instead using mocking and stubs
* '''Functional testing''', with fixture and/or database and file setup
* '''Integrity testing''', requiring an installed software, scripting the browser
 
The biggest pro with integrity testing is the fact that you don't have to adapt the code, at all. The con is that it's slow to run, and the bigger the test suite, the slower it gets. You can improve the speed by adding more machines to your CI, or only run the complete suite during the night.
 
Unit-testing without any fixture has the pro of being possible to run without any database setup, and in parallel. For the developer, this means they can use it locally, run it after each save or before they commit to quickly check if they broke anything.
 
'''Test-driven development''', TDD, is about writing or designing your tests first, before writing any code. Even when not doing this, it's always a good idea to consider and think about how your code can be tested during the design phase.
 
The most important thing to facilitate unit-testing is to '''be explicit with your dependencies'''; each class should inject the classes it depends on, preferably in the constructor.
 
<syntaxhighlight lang="php">
// Without dependency injection
class CopyQuestion
{
    private function copySubquestions($parentId)
    {
        // Implicit dependency of Question model
        $subquestions = \Question::model()->findAllByAttributes(['parent_qid' => $parentId]);
    }
}
 
// With dependency injection
class CopyQuestion
{
    private $questionModel;
    public function __construct(Question $questionModel)
    {
        $this->questionModel = $questionModel;
    }
    private function copySubquestions($parentId)
    {
        // Question can now be mocked
        $subquestions = $this->questionModel->findAllByAttributes(['parent_qid' => $parentId]);
    }
}
</syntaxhighlight>
 
Note that it's impossible to mock functions. If a function has an implicit dependency, it better not be needed to mock in a test.


QA
<syntaxhighlight lang="php">
// Function with implicit dependency, not possible to unit-test, will need a fixture and database setup
function getGidPrevious($surveyid, $gid)
{
    $surveyid = (int) $surveyid;
    // Implicit dependency on QuestionGroup
    $qresult = QuestionGroup::model()->findAllByAttributes(array('sid' => $surveyid), array('order' => 'group_order'));
    // ...omitted code
}


PHPUnit
// Dependency is explicit in the function argument, can now be mocked and unit-tested
function getGidPrevious($surveyid, $gid, $questionGroupModel)
{
    $surveyid = (int) $surveyid;
    // Implicit dependency on QuestionGroup
    $qresult = $questionGroupModel->findAllByAttributes(array('sid' => $surveyid), array('order' => 'group_order'));
    // ...omitted code
}


TDD
// Better still, move the dependency up in the stacktrace to the client code
function getGidPrevious($qresult, $gid)
{
    // ...omitted code
}
</syntaxhighlight>
Mocking as a DSL, PHPUnit mocking tools. Mockery, Prophecy. Anonymous classes.


Mocks, can't mock functions, static methods, mocking as a DSL, PHPUnit mocking tools
Pure functions, referential transparency, side-effects.


Pure functions, referential transparency, side-effects
Negative testing (test that it fails correctly.)


Unit, functional, integration
Mutation test.


Don't test the framework; don't test PHP
Don't test the framework; don't test PHP
Adding automated tests makes sure that the function or class lives up to its specification; and it makes sure it will ''always'' live up to its specification, even in the future when other people start changing the code.


== Static analysis ==
== Static analysis ==


<code>php -l</code>, CodeSniffer, Mess Detector, Psalm
Static analysis can be used to find bugs in the code before you run it. "Static" here means the opposite of during runtime.
 
Since PHP has a mature ecosystem, it also has multiple static analysis tools to choose from.
 
<code>php -l</code>, CodeSniffer, Mess Detector, Psalm, phpcpd, metrics
 
== Database design ==
 
todo
 
unique constraint
 
Multiple rounds of senior-level review, since it's hard to change once in place.
 
== JavaScript ==
 
todo
 
classes, functions, components, tests, airbnb standard, eslint, flow/typescript for type checking
 
SonarCube
 
== CSS ==
 
todo
 
naming for id and classes, sass?
 
== HTML ==
 
todo
 
Twig, Yii view files, indentation, PHP alternative syntax.
 
== Feedback ==
 
=== "Why do I have to refactor code that works?" ===
 
Sadly, it's not enough for the code to live up to the current specification and requirements, it must also be protected against future changes, regressions. Adding tests is the best way for this. Lots of people might read and change the LimeSurvey code the next 5-10 years. The test suite creates a safety layer to make sure the specification is fulfilled even as the code changes.
 
Tests are not a magical solution, however. It does not protect us against other degradation of the code quality, like classes gaining too much responsibility (too many methods, unrelated methods, too many properties).
 
=== "What's the correct ratio between integrity tests and unit tests?" ===
 
Don't know. Different pros and cons.
 
* Unit-test can be run in parallel and are faster
* Unit-tests don't need setup and teardown
 
Cons of unit-tests is that they need a bigger effort on the side of the developer to make them possible, for example injecting the dependencies properly to enable mocking, or lifting side-effects up in the stack trace.
 
=== How to disagree ===
 
All rules can be broken. To break a rule, you should reason in terms of risk and the assumptions used in this guide.
 
'''Assumptions:'''
 
# Regressions are the biggest technical risk for LimeSurvey the last five years (2021)
# Automated tests are the best risk mitigation technique against regressions
 
'''Possible disagreement:'''
 
# A certain piece of code has very low ''probability'' of regressions (maybe we "know" no one else will every touch or change it?)
# Regression in a certain piece of code would have very low ''impact''
# There might be better techniques to reduce the risk of regression than automatic testing


== Resources ==
== Resources ==

Latest revision as of 19:44, 8 January 2024

DRAFT

Prologue

  1. Be risk aware. Too good code can be a business risk (slow to write, maybe over-designed, "gold plating"). Too sloppy code can also be a business risk (hard to maintain and understand). You have to find a balance that is adapted to the current situation and risk analysis, in which code quality becomes a risk mitigation technique among others.
  2. Be humble. LimeSurvey was made by developers from all over the world, with different age, education and experience. Your code might be read by a completely different team ten years from now, in the same way you are now reading code by developers that no longer work with us, but which work pays our rent.
  3. Design to isolate change and to prepare for change by making parts composable and interchangeable.
  4. Performance matters sometimes, and shouldn't be disregarded completely. In particular, database queries using the ORM and ActiveRecord can be problematic. Some surveys have thousands of questions and hundreds of thousands of responses. Fast response time is also important for a fluid user experience.
  5. It's harder to read code than to write code. Don't choose patterns that are easy or fast to write, but that are easy to read.
  6. Make your code communicate intent.
  7. It's easy to forget cross-cutting concerns like translation and security. Keep a mental note.
  8. Stress affects code quality and your risk behaviour. If your stress level is too high to write code with proper quality, take a step back and discuss it with your boss, or you'll push problems to the future.

Quality

The purpose of this guide is to increase the quality of the LimeSurvey code-base. What is quality? Which aspects of quality can be measured? Which kind of company culture gives rise to appropriate code quality?

It's usually easier to get along what is "bad" code than what's "best" code. Blacklist instead of whitelist?

Quality attributes related to code quality:

  • Readability
  • Testability
  • Maintainability
  • Performance
  • Security
  • Reliability
  • Composability[1]

Can not increase all at the same time - performance vs readability.

It's better to avoid emotional language when describing code quality, like "clean" or "dirty". Be precise, objective and suggest solutions[2].

Idiomatic code is more readable than non-idiomatic code. What's idiomatic depends on which context or domain you work in. We work in PHP and web application development, and have other idioms than in, say, functional or hardware-close programming.

Some code are "hot spots" - changed often. Those parts should require higher quality than other code. In principle, code quality doesn't matter (much) for code that never changes.

Don't fall victim to the broken-window theory when working on old parts of the code, but instead always leave code better than you found it ("the boy-scout rule")[3].

  1. Len Bass et al, Software Architecture in Practice
  2. Jonathan Boccara, The Legacy Code Programmer's Toolbox
  3. Mathew Skelton, Manual Pais; Team Typologies

Risk

Why a risk approach?

Risk can be used together with an agile development style, where the risk assessments decide the amount of upfront design that's needed. A waterfall process might use more design than is called for; on the other hand, an agile approach can get too code-driven and in the end lack architectural integrity. Risk can be used to find a balance in an agile workflow[1].

A risk approach can be another "tool in the toolbox" during a sprint iteration, together with use-cases, scenarios, MoSCoW specifications.

Risk can also be part of an architecture decision pipeline. More about that below.

  1. George Fairbanks, "Just enough software architecture"

Definition of risk

Risk can be defined as a tuple of probability and severity. Probability can be expressed as a number, or more informally as high, medium or low. Severity can be expressed in terms of business value or technical problems[1].

Another definition is a triple of consequence, uncertainty and knowledge base, (C, U, K). The knowledge base is important as evidence for the believes expressed in C and U, and to be able to compare different risks. Formally it can be expressed as P(A|K), the probability P of event A given the knowledge K.

One example is to express risk in a risk matrix:

Severity high Medium Low
Probability high
Medium
Low X

Above, an event is seen as having low probability but high severity.

Another way to model risk is using a fault tree or event tree.

The event tree highlights a risk source (basically any change to the code-base, but some changes have lower risk than others), and the value we want to protect, in this case data integrity.

A more informal approach is to just list the conceived risks in descending order:

  • Risk A (most important)
  • Risk B
  • Risk C

And then the mitigations for those risks:

  • Mitigation 1 for A
  • ...

Or together:

  • Risk A
    • Mitigation 1 for A
    • ...
  • Risk B
  • Risk C

Risk attributes

A risk can be said to have three attributes:

  • Complexity
  • Uncertainty
  • Ambiguity

Complexity decides how easy it is to express a risk, from risk source to consequence. Examples of simple risks are smoking (lots of data, clear correlation). Examples of complex risks are terrorism or nuclear plan accident (lots of moving parts, hard to gain knowledge, lots of unknowns).

Uncertainty decides how easy it is to estimate the probability of the risk. High uncertainty makes it likely that the estimate is wrong, and vice versa.

Ambiguity means it's hard to express if the risk really is a risk. It might be hard to communicate, describe and reach consensus of.

Also risk traceability.

Examples

Examples of risk in the LimeSurvey project:

  • Data loss
  • Regressions
  • Readability and maintainability
  • Bad UX

Some of the risks will be acceptable, some unacceptable, depending on time and context.

There are also risks related to the project management:

  • Conflicts in the team
  • Churn rate in the company
  • Missing or unstable requirements

There are different risks present when adding new features (lack of testing, wrong requirements, scope creep) compared to when refactoring (regressions). Even bug fixing itself can be risky if it's not reviewed and tested properly.

In a spiral model, you start with the high-risk items rather than the items with highest business or customer value. The purpose is to apply mitigation techniques until the risk reaches an acceptable level.

Unit tests and manual QA are risk mitigation techniques. Note that tests can't prove the absence of bugs, since there still can be logical or conceptual errors in code and in specification. Too much testing can be a risk if it drains too much developer time related to what customers expect. It's always about finding a balance.

  1. Terje Aven, "The Science of Risk Analysis"

Risk and architecture

Risk has a tight connection with quality attributes (see section above); a given risk will result in applying a certain set of quality attributes to mitigate that risk. The wanted quality attributes also decides the direction of the architecture.

In the future, we will try to apply an architecture decision pipeline like this:

Risk --> quality attributes --> architectural decisions

For example, if regressions are considered the highest risk, based on some evidence, then testability is the quality attribute we want to improve. This is done by a number of techniques, e.g. dependency injection, lifting out side-effects, no static methods, keeping functions small and composable.

Risk culture

DRAFT

High reliability organization, see signals/warnings early, whistle blower culture

Hindsight bias, how to relate to mistakes

Resilience instead of risk.

Decision making theory

To make rational decisions, you need at least:

  1. A clear problem formulation
  2. Searching for different solutions (consult, brainstorming, etc)
  3. A way to compare alternatives meaningfully
  4. Implementation; and
  5. Evaluation of the implementation and decision

RAID log

A RAID log is a log of:

  • Risks
  • Assumptions
  • Issues or impediments
  • Dependencies

It can be used to track both risks and assumptions during project development.

Assumptions can be listed together with confidence, how to validate it, and how to act in case the assumption turns out to be wrong.

Company culture

Company culture can be defined as the "beliefs and behaviours" of the company together with company values[1]

A lot of things in the company policy and culture affects code quality. One important point to improve code quality over time is an established trust between the business part and the IT part of the company[1].

Other issues mentioned by Lavallee et al[2] is:

  • Documentation, lack of enforcement on company level
  • Budget constraints, the team feels forced to apply quick fixes instead of proper long-term design
  • Human resource planning, where knowledge can disappear when short-term consultants quit
  • "Under pressure", when managers explicitly apply pressure in wording, assuming the developers are not working their hardest

todo, learning culture

todo, psychological safety

Company structure

DRAFT

Code quality and company culture? Hard to find.

Code quality and company structure:

  • Number of employees[3]
  • Project and intellectual ownership
  • Geographical distribution
  • Volatility[4]
  • Workload per developer
  • Budget constraints
  • Office politics[5]

Challenge: You have to somehow factor out the affect of the code quality itself. Trust the statistical method of the researches.

Fault density = Number of reported bugs per LOC (line of code). Software artifact = Function, class or module.

Correlation between team structure and fault density. Correlation is not causality. Again trusting the researcher to cover internal threat to validity properly.

Found correlation:

  • Intellectual ownership (compare with CPDB in LimeSurvey), conflict with bus factor and Scrum (partly)
  • Team volatility, someone leaves
  • Multiple people touching an artifact at once (split up artifacts in smaller parts; limit class inheritance chain)

One paper claims that fault density is 50% due to company structure, 50% due to code quality.

Change coupling and software architecture

Logic coupling High change coupling = features are scattered in the architecture. Change request (or modification request), like bug fix, new feature, refactor The perfect architecture: 1-to-1 relation between a change request and affected software artefact Impossible due to cross-cutting concerns, like logging, authentication, PHP 8 compatibility Fault prediction, use as empirical evidence for best practice

Potential fallacy: Give to much weight to code metrics that are

  1. Easy to measure
  2. Easy to change

While their correlated fault density might not be that high. Ambition: Measure which software artifacts are affected by bug reports the most.

Example: Expression manager has extremely high cyclomatic complexity, but very few bug reports are related to it, because it's never changed, and when changed, only changed by people who know it (Denis...).

References

  1. Jump up to: 1.0 1.1 Future-Proof Software-Systems, Frank J. Furrer
  2. Why Good Developers Write Bad Code: An Observational Case Study of the Impacts of Organizational Factors on Software Quality, Lavallee, Robillard
  3. The influence of organizational structure on software quality: An empirical case study, Nagappan et al
  4. Organizational volatility and its effects on software defects, Mockus
  5. Why good developers write bad code: an observation case study of the impacts of organizational factors on software quality, Lavellee et al

Class cohesion

todo, code metrics in general

Source

Coupling

Source

Clearer boundaries between classes and modules make it easier to isolate change.

More info

todo, function coupling? degrees of separation

https://www.educative.io/edpresso/what-are-the-different-types-of-coupling

Code duplication

Code duplication is when a piece of code is copied from one place to another, and possible slightly changed in regards of white-space, variable naming, constants, etc. It's negative for the code health, since it makes it harder to fix bugs - you will have to change each duplicate. Commits get bigger, it increases the chances of mistakes and makes testing harder.

Some duplication is OK, some is less OK. The important thing to remember when removing a duplicate is: Will the clones always change together, or will they diverge? Sometimes it's obvious, sometimes less obvious. No tool can do that job for you, but the longer the duplicate, the stronger arguments you need to keep it instead of refactor it. It's a function of probability of divergence and clone length.

Definitions

There are four main types of code clones defined by researchers:

  • Type 1 (exact, except for whitespace and comments)
  • Type 2 (exact, except for renaming of identifiers)
  • Type 3 (might include removed or added lines)
  • Type 4 (different but semantically same)[1][2]

The edit distance can be seen as the numbers of changes needed between two clones to make them identical.

A false positive is a clone detected by an algorithm, which is in fact not a clone. False negative is the opposite, where a clone is present in the code base, but the algorithm is not able to detect it. Both should be decreased as much as possible.

Some reports paint a picture that code clones are not harmful for changeability, or can not be proven to be harmful [3].

An overview of available algorithms:

Code clone detection algorithms

  • Textual, based on the code as-is
  • Tokens, code is parsed into token (in PHP, like T_ECHO or T_OPEN_TAG)
  • Syntactic, using abstract syntax-tree or similar data structure
  • Semantic, program dependency graph
  • Learning, machine learning and artificial intelligence

Overview for token-based algorithm:

Code clone algo

Sets of clone groups:

Clone group sets

Common bugs in duplicated code:

  • Missing null check (in PHP, maybe isset() or similar)
  • Index out of bounds
  • Faulty transaction handling
  • Inconsistent default values
  • Inconsistent validation behaviour[3]
  1. Liuqing Li et al, CCLearner: A Deep Learning-Based CloneDetection Approach
  2. Andrew Walker et al, Open-Source Tools and Benchmarks for Code-CloneDetection: Past, Present, and Future Trends
  3. Jump up to: 3.0 3.1 Elmar Juergens et al, Do Code Clones Matter?

Examples of code duplication in LimeSurvey

$this
    ->connection
    ->createCommand("SQL COMMAND")
    ->execute();
$this
    ->connection
    ->createCommand("OTHER SQL COMMAND")
    ->execute();

This can be factored out to:

public function executeCommand($command)
{
    $this
        ->connection
        ->createCommand($command)
        ->execute();
}

Again, the assumption to make this change is that no single command will need any other function call besides connect(), createCommand() and execute(). If they do, you'd again have to de-refactor the code (or worse, add boolean arguments, a common short-cut which does not scale well).

public function tableName()
{
    return '{{users}}';
}

This snippet is present in all ActiveRecord classes, the only difference is the string constant, which will be ignored by detection tools. It's part of the ActiveRecord pattern as implemented in Yii 1, and can't reasonably be removed. Further more, this snippet will never change together with another class.

if (\PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(false);
}
$config = simplexml_load_file($xmlPath);
$attributes = json_decode(json_encode((array) $config), true);
if (\PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}

This is a snippet with a bit of history. First, it was only simplexml_load_file(). Later, there was a security issue about it, and we needed to add libxml_disable_entity_loader(). Then, for PHP 8, libxml_disable_entity_loader was deprecated, and it needed to be wrapped inside a version check. Now, this is definitely a snippet that always changes together with all other instances, and must be refactored to a function. It can be hard to detect for the tools, since libxml_disable_entity_loader can happen lines apart from simplexml_load_file.

Other snippets look more like data than logic:

array(
    'header' => gT('First name') . $this->setEncryptedAttributeLabel(self::$sid, 'Token', 'firstname'),
    'name' => 'firstname',
    'value' => '$data->firstname',
    'headerHtmlOptions' => array('class' => 'hidden-xs'),
    'htmlOptions' => array('class' => 'hidden-xs name'),
),
array(
    'header' => gT('Last name') . $this->setEncryptedAttributeLabel(self::$sid, 'Token', 'lastname'),
    'name' => 'lastname',
    'value' => '$data->lastname',
    'headerHtmlOptions' => array('class' => 'hidden-xs'),
    'htmlOptions' => array('class' => 'hidden-xs name'),
),

It could be factored out to a GridColumn class to remove some of the duplication, but both the probability and severity of bug here would probably be low. It's hard to tune an algorithm to ignore cases like these. Easiest might be comments to tell the tool to ignore it.

Tools

There are multiple algorithms for code clone detection, as seen above. Not all are available as tools, and not all tools are open-source or still maintained. The most common seems to be Rian-Karp, which is based on text analysis together with tokenization and hashing.

Command-line tools:

  • phpcpd (PHP copy-paste detector)
  • jscpd (JS copy-paste detector, but can analyze multiple formats including PHP)

Full-system tools:

  • Scrutinizer
  • SonarQube

Full-system tools can include code clones in a general "score", as in Scrutinizer A-F grading system, together with other metrics.

There are two cases for applying a clone detection tool:

  • Single file analysis (assuming one class = one file)
  • Whole-project analysis

For single file analysis, you tool can be made more sensitive, because you can assume that code in a single class or file will change together. Whole-project analysis cannot make such an assumption, and is thus tuned to be more lenient. The purpose of this difference in configuration is to reduce both false positives and false negatives without annoying the developers.

It might be hard to include code clone detection in Travis/CI if there are false positives. Another approach might be seeing it as red flags which must be investigated. But by whom?

Empirical evidence of code quality guidelines

TODO links from stackexchange

Modernizing legacy code

TODO

Based on book

Modules

There's no inherit module functionality in PHP, but Yii has a module system. Libraries can be seen as a module system.

Examples of implicit modules in LimeSurvey:

  • Front-end (survey taking)
  • Back-end (survey design)
  • Back-end (non-survey specific pages, like label sets, CPDB)
  • Survey theme editing
  • Expression manager
  • Plugin system

In the future, they need to be separated and implemented more clearly as whatever Yii 3 module system is in use.

Some data models are common for all modules, e.g. surveys and users. Can have a "Common" module for such data.

todo, relation to Yii config system

todo, yii 3 program structure, by domain instead of class type

Classes

  • Avoid empty OOP ceremony, like defining getters and setters for all properties. Might as well make them public, unless they have invariants that need to be upheld, e.g. "this number is always between 1 and 10" or "this string is maximum length 20".
  • Almost always avoid static methods. They cannot be mocked. One exception is (possibly) factory methods (Foo::make(...)).
  • Inject all dependencies. This will make sure the class can be unit-tested (unit-tests have several benefits over integrity tests - they can be run in parallel and don't require any setup and teardown)
  • A second way to achieve testability is to factor out the methods that access the dependencies
  • Should methods that do not access any state be put in separate functions?

todo, state vs behaviour, state vs dependencies

OOP design

Start from top-down, with domain, behaviour, interaction between objects, database design, pseudo-code if needed. Can also include scenarios, use-cases and UML diagrams.

Object reification - not only domain models, but also domain actions, relations and activities can be objects, e.g. a class ActivateSurvey or ExportSurvey.

Hierarchy depth should be kept reasonable shallow to avoid conflicts between methods, spooky action at a distance. Compare with Yii 3, where controllers no longer inherit anything.

Some people advocate for composition instead of inheritance. Lots of blog posts available for more info.

Multiple inheritance is done with traits in PHP. Use-case: ActiveRecords where some but not all models support encryption. Other example, injection with traits that add property and setter/getter.

Interfaces

TODO by Borys

State

Heuristics to consider but not follow slavishly.

Keep a clear separation between classes with state and classes with no state.

Classes with state should not have dependencies with IO (it's OK to depend on classes in the same application layer, e.g. other DTO classes), and vice versa. ActiveRecord is an example which violates this principle. Repository pattern separates it.

   class Survey
     // Confusion - first params are domain state, last a resource
     __construct(id, language, dbConnection)

If a method does not call "this", should it instead be outside the class? A separate class or function, maybe.

todo, elaborate

Types of classes

Different categories of classes:

  • Entity (domain class; has state and validation behaviour)
  • Resource (database connection, file, curl, logger; state is in the resource (open, closed))
  • Repository classes (database interaction, like splitting ActiveRecord in two parts: Entity class and database interaction)
  • DAO, data access object (similar to resource)
  • Builders (e.g. query builder, HTML message builder)
  • Data-transfer object (no behaviour; pass to views to have better documentation and intent than associative arrays)
  • Value classes (e.g. Email address to wrap email string around)
  • Command object (e.g. SurveyActivator)
  • Helper (no state; e.g. QuestionAttributeHelper to bake attributes into different formats used by the views)
  • Wrapper

todo, immutability, possible with private properties that are set in the constructor,

todo: command-query separation for classes?

todo: dependencies between layers, dependencies to the framework

todo, pure vs effectful classes, where effect is anything that reads or writes to stdin/stdout/file/database etc.

Types of classes in LimeSurvey

  • ActiveRecord, used to communicate with the database, and should not do anything else
  • Controllers, used for glue code; should not contain business-logic (also CLI controllers)
  • Helper functions (not classes, yet...), should be short and obey command-query separation (TODO: Long functions should be separated into command objects or service classes); todo: should not have any dependencies?
  • Helper classes, similar to helper functions but have dependencies that need to be mocked during tests; should not have any state
  • Service classes, contains logic related to a clear and separate task, like create or import survey
  • Question render classes, contains logic related to how questions are rendered (TODO: Need better separation between state and logic and more clear design); similar to widgets
  • Plugin classes, using the plugin event system
  • Widgets, containing HTML logic for custom HTML widgets
  • HTML helpers (Yii)
  • Customizations to the Yii framework by inheritance
  • Unit and functional test classes

A couple of classes contain only data, like LsDefaultDataSet.

todo, no difference between state property and dependency property in OOP

What is "glue code"?

todo

Putting the pieces together, e.g. service class A need data from model B, fetch it and run.

todo, add example

"Where do I cut?"

Separation of concerns, but which concerns and which type of separation? Command-query (function core, imperative shell) is one type of separation. Other separation between layers, e.g. logic layer and data persistency layer (similar in thought).

Other separation, that one class should have a clear purpose. A design pattern should only implement one pattern, not multiple, and not half a pattern.

There's a problem with "helper" or "manager" classes that they can grow indefinitely, but might be better than nothing. Example when moving business logic out of ActiveRecord class to a helper class it can sometimes hard to be more specific than "helper".

Decisions like these should be related to risk analysis from the top of the document.

Size

A class should have a single and clear purpose; compare with SOLID.

Class name

In general, whenever possible, follow PHP, Yii (Yii 3 preferably) and PSR naming schemes.

  • Suffixes: Exception, Trait, Controller, Command.
  • Prefixes: Abstract.
  • ActiveRecord classes are named after the database table they are bound to, but with singular instead of plural. E.g. table "lime_users" has model "User".
  • Command classes are named after what they do and imperatively, e.g. CreateSurveyCommand
  • Helper classes should probably be named as specific as possible, to avoid them from growing too big
  • Class properties of ActiveRecord classes will follow the naming of the database, using underscore instead of camel case (and ActiveRecord classes should have no other properties than those (excluding the database connection injected by the framework))

Functions

Functions are one of the fundamental building blocks of programming, so it's important to get it right.

Functions that are part of a class are called "methods", if you want to be formal.

Different types of functions in PHP:

  • Functions (not attached to a class)
  • Methods (attached to class)
  • Static methods (attached to class, but no $this)
  • Anonymous functions (does not capture scope automatically)
  • Short functions (does capture scope automatically)

I'm using "function" as a word for all types of functions in PHP below.

Size

  • A function should be short enough to be completely readable on one screen, approx 60 lines at most
  • A function should not have too many arguments; if you have five or more, consider making it a class instead
  • If a function is growing too big, it might be better to create a class instead or split it into several functions (if both functions have common state, a class might be better, unless you want to pass around explicit state)

Contract

  • A function has a contract with its surrounding environment:
    • Pre-condition: What needs to be true before the function is executed
    • Post-condition: What will be true after the function has returned
  • The function contract can be checked with assertions (internal invariants) or exceptions (outside world); see below
  • Another good way is to use unit-tests to make sure the function's contract remains stable through changes
  • The first line in the docblock should be one sentence describing the relation between function arguments and its output
    Example
    /**
     * Creates a random password of length $length (defaults to 12)
     * @param int $length
     * @return string
     */
    function createPassword($lenght = 12)
    
    /**
     * Creates an HTML language drop down for survey with id $surveyId, marking $selectedLanguage as selected
     * @param int $surveyId
     * @param string $selectedLanguage
     * @return string <select> HTML
     */
    function createLanguageDropdown($surveyId, $selectedLanguage)
    

Lift side-effects up in the stack trace

This one is a bit abstract but very useful to increase testability.

First, what is a side-effect? In short, it's anything that happens in a function that is not described by the relation between function arguments and output.

Another way to phrase it is, that a side-effect is anything that is not always the same for the given input to a function; a function without any side-effects will always return the same result if it gets the same input.

Examples:

// gT accesses the file system and will return different result depending on what's in the translation files
function gT($sToTranslate, $sEscapeMode = 'html', $sLanguage = null);

// If $sText and $sEscapeMode is always the same, quoteText() will always return the same string.
function quoteText($sText, $sEscapeMode = 'html');

A function with no side-effects is called pure. A function that is guaranteed to always return (no exceptions, no errors, no warnings, no division by zero) is called total.

It's a good thing if functions in a code-base either have side-effects, or return a result. Not both. This is called command-query separation.

Examples of side-effects:

  • Database or file access
  • Echo, or write to stdout or stderr
  • Randomization

Why does this matter? Because side-effects make testing harder! If you have side-effects, you cannot create unit tests, unless those side-effects can be mocked. If not, you need integrity tests with database setup or more. The more side-effects you have in a function, the more mocking is required to unit test it.

So, one solution to this is to move side-effects higher up in the stack trace, that is, move them to the calling code.

Example:

function getSurveyInfo(int $surveyid, string $languagecode = '', boolean $force = false): array;

Since this function takes $surveyid, we know it will probably query the database. Quering the database is a side-effect, and thus it requires a database scaffold for the test (or mocking, but the database connection is not injected, so there's no way to do it). One simple solution is to lift the side-effect out and pass in a Survey object instead:

function getSurveyInfo(Survey $survey, string $languagecode = '', boolean $force = false): array;

Much better! The side-effect of calling the database now happens before the function call. Now we can either mock a Survey object, or make an imitating stub for our test.

Reading the function body, there is in fact another database access, but since this is a relation to the Survey model, it could easily be moved out too. We've just made a function testable! Hooray!

This principle is rephrased in an architectural settings as functional core, imperative shell. This architecture helps you get a good ratio between fast unit tests and slower integrity tests.

Moving side-effects up makes sure we can have a healthy ratio between integrity tests with scripted browser, and unit-tests; maybe around 20-80.

Function name

  • The function name should tell you what it does; docs should tell you why (if this information is not obvious). How is described by the code itself (hopefully, but can in some cases be relevant to document).
  • Function name can often be verb [+ adjective] + noun, like "createFieldmap", "quoteText", "getRelativePath"
  • Prefix with "is" for boolean functions

todo: difference between create, get, fetch, generate, ...

Good examples:

  • createPassword() - One verb, one noun - clear
  • isCaptchaEnabled() - Using "is" signifies it returns boolean; clear purpose
  • getFullResponseTable() - Verb, adjective, noun - but is "full" needed here? Is there a non-full version of the same function?
  • replaceExpressionCodes() - Pretty good, but it's obvious that more information is needed in the docblock to describe what "replace" means, and what an "expression code" is
  • fixCKeditorText() - "fix" is a bit vague, but maybe best there is? Pray that docblock contains information!

Bad examples:

  • languageDropdown() - No verb, doesn't tell you what it does
  • rmdirr() - Two r's, looks like a typo, but the last "r" actually means "recursive"! Better: rmdir($recursive = false), rmdir or removeFolder.
  • CSVUnquote() - Is CSV really the namespace? Why is the verb the second word?
  • incompleteAnsFilterState() - One verb, but in the middle, or is "filter state" a noun?; why abbreviate "answer" to "ans"?
  • reverseTranslateFieldNames() - Two verbs - did author mean "reverseTranslatedFieldNames"? Or "reverse and translate"? If so, it should be two functions, one for reverse and one for translate
  • checkUploadedFileSizeAndRenderJson() - If you have an "and" in the function name, you should consider splitting it into two functions

Controversial

  • gT() - An exception to descriptive function names; since this is used everywhere, the minimal name is motivated; maybe even name it "t()"? Still hard to read for someone who just started? But Yii uses t() too? Other frameworks use double underscore (Wordpress, Laravel) as function name.

Function filename and namespace

If your function has dependencies or side-effects, it should be put into a class instead, in which the dependencies can be injected. Another alternative is to put the dependencies as argument in the function, although this muffles the relationship between input and output. Again, this is to increase testability.

Keep in mind that functions outside classes cannot be mocked, so only use them when you know you won't need to mock them.

todo, autoloading of functions and folder location

Assertions

NB: This is not assertions as used in unit-tests, but assertions as used to check invariants in production code.

Assertions can be used for mental checks. They are only used for internal logic - failed assertions are not supposed to be seen by the end user. For interaction with the outside world, like database or browser, exceptions should be used instead.

Assertions can be disabled in production mode and have then no runtime cost (since PHP 7).

Example:

/**
 * Translation helper function
 *
 * @param string $sToTranslate
 * @param string $sEscapeMode Valid values are html (this is the default, js and unescaped)
 * @param string $sLanguage
 * @return string
 */
function gT($sToTranslate, $sEscapeMode = 'html', $sLanguage = null)
{
    assert(is_string($sToTranslate));
    assert(strlen($sToTranslate) > 0);
    assert($sEscapeMode === 'html' || $sEscapeMode === 'js' || $sEscapeMode === 'unescaped');
    return quoteText(Yii::t('', $sToTranslate, array(), null, $sLanguage), $sEscapeMode);
}

Exceptions

  • Exceptions are used to check inputs from the outside world (browser, database, file system)
  • Don't use exceptions for control flow, only when fatal errors happen that make further execution impossible
  • Exceptions should have clear error messages that make them actionable - there should be a clear "next step" for the user or developer
  • Never, ever, do an empty catch-block
  • At the moment, we do not have an exception hierarchy in LimeSurvey

Yii uses exception handler class. We use it for CHttpException, when denying permission and other cases.

Alternatives to exceptions

Some people consider exceptions being a hidden form of "goto" which confuses the control flow of the program. They propose different other solutions:

  • Return a tuple, like [$result, $errorMessage] where $errorMessage is null at success
  • Return null at failure
  • Tagged union result type (not yet possible in latest PHP)
  • Result object

The con being that you have to manually propagate the error, where an exception can happen from anywhere in the stack trace without any other function knowing about it.

If-statements

If-statements should not be too long or hard to understand. If they are, consider factor out the checks into separate functions.

if ((empty($instance->oOptions->{$attribute}))
    || (
        !empty($instance->oOptions->{$attribute})
        && (
            $instance->oOptions->{$attribute} === 'inherit'
            || $instance->oOptions->{$attribute} === 'I'
            // NB: Do NOT use === here, it won't work with Postgresql.
            || $instance->oOptions->{$attribute} == '-1'
        )
    )
) {

Can be shortened by factoring out method isInherit:

if (empty($instance->options->{$attribute})
    || $this->isInherit($instance-options->{$attribute})) {
}
/**
 * Returns true if $value is set to inherit
 * @param mixed|null $value
 * @return boolean
 */
public function isInherit($value) {
    return !empty($value)
        && (
               $value === 'inherit'
            || $value === 'I'
            // NB: Do NOT use === here, it won't work with Postgresql.
            || $value == '-1'
        );
}

And one more time by using a variable for the value:

$value = $instance->options->{$attribute};
if (empty($value) || $this->isInherit($value)) {
}

More clear, right? "If it's empty or set to inherit, do this and that."

Variables

Descriptive naming.

  • Follow PSR-12 - no underscore, use camelCaseLikeThis
  • Variables of ActiveRecord classes can have same name as the class, like "$survey" or "$question"
  • Add -s for plural when it's a list of items: "$surveys", "$questions"
  • Never include "array" or "list" in variable names - using plural is enough (for lists, not for associative arrays - see below)
  • The context of the function helps with variable naming, e..g "$name" might be fine in a function about questions; this is extra helpful when functions are short!
  • When a function is short, it's much easier to figure out variable meaning based on context
  • $qid, $gid, and $sid are so established in the code-base that they could be considered OK
  • The -data suffix means "blob of stuff", only use it when needed, like with $viewData
  • Use $result when baking a return value; add its exact type as @var annotation (worst possible name, acceptable but anything with more meaning is better)

Not so good:

$result = LabelSet::model()->findAll();

Better with model name and -s for plural:

$labelSets = LabelSet::model()->findAll();
  • Do not use Hungarian notation (prefix with "type"), but instead /** @var */ annotations - these can be checked automatically and will then be enforced to be correct, unlike Hungarian notation which can "rot"
/** @var string[] An array of strings with 0-n elements */
$allFields = [];

Psalm can infer types if the function documentation is enough. The strictness can be adjusted. Not all variables will need @var annotation, the tool will warn you when it's confused.

  • Data that is sent to views as associative arrays will be:
/** @var array Can be basically anything :( Better would be data-transfer objects or more precise notation */
$data = [];
// Maybe more clear as $viewData? Discuss.

You can be more precise when you specify which type of array you're writing:

/** @var array<string, string> This means the array key is string, and the array value is also string */
$data = [];
$data['name'] = 'Olle';
$data['occupation'] = 'Developer';
$data['age'] = 38;  // ERROR! Value must be string.

For more info about array type notation, see the Psalm documentation.

Some smaller examples:

$theme = App()->request->getPost('theme');  // Should be $themeType - $theme looks like a model/AR
$gridid === 'questionthemes-grid';          // Should be $gridId with capital "I", since it's two words in English
$delete = Yii::app()->request->getParam('delete');  // Tuff one - $delete is ok? Thoughts?
$files = json_decode(stripslashes($sJSON), true);  // $file is a resource, $files an array of resources, this is $fileData perhaps?
$valid_extensions_array = explode(",", $aAttributes['allowed_filetypes']); // Should be $validExtensionTypes
$language = Yii::app()->session['survey_' . $surveyid]['s_lang']; // TODO: Should this be a value class? Then Language::make(...);
$surveyInfo = getSurveyInfo($surveyId, $language);  // OK, BUT should be a separate DTO!
$fieldMap = createFieldMap($survey, 'short', false, false, $language);  // Same, make a DTO
$cquestions = array();  // What's "c"? Method name is getChildQuestions and $questions is an argument - better with $childQuestions?

Links:

How to Handle Abbreviations in CamelCase

Acronyms in CamelCase

Looping

  • The bodies of foreach-loops are often a good candidate to factor out into separate functions.

Example:

foreach ($questions as $question) {
  processQuestion($question);
}

Note the variable naming from plural to singular.

  • Loop indexing should be named i, j, k etc.

Example:

for ($i = 0; $i < $length; $i++) {
    for ($j = 0; $j < $otherLength; $j++) {
        // ...
    }
}

Function composition

todo, "doThisAndThat" --> "doThat(doThis())", possibly also pipe(doThis(), doThat()).

Cases

/**
* This function generates an array containing the fieldcode, and matching data in the same order as the activate script
*
* @param Survey $survey Survey ActiveRecord model
* @param string $style 'short' (default) or 'full' - full creates extra information like default values
* @param boolean $force_refresh - Forces to really refresh the array, not just take the session copy
* @param bool|int $questionid Limit to a certain qid only (for question preview) - default is false
* @param string $sLanguage The language to use
* @param array $aDuplicateQIDs
* @return array
*/
function createFieldMap($survey, $style = 'short', $force_refresh = false, $questionid = false, $sLanguage = '', &$aDuplicateQIDs = array())

Good:

  • Name is verb and noun
  • Each argument is documented (almost)

Bad:

  • Returns "array", which can be anything. A separate class for "Fieldmap" would be better.
  • Not clear if it should be "createFieldMap" or "createFieldmap" - is "fieldmap" one word or two?
  • Very many boolean arguments. Should probably be a class with methods like $fieldmapCreator->setForceRefresh(true); instead.
  • Too long.
  • Doesn't elaborate on what "fieldmap" really is, and does not link to any documentation about it.
  • Access super-global, which makes it hard to test. Should instead return a data-transfer object, like:
    $_SESSION['fieldmap'] = createFieldMap(...);
    
  • Has no test
  • Mixes Hungarian notation with non-Hungarian notation in the argument list

Defensive programming and error handling

  • Fail hard, fail early. The earlier the program crashes when something is wrong, the lower is the probability that you'll end up with corrupt data in the database that will later require manual labor to correct.
  • Fail with actionable information
  • Check input to functions and throw InvalidArgumentException if it violates the function's pre-conditions
  • Use assert for class-internal invariants
  • Use assert to make assumptions explicit for the reader

Documentation

PHPDoc

  • All functions must have a proper docblock
  • The first line of the docblock should describe the relation between all the input arguments and the output of the function in a single sentence
  • Each parameter must have the correct type
  • Don't use array as type if it's a list of object - instead, use [] like so: Survey[] for a list of surveys

Technical documentation

todo, UML, scenarios, use-cases, Mermaid

The PHP of yesterday

Old idioms and habits that should be abandoned.

  • Don't use associative arrays, almost never. Data-transfer objects have multiple benefits: they are clearly structured, can be documented, can enforce invariants. There's no performance overhead of objects compared to arrays in modern PHP.
  • Objects are already passed by reference; arrays are value types
  • Use [] instead of array()
  • Hungarian notation was useful before static analysis existed for PHP. These days it's better to use something that can be checked automatically: docblock @param and inline @var annotations.

The PHP of tomorrow

todo

Union and intersection types

Short functions

Enum

Match expressions

Attributes

Security

todo, map to value objects

XSS (ActiveRecord filters, encode/decode)

CSRF (always use POST with CSRF token for delete, add, etc)

Database injection (prepared statements, bind params)

Permissions and roles (always check permission in controller actions)

Performance

todo

Loading too many rows into PHP models

Must be possible to scale; test with large datasets.

Localization

Basics

Currently we are using the gT(),eT(),ngT() and neT() functions for translations.

Since LimeSurvey is available in 60 languages it is very important that your original English string (which will be picked up automatically for translation by our translators) is done right from the start. Imagine if we have to correct only one string at a later time - then 60 strings become invalid across the project and need to be re-translated by the poor translator souls.

  • gT() is the original translation function. It will return a translated version of the string in the currently selected language. ie: echo gT("Hello");
  • ngT() returns multiple translations of a sentence or phrase for which alternate plural forms may apply. ie: echo sprintf(ngT('Please select at least %s answer','Please select at least %s answers',iMinimumAnswers),$iMinimumAnswers);
  • eT() echos the translation directly. (ie: instead of echo gT("Hello"); you can use eT("Hello");
  • neT() echos the multiple translations of the sentence or phrase for which alternate plural forms may apply directly.

Please follow these important rules:

Spacing

  • Do not embed margin spaces in your translation. Instead use proper CSS formatting

Wrong:

eT('Visible? ');

Right:

eT('Visible?');

Punctuation

  • Punctuation is always part of the translation

Wrong:

echo gT('Summary').':<br>'.gT('This is a great summary.');

Right:

echo gT('Summary:').'<br>'.gT('This is a great summary.');


Linebreaks

  • Do not embed system linebreaks in your translation, unless there is a good reason.

Wrong:

eT('This is a very long text. 
We hope that you will really read it.');

Right:

eT('This is a very long text. We hope that you will really read it.');

Capitalization

  • Do not capitalize words except where grammatically correct (like at the beginning of a sentence or for a brand name). LimeSurvey is not a newspaper.

Wrong:

eT('Create A New Label Set');
eT('Google Maps API Key');

Right:

eT('Create a new label set');
eT('Google Maps API key');

Full sentences

  • Do not split sentences across several translation strings to form a sentence. If it is one sentence it should be in one translation unit.

Wrong:

echo gT('This is a very long sentence and'); 
echo '<br>';
echo gT('this is the rest of it. This will be confusing for the translator.');

Right:

echo gT('This is a very long sentence and this is the rest of it.'); 
echo '<br>';
echo gT('This will not be confusing for the translator.');

Concatenation

  • Do not concatenate several translations to form a sentence or concatenate to an additional information (like a number or string). Instead use the sprint() or sprintf() function with placeholders.

Wrong:

echo gT('The sum must not be bigger than ') . $maxSum;

Right:

echo sprintf(gT('The sum must not be bigger than %d'), $maxSum);

This will avoid the problem that in other languages the word positioning of a sentence can be very different and the information $maxSum from the example above might be needed at a different position in the sentence, not just at the end.

Wrong:

echo gT('The user ') . $username . gT(' was deleted.');

Right:

echo sprintf(gT('The user %s was deleted.'), $username);

Also don't concatenate words or sentences using place holders.

Wrong:

echo sprintf(gT('The user was %s.')), gT( $action=='deleted'?'deleted':'created' );

Right:

if ($action == 'deleted') {
    eT('The user was deleted.')
} else {
    eT('The user was created.');
}

Plurals

  • Use the n*T functions where applicable. These functions show a different translation depending the number of items. Currently LimeSurvey only supports the numbers/situations 1 and >1.

Wrong:

echo sprintf(gT('Please select at least %d answer(s).'), $minimumAnswers);

Right:

echo sprintf(
    ngT('Please select at least %s answer', 'Please select at least %d answers', $minimumAnswers),
    $minimumAnswers
);

Embedded HTML

  • Text should not contain HTML. If you desperately need HTML in there use sprint and %s tags.

Wrong:

eT('This is an <b>important</b> message.');

Right:

printf(gT('This is an %simportant%s message.'),'<b>','</b>');

Variables

  • Don't embed a variable or constant instead of the real string. The text will not be picked up by our translation system.

Wrong:

$strMessage = 'Very important message, that needs to be translated';
eT($strMessage);

Right:

eT('Very important message, that needs to be translated');

JavaScript

The script that analyzes the PHP files for translation strings do not work on JavScript. That means that all translatable strings must be written in PHP, using gT() or eT() functions.

The solution to this is to register a global JavaScript variable via PHP, to register translations. Example code in a controller, using heredoc and regsiterScript:

$undo    = gT("Undo (ctrl + Z)", "js");
$redo    = gT("Redo (ctrl + Y)", "js");
$find    = gT("Find (ctrl + F)", "js");
$replace = gT("Replace (ctrl + H)", "js");
App()->getClientScript()->registerScript(
    "SurveyThemeEditorLanguageData",
    <<<JAVASCRIPT
surveyThemeEditorLanguageData = {
    undo: "$undo",
    redo: "$redo",
    find: "$find",
    replace: "$replace"
};
JAVASCRIPT, CClientScript::POS_BEGIN
);

Testing

Testing can be split in two parts:

  • Manual QA-testing
  • Automatic testing

Manual testing is part of team culture and project process.

Automatic testing can further be split into at least three:

  • Unit-testing, without any fixture setup, instead using mocking and stubs
  • Functional testing, with fixture and/or database and file setup
  • Integrity testing, requiring an installed software, scripting the browser

The biggest pro with integrity testing is the fact that you don't have to adapt the code, at all. The con is that it's slow to run, and the bigger the test suite, the slower it gets. You can improve the speed by adding more machines to your CI, or only run the complete suite during the night.

Unit-testing without any fixture has the pro of being possible to run without any database setup, and in parallel. For the developer, this means they can use it locally, run it after each save or before they commit to quickly check if they broke anything.

Test-driven development, TDD, is about writing or designing your tests first, before writing any code. Even when not doing this, it's always a good idea to consider and think about how your code can be tested during the design phase.

The most important thing to facilitate unit-testing is to be explicit with your dependencies; each class should inject the classes it depends on, preferably in the constructor.

// Without dependency injection
class CopyQuestion
{
    private function copySubquestions($parentId)
    {
        // Implicit dependency of Question model
        $subquestions = \Question::model()->findAllByAttributes(['parent_qid' => $parentId]);
    }
}

// With dependency injection
class CopyQuestion
{
    private $questionModel;
    public function __construct(Question $questionModel)
    {
        $this->questionModel = $questionModel;
    }
    private function copySubquestions($parentId)
    {
        // Question can now be mocked
        $subquestions = $this->questionModel->findAllByAttributes(['parent_qid' => $parentId]);
    }
}

Note that it's impossible to mock functions. If a function has an implicit dependency, it better not be needed to mock in a test.

// Function with implicit dependency, not possible to unit-test, will need a fixture and database setup
function getGidPrevious($surveyid, $gid)
{
    $surveyid = (int) $surveyid;
    // Implicit dependency on QuestionGroup
    $qresult = QuestionGroup::model()->findAllByAttributes(array('sid' => $surveyid), array('order' => 'group_order'));
    // ...omitted code
}

// Dependency is explicit in the function argument, can now be mocked and unit-tested
function getGidPrevious($surveyid, $gid, $questionGroupModel)
{
    $surveyid = (int) $surveyid;
    // Implicit dependency on QuestionGroup
    $qresult = $questionGroupModel->findAllByAttributes(array('sid' => $surveyid), array('order' => 'group_order'));
    // ...omitted code
}

// Better still, move the dependency up in the stacktrace to the client code
function getGidPrevious($qresult, $gid)
{
    // ...omitted code
}

Mocking as a DSL, PHPUnit mocking tools. Mockery, Prophecy. Anonymous classes.

Pure functions, referential transparency, side-effects.

Negative testing (test that it fails correctly.)

Mutation test.

Don't test the framework; don't test PHP

Adding automated tests makes sure that the function or class lives up to its specification; and it makes sure it will always live up to its specification, even in the future when other people start changing the code.

Static analysis

Static analysis can be used to find bugs in the code before you run it. "Static" here means the opposite of during runtime.

Since PHP has a mature ecosystem, it also has multiple static analysis tools to choose from.

php -l, CodeSniffer, Mess Detector, Psalm, phpcpd, metrics

Database design

todo

unique constraint

Multiple rounds of senior-level review, since it's hard to change once in place.

JavaScript

todo

classes, functions, components, tests, airbnb standard, eslint, flow/typescript for type checking

SonarCube

CSS

todo

naming for id and classes, sass?

HTML

todo

Twig, Yii view files, indentation, PHP alternative syntax.

Feedback

"Why do I have to refactor code that works?"

Sadly, it's not enough for the code to live up to the current specification and requirements, it must also be protected against future changes, regressions. Adding tests is the best way for this. Lots of people might read and change the LimeSurvey code the next 5-10 years. The test suite creates a safety layer to make sure the specification is fulfilled even as the code changes.

Tests are not a magical solution, however. It does not protect us against other degradation of the code quality, like classes gaining too much responsibility (too many methods, unrelated methods, too many properties).

"What's the correct ratio between integrity tests and unit tests?"

Don't know. Different pros and cons.

  • Unit-test can be run in parallel and are faster
  • Unit-tests don't need setup and teardown

Cons of unit-tests is that they need a bigger effort on the side of the developer to make them possible, for example injecting the dependencies properly to enable mocking, or lifting side-effects up in the stack trace.

How to disagree

All rules can be broken. To break a rule, you should reason in terms of risk and the assumptions used in this guide.

Assumptions:

  1. Regressions are the biggest technical risk for LimeSurvey the last five years (2021)
  2. Automated tests are the best risk mitigation technique against regressions

Possible disagreement:

  1. A certain piece of code has very low probability of regressions (maybe we "know" no one else will every touch or change it?)
  2. Regression in a certain piece of code would have very low impact
  3. There might be better techniques to reduce the risk of regression than automatic testing

Resources

todo

Books, links, videos