Science is based on measurement. Improving a process requires an understanding of the numerical relationships involved, and this requires measurement.
Software measurement is the mapping of symbols to objects. The purpose is to quantify some attribute of the objects, for example, to measure the size of software projects. Additionally, a purpose may be to predict some other attribute that is not currently measurable, such as effort needed to develop a software project.
Not all mappings of symbols to objects are useful, so an important concern is the validation of metrics. Validation, however, depends on the intended use of the metric. An example is a person’s height. Height is useful for predicting whether a person can pass through a doorway without hitting his or her head. A high correlation between a measure and an attribute is not, by itself, sufficient to validate the measure. For example, a person’s shoe size is highly correlated with the person’s height. However, shoe size is normally not acceptable as a measure of a person’s height.
The following are criteria for valid metrics:
- A metric must allow different entities to be distinguished.
- A metric must obey a representation condition.
- Each unit of the attribute must contribute an equivalent amount to the metric.
- Different entities can have the same attribute value.
Often the attribute of interest is not directly measurable. In this case, an indirect measure is used. An indirect measure involves a measure and a prediction formula. For example, density is not a direct measure. It is calculated from mass and volume, which are both direct measures. In computer science, many of the “ilities” (maintainability, readability, testability, quality, complexity, etc.) cannot be measured directly, and indirect measures for these attributes are the goal of many software metrics programs.
The following are criteria for valid indirect metrics:
- The model must be explicitly defined.
- The model must be dimensionally consistent.
- There should be no unexpected discontinuities.
- Units and scale types must be correct.
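As a rough illustration of the first two criteria, the density example can be sketched in Python. The `Quantity` class, the dimension tuples, and the `density` function are hypothetical names introduced only for this sketch; it assumes dimensions are tracked as exponents of (mass, length).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value with dimension exponents for (mass, length). Hypothetical helper."""
    value: float
    dims: tuple

    def __truediv__(self, other):
        # Dividing two quantities subtracts their dimension exponents.
        return Quantity(self.value / other.value,
                        tuple(a - b for a, b in zip(self.dims, other.dims)))


MASS = (1, 0)      # mass^1
VOLUME = (0, 3)    # length^3
DENSITY = (1, -3)  # mass^1 * length^-3


def density(mass, volume):
    """Explicitly defined model: density = mass / volume."""
    if mass.dims != MASS or volume.dims != VOLUME:
        raise ValueError("inputs have the wrong dimensions")
    result = mass / volume
    # Dimensional consistency: the result must carry the dimensions of density.
    if result.dims != DENSITY:
        raise ValueError("model is dimensionally inconsistent")
    return result


d = density(Quantity(10.0, MASS), Quantity(2.0, VOLUME))
print(d.value, d.dims)
```

The model is stated in one place, and passing the arguments in the wrong order is rejected instead of silently producing a number with the wrong units.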
Software Measurement Theory
The representational theory of measurement has been studied for over 100 years. It involves an empirical relation system, a numerical relation system, and a relation preserving mapping between the two systems.
The empirical relation system (E, R) consists of two parts:
- A set of entities, E
- A set of relationships, R
The relationship is usually “less than or equal to.” Note that not every pair of entities has to be related. That is, the relationship may be a partial order.
The numerical relation system (N, P) also consists of two parts:
- A set of entities, N. Also called the “answer set,” this set is usually a set of numbers: natural numbers, integers, or reals.
- A set of relations, P. This set usually already exists and is often “less than” or “less than or equal to.”
The relation preserving mapping, M, maps (E, R) to (N, P). The important restriction on this mapping is called the representation condition. There are two possible representation conditions. The most restrictive version says that if two entities are related in either system, then the images (or pre-images) in the other system are related: x rel y iff M(x) rel M(y).
The less restrictive version says that if two entities are related in the empirical system, then the images of those two entities in the numerical system are related in the same way: M(x) rel M(y) if x rel y.
Classical measurement theory authors have used both versions. The advantage of the second version is that partial orders in the empirical system can be mapped to the integers or reals, both of which are totally ordered.
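The difference between the two representation conditions can be sketched in Python. The example is an assumption made for illustration and does not come from the text: entities are sets of features, the empirical relation is "is contained in" (a partial order), and the mapping M is the number of features. The function names are hypothetical.

```python
from itertools import product


def strong_condition(entities, rel, measure):
    """Most restrictive version: x rel y iff M(x) <= M(y)."""
    return all(rel(x, y) == (measure(x) <= measure(y))
               for x, y in product(entities, repeat=2))


def weak_condition(entities, rel, measure):
    """Less restrictive version: if x rel y then M(x) <= M(y)."""
    return all(measure(x) <= measure(y)
               for x, y in product(entities, repeat=2)
               if rel(x, y))


def contained_in(x, y):
    # Empirical relation (assumed for this sketch): x is a subset of y.
    return x <= y


size = len  # the mapping M: number of features in the entity

# {"a"} and {"b"} are unrelated empirically, so this is only a partial order.
partial = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]
# Every pair in this chain is related, so this is a total order.
chain = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "b", "c"})]

print(strong_condition(partial, contained_in, size))  # False
print(weak_condition(partial, contained_in, size))    # True
print(strong_condition(chain, contained_in, size))    # True
```

In the partial order, the two single-feature entities are unrelated empirically but receive the same number, so the numerical system relates them and the most restrictive version fails. The less restrictive version constrains only pairs that are related empirically, so it still holds.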