Earliest Known Uses of Some of the Words of Mathematics (W)

Last revision: Dec. 28, 2013

WALD TEST in Statistics. This principle was introduced by Abraham Wald in his "Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large," Transactions of the American Mathematical Society, 54, (1943), 426-482, although many long-established procedures can be interpreted as Wald tests.

The term "Wald test" appears in S. D. Silvey "The Lagrangian Multiplier Test," Annals of Mathematical Statistics, 30, (1959), pp. 389-407.

The scheme of three test principles (a trinity) based on maximum likelihood estimates, viz. the Wald test, the Lagrange multiplier test and the likelihood ratio test, was proposed in Silvey’s paper.


WALLIS’S FORMULA for π. John Wallis gave this in his Arithmetica infinitorum published in 1656. See Encyclopaedia of Mathematics.

WEIBULL DISTRIBUTION appears in the title of Julius Lieblein’s "On Moments of Order Statistics from the Weibull Distribution," Annals of Mathematical Statistics, 26, (1955), 330-333. (David (2001))

The Swedish physicist Waloddi Weibull used this distribution in his 1939 "A Statistical Theory of the Strength of Materials" and went on to find new applications: see "A Statistical Distribution Function of Wide Applicability," Journal of Applied Mechanics, 18, (1951), 293-297. The distribution was already known as the third limiting form for extreme-value distributions: see R. A. Fisher and L. H. C. Tippett "Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample," Proceedings of the Cambridge Philosophical Society, 24, (1928), 180-190.

See the entry EXTREME VALUE.

WEIERSTRASS APPROXIMATION THEOREM. Karl Weierstrass published the theorem in his 1885 paper, “Über die analytische Darstellbarkeit sogenannter willkürlicher Funktionen reeller Argumente,” Sitzungsber. Akad. Wiss. Berlin, pp. 633–639; 789–805. It is reprinted in Werke 3, pp. 1-37. There is an important generalisation in the form of the STONE-WEIERSTRASS THEOREM. See the MathWorld entry.

WEIGHT and WEIGHTED. The earliest quotations for weight given by the OED are: "The arithmetical mean of a set of observations .. is the particular case when the weights a, a′, a″ etc. are all equal, and the sum of the errors is equal to zero." (Phil. Mag. LXV, (1825), p. 167) and "The method of finding an average is this: multiply every observation by its weight and divide the sum of the products by the sum of the weights." (A. De Morgan Essay on Probabilities (1838) p. 138.)

The OED’s earliest quotation for weighted mean is "We may..call the constant c the specific weight of the observations to which it applies, and ΣcA ÷ Σc the weighted mean." (Encycl. Metrop. II, (1845) p. 443)
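De Morgan's recipe and the 1845 formula ΣcA ÷ Σc describe the same computation. The following is a minimal Python sketch of it, written for illustration only: the function name weighted_mean and its argument names are hypothetical, with A standing for the observations and c for their weights as in the Encyclopaedia Metropolitana quotation.

```python
def weighted_mean(A, c):
    """Hypothetical helper: multiply every observation A[i] by its weight c[i]
    and divide the sum of the products by the sum of the weights (ΣcA ÷ Σc)."""
    if len(A) != len(c):
        raise ValueError("need exactly one weight per observation")
    total_weight = sum(c)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(a * w for a, w in zip(A, c)) / total_weight


# Equal weights give the ordinary arithmetical mean, the "particular case"
# mentioned in the 1825 quotation.
print(weighted_mean([10.0, 12.0, 14.0], [1, 1, 1]))  # 12.0
# Unequal weights pull the mean toward the heavily weighted observation.
print(weighted_mean([10.0, 20.0], [3, 1]))  # 12.5
```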

The term weighted least squares was surprisingly late in arriving, given that Gauss had described the method in 1809 in his first publication on least squares. David (2001) gives Karl Pearson’s "Notes on the History of Correlation," Biometrika, 13, (1920), p. 26.


WELL-BEHAVED as “applied to different entities with varying implications as to their susceptibility to manipulation, as continuity or differentiability (of a function), convergence (of a series).” (OED).

The expression has been in general English since the time of Shakespeare, but its use as a mathematical term of art appears to date from the 1930s. A JSTOR search found Adams and Clarkson writing in “Properties of Functions f(x, y) of Bounded Variation,” Transactions of the American Mathematical Society, 36, (1934), p. 712 of “functions which are to a certain extent well behaved, perhaps to the extent of belonging to the Baire classification.” The OED’s earliest citation is from Boyer Concepts of Calculus (1939): “Inasmuch as Euler restricted himself to well-behaved functions, he did not become involved in those subtle difficulties connected with the notions of infinity.” [John Aldrich]

See the entry PATHOLOGICAL.

WELL-ORDERED. The term wohlgeordnet was used by Cantor in an extensive paper, "Über unendliche lineare Punctmannichfaltigkeiten," which appeared in Mathematische Annalen in six parts between 1879 and 1884. In part five, which appeared in vol. 21 (1883), p. 548 (Collected Papers, p. 168), he wrote:

By a well-ordered set we understand any well-defined set whose elements are related by a well-determined given succession according to which there is a first element in the set and for any element (if it is not the last one) there is a certain next following element. Furthermore, for any finite or infinite set of elements there is a certain element which is the next following one for all these elements (except for the case that such an element which is the next following one to these elements does not exist).

This translation was taken from Cantor’s Philosophical Views by Walter Purkert.

When an English literature developed, the term was translated as well-ordered, an expression that had been in English since the sixteenth century. A JSTOR search found well-ordered in E. H. Moore “On the Theory of Improper Definite Integrals,” Transactions of the American Mathematical Society, 2, (1901), p. 473 and A. N. Whitehead “On Cardinal Numbers,” American Journal of Mathematics, 24, (1902), p. 384.

WEYL’S EQUIDISTRIBUTION THEOREM. "Equidistribution" is in the title of the paper in which Hermann Weyl (1885-1955) (NAS biographical memoir) published the theorem: "Über die Gleichverteilung von Zahlen mod. Eins," Mathematische Annalen, 77, (1916), 313-352. However, as G. H. Hardy and E. M. Wright An Introduction to the Theory of Numbers (1938, p. 381) point out, "the theorem seems to have been found independently, at about the same time, by Bohl, Sierpinski and Weyl."

WHITE NOISE. Originally the term referred to a form of sound or of electrical interference but it now also refers to a type of random process. "Inside the plane ... we hear all frequencies added together at once, producing a noise which is to sound what white light is to light." (L. D. Carson, W. R. Miles & S. S. Stevens, "Vision, Hearing and Aeronautical Design," Scientific Monthly, 56, (1943), 446-451). S. Goldman’s book on radio engineering, Frequency Analysis, Modulation and Noise (1948), has a mathematical treatment of white noise.

By 1953 white noise had entered the stochastic process literature, as in "On the Fourier Expansion of Stationary Random Processes" by R. C. Davis (Proceedings of the American Mathematical Society, 4, 564-569) [John Aldrich].


WIENER-HOPF equation, factorization, technique are terms associated with the paper by Norbert Wiener and Eberhard Hopf "Über eine Klasse singulärer Integralgleichungen," Sitzber. Deutsch. Akad. Wiss. Berlin, Kl. Math Phys. Tech, 1931, pp. 696-706.

WIENER PROCESS appears in M. Kac’s "On Deviations Between Theoretical and Empirical Distributions," Proc. Nat. Acad. Sciences, 35, (1949), 252-257. The name recalls N. Wiener’s analysis of "the Brownian movement" in "Differential-space," J. Math. and Phys., 2, (1923), 131-174. (See BROWNIAN MOTION.) [John Aldrich]

The WILCOXON RANK-SUM TEST and SIGNED RANK TEST were proposed in Frank Wilcoxon (1892-1965) "Individual Comparisons by Ranking Methods," Biometrics Bulletin, 1, (1945), 80-83. The properties of these tests were studied in a stream of papers beginning in the early 1950s, including J. Hemelrijk "Note on Wilcoxon’s Two-Sample Test when Ties are Present," Annals of Mathematical Statistics, 23, (1952), 133-135. David (2001) writes of the signed-rank test that this "clever and helpful term was coined by Tukey (1949) in an unpublished but repeatedly cited technical report ["The simplest signed-rank tests."]."

WILSON’S THEOREM was given its name by Edward Waring (1736-1798) for his friend, John Wilson (1741-1793). The first published statement of the theorem was by Waring in his Meditationes algebraicae (1770), although manuscripts in the Hanover Library show that the result had been found earlier by Leibniz.

WINDOW in Statistics, particularly time series analysis. The term was introduced in R. B. Blackman & J. W. Tukey’s “The Measurement of Power Spectra,” Bell System Technical Journal, 37, (1958). It appears in several forms, including data window, lag window and spectral window. An alternative term in some of these uses is KERNEL. The first window to be proposed for estimating the spectral density was the so-called DANIELL WINDOW. [John Aldrich]

WINSORIZED is found in 1960 in W. J. Dixon, "Simplified Estimation from Censored Normal Samples," The Annals of Mathematical Statistics, 31, 385-391. Dixon explains the term, "Winsor [4] and perhaps others have suggested using for the magnitude of an extreme, poorly known, or unknown observation the value of the next largest (or smallest) observation."  The reference is to a personal communication from Charles P. Winsor (1895-1951).  (David, 1998).

J. W. Tukey writes that, when he first met Winsor in 1941, "he had already developed a clear and individual philosophy about the proper treatment of 'wild shots' ... It seems only appropriate, then, to attach his name to the process of replacing certain of the most extreme of the observations in the sample by the nearest unaffected values, to speak of Winsorizing or Winsorization." (from "The Future of Data Analysis," Annals of Mathematical Statistics, 33, (1962), p. 18).
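The procedure Dixon and Tukey describe can be illustrated with a short Python sketch. It assumes the symmetric form in which the same number k of observations is replaced at each end; the function name winsorize and the parameter k are hypothetical and are not taken from either paper. With k = 1 it matches Dixon's wording, replacing an extreme value by "the value of the next largest (or smallest) observation."

```python
def winsorize(sample, k=1):
    """Hypothetical helper: return a copy of `sample` in which the k smallest
    observations are replaced by the (k+1)-th smallest value and the k largest
    by the (k+1)-th largest value, i.e. by the nearest unaffected values."""
    n = len(sample)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < len(sample) / 2")
    ordered = sorted(sample)
    low = ordered[k]           # smallest value left unaffected
    high = ordered[n - 1 - k]  # largest value left unaffected
    # Clamping each observation to [low, high] replaces the k most extreme
    # values at each end while keeping the original order of the sample.
    return [min(max(x, low), high) for x in sample]


# The "wild shot" 95 is pulled in to 12 and the smallest value 3 up to 7.
print(winsorize([12, 3, 7, 95, 8, 10], k=1))  # [12, 7, 7, 12, 8, 10]
```

Clamping to the (k+1)-th order statistics is used here because it gives the same result as explicitly replacing the k extreme values, without having to track their positions.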


WISHART DISTRIBUTION. The title of John Wishart’s "The Generalised Product Moment Distribution in Samples from a Normal Multivariate Population," Biometrika, 20A (1928), 32-52, describes exactly what he was interested in. However, the distribution he derived, a multivariate generalisation of χ², is almost always called the Wishart distribution.

WITCH OF AGNESI. Luigi Guido Grandi (1671-1742) studied this curve in 1703 and is believed to have been the first to call it versiera, or versoria in Latin, meaning "turning in every direction." According to Boyer in History of Analytic Geometry, Grandi coined the Italian word la versiera in 1718. The term appears in Father Guido Grandi’s commentary on the Trattato del Galileo del moto naturalmente accelerato (Opere di G. Galilei, III, Firenze, 1718, p. 393): "...sarebbe quella curve, che io descrivo nel mio libro delle quadrature alla prop. 4, nata da seni versi, che da me suole chiamarsi la versiera in latino però versoria..." ("...it would be that curve which I describe in my book of quadratures at prop. 4, arising from versed sines, which I usually call la versiera, but in Latin versoria...").

In 1748, Maria Gaetana Agnesi (1718-1799), in Istituzioni Analitiche, the first calculus book written by a woman, also called the curve la versiera, using the name twice.

The British mathematician John Colson (1680-1760), translating Agnesi’s work into English, translated the Italian word versiera as "the Witch." He wrote, "...and therefore [equation] or [equation] will be the equation of the curve to be described, which is vulgarly called the Witch." He also wrote, "Let the curve to be described be that of Prob. III. n. 238, called the Witch, the equation of which is [equation]." Colson gave the name a third time, in a marginal note, "Another example of the curve called the Witch."

According to the translator’s preface to the 1801 English edition of Analytical Institutions, Colson learned Italian for the sole purpose of translating this work.

The expression witch of Agnesi is found in English in 1875 in An elementary treatise on the integral calculus by Benjamin Williamson (1827-1916): “Find the area between the witch of Agnesi xy² = 4a²(2a − x) and its asymptote” (OED).

See the entry CAUCHY DISTRIBUTION and the MathWorld entry Witch of Agnesi.

WITHOUT LOSS OF GENERALITY. Without any loss of generality is found in 1842 in Mathematical Tracts on the Lunar and Planetary Theories by George Biddell Airy: "But it is plain that, without any loss of generality, we may get rid of A by altering the origin of time from which t is reckoned, or the origin of linear measure from which x is reckoned." [Google print search]

Without loss of generality is found in 1843 in "On the theory of determinants" by Arthur Cayley in Trans. Camb. Phil. Soc. 8: "Hence, without loss of generality, the theorems which follow may be stated with reference to a single marked column only" [University of Michigan Digital Library].

A Google print search found the abbreviation WLOG in 1964 in Group Theory by W. R. Scott. There the abbreviation has an asterisk and the footnote explains what it stands for.

The WOLD DECOMPOSITION (theorem), by which a stationary series is expressed as the sum of a deterministic component and a stochastic component that can itself be expressed as an infinite moving average, was given in Herman Wold’s A Study in the Analysis of Stationary Time Series (1938). The name only became common in the 1960s. (JSTOR search.)

WORKING HYPOTHESIS occurs in 1868-1870 in Essays, philosophical and theological by James Martineau: "Mr. Mansel entreats us to hold, and to guide our footsteps; calling them 'regulative truths,' by which he means the best working hypothesis we are able to attain of the character and purposes of God" [University of Michigan Digital Library].

WORKING MATHEMATICIAN. In an article "The Ignorance of Bourbaki" (The Mathematical Intelligencer vol. 14, no. 3, 1992), A. R. D. Mathias suggests that this phrase is due to Bourbaki. However, Carlos César de Araújo has found it in a paper by Eliakim Hastings Moore, "On the foundations of mathematics" (Bull. A. M. S., 1903, p. 406).

The term WRONSKIAN (for Hoëné Wroński) was coined by Thomas Muir (1844-1934) in 1881 (Cajori 1919, page 310).
