BAIRE CLASSES (of functions). These concepts were introduced by René Baire in his “Sur la théorie des fonctions discontinues,” Comptes rendus, 129, (1899), 1010-1013. The phrase appears in the title of C. de la Vallée Poussin’s Intégrales de Lebesgue, Fonctions d'ensemble, Classes de Baire (Paris, 1916). See the entry in the Encyclopedia of Mathematics.
BANACH SPACE. In their treatise Linear Operators Part I (1957) N. Dunford & J. T. Schwartz write, "Axioms closely related to those of a normed linear space were introduced, in 1916, by Bennett ... In 1922, Banach, Hahn and Wiener published papers using the same or similar sets of axioms. Though Banach did not initiate the study of these spaces, his contributions were many and deep--for that reason many authors use the term Banach space to refer to a complete normed linear space." (p. 85)
Stefan Banach’s (1892-1945) paper of 1922, “Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales”, Fundamenta Mathematicae, 3, 133-181, was based on the thesis he submitted in 1920. In 1928 in Les Espaces Abstraits Maurice Fréchet (1878-1973) wrote about "les espaces de M. Banach." In his own Théorie des Opérations Linéaires (1932, ch. IV, p. 53) Banach used the term "espace du type (B)." A JSTOR search finds Banach space being used in 1934 in T. H. Hildebrandt's "On Bounded Linear Functional Operations," Transactions of the American Mathematical Society, 36, 868-875, and the expression soon came into general use.
[This entry was contributed by John Aldrich.]
BANACH-STEINHAUS THEOREM is due to Stefan Banach and Hugo Steinhaus, “Sur le principe de la condensation de singularités,” Fund. Math., 9, (1927), 50–61. See Encyclopedia of Mathematics: Banach-Steinhaus theorem.
The BANACH-TARSKI PARADOX is named for a result in S. Banach and A. Tarski’s “Sur la décomposition des ensembles de points en parties respectivement congruentes”, Fundamenta Mathematicae, 6, (1924), 244-277. The result states that a sphere can be cut into a finite number of pieces and then reassembled into a sphere of larger size. The proof uses the axiom of choice but, according to Gregory Moore's Zermelo's Axiom of Choice: Its Origins, Development and Influence (1982, pp. 284-5), the authors did not consider that the result discredited the axiom. In this respect their attitude was like that of Hausdorff, who had found a related result some years earlier.
The tag paradox seems to have become attached to the result in the 1940s. A JSTOR search found L. M. Blumenthal "A Paradox, a Paradox, a Most Ingenious Paradox," American Mathematical Monthly, 47, (1940), pp. 346-353. (The title is taken from Gilbert and Sullivan’s story of Frederic, who being born on February 29th had celebrated only 5 birthdays by the time he was 21: A most ingenious paradox.) For Blumenthal the theorem is a paradox, not because it embodies a contradiction, but because it goes against common sense notions about congruence. The word is also found in the title of Wacław Sierpiński’s “Sur le paradoxe de MM. Banach et Tarski”, Fundamenta Mathematicae 33, 229-234 (1945).
See AXIOM OF CHOICE, HAUSDORFF PARADOX and PARADOX.
BAR CHART occurs in Nov. 1914 in W. C. Brinton "Graphic Methods for Presenting Data. IV. Time Charts," Engineering Magazine, 48, 229-241 (David, 1998).
The form of diagram, however, is much older and seems to have been introduced by William Playfair. There is an example in his Commercial and Political Atlas of 1786: see Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization 1700-1799.
BAR GRAPH is found in 1919 in School Statistics and Publicity by Carter Alexander: "The data shown in this circle graph for Rockford may be presented in a bar graph which permits of placing the figures so they can be added." [Google print search]
BARTLETT ADJUSTMENT or CORRECTION is a correction factor applied to the likelihood ratio test statistic to make its distribution under the null hypothesis conform better to the asymptotic chi-squared form. It was proposed in 1937 by M. S. Bartlett "Properties of Sufficiency and Statistical Tests," Proceedings of the Royal Society of London. A, 160, 268-282 but it has received most attention in the last 20 years.
The term BARYCENTRIC CALCULUS appears in 1827 in the title Der barycentrische Calcul by August Ferdinand Möbius (1790-1868).
BASE (of a geometric figure) appears in English in 1570 in Sir Henry Billingsley's translation of Euclid's Elements (OED2).
BASE (in an isosceles triangle) is found in English in 1571 in Digges, Pantom.: "Isoscheles is such a Triangle as hath onely two sides like, the thirde being vnequall, and that is the Base" (OED2).
BASE (in logarithms) appears in Traité élémentaire de calcul différentiel et de calcul intégral (1797-1800) by Lacroix: "Et si a désigne la base du système, il en résulte l'équation y = a^x, dans laquelle les logarithmes sont les abscisses."
Base is found in the 1828 Webster dictionary, in the definition of radix: "2. In logarithms, the base of any system of logarithms, or that number whose logarithm is unity."
BASE (of a number system). Radix was used in the sense of a base of a number system in 1811 in An Elementary Investigation of the Theory of Numbers by Peter Barlow [James A. Landau].
Base is found in the Century Dictionary (1889-1897): "The base of a system of arithmetical notation is a number the multiples of whose powers are added together to express any number; thus, 10 is the base of the decimal system of arithmetic."
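The Century Dictionary definition can be illustrated with a short sketch. The function names `to_base` and `from_digits` are hypothetical and chosen only for this illustration: the first finds the multiples of the powers of the base, and the second adds them back together.

```python
def to_base(n: int, base: int) -> list[int]:
    """Return the digits of a non-negative integer n in the given base,
    most significant digit first."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        # Peel off the least significant digit at each step.
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]


def from_digits(digits: list[int], base: int) -> int:
    """Add together the multiples of the powers of the base."""
    return sum(d * base**k for k, d in enumerate(reversed(digits)))


print(to_base(1889, 10))  # [1, 8, 8, 9]
print(to_base(1889, 7))   # [5, 3, 3, 6]
```

In base 10 the digits of 1889 are the multiples of 1000, 100, 10 and 1; in base 7 the same number is 5·343 + 3·49 + 3·7 + 6.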
BASE ANGLE is found in 1848 in "On the Formation of the Central Spot of Newton's Rings Beyond the Critical Angle" by Sir George Gabriel Stokes in the Transactions of the Cambridge Philosophical Society [University of Michigan Historic Math Collection].
BASIS (of a vector space). The term basis-system was used by Frobenius and Stickelberger in 1878 in Crelle, according to Moore (1896) [James A. Landau].
BAYES and BAYESIAN. Thomas Bayes (1702-1761) and his single work on probability, the posthumously published An Essay towards solving a Problem in the Doctrine of Chances (Philosophical Transactions of the Royal Society of London 53 (1763), 370-418), have inspired several terms. Some, including "Bayes's theorem," have only a tenuous connection to Bayes.
The Essay considers the problem: "Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named." Or, in modern terms: given the outcomes of a number of Bernoulli trials, find the posterior distribution of the probability of a success. For the prior Bayes took a uniform distribution on the unit interval.
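In modern notation the problem can be worked numerically. The following is a minimal sketch, not Bayes's own method: it assumes the uniform prior described above, so that after s successes and f failures the posterior is a beta distribution with parameters s + 1 and f + 1. The function names are hypothetical, and the incomplete beta function is evaluated through its standard identity with a binomial tail sum, which holds for integer parameters.

```python
from math import comb


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Regularized incomplete beta function I_x(a, b) for positive integers
    a and b, using I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def bayes_interval_probability(successes: int, failures: int,
                               lo: float, hi: float) -> float:
    """Posterior chance that the success probability lies between lo and hi,
    given the observed counts and a uniform prior on [0, 1]."""
    a, b = successes + 1, failures + 1
    return beta_cdf_int(hi, a, b) - beta_cdf_int(lo, a, b)


# Seven successes and three failures: chance that p lies between 0.5 and 0.9.
print(bayes_interval_probability(7, 3, 0.5, 0.9))
```

With no observations the posterior is the uniform prior itself, so the chance of lying between 0.2 and 0.5 is simply 0.3, which gives a quick sanity check on the sketch.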
The essay is difficult and little is known of its background. Historians have asked, "Was Bayes a Bayesian?" (D. A. Gillies Historia Mathematica, 14, (1987), 325-346) and "Who discovered Bayes's Theorem?" (reprinted in Stigler (1999)). A. I. Dale's A History of Inverse Probability from Thomas Bayes to Karl Pearson (2nd edition, 1999) is useful for the changing interpretations. Inverse Probability was the term used in the 19th and early 20th centuries for the probability found when reasoning from effects to causes, direct probabilities being used when reasoning from causes to effects. Throughout this period inference based on inverse probability (now called Bayesian inference) coexisted with inference based on procedures with 'good' repeated sampling properties (now called classical inference).
Bayes's effort was soon superseded by Laplace's more general and apparently more powerful work; the first instalment was the "Mémoire sur la Probabilité des Causes par les événements," Savants étrangers 6, (1774), pp. 621-656. Oeuvres 8, pp. 27-65. (English translation and commentary by S. M. Stigler in Statistical Science, 1, (1986), 359-378). In 1838 Augustus De Morgan was writing, "This [inverse] method was first used by the Rev. T. Bayes ... [who], though almost forgotten, deserves the most honourable remembrance from all who treat the history of this science." (An Essay on Probabilities, p. vii.)
Today the terms Bayes's Formula, Rule and Theorem are associated with a basic theorem on conditional probability. "La règle de Bayes" appears with this meaning in 1843 in A. A. Cournot's Exposition de la Théorie des Chances et des Probabilités (pp. 158-9). Cournot says the rule is "attributed to Bayes." It is not in the Essay but comes from Laplace: it is his VIth principle for the case where the causes are unequally probable: see Théorie Analytique des Probabilités (1814, pp. xiv-xv in the edition on Gallica.) Laplace's terminology, of "cause" or "hypothesis" for the event whose conditional probability is to be found, survived well into the twentieth century. Thus in J. L. Coolidge's An Introduction to Mathematical Probability (1925) "Bayes' Principle" sits in the chapter on "the probability of causes." However "der Bayessche Satz" in A. N. Kolmogorov's Grundbegriffe der Wahrscheinlichkeitsrechnung (1933, p. 46) is just a theorem about events.
Bayes's Theorem has also been used in a more historically accurate way. In J. W. Lubbock & J. E. Drinkwater-Bethune's On Probability (1830, p. 48) it refers to Bayes's original problem and solution, as it does in Isaac Todhunter's authoritative A History of the Mathematical Theory of Probability (1865, p. 299). Karl Pearson ("On the Influence of Past Experience on Future Expectation," Philosophical Magazine, 13, (1907), 365-378) and R. A. Fisher took this usage into the 20th century: see e.g. Fisher's "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922, p. 324). While Pearson gave qualified approval to Bayes's theorem, Fisher rejected it outright and was its most persistent critic. This use of "Bayes's theorem" (and "Bayes's postulate" for the uniform prior) appears to have lapsed.
Bayes Estimate, Bayes Risk and Bayes Solution are terms used in Abraham Wald's classical (= non-Bayesian) statistical decision theory. Wald ("Contributions to the Theory of Statistical Estimation and Testing Hypotheses," Annals of Mathematical Statistics, 10, (1939), 299-326) found it "useful" to consider "hypothetical a priori distributions" of the parameter (p. 306). Wald used the term "Bayes solution" in his "An Essentially Complete Class of Admissible Decision Functions" Annals of Mathematical Statistics, 18, (1947), 549-555. In 1948 J. L. Hodges & E. L. Lehmann (Annals of Mathematical Statistics, 19, 396-407) used the term "Bayes risk" for a concept Wald had treated in 1939 without naming it. In "Some Problems in Minimax Point Estimation," Annals of Mathematical Statistics, 21, (1950), 182-197, they renamed the "minimum risk estimate" of 1939 the "Bayes estimate."
The term Bayesian entered circulation around 1950. R. A. Fisher used it in the notes he wrote to accompany the papers in his Contributions to Mathematical Statistics (1950). Fisher thought Bayes's argument was all but extinct, for the only recent work to take it seriously was Harold Jeffreys's Theory of Probability (1939). In 1951 L. J. Savage, reviewing Wald's Statistical Decision Functions, referred to "modern, or unBayesian, statistical theory" ("The Theory of Statistical Decision," Journal of the American Statistical Association, 46, p. 58). Soon after, however, Savage changed from being an unBayesian to being a Bayesian. While the 1960s would bring a new enthusiasm for inverse probability, this did not extend to the name. For Jeffreys (p. 29) "the chief rule involved in the process of learning from experience" was "the principle of inverse probability, first given by Bayes." But the term "inverse probability" fell out of use and "Bayesian" started to appear: new works had titles like Introduction to Probability and Statistics from a Bayesian Viewpoint (D. V. Lindley, 1965).
Empirical Bayes. The term and the method are due to H. Robbins, “An Empirical Bayes Approach to Statistics,” Proceedings of the Third Berkeley Symposium on Mathematical Statistics, volume 1, (1956), 157-163. (David (1998).)
Bayes Factor appears in I. J. Good's 1958 "Significance Tests in Parallel and in Series," Journal of the American Statistical Association, 53, 799-813. Previously in his Probability and the Weighing of Evidence (1950) Good had used the term "factor" explaining that "Dr. A. M. Turing suggested in a conversation in 1940 that the word 'factor' should be regarded as a technical term ... and that it could be more fully described as the factor in favour of the hypothesis H in virtue of the result of the experiment." Jeffreys had introduced this factor, denoting it by K but not giving it a name. (Theory of Probability (1939, chapter V.))
See also CLASSICAL STATISTICAL INFERENCE, DECISION THEORY, INVERSE PROBABILITY, LIKELIHOOD, POSTERIOR & PRIOR, PRINCIPLE OF INDIFFERENCE, PROBABILITY and RULE OF SUCCESSION.
[This entry was contributed by John Aldrich, based on Dale (op. cit.), Hald (1998), David (1995) and David (2000). For further information see Stephen E. Fienberg, When did Bayesian Inference become "Bayesian"? Bayesian Analysis (2006).]
BEHRENS-FISHER DISTRIBUTION, PROBLEM and TEST. In 1929 W. V. Behrens (1902-1962) published a significance test for the difference between means of random samples from two Normal populations with unequal variances: Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen, Landwirtschaftliche Jahrbücher, 68, 807-837. Behrens told R. A. Fisher about his work (see p. 53 of J. H. Bennett Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher (1990)) but then played no further part in its story. In 1935 Fisher began writing about Behrens’s test as an application of his new theory of fiducial inference (see The Fiducial Argument in Statistical Inference). Both the general theory and the Behrens application aroused controversy. The label “Behrens-Fisher” entered circulation around 1940: M. S. Bartlett referred to the test in “Complete Simultaneous Fiducial Distributions,” Annals of Mathematical Statistics, 10, (1939), 129-138, Harold Jeffreys to the formula in “Note on the Behrens-Fisher formula,” Annals of Eugenics, 10, (1940), 48-51 and Henry Scheffé to the problem in “On Solutions of the Behrens-Fisher Problem, Based on the t-Distribution,” Annals of Mathematical Statistics, 14, (1943), 35-44. This entry was contributed by John Aldrich. See also FIDUCIAL PROBABILITY.
BELL-SHAPED. Bell-shaped is found in 1785 in Planting and Ornamental Gardening: A Practical Treatise by William Marshall. He refers to bell-shaped flowers.
BELL-SHAPED PARABOLA appears in 1857 in Mathematical Dictionary and Cyclopedia of Mathematical Science. The equation is ay^2 - x^2 + bx^2 = 0.
BELL-SHAPED and BELL CURVE as descriptions of the NORMAL or GAUSSIAN density. It has become a cliché to describe the graph of the normal density as "bell-shaped". This is a relatively recent phenomenon, given that the distribution was first studied in the 1730s and was used throughout the 19th and early 20th centuries in the theory of errors, the theory of gases and statistical theory.
"La surface S, en forme de cloche" appears in Esprit Pascal Jouffret‘s "Etude sur l’effet utile du tir" (1872) as a description of the bivariate normal density with independent components. (Kruskal & Stigler’s "Normative Terminology" (1997 and reprinted in Stigler (1999)). However, Jouffret and the bell surface made no permanent impression and neither is mentioned in the chapter on the bivariate normal distribution (ch. IX "Erreurs de situation d’un point") in Bertrand’s Calcul des Probabilités (1889).
"Bell-shaped curve" is found in Francis Galton’s Catalogue of the Special Loan Collection of Scientific Apparatus at the South Kensington Museum (1876). (David (1998)) However, Galton did not use this description in his books and articles which made so much of the normal distribution. Nor did other principals in the English statistical tradition, Karl Pearson, Yule and Fisher. F. Y. Edgeworth was the only statistician of that era to regularly use a visual analogy. This was the gend’arme’s hat, which he attributed to "a lively French statistician"; see "The Statistics of Examinations," Journal of the Royal Statistical Society, 51, (1888), p. 600. The hat is flatter than a bell, but, hat or bell, it is only a matter of scaling.
A JSTOR search of early 20th century articles finds many occurrences of "bell-shaped" as a description of the normal curve but there is no identifiable authoritative source for the analogy. The analogy acquired authority when it appeared in textbooks: in his Introduction to Mathematical Probability (1937) J. V. Uspensky writes, "the probability curve has a bell-shaped form" and in An Introduction to Probability Theory and its Applications (1950, p. 129) W. Feller writes that the graph of the density "is the symmetric, bell-shaped curve shown in figure 1."
THE bell curve, with its implication that there is only one bell-shaped curve, does not come naturally to statisticians or probabilists and I could find no use of this term in any of the statistics or mathematics journals on JSTOR. However the term has become very common outside the professional literature of probability.
Bell curve is found in 1938 in Public Personnel Problems from the Standpoint of the Operating Officer by Lewis Meriam: "Within recent years a tendency has developed to require that the distribution of efficiency ratings shall conform to the normal frequency or bell curve of distribution." [Fred R. Shapiro]
A JSTOR search found, "we had fewer C’s than the normal bell curve indicates and more B’s" in W. E. Aiken & P. D. Carleton "Freshman English at the University of Vermont," College English, 3, (1941), p. 281. The phrase "an almost perfect Bell Curve" appears in D. T. Sisto "Aural Comprehension in Spanish," Modern Language Journal, 41 (1957), p. 30. The bell curve became more common in the following decades but it really took off in the 1990s with such widely discussed works as Richard J. Herrnstein and Charles Murray’s The Bell Curve: Intelligence and Class Structure in American Life (1994).
The three preceding entries were contributed by John Aldrich. See the entries CENTRAL LIMIT THEOREM, ERROR, GAUSSIAN and NORMAL and also Symbols associated with the Normal distribution.
The BERNOULLI family was one of the wonders of European mathematics in the 17th and 18th centuries. Eight Bernoullis have MacTutor biographies and there are many Bernoulli references on this site: to see them, use the Search on the front page. Over the centuries Bernoulli’s law, Bernoulli’s theorem, Bernoulli’s principle, etc. have been used in many different ways. The eponymous terms in use today seem to refer mainly to the work of Jakob (Jacques, James), in particular to his Ars Conjectandi (published 1713), and to the work of his nephew Daniel.
BERNOULLI DISTRIBUTION and BERNOULLI RANDOM VARIABLE. In the past the Bernoulli distribution often referred to what is now generally called the BINOMIAL DISTRIBUTION. Thus H. Cramér Random Variables and Probability Distributions (1937, p. 43) refers to "a Binomial or Bernoulli distribution". Aurel Wintner had in mind a different random variable and possibly a different Bernoulli--Daniel not Jakob--when he discussed the "symmetric Bernoulli distribution" in "On Analytic Convolutions of Bernoulli Distributions," American Journal of Mathematics, 56, (1934), p. 662. This distribution has values ±a with equal probability.
Since the 1960s the random variable, 1 with probability p and 0 with probability (1-p), has been prominent in the literature. This has been called the INDICATOR RANDOM VARIABLE and also the Bernoulli random variable. The second term appears in Allan Birnbaum "On the Foundations of Statistical Inference: Binary Experiments," Annals of Mathematical Statistics, 32, (1961), 414-435. It was a natural choice given the established term BERNOULLI TRIAL.
BERNOULLI’S EQUATION in ordinary differential equations. Kline (p. 474) gives the history of this equation as follows: Jakob Bernoulli proposed the problem of solving the equation in the Acta Eruditorum of 1695; in 1696 Leibniz showed it could be reduced to a linear equation by a change of variable; John Bernoulli gave another method. In the Acta of 1696 Jakob solved it essentially by separation of variables.
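The reduction to a linear equation can be sketched in modern notation. The symbols P, Q, n and z below are assumptions of this sketch rather than the notation of the original papers, and the substitution shown is the one usually given in present-day textbooks.

```latex
% Bernoulli's equation, with n \neq 0, 1:
\frac{dy}{dx} + P(x)\,y = Q(x)\,y^{n}.

% Change of variable:
z = y^{1-n}, \qquad \frac{dz}{dx} = (1-n)\,y^{-n}\,\frac{dy}{dx}.

% Multiplying the original equation by (1-n)\,y^{-n} gives
(1-n)\,y^{-n}\,\frac{dy}{dx} + (1-n)\,P(x)\,y^{1-n} = (1-n)\,Q(x),

% which is linear in z:
\frac{dz}{dx} + (1-n)\,P(x)\,z = (1-n)\,Q(x).
```

Once z has been found by the usual integrating-factor method, y is recovered from y = z^{1/(1-n)}.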
BERNOULLI’S EQUATION (or Bernoulli’s theorem) in hydrodynamics was introduced by Daniel Bernoulli in his work Hydrodynamica (1738). It is one of Michael Guillen’s Five Equations that Changed the World (1995).
BERNOULLI NUMBERS. These were first discussed by Jakob Bernoulli in the Ars Conjectandi (published 1713). See Hald (1990, section 15.4).
In The Doctrine of Chances (3rd edition 1756) Abraham de Moivre referred to them as "the numbers of Mr. James Bernoulli in his excellent Theorem for the Summing of Powers," quoted in I. Todhunter A History of the Mathematical Theory of Probability (1865, p. 152).
According to Cajori (vol. 2, page 42), Leonhard Euler introduced the name "Bernoullian numbers" in 1769 in the title of his "De summis serierum numeros Bernoullianos involventium."
BERNOULLI’S THEOREM was once the usual name for the first version of the LAW OF LARGE NUMBERS, proved by Jacob Bernoulli in Ars Conjectandi (1713). See e.g. Todhunter’s A History of the Mathematical Theory of Probability (1865, p. 71).
BERNOULLI TRIAL is dated 1951 in MWCD10, although James A. Landau has found the phrases "Bernoullian trials" and "Bernoullian series of trials" in 1937 in Introduction to Mathematical Probability by J. V. Uspensky. The reference is to Jakob Bernoulli’s Ars Conjectandi.
See BINOMIAL DISTRIBUTION.
BERTRAND’S PARADOX in probability theory. Four situations are presented in the first chapter of Joseph Bertrand’s Calcul des probabilités (1889) and any one could be referred to as “Bertrand’s paradox.” The most discussed, a paradox in geometric probability, appears on pp. 4-5; see the entry in the Encyclopedia of Mathematics. This problem was treated by J. H. Poincaré in the section “Paradoxe de J. Bertrand” of his Calcul des probabilités ch. VII p. 118. The problem Bertrand treats on pp. 2-3 of the Probabilités is sometimes referred to as Bertrand’s box paradox. It has been re-invented several times and is best known today as the MONTY HALL PROBLEM.
BERTRAND’S POSTULATE in number theory. In his “Mémoire sur le nombre de valeurs que peut prendre une fonction quand on y permute les lettres qu’elle renferme,” Journal de l’Ecole Polytechnique, 18, (1845), 123-140 Joseph Bertrand made a claim about the distribution of prime numbers based on the numbers he had examined. Chebyshev published a proof of “le postulatum de M. Bertrand” in 1854 in his “Mémoire sur les nombres premiers,” reprinted in Oeuvres I, p. 50. See the entry in MathWorld.
BESSEL EQUATION and BESSEL FUNCTION are named for Wilhelm Bessel who made the first systematic study of them in his “Untersuchung des Theils der planetarischen Störungen, welcher aus der Bewegung der Sonne entsteht,” Abh. d. K. Akad. Wiss. Berlin 1824 (published 1826) 1–52. Abhandlungen 1, p. 84. See the Encyclopedia of Mathematics entries Bessel equation and Bessel functions and Peter Colwell “Bessel Functions and Kepler’s Equation,” American Mathematical Monthly, 99, (1992), 45-48.
Franceschetti (p. 56) implies that the term Bessel function (in German) was introduced by Oskar Xavier Schlömilch in 1854.
Bessel'schen Functionen appears in 1868 in the title Studien über die Bessel'schen Functionen by Eugen Lommel.
Philosophical Magazine in 1872 has “The value of Bessel's functions is becoming generally recognized” (OED2).
Bessel function appears in 1894 in Ann. Math. IX. 27 in the heading “Roots of the Second Bessel Function” (OED2).
BETA DISTRIBUTION. Distribuzione β is found in 1911 in C. Gini, "Considerazioni Sulle Probabilità Posteriori e Applicazioni al Rapporto dei Sessi Nelle Nascite Umane," Studi Economico-Giuridici della Università di Cagliari, Anno III, 5-41 (David, 1998).
The distribution has a very long history. The "problem in the doctrine of chances" that Bayes treated produced a beta distribution for the posterior density of the probability of a success in Bernoulli trials. In the early 20th century English literature it was usual to refer to the distribution by its designation in the Pearson family of curves (see the PEARSON CURVES entry). However, the new textbooks of the 1940s did not favour the Pearson classification and the beta designation has become standard: see e.g. C. E. Weatherburn's A First Course in Mathematical Statistics (1946).
See BAYES and GAMMA DISTRIBUTION.
BETA and GAMMA FUNCTIONS. These terms derive from the symbols B and Γ used to denote the functions that Adrien Marie Legendre (1752-1833) called the Eulerian integral of the first kind and second kind. Legendre introduced the symbol Γ and Binet introduced the symbol B. See EULERIAN INTEGRAL and Earliest use of function symbols.
According to Kline (p. 423), Euler’s research on the functions, published in 1731 and 1771, grew out of earlier work by Wallis published in his Arithmetica infinitorum of 1656. A separate development led to the incomplete B-function. This was Bayes’s (1763) solution of a "problem in the doctrine of chances." See the BETA DISTRIBUTION and BAYES.
The term BETTI NUMBER was coined by Henri Poincaré (1854-1912) and named for Enrico Betti (1823-1892), according to a history note by Victor Katz in A First Course in Abstract Algebra by John B. Fraleigh. "Les nombres de Betti" appear in Poincaré’s "Analysis Situs," Journal de l’École Polytechnique, 1, (1895) 1-121 and in a short communication, which is available on-line, "Sur L’Analysis Situs," Comptes Rendus, 115, (1892), 633-636.
BETWEENNESS. The term had been used earlier in philosophy and psychology (see the OED) but it was first used in connection with geometry by G. B. Halsted in his paper, “The Betweenness Assumptions,” American Mathematical Monthly, 9, (1902), 98-101.
“The betweenness assumptions” was Halsted’s term for what Hilbert had called the Axiome der Anordnung (Axioms of Arrangement) in his Grundlagen der Geometrie (1899). The explanation of the names is given in a passage from Hilbert that Halsted (p. 99) translates as follows: “The axioms of this group define the idea of ‘between,’ and make possible on the basis of this idea the arrangement of the points on a straight, in a plane and in space.”
The term BEZOUTIANT was coined by Sylvester. It is found in J. J. Sylvester, "On a Theory of the Syzygetic Relations of Two Rational Integral Functions, Comprising an Application to the Theory of Sturm's Functions, and That of the Greatest Algebraical Common Measure," Philosophical Transactions of the Royal Society of London, 143, (1853), 407-548: "This quadratic function, which plays a great part in the last section and in the theory of real roots, I term the Bezoutiant; it may be regarded as a species of generating function." [JSTOR search]
BIASED and UNBIASED. Biased errors and unbiased errors (meaning "errors with zero expectation") are found in 1897 in A. L. Bowley, "Relations Between the Accuracy of an Average and That of Its Constituent Parts," Journal of the Royal Statistical Society, 60, 855-866 (David, 1995).
Biased sample is found in 1911 in An Introduction to the Theory of Statistics by G. U. Yule: "Any sample, taken in the way supposed, is likely to be definitely biassed, in the sense that it will not tend to include, even in the long run, equal proportions of the A’s and [alpha]'s in the original material" (OED2).
Biased sampling is found in F. Yates, "Some examples of biassed sampling," Ann. Eugen. 6 (1935) [James A. Landau].
See also ESTIMATION.
The term BICURSAL was introduced by Cayley (Kline, page 938).
In 1873 Cayley wrote, "A curve of deficiency 1 may be termed bicursal."
BIJECTION. See the entry INJECTION, SURJECTION and BIJECTION.
BILLION. See MILLION.
BIMODAL is found in April 1901 in "A Quantitative Study of Variation in the Smaller North-American Shrikes" by R. M. Strong in The American Naturalist.
BINARY ARITHMETIC appears in English in 1796 in A Mathematical and Philosophical Dictionary (OED2).
BINOMIAL. According to the OED2, the Latin word binomius was in use in algebra in the 16th century.
Binomial first appears as a noun in English in its modern mathematical sense in 1557 in The Whetstone of Witte by Robert Recorde: "The nombers that be compound with + be called Bimedialles... If their partes be of 2 denominations, then thei named Binomialles properly. Howbeit many vse to call Binomialles all compounde nombers that have +" (OED2).
BINOMIAL COEFFICIENT. According to Kline (page 272), this term was introduced by Michael Stifel (1487-1567) about 1544. However, Julio González Cabillón believes this information is incorrect. He says Stifel could not have used the word coefficient, which is due to Vieta (1540-1603).
Binomial coefficient is found in Rottock, "Ueber Reihen mit Binomialcoefficienten und Potenzen," Pr. d. G. Rendsburg (1868).
Binomial coefficient is found in English in an 1868 paper by Arthur Cayley [University of Michigan Historical Math Collection].
BINOMIAL DISTRIBUTION is found in 1911 in An Introduction to the Theory of Statistics (p. 305) by G. U. Yule: "The binomial distribution,..only becomes approximately normal when n is large, and this limitation must be remembered in applying the table..to cases in which the distribution is strictly binomial" (OED2). Fisher adopted it in section 18 of his Statistical Methods for Research Workers (1925).
The name is relatively new but the distribution has been studied since it was obtained by Jakob (Jacques, James) Bernoulli (1654-1705) in Ars Conjectandi (1713) Part 1. Earlier names included binomial law.
See BERNOULLI TRIAL.
BINOMIAL THEOREM appears in 1742 in Treatise of Fluxions by Colin Maclaurin (Struik, page 339).
In Gilbert and Sullivan's The Pirates of Penzance (1879), the song "I Am The Very Model of a Modern Major-General" includes the lines:
I'm very well acquainted, too, with matters mathematical,
I understand equations, both the simple and quadratical,
About binomial theorem I'm teeming with a lot o' news,
With many cheerful facts about the square of the hypotenuse. [...]
I'm very good at integral and differential calculus;
I know the scientific names of beings animalculous:
BINORMAL. Binormale was used by Barré de Saint-Venant in a paper "Mémoire sur les lignes courbes non planes" which was presented to l'Académie des Sciences on 16 September 1844 and published in Journal de L'école Royale Polytechnique in 1845. Barré de Saint-Venant also used binormale in Tableau de formules de la Théorie des Courbes dans l'Espace, which also appeared in 1845.
In the former work he defines binormale as follows, in English translation: “Binormal, those of the normals which are perpendicular to the oscillating plane. This line, which has not been given a name, is, in effect, normal to two consecutive elements at the same time, whereas the other normals to the curve are but single elements.”
In the above, “oscillating” apparently should be “osculating.”
The French original: “Binormale, celle des normales qui est perpendiculaire au plan oscillateur. Cette ligne, que l'on est obligé de considérer très-souvent aussi, et à laquelle il n'a pas encore été donné de nom, est, en effet, normale à deux éléments consécutifs à la fois, tandis que les autres normales à la courbe ne le sont qu'à un seul de ses éléments.”
According to Howard Eves in A Survey of Geometry, vol II (1965), “The name binormal was introduced by B. de Saint-Venant in 1845.”
This entry was contributed by James A. Landau.
The term BIOMATHEMATICS was coined by William Moses Feldman (1880-1939), according to Garry J. Tee in "William Moses Feldman: Historian of Rabbinical Mathematics and Astronomy." The term appears in Feldman's textbook Biomathematics published in 1923.
The word BIOMETRY had been used occasionally before 1901 but in that year a new journal, Biometrika, appeared. Francis Galton (1822-1911) wrote the lead article, "Biometry": "The primary object of Biometry is to afford material that shall be exact enough for the discovery of incipient changes in evolution which are too small to be otherwise apparent." (1. p. 9) (OED2) Galton's associates in founding the journal and in establishing biometry were the mathematician Karl Pearson (1857-1936) and the zoologist W. F. R. Weldon (1860-1906). Pearson and Weldon had been doing biometric research for about ten years while Galton's efforts went back more than thirty. For further information see Stephen M. Stigler "The Problematic Unity of Biometrics," Biometrics 56, (2000), p. 653. [John Aldrich]
See POPULATION and REGRESSION.
BIOSTATISTICS has been a popular title for books and courses in the last few decades but the term appeared in the 19th century: there is an entry in 1868 in A Dictionary of Medical Science [Google print search] and another in the 1890 edition of Webster. In the early decades of the twentieth century the term was most associated with the activities of the Department of Biostatistics, School of Hygiene and Public Health, Johns Hopkins University. The term overlaps with medical statistics, VITAL STATISTICS and BIOMETRY. See also STATISTICS.
BIPARTITE. In 1858, Cayley referred to "bipartite binary quantics."
BIPARTITE CURVE appears in 1879 in George Salmon (1819-1904), Higher Plane Curves (ed. 3): "We shall then call the curve we have been considering a bipartite curve, as consisting of two distinct continuous series of points" (OED2).
BIQUATERNION. Hamilton used the term biquaternion in the sense of a quaternion with complex coefficients.
In the more recent sense, William Kingdon Clifford (1845-1879) coined the term. It appears in 1873 in Proc. London Math. Soc. IV. 386.
BISECT. According to the OED2, bisect is apparently of English formation. The word is dated ca. 1645 in MWCD10.
Bisection appears in 1656 in a translation of Hobbes's Elem. Philos. (1839) 307: "By perpetual bisection of an angle" (OED2).
In 1660, Barrow's translation of Euclid's Elements has "To bisect a right line."
Bisector appears in English in 1864 in The Reader 5 Oct. 483/2: "The internal and external bisectors of the angle" (OED2).
BIT was coined by John W. Tukey (1915-2000).
According to Niels Ole Finnemann in Thought, Sign and Machine, Chapter 6, "After some more informal contacts during the first war years, on the initiative of mathematician Norbert Wiener, a number of scientists gathered in the winter of 1943-44 at a seminar, where Wiener himself tried out his ideas for describing intentional systems as based on feedback mechanisms. On the same occasion J. W. Tukey introduced the term a 'bit' (binary digit) for the smallest informational unit, corresponding to the idea of a quantity of information as a quantity of yes-or-no answers."
Several Internet web pages say Tukey coined the term in 1946. Another web page says, "Tukey records that it evolved over a lunch table as a handier alternative to 'bigit' or 'binit.'"
Bit first appeared in print in July 1948 in "A Mathematical Theory of Communication" by Claude Elwood Shannon (1916-2001) in the Bell System Technical Journal. In the article, Shannon credited Tukey with the coinage. [West Addison assisted with this entry.]
BIVARIATE (in Statistics) is found in 1920 in Karl Pearson “Notes on the History of Correlation,” Biometrika, 13, p. 37: “Thus in 1885 Galton had completed the theory of bi-variate normal correlation” (OED). The word was soon being written without a hyphen: thus James Henderson “On Expansions in Tetrachoric Functions,” Biometrika, 14, (1922), p. 157 writes of the “normal bivariate frequency surface.”
See MULTIVARIATE, N-VARIATE, TRIVARIATE and UNIVARIATE.
BLACK-SCHOLES FORMULA refers to a formula for the pricing of derivatives on the assumption that stock prices follow a geometric random walk in Fischer Black and Myron Scholes "The Pricing of Options and Corporate Liabilities," Journal of Political Economy, 81, (1973), 637-654. The name "Black-Scholes formula" came into use more or less immediately. Scholes received the Nobel Prize for Economic Sciences in 1997; Black had died in 1995.
BLOCK and RANDOMIZED BLOCK in experimental design. R.A. Fisher introduced the term block in chapter VIII, section 48, Technique of Plot Experimentation, of his Statistical Methods for Research Workers (1925). He also described the technique of randomized blocks, though he did not use the term. The term randomized block appears in his "The Arrangement of Field Experiments", Journal of the Ministry of Agriculture of Great Britain, 33, (1926) p. 509. (David 2001)
BONFERRONI INEQUALITIES. According to the St. Andrews website Carlo Emilio Bonferroni (1892-1960) published some probability inequalities in 1935-6. They were referred to as Bonferroni’s inequalities by W. Feller in his An Introduction to Probability Theory and its Applications volume 1 (p. 75, 1950). (David 2001)
See BOOLE’S INEQUALITY.
BOOLEAN is found in 1851 in the Cambridge and Dublin Mathematical Journal vi. 192: "...the Hessian, or as it ought to be termed, the first Boolian Determinant" (OED2).
BOOLEAN ALGEBRA. Boolian algebra appears in the Century Dictionary (1889-1897):
Boolian algebra, a logical algebra, invented by the English mathematician George Boole (1815-64), for the solution of problems in ordinary logic. It has also a connection with the theory of probabilities.
According to E. V. Huntington in "New Sets of Independent Postulates for the Algebra of Logic with Special Reference to Whitehead and Russell's Principia Mathematica," Trans. Amer. Math. Soc. (1933), the term Boolean algebra was introduced by H. M. Sheffer in the paper "A Set of Five Independent Postulates for Boolean Algebras with Application to Logical Constants", Trans. Amer. Math. Soc., 14 (1913).
In an illuminating passage of "Algebraic Logic", Halmos writes (p. 11):
Terminological purists sometimes object to the Boolean use of the word "algebra". The objection is not really cogent. In the first place, the theory of Boolean algebras has not yet collided, and it is not likely to collide, with the theory of linear algebras. In the second place, a collision would not be catastrophic; a Boolean algebra is, after all, a linear algebra over the field of integers modulo 2. (...) While, to be sure, a shorter and more suggestive term than "Boolean algebra" might be desirable, the nomenclature is so thoroughly established that to change now would do more harm than good. [Carlos César de Araújo]
BOOLE’S INEQUALITY. This probability inequality appears in George Boole’s An Investigation into the Laws of Thought, on which are founded the Mathematical Theories of Logic and Probabilities (1854). In the 1930s Bonferroni devised a system of inequalities in which the Boole inequality is the simplest. As it is the most widely used of the Bonferroni inequalities, it is often referred to as "the Bonferroni inequality" even though Bonferroni clearly attributed it to Boole. (Based on "George Boole" in Statisticians of the Centuries (ed. C. C. Heyde and E. Seneta) 2001.)
See BONFERRONI INEQUALITIES.
To BOOT a computer. Although the OED’s earliest quotation is from 1980, the term has been in use since the early 1950s. The term seems to derive from the phrase "to pull oneself up by one's own bootstrap" which had been in circulation since the 19th century. Michael Quinion suggests that the computer people picked up the phrase from Robert Heinlein’s By His Bootstraps, a 1941 short story about time-travel. See the next entry.
BOOTSTRAP in Statistics. The term was introduced by Bradley Efron in "Bootstrap methods: another look at the jackknife," Annals of Statistics, 7, (1979) 1-26. Tukey’s "jackknife" had set a precedent for "colorful" terminology and Efron reported some suggestions for his construct: "Swiss Army Knife, Meat Axe, Swan-Dive, Jack Rabbit and my personal favorite, the Shotgun, which to paraphrase Tukey, 'can blow the head off any problem if the statistician can stand the resulting mess.'" In his book An Introduction to the Bootstrap (with R. J. Tibshirani) (1993) Efron explained that "the use of the term bootstrap derives from the phrase to pull oneself up by one's own bootstrap, widely thought to be based on one of the eighteenth century Adventures of Baron Munchausen, by Rudolph Erich Raspe. (The Baron had fallen to the bottom of a deep lake. Just when it looked like all was lost, he thought to pick himself up by his own bootstraps.)" The words "widely thought" seem to be well chosen, for Michael Quinion argues that, while the phrase, which dates from the 19th century, was probably inspired by Raspe’s story, the exact incident is not in the book! [John Aldrich]
BOREL-CANTELLI LEMMAS. These were given in a simple case by E. Borel in 1909 in “Les probabilités dénombrables et leurs applications arithmétiques,” Rendiconti del Circolo Matematico di Palermo, 27, 247-271 and more generally in 1917 by F. P. Cantelli in “Sulla probabilità come limite di frequenza,” Rendiconti della R. Accademia dei Lincei, vol. XXVI, serie V, gennaio, p. 39-45.
BORROW is found in English in 1594 in Blundevil, Exerc.: "Take 6 out of nothing, which will not bee, wherefore you must borrow 60" (OED2).
In October 1947, "Provision for Individual Differences in High School Mathematics Courses" by William Lee in The Mathematics Teacher has: "The Social Mathematics course stresses understanding of arithmetic: 'carrying' in addition, 'regrouping' (not 'borrowing') in subtraction, 'indenting' in multiplication are analyzed and understood rather than remaining mere rote operations to be performed blindly."
BORSUK-ULAM THEOREM. The story of this result is told by Steinhaus in a note published in 1938. "Several years ago Mr. Ulam conjectured the following theorem: if a sphere is mapped continuously into a plane set, there is at least one pair of antipodal points having the same image, that is, they are mapped into the same point of the plane. This was proved by Mr. Borsuk in 1933 (Drei Sätze über die n-dimensionale euklidische Sphäre Fundamenta Mathematicae, XX, p. 177) extending the theorem to n dimensions."
In the same note Steinhaus gave the following illustration, "at any moment, there are two antipodal points on the Earth’s surface that have the same temperature and the same atmospheric pressure."
See the entry HAM SANDWICH THEOREM for the Steinhaus note.
The term BOX-COX TRANSFORMATION is unusual in being inspired, indirectly, by a comic opera. Box and Cox with a libretto by Francis Cowley Burnand was Arthur Sullivan's first comic opera. It tells of a landlord's scheme to get double rent from a single room: by day he lets it to Mr. Box (a printer who is out all night) and by night to Mr. Cox (a hatter who works all day). The statisticians G. E. P. Box and D. R. Cox were serving on a committee and the other members thought they should write a paper together. "We said, 'Well, obviously the thing to write about is transformations'" recalled Box to M. H. DeGroot in "A Conversation with George Box," Statistical Science, 2, (1987), p. 254. The resulting Box and Cox paper is "An Analysis of Transformations," Journal of the Royal Statistical Society, Series B, 26, (1964), pp. 211-246.
BOX-JENKINS APPROACH, METHODS etc. are terms referring to the form of time series analysis presented in the 1970 book Time Series Analysis: Forecasting and Control, by George Box and Gwilym Jenkins. The book “is concerned with the building of stochastic (statistical) models for discrete time series in the time-domain and the use of such models in important areas of application.” (Preface.)
BOYER'S LAW. See EPONYMY.
The term BRACHISTOCHRONE was introduced by Johann Bernoulli (1667-1748). Smith (vol. 2, page 326) says the term is "due to the Bernoullis."
BRANCHING PROCESS. The term seems to have been introduced by A. N. Kolmogorov and N. A. Dmitriev in 1947 ("Branching Stochastic Processes," Doklady Akademii Nauk, USSR, 56, 5-8). However, there were many investigations of such processes earlier in the century and even in the 19th century. The French mathematician I. J. Bienaymé studied the process in 1845! His "De la Loi de Multiplication et de la Durée des Familles" is reprinted in Kendall (1975).
Bienaymé's contribution was overlooked until recently but another investigation was more visible. Francis Galton was also interested in the extinction of surnames:
In each generation a0 per cent. of the adult males have no male children who reach adult life; a1 have one such male child; a2 have two; and so on up to a5 who have five. Find (1) what proportion of the surnames will have become extinct after r generations; and (2) how many instances there will be of the same surname being held by m persons.
Galton's friend, H. W. Watson, tackled these questions in On the Probability of Extinction of Families (1874). The name "Galton-Watson process" recalls their work.
The process re-appeared in other contexts, e.g. in the genetic work of R. A. Fisher, On the Dominance Ratio (1922), and of J. B. S. Haldane.
[John Aldrich, based on D. G. Kendall "The Genealogy of Genealogy: Branching Processes before (and after) 1873" Bulletin of the London Mathematical Society, 7, (1975), 225-253 and C. C. Heyde & E. Seneta I. J. Bienaymé: Statistical Theory Anticipated, 1977.]
The terms BRA VECTOR and KET VECTOR were introduced by Paul Adrien Maurice Dirac (1902-1984). The terms appear in 1947 in Princ. Quantum Mech. by Dirac: "It is desirable to have a special name for describing the vectors which are connected with the states of a system in quantum mechanics, whether they are in a space of a finite or an infinite number of dimensions. We shall call them ket vectors, or simply kets, and denote a general one of them by a special symbol |>. ... We shall call the new vectors bra vectors, or simply bras, and denote a general one of them by the symbol <|, the mirror image of the symbol for a ket vector" (OED2).
BRIGGSIAN LOGARITHM, referring to the COMMON LOGARITHM (logarithm to base 10), is named after Henry Briggs. The phrase Briggs logarithm is found in the 1771 edition of the Encyclopædia Britannica [James A. Landau]. See the entries COMMON LOGARITHM, LOGARITHM, NAPIERIAN LOGARITHM, NATURAL LOGARITHM.
BROKEN LINE is found in 1852 in a French edition (edited by M. A. Blanchet) of Éléments de géométrie by Adrien Marie Legendre: "Une ligne brisée ou polygonale est une ligne composée de lignes droites." (A broken line or polygonal line is a line composed of straight lines.) The term may appear in the original 1794 edition, which I have not seen.
Broken line is found in 1852 in Elements of geometry and trigonometry, from the works of A. M. Legendre. Revised and adapted to the course of mathematical instruction in the United States, by Charles Davies: "5. A Straight Line is one which lies in the same direction between any two of its points. 6. A Broken Line is one made up of straight lines, not lying in the same direction."
Broken line is found in 1852 in Elements of the differential and integral calculus by Charles Davies: "But the arc POM can never be less than the chord PM, nor greater than the broken line PNM which contains it; hence, the limit of the ratio POM/PM = 1; and consequently, the differential of the arc is equal to the differential of the chord."
Broken line is found in 1852 in Elements of plane trigonometry, with its application to mensuration of heights and distances, surveying and navigation by William Smyth: "Instead of a broken line, a field is sometimes bounded by a line irregularly curved, as by the margin of a brook, river, or lake. In this case (fig. 60) we run, as before, a chain line as near the boundary as possible, and by means of offsets determine a sufficient number of points in the curve to draw it." [These three citations were found using the University of Michigan Historic Math Collection.]
According to Schwartzman (page 38), the "broken line," meaning a curve composed of connected straight line segments, was adopted "around 1898" by David Hilbert (1862-1943).
BROUWER’S FIXED-POINT THEOREM. This appears in L. E. J. Brouwer’s “Ueber eineindeutige, stetige Transformationen von Flächen in sich” Math. Ann., 69 (1910) pp. 176–180. A JSTOR search found a reference to J. W. Alexander’s “Note on Brouwer’s fixed point theorem” of 1924. See the Encyclopedia of Mathematics entry.
BROWNIAN MOTION. In the course of the 20th century the physical phenomenon described by the botanist Robert Brown in 1827 was described in mathematical terms and gradually "Brownian motion" came to refer as much to the mathematical formalism as to the phenomenon. Mathematical theories were developed by, inter alia, A. Einstein ("Zur Theorie der Brownschen Bewegung" (1905)). The "Brownian motion process" of J. L. Doob's Stochastic Processes (1954) is a type of stochastic process divested of physical application. Doob states that the process "was first discussed by Bachelier [Théorie de la Speculation 1900] and later, more rigorously by Wiener ["Differential-space" J. Math. and Phys. 2 (1923) 131-174]. It is sometimes called the Wiener process." An earlier term in physics (and mathematics) was "Brownian movement." This slowly gave way to "Brownian motion," although David (2001) reports an early appearance of "Brownian motion" in 1892 in W. Ramsay's Report of a paper read to the Chemical Society, London. Nature, 45, 429/2. [John Aldrich]
See FOKKER-PLANCK EQUATION and WIENER PROCESS.
BRUN’S CONSTANT (named for Viggo Brun (1882-1978)) was coined by R. P. Brent in "Irregularities in the distribution of primes and twin primes," Math. Comp. 29 (1975), according to Algorithmic Number Theory by Bach and Shallit [Paul Pollack].
BUILDING. See the entry APARTMENT, BUILDING and CHAMBER.
BURALI-FORTI PARADOX is now famous as the earliest paradox of set theory. It refers to a result in Cesare Burali-Forti's paper, "Una questione sui numeri transfiniti" Rendiconti del Circolo Matematico di Palermo, 11, (1897), 154-164 (translated in Heijenoort (1967)). Heijenoort comments that "Burali-Forti himself considered the contradiction as establishing, by reductio ad absurdum, the result that the natural ordering of ordinals is just a partial ordering." Bertrand Russell called the result "le paradoxe du Burali-Forti" in his Les Paradoxes de la Logique, Revue de métaphysique et de morale (1906) p. 638.
BURNSIDE PROBLEM. In 1902 William Burnside wrote, "A still undecided point in the theory of discontinuous groups is whether the order of a group may be not finite, while the order of every operation it contains is finite." "On an Unsettled Question in the Theory of Discontinuous Groups," Quart. J. Pure Appl. Math. 33, 230-238, 1902. For the history of the problem see St. Andrews and Mathworld.
The term BYTE was coined in 1956 by Dr. Werner Buchholz of IBM. A question-and-answer session at an ACM conference on the history of programming languages included this exchange:
JOHN GOODENOUGH: You mentioned that the term "byte" is used in JOVIAL. Where did the term come from?
JULES SCHWARTZ (inventor of JOVIAL): As I recall, the AN/FSQ-31, a totally different computer than the 709, was byte oriented. I don't recall for sure, but I'm reasonably certain the description of that computer included the word "byte," and we used it.
FRED BROOKS: May I speak to that? Werner Buchholz coined the word as part of the definition of STRETCH, and the AN/FSQ-31 picked it up from STRETCH, but Werner is very definitely the author of that word.
SCHWARTZ: That's right. Thank you.