Metamath Proof Explorer

Theorem conventions

Description:

Here are some of the conventions we use in the Metamath Proof Explorer (MPE, set.mm), and how they correspond to typical textbook language (skipping the many cases where they are identical). For more specific conventions, see:

conventions-labels : conventions related to labels
conventions-comments : conventions related to comments

Notation. Where possible, the notation attempts to conform to modern conventions, with variations due to our choice of the axiom system or to make proofs shorter. However, our notation is strictly sequential (left-to-right). For example, summation is written in the form sum_ k e. A B ( df-sum ) which denotes that index variable k ranges over A when evaluating B . Thus, sum_ k e. NN ( 1 / ( 2 ^ k ) ) = 1 means 1/2 + 1/4 + 1/8 + ... = 1 ( geoihalfsum ). The notation is usually explained in more detail when first introduced.

Axiomatic assertions ($a). All axiomatic assertions ($a statements) starting with " |- " have labels starting with "ax-" (axioms) or "df-" (definitions). A statement with a label starting with "ax-" corresponds to what is traditionally called an axiom. A statement with a label starting with "df-" introduces new symbols or a new relationship among symbols that can be eliminated; they always extend the definition of a wff or class. Metamath blindly treats $a statements as new given facts but does not try to justify them. The mmj2 program will justify the definitions as sound as discussed below, except for four of them ( df-bi , df-clab , df-cleq , df-clel ) that require a more complex metalogical justification by hand.

Proven axioms. In some cases we wish to treat an expression as an axiom in later theorems, even though it can be proved. For example, we derive the postulates or axioms of complex arithmetic as theorems of ZFC set theory. For convenience, after deriving the postulates, we reintroduce them as new axioms on top of set theory. This lets us easily identify which axioms are needed for a particular complex number proof, without the obfuscation of the set theory used to derive them. For more, see mmcomplex.html . When we wish to use a previously-proven assertion as an axiom, our convention is that we use the regular "ax-NAME" label naming convention to define the axiom, but we precede it with a proof of the same statement with the label "axNAME" . An example is the complex arithmetic axiom ax-1cn , proven by the preceding theorem ax1cn . The Metamath program will warn if an axiom does not match the preceding theorem that justifies it if the names match in this way.

Definitions (df-...). We encourage definitions to include hypertext links to proven examples.

Statements with hypotheses. Many theorems and some axioms, such as ax-mp , have hypotheses that must be satisfied in order for the conclusion to hold, in this case min and maj. When displayed in summarized form such as in the "Theorem List" page (to get to it, click on "Nearby theorems" on the ax-mp page), the hypotheses are connected with an ampersand and separated from the conclusion with a double right arrow, such as in " |- ph & |- ( ph -> ps ) => |- ps ". These symbols are not part of the Metamath language but are just informal notation meaning "and" and "implies".

Discouraged use and modification. If something should only be used in limited ways, it is marked with "(New usage is discouraged.)". This is used, for example, when something can be constructed in more than one way, and we do not want later theorems to depend on that specific construction. This marking is also used if we want later proofs to use proven axioms. For example, we want later proofs to use ax-1cn (not ax1cn ) and ax-1ne0 (not ax1ne0 ), as these are proven axioms for complex arithmetic. Thus, both ax1cn and ax1ne0 are marked as "(New usage is discouraged.)". In some cases a proof should not normally be changed, e.g., when it demonstrates some specific technique. These are marked with "(Proof modification is discouraged.)".

New definitions infrequent. Typically, we are minimalist when introducing new definitions; they are introduced only when a clear advantage becomes apparent for reducing the number of symbols, shortening proofs, etc. We generally avoid the introduction of gratuitous definitions because each one requires associated theorems and additional elimination steps in proofs. For example, we use < and <_ for inequality expressions, and use ( ( sin(i x. A ) ) / i ) instead of ( sinhA ) for the hyperbolic sine.

Minimizing axiom dependencies. We prefer proofs that depend on fewer and/or weaker axioms, even if the proofs are longer. In particular, because of the non-constructive nature of the axiom of choice df-ac , we prefer proofs that do not use it, or use weaker versions like countable choice ax-cc or dependent choice ax-dc . An example is our proof of the Schroeder-Bernstein Theorem sbth , which does not use the axiom of choice. Similarly, any theorem in first-order logic (FOL) that contains only setvar variables that are all mutually distinct, and has no wff variables, can be proved without using ax-10 through ax-13 , by using ax10w through ax13w instead.

We do not try to similarly reduce dependencies on definitions, since definitions are conservative (they do not increase the proving power of a deductive system), and are introduced in order to be used to increase readability). An exception is made for Definitions df-clab , df-cleq , and df-clel , since they can be considered as axioms under some definitions of what a definition is exactly (see their comments).

Alternate proofs (ALT). If a different proof is shorter or clearer but uses more or stronger axioms, we make that proof an "alternate" proof (marked with an ALT label suffix), even if this alternate proof was formalized first. We then make the proof that requires fewer axioms the main proof. Alternate proofs can also occur in other cases when an alternate proof gives some particular insight. Their comment should begin with "Alternate proof of ~xxx " followed by a description of the specificity of that alternate proof. There can be multiple alternates. Alternate (*ALT) theorems should have "(Proof modification is discouraged.) (New usage is discouraged.)" in their comment and should follow the main statement, so that people reading the text in order will see the main statement first. The alternate and main statement comments should use hyperlinks to refer to each other.

Alternate versions (ALTV). The suffix ALTV is reserved for theorems (or definitions) which are alternate versions, or variants, of an existing theorem. This is reserved to statements in mathboxes and is typically used temporarily, when it is not clear yet which variant to use. If it is decided that both variants should be kept and moved to the main part of set.mm, then a label for the variant should be found with a more explicit suffix indicating how it is a variant (e.g., commutation of some subformula, antecedent replaced with hypothesis, (un)curried variant, biconditional instead of implication, etc.). There is no requirement to add discouragement tags, but their comment should have a link to the main version of the statement and describe how it is a variant of it.

Old (OLD) versions or proofs. If a proof, definition, axiom, or theorem is going to be removed, we often stage that change by first renaming its label with an OLD suffix (to make it clear that it is going to be removed). Old (*OLD) statements should have "(Proof modification is discouraged.) (New usage is discouraged.)" and "Obsolete version of ~xxx as of dd-Mmm-yyyy." (not enclosed in parentheses) in the comment. An old statement should follow the main statement, so that people reading the text in order will see the main statement first. This typically happens when a shorter proof to an existing theorem is found: the existing theorem is kept as an *OLD statement for one year. When a proof is shortened automatically (using the Metamath program "MM-PA> MINIMIZE__WITH *" command), then it is not necessary to keep the old proof, nor to add credit for the shortening.

Variables. Propositional variables (variables for well-formed formulas or wffs) are represented with lowercase Greek letters and are generally used in this order: ph = phi, ps = psi, ch = chi, th = theta, ta = tau, et = eta, ze = zeta, and si = sigma. Individual setvar variables are represented with lowercase Latin letters and are generally used in this order: x , y , z , w , v , u , and t . In addition, the surreal number section uses subscripted lowercase Latin letters such as xO , xL , and xR . These match the conventional literature on surreal numbers. These variables should not be used outside of that section. Variables that represent classes are often represented by uppercase Latin letters: A , B , C , D , E , and so on. There are other symbols that also represent class variables and suggest specific purposes, e.g., .0. for a zero element (e.g., fsuppcor ) and connective symbols such as .+ for some group addition operation (e.g., grprinvd ). Class variables are selected in alphabetical order starting from A if there is no reason to do otherwise, but many assertions select different class variables or a different order to make their intended meaning clearer.

Turnstile. "|- ", meaning "It is provable that", is the first token of all assertions and hypotheses that aren't syntax constructions. This is a standard convention in logic. For us, it also prevents any ambiguity with statements that are syntax constructions, such as "wff -. ph ".

Biconditional ( <-> ). There are basically two ways to maximize the effectiveness of biconditionals ( <-> ): you can either have one-directional simplifications of all theorems that produce biconditionals, or you can have one-directional simplifications of theorems that consume biconditionals. Some tools (like Lean) follow the first approach, but set.mm follows the second approach. Practically, this means that in set.mm, for every theorem that uses an implication in the hypothesis, like ax-mp , there is a corresponding version with a biconditional or a reversed biconditional, like mpbi or mpbir . We prefer this second approach because the number of duplications in the second approach is bounded by the size of the propositional calculus section, which is much smaller than the number of possible theorems in all later sections that produce biconditionals. So although theorems like biimpi are available, in most cases there is already a theorem that combines it with your theorem of choice, like mpbir2an , sylbir , or 3imtr4i .

Quantifiers. The quantifiers are named as follows:
- A. : universal quantifier ( wal );
- E. : existential quantifier ( df-ex );
- E* : at-most-one quantifier ( df-mo );
- E! : unique existential quantifier ( df-eu ).
The phrase "uniqueness quantifier" is avoided since it is ambiguous: it can be understood as claiming either uniqueness ( E* ) or unique existence ( E! ).

Substitution. The expression "[ y / x ] ph " should be read "the formula that results from the proper substitution of y for x in the formula ph ". See df-sb and the related df-sbc and df-csb .

Is-a-set. " A e.V " should be read "Class A is a set (i.e., exists)." This is a convention based on Definition 2.9 of Quine p. 19. See df-v and isset . However, instead of using I e. V in the antecedent of a theorem for some variable I , we now prefer to use I e. V (or another variable if V is not available) to make it more general. That way we can often avoid extra uses of elex and syl in the common case where I is already a member of something. For hypotheses ($e statement) of theorems (mostly in inference form), however, |- A e.V is used rather than |- A e. V (e.g., difexi ). This is because A e. V is almost always satisfied using an existence theorem stating " ... e.V ", and a hard-coded V in the $e statement saves a couple of syntax building steps that substitute _V into V. Notice that this does not hold for hypotheses of theorems in deduction form: Here still |- ( ph -> A e. V ) should be used rather than |- ( ph -> A e.V ).

Converse. The symbol " ` ``' " denotes the converse of a relation, so " ``' R " denotes the converse of the class R , which is typically a relation in that context (see df-cnv ). The converse of a relation R is sometimes denoted by R^-1 in textbooks, especially when R is a function, but we avoid this notation since it is generally not a genuine inverse (see f1cocnv1 and funcocnv2 for cases where it is a left or right-inverse). This can be used to define a subset, e.g., df-tan notates "the set of values whose cosine is a nonzero complex number" as ( ``' cos " ( CC \ { 0 } ) ) .

Function application. The symbols "( F `x )" should be read "the value of (function) F at x " and has the same meaning as the more familiar but ambiguous notation F(x). For example, ( cos0 ) = 1 (see cos0 ). The left apostrophe notation originated with Peano and was adopted in Definition *30.01 of WhiteheadRussell p. 235, Definition 10.11 of Quine p. 68, and Definition 6.11 of TakeutiZaring p. 26. See df-fv . In the ASCII (input) representation there are spaces around the grave accent; there is a single accent when it is used directly, and it is doubled within comments.

Infix and parentheses. When a function that takes two classes and produces a class is applied as part of an infix expression, the expression is always surrounded by parentheses (see df-ov ). For example, the + in ( 2 + 2 ) ; see 2p2e4 . Function application is itself an example of this. Similarly, predicate expressions in infix form that take two or three wffs and produce a wff are also always surrounded by parentheses, such as ( ph -> ps ) , ( ph \/ ps ) , ( ph /\ ps ) , and ( ph <-> ps ) (see wi , df-or , df-an , and df-bi respectively). In contrast, a binary relation (which compares two classes_ and produces awff) applied in an infix expression isnot surrounded by parentheses. This includes set membership A e. B (see wel ), equality A = B (see df-cleq ), subset A C_ B (see df-ss ), and less-than A < B (see df-lt ). For the general definition of a binary relation in the form A R B , see df-br . For example, 0 < 1 (see 0lt1 ) does not use parentheses.

Unary minus. The symbol -u is used to indicate a unary minus, e.g., -u 1 . It is specially defined because it is so commonly used. See cneg .

Function definition. Functions are typically defined by first defining the constant symbol (using $c) and declaring that its symbol is a class with the label cNAME (e.g., ccos ). The function is then defined labeled df-NAME; definitions are typically given using the maps-to notation (e.g., df-cos ). Typically, there are other proofs such as its closure labeled NAMEcl (e.g., coscl ), its function application form labeled NAMEval (e.g., cosval ), and at least one simple value (e.g., cos0 ). Another way to define functions is to use recursion (for more details about recursion see below). For an example of how to define functions that aren't primitive recursive using recursion, see the Ackermann function definition df-ack (which is based on the sequence builder seq, see df-seq ).

Factorial. The factorial function is traditionally a postfix operation, but we treat it as a normal function applied in prefix form, e.g., ( !4 ) = ; 2 4 ( df-fac and fac4 ).

Unambiguous symbols. A given symbol has a single unambiguous meaning in general. Thus, where the literature might use the same symbol with different meanings, here we use different (variant) symbols for different meanings. These variant symbols often have suffixes, subscripts, or underlines to distinguish them. For example, here "0 " always means the value zero ( df-0 ), while "0g " is the group identity element ( df-0g ), "0. " is the poset zero ( df-p0 ), "0p " is the zero polynomial ( df-0p ), "0vec " is the zero vector in a normed subcomplex vector space ( df-0v ), and ".0. " is a class variable for use as a connective symbol (this is used, for example, in p0val ). There are other class variables used as connective symbols where traditional notation would use ambiguous symbols, including ".1. ", ".+ ", ".* ", and ".|| ". These symbols are very similar to traditional notation, but because they are different symbols they eliminate ambiguity.

ASCII representation of symbols. We must have an ASCII representation for each symbol. We generally choose short sequences, ideally digraphs, and generally choose sequences that vaguely resemble the mathematical symbol. Here are some of the conventions we use when selecting an ASCII representation.

We generally do not include parentheses inside a symbol because that confuses text editors (such as emacs). Greek letters for wff variables always use the first two letters of their English names, making them easy to type and easy to remember. Symbols that almost look like letters, such as A. , are often represented by that letter followed by a period. For example, "A." is used to represent A. , "e." is used to represent e. , and "E." is used to represent E. . Single letters are now always variable names, so constants that are often shown as single letters are now typically preceded with "_" in their ASCII representation, for example, "_i" is the ASCII representation for the imaginary unit _i . A script font constant is often the letter preceded by "~" meaning "curly", such as "~P" to represent the power class ~P .

Originally, all setvar and class variables used only single letters a-z and A-Z, respectively. A big change in recent years was to allow the use of certain symbols as variable names to make formulas more readable, such as a variable representing an additive group operation. The convention is to take the original constant token (in this case "+" which means complex number addition) and put a period in front of it to result in the ASCII representation of the variable ".+", shown as .+ , that can be used instead of say the letter "P" that had to be used before.

Choosing tokens for more advanced concepts that have no standard symbols but are represented by words in books, is hard. A few are reasonably obvious, like "Grp" for group and "Top" for topology, but often they seem to end up being either too long or too cryptic. It would be nice if the math community came up with standardized short abbreviations for English math terminology, like they have more or less done with symbols, but that probably won't happen any time soon.

Another informal convention that we have somewhat followed, that is also not uncommon in the literature, is to start tokens with a capital letter for collection-like objects and lower case for function-like objects. For example, we have the collections On (ordinal numbers), Fin, Prime, Grp, and we have the functions sin, tan, log, sup. Predicates like Ord and Lim also tend to start with upper case, but in a sense they are really collection-like, e.g., Lim indirectly represents the collection of limit ordinals, but it cannot be an actual class since not all limit ordinals are sets. This initial upper versus lower case letter convention is sometimes ambiguous. In the past there's been a debate about whether domain and range are collection-like or function-like, thus whether we should use Dom, Ran or dom, ran. Both are used in the literature. In the end dom, ran won out for aesthetic reasons (Norm Megill simply just felt they looked nicer).

Typography conventions. Class symbols for functions (e.g., abs , sin ) should usually not have leading or trailing blanks in their HTML representation. This is in contrast to class symbols for operations (e.g., gcd , sadd , eval ), which usually do include leading and trailing blanks in their representation. If a class symbol is used for a function as well as an operation (according to Definition df-ov , each operation value can be written as function value of an ordered pair), the convention for its primary usage should be used, e.g., ( iEdgG ) versus ( V iEdg E ) for the edges of a graph G = <. V , E >. .

LaTeX definitions. Each token has a "LaTeX definition" which is used by the Metamath program to output tex files. When writing LaTeX definitions, contributors should favor simplicity over perfection of the display, and should only use core LaTeX symbols or symbols from standard packages; if packages other than amssymb, amsmath, mathtools, mathrsfs, phonetic, graphicx are needed, this should be discussed. A useful resource is The Comprehensive LaTeX Symbol List.

Number construction independence. There are many ways to model complex numbers. After deriving the complex number postulates we reintroduce them as new axioms on top of set theory. This lets us easily identify which axioms are needed for a particular complex number proof, without the obfuscation of the set theory used to derive them. This also lets us be independent of the specific construction, which we believe is valuable. See mmcomplex.html for details. Thus, for example, we don't allow the use of (/) e/ CC , as handy as that would be, because that would be construction-specific. We want proofs about CC to be independent of whether or not (/) e. CC .

Minimize hypotheses. In most cases we try to minimize hypotheses, so that the statement be more general and easier to use. There are exceptions. For example, we intentionally add hypotheses if they help make proofs independent of a particular construction (e.g., the contruction of the complex numbers CC ). We also intentionally add hypotheses for many real and complex number theorems to expressly state their domains even when they are not needed. For example, we could show that |- ( A < B -> B =/= A ) without any hypotheses, but we require that theorems using this result prove that A and B are real numbers, so that the statement we use is ltnei . Here are the reasons as discussed in https://groups.google.com/g/metamath/c/2AW7T3d2YiQ :
1. Having the hypotheses immediately shows the intended domain of applicability (is it RR , RR* , _om , or something else?), without having to trace back to definitions.
2. Having the hypotheses forces the intended use of the statement, which generally is desirable.
3. Many out-of-domain values are dependent on contingent details of definitions, so hypothesis-free theorems would be non-portable and "brittle".
4. Only a few theorems can have their hypotheses removed in this fashion, due to coincidences for our particular set-theoretical definitions. The poor user (especially a novice learning, e.g., real number arithmetic) is going to be confused not knowing when hypotheses are needed and when they are not. For someone who has not traced back the set-theoretical foundations of the definitions, it is seemingly random and is not intuitive at all.
5. Ultimately, this is a matter of consensus, and the consensus in the group was in favor of keeping sometimes redundant hypotheses.

Natural numbers. There are different definitions of "natural" numbers in the literature. We use NN ( df-nn ) for the set of positive integers starting from 1, and NN0 ( df-n0 ) for the set of nonnegative integers starting at zero.

Decimal numbers. Numbers larger than nine are often expressed in base 10 using the decimal constructor df-dec , e.g., ; ; ; 4 0 0 1 (see 4001prm for a proof that 4001 is prime).

Theorem forms. We will use the following descriptive terms to categorize theorems:
- A theorem is in "closed form" if it has no $e hypotheses (e.g., unss ). The term "tautology" is also used, especially in propositional calculus. This form was formerly called "theorem form" or "closed theorem form".
- A theorem is in "deduction form" (or is a "deduction") if it has zero or more $e hypotheses, and the hypotheses and the conclusion are implications that share the same antecedent. More precisely, the conclusion is an implication with a wff variable as the antecedent (usually ph), and every hypothesis ($e statement) is either:
  1. an implication with the same antecedent as the conclusion, or
  2. a definition. A definition can be for a class variable (this is a class variable followed by =, e.g., the definition of D in lhop ) or a wff variable (this is a wff variable followed by <->); class variable definitions are more common.
  In practice, a proof of a theorem in deduction form will also contain many steps that are implications where the antecedent is either that wff variable (usually ph) or is a conjunction ( ph i^i ... ) including that wff variable (ph). E.g., a1d , unssd . Although they are no real deductions, theorems without $e hypotheses, but in the form ( ph -> ... ), are also said to be in "deduction form". Such theorems usually have a two step proof, applying a1i to a given theorem, and are used as convenience theorems to shorten many proofs. E.g., eqidd , which is used more than 1500 times.
- A theorem is in "inference form" (or is an "inference") if it has one or more $e hypotheses, but is not in deduction form, i.e., there is no common antecedent (e.g., unssi ).
Any theorem whose conclusion is an implication has an associated inference, whose hypotheses are the hypotheses of that theorem together with the antecedent of its conclusion, and whose conclusion is the consequent of that conclusion. When both theorems are in set.mm, then the associated inference is often labeled by adding the suffix "i" to the label of the original theorem (for instance, con3i is the inference associated with con3 ). The inference associated with a theorem is easily derivable from that theorem by a simple use of ax-mp . The other direction is the subject of the Deduction Theorem discussed below. We may also use the term "associated inference" when the above process is iterated. For instance, syl is an inference associated with imim1 because it is the inference associated with imim1i which is itself the inference associated with imim1 .

"Deduction form" is the preferred form for theorems because this form allows to easily use the theorem in places where (in traditional textbook formalizations) the standard Deduction Theorem (see below) would be used. We call this approach "deduction style". In contrast, we usually avoid theorems in "inference form" when that would end up requiring us to use the deduction theorem.

Deductions have a label suffix of "d", especially if there are other forms of the same theorem (e.g., pm2.43d ). The labels for inferences usually have the suffix "i" (e.g., pm2.43i ). The labels of theorems in "closed form" would have no special suffix (e.g., pm2.43 ) or, if the non-suffixed label is already used, then we add the suffix "t" (for "theorem" or "tautology", e.g., ancomst or nfimt ). When an inference with an "is a set" hypothesis (e.g., A e.V) is converted to a theorem (in closed form) by replacing the hypothesis with an antecedent of the form ( A e. V ->, we sometimes suffix the closed form with "g" (for "more general") as in uniex versus uniexg . In this case, the inference often has no suffix "i".

When submitting a new theorem, a revision of a theorem, or an upgrade of a theorem from a Mathbox to the Main database, please use the general form to be the default form of the theorem, without the suffix "g" . For example, "brresg" lost its suffix "g" when it was revised for some other reason, and now it is brres . Its inference form which was the original "brres", now is brresi . The same holds for the suffix "t".

Deduction theorem. The Deduction Theorem is a metalogical theorem that provides an algorithm for constructing a proof of a theorem from the proof of its corresponding deduction (its associated inference). See for instance Theorem 3 in Margaris p. 56. In ordinary mathematics, no one actually carries out the algorithm, because (in its most basic form) it involves an exponential explosion of the number of proof steps as more hypotheses are eliminated. Instead, in ordinary mathematics the Deduction Theorem is invoked simply to claim that something can be done in principle, without actually doing it. For more details, see mmdeduction.html . The Deduction Theorem is a metalogical theorem that cannot be applied directly in Metamath, and the explosion of steps would be a problem anyway, so alternatives are used. One alternative we use sometimes is the "weak deduction theorem" dedth , which works in certain cases in set theory. We also sometimes use dedhb . However, the primary mechanism we use today for emulating the deduction theorem is to write proofs in deduction form (aka "deduction style") as described earlier; the prefixed ph -> mimics the context in a deduction proof system. In practice this mechanism works very well. This approach is described in the deduction form and natural deduction page mmnatded.html ; a list of translations for common natural deduction rules is given in natded .

Recursion. We define recursive functions using various "recursion constructors". These allow to define, with compact direct definitions, functions that are usually defined in textbooks with indirect self-referencing recursive definitions. This produces compact definition and much simpler proofs, and greatly reduces the risk of creating unsound definitions. Examples of recursion constructors include recs ( F ) in df-recs , rec ( F , I ) in df-rdg , seqom ( F , I ) in df-seqom , and seq M ( .+ , F ) in df-seq . These have characteristic function F and initial value I . ( gsum in df-gsum isn't really designed for arbitrary recursion, but you could do it with the right magma.) The logically primary one is df-recs , but for the "average user" the most useful one is probably df-seq - provided that a countable sequence is sufficient for the recursion.

Extensible structures. Mathematics includes many structures such as ring, group, poset, etc. We define an "extensible structure" which is then used to define group, ring, poset, etc. This allows theorems from more general structures (groups) to be reused for more specialized structures (rings) without having to reprove them. See df-struct .

Undefined results and "junk theorems". Some expressions are only expected to be meaningful in certain contexts. For example, consider Russell's definition description binder iota, where ( iota x ph ) is meant to be "the x such that ph " (where ph typically depends on x). What should that expression produce when there is no such x ? In set.mm we primarily use one of two approaches. One approach is to make the expression evaluate to the empty set whenever the expression is being used outside of its expected context. While not perfect, it makes it a bit more clear when something is undefined, and it has the advantage that it makes more things equal outside their domain which can remove hypotheses when you feel like exploiting these so-called junk theorems. Note that Quine does this with iota (his definition of iota evaluates to the empty set when there is no unique value of x ). Quine has no problem with that and we don't see why we should, so we define iota exactly the same way that Quine does. The main place where you see this being systematically exploited is in "reverse closure" theorems like A e. ( FB ) -> B e. dom F , which is useful when F is a family of sets. (by this we mean it's a set set even in a type theoretic interpretation.)

The second approach uses "(New usage is discouraged.)" to prevent unintentional uses of certain properties. For example, you could define some construct df-NAME whose usage is discouraged, and prove only the specific properties you wish to use (and add those proofs to the list of permitted uses of "discouraged" information). From then on, you can only use those specific properties without a warning. Other approaches often have hidden problems. For example, you could try to "not define undefined terms" by creating definitions like ${ $d y x $. $d y ph $. df-iota $a |- ( E! x ph -> ( iota x ph ) = U. { x | ph } ) $. $}. This will be rejected by the definition checker, but the bigger theoretical reason to reject this axiom is that it breaks equality - the metatheorem ( x = y -> P(x) = P(y) ) fails to hold if definitions don't unfold without some assumptions. (That is, iotabidv is no longer provable and must be added as an axiom.) It is important for every syntax constructor to satisfy equality theorems *unconditionally*, e.g., expressions like ( 1 / 0 ) = ( 1 / 0 ) should not be rejected. This is forced on us by the context free term language, and anything else requires a lot more infrastructure (e.g., a type checker) to support without making everything else more painful to use.

Another approach would be to try to make nonsensical statements syntactically invalid, but that can create its own complexities; in some cases that would make parsing itself undecidable. In practice this does not seem to be a serious issue. No one does these things deliberately in "real" situations, and some knowledgeable people (such as Mario Carneiro) have never seen this happen accidentally. Norman Megill doesn't agree that these "junk" consequences are necessarily bad anyway, and they can significantly shorten proofs in some cases. This database would be much larger if, for example, we had to condition fvex on the argument being in the domain of the function. It is impossible to derive a contradiction from sound definitions (i.e. that pass the definition check), assuming ZFC is consistent, and he doesn't see the point of all the extra busy work and huge increase in set.mm size that would result from restricting *all* definitions. So instead of implementing a complex system to counter a problem that does not appear to occur in practice, we use a significantly simpler set of approaches.

Organizing proofs. Humans have trouble understanding long proofs. It is often preferable to break longer proofs into smaller parts (just as with traditional proofs). In Metamath this is done by creating separate proofs of the separate parts.

A proof with the sole purpose of supporting a final proof is a lemma; the naming convention for a lemma is the final proof label followed by "lem", and a number if there is more than one. E.g., sbthlem1 is the first lemma for sbth . The comment should begin with "Lemma for", followed by the final proof label, so that it can be suppressed in theorem lists (see the Metamath program "MM> WRITE THEOREMLIST" command).

Also, consider proving reusable results separately, so that others will be able to easily reuse that part of your work.

Limit proof size. It is often preferable to break longer proofs into smaller parts, just as you would do with traditional proofs. One reason is that humans have trouble understanding long proofs. Another reason is that it's generally best to prove reusable results separately, so that others will be able to easily reuse them. Finally, the Metamath program "MM-PA> MINIMIZE__WITH *" command can take much longer with very long proofs. We encourage proofs to be no more than 200 essential steps, and generally no more than 500 essential steps, though these are simply guidelines and not hard-and-fast rules. Much smaller proofs are fine! We also acknowledge that some proofs, especially autogenerated ones, should sometimes not be broken up (e.g., because breaking them up might be useless and inefficient due to many interconnections and reused terms within the proof). In Metamath, breaking up longer proofs is done by creating multiple separate proofs of separate parts. A proof with the sole purpose of supporting a final proof is a lemma; the naming convention for a lemma is the final proof's name followed by "lem", and a number if there is more than one. E.g., sbthlem1 is the first lemma for sbth .

Proof stubs. It's sometimes useful to record partial proof results, e.g., incomplete proofs or proofs that depend on something else not fully proven. Some systems (like Lean) support a "sorry" axiom, which lets you assert anything is true, but this can quickly run into trouble, because the Metamath tooling is smart and may end up using it to prove everything. If you want to create a proof based on some other claim, without proving that claim, you can choose to define the claim as an axiom. If you temporarily define a claim as an axiom, we encourage you to include "Temporarily provided as axiom" in its comment. Such incomplete work will generally only be accepted in a mathbox until the rest of the work is complete. When you're working on your personal copy of the database you can use "?" in proofs to indicate an unknown step. However, since proofs with "?" will (obviously) fail verification, we don't accept proofs with unknown steps in the public database.

Hypertext links. We strongly encourage comments to have many links to related material, with accompanying text that explains the relationship. These can help readers understand the context. Links to other statements, or to HTTP/HTTPS URLs, can be inserted in ASCII source text by prepending a space-separated tilde (e.g., " ~df-prm " results in " df-prm "). When the Metamath program is used to generate HTML, it automatically inserts hypertext links for syntax used (e.g., every symbol used), every axiom and definition depended on, the justification for each step in a proof, and to both the next and previous assertions.

Hypertext links to section headers. Some section headers have text under them that describes or explains the section. However, they are not part of the description of axioms or theorems, and there is no way to link to them directly. To provide for this, section headers with accompanying text (indicated with "*" prefixed to mmtheorems.html#mmdtoc entries) have an anchor in mmtheorems.html whose name is the first $a or $p statement that follows the header. For example there is a glossary under the section heading called GRAPH THEORY. The first $a or $p statement that follows is cedgf . To reference it we link to the anchor using a space-separated tilde followed by the space-separated link mmtheorems.html#cedgf, which will become the hyperlink mmtheorems.html#cedgf . Note that no theorem in set.mm is allowed to begin with "mm" (this is enforced by the Metamath program "MM> VERIFY MARKUP" command). Whenever the program sees a tilde reference beginning with "http:", "https:", or "mm", the reference is assumed to be a link to something other than a statement label, and the tilde reference is used as is. This can also be useful for relative links to other pages such as mmcomplex.html .

Bibliography references. Please include a bibliographic reference to any external material used. A name in square brackets in a comment indicates a bibliographic reference. The full reference must be of the form KEYWORD IDENTIFIER? NOISEWORD(S)* [AUTHOR(S)] p. NUMBER - note that this is a very specific form that requires a page number. There should be no comma between the author reference and the "p." (a constant indicator). Whitespace, comma, period, or semicolon should follow NUMBER. An example is Theorem 3.1 of Monk1 p. 22, The KEYWORD, which is not case-sensitive, must be one of the following: Axiom, Chapter, Compare, Condition, Corollary, Definition, Equation, Example, Exercise, Figure, Item, Lemma, Lemmas, Line, Lines, Notation, Part, Postulate, Problem, Property, Proposition, Remark, Rule, Scheme, Section, or Theorem. The IDENTIFIER is optional, as in for example "Remark in Monk1 p. 22". The NOISEWORDS(S) are zero or more from the list: from, in, of, on. The AUTHOR(S) must be present in the file identified with the htmlbibliography assignment (e.g., mmset.html) as a named anchor (NAME=). If there is more than one document by the same author(s), add a numeric suffix (as shown here). The NUMBER is a page number, and may be any alphanumeric string such as an integer or Roman numeral. Note that werequire page numbers in comments for individual $a or $p statements. We allow names in square brackets without page numbers (a reference to an entire document) in heading comments. If this is a new reference, please also add it to the "Bibliography" section of mmset.html. (The file mmbiblio.html is automatically rebuilt, e.g., using the Metamath program "MM> WRITE BIBLIOGRAPHY" command.)

Acceptable shorter proofs. Shorter proofs are welcome, and any shorter proof we accept will be acknowledged in the theorem description. However, in some cases a proof may be "shorter" or not depending on how it is formatted. This section provides general guidelines.

Usually we automatically accept shorter proofs that (1) shorten the set.mm file (with compressed proofs), (2) reduce the size of the HTML file generated with SHOW STATEMENT xx / HTML, (3) use only existing, unmodified theorems in the database (the order of theorems may be changed, though), and (4) use no additional axioms. Usually we will also automatically accept anew theorem that is used to shorten multiple proofs, if the total size of set.mm (including the comment of the new theorem, not including the acknowledgment) decreases as a result.

In borderline cases, we typically place more importance on the number of compressed proof steps and less on the length of the label section (since the names are in principle arbitrary). If two proofs have the same number of compressed proof steps, we will typically give preference to the one with the smaller number of different labels, or if these numbers are the same, the proof with the fewest number of characters that the proofs happen to have by chance when label lengths are included.

A few theorems have a longer proof than necessary in order to avoid the use of certain axioms, for pedagogical purposes, and for other reasons. These theorems will (or should) have a "(Proof modification is discouraged.)" tag in their description. For example, idALT shows a proof directly from axioms. Shorter proofs for such cases won't be accepted, of course, unless the criteria described continues to be satisfied.
Information on syntax, axioms, and definitions. For a hyperlinked list of syntax, axioms, and definitions, see mmdefinitions.html . If you have questions about a specific symbol or axiom, it is best to go directly to its definition to learn more about it. The generated HTML for each theorem and axiom includes hypertext links to each symbol's definition.

Reserved symbols: 'LETTER. Some symbols are reserved for potential future use. Symbols with the pattern 'LETTER are reserved for possibly representing characters (this is somewhat similar to Lisp). We would expect '\n to represent newline, 'sp for space, and perhaps '\x24 for the dollar character.

The challenge of varying mathematical conventions

We try to follow mathematical conventions, but in many cases different texts use different conventions. In those cases we pick some reasonably common convention and stick to it. We have already mentioned that the term "natural number" has varying definitions (some start from 0, others start from 1), but that is not the only such case.

A useful example is the set of metavariables used to represent arbitrary well-formed formulas (wffs). We use an open phi, φ, to represent the first arbitrary wff in an assertion with one or more wffs; this is a common convention and this symbol is easily distinguished from the empty set symbol. That said, it is impossible to please everyone or simply "follow the literature" because there are many different conventions for a variable that represents any arbitrary wff. To demonstrate the point, here are some conventions for variables that represent an arbitrary wff and some texts that use each convention:

open phi φ (and so on): Tarski's papers, Rasiowa & Sikorski's The Mathematics of Metamathematics (1963), Monk's Introduction to Set Theory (1969), Enderton's Elements of Set Theory (1977), Bell & Machover's A Course in Mathematical Logic (1977), Jech's Set Theory (1978), Takeuti & Zaring's Introduction to Axiomatic Set Theory (1982).
closed phi ϕ (and so on): Levy's Basic Set Theory (1979), Kunen's Set Theory (1980), Paulson's Isabelle: A Generic Theorem Prover (1994), Huth and Ryan's Logic in Computer Science (2004/2006).
Greek α, β, γ: Duffy's Principles of Automated Theorem Proving (1991).
Roman A, B, C: Kleene's Introduction to Metamathematics (1974), Smullyan's First-Order Logic (1968/1995).
script A, B, C: Hamilton's Logic for Mathematicians (1988).
italic A, B, C: Mendelson's Introduction to Mathematical Logic (1997).
italic P, Q, R: Suppes's Axiomatic Set Theory (1972), Gries and Schneider's A Logical Approach to Discrete Math (1993/1994), Rosser's Logic for Mathematicians (2008).
italic p, q, r: Quine's Set Theory and Its Logic (1969), Kuratowski & Mostowski's Set Theory (1976).
italic X, Y, Z: Dijkstra and Scholten's Predicate Calculus and Program Semantics (1990).
Fraktur letters: Fraenkel et. al's Foundations of Set Theory (1973).

Distinctness or freeness

Here are some conventions that address distinctness or freeness of a variable:

F/ x ph is read " x is not free in (wff) ph "; see df-nf (whose description has some important technical details). Similarly, F/_ x A is read x is not free in (class) A , see df-nfc .

"$d x y $." should be read "Assume x and y are distinct variables."

"$d ph x $." should be read "Assume x does not occur in phi." Sometimes a theorem is proved using F/ x ph ( df-nf ) in place of "$d ph x $." when a more general result is desired; ax-5 can be used to derive the $d version. For an example of how to get from the $d version back to the $e version, see the proof of euf from eu6 .

"$d A x $." should be read "Assume x is not a variable occurring in class A."

"$d A x $. $d ps x $. $e |- ( x = A -> ( ph <-> ps ) ) $." is an idiom often used instead of explicit substitution, meaning "Assume psi results from the proper substitution of A for x in phi." Therefore, we often use the term "implicit substitution" for such a hypothesis.

Class and wff variables should appear at the beginning of distinct variable conditions, and setvars should be in alphabetical order. E.g., "$d Z x y $.", "$d ps a x $.". This convention should be applied for new theorems (formerly, the class and wff variables mostly appear at the end) and will be assured by a formatter in the future.

" |- ( -. A. x x = y -> ... ) " occurs early in some cases, and should be read "If x and y are distinct variables, then..." This antecedent provides us with a technical device (called a "distinctor" in Section 7 of Megill p. 444) to avoid the need for the $d statement early in our development of predicate calculus, permitting unrestricted substitutions as conceptually simple as those in propositional calculus. However, the $d eventually becomes a requirement, and after that this device is rarely used.

There is a general technique to replace a $d x A or $d x ph condition in a theorem with the corresponding F/_ x A or F/ x ph ; here it is. |- T[x, A] where $d x A, and you wish to prove |- F/_ x A => |- T[x, A]. You apply the theorem substituting y for x and A for A , where y is a new dummy variable, so that $d y A is satisfied. You obtain |- T[y, A], and apply chvar to obtain |- T[x, A] (or just use mpbir if T[x, A] binds x). The side goal is |- ( x = y -> ( T[y, A] <-> T[x, A] ) ) , where you can use equality theorems, except that when you get to a bound variable you use a non-dv bound variable renamer theorem like cbval . The section mmtheorems32.html#mm3146s also describes the metatheorem that underlies this.

Additional rules for definitions

Standard Metamath verifiers do not distinguish between axioms and definitions (both are $a statements). In practice, we require that definitions (1) be conservative (a definition should not allow an expression that previously qualified as a wff but was not provable to become provable) and be eliminable (there should exist an algorithmic method for converting any expression using the definition into a logically equivalent expression that previously qualified as a wff). To ensure this, we have additional rules on almost all definitions ($a statements with a label that does not begin with ax-). These additional rules are not applied in a few cases where they are too strict ( df-bi , df-clab , df-cleq , and df-clel ); see those definitions for more information. These additional rules for definitions are checked by at least mmj2's definition check (see mmj2 master file mmj2jar/macros/definitionCheck.js). This definition check relies on the database being very much like set.mm, down to the names of certain constants and types, so it cannot apply to all Metamath databases... but it is useful in set.mm. In this definition check, a $a-statement with a given label and typecode |- passes the test if and only if it respects the following rules (these rules require that we have an unambiguous tree parse, which is checked separately):

The expression must be a biconditional or an equality (i.e. its root-symbol must be <-> or =). If the proposed definition passes this first rule, we then define its definiendum as its left hand side (LHS) and its definiens as its right hand side (RHS). We define the *defined symbol* as the root-symbol of the LHS. We define a *dummy variable* as a variable occurring in the RHS but not in the LHS. Note that the "root-symbol" is the root of the considered tree; it need not correspond to a single token in the database (e.g., see w3o or wsb ).
The defined expression must not appear in any statement between its syntax axiom ($a wff) and its definition, and the defined expression must not be used in its definiens. See df-3an for an example where the same symbol is used in different ways (this is allowed).
No two variables occurring in the LHS may share a disjoint variable (DV) condition.
All dummy variables are required to be disjoint from any other (dummy or not) variable occurring in this labeled expression.
Either
(a) there must be no non-setvar dummy variables, or
(b) there must be a justification theorem.
The justification theorem must be of form |- ( definiens root-symbol definiens' ) where definiens' is definiens but the dummy variables are all replaced with other unused dummy variables of the same type. Note that root-symbol is <-> or = , and that setvar variables are simply variables with the setvar typecode.
One of the following must be true:
(a) there must be no setvar dummy variables,
(b) there must be a justification theorem as described in rule 5, or
(c) if there are setvar dummy variables, every one must not be free.
That is, it must be true that ( ph -> A. x ph ) for each setvar dummy variable x where ph is the definiens. We use two different tests for nonfreeness; one must succeed for each setvar dummy variable x . The first test requires that the setvar dummy variable x be syntactically bound (this is sometimes called the "fast" test, and this implies that we must track binding operators). The second test requires a successful search for the directly-stated proof of ( ph -> A. x ph ) Part c of this rule is how most setvar dummy variables are handled.

Rule 3 may seem unnecessary, but it is needed. Without this rule, you can define something like

       cbar $a wff Foo x y $.
       ${ $d x y $. df-foo $a |- ( Foo x y <-> x = y ) $. $}

and now "Foo x x" is not eliminable; there is no way to prove that it means anything in particular, because the definitional theorem that is supposed to be responsible for connecting it to the original language wants nothing to do with this expression, even though it is well formed.

A justification theorem for a definition (if used this way) must be proven before the definition that depends on it. One example of a justification theorem is vjust . Definition df-v |- _V = { x | x = x } is justified by the justification theorem vjust |- { x | x = x } = { y | y = y } . Another example of a justification theorem is trujust ; Definition df-tru |- ( T. <-> ( A. x x = x -> A. x x = x ) ) is justified by trujust |- ( ( A. x x = x -> A. x x = x ) <-> ( A. y y = y -> A. y y = y ) ) .

Here is more information about our processes for checking and contributing to this work:

Multiple verifiers. This entire file is verified by multiple independently-implemented verifiers when it is checked in, giving us extremely high confidence that all proofs follow from the assumptions. The checkers also check for various other problems such as overly long lines.

Discouraged information. A separate file named "discouraged" lists all discouraged statements and uses of them, and this file is checked. If you change the use of discouraged things, you will need to change this file. This makes it obvious when there is a change to anything discouraged (triggering further review).

LRParser check. Metamath verifiers ensure that $p statements follow from previous $a and $p statements. However, by itself the Metamath language permits certain kinds of syntactic ambiguity that we choose to avoid in this database. Thus, we require that this database unambiguously parse using the "LRParser" check (implemented by at least mmj2). (For details, see mmj2 master file src/mmj/verify/LRParser.java). This check counters, for example, a devious ambiguous construct developed by saueran at oregonstate dot edu posted on Mon, 11 Feb 2019 17:32:32 -0800 (PST) based on creating definitions with mismatched parentheses.
Proposing specific changes. Please propose specific changes as pull requests (PRs) against the "develop" branch of set.mm, at: https://github.com/metamath/set.mm/tree/develop .

Community. We encourage anyone interested in Metamath to join our mailing list: https://groups.google.com/g/metamath .

(Contributed by the Metamath team, 27-Dec-2016) Date of last revision. (Revised by the Metamath team, 22-Sep-2022) (Proof modification is discouraged.) (New usage is discouraged.)

		Ref	Expression
	Hypothesis	conventions.1	$⊢ φ$
	Assertion	conventions	$⊢ φ$

Proof

Step	Hyp	Ref	Expression
1		conventions.1	$⊢ φ$