Clause Level Annotation Specification

Version 0.2 (7/07/2004) Proposal July 2004

Matthew Brook O'Donnell
Stanley E. Porter
Jeffrey T. Reed
Robert Picirilli
Catherine J. Smith
Randall K. Tan

Copyright (c) 2001-2004


The clause is the third level of analysis in the model. This document outlines the linguistic analysis of the clause and its components and describes the XML elements and attributes used for its annotation. It builds upon the word (base) level annotation (see Base (Word) Level Annotation) and the word group level of analysis (see Word Group Annotation).

Status of this document

This document is a development of the initial proposal of the clause level annotation scheme. It is currently under review and comments are requested.

Table of contents

1. Introduction

2. Definitions

3. Features analyzed at the clause level
3.1. Clause components, aspect, causality and presupposition (Field)
3.2. Attitude and participation (Tenor)
3.3. Clause boundaries, prime and subsequent, and conjunctions (Mode)

1. Introduction

a. The extent and nature of the 'clause' are unclear and somewhat contentious in traditional grammar. Many linguistic models treat the sentence as the basic unit of analysis, which sometimes consists of just one clause while on other occasions includes a complex of clauses. The discourse model does not include the sentence as a level of analysis. The clause-complex, or a group of clauses connected by their function as a unit, is not treated as a separate level of analysis. However, its boundaries and structure are recognized and annotated at the paragraph level.

b. Each clause, whether it would traditionally be classified as 'independent' or 'dependent', is marked on the same level. The clause components of subject, predicate, complement and adjunct are marked within each clause as they occur.

c. The discussion of the relationships between clauses (their connection and 'dependency') and the level at which they are functioning cannot be treated at the clause level. This takes place at the pericope or paragraph level (see Paragraph Level Annotation).

2. Definitions

  • [d1] This document assumes the analysis of the word group and its components as a separate level of analysis. The definition and boundaries of a word group are as defined elsewhere (see Word Group Annotation).
  • [d1a] A clause is a unit of language that contains a single proposition about which the language user is making an assertion, negation, query or suggestion.
  • [d2] A clause will usually consist of a verbal element (the Predicate) and its related elements. However, a verbal element will not always be present (e.g. the opening of many letters) and is not required in a clause. A clause may consist of a single word group (e.g. a one-word phrase).
  • [d3] A clause component is a functional unit made up of one or more word groups and can be classified as one of the four following types: subject, predicate, complement and adjunct.
  • [d4] The Predicate of a clause is its verbal element, which grammaticalizes the process of the clause with the semantic features of aspect and causality. It includes both finite and non-finite (participle and infinitive) verb forms. Finite verbal forms also grammaticalize attitude, participation and collection. Non-finite forms grammaticalize presupposition rather than attitude.
  • [d5] The Subject of a clause is the word group or word groups providing greater specification regarding the grammatical subject of a finite verb form (the morphological indication of person and number). For finite verbs the head term of this group (or these groups) is in the nominative case. In infinitive clauses the 'subject' may be indicated in the accusative case. In so-called 'genitive absolute' constructions the subject component occurs in the genitive case. A clause will often have no subject component and can have at most one subject component.
  • [d6] A Complement of a clause is a word group or the word groups that 'complete' the predicate of the clause. The categories of direct and indirect object from traditional grammar are among those classified as complements. A clause may have no complement or many complements. With relation to the process of the clause, the complement(s) are those components of the clause that answer the question of "who?" or "what?" is affected by the process.
  • [d7] An Adjunct of a clause is a word group or the word groups that modify the predicate, providing an indication of the circumstances associated with the process. Common adjuncts are prepositional and adverbial phrases (adverbs) and also embedded "adverbial clauses". With relation to the process of the clause, adjuncts provide answers to questions of the type "where?", "when?", "why?" and "how?".
  • [d8] Aspect is a semantic category, associated with the predicate of a clause, which concerns the manner in which an author chooses to view and portray an action. It is grammaticalized through an author's choice of tense-form. Perfective aspect is grammaticalized through the aorist tense-form, Imperfective aspect through present forms (present and imperfect tense-forms) and Stative aspect through perfect forms (perfect and pluperfect tense-forms). The future tense-form is aspectually vague and therefore a predicate containing a future form does not specify an aspect value. A number of lexical forms are also aspectually vague.
  • [d9] Causality is a semantic category, associated with the predicate of a clause, which expresses the relationship between the agent of an action and the action itself. It is grammaticalized through an author's selection of voice. Direct causality is grammaticalized through the active voice form, External causality through the passive voice and Internal/Ergative causality through the middle voice form.
  • [d10] Attitude is a semantic category, associated with the predicate of a clause, which expresses the manner in which an author chooses to view and portray an action in relation to reality. It is grammaticalized through an author's selection of mood. Assertive attitude is grammaticalized through the indicative mood, Directive attitude through the imperative mood, Projective attitude through the subjunctive mood and Contingent attitude through the optative mood. The semantic value of the future tense-form is included as an attitude value at the clause level: Expective attitude is grammaticalized by the future form.
  • [d11] Presupposition is a semantic category, associated with the predicate of a clause, which expresses an author's presupposition concerning the factual nature of a process. It is grammaticalized through an author's selection of either an infinitive or participle form. Factive presupposition is grammaticalized by a participle verbal form and Non-factive presupposition through an infinitive form.
  • [d12] Participation is a semantic category expressing the degree of involvement of the 'speaker' and 'hearer' within a discourse. A predicate component containing a finite verb will carry a value for this semantic feature. Participation is grammaticalized through an author's selection of person. Excluded participation is grammaticalized by the third person, Indirect participation by the second person and Direct participation by the first person.
  • [d13] Collection is a semantic category expressing the number of things or participants involved as the subject of a process. A predicate component containing either a finite or a non-finite (participle) verbal form carries this semantic feature. Collection is grammaticalized through an author's selection of number. Unitary collection is grammaticalized by singular forms, Plural collection by plural forms.
  • [d14] The subject and complement components are assigned a role function, defining the involvement of the component in the process of the clause. The three possible roles are actor, agent and patient. [If we choose to use process types from Halliday then different process types have different roles for the participants. (mbod-8/6/04)]
  • [d15] A patient is the goal of a process and usually occurs on a complement of the clause.
  • [d16] The actor of a process occurs on the subject of the clause and usually occurs as the grammatical subject of the verbal element. With an active verb the actor is responsible for performing the action, while with a passive verb the actor is not directly responsible for performing the action.
  • [d17] The agent of a process is the component of a clause responsible for performing the action. With a passive verb the agent may or may not be present. When it is present it is specified with a 'by-clause'.
  • [d18] The adjunct components of a clause are assigned circumstances, indicating the nature of the process. The possible circumstances are how, where, when and what.
  • [d19] The prime of a clause is the first word group of a clause, and the subsequent is the remaining word groups.
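The component definitions above ([d3], [d5], [d14], [d18]) can be pictured as a small data model. The following is only an illustrative sketch in Python, not part of the annotation scheme: the class names `Component` and `Clause` and their field names are hypothetical, and each word group is simplified to a plain string.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Value sets taken from the definitions; the Python names are assumptions.
COMPONENT_TYPES = ("subject", "predicate", "complement", "adjunct")  # [d3]
ROLES = ("actor", "agent", "patient")                                # [d14]
CIRCUMSTANCES = ("how", "where", "when", "what")                     # [d18]


@dataclass
class Component:
    """A clause component: a functional unit of one or more word groups."""
    kind: str
    word_groups: List[str]
    role: Optional[str] = None          # subject and complement only [d14]
    circumstance: Optional[str] = None  # adjunct only [d18]

    def __post_init__(self) -> None:
        if self.kind not in COMPONENT_TYPES:
            raise ValueError(f"unknown component type: {self.kind!r}")
        if not self.word_groups:
            raise ValueError("a component needs at least one word group")
        if self.role is not None and (
            self.kind not in ("subject", "complement") or self.role not in ROLES
        ):
            raise ValueError(f"invalid role {self.role!r} for {self.kind}")
        if self.circumstance is not None and (
            self.kind != "adjunct" or self.circumstance not in CIRCUMSTANCES
        ):
            raise ValueError(
                f"invalid circumstance {self.circumstance!r} for {self.kind}"
            )


@dataclass
class Clause:
    """A clause as an ordered list of components."""
    components: List[Component] = field(default_factory=list)

    def __post_init__(self) -> None:
        subjects = [c for c in self.components if c.kind == "subject"]
        if len(subjects) > 1:
            # [d5]: a clause can have at most one subject component.
            raise ValueError("a clause can have at most one subject component")


# The example clause from section 3.1, with its subject kept as one string.
clause = Clause([
    Component("subject", ["oJ pisteuvwn ejn qew'/"], role="actor"),
    Component("predicate", ["lalei'"]),
    Component("complement", ["tw'/ law'/"]),
])
```

A predicate is not required by this sketch, since [d2] allows clauses without a verbal element.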

3. Features analyzed at the clause level

    a. Though not formally recognized in the current elements and attributes, it is helpful to divide the features analyzed at the clause level according to whether they belong to the field, tenor or mode of discourse.

    3.1. Clause components, aspect, causality and presupposition (Field)

    a. A clause usually has a number of different components, differentiated in functional terms. There are four components, subject, predicate, complement and adjunct. A clause can usually have only one subject and predicate, but any number of complements and adjuncts.

    b. The predicate is made up of the word group containing the verbal element of the clause. Each predicate is marked with a number of semantic features based upon the formal morphological features of the verbal element marked at the word group level (see Word Group Annotation 3.3). The features marked on the predicate are: aspect (tense-forms except for future), causality (disambiguated voice forms), attitude (mood for finite forms), participation (person for finite forms), collection (number for finite and participle forms) and presupposition (for participle and infinitive forms). At the clause level, ambiguity between middle and passive forms (in the present and perfect forms), marked at the word group level, must be resolved and the predicate assigned either Internal/Ergative or External (passive) causality.
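The mapping from morphological features to the semantic features on the predicate can be sketched as a set of lookup tables, following definitions [d8] to [d13]. This is a minimal illustration in Python under stated assumptions: the function `predicate_features`, its parameter names and the lowercase value strings are hypothetical and are not the attribute names of the annotation scheme.

```python
# Lookup tables follow [d8]-[d13]; key and value spellings are assumptions.
ASPECT = {
    "aorist": "perfective",
    "present": "imperfective", "imperfect": "imperfective",
    "perfect": "stative", "pluperfect": "stative",
    # "future" is aspectually vague, so it has no entry.
}
CAUSALITY = {"active": "direct", "passive": "external",
             "middle": "internal/ergative"}
ATTITUDE = {"indicative": "assertive", "imperative": "directive",
            "subjunctive": "projective", "optative": "contingent"}
PRESUPPOSITION = {"participle": "factive", "infinitive": "non-factive"}
PARTICIPATION = {"first": "direct", "second": "indirect", "third": "excluded"}
COLLECTION = {"singular": "unitary", "plural": "plural"}


def predicate_features(tense, voice, mood=None, person=None, number=None,
                       form="finite"):
    """Derive semantic features for a predicate from its morphology.

    form is "finite", "participle" or "infinitive".
    """
    if voice not in CAUSALITY:
        # Middle/passive ambiguity must be resolved before this point.
        raise ValueError(f"unresolved or unknown voice: {voice!r}")
    features = {"causality": CAUSALITY[voice]}
    if tense in ASPECT:
        features["aspect"] = ASPECT[tense]
    if form == "finite":
        # The future form contributes an attitude value instead of aspect.
        features["attitude"] = (
            "expective" if tense == "future" else ATTITUDE[mood]
        )
        features["participation"] = PARTICIPATION[person]
        features["collection"] = COLLECTION[number]
    else:
        features["presupposition"] = PRESUPPOSITION[form]
        if form == "participle":
            features["collection"] = COLLECTION[number]
    return features


finite = predicate_features("aorist", "active", "indicative", "third", "plural")
infinitive = predicate_features("present", "active", form="infinitive")
```

Passing an unresolved voice value such as "middle/passive" raises an error in this sketch, mirroring the requirement that the ambiguity be resolved at the clause level.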

    c. The subject is made up of one or more word groups, with the head term(s) usually in the nominative case (and in concord with the finite verbal form if it is present). Cases where a number of word groups are joined by conjunctions, e.g. Pau'lo" kai; Timovqeo" kai; Pevtro", are marked as a single subject component. Subject components are marked with a role, which will usually be that of actor.

    d. A complement is made up of one or more connected word groups that can be said to complete the action of the predicate. In traditional grammatical terms complements include direct and indirect objects. A number of related word groups connected by conjunctions are marked as a single complement, e.g. to;n Pevtron kai; to;n ?Iavkwbon kai; to;n ?Iwavnnhn. In other cases, however, a number of word groups may constitute individual complements, e.g. [cara;n pollh;n]C [e[scon]P kai; [paravklhsin]C. Complement components are marked with a role, which will frequently be that of patient.

    e. An adjunct consists of one or more connected word groups functioning in an adverbial manner. Common adjuncts are prepositional phrases and adverb word groups. In clauses with a series of adverbial word groups joined by conjunctions, e.g. ejn uJmi'n kai; ejn hJmi'n kai; eij" Cristou', each word group is marked as an adjunct. Adjunct components are marked with a circumstance, with the possible values where, when, how and what.

    f. Frequently a clause component will consist of an entire clause filling the slot of subject, complement or adjunct. Consider the clause, oJ pisteuvwn ejn qew'/ lalei' tw'/ law'/. This should first be analyzed as follows: {[oJ pisteuvwn ejn qew'/]S [lalei']P [tw'/ law'/]C}. However, the subject component of the clause is itself a clause, analyzed as: {[oJ pisteuvwn]P [ejn qew'/]A}.

    g. Other examples of clauses nested within clause components are adverbial participle clauses functioning as clause adjuncts and infinitive phrases filling the subject or complement slot.
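The nesting of a clause inside a clause component can be illustrated with a small recursive structure that prints the bracket notation used in this section. This is only a sketch in Python: the names `Clause`, `Component`, `render` and `depth` are hypothetical, and each component holds either a plain string of word groups or an embedded clause.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Clause:
    components: List["Component"]


@dataclass
class Component:
    label: str                   # "S", "P", "C" or "A"
    content: Union[str, Clause]  # word groups as text, or an embedded clause


def render(clause: Clause) -> str:
    """Render a clause in the {[...]S [...]P} notation, recursing as needed."""
    parts = []
    for comp in clause.components:
        if isinstance(comp.content, Clause):
            inner = render(comp.content)
        else:
            inner = comp.content
        parts.append(f"[{inner}]{comp.label}")
    return "{" + " ".join(parts) + "}"


def depth(clause: Clause) -> int:
    """Count the levels of clause nesting (1 for a clause with none)."""
    nested = [depth(c.content) for c in clause.components
              if isinstance(c.content, Clause)]
    return 1 + max(nested, default=0)


# The subject slot of the main clause is itself filled by a clause.
embedded = Clause([Component("P", "oJ pisteuvwn"), Component("A", "ejn qew'/")])
main = Clause([
    Component("S", embedded),
    Component("P", "lalei'"),
    Component("C", "tw'/ law'/"),
])
print(render(main))
# {[{[oJ pisteuvwn]P [ejn qew'/]A}]S [lalei']P [tw'/ law'/]C}
```

The same structure would accommodate the cases in paragraph g, with an embedded participle or infinitive clause as the content of an adjunct, subject or complement component.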

    h. A clause will usually have a single finite verbal element as its predicate. It may have other non-finite verbal forms in other component slots (e.g. a participle phrase as an adjunct). These additional verbal elements are given their own clausal analysis as discussed above. There are occasions where a clause may appear to have a predicate component containing two finite verbs (connected by a conjunctive particle) that share the other clause components (subject, complements and/or adjuncts), e.g. Rom. 1.21 [ [diovti]conj [gnovnte" to;n qeo;n]A [oujc]A [wJ" qeo;n]A [ejdovxasan h] hujcarivsthsan]P ]. An alternative analysis of this clause would be to see it as a clause with two predicates, e.g. [ejdovxasan]P [h]]conj [hujcarivsthsan]P. However, the general criterion for determining the boundaries of a clause is one finite predicate per clause. This specification suggests that this criterion be followed if possible, so that the example from Rom. 1.21 would be analyzed as two clauses, e.g. [ [diovti]conj [gnovnte" to;n qeo;n]A [oujc]A [wJ" qeo;n]A [ejdovxasan]P ]CLAUSE1 [ [h]]conj [hujcarivsthsan]P ]CLAUSE2.
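The criterion of one finite predicate per clause can be sketched as a simple splitting procedure over a labelled sequence of elements. The Python below is an illustration only: the function name `split_on_finite_predicates` is hypothetical, every element labelled "P" is assumed to be a finite predicate, and a conjunction standing immediately before a second predicate is assumed to open the new clause.

```python
def split_on_finite_predicates(items):
    """Split a flat list of (label, text) pairs into clauses.

    A new clause starts whenever a second "P" element is met. A "conj"
    element directly before that predicate moves into the new clause.
    """
    clauses, current, has_predicate = [], [], False
    for label, text in items:
        if label == "P" and has_predicate:
            carried = []
            if current and current[-1][0] == "conj":
                carried = [current.pop()]
            clauses.append(current)
            current = carried
        current.append((label, text))
        if label == "P":
            has_predicate = True
    if current:
        clauses.append(current)
    return clauses


# The elements of Rom. 1.21 as bracketed above, in their original order.
rom_1_21 = [
    ("conj", 'diovti'),
    ("A", 'gnovnte" to;n qeo;n'),
    ("A", 'oujc'),
    ("A", 'wJ" qeo;n'),
    ("P", 'ejdovxasan'),
    ("conj", 'h]'),
    ("P", 'hujcarivsthsan'),
]
clauses = split_on_finite_predicates(rom_1_21)
```

With this input the first clause ends at ejdovxasan and the second consists of the conjunction h] and the predicate hujcarivsthsan. The shared adjuncts stay with the first clause in this sketch; how the sharing is recorded is not addressed here.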

    i. Periphrastic constructions pose a challenge for annotation at the clause level. 'Periphrastic verbal constructions are formed by the grammatically appropriate combination of a form of the auxiliary verb eijmiv and a participle' (Porter 1994: 45). These two verbal elements, the auxiliary and the participle form, function as a single semantic unit and are thus annotated as a single group. This allows for all the semantic features annotated as attributes on the predicate component, such as aspect, causality and attitude, to be included. The auxiliary provides the semantics of attitude (eijmiv is vague with regard to aspect and causality [realized by tense-form and voice respectively]) and the participle form provides aspect and causality. As a guide in identifying periphrastic constructions it should be noted that other elements such as subject or adjunct components related to the auxiliary do not usually intervene between the auxiliary and the participle (see examples in Porter 1994: 45-46).
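The division of labour between auxiliary and participle can be sketched as follows. This Python fragment is a hedged illustration, not a detection algorithm from the specification: the function name `merge_periphrastic`, the dictionary keys and the element type labels are all hypothetical, the auxiliary is assumed to precede the participle, and the non-intervention guideline is applied as a hard rule although the text states it only as a tendency.

```python
# Element types that, per the guideline, do not usually stand between
# the auxiliary and the participle of a periphrastic construction.
BLOCKING = {"subject", "adjunct"}


def merge_periphrastic(elements):
    """Return combined predicate features, or None if no construction is found.

    elements: list of dicts in clause order, each with a "type" key.
    An auxiliary dict carries "attitude"; a participle dict carries
    "aspect" and "causality".
    """
    aux_index = None
    for i, element in enumerate(elements):
        if element["type"] == "auxiliary":
            aux_index = i
        elif element["type"] == "participle" and aux_index is not None:
            between = elements[aux_index + 1:i]
            if any(e["type"] in BLOCKING for e in between):
                return None
            auxiliary = elements[aux_index]
            return {
                "attitude": auxiliary["attitude"],   # from the auxiliary
                "aspect": element["aspect"],         # from the participle
                "causality": element["causality"],   # from the participle
            }
    return None


adjacent = [
    {"type": "auxiliary", "attitude": "assertive"},
    {"type": "participle", "aspect": "stative", "causality": "external"},
]
interrupted = [
    {"type": "auxiliary", "attitude": "assertive"},
    {"type": "adjunct"},
    {"type": "participle", "aspect": "stative", "causality": "external"},
]
merged = merge_periphrastic(adjacent)
not_merged = merge_periphrastic(interrupted)
```

In practice the decision remains an analytical judgement; the sketch only shows which element supplies which feature once a construction has been identified.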

    3.2. Attitude and participation (Tenor)

    a. Predicate clause components containing a finite verb form are marked with an indication of attitude (mood forms). Possible values for attitude are: assert (indicative forms), direct (imperative), project (subjunctive) and contingent (optative).

    b. The <part> element

    3.3. Clause boundaries, prime and subsequent, and conjunctions (Mode)

    a. The boundaries of each clause are ascertained on the basis of the criteria outlined in the definition of a clause (see Definitions). A clause will have at most one verbal element, though it need not have one.

    b. As discussed in section 3.1, an entire clause may be nested within a component element of another clause, e.g. C1[ C2[oJ pisteuvwn ejn qew'/] lalei' tw'/ law'/] and C1[ei[ pw" h[dh pote eujodwqhvsomai ejn tw'/ qelhvmati tou' qeou' C2[ejlqei'n pro;" uJma'"] ].

    c. Each clause is marked with a unique identifier allowing reference from other elements and documents.

    d. The first word group within a clause is marked as the prime of the clause and the remainder of the clause as the subsequent (other models use the terms theme and rheme for similar categories). At the clause level the prime is usually thematized material or the 'topic' of the clause, while the subsequent provides additional material expanding the prime.
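The division into prime and subsequent is positional, so it can be shown in a few lines. The following Python sketch assumes that a clause is already available as an ordered list of word groups; the function name `split_prime_subsequent` is hypothetical, and the example treats each bracketed unit from the complement example in section 3.1 as a single word group.

```python
def split_prime_subsequent(word_groups):
    """Return (prime, subsequent) for an ordered list of word groups.

    The prime is the first word group; the subsequent is everything after it.
    """
    if not word_groups:
        raise ValueError("a clause needs at least one word group")
    return word_groups[0], word_groups[1:]


prime, subsequent = split_prime_subsequent(
    ["cara;n pollh;n", "e[scon", "paravklhsin"]
)
```

Here the prime is cara;n pollh;n and the subsequent holds the remaining two word groups. A clause consisting of a single word group has a prime and an empty subsequent.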

    e. Conjunctions marked at the clause level of discourse are those that join two elements (usually word groups) within the clause. For example, the kaiv in the clause cara;n ga;r pollh;n e[scon kai; paravklhsin ejpi; th'/ ajgavph/ sou joins the word groups cara;n pollh;n and paravklhsin. This conjunction is marked as a clause level conjunction. In contrast, the gavr in the same clause functions to join the clause to a previous clause; it is therefore marked as a conjunction at the paragraph level (see Paragraph Level Annotation).