RISTAL. Research in Subject-matter Teaching and Learning 3 (2020), 108–125

Student texts produced in the context of material-based argumentative writing: Interdisciplinary research-related conception of an evaluation tool

This paper focuses on student texts produced in the context of material-based writing and presents a text analysis grid which was developed in an interdisciplinary research project between the didactics of geography and of German language. Based on a critical discussion of the typical methodological procedures that are currently used in research to evaluate the quality of written argumentations, we argue that only a combination of structural, linguistic and content-related analysis steps can determine the quality of argumentative texts as a whole. Hence, there is a particular need for development in text product analysis with regard to linking the various aspects. We therefore propose a catalogue of criteria, which is interdisciplinarily theory-guided and enables a quantitative and qualitative in-depth analysis of text products on multiple levels: content-related, structural, linguistic and material-related with special consideration of the multiple document input.

argumentation, text analysis grid, material-based writing, evaluation

1 Introduction

With the Educational Standards 2012, a new type of task has been established for teaching German: informational and argumentative material-based writing, primarily applied in upper secondary schools (KMK 2012). The format is characterized by the fact that students write their own (lengthier) informational or argumentative text on a topic based on materials (graphics, tables, pictures, other media sources) and texts of various kinds (Abraham/Baurmann/Feilke 2015: 4). From an interdisciplinary perspective, however, the 'new' is not quite so new: while a simultaneous examination of several linear and non-linear or multimodal and multimedia texts opens up a new didactic paradigm or represents a new didactic genre in German teaching, the productive reception of multiple sources and documents has been part of the subject culture in many other teaching fields for much longer, e.g. in geography lessons from the beginning of secondary education up to higher levels (DGfG 2020).

In general and for language-sensitive teaching in particular, it is important to integrate material-based writing tasks into the overall curriculum in a meaningful way, focusing on practical and socially relevant writing formats. Language and content learning are to be combined by developing common, mutually stimulating concepts for the material-based format in a "bidirectional" cooperation of the various didactics (cf. Sturm 2017: 23f.).

This applies against the following background: On the one hand, material-based writing is generally understood as a pre-scientific type of task that is intended to prepare for epistemic or eristic writing, i.e. a form of scientific argumentation and writing determined by professional controversy (cf. Feilke 2017: 4). At the same time, the new task type is a reaction to the change in literacy in society, where the prevalence of the use of open polytextual (multimodal and multimedia) formats compared to closed text formats can be increasingly observed in everyday life, especially under the influence of digitalization. The reception of different sources and material-based writing is socially relevant because it prepares people for maturity in a democratic and pluralistic society. Exercising critical thinking through the reception and reflection of several, possibly conflicting sources or the ability to prove one's own position through information from reputable sources are ultimately among the central competencies in modern digital society. In particular, the ability to argue and understand arguments is a key competence for participating in social debates and decision-making in democratic societies.

This fits the general claim that argumentation should be learned in school lessons in all subjects, because in the school context, argumentation can also be used to better understand subject matter, formulate one's own assessment of problems, and strengthen students' social skills (Budke/Meyer 2015: 14). For these reasons, there is a broad international literature base on argumentation, and empirical studies are conducted in the didactics of all subjects (see e.g. the overview in Budke et al. 2015). There is a general consensus to define argumentation as a problem-solving process in which a disputed assertion is to be refuted or confirmed by justifications (e.g. Lueken 2000; Bayer/Schlobinski 1999; Kopperschmidt 1995; Kienpointner 1983). The goal of argumentation is to achieve understanding of the position taken by the respective interaction partners through logical reasoning and, if necessary, to convince them.

In empirical surveys in various school subjects, research focuses not only on the significance of argumentation in class and in the school media, but also on the skills and strategies of the students in oral and written argumentation, as well as on the development of didactic support approaches, which are usually tested for their effectiveness in intervention studies in pre-post-design (e.g. Gronostay 2019; Zohar/Nemet 2002; Lam et al. 2018). In order to achieve these goals, it is usually necessary to measure the students' arguments. This is methodologically particularly demanding, since arguments must be identified from the wealth of linguistic statements, their quality should be assessed according to certain criteria, and the quality of the linguistic wording must also be assessed, since this is the only way to answer whether the oral utterances or texts meet their communicative goal. Of course, these challenges also apply to material-based argumentative writing, which also creates the desideratum to analyze the material references in terms of content and language.

In sum, one of the central demands in the further development of the (new) format of material-based argumentative writing is the evaluation of the students' intermediate and final text products. Here, procedures are required for teaching practice, but above all for research, which enable an assessment that is oriented towards the central aspects and is as objective as possible. Especially when, in addition, support formats are to be conceived for students whose literacy competencies are still developing, precise, diagnostically oriented research on the format becomes a desideratum. Above all, such procedures and support formats should ideally be developed in an interdisciplinary way and later be used across disciplines, so that the support of the students can be equally cross-curricular.

In this article, we present such an interdisciplinarily developed instrument designed for the evaluation of student texts which are produced in the context of material-based argumentative writing. To unfold the theoretical background, section 2 first focuses on the typical methodological approaches of the available empirical studies. Structural, linguistic, content-related and holistic procedures are compared. Subsequently, we identify deficits through a critical discussion of these procedures in section 3. On this basis, and taking into account various theoretical approaches, we introduce an analysis grid for material-based argumentative text products, which measures the overall quality of written argumentation, in sections 4 and 5. In this respect, first results on the suitability of the evaluation tool are also presented and finally its interdisciplinary applicability is discussed. Regarding the overall framework, the presented text analysis grid was tested in the context of the interdisciplinary research project SpiGU: "Language Sensitive Teaching and Learning in Inclusive Geography Classes: Support Formats for Material-based Argumentative Writing", a collaboration between the didactics of geography and of German language. We close with a short conclusion and an outlook on further research in section 6.

2 Measuring the quality of argumentation

Since research on material-based writing in German didactics is still relatively young, comparatively few papers have been published to date that are dedicated to a differentiated assessment of the corresponding text product. Feilke et al. (2016: 71) introduce (in addition to conducting a global rating) a first variable catalogue of criteria for the evaluation of material-based texts. However, the catalogue focuses more on application in school practice than on scientific diagnostic use in a research context.

Schüler (2017) presents one of the first relevant research contributions on material-based writing in upper secondary schools, which zooms in on and relates planning, conception and production processes and the text product in equal measure. In other words, Schüler's research aims at the connection between text production (text conception) and text product (text composition) and therein at the synthesis and structure formation in the processing of multiple documents for the final goal of a controversy essay. The text products are evaluated in this context in two layers: (1) by a (criteria-driven) holistic evaluation with global judgement (12 raters); (2) by a qualitative content analysis of the structural and synthesis performance at the text-macro- and text-micro-level (i.e. evaluation of the reference to the materials and their linkage at the text-macro-level ─ monotextual-aggregative or polytextual-synthetic; evaluation of the referencing and argumentation performance at the text-micro-level). For (2), Schüler develops a differentiated coding guideline based on recent linguistic theory, which focuses on the analysis of synthesis markers along corresponding categories (aggregation vs. synthesis; referencing procedures) and follows the work of Feilke et al.

Altogether, it can be noted that the development of research-diagnostic analysis tools for material-based task formats, including for the text product, is still in its beginnings in German didactics and, due to the nature of the discipline, has a primarily linguistic perspective. On the other hand, there is a rich research tradition of assessing students' written argumentation, with approaches that in principle could be used for the assessment of material-based argumentative writing products.

Generally considered, empirical studies on the quality of students' argumentation may help to diagnose existing competencies of students in reasoning and to identify typical difficulties (e.g. Budke/Uhlenwinkel 2011; Uhlenwinkel 2015; Budke/Kuckuck 2017; Maier/Budke 2018). Another approach frequently used in intervention studies is to measure the effectiveness of different educational support approaches by comparing argumentations that were written before a classroom intervention with texts that were written after the intervention (pre-post design) (e.g. Gronostay 2019; Zohar/Nemet 2002; Lam et al. 2018).

In the available studies, four different approaches can be identified, some of which are also used in combination. For the sake of a structured presentation, they are presented separately in the following:

Structural approaches:

The most common methodological approach to analyze the quality of argumentation is the structural argumentation analysis (e.g. Lam et al. 2018; Abdollahzadeh et al. 2017; Stapleton/Wu 2015; Riemeier et al. 2012; Simon et al. 2006; Basel et al. 2013; Zohar/Nemet 2002; Chase 2011; Knudson 1992). The authors usually refer directly to the work of Toulmin (1996) or to authors who have slightly modified his approach. Toulmin (1996) describes the basic structure of arguments, which consists of data, warrant and conclusion. Additional elements that make the structure of an argument more complex are rebuttals, backing, qualifiers and counterarguments. Since data, warrant, and backing are often not clearly distinguishable in empirical analysis, some studies summarize these elements in the category "grounds" (e.g. Clark/Sampson 2008).

The available studies generally examine the extent to which the arguments used by students are complete (basic structure) and consider complete arguments better than incomplete ones (e.g. Aufschnaiter et al. 2008; Lam et al. 2018; Gronostay 2019). Furthermore, the existing arguments/counter-arguments in a text can be identified, and it is assumed that the quality of the texts increases with the number of stated arguments (e.g. Benetos/Betrancourt 2020; Basten et al. 2017). Not least, structural analysis is applied to determine the internal complexity of arguments. Here, the idea is that the more argumentation elements an argument contains, the better it is (e.g. Basel et al. 2013; Aufschnaiter et al. 2008; Chase 2011). All in all, the framework is characterized by the fact that the text quality is determined by a quantitative value that results from counting argumentation elements and/or complete arguments.
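The counting logic behind such structural scores can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the `CodedArgument` class, the element labels and the function names are hypothetical, and the sketch assumes that each argument has already been hand-coded with Toulmin-style element labels.

```python
from dataclasses import dataclass, field

# Basic structure named in the text: data, warrant and conclusion.
BASIC_ELEMENTS = {"data", "warrant", "conclusion"}


@dataclass
class CodedArgument:
    """One argument with the element labels a coder assigned (hypothetical format)."""
    elements: set = field(default_factory=set)

    def is_complete(self) -> bool:
        # Complete = all three basic elements are present.
        return BASIC_ELEMENTS <= self.elements

    def complexity(self) -> int:
        # Internal complexity approximated as the number of distinct elements.
        return len(self.elements)


def structural_summary(arguments):
    """Return the simple counts that purely structural approaches rely on."""
    return {
        "arguments": len(arguments),
        "complete_arguments": sum(1 for a in arguments if a.is_complete()),
        "total_elements": sum(a.complexity() for a in arguments),
    }


# Illustrative coding of a text with two arguments (invented data).
example = [
    CodedArgument({"data", "warrant", "conclusion", "rebuttal"}),
    CodedArgument({"data", "conclusion"}),
]
summary = structural_summary(example)
```

Such a summary says nothing about whether the coded elements are correct or well formulated, which is exactly the limitation of purely structural counts.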

Holistic procedures:

Holistic methods are widely used to determine the quality of argumentation (e.g. Knudson 1992; Chase 2011; Allen et al. 2018). Normally, argumentative texts are presented to several experienced raters who are supposed to evaluate them as a whole. This procedure makes it possible to distinguish good from bad texts and to divide them into groups of similar quality.

In some cases, the raters are given specific criteria that they should consider. Abdollahzadeh et al. (2017), for example, use 'the overall argument effectiveness', 'the presence or absence of possible opposing views', 'overall structure', and 'overall language use'. In the study by Chase (2011), the raters should primarily consider 'coherence and cohesion' and Allen et al. (2018) examine aspects such as 'text structure, grammar, sentence structure, word choice'. However, these aspects are included in the final judgement on text quality with a weighting that cannot be defined exactly.

Content analyses:

The third approach refers to evaluation approaches that aim to determine the content and subject-related quality of the arguments presented by the students. This approach is mainly used in studies of teaching in the natural and social sciences. With this objective, e.g. Stapleton/Wu (2015) examine the quality of the grounds in argumentations. The most frequently mentioned reasons were filtered out and their quality was examined using the criteria "relevance" for the thesis and "acceptability", the logical suitability of the grounds in argumentations. In the study by Zohar/Nemet (2002), the validity of the evidence is determined in a similar way. Sandoval (2005) also examines the "conceptual quality" of the grounds, paying special attention to the quality of the reference to material, which is represented by criteria such as "sufficiency of data" and "description of data". Böttcher/Meisert (2011) want to determine the quality of argumentation not only structurally, but on the basis of scientific models.

Linguistic analyses:

Among linguistic subjects, the quality of written argumentation is usually determined by linguistic criteria. This is often done in the context of studies on persuasive writing (e.g. Quasthoff/Domenech 2016; Nippold et al. 2005). Andrews (2016) proposes to investigate, among other things, argumentation-specific linguistic actions in texts. This includes the concessive literary procedures, the execution of counter-arguments and their proper embedding, which were investigated by Petersen (2013). A broader approach is pursued by Quasthoff/Domenech (2016: 32-33), who examine the criteria "contextualization" as a consideration of the text genre and the demands on argumentation, "textualization", which means the addressee-oriented consideration and rebuttal of counter-arguments, and "marking" as the use of linguistic means typical for argumentation, such as the formulation of one's own opinion.

The typical approaches presented will be critically discussed in the next section; the review serves as a basis for the development of our own methodological approach.

3 Critical discussions of approaches to measure argumentation quality

The structural analysis of arguments can provide precise information about which elements of argumentation are present in the students' texts and which are often missing. Among other results, it has been found that students often find it difficult to incorporate counter-arguments and rebuttals into their argumentation (e.g. Budke/Kuckuck 2020; Chase 2011). Depending on the question posed, structural analyses also provide the opportunity to take a closer look at the use of a particular structural element in argumentation. Macagno et al. (2015), for example, examine only the types and strategies used by students to state counter-arguments.

However, structural analyses reach their limits when the content-related quality of the arguments has to be considered. Usually, no content-related quality criteria are examined for the individual structural elements. For example, it is only investigated whether evidence is named and included, but not whether it is correct or suitable to support the thesis of the argumentation. This is a problem in the context of the natural, social or artistic sciences in schools and universities, since the quality of the argumentation is influenced by whether it is relevant in the respective subject context, is based on the current state of research and, if necessary, takes into account relevant sources and materials. A further problem of purely structural approaches is that the linguistic quality of the arguments and their linguistic linkage in a coherent text are not taken into account. This can lead to texts being rated as high in quality although they are difficult to understand and therefore miss their communicative purpose.

In contrast to structural approaches, holistic methods take into account the textual character of the argumentation and usually evaluate both the quality of the content and the linguistic presentation of the arguments, their combination and development. This procedure is therefore suitable, among other things, for determining the success of teaching interventions to promote argumentation competence on the basis of pre-post studies.

Nevertheless, it is not possible to draw conclusions about individual problems of students or typical sources of failure on the basis of holistic assessments, which is possible with structural analyses. Holistic analyses are therefore not suitable in the context of questions concerning the diagnosis and promotion of argumentation skills. In addition, it is not really possible to understand how the different raters arrive at their respective judgements and thus this method has deficits in scientific objectivity. To address this problem, holistic analyses sometimes include criteria (see 2.). However, these often seem arbitrarily chosen and it cannot be understood how they are included in the overall judgement.

There are surprisingly few studies that attempt to determine the quality of the content of written arguments. In existing studies, the quality of the grounds is examined particularly frequently, although the subject-related quality of other elements (the conclusion, rebuttal, backings and counter-arguments) could also be examined. If only one of the elements that make up an argument is analyzed (e.g. Zohar/Nemet 2002; Stapleton/Wu 2015), it is questionable whether the quality of the argument can be adequately determined. In addition to interdisciplinary quality criteria, which are based on structural argumentation elements, subject-specific quality criteria should also be defined. This has been done for the subjects of mathematics, geography and biology, among others (Budke et al. 2015). A criteria-oriented analysis of the quality of single arguments could then be carried out on this theoretical basis.

Finally, the linguistic analysis of argumentative texts can empirically work out typical language action patterns and wordings in the context of argumentation. In addition, it is possible to analyze linguistic deficits in the formulation of students' argumentations and on this basis, support measures can be planned. With regard to purely linguistic analyses, nevertheless, it can be argued that they do not allow any conclusions to be drawn about the quality of the texts' content. This means that it is possible that texts are attested as being of high quality that do not present any valid arguments or miss the subject matter.

In summary: In the context of material-based argumentative writing in a specific subject such as geography, there is, besides the demand for methodical instruments for process analysis, a particular need for development in text product analysis with regard to linking the various aspects. The desideratum is a catalogue of criteria which is interdisciplinarily theory-guided, i.e. developed with recourse to current theories of material-based writing, argumentation on a geographical subject, argumentative and intertextual text procedures, etc., and which thus enables a quantitative and qualitative in-depth analysis of text products on multiple levels: content-related, structural, linguistic and material-related with special consideration of the multiple document input.

In our judgement, all this requires an interdisciplinary cooperation of German and geography didactics. The following section outlines how we have approached the development of a corresponding text analysis grid in the SpiGU project.

4 A text analysis grid for measuring the quality of material-based argumentation

The evaluation tool for text products, which we focus on and discuss in the following, is one component of data evaluation among others in the context of the SpiGU research project and the overarching methodology of design research (e.g. Einsiedler 2011; Prediger/Link 2012; Prediger et al. 2012). It can be accessed at the following link:

It was created in the course of the first multi-method data collection in an 8th grade of an inclusive comprehensive school in Cologne. The aim here was to identify the challenges faced by (a total of 19) students with and without an enhancement focus in dealing with material-based argumentation tasks on a geographical topic, if possible without any support (scaffolding).[1]

This means that we wanted to know how the different students at the relevant age deal with the tasks in question "on their own initiative" and independently, and which specific challenges arise for each of them, in order to be able to start developing possible scaffolds at this point later on. In the SpiGU project team, we developed an exemplary writing task on a local conflict of land-use based on a total of 10 materials in which the relevant information was presented in various forms of presentation (including texts, pictures, diagrams/graphics and maps). Here, we wanted to investigate what kind of material different students use and what specific challenges they face. Both process and text product data were collected from the implementation.

We opted for a combination of (a) thinking aloud protocols and (b) a mixed set of short questionnaires and longer interviews for self-reflection after the task was completed. In the week prior to the survey, a standardized reading test (LESEN 8-9) was additionally conducted with all participating students.[2]

The text products formed the last essential data component. For the evaluation of both the process data and the product data, coding guidelines were developed on the basis of theory within the framework of quantitative and qualitative content analyses (Mayring 2010; Kuckartz 2014). The results should be related to each other later (Kuckartz 2016).

In the following, we concentrate exclusively on the presentation of the text analysis grid and its criteria catalogue for the text products. The critical discussion in the last section has led to the conclusion that only a combination of structural, linguistic and content-related analysis steps can determine the quality of argumentative texts as a whole. It is only in the combination of the different procedures that the respective potentials can be utilized and the respective "blind spots" can be compensated. Overall, the analysis grid is based on an interdisciplinary combination of theories from both German and geographical didactics and encompasses the following main categories: section (1): Argumentative organization of the text through text procedures; section (2): Linguistic and structural organization of the text; section (3): Reference to material; section (4): Quality of argumentation in terms of content.

On the one hand, the coding of argumentative text products along the different categories and their subcategories allows for a differentiated and comparative qualitative analysis of the diverse linguistic, structural and content-related dimensions involved. Some of the subcategories are dichotomously differentiated (applies; does not apply), some are scaled by percentage. On the other hand, a supplementary scoring system permits the determination of a quantitative value for the total quality of the text product, whereby again the included components can be compared in detail. In the following, we provide an overview of how the various language- and subject-related dimensions are evaluated using the text analysis grid and how the categories are each theoretically based.

Measuring the linguistic quality of argumentation:

The operationalization of measuring the linguistic quality was primarily guided by the current theory of Feilke on specific “text procedures” (“Textprozeduren”) in written argumentation (e.g. Feilke 2012; 2014; 2017; Rezat 2014; 2018), as well as by Steinseifer’s approach (2014; 2018) on recurring text procedures for referencing when dealing with multiple documents in the context of material-based writing. In addition, a further section covers the genuinely language-systematic domains of lexis (subject-related vocabulary) and grammar as well as the aspect of structural organisation of the respective text.

Thus, section (1) of the grid zooms in on text procedures as typical language operations with a particular linguistic function which recur frequently depending on the text type. According to Feilke (2012; 2014; 2017), the acquisition and use of such language operations, and in connection with them the application of specific text patterns and text procedures, become particularly relevant (for novices) in writing in order (to learn) to serve and unfold a specific type of text. Concerning written argumentation, Feilke argues that positioning, perspectivizing, conceding, justifying, concluding, contrasting, explicit text structuring and textual referencing are key language operations, realized by text procedures that carry out the corresponding function. The item used to observe (i) the mere occurrence and (ii) the precise use of the operation reads as follows: “The student uses a text procedure of positioning/perspectivizing/etc.” The assessment follows a systematic guideline: First, it is determined whether the respective text procedure occurs in the text at all (yes/no; scored with 1/0 point). Then, the frequency of the text procedure's occurrence in the text is determined in order to assess and, accordingly, to score the percentage of its functional use (e.g. 2 points if 51% and more of the text procedures (of a specific language operation) are used stringently concerning their semantic function).
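The two-step guideline for a single language operation can be sketched in Python as follows. This is only an illustration under stated assumptions: the function and field names are hypothetical, and since the text gives only the band "2 points if 51% and more" as an example, any share below that threshold is scored 0 here as a placeholder rather than as a reproduction of the grid. The two scores are kept separate because the text does not state how they are combined.

```python
def score_text_procedure(occurrences: int, functional: int) -> dict:
    """Score one language operation (e.g. positioning) in one student text.

    occurrences: how often the text procedure appears in the text
    functional:  how many of these uses are stringent in their semantic function
    """
    if occurrences < 0 or not 0 <= functional <= occurrences:
        raise ValueError("need 0 <= functional <= occurrences")

    # Step 1: mere occurrence (yes/no), scored with 1/0 point.
    occurrence_points = 1 if occurrences > 0 else 0

    # Step 2: share of functionally used procedures.
    share = functional / occurrences if occurrences else 0.0

    # Only the top band is stated in the text: 2 points at 51% and more.
    # ASSUMPTION: lower bands are not given, so they are scored 0 here.
    functional_points = 2 if share >= 0.51 else 0

    return {
        "occurrence_points": occurrence_points,
        "functional_share": share,
        "functional_points": functional_points,
    }


# Illustrative codings (invented numbers).
mostly_functional = score_text_procedure(occurrences=4, functional=3)
half_functional = score_text_procedure(occurrences=4, functional=2)
absent = score_text_procedure(occurrences=0, functional=0)
```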

Section (2) on the text’s linguistic and structural organization evaluates length (2.1), structure (2.2), lexis (2.3) and grammar (2.4). The level of structure is addressed by the item “The student structures her/his text by using comprehensible paragraphs (i.e. introduction, main part, conclusion) (yes/no)”. If the text displays (successful) attempts at structuring, the item is scored accordingly. Regarding lexis, we established two items that differ according to whether or not the technical terms refer to the given material and, similarly to the procedure in (1), score them with regard to the percentage of technical terms that are used correctly. Last, in order to analyze the students’ grammatical skills, we opted for four items, focussing on syntax, conjunctions, textual references, and punctuation. All four items are evaluated by calculating the percentage of the correctly used grammatical structures and scored accordingly.

For the third constituent contributing to a linguistic perspective, see (3.2): “Use of information in terms of its linguistic presentation”, as part of the assessment of the “Reference to material” (the multiple documents base of the argumentation task) in section (3). Here, we take a closer look at the quality of information reproduction from the given material. After first assessing whether the information is reproduced implicitly (i.e. without reference to the material; 1 point) or explicitly (i.e. with reference; 2 points), a further step allows an insight on how the presentation of the material information is realized linguistically through different text procedures of referencing (i.e. neutral report, qualifying, discussing), plus their argumentative objective (i.e. integrating, systematizing, taking position) (Steinseifer 2014).

To conclude, especially the integration of the text procedures into the analysis grid may prove to be fruitful for the assessment of the development of both linguistic and argumentative skills. The conception merges central aspects of educational language such as lexical and grammatical knowledge as well as discourse/text competence, i.e. knowledge about the structure of text types and the conventional way of performing certain, text type-specific, language operations (cf. Lengyel 2010: 597).

Measuring the material-related quality of the argumentation:

Overall, section (3) evaluates whether and how the document base is referred to in the text product. We have already addressed the linguistic perspective. In parallel, (3.1) focusses on the geographical perspective by determining the “Correctness in terms of content”. In this context, it is first of all relevant whether the text refers to a specific material at all. Thus, the criterion was operationalized with the item “The student presents information from material M1/M2 etc. (yes/no)”. In the second step, the criterion of validity is applied, determining whether the information is presented incorrectly or correctly. In the quantitative evaluation, only a correct presentation scores 1 point. Finally, the degree of material interconnection is determined by assessing how intensively the different materials are related to each other. In the item “The student connects information from different material”, monotextual reference and polytextual aggregation are differentiated (but not scored with points), whereas polytextual partial synthesis and polytextual synthesis are coded and scored with 1 and 2 points respectively.
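The scoring rules for the reference to material in section (3) can be summarized in a small Python sketch. The data layout and all names are hypothetical. The sketch assumes, purely for illustration, that the correctness point (3.1) and the implicit/explicit reproduction points (3.2) are assigned per material and then summed, which the text does not specify; the three sub-scores are therefore reported separately.

```python
# Points for the degree of material interconnection as described in the text:
# the first two categories are coded but not scored with points.
CONNECTION_POINTS = {
    "monotextual reference": 0,
    "polytextual aggregation": 0,
    "polytextual partial synthesis": 1,
    "polytextual synthesis": 2,
}

# Points for the linguistic presentation of information from a material (3.2).
REPRODUCTION_POINTS = {"implicit": 1, "explicit": 2}


def score_material_reference(codings: dict, connection: str) -> dict:
    """codings maps a material id (e.g. "M1") to a dict with the keys
    "used" (bool), "correct" (bool) and "reproduction" ("implicit"/"explicit"/None).
    ASSUMPTION: points are assigned per material and summed.
    """
    used = [c for c in codings.values() if c["used"]]
    return {
        # 3.1: only a correct presentation scores 1 point.
        "correctness": sum(1 for c in used if c["correct"]),
        # 3.2: implicit reproduction 1 point, explicit reproduction 2 points.
        "presentation": sum(REPRODUCTION_POINTS[c["reproduction"]] for c in used),
        # Degree of interconnection between materials.
        "connection": CONNECTION_POINTS[connection],
    }


# Illustrative coding of one student text (invented data).
codings = {
    "M1": {"used": True, "correct": True, "reproduction": "explicit"},
    "M2": {"used": True, "correct": False, "reproduction": "implicit"},
    "M3": {"used": False, "correct": False, "reproduction": None},
}
result = score_material_reference(codings, "polytextual partial synthesis")
```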

Measuring the structural quality of the argumentation:

To measure the structural quality of an argumentative text, the approach of Toulmin (1996) has already been successfully used many times (e.g. Lam et al. 2018; Abdollahzadeh et al. 2017; Stapleton/Wu 2015). Based on these studies, statements are recorded as arguments that contain the basic structure: data, warrant and conclusion. As various studies have reported that data and warrant often cannot be analyzed separately (Clark/Sampson 2008), we summarized them as grounds. Matching this in the text analysis grid, the "analysis of completeness" (in section 4) includes all arguments consisting of opinion and reasoning.

Measuring the content-related quality of the argumentation:

A fruitful theoretical approach to determine the quality of an argument and to establish its credibility comes from Kopperschmidt (2016: 62-64). His first criterion, validity, refers to the quality of the evidence. It is useful to examine whether an argument uses evidence that is considered to be valid and correct in the respective discipline. The second criterion, suitability, makes it possible to analyze whether the warrant establishes a fit between data and conclusion. In other words: Is this rule suitable for inferring this conclusion from this data? Finally, the presented conclusion has to fit the given (problem) context, which can be analyzed by the criterion relevance. For the corresponding questions assessing relevance, suitability and validity of each counted argument in a student’s text, see again section (4).

In addition to structural elements that can be examined for their content quality across disciplines, it makes sense to draw up subject-specific criteria, since the different disciplines place different demands on the arguments that students have to consider (Budke et al. 2015). As a result, the argumentation skills of students can vary greatly between subjects (Budke et al. 2015). Here, we focus on how geography-related content criteria can be defined and operationalized (cf. Budke et al. 2015: 276):

In geography, socially controversial issues are often taken up, which are answered on the basis of different worldviews, the interests of the actors and their values and norms. In most cases, there is also no "right" result; instead, the argumentation is judged according to the extent to which it contains complex justifications and differentiated perceptions from multiple perspectives. The topics can only be analyzed and understood (criterion: multi-perspectivity) by looking at the actors and their everyday "geography-making" (Werlen 1995; 1999) as well as their different interests and perspectives (Rhode-Jüchtern 1995), which they express through argumentation in the context of social discourses (Felgenhauer 2007; Kuckuck 2014). From the point of view of disciplinary identity, it is furthermore crucial that a spatial perspective on the respective problem is used (criterion: spatial context). Last but not least, complex argumentations, which include conditions under which the argument is valid and/or take counter-arguments into account, are considered to be of particularly high quality (criterion: complexity).

In the text analysis grid, these subject-specific criteria are addressed in section (4) on the quality of argumentation in terms of content. For example, multi-perspectivity is assessed by the items “The student names relevant persons involved (yes/no)” and “The student describes the position of the persons involved correctly. (yes/no: incorrect or not at all)”. The (non-)realization is scored accordingly with 0 or 1 point; the same applies to the criteria spatial context and complexity.

So much for the description of the proposed text analysis grid: In this section, we have shown how linguistic (language-related) and geographical (subject-related) perspectives on evaluation interconnect in an interdisciplinary way. In the next section, we report first results from its application.

5 First empirical experience with the use of the presented text analysis grid

As already pointed out above, in the SpiGU project, argumentative texts of 19 students were analyzed for their quality. The evaluation of the text products was carried out by four trained raters using the text analysis grid. In the process, those items with an interrater agreement of less than 50% were revised. Finally, an interrater agreement of Cohen's kappa = .965 was achieved.
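For readers less familiar with these agreement measures, the following minimal sketch computes simple percent agreement and Cohen's kappa for two raters. It is an illustration only: how the agreement among the four raters was aggregated in the project is not specified here, so the two-rater setup is an assumption, and all names and example codings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Share of items on which both raters assigned the same code."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same non-empty set of items.")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    observed = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of both raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


# Invented codings of ten yes/no items (1 = yes, 0 = no) by two raters.
codes_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
codes_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
agreement = percent_agreement(codes_a, codes_b)
kappa = cohens_kappa(codes_a, codes_b)
```

In this invented example the raters agree on nine of ten items, but kappa is lower than the raw agreement because part of that agreement would be expected by chance.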

The assessment was both quantitative and qualitative. In a first step, we focused on the quantitative analysis, in which each individual category was scored. The resulting subtotals of the three central areas, linguistic part (max. 52 points), use of material (max. 45 points) and quality of argumentation (max. 8 points per argument), provided initial information about the present learning situation and challenges of the students in material-based argumentative writing. Subsequently, the individual subtotals could be added up to a total score, which in turn permits statements about the overall quality of the individual texts. Across all students' texts, the total scores covered a wide range (8-77 points). In the first survey of the SpiGU project, about 3/4 of the students (74%) scored between 20 and 40 total points, which indicates that a large proportion of the students had great difficulty in formulating valid material-based arguments.
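The aggregation described above can be sketched as follows. The maxima are those stated in the text; the function names, the inclusive reading of "between 20 and 40" and all example scores are assumptions made for illustration and are not data from the SpiGU project.

```python
from typing import List

MAX_LINGUISTIC = 52    # linguistic part
MAX_MATERIAL = 45      # use of material
MAX_PER_ARGUMENT = 8   # quality of argumentation, per counted argument


def total_score(linguistic: int, material: int, argument_scores: List[int]) -> int:
    """Add the three subtotals; argumentation is summed over all counted arguments."""
    if not 0 <= linguistic <= MAX_LINGUISTIC:
        raise ValueError("Linguistic subtotal out of range.")
    if not 0 <= material <= MAX_MATERIAL:
        raise ValueError("Material subtotal out of range.")
    if any(not 0 <= s <= MAX_PER_ARGUMENT for s in argument_scores):
        raise ValueError("Argument score out of range.")
    return linguistic + material + sum(argument_scores)


def share_in_range(totals: List[int], low: int, high: int) -> float:
    """Proportion of texts whose total lies between low and high (inclusive, by assumption)."""
    if not totals:
        raise ValueError("No totals given.")
    return sum(low <= t <= high for t in totals) / len(totals)


# Invented example: one text with two counted arguments.
example_total = total_score(linguistic=20, material=10, argument_scores=[5, 3])

# Invented totals for four texts, used only to show the range check.
example_totals = [8, 25, example_total, 77]
example_share = share_in_range(example_totals, 20, 40)
```

Because the argumentation subtotal depends on the number of counted arguments, the total has no fixed maximum in this sketch.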

In order to reconstruct the present learning situation and the challenges faced by the students in an empirically more differentiated way, the quantitative assessment was supplemented by qualitative evaluation. For this purpose, the individual categories of the three central areas (linguistic part; use of material; quality of argumentation) were examined in depth. For example, zooming in on the linguistic perspective, the argumentative language operations (positioning, justification, etc.) and their respective text procedures can be examined group-specifically; i.e. the individual groups are first considered separately and then compared with each other ((i) all students; (ii) students with a special enhancement focus; (iii) students without a special focus) with regard to questions like: What percentage of the texts show the individual argumentative language operations? Furthermore, the language operations that were used predominantly in all texts (i.e. positioning and justifying) as well as those that rarely occurred were determined. In a next step, we investigated the functional use of the occurring operations, and we checked the language operations that are primary for written argumentation (e.g. justifying) with respect to the quality of their occurrence (i.e. What does the justification refer to in the context of the conflict? To the material or to general knowledge? And what is the justification used for? To support the student's own position? Etc.). Finally, for the grammar section, an error analysis concerning syntax, use of conjunctions, use of textual references and punctuation was also carried out.

In a similar way, a qualitative analysis in the areas of quality of argumentation and material use allows for insights on various levels. In the latter, we dealt, for example, with questions such as: Which material and which types of material were most frequently used by the students? How were the materials linked to each other (monotextual, polytextual)? And which actors/positions were most frequently named in relation to the conflict? Here, too, the individual groups could be considered separately and then compared with each other.

To conclude: The presented interdisciplinary text analysis grid for material-based argumentative texts is complex and its application requires raters who are trained accordingly. For research purposes, however, the instrument answers a methodological desideratum and promises substantial insights on both the quantitative and the qualitative level, precisely because of its interdisciplinary complexity. That is, it permits differentiated comparisons of various language- and subject-related components of argumentative texts and, beyond that, evaluates the reference to the materials. Thus, the tool allows for precise statements about the students’ text qualities, i.e. statements about their mastery of argumentative writing in a material-based setting. Concretely, in the context of the SpiGU research project, the text analysis grid serves as a central means of deducing challenge profiles for students with and without an enhancement focus in order to address their difficulties by developing supportive didactic material.

6 Conclusion and prospect on further research

Currently, there is a particular need for further empirical research on task deployment. Further quasi-experiments on the effect of task characteristics are just as desirable as the investigation of the (so far almost unexplored) question of which tasks are used in what way and with what effects in 'natural' German and/or Geography lessons. The combination of field-oriented task development and research on the use of tasks, also in cooperation with teachers, e.g. according to the Design-Based-Research approach, is a further research desideratum. In the context of tasks for material-based argumentative writing, a central need concerns the interdisciplinary development of methodological instruments for the analysis of writing processes and text products. Only when we have a better understanding of the challenges faced by different students when dealing with corresponding tasks can learning tasks with appropriate support formats be developed.

In this paper, we presented a text analysis grid that was specifically developed for the research-related analysis of text products from material-based argumentative writing tasks in geography lessons. In this context, the grid closes a methodological gap and has already proven to be a suitable instrument with satisfactory interrater reliability, allowing for a differentiated qualitative and quantitative analysis of the language- and subject-related text levels. In addition, we would also claim that the grid is suitable or can be adapted for text product research in other subject areas. In this sense, future research should not only further test the grid in the context of writing tasks in geography lessons; it could also explore the possibilities of adapting the tool for other subjects in which material-based argumentative writing plays a role. Both tasks require an interdisciplinary perspective. It is precisely this interdisciplinary perspective that we would like to highlight in conclusion as central to research in the field of material-based writing and to the development of appropriate methodological research tools.



Dr. Diana Gebele

teaches German linguistics and didactics at the University of Cologne. Her main research areas include second language acquisition, teaching German as a second language as well as learning and teaching German in inclusive classes.

Sarah Schwerdtfeger

has been a PhD candidate at the Institute for Geography Didactics at the University of Cologne since 2019. Her main research interest is language-sensitive geography teaching.

Prof. Dr. Alexandra L. Zepter (Ph.D.)

teaches Linguistics, German Language and Didactics at the University of Cologne. Her main research interests include learning German as a second language as well as teaching and learning German in heterogeneous and multilingual classrooms and under the special conditions of inclusion.

Prof. Dr. Alexandra Budke

is professor of geography education at the Institute for Geography Education at the University of Cologne. Her research focuses on argumentation, language-sensitive geography teaching and digital media in geography education.


Pia Königs

is a research assistant and doctoral candidate at the Institute for German Language and Literature II at the University of Cologne, where her dissertation project focuses on mechanisms of written discourse synthesis in the context of argumentative material-based writing in inclusive (Geography) classes.


  1. The majority of the students are multilingual with German as a second language; three students have an enhancement focus on learning and/or social-emotional development.

  2. To determine skills in multiple document comprehension (MDC), an MDC test might be of relevance as an alternative; however, to our knowledge, this is currently only available in standardized form for students; see Schoor et al. (2020).