Elementary Science

Introduction

There is a critical need to identify what young learners know and can do in early elementary science and engineering. Yet researchers lack a centralized source of high-quality assessments specifically designed for PreK–5 students. The goal of this collection is to help researchers understand what assessments already exist for early elementary science and to inform their selection of tools for their studies.

Collection Instruments

  • Exploratory Behavior Scale (EBS)

    Expert Notes
    Strengths:

    Captures the quality of young children’s hands-on exploration (e.g., passive contact vs. active manipulation vs. exploratory variation), providing more nuanced insight into learning-related behavior than simple time-based measures.

    Cautions:

    Relies on observational coding of children’s behavior, which requires trained observers and may introduce subjectivity or variability in scoring across different raters or settings.

    The Exploratory Behavior Scale (EBS) is an observational assessment tool designed to measure young children’s hands-on exploratory behavior in interactive learning environments. The instrument is designed to capture the quality of children’s interaction with materials and to enable comparisons…
  • Preschool Assessment of Science (PAS)

    Expert Notes
    Strengths:

    Performance-based and structured as a sequence of prediction–observation–prediction tasks, allowing it to capture children’s scientific reasoning processes and how they revise their thinking based on evidence.

    Cautions:

    Uses a relatively small number of items per task, which may limit reliability and the ability to capture the full range of children’s scientific understanding and variability in performance.

    The Preschool Assessment of Science (PAS) is a performance-based assessment designed to measure preschool children’s science knowledge and inquiry skills through hands-on tasks. The instrument focuses on children’s ability to engage in a cycle of scientific reasoning, including making predictions,…
  • Custom-built electronic toy to test causal structure knowledge

    Expert Notes
    Strengths:

    Uses a hands-on, manipulable physical system to elicit children’s reasoning, allowing researchers to directly observe how preschoolers infer and test causal relationships rather than relying solely on verbal explanations or abstract questions.

    Cautions:

    The custom-built device may be a barrier for some educators looking to use this assessment.

    This assessment is a task-based instrument designed to measure preschool children’s understanding of causal relationships through structured interaction with a physical system. The instrument uses a custom-built device consisting of two gears and a switch, where different underlying causal…
  • Life Science Assessment

    Expert Notes
    Strengths:

    The LSA uses photographs paired with open-ended, oral questions, which is developmentally appropriate for preschool children and allows them to express their understanding without relying on reading or writing skills.

    Cautions:

    Because the LSA relies on open-ended, verbally administered responses, scoring can be subjective and may vary between raters.

    The Life Science Assessment provides insight into children’s emerging inquiry skills and conceptual development by focusing on how they reason about real-world phenomena, rather than simply what they know. The assessment consists of structured, hands-on tasks centered on familiar physical science…

In this Collection

Key to instrument features:

  • multi-modal response
  • physical manipulatives
  • measures multi-dimensional learning
  • one-on-one or small group
  • artifacts
  • practitioner-friendly

Instruments are organized by grade band and, within each band, by discipline.

Grade bands:

  • PreK
  • K-2 (Lower Elementary)
  • 3-5 (Upper Elementary)

Disciplines:

  • Computational Thinking
  • Earth and Space Science
  • Engineering / Tech or Robotics
  • Life Science
  • Physical Science
  • Science (General)

Collection Guidance

This section offers a concise primer for researchers looking to collect data on Pre-K through 5th grade students’ knowledge and skills in the science and engineering domains. It highlights key considerations and design choices that commonly arise when selecting or using standalone science and engineering assessments.

What should I know about collecting data on elementary students’ science knowledge and skills?

Influence of federal or state priorities: Following the National Research Council’s release of A Framework for K–12 Science Education in 2012 and the rollout of the Next Generation Science Standards (NGSS) starting in 2013, science education has moved beyond rote memorization toward a three-dimensional approach that blends science and engineering practices, crosscutting concepts, and disciplinary core ideas. Science assessment has followed, albeit at a slower pace, by incorporating this multidimensionality.

Tensions and debates:

(1) What is measured in science learning: A central limitation of the current landscape is the narrow conceptualization of science learning. Although NGSS emphasizes three-dimensional learning, most assessments continue to focus solely on isolated content knowledge without integrating science and engineering practices or crosscutting concepts.

(2) How science learning is measured: Many of the assessments within this collection remain largely traditional in format, relying heavily on paper-and-pencil tests and text-based item formats such as constructed-response, multiple-choice, or other selected-response items. Others employ multi-modal, performance-based, or interactive formats, which are better suited to capturing complex, practice-based learning.

(3) How results are scored or used: First, variability in the reporting of reliability and validity evidence raises concerns about the consistency and defensibility of score interpretations. Second, limited reporting of sample characteristics, particularly for historically underserved populations, restricts understanding of how assessments function across diverse groups.

Gaps in Measurement Tool Availability: Most assessments were developed and used in the United States (83%), with limited international representation. All identified assessments were available in English; only a small number also offered translations into other languages, such as Spanish, German, French, and Portuguese. Reporting of sample characteristics was inconsistent. While many studies reported sample size (68%) and grade or age information (66%), fewer reported school type (37%), geographic location (39%), gender (32%), race (37%), or socioeconomic indicators (17%). Information on specific student populations, such as English Language Learners (20%) and students with disabilities (10%), was particularly limited.

Grade-level coverage was concentrated in the upper elementary grades. Assessments most frequently targeted grades 3 through 5, with 43% including third grade, 40% fourth grade, and 48% fifth grade. In contrast, fewer assessments were designed for early childhood and primary grades, such as Pre-K (13%) and kindergarten (18%). This pattern highlights a persistent gap in assessment tools for younger learners, despite increasing emphasis on early science education. 

Across disciplines, assessments were most frequently situated in the physical sciences (38%), life sciences (33%), and general science (28%), with more limited representation in earth and space science (18%). A smaller subset of assessments addressed engineering and technology (18%) and computational thinking (18%), while robotics (3%) and science inquiry (5%) were minimally represented. Although the majority of assessments were developed between 2016 and 2025, few referenced the Next Generation Science Standards (NGSS). Fewer than one-third of assessments (30%) explicitly referenced NGSS, and among those that did, coverage of the three dimensions varied. Disciplinary Core Ideas were most frequently represented (100% of the NGSS-referenced assessments), followed by Science and Engineering Practices (92%) and Crosscutting Concepts (62%). A small subset of these assessments (n=3, 23%) included connections to engineering, technology, and society.

Notably, only 33% of these assessments measured multidimensional learning, indicating that most tools continue to assess science learning in a unidimensional manner. 

There is a need for more science assessments in the PreK-2 space. While physical sciences and life sciences are well represented, there are fewer Earth and space science assessments.

Engineering assessments are better distributed across K-5. As in science, there are fewer Pre-K assessments. Computational thinking is well represented as a domain, while other engineering constructs are underrepresented.

What does this collection include?

Year of Publication
    Inclusion Criteria: Assessments must be published in or after 2004.
    Examples of Excluded Assessments: Assessments published before 2004.

Content Focus
    Inclusion Criteria: Assessments center on science and/or engineering content (including computational thinking). This includes 3D measures aligned with NGSS disciplines (LS, PS, ESS, ETS) as well as practices and CCCs in the context of science and/or engineering. Measures of computational thinking and robotics were counted as engineering assessments.
    Examples of Excluded Assessments: Instruments on science attitudes, motivation, self-efficacy, or classroom environments. See the Science Instruction and Identity Collection for some of these instruments.

Grade Level
    Inclusion Criteria: Assessments must be applicable to children in PreK-5.
    Examples of Excluded Assessments: Assessments that are designed exclusively for students in grade 6 or above, or that explicitly state they are for students in middle school, high school, or college.

Unit of Analysis
    Inclusion Criteria: Assessments can be administered at the individual level. At this stage of the search, we also include observational instruments of children.
    Examples of Excluded Assessments: Collaborative or group assessments. Classroom observations.

Standalone Assessment
    Inclusion Criteria: Assessments must be usable independently. Assessments should not be tied to a curriculum, unit, or lesson plan. Any content tied to the assessment should not instruct but only provide background for answering questions (e.g., scenarios).
    Examples of Excluded Assessments: Assessments that are attached to a curriculum or that reference specific instruction.

Language and Geography
    Inclusion Criteria: Assessments must be available in English and suitable for use with children in the United States. All assessments need to be developed in the United States and/or validated in international settings where the dominant language is English.
    Examples of Excluded Assessments: Assessments for which either the description or the validation research is available in English, but not both. Assessments developed internationally and validated exclusively on populations where the dominant language is not English.

Purpose
    Inclusion Criteria: Assessments can be used for research and must not be intended for use exclusively in classroom or school contexts.
    Examples of Excluded Assessments: Assessments designed exclusively for teachers to inform instruction or curriculum design.

Development Process
    Inclusion Criteria: Assessments developed by assessment experts, consortia, or researchers (can include teachers as part of a larger team).
    Examples of Excluded Assessments: Assessments developed only by teachers and intended for use only in a specific class or school context.
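
For researchers who want to apply similar screening to their own list of candidate instruments, the criteria above can be sketched as a simple checklist in code. This is only an illustrative sketch, not the procedure used to build this collection: the `Candidate` record, its field names, and the helper functions are hypothetical, and each criterion is reduced to a single yes/no judgment that a human reviewer would still need to make (the Language and Geography criterion in particular is simplified to one flag).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    name: str
    year_published: int
    science_or_engineering_content: bool  # Content Focus
    applicable_to_prek_5: bool            # Grade Level
    individually_administered: bool       # Unit of Analysis
    standalone: bool                      # Standalone Assessment
    english_and_us_suitable: bool         # Language and Geography (simplified)
    usable_for_research: bool             # Purpose
    expert_developed: bool                # Development Process


def failed_criteria(c: Candidate) -> list[str]:
    """Return the names of the inclusion criteria that the candidate fails."""
    checks = {
        "Year of Publication": c.year_published >= 2004,
        "Content Focus": c.science_or_engineering_content,
        "Grade Level": c.applicable_to_prek_5,
        "Unit of Analysis": c.individually_administered,
        "Standalone Assessment": c.standalone,
        "Language and Geography": c.english_and_us_suitable,
        "Purpose": c.usable_for_research,
        "Development Process": c.expert_developed,
    }
    return [criterion for criterion, passed in checks.items() if not passed]


def is_included(c: Candidate) -> bool:
    """A candidate is included only if it fails none of the criteria."""
    return not failed_criteria(c)


if __name__ == "__main__":
    # Made-up example: a 2001 attitudes survey fails two criteria.
    survey = Candidate(
        name="Example attitudes survey (made up)",
        year_published=2001,
        science_or_engineering_content=False,
        applicable_to_prek_5=True,
        individually_administered=True,
        standalone=True,
        english_and_us_suitable=True,
        usable_for_research=True,
        expert_developed=True,
    )
    print(failed_criteria(survey))
```

Returning the list of failed criteria, rather than a bare true/false, keeps a record of why each instrument was excluded, which mirrors the "Examples of Excluded Assessments" entries above.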