This section offers a concise primer for researchers looking to collect data on Pre-K through 5th grade students’ knowledge and skills in the science and engineering domains. It highlights key considerations and design choices that commonly arise when selecting or using standalone science and engineering assessments.
Introduction
There is a critical need to identify what young learners know and can do in early elementary science and engineering. Yet researchers lack a centralized source of high-quality assessments specifically designed for PreK–5 students. The goal of this collection is to help researchers understand what assessments already exist for early elementary science and to inform their selection of tools for their studies.
Collection Instruments
Exploratory Behavior Scale (EBS)
Expert Notes
Strengths: Captures the quality of young children’s hands-on exploration (e.g., passive contact vs. active manipulation vs. exploratory variation), providing more nuanced insight into learning-related behavior than simple time-based measures.
Cautions: Relies on observational coding of children’s behavior, which requires trained observers and may introduce subjectivity or variability in scoring across different raters or settings.
Topics: Student Learning
The Exploratory Behavior Scale (EBS) is an observational assessment tool designed to measure young children’s hands-on exploratory behavior in interactive learning environments. The instrument is designed to capture the quality of children’s interaction with materials and to enable comparisons…
Preschool Assessment of Science (PAS)
Expert Notes
Strengths: Performance-based and structured as a sequence of prediction–observation–prediction tasks, allowing it to capture children’s scientific reasoning processes and how they revise their thinking based on evidence.
Cautions: Uses a relatively small number of items per task, which may limit reliability and the ability to capture the full range of children’s scientific understanding and variability in performance.
Topics: Student Learning
The Preschool Assessment of Science (PAS) is a performance-based assessment designed to measure preschool children’s science knowledge and inquiry skills through hands-on tasks. The instrument focuses on children’s ability to engage in a cycle of scientific reasoning, including making predictions,…
Custom-built electronic toy to test causal structure knowledge
Expert Notes
Strengths: Uses a hands-on, manipulable physical system to elicit children’s reasoning, allowing researchers to directly observe how preschoolers infer and test causal relationships rather than relying solely on verbal explanations or abstract questions.
Cautions: The custom-built device may be a barrier for some educators looking to use this assessment.
Topics: Student Learning
This assessment is a task-based instrument designed to measure preschool children’s understanding of causal relationships through structured interaction with a physical system. The instrument uses a custom-built device consisting of two gears and a switch, where different underlying causal…
Life Science Assessment
Expert Notes
Strengths: The LSA uses photographs paired with open-ended, oral questions, which is developmentally appropriate for preschool children and allows them to express their understanding without relying on reading or writing skills.
Cautions: Because the LSA relies on open-ended, verbally administered responses, scoring can be subjective and may vary between raters.
Topics: Student Learning
The Life Science Assessment provides insight into children’s emerging inquiry skills and conceptual development by focusing on how they reason about real-world phenomena, rather than simply what they know. The assessment consists of structured, hands-on tasks centered on familiar physical science…
Collection Guidance
Influence of federal or state priorities: Following the National Research Council’s release of A Framework for K–12 Science Education in 2012 and the rollout of the Next Generation Science Standards (NGSS) starting in 2013, science education has moved beyond rote memorization toward a three-dimensional approach that blends science and engineering practices, crosscutting concepts, and disciplinary core ideas. Science assessment has followed, albeit at a slower pace, by beginning to incorporate this multidimensionality.
Tensions and debates:
(1) What is measured in science learning: A central limitation of the current landscape is the narrow conceptualization of science learning. Although NGSS emphasizes three-dimensional learning, most assessments continue to focus solely on isolated content knowledge without integrating science and engineering practices or crosscutting concepts.
(2) How science learning is measured: Many of the assessments in this collection remain largely traditional in format, relying heavily on paper-and-pencil tests and text-based item formats such as constructed-response, multiple-choice, or other selected-response items. Other assessments in this collection use multimodal, performance-based, or interactive formats, which are better suited to capturing complex, practice-based learning.
(3) How results are scored or used: First, variability in the reporting of reliability and validity raises concerns about the consistency and defensibility of score interpretations. Second, limited reporting of sample characteristics, particularly for historically underserved populations, restricts understanding of how assessments function across diverse groups.
Gaps in Measurement Tool Availability: Most assessments were developed and used in the United States (83%), with limited international representation. All identified assessments were available primarily in English; only a small number offered translations into other languages, such as Spanish, German, French, and Portuguese. Reporting of sample characteristics was inconsistent. While many studies included sample size (68%) and grade or age information (66%), fewer reported school type (37%), geographic location (39%), gender (32%), race (37%), or socioeconomic indicators (17%). Information on specific student populations, such as English Language Learners (20%) and students with disabilities (10%), was particularly limited.
Grade-level coverage was concentrated in the upper elementary grades. Assessments most frequently targeted grades 3 through 5, with 43% including third grade, 40% fourth grade, and 48% fifth grade. In contrast, fewer assessments were designed for early childhood and primary grades, such as Pre-K (13%) and kindergarten (18%). This pattern highlights a persistent gap in assessment tools for younger learners, despite increasing emphasis on early science education.
Across disciplines, assessments were most frequently situated in the physical sciences (38%), life sciences (33%), and general science (28%), with more limited representation in earth and space science (18%). A smaller subset of assessments addressed engineering and technology (18%) and computational thinking (18%), while robotics (3%) and science inquiry (5%) were minimally represented. Although the majority of assessments were developed between 2016 and 2025, references to the Next Generation Science Standards (NGSS) were infrequent. Fewer than one-third of assessments (30%) explicitly referenced NGSS, and among those that did, coverage of the three dimensions varied. Disciplinary Core Ideas were most frequently represented (100% of the NGSS-referenced assessments), followed by Science and Engineering Practices (92%) and Crosscutting Concepts (62%), with a small subset of assessments (n=3; 23%) including connections to engineering, technology, and society.
Notably, only 33% of these assessments measured multidimensional learning, indicating that most tools continue to assess science learning in a unidimensional manner.
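The NGSS dimension percentages reported above can be cross-checked with simple arithmetic. The sketch below is an illustration only, not part of the original analysis: it assumes the percentages were rounded to the nearest whole number and that all four share the same denominator (the NGSS-referencing subset). The function names are hypothetical. Under those assumptions, the one stated count (n=3 at 23%) implies a subset size, from which the other counts can be estimated.

```python
def consistent_denominators(count, reported_pct, max_n=200):
    """Return every subset size N for which count/N rounds to reported_pct."""
    return [
        n for n in range(count, max_n + 1)
        if round(100 * count / n) == reported_pct
    ]


def implied_count(reported_pct, n):
    """Estimate the whole-number count behind a rounded percentage of n."""
    return round(reported_pct * n / 100)


# Stated in the text: n=3 assessments correspond to 23% of the NGSS subset.
candidates = consistent_denominators(3, 23)
print("Subset sizes consistent with n=3 at 23%:", candidates)

# Assumption: the other reported percentages use the same denominator.
subset_size = candidates[0]
reported = {
    "Disciplinary Core Ideas": 100,
    "Science and Engineering Practices": 92,
    "Crosscutting Concepts": 62,
}
implied = {name: implied_count(pct, subset_size) for name, pct in reported.items()}
for name, k in implied.items():
    # Round-trip check: does the implied count reproduce the reported percentage?
    matches = round(100 * k / subset_size) == reported[name]
    print(f"{name}: about {k} of {subset_size} (reproduces reported %: {matches})")
```

A check like this is useful when reusing the summary statistics, because it makes explicit which denominator each percentage refers to.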
There is a need for more science assessments in the PreK–2 space. While the physical sciences and life sciences are well represented, earth and space science assessments are fewer.
Engineering assessments are better distributed across K–5. As in science, Pre-K assessments are fewer. Computational thinking is well represented as a domain, while other engineering constructs are underrepresented.