Data sgp is an open-source platform for aggregating and analyzing large-scale educational data sets. Its goal is to give researchers across the country and beyond access to large quantities of raw student data, along with the tools needed to extract meaning from it. Unlike traditional research databases that rely on standard database structures and query languages, data sgp is a custom relational database engine designed to handle very large, complex datasets that must be queried with a high degree of flexibility.
Its underlying architecture lets it handle massive amounts of data and gives it several advantages over other platforms, including a highly scalable infrastructure, support for large numbers of concurrent users, and the ability to run sophisticated computational analyses at scale. It can also ingest and process datasets on the order of millions of records per day without losing analytical accuracy. By comparison, an analysis of global Facebook interactions may require thousands of servers to execute.
Despite its size, the sgp database remains relatively easy to work with. It uses simple, SQL-like commands and is fully accessible from a Python IDE. It also supports multiple Python modules and scripts for different tasks, so each user can customize the data sgp workflow.
The most commonly used functions for data sgp are studentGrowthPercentiles and studentGrowthProjections. Higher-level wrapper functions are intended to simplify and extend these lower-level functions, allowing them to be run more operationally and at greater scale. The higher-level functions can operate on both WIDE and LONG formatted data, but the LONG format is strongly recommended for everything except the simplest one-off analyses.
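To make the WIDE versus LONG distinction concrete, here is a minimal Python sketch of the reshaping step. It is an illustration only and not part of data sgp: the column names (`ID`, `SS_2022`, `YEAR`, `SCALE_SCORE`) and the helper `wide_to_long` are hypothetical, chosen for this example.

```python
# Hypothetical WIDE records: one row per student, one score column per year.
wide_rows = [
    {"ID": "A", "SS_2022": 230, "SS_2023": 240},
    {"ID": "B", "SS_2022": 250, "SS_2023": None},  # no 2023 score on file
]

def wide_to_long(rows, id_key="ID", prefix="SS_"):
    """Reshape one-row-per-student records into one row per student per year."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            # Keep only score columns that actually hold a score.
            if key.startswith(prefix) and value is not None:
                long_rows.append({
                    id_key: row[id_key],
                    "YEAR": int(key[len(prefix):]),
                    "SCALE_SCORE": value,
                })
    return long_rows

long_rows = wide_to_long(wide_rows)
for record in long_rows:
    print(record)
```

In the LONG layout a new year of results is appended as new rows rather than as a new column, which is one plausible reason it suits repeated, operational analyses better than the WIDE layout.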
To calculate an SGP, a student’s MCAS scaled score history is compared with those of academic peers from previous MCAS administrations in the same subject area. This is why students with identical MCAS scaled scores can have different SGPs. For example, Students B and C may earn the same score on this year’s MCAS test in a subject area yet receive different SGPs because they have different academic peer groups.
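The peer-group idea can be illustrated with a deliberately simplified Python sketch. It is not the actual SGP calculation, which the text does not spell out. It assumes that "academic peers" means students with the same single prior score, and it uses made-up scores and a hypothetical helper named `peer_percentile`.

```python
def peer_percentile(prior, current, cohort):
    """Percent of academic peers (same prior score) whose current score is lower."""
    peers = [cur for pri, cur in cohort if pri == prior]
    if not peers:
        return None  # no academic peers to compare against
    below = sum(1 for cur in peers if cur < current)
    return round(100 * below / len(peers))

# Made-up (prior_score, current_score) pairs from earlier administrations.
cohort = [
    (220, 230), (220, 236), (220, 242), (220, 248),
    (250, 236), (250, 244), (250, 252), (250, 260),
]

# Students B and C both score 245 this year but have different prior scores.
sgp_b = peer_percentile(220, 245, cohort)
sgp_c = peer_percentile(250, 245, cohort)
print(sgp_b, sgp_c)
```

Under these assumptions Student B, whose peers started lower, lands at a higher percentile than Student C, even though both earned the same score this year.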
For each student, the current SGP is calculated from at least two assessments taken in different testing windows (fall, winter, or spring; the testing windows do not have to correspond to a school’s or district’s school year). The data set sgpData_INSTRUCTOR_NUMBER is an anonymized student-instructor lookup table that identifies the instructor information associated with each student’s test record.
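The requirement of at least two assessments from different testing windows can be expressed as a small eligibility check. The sketch below is an assumption-laden illustration: the record fields (`year`, `window`, `score`) and the helper `has_enough_history` are hypothetical, and a testing window is assumed to be identified by its year and season together.

```python
def has_enough_history(records):
    """True if the records span at least two distinct testing windows."""
    # Assumption: a window is uniquely identified by (year, season).
    windows = {(r["year"], r["window"]) for r in records}
    return len(windows) >= 2

# Made-up assessment histories keyed by student ID.
history = {
    "S1": [
        {"year": 2023, "window": "fall", "score": 410},
        {"year": 2024, "window": "winter", "score": 432},
    ],
    "S2": [
        {"year": 2024, "window": "spring", "score": 455},
    ],
}

eligible = sorted(sid for sid, recs in history.items() if has_enough_history(recs))
print(eligible)
```

Counting distinct windows rather than raw records means two scores from the same window would not, on their own, make a student eligible.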