State

The study determined the dimensionality of the 2017 National Examinations Council (NECO) English Language multiple-choice test items and estimated the item parameter indices (discrimination, difficulty, guessing and carelessness) using the four-parameter logistic model (4-PLM). The ex-post facto design was employed. The population comprised all candidates who enrolled and sat for the June/July 2017 NECO Senior School Certificate Examination (SSCE) in English Language in Kwara State, Nigeria, from which a sample of 12,000 candidates was purposively selected across sixteen Local Government Areas in the State. The research instruments were the Optical Marks Record Sheets for the NECO June/July 2017 English Language objective items. The responses of the testees were scored dichotomously, and the data collected were calibrated using the 4-PLM. The results showed that the 2017 English Language multiple-choice items administered to SSCE students in Kwara State did not violate the assumption of unidimensionality, which made the items reliable for assessing students' knowledge of English Language. The results also showed that only two items suited the 4-PLM based on the rule of thumb, while the remaining items did not. It was recommended, among others, that NECO and other examination bodies should intensify efforts toward improving the standard of English Language test items using the 4-PLM, which is the new trend for estimating item parameter indices.


Introduction
English Language as a school subject plays a strategic role in the school system because almost all school subjects are taught in English in countries where English is an official language. A child cannot learn most elementary facts or ideas unless he or she understands the language in which these ideas are expressed. Thus, at the Senior School Certificate Examination (SSCE) level, a credit pass or failure in English Language determines the educational advancement of Senior Secondary School (SSS) students. Students must have credits in Mathematics and English Language, which are major requirements for most courses, before admission into tertiary institutions. Given this requirement, there is a need to determine the level of achievement in both subjects across the states in Nigeria based on the geopolitical zones, for a clear picture of the entire situation (Atanda, 2011). The alarming rate of failure and poor performance of students in English Language in most external examinations has been a source of concern to education stakeholders (including parents, teachers, educators and researchers). The West African Examinations Council (WAEC) in 2014 traced the persistent poor performance of students in the examination to a lack of adequate preparation. Many scholars also attribute the poor performance of students in English Language to learners' unwillingness to read or to a lack of adequate reading materials to engage with. Many students find it difficult to interact meaningfully with the reading materials before them. Thus, the consistently poor performance of students in external examinations such as those of WAEC and the National Examinations Council (NECO), particularly in English Language, raises serious concern.
Psychological and educational assessments are frequently used to observe a sample of a test taker's behaviour. Most educational psychologists are concerned with assessing the capabilities and abilities of test takers and, as a result, with understanding how a test taker's ability influences the correctness of an answer to an item (Lord, 2012). An examinee who has the required knowledge is expected to produce a correct response to a test item, whereas an examinee without the required knowledge is expected to give an incorrect answer. However, in the case of multiple-choice assessments, this known-correct assumption may not always reflect what actually happens. An examinee's responses on a multiple-choice test can be divided into three categories: responses that reflect the examinee's genuine ability (whether correct or incorrect); correct responses resulting from lucky guesses; and incorrect responses arising from nervousness, carelessness, or distraction. Because the latter two sorts of aberrant responses do not reflect the examinee's actual knowledge, they may lead to an incorrect evaluation of the examinee's true ability. Because test items were often weighted equally in traditional tests, the impact of aberrant responses was limited. In the case of item response theory, on the other hand, lucky guesses and careless errors can lead to estimation bias (Liao, Ho, Yen, & Cheng, 2012).

Literature Review
In the social sciences, item response theory (IRT) models are commonly employed. Although IRT models were first used in education, they are now used in a variety of fields, including personality research (Loken & Rulison, 2010). In a multiple-choice examination, IRT is critical for scale development and for generating accurate latent trait estimates. With the rise in IRT applications comes the need to analyse carefully the various parametric forms of IRT models, as well as their interpretation and implications for conclusions. Modelling based on IRT has a long history and a large body of work (Baker & Kim, 2004). IRT is a method of measuring a hypothesized latent construct, such as ability or attitude, in current educational and psychological settings. These latent characteristics cannot be measured directly; instead, they have to be measured through responses to items or questions in a test or survey. IRT methods are widely used to derive latent scores for individual respondents on traits, abilities, competencies, or attitudes. In a testing environment, IRT is arguably best understood in terms of latent trait ability. In fact, educational testing was one of the first areas where IRT was used. The IRT scoring method takes into account the respondent's latent variable as well as the item's difficulty and discrimination characteristics. IRT is employed in a variety of domains, including psychometrics, educational sciences, sociology, health professions, and computerized adaptive testing (CAT). In addition, because IRT models employ information about an item's attributes to evaluate and refresh an instrument, they can be used in test or instrument development.
Aberrant responses may lead to ability estimation errors and, as a consequence, inappropriate items may be selected. The three-parameter logistic (3PL) model is an appropriate choice for evaluating an examinee's ability in circumstances where guessing is likely to affect the examinee's test responses (Amarnani, 2009). To model the effect of guessing, the 3PL model includes a guessing parameter (the lower asymptote).
According to Ojerinde (2013), a latent trait is conceived as a variable that is not directly observable yet measurably affects observable attributes. By observing these attributes, it is possible to make inferences about the presence or extent of the trait through standard measurements, which are the test items. The relationship between these items and the latent trait is assumed to be direct, and the items are assumed to be conditionally independent. The response to an item ought to be accounted for completely by this latent trait. Any covariance among the items is due to their common dependence on the assumed latent trait, and there ought to be no covariance among item responses along any other latent dimension for the criterion of objectivity to be fulfilled. A later conceptualisation of the relationship between the response to an item and the ability possessed by an individual is item response theory, which is probabilistic in its approach. The general forms of the IRT models are as follows. The Rasch model is defined as

$$P_i(\theta_j) = \frac{e^{(\theta_j - b_i)}}{1 + e^{(\theta_j - b_i)}}$$

where $e$ denotes the constant 2.718, $\theta_j$ is the ability level and $b_i$ is the difficulty parameter of item $i$. The item difficulty parameter $b_i$ is the location parameter, since it reflects the ability level at which an examinee has a 50% probability of answering the item correctly. Although the theoretical range of $b_i$ is from $-\infty$ to $+\infty$, the typical range is assumed to be from -3 to +3. The two-parameter logistic model adds one item parameter, the discrimination parameter, to the Rasch model and is defined as

$$P_i(\theta_j) = \frac{e^{1.7a_i(\theta_j - b_i)}}{1 + e^{1.7a_i(\theta_j - b_i)}}$$

where $a_i$ is the discrimination parameter, which determines the slope or steepness of the item characteristic curve (ICC), and 1.7 is the scaling constant. While the theoretical range of $a_i$ is $-\infty < a_i < +\infty$, the value of $a_i$ for the correct answer to an item is generally positive, and the value seen in practice is normally less than 2.5 (Baker & Kim, 2004). A low $a_i$ value and a relatively flat ICC indicate that the item is ineffective at distinguishing between different ability levels. The three-parameter model's equation is

$$P_i(\theta_j) = c_i + (1 - c_i)\frac{e^{1.7a_i(\theta_j - b_i)}}{1 + e^{1.7a_i(\theta_j - b_i)}}$$

where $c_i$ is the pseudo-guessing parameter, which reflects the probability of answering the item correctly by guessing alone.
Barton and Lord (1981) proposed the 4PL IRT model, which adds a parameter for the upper asymptote of the item characteristic curve. This model accounts for unexpected incorrect responses (misses) by examinees with a high ability level, arising from nervousness or carelessness. In its general form, the probability of a correct response given the ability level is written as

$$P_{4PL}(\theta_j) = c_i + (d_i - c_i)\frac{e^{1.7a_i(\theta_j - b_i)}}{1 + e^{1.7a_i(\theta_j - b_i)}}$$

where $a_i$, $b_i$ and $c_i$ are the discrimination, difficulty and pseudo-guessing parameters of item $i$, and $d_i$ is the upper asymptote (i.e., the slipping parameter) of the item. Whereas the 3PL probability ranges from the lower asymptote $c_i$ to 1, $P_{4PL}(\theta_j)$ ranges from $c_i$ to $d_i$. Although Barton and Lord (1981) advocated a single upper asymptote for all test items, the general form of the 4PL model allows a distinct upper asymptote to be estimated for each test item. In the last decade, the one-, two-, and three-parameter logistic (1PL, 2PL, and 3PL) IRT models for dichotomous items have received considerable attention (Magis, 2013). Until recently, the 4PL IRT model was not widely used among practitioners and researchers because of a lack of evidence for its benefits, difficulties in estimating the upper asymptote, and the lack of computer software with which to implement it (Loken & Rulison, 2010). With the emergence of increasingly powerful software such as the "mirt" package in R, the 4PL IRT model has grown more prominent in recent years, notably in the literature on IRT and computerized adaptive testing (CAT) (Meng et al., 2019).
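The behaviour of the four-parameter model described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `p_4pl` and the example parameter values are hypothetical and are not estimates from this study; the 1.7 scaling constant follows the two-parameter model as described earlier.

```python
import math


def p_4pl(theta, a, b, c, d, scale=1.7):
    """Probability of a correct response under a 4PL model.

    theta: ability level; a: discrimination; b: difficulty;
    c: lower asymptote (guessing); d: upper asymptote (carelessness).
    """
    z = scale * a * (theta - b)
    logistic = 1.0 / (1.0 + math.exp(-z))
    return c + (d - c) * logistic


# Hypothetical item parameters, chosen only for illustration.
item = {"a": 1.2, "b": 0.0, "c": 0.2, "d": 0.95}

for theta in (-3.0, 0.0, 3.0):
    # The probability rises from near c towards d as ability increases.
    print(theta, round(p_4pl(theta, **item), 3))
```

At an ability equal to the item difficulty, the probability lies halfway between the two asymptotes; setting `c = 0` and `d = 1` reduces the function to the two-parameter form.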
Item response theory is based on three fundamental assumptions, the first of which is unidimensionality. Unidimensional item response models are those that assume a single latent ability. "What is necessary for the unidimensionality assumption to be met satisfactorily is the presence of one dominant factor that affects test performance," Adedoyin and Adedoyin (2013) said. The second assumption is local independence, which states that an examinee's chance of correctly answering a question is unaffected by the answers given to other questions on the test. The aim of investigating a test's internal structure is to demonstrate that all of the items operate together; hence, assessing dimensionality means determining the least number of latent ability domains defined in a test. According to Stevina (2011), the number of abilities or constructs measured by a test or a set of items is referred to as dimensionality in assessment. In light of the foregoing, Stevina (2011) defined dimensional structure as the relationship between the test items and the latent proficiencies that the test is supposed to measure.
According to McDonald (2000), the issue of dimensionality entails more than (successfully) identifying a set of proficiencies that account for item responses. He pointed out that, in addition to determining the number of dimensions that underpin the item responses, the relationship between the items and the dimensions is critical in dimensionality evaluation.
In assessment settings, a set of items is considered unidimensional if the data are based on a single trait, and multidimensional if the data are based on several traits. Multidimensional IRT (MIRT) is a mathematical model that describes the relationship between two or more unobservable variables, defined as dimensions, and the chance of an examinee successfully answering a specific test item (Ackerman, Gierl & Walker, 2003). Items on an examination may assess multiple ability domains; nevertheless, this is not an issue as long as the examination assesses the same composite for all students.
Testee-item interaction may cause different composites of ability to be measured for testees with varied backgrounds on particular examinations. Multidimensional models, like unidimensional models, are based on two assumptions: monotonicity and local independence. According to the monotonicity assumption, as an examinee's ability level rises, so does the likelihood of the examinee correctly answering any given test item (Smith, 2009). One of the primary goals of education in Nigeria, as stated in the National Policy on Education (FGN, 2004), is to prepare young people for future challenges and to develop them to satisfy the country's manpower demands. As a result, conducting examinations both inside and outside of schools as a basis for assessment becomes extremely important.
In this context, the study was conducted to determine the dimensionality of the 2017 NECO English Language multiple-choice test items and to estimate the item parameter indices (discrimination, difficulty, guessing and carelessness) using the four-parameter logistic model, in order to ascertain how well the items suit the 4-PLM, which is a new trend in estimating item parameter indices.
Based on the above objectives, the following research questions were raised. (a) What is the dimensionality of the 2017 NECO English Language multiple-choice test items among SSCE students in Kwara State? (b) What are the item parameter indices (discrimination, difficulty, guessing and carelessness) using the four-parameter logistic model?

Methodology
The ex-post facto design was employed for the study. The population for the study comprised all candidates/test-takers who enrolled and sat for the June/July 2017 National Examinations Council (NECO) SSCE English Language examination in Kwara State, Nigeria. The sample comprised 12,000 candidates who sat for the examination in the three senatorial districts of the state (i.e., Kwara South, Kwara North and Kwara Central). The sample was purposively selected from sixteen Local Government Areas in the state. This technique was used to ensure a fairly equal representation in the sample. The research instruments used for the study were the Optical Marks Record Sheets for the NECO June/July 2017 SSCE English Language objective items. The responses of the testees were scored dichotomously. The data collected were analysed using the Dimensionality Test (DIMTEST) package.
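Dichotomous scoring, as mentioned above, assigns 1 to a response that matches the answer key and 0 otherwise. A minimal Python sketch follows; the function name `score_dichotomously`, the five-item answer key and the response strings are hypothetical illustrations, not the actual NECO key or data.

```python
def score_dichotomously(responses, key):
    """Return a list of 1 (correct) or 0 (incorrect) for one examinee."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    # Any response that does not match the key, including a blank, scores 0.
    return [1 if r == k else 0 for r, k in zip(responses, key)]


# Hypothetical five-item answer key and one examinee's marked options.
answer_key = ["A", "C", "B", "D", "E"]
examinee = ["A", "C", "D", "D", ""]

scores = score_dichotomously(examinee, answer_key)
print(scores)       # item-level 0/1 scores
print(sum(scores))  # number-correct total
```

The resulting 0/1 matrix (one row per examinee, one column per item) is the form of input that dichotomous IRT models expect.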

Research Question One
What is the dimensionality of the 2017 NECO English Language multiple-choice test items among SSCE students in Kwara State?
To answer research question one, examinees' responses to the 2017 NECO English Language multiple-choice test items were subjected to a test of essential unidimensionality using the Dimensionality Test (DIMTEST) in the DIMPACK 1.0 package. The result of the test was used to examine the assumption of unidimensionality, that is, whether a subset of the 2017 NECO English Language test items forms a secondary dimension. This was done by dividing the test into two subtests, namely the Assessment Subtest (AT) and the Partitioning Subtest (PT). The AT consists of the items chosen as those that measure best in the direction most different from that of the PT items. The AT was empirically selected using the HCA/CCPROX cluster procedure and the DETECT statistic in DIMTEST. This item cluster was tested to determine whether it was dimensionally distinct from the remainder of the test. A random sample of 30% of the students' responses was used to select the AT, and the remaining 70% of the examinees' responses were used as PT. A p-value greater than 0.05 was interpreted as statistically non-significant, indicating essential unidimensionality. Table 1 showed that the AT was not dimensionally distinct from the remaining items of the test (T = 0.7426, p-value = 0.2289, one-tailed); therefore, the assumption of unidimensionality was upheld. This implies that the 2017 English Language multiple-choice items among SSCE students in Kwara State do not violate the assumption of unidimensionality.
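The random 30%/70% division of examinees described above can be sketched in Python. This sketch only illustrates the partitioning step and is not the DIMTEST procedure itself; the function name `split_examinees`, the seed value and the use of row indices as examinee identifiers are assumptions made for illustration.

```python
import random


def split_examinees(n_examinees, selection_fraction=0.30, seed=2017):
    """Randomly split examinee indices into two disjoint groups."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ids = list(range(n_examinees))
    rng.shuffle(ids)
    cut = round(n_examinees * selection_fraction)
    return sorted(ids[:cut]), sorted(ids[cut:])


# 12,000 examinees, as in the study sample.
selection_ids, remaining_ids = split_examinees(12_000)
print(len(selection_ids), len(remaining_ids))
```

Keeping the two groups disjoint means the subset used to pick the item cluster is not reused when the statistic is computed.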

Research Question Two
What are the item parameter indices (discrimination, difficulty, guessing, and carelessness) using the four-parameter logistic model?
To answer this research question, students' responses to the 2017 English Language multiple-choice items among SSCE students in Kwara State were calibrated using a four-parameter logistic IRT model in jMetrik (Baker & Kim, 2004; Adeyemow & Opesemowo, 2020). The result is presented in Table 2, showing the discrimination, difficulty, guessing, and carelessness parameters of individual items. Table 2 showed the 2017 NECO English Language items calibrated using the four-parameter logistic model. It can be seen that all 100 items were calibrated using the IRT 4-PLM in the jMetrik statistical software. However, by the rule of thumb, an item is considered suitable only if its parameter indices fulfil certain conditions. The a-parameter is the discrimination index and must be greater than 0.2; the b-parameter is the difficulty index and should range from -3 to +3; the c-parameter is the guessing parameter and must be less than 0.35; and the u-parameter is the carelessness parameter and should not be greater than 0.75. Each item must fulfil all of these criteria to receive a favourable overall remark (Baker & Kim, 2004). Based on the result in Table 2, only two items (i.e., items 11 and 16) satisfied the rule of thumb, while the remaining 98 items did not suit the 4-PLM because they violated the rule of thumb in one way or another.
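The rule of thumb stated above can be expressed as a simple check on each item's four parameters. The following Python sketch assumes the thresholds exactly as given in the text; the function name `meets_rule_of_thumb` and the item values are hypothetical and are not taken from Table 2.

```python
def meets_rule_of_thumb(a, b, c, u):
    """Apply the four criteria stated in the text to one item.

    a: discrimination (> 0.2), b: difficulty (-3 to +3),
    c: guessing (< 0.35), u: carelessness (not greater than 0.75).
    """
    return a > 0.2 and -3.0 <= b <= 3.0 and c < 0.35 and u <= 0.75


# Hypothetical parameter estimates, for illustration only.
items = {
    "item_x": {"a": 0.85, "b": 0.40, "c": 0.20, "u": 0.70},
    "item_y": {"a": 0.10, "b": 1.20, "c": 0.15, "u": 0.60},  # a too low
    "item_z": {"a": 1.10, "b": -0.5, "c": 0.40, "u": 0.72},  # c too high
}

suitable = [name for name, p in items.items() if meets_rule_of_thumb(**p)]
print(suitable)
```

An item is flagged as suitable only when all four conditions hold, which mirrors the requirement that each item fulfil every criterion for the overall remark.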

Discussion of the Findings
The results showed that the 2017 English Language multiple-choice items among SSCE students in Kwara State do not violate the assumption of unidimensionality. This finding is in agreement with Jimoh (2021), who reported that the 2016 NECO Mathematics test was essentially unidimensional. The finding is also in line with the studies of Jiao (2004), Tomblim and Zhang (2006), and Deng, Wells and Hambleton (2008). However, the finding contradicts the studies of Jang and Roussons (2007) and Li, Jiao, and Lissitz (2012), whose findings showed a clear violation of the unidimensionality assumption in the tests assessed. The finding also corroborates the studies of David et al. (2017), which showed that the NECO Senior School Certificate June/July multiple-choice objective tests in Government for the years 2013 and 2014 conformed to the assumption of unidimensionality. Another result of the study showed that only two items (i.e., items 11 and 16) satisfied the rule of thumb, while the remaining 98 items did not suit the 4-PLM because they violated the rule of thumb in one way or another. However, this does not affect the validity and reliability of the 2017 NECO English Language multiple-choice questions because the items do not violate the assumption of unidimensionality. This study is in line with the observation of Jimoh (2022) that the 2016 NECO Mathematics test items were suitable in that their difficulty parameters were within the range (-2 to 2) for which item difficulty parameter estimates are considered suitable, that the majority of the items did not have suitable discriminating power, and that almost all test items were not vulnerable to guessing.

Conclusion and Recommendations
Based on the findings of this study, it was concluded that the 2017 English Language multiple-choice items among SSCE students in Kwara State do not violate the assumption of unidimensionality. It was recommended that the National Examinations Council (NECO) should intensify efforts toward improving the standard of the English Language test items using the 4-PLM because it is the new trend for estimating item parameter indices (discrimination, difficulty, guessing, and carelessness). It is also recommended that all examining bodies using multiple-choice test instruments (such as the West African Examinations Council, NABTEB, etc.) should be encouraged to use the 4-PLM of the item response theory approach when developing test items.

Table 1
Dimensionality of 2017 NECO English Language Test Items