ROBOT Report

Level	Number of errors
INFO	395
WARN	24

Rule	Number of errors
lowercase_definition	395
equivalent_class_axiom_no_genus	18
duplicate_exact_synonym	2
multiple_equivalent_classes	2
equivalent_pair	1
missing_definition	1

Row	Level	Rule Name	Subject	Property	Value
0	WARN	duplicate_exact_synonym	STATO:0000636	IAO:0000118	NNS
1	WARN	duplicate_exact_synonym	STATO:0000637	IAO:0000118	NNS
2	WARN	equivalent_class_axiom_no_genus	STATO:0000027	OBI:0000417	STATO:0000121
3	WARN	equivalent_class_axiom_no_genus	STATO:0000033	OBI:0000312	OBI:0200117
4	WARN	equivalent_class_axiom_no_genus	STATO:0000085	OBI:0000295	STATO:0000175
5	WARN	equivalent_class_axiom_no_genus	STATO:0000119	OBI:0000299	STATO:0000144
6	WARN	equivalent_class_axiom_no_genus	STATO:0000131	OBI:0000417	STATO:0000183
7	WARN	equivalent_class_axiom_no_genus	STATO:0000133	BFO:0000062	OBI:0200201
8	WARN	equivalent_class_axiom_no_genus	STATO:0000137	OBI:0000417	STATO:0000226
9	WARN	equivalent_class_axiom_no_genus	STATO:0000191	OBI:0000417	STATO:0000224
10	WARN	equivalent_class_axiom_no_genus	STATO:0000202	OBI:0000417	STATO:0000253
11	WARN	equivalent_class_axiom_no_genus	STATO:0000247	OBI:0000417	STATO:0000173
12	WARN	equivalent_class_axiom_no_genus	STATO:0000279	OBI:0000417	STATO:0000255
13	WARN	equivalent_class_axiom_no_genus	STATO:0000337	OBI:0000299	STATO:0000485
14	WARN	equivalent_class_axiom_no_genus	STATO:0000443	OBI:0000417	STATO:0000439
15	WARN	equivalent_class_axiom_no_genus	STATO:0000471	STATO:0000403	STATO:0000039
16	WARN	equivalent_class_axiom_no_genus	STATO:0000573	OBI:0000312	OBI:0200170
17	WARN	equivalent_class_axiom_no_genus	STATO:0000574	OBI:0000312	OBI:0200181
18	WARN	equivalent_class_axiom_no_genus	STATO:0000697	OBI:0000417	STATO:0000173
19	WARN	equivalent_class_axiom_no_genus	STATO:0000742	IAO:0000219	STATO:0000225
20	WARN	equivalent_pair	STATO:0000247	owl:equivalentClass	STATO:0000697
21	WARN	missing_definition	STATO:0000741	IAO:0000115
22	WARN	multiple_equivalent_classes	STATO:0000247	owl:equivalentClass	STATO:0000697
23	WARN	multiple_equivalent_classes	STATO:0000247	owl:equivalentClass	blank node
24	INFO	lowercase_definition	STATO:0000001	IAO:0000115	property to indicate that a design declares a variable; the inverse property is 'is declared by'@en
25	INFO	lowercase_definition	STATO:0000002	IAO:0000115	an electronic file is an information content entity which conforms to a specification or format and which is meant to hold data and information in digital form, accessible to software agents@en
26	INFO	lowercase_definition	STATO:0000003	IAO:0000115	a balanced design is a an experimental design where all experimental group have the an equal number of subject observations@en
27	INFO	lowercase_definition	STATO:0000004	IAO:0000115	property to indicate the variables declared by a design; the inverse property is 'declares'@en
28	INFO	lowercase_definition	STATO:0000005	IAO:0000115	a single factor design is a study design which declares exactly 1 independent variable@en
29	INFO	lowercase_definition	STATO:0000006	IAO:0000115	x-axis is a cartesian coordinate axis which is orthogonal to the y-axis and the z-axis@en
30	INFO	lowercase_definition	STATO:0000007	IAO:0000115	an axis is a line graph used as reference line for the measurement of coordinates.@en
31	INFO	lowercase_definition	STATO:0000008	IAO:0000115	y-axis is a cartesian coordinate axis which is orthogonal to the x-axis and the z-axis@en
32	INFO	lowercase_definition	STATO:0000011	IAO:0000115	a cartesian axis is one of 3 the axis in a cartesian coordinate system defining a referential in 3 dimensions. each of the axis is orthogonal to the other 2@en
33	INFO	lowercase_definition	STATO:0000012	IAO:0000115	z-axis is a cartesian coordinate axis which is orthogonal to the x-axis and the y-axis@en
34	INFO	lowercase_definition	STATO:0000013	IAO:0000115	a 2 dimensional cartesian coordinate system is a cartesian coordinate system which defines 2 orthogonal one dimensional axes and which may be used to describe a 2 dimensional spatial region.
35	INFO	lowercase_definition	STATO:0000019	IAO:0000115	normal distribution hypothesis is a goodness of fit hypothesis stating that the distribution computed from the sample population fits a normal distribution.@en
36	INFO	lowercase_definition	STATO:0000021	IAO:0000115	a confidence interval which covers 90% of the sampling distribution, meaning that there is a 90% risk of false positive (type I error)@en
37	INFO	lowercase_definition	STATO:0000024	IAO:0000115	a three dimensional cartesian coordinate system is a cartesian coordinate system which defines 3 orthogonal one dimensional axes and which may be used to describe a 3 dimensional spatial region.
38	INFO	lowercase_definition	STATO:0000027	IAO:0000115	linkage between 2 categorical variable test is a statistical test which evaluates if there is an association between a predictor variable assuming discrete values and a response variable also assuming discrete values@en
39	INFO	lowercase_definition	STATO:0000028	IAO:0000115	measure of variation or statistical dispersion is a data item which describes how much a theoritical distribution or dataset is spread.@en
40	INFO	lowercase_definition	STATO:0000029	IAO:0000115	a measure of central tendency is a data item which attempts to describe a set of data by identifying the value of its centre.@en
41	INFO	lowercase_definition	STATO:0000031	IAO:0000115	binary classification (or binomial classification) is a data transformation which aims to cast members of a set into 2 disjoint groups depending on whether the element have a given property/feature or not.@en
42	INFO	lowercase_definition	STATO:0000032	IAO:0000115	an alternative term used for STATO statistical ontology and ISA team@en
43	INFO	lowercase_definition	STATO:0000034	IAO:0000115	a model parameter is a data item which is part of a model and which is meant to characterize an theoritecal or unknown population. a model parameter may be estimated by considering the properties of samples presumably taken from the theoritecal population@en
44	INFO	lowercase_definition	STATO:0000035	IAO:0000115	the range is a measure of variation which describes the difference between the lowest score and the highest score in a set of numbers (a data set)
45	INFO	lowercase_definition	STATO:0000038	IAO:0000115	a set of 2 subjects which result from a pairing process which assigns subject to a set based on a pairing rule/criteria@en
46	INFO	lowercase_definition	STATO:0000039	IAO:0000115	a statistic is a measurement datum to describe a dataset or a variable. It is generated by a calculation on set of observed data.@en
47	INFO	lowercase_definition	STATO:0000040	IAO:0000115	an MA plot is a scatter plot of the log intensity ratios M = log_2(T/R) versus the average log intensities A = log_2(T*T)/2, where T and R represent the signal intensities in the test and reference channels respectively.@en
48	INFO	lowercase_definition	STATO:0000041	IAO:0000115	a R command syntax or link to a R documentation in support of Statistical Ontology Classes or Data Transformations@en
49	INFO	lowercase_definition	STATO:0000043	IAO:0000115	a false positive rate whose value is 5 per cent@en
50	INFO	lowercase_definition	STATO:0000044	IAO:0000115	one-way anova is an analysis of variance where the different groups being compared are associated with the factor levels of only one independent variable. The null hypothesis is an absence of difference between the means calculated for each of the groups. The test assumes normality and equivariance of the data.@en
51	INFO	lowercase_definition	STATO:0000045	IAO:0000115	two-way anova is an analysis of variance where the different groups being compared are associated the factor levels of exatly 2 independent variables. The null hypothesis is an absence of difference between the means calculated for each of the groups. The test assumes normality and equivariance of the data.@en
52	INFO	lowercase_definition	STATO:0000046	IAO:0000115	a block design is a kind of study design which declares a blocking variable (also known as nuisance variable) in order to account for a known source of variation and reduce its impact on the acquisition of the signal@en
53	INFO	lowercase_definition	STATO:0000047	IAO:0000115	a count is a data item denoted by an integer and representing the number of instances or occurences of an entity@en
54	INFO	lowercase_definition	STATO:0000050	IAO:0000115	signal to noise ratio is a measurement datum comparing the amount of meaningful, useful or interesting data (the signal) to the amount of irrelevant or false data (the noise). Depending on the field and domain of application, different variables will be used to determinate a 'signal to noise ratio'. In statistics, the definition of signal to noise ratio is the ratio of the mean of a measurement to its standard deviation. It thus corresponds to the inverse of the coefficient of variation@en
55	INFO	lowercase_definition	STATO:0000053	IAO:0000115	a false positive rate is a data item which accounts for the proportion of incorrect rejection of a true null hypothesis.@en
56	INFO	lowercase_definition	STATO:0000054	IAO:0000115	homoskedasticity states that all variances under consideration are homogenous.@en
57	INFO	lowercase_definition	STATO:0000055	IAO:0000115	chromosome coordinate system is a genomic coordinate which uses chromosome of a particular assembly build process to define start and end positions. This coordinate system is unstable and will change with each new genome sequence assembly build.@en
58	INFO	lowercase_definition	STATO:0000056	IAO:0000115	a null hypothesis which states that no linkage exists between 2 categorical variables@en
59	INFO	lowercase_definition	STATO:0000058	IAO:0000115	goodness of fit hypothesis is a null hypothesis stating that the distribution computed from the sample population fits a theoretical distribution or that a dataset can be correctly explained by a model@en
60	INFO	lowercase_definition	STATO:0000059	IAO:0000115	the Student's t distribution is a continuous probability distribution which arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown.@en
61	INFO	lowercase_definition	STATO:0000060	IAO:0000115	hypergeometric distribution is a probability distribution that describes the probability of k successes in n draws from a finite population of size N containing K successes without replacement@en
62	INFO	lowercase_definition	STATO:0000062	IAO:0000115	is a null hypothesis stating that there are no difference observed across a series of measurements made one same subject.@en
63	INFO	lowercase_definition	STATO:0000063	IAO:0000115	genomic coordinate datum is a data item which denotes a genomic position expressed using a genomic coordinate system@en
64	INFO	lowercase_definition	STATO:0000064	IAO:0000115	sequence read count is a data item determining how many sequence reads have been generated by a DNA sequencing assay for a given stretch of DNA
65	INFO	lowercase_definition	STATO:0000067	IAO:0000115	a continuous probability distribution is a probability distribution which is defined by a probability density function@en
66	INFO	lowercase_definition	STATO:0000071	IAO:0000115	reaction rate is a measurement datum which represents the speed of a chemical reaction turning reactive species into product species of event (i.e the number of such conversions)s occuring over a time interval@en
67	INFO	lowercase_definition	STATO:0000072	IAO:0000115	substrate concentration is a scalar measurement datum which denotes the amount of molecular entity involved in an enzymatic reaction (or catalytic chemical reaction) and whose role in that reaction is as substrate.@en
68	INFO	lowercase_definition	STATO:0000075	IAO:0000115	a rarefaction curve is a graph used for estimating species richness in ecology studies@en
69	INFO	lowercase_definition	STATO:0000080	IAO:0000115	the Brown Forsythe test is a statistical test which evaluates if the variance of different groups are equal. It relies on computing the median rather than the mean, as used in the Levene's test for homoschedacity. This test maybe used to, for instance, ensure that the conditions of applications of ANOVA are met.@en
70	INFO	lowercase_definition	STATO:0000082	IAO:0000115	a fixed effect model is a statistical model which represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random.@en
71	INFO	lowercase_definition	STATO:0000084	IAO:0000115	multinomial logistic regression model is a model which attempts to explain data distribution associated with polychotomous response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is probit function.@en
72	INFO	lowercase_definition	STATO:0000085	IAO:0000115	effect size estimate is a data item about the direction and strength of the consequences of a causative agent as explored by statistical methods. Those methods produce estimates of the effect size, e.g. confidence interval@en
73	INFO	lowercase_definition	STATO:0000086	IAO:0000115	an F-test is a statistical test which evaluates that the computed test statistics follows an F-distribution under the null hypothesis. The F-test is sensitive to departure from normality. F-test arise when decomposing the variability in a data set in terms of sum of squares.@en
74	INFO	lowercase_definition	STATO:0000087	IAO:0000115	a polychotomous variable is a categorical variable which is defined to have minimally 2 categories or possible values@en
75	INFO	lowercase_definition	STATO:0000088	IAO:0000115	statistical sample size is a count evaluating the number of individual experimental units@en
76	INFO	lowercase_definition	STATO:0000089	IAO:0000115	a case-control study design is a observation study design which assess the risk of particular outcome (a trait or a disease) associated with an event (either an exposure or endogenous factor). A case-control study design therefore declares an exposure variable which is dichotomous in nature (exposed/non-exposed) and an outcome variable, which is also dichotomous (case or control), thus giving the name to the design. During the execution of the design, a case control study defines a population and counts the events to determine their frequency.@en
77	INFO	lowercase_definition	STATO:0000090	IAO:0000115	a dichotomous variable is a categorical variable which is defined to have only 2 categories or possible values@en
78	INFO	lowercase_definition	STATO:0000095	IAO:0000115	paired t-test is a statistical test which is specifically designed to analysis differences between paired observations in the case of studies realizing repeated measures design with only 2 repeated measurements per subject (before and after treatment for example)@en
79	INFO	lowercase_definition	STATO:0000096	IAO:0000115	stratification is a planned process which executes a stratification rule using as input a population and assign it member to mutually exclusive subpopulation based on the values defined by the stratification rule@en
80	INFO	lowercase_definition	STATO:0000099	IAO:0000115	a random effect(s) model, also called a variance components model, is a kind of hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of different populations whose differences relate to that hierarchy.@en
81	INFO	lowercase_definition	STATO:0000100	IAO:0000115	standardized mean difference is statistic computed by forming the difference between two means, divided by an estimate of the within-group standard deviation. It is used to provide an estimation of the effect size between two treatments when the predictor (independent variable) is categorical and the response(dependent) variable is continuous. A standardized mean difference is a statistic that is a difference between two means, divided by a statistical measure of dispersion. The term Standardized Mean Difference is a description of the concept without an explicit type of statistical measure of dispersion. If the statistical measure of dispersion is specified, then a type (child term) of Standardized Mean Difference is preferred.@en
82	INFO	lowercase_definition	STATO:0000101	IAO:0000115	the relationship between a fraction and the number above the line@en
83	INFO	lowercase_definition	STATO:0000102	IAO:0000115	relationship between a planned process and the plan specification that it carries out; it is defined as equivalent to the composed relationship (realizes o concretizes)@en
84	INFO	lowercase_definition	STATO:0000103	IAO:0000115	the multinomial distribution is a probability distribution which gives the probability of any particular combination of numbers of successes for various categories defined in the context of n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability.@en
85	INFO	lowercase_definition	STATO:0000105	IAO:0000115	log signal intensity ratio is a data item which corresponding the logarithmitic base 2 of the ratio between 2 signal intensity, each corresponding to a condition.@en
86	INFO	lowercase_definition	STATO:0000106	IAO:0000115	probit regression model is a model which attempts to explain data distribution associated with dichotomous response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is the probit function aka the quantile function, i.e., the inverse cumulative distribution function (CDF), associated with the standard normal distribution.@en
87	INFO	lowercase_definition	STATO:0000107	IAO:0000115	a statistical model is an information content entity which is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but stochastically related.@en
88	INFO	lowercase_definition	STATO:0000108	IAO:0000115	linear regression model is a model which attempts to explain data distribution associated with response/dependent variable in terms of values assumed by the independent variable uses a linear function or linear combination of the regression parameters and the predictor/independent variable(s). linear regression modeling makes a number of assumptions, which includes homoskedasticity (constance of variance)@en
89	INFO	lowercase_definition	STATO:0000109	IAO:0000115	multinomial logistic regression model is a model which attempts to explain data distribution associated with polychotomous response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is logistic function.@en
90	INFO	lowercase_definition	STATO:0000111	IAO:0000115	a sequence read is a DNA sequence data which is generated by a DNA sequencer@en
91	INFO	lowercase_definition	STATO:0000112	IAO:0000115	a Funnel plot is a scatter plot of treatment effect versus a measure of study size and aims to provide a visual aid to detecting bias or systematic heterogeneity. A symmetric inverted funnel shape arises from a ‘well-behaved’ data set, in which publication bias is unlikely. An asymmetric funnel indicates a relationship between treatment effect and study size. Known caveats: If high precision studies really are different from low precision studies with respect to effect size (e.g., due to different populations examined) a funnel plot may give a wrong impression of publication bias. The appearance of the funnel plot can change quite dramatically depending on the scale on the y-axis — whether it is the inverse square error or the trial size. Funnel plot was introduced by Light and Palmer in 1984.@en
92	INFO	lowercase_definition	STATO:0000113	IAO:0000115	variance is a data item about a random variable or probability distribution. it is equivalent to the square of the standard deviation. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value).The variance is the second moment of a distribution.@en
93	INFO	lowercase_definition	STATO:0000114	IAO:0000115	relationship between an element and a set it belongs to@en
94	INFO	lowercase_definition	STATO:0000115	IAO:0000115	relationship between a set and one of its elements@en
95	INFO	lowercase_definition	STATO:0000116	IAO:0000115	the process of using statistical analysis for interpreting and communicating \"what the data say\".@en
96	INFO	lowercase_definition	STATO:0000117	IAO:0000115	a discrete probability distribution is a probability distribution which is defined by a probability mass function where the random variable can only assume a finite number of values or infinitely countable values@en
97	INFO	lowercase_definition	STATO:0000118	IAO:0000115	ranking is a data transformation which turns a non-ordinal variable into a Ordinal variable by sorting the values of the input variable and replacing their value by their position in the sorting result@en
98	INFO	lowercase_definition	STATO:0000119	IAO:0000115	model parameter estimation is a data transformation that finds parameter values (the model parameter estimates) most compatible with the data as judged by the model.@en
99	INFO	lowercase_definition	STATO:0000120	IAO:0000115	beanplot is a plot in which (one or) multiple batches (\"beans\") are shown. Each bean consists of a density trace, which is mirrored to form a polygon shape. Next to that, a one-dimensional scatter plot shows all the individual measurements, like in a stripchart. The name beanplot stems from green beans. The density shape can be seen as the pod of a green bean, while the scatter plot shows the seeds inside the pod.@en
100	INFO	lowercase_definition	STATO:0000121	IAO:0000115	the objective of a data transformation to evaluate a null hypothesis of absence of linkage between variables.@en
101	INFO	lowercase_definition	STATO:0000122	IAO:0000115	a pedigree chart is a graph which plots parent child relations@en
102	INFO	lowercase_definition	STATO:0000123	IAO:0000115	r2 is a correlation coefficient which is computed over the frequency of 2 dichotomous variable and is used as a measure of Linkage Disequilibrium and as input data item to the creation of an LD plot@en
103	INFO	lowercase_definition	STATO:0000124	IAO:0000115	a stratification rule/criteria is a criteria used to determine population strata so that a stratification process implementing the rule can result in any member of the total population being assigned to one and only one stratum@en
104	INFO	lowercase_definition	STATO:0000126	IAO:0000115	volcano plot is a kind of scatter plot which graphs the negative log of the p-value (significance) on the y-axis versus log2 of fold-change between 2 conditions on the x-axis. It is a popular method for visualizing differential occurence of variables between 2 conditions.@en
105	INFO	lowercase_definition	STATO:0000127	IAO:0000115	a confidence interval which covers 99% of the sampling distribution, meaning that there is a 1% risk of false positive (type I error)@en
106	INFO	lowercase_definition	STATO:0000130	IAO:0000115	the Breslow-Day test is a statistical test which evaluates if the odds ratios are homogenous across N 2x2 contingency tables, for instance several 2x2 contingency tables associated with different strata of a stratified population when evaluating the relationship between exposure and outcome or associated with the different samples coming from several centres in a multicentric study in clinical trial context.@en
107	INFO	lowercase_definition	STATO:0000131	IAO:0000115	a sphericity test is a null hypothesis statistical testing procedure which posits a null hypothesis of equality of the variances of the differences between levels of the repeated measures factor@en
108	INFO	lowercase_definition	STATO:0000134	IAO:0000115	specificity is a measurement datum qualifying a binary classification test and is computed by substracting the false positive rate to the integral numeral 1@en
109	INFO	lowercase_definition	STATO:0000135	IAO:0000115	strictly standardized mean difference (SSMS) is a standardized mean difference which corresponds to the ratio of mean to the standard deviation of the difference between two groups. SSMD directly measures the magnitude of difference between two groups. SSMD is widely used in High Content Screen for hit selection and quality control. When the data is preprocessed using log-transformation as normally done in HTS experiments, SSMD is the mean of log fold change divided by the standard deviation of log fold change with respect to a negative reference. In other words, SSMD is the average fold change (on the log scale) penalized by the variability of fold change (on the log scale). For quality control, one index for the quality of an HTS assay is the magnitude of difference between a positive control and a negative reference in an assay plate. For hit selection, the size of effects of a compound (i.e., a small molecule or an siRNA) is represented by the magnitude of difference between the compound and a negative reference. SSMD directly measures the magnitude of difference between two groups. Therefore, SSMD can be used for both quality control and hit selection in HTS experiments.@en
110	INFO	lowercase_definition	STATO:0000137	IAO:0000115	an homoskedasticity test is a statistical test aiming at evaluate if the variances from several random samples are similar@en
111	INFO	lowercase_definition	STATO:0000138	IAO:0000115	a 2x2 contingency table is a contingency table build for 2 dichotomous variables (i.e. 2 categorical variables, each with only 2 possible outcomes). It is the simplest of contingency tables@en
112	INFO	lowercase_definition	STATO:0000139	IAO:0000115	a subject pairing is a planned process which executes a pairing rule and results in the creation of sets of 2 subjects meeting the pairing criteria@en
113	INFO	lowercase_definition	STATO:0000140	IAO:0000115	a contigency table is a data item which displays the (multivariate) frequency distribution of the possible values of categorical variables. The first row of the table corresponds to categories of one categorical variable, the first column of the table corresponds to categories of the other categorical variable, the cells corresponding to each combination of categories is filled with the observed occurences in the sample being considered. The table also contains marginal total (marginal sums) and grand total of the occurences The term contingency table was first used by Karl Pearson in \"On the Theory of Contingency and Its Relation to Association and Normal Correlation\", part of the Drapers' Company Research Memoirs Biometric Series I published in 1904.@en
114	INFO	lowercase_definition	STATO:0000141	IAO:0000115	acute toxicity study is an investigation which use interventions organized according to a factorial design and a parallel group design to observe the effect of use of high dose xenobiotics in animal models or cellular models@en
115	INFO	lowercase_definition	STATO:0000144	IAO:0000115	a model parameter estimate is a data item which results from a model parameter estimation process and which provides a numerical value about a model parameter.@en
116	INFO	lowercase_definition	STATO:0000145	IAO:0000115	the geometric distribution is a negative binomial distribution where r is 1. It is useful for modeling the runs of consecutive successes (or failures) in repeated independent trials of a system. The geometric distribution models the number of successes before one failure in an independent succession of tests where each test results in success or failure. The geometric distribution with prob = p has density p(x) = p (1-p)^x for x = 0, 1, 2, …, 0 < p ≤ 1. If an element of x is not integer, the result of dgeom is zero, with a warning. The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.@en
117	INFO	lowercase_definition	STATO:0000146	IAO:0000115	a null hypothesis stating that there are differences observed between group of subjects@en
118	INFO	lowercase_definition	STATO:0000149	IAO:0000115	binomial logistic regression model is a model which attempts to explain data distribution associated with dichotomous response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is logistic function.@en
119	INFO	lowercase_definition	STATO:0000150	IAO:0000115	a minimum value is a data item which denotes the smallest value found in a dataset or resulting from a calculation.@en
120	INFO	lowercase_definition	STATO:0000151	IAO:0000115	maximum value is a data item which denotes the largest value found in a dataset or resulting from a calculation.@en
121	INFO	lowercase_definition	STATO:0000152	IAO:0000115	a quartile is a quantile which splits data into sections accrued of 25% of data, so the first quartile delineates 25% of the data, the second quartile delineates 50% of the data and the third quartile, 75 % of the data@en
122	INFO	lowercase_definition	STATO:0000154	IAO:0000115	a violin plot is a plot combining the features of box plot and kernel density plot. The violin plot is therefore similar to box plot but it incorporated in the display the probability density of the data at different values. Typically violin plots will include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots.@en
123	INFO	lowercase_definition	STATO:0000155	IAO:0000115	meta-analysis is a data transformation which uses the effect size estimates from several independent quantitative scientific studies addressing the same question in order to assess finding consistency.@en
124	INFO	lowercase_definition	STATO:0000156	IAO:0000115	the Scheffe test is a data transformation which evaluates all possible contrasts and adjusting the levels significance by accounting for multiple comparison. The test is therefore conservative. Confidence intervals can be constructed for the corresponding linear regression. It was developped by American statistician Henry Scheffe in 1959.@en
125	INFO	lowercase_definition	STATO:0000157	IAO:0000115	the LSD test is a statistical test for multiple comparisons of treatments by means of least significant difference following an ANOVA analysis
126	INFO	lowercase_definition	STATO:0000158	IAO:0000115	a null hypothesis which states that a linkage exists between 2 categorical variables@en
127	INFO	lowercase_definition	STATO:0000161	IAO:0000115	variable distribution is data item which denotes the spatial resolution of data point making up a variable. variable distribution may be compared to a known probability distribution using goodness of fit test or plotting a quantile-quantile plot for visual assessment of the fit.@en
128	INFO	lowercase_definition	STATO:0000162	IAO:0000115	the role played by an entity part of study group as defined by an experimental design and realized in a data analysis and data interpretation@en
129	INFO	lowercase_definition	STATO:0000163	IAO:0000115	trimmed mean or truncated mean is a measure of central tendency which involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both@en
130	INFO	lowercase_definition	STATO:0000165	IAO:0000115	a pie chart is a graph in which a circular graph is divided into sector illustrating numerical proportion, meaning that the arc length of each sector (and consequently its central angle and area), is proportional to the quantity it represents.@en
131	INFO	lowercase_definition	STATO:0000166	IAO:0000115	the bart chart is a graph resulting from plotting rectangular bars with lengths proportional to the values that they represent.
132	INFO	lowercase_definition	STATO:0000167	IAO:0000115	the first quartile is a quartile which splits the lower 25 % of the data@en
133	INFO	lowercase_definition	STATO:0000168	IAO:0000115	a real time quantitative pcr plot is a line graph which plots the signal fluorescence intensity as a function of the number of PCR cycle@en
134	INFO	lowercase_definition	STATO:0000170	IAO:0000115	the first quartile is a quartile which splits the 75 % of the data@en
135	INFO	lowercase_definition	STATO:0000173	IAO:0000115	homogeneity testing objective is the objective of a data transformation to test a null hypothesis that two or more sub-groups of a population share the same distribution of a single categorical variable. For example, do people of different countries have the same proportion of smokers to non-smokers@en
136	INFO	lowercase_definition	STATO:0000175	IAO:0000115	confidence interval calculation is a data transformation which determines a confidence interval for a given statistical parameter@en
137	INFO	lowercase_definition	STATO:0000176	IAO:0000115	t-statistic is a statistic computed from observations and used to produce a p-value in statistical test when compared to a Student's t distribution.@en
138	INFO	lowercase_definition	STATO:0000177	IAO:0000115	the beta distribution is a continuous probability distributions defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution@en
139	INFO	lowercase_definition	STATO:0000180	IAO:0000115	standard normal distribution is a normal distribution with variance = 1 and mean=0@en
140	INFO	lowercase_definition	STATO:0000183	IAO:0000115	sphericity testing objective is a statistical objective of a data transformation which aims to test a null hypothesis of sphericity holds.@en
141	INFO	lowercase_definition	STATO:0000185	IAO:0000115	a 2 by n contingency table is a contingency table built for one dichotomous variable (a categorical variable with only 2 outcomes) and one polychotomous variable (a polychomotomous variable with at least 2 outcomes)@en
142	INFO	lowercase_definition	STATO:0000188	IAO:0000115	average log signal intensity is a data time which corresponds to the sum of 2 distinct logarithm base 2 transformed signal intensity, each corresponding to a distinct condition of signal acquisition, divided by 2.@en
143	INFO	lowercase_definition	STATO:0000191	IAO:0000115	a goodness of fit statistical test is a statistical test which aim to evaluate if a sample distribution can be considered equivalent to a theoretical distribution used as input@en
144	INFO	lowercase_definition	STATO:0000192	IAO:0000115	a cartesian product is a data transformation which operates on a n Sets to produce a set of all possible ordered n-tuples where each element of the tuple comes from a Set
145	INFO	lowercase_definition	STATO:0000193	IAO:0000115	is a population whose individual members realize (may be expressed as) a combination of inclusion rule values specifications or resulting from a sampling process (e.g. recruitment followed by randomization to group) on which a number of measurements will be carried out, which may be used as input to statistical tests and statistical inference.
146	INFO	lowercase_definition	STATO:0000194	IAO:0000115	self explanatory@en
147	INFO	lowercase_definition	STATO:0000197	IAO:0000115	a genomic coordinate system is a coordinate system to describe position of sequence on a genomic scaffold (assembly of chromosome, contig....)@en
148	INFO	lowercase_definition	STATO:0000198	IAO:0000115	a statistical test which makes no assumption about the underlying data distribution@en
149	INFO	lowercase_definition	STATO:0000199	IAO:0000115	the Mauchly's test for sphericity is a statistical test which evaluates if the variance of the differences between all combinations of the groups are equal, a property known as 'sphericity' in the context of repeated measures. It is used for instance prior to repeated measure ANOVA. The test works by assessing if a Wishart-distributed covariance matrix (or transformation thereof) is proportional to a given matrix.@en
150	INFO	lowercase_definition	STATO:0000200	IAO:0000115	the statistical test power is data item which is about a statistical test and is obtained by subtracting the false negative rate (type II error rate) to 1. The power of a statistical test is the probability that it will correctly lead to the rejection of a false null hypothesis (Greene 2000). The statistical power is the ability of a test to detect an effect, if the effect actually exists (High 2000).@en
151	INFO	lowercase_definition	STATO:0000202	IAO:0000115	within subject comparison statistical test is a kind of statistical test which evaluates if a change occurs within one experimental unit over time following a treatment or an event@en
152	INFO	lowercase_definition	STATO:0000203	IAO:0000115	a cohort is a study group population where the members are human beings which meet inclusion criteria and undergo a longitudinal design@en
153	INFO	lowercase_definition	STATO:0000204	IAO:0000115	the F-distribution is a continuous probability distribution which arises in the testing of whether two observed samples have the same variance.@en
154	INFO	lowercase_definition	STATO:0000207	IAO:0000115	a planned process which etablishes and states the different hypothesis to be evaluated during a null hypothesis statistical test@en
155	INFO	lowercase_definition	STATO:0000209	IAO:0000115	area under curve is a measurement datum which corresponds to the surface define by the x-axis and bound by the line graph represented in a 2 dimensional plot resulting from an integration or integrative calculus. The interpretation of this measurement datum depends on the variables plotted in the graph@en
156	INFO	lowercase_definition	STATO:0000210	IAO:0000115	is a data item formed by dividing the fluorescence intensity obtained in one channel to that obtained in the other channel, typically the case when considering 2-color microarray data when imaging is done for Cy3 and Cy5 dyes.@en
157	INFO	lowercase_definition	STATO:0000211	IAO:0000115	odds ratio homogeneity hypothesis is a null hypothesis stating that all odds ratio are homogenous, that is remain within the same range.@en
158	INFO	lowercase_definition	STATO:0000212	IAO:0000115	a tetrachoric correlation coefficient is a polychoric correlation coefficient for 2 dichotomous variables used as proxy for correlation between 2 continuous latent variables.@en
159	INFO	lowercase_definition	STATO:0000213	IAO:0000115	discretization as a processing converting a continuous variable into a polychotomous variable by concretizing a set of discretization rules@en
160	INFO	lowercase_definition	STATO:0000214	IAO:0000115	a confidence interval which covers 50% of the sampling distribution, meaning that there is a 50% risk of false positive (type I error)@en
161	INFO	lowercase_definition	STATO:0000215	IAO:0000115	probit regression model is a model which attempts to explain data distribution associated with ordinal response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is the ordered probit function.@en
162	INFO	lowercase_definition	STATO:0000216	IAO:0000115	a stratum population is a population resulting from a population stratification prior to sampling process which aims to produce homogenous subpopulations from an heterogeneous population by applying one or more stratification criteria@en
163	INFO	lowercase_definition	STATO:0000217	IAO:0000115	a null hypothesis which states that a given matrix is proportional to a Wishart-distributed covariance matrix@en
164	INFO	lowercase_definition	STATO:0000219	IAO:0000115	a real time pcr standard curve is a line graph which plots the fluorescence intensity signal as a function of the concentration of a sample used as reference and used to determine relative abundance of test samples@en
165	INFO	lowercase_definition	STATO:0000220	IAO:0000115	the false negative rate is a data item which denotes the proportion of missed detection of elements known to be meeting the detection criteria@en
166	INFO	lowercase_definition	STATO:0000221	IAO:0000115	a random variable (or aleatory variable or stochastic variable) in probability and statistics, is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense)@en
167	INFO	lowercase_definition	STATO:0000222	IAO:0000115	graeco-latin square design is_a study design which allows in its simpler form controlling 3 levels of nuisance variables (also known as blocking variables). The 3 nuisance factors are divided into a tabular grid with the property that each row and each column receive each treatment exactly once.@en
168	INFO	lowercase_definition	STATO:0000223	IAO:0000115	group assignment based on blocking variable specification is a kind of group assignment process which takes into account the levels assumed by a blocking variable to allocate subjects or experimental units to a treatment group@en
169	INFO	lowercase_definition	STATO:0000227	IAO:0000115	a normal distribution is a continuous probability distribution described by a probability distribution function described here: http://mathworld.wolfram.com/NormalDistribution.html@en
170	INFO	lowercase_definition	STATO:0000228	IAO:0000115	ordinal variable is a categorical variable where the discrete possible values are ordered or correspond to an implicit ranking@en
171	INFO	lowercase_definition	STATO:0000230	IAO:0000115	the expected value (or expectation, mathematical expectation, EV, mean, or the first moment) of a random variable is a data item which corresponds to the weighted average of all possible values that this random variable can take on. The weights used in computing this average correspond to the probabilities in case of a discrete random variable, or densities in case of a continuous random variable. From a rigorous theoretical standpoint, the expected value is the integral of the random variable with respect to its probability measure.@en
172	INFO	lowercase_definition	STATO:0000231	IAO:0000115	a confidence interval which covers 95% of the sampling distribution, meaning that there is a 5% risk of false positive (type I error). If the number of observations made is large enough, the sampling distribution can be assumed to be normal, which entails that 95% of the sampling distributions falls within roughly2 (1.96) standard deviations from the mean.@en
173	INFO	lowercase_definition	STATO:0000232	IAO:0000115	number of PCR cycle is a count which enumerates how many iterations of 'annealing, renaturation, amplification,' rounds (or cycles) are performed during a polymerase chain reaction (PCR) or an assay relying on PCR.@en
174	INFO	lowercase_definition	STATO:0000233	IAO:0000115	sensitivity is a measurement datum qualifying a binary classification test and is computed by substracting the false negative rate to the integral numeral 1@en
175	INFO	lowercase_definition	STATO:0000234	IAO:0000115	a residual is a data item which is the output of an error estimate or model fitting process and which is an observable estimate of the unobservable error@en
176	INFO	lowercase_definition	STATO:0000236	IAO:0000115	the coefficient of variation is a normalized measure of dispersion of a probability distribution of frequency distribution.@en
177	INFO	lowercase_definition	STATO:0000238	IAO:0000115	high content screening is a kind of investigation which uses a standardized cellular assays to test the effect of substances (RNAi or small molecules) held in libraries on a cellular phenotype. it relies on microscopy imaging and or flow-cytometry, robotic handling to ensure fast and high-throughput.@en
178	INFO	lowercase_definition	STATO:0000239	IAO:0000115	high throughput screening is a kind of investigation which uses a standardized assays (cell based, enzymatic or chemometric) to test the effect of substances (RNAi or small molecules) held in libraries on a very specific and measureable outcome (e.g fluorence intensity). it relies on robotic handling to ensure fast and high-throughput in assay performance, data acquisition and hit selection.@en
179	INFO	lowercase_definition	STATO:0000242	IAO:0000115	statistical error is an data item denoting the amount by which an observation differs from the expected value, being based on the whole statistical population from which the statistical unit was chosen randomly@en
180	INFO	lowercase_definition	STATO:0000243	IAO:0000115	a box plot is a graph which plots datasets relying on their quartiles and the interquartile range to create the box and the whiskers.@en
181	INFO	lowercase_definition	STATO:0000244	IAO:0000115	(Rn +) − (Rn −), where Rn + = (emission intensity of reporter dye)/(emission intensity of passive reference dye) in PCR with template and Rn − = (emission intensity of reporter dye)/(emission intensity of passive reference dye) in PCR without template or early cycles of a real-time reaction. Ct = threshold cycle, i.e., cycle at which a statistically significant increase in ΔRn is first detected@en
182	INFO	lowercase_definition	STATO:0000247	IAO:0000115	odds ratio homogeneity test is a statistical test which aims to evaluate that null the hypothesis of consistency odds ratio accross different strata of population is true or not@en
183	INFO	lowercase_definition	STATO:0000248	IAO:0000115	a blocking variable is a independent variable which is used in a blocking process part of an experiment with the purpose of maximizing the signal coming from the main variable.
184	INFO	lowercase_definition	STATO:0000249	IAO:0000115	a DNA microarray hybridization is an assay relying on nucleic acid hybridization , which uses a DNA microarray device and a nucleic acid as input. It precedes a data acquisition process@en
185	INFO	lowercase_definition	STATO:0000250	IAO:0000115	group comparison objective is a data transformation objective which aims to determine if 2 or more study group differ with respect to the signal of a response variable@en
186	INFO	lowercase_definition	STATO:0000252	IAO:0000115	a categorical variable is a variable which that can only assume a finite number of value and cast observation in a small number of categories@en
187	INFO	lowercase_definition	STATO:0000253	IAO:0000115	the objective of a data transformation to test a null hypothesis of absence of difference within subject holds.@en
188	INFO	lowercase_definition	STATO:0000255	IAO:0000115	the objective of a data transformation to test a null hypothesis of absence of difference withing subject holds.@en
189	INFO	lowercase_definition	STATO:0000256	IAO:0000115	a manhattan plot for gwas is a kind of scatter plot used to facilitate presentation of genome-wide association study (GWAS) data. Genomic coordinates are displayed along the X-axis, with the negative logarithm of the association P-value for each single nucleotide polymorphism displayed on the Y-axis.@en
190	INFO	lowercase_definition	STATO:0000258	IAO:0000115	a variable is a data item which can assume any of a set of values, either as determined by an agent or as randomly occuring through observation.@en
191	INFO	lowercase_definition	STATO:0000259	IAO:0000115	the relationship between a fraction and the number below the line (or divisor)@en
192	INFO	lowercase_definition	STATO:0000260	IAO:0000115	repeated measure ANOVA is a kind of ANOVA specifically developed for non-independent observations as found when repeated measurements on the sample experimental unit. repeated measure ANOVA is sensitive to departure from normality (evaluation using Bartlett's test), more so in the case of unbalanced groups (i.e. different sizes of sample populations). Departure from sphericity (evaluation using Mauchly'test) used to be an issue which is now handled robustly by modern tools such as R's lme4 or nlme, which accommodate dependence assumptions other than sphericity.@en
193	INFO	lowercase_definition	STATO:0000264	IAO:0000115	a factor level combination is one a possible sets of factor levels resulting from the cartesian product of sets of factor and their levels as defined in a factorial design@en
194	INFO	lowercase_definition	STATO:0000267	IAO:0000115	grouped bar chart is a kind of bar chart which juxtaposes the discrete values for each of the possible value of a given categorical variable, thus providing within group comparison. Grouped bar charts are good for comparing between each element in the categories, and comparing elements across categories. However, the grouping can make it harder to tell the difference between the total of each group.@en
195	INFO	lowercase_definition	STATO:0000269	IAO:0000115	polychoric correlation coefficient is a correlation coefficient which is computed over 2 variables to characterise an association by proxy with 2 (latent) variables which are assumed to be continuous and normally distributed.@en
196	INFO	lowercase_definition	STATO:0000270	IAO:0000115	a full factorial design is a factorial design which ensures that all possible factor level combinations are defined and used so all between group differences can be explored@en
197	INFO	lowercase_definition	STATO:0000271	IAO:0000115	permutation numbering is a data tranformation allowing to count the number of possible permutations of elements in a set of size n, each element occurring exactly once. This number is factorial n.@en
198	INFO	lowercase_definition	STATO:0000274	IAO:0000115	receiver operational characteristics curve is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold (aka cut-off point) is varied by plotting sensitivity vs (1 − specificity)@en
199	INFO	lowercase_definition	STATO:0000277	IAO:0000115	hit selection is a planned process which in screening processes such as high-throughput screening, lead to the identification of perturbing agent which cause the typical signal generated by a standardized assay to significantly differ from the negative control. The selection hitself results from meeting or exceeding selection threshold (for instance 6 sigma from the mean or SSMD value beyond 5 when compared to positive controls or below -5 when compared to negative controls@en

24

INFO

lowercase_definition

STATO:0000001

IAO:0000115

property to indicate that a design declares a variable; the inverse property is 'is declared by'@en

25

INFO

lowercase_definition

STATO:0000002

IAO:0000115

an electronic file is an information content entity which conforms to a specification or format and which is meant to hold data and information in digital form, accessible to software agents@en

26

INFO

lowercase_definition

STATO:0000003

IAO:0000115

a balanced design is a an experimental design where all experimental group have the an equal number of subject observations@en

27

INFO

lowercase_definition

STATO:0000004

IAO:0000115

property to indicate the variables declared by a design; the inverse property is 'declares'@en

28

INFO

lowercase_definition

STATO:0000005

IAO:0000115

a single factor design is a study design which declares exactly 1 independent variable@en

29

INFO

lowercase_definition

STATO:0000006

IAO:0000115

x-axis is a cartesian coordinate axis which is orthogonal to the y-axis and the z-axis@en

30

INFO

lowercase_definition

STATO:0000007

IAO:0000115

an axis is a line graph used as reference line for the measurement of coordinates.@en

31

INFO

lowercase_definition

STATO:0000008

IAO:0000115

y-axis is a cartesian coordinate axis which is orthogonal to the x-axis and the z-axis@en

32

INFO

lowercase_definition

STATO:0000011

IAO:0000115

a cartesian axis is one of 3 the axis in a cartesian coordinate system defining a referential in 3 dimensions. each of the axis is orthogonal to the other 2@en

33

INFO

lowercase_definition

STATO:0000012

IAO:0000115

z-axis is a cartesian coordinate axis which is orthogonal to the x-axis and the y-axis@en

34

INFO

lowercase_definition

STATO:0000013

IAO:0000115

a 2 dimensional cartesian coordinate system is a cartesian coordinate system which defines 2 orthogonal one dimensional axes and which may be used to describe a 2 dimensional spatial region.

35

INFO

lowercase_definition

STATO:0000019

IAO:0000115

normal distribution hypothesis is a goodness of fit hypothesis stating that the distribution computed from the sample population fits a normal distribution.@en

36

INFO

lowercase_definition

STATO:0000021

IAO:0000115

a confidence interval which covers 90% of the sampling distribution, meaning that there is a 90% risk of false positive (type I error)@en

37

INFO

lowercase_definition

STATO:0000024

IAO:0000115

a three dimensional cartesian coordinate system is a cartesian coordinate system which defines 3 orthogonal one dimensional axes and which may be used to describe a 3 dimensional spatial region.

38

INFO

lowercase_definition

STATO:0000027

IAO:0000115

linkage between 2 categorical variable test is a statistical test which evaluates if there is an association between a predictor variable assuming discrete values and a response variable also assuming discrete values@en

39

INFO

lowercase_definition

STATO:0000028

IAO:0000115

measure of variation or statistical dispersion is a data item which describes how much a theoritical distribution or dataset is spread.@en

40

INFO

lowercase_definition

STATO:0000029

IAO:0000115

a measure of central tendency is a data item which attempts to describe a set of data by identifying the value of its centre.@en

41

INFO

lowercase_definition

STATO:0000031

IAO:0000115

binary classification (or binomial classification) is a data transformation which aims to cast members of a set into 2 disjoint groups depending on whether the element have a given property/feature or not.@en

42

INFO

lowercase_definition

STATO:0000032

IAO:0000115

an alternative term used for STATO statistical ontology and ISA team@en

43

INFO

lowercase_definition

STATO:0000034

IAO:0000115

a model parameter is a data item which is part of a model and which is meant to characterize an theoritecal or unknown population. a model parameter may be estimated by considering the properties of samples presumably taken from the theoritecal population@en

44

INFO

lowercase_definition

STATO:0000035

IAO:0000115

the range is a measure of variation which describes the difference between the lowest score and the highest score in a set of numbers (a data set)

45

INFO

lowercase_definition

STATO:0000038

IAO:0000115

a set of 2 subjects which result from a pairing process which assigns subject to a set based on a pairing rule/criteria@en

46

INFO

lowercase_definition

STATO:0000039

IAO:0000115

a statistic is a measurement datum to describe a dataset or a variable. It is generated by a calculation on set of observed data.@en

47

INFO

lowercase_definition

STATO:0000040

IAO:0000115

an MA plot is a scatter plot of the log intensity ratios M = log_2(T/R) versus the average log intensities A = log_2(T*T)/2, where T and R represent the signal intensities in the test and reference channels respectively.@en

48

INFO

lowercase_definition

STATO:0000041

IAO:0000115

a R command syntax or link to a R documentation in support of Statistical Ontology Classes or Data Transformations@en

49

INFO

lowercase_definition

STATO:0000043

IAO:0000115

a false positive rate whose value is 5 per cent@en

50

INFO

lowercase_definition

STATO:0000044

IAO:0000115

one-way anova is an analysis of variance where the different groups being compared are associated with the factor levels of only one independent variable. The null hypothesis is an absence of difference between the means calculated for each of the groups. The test assumes normality and equivariance of the data.@en

51

INFO

lowercase_definition

STATO:0000045

IAO:0000115

two-way anova is an analysis of variance where the different groups being compared are associated the factor levels of exatly 2 independent variables. The null hypothesis is an absence of difference between the means calculated for each of the groups. The test assumes normality and equivariance of the data.@en

52

INFO

lowercase_definition

STATO:0000046

IAO:0000115

a block design is a kind of study design which declares a blocking variable (also known as nuisance variable) in order to account for a known source of variation and reduce its impact on the acquisition of the signal@en

53

INFO

lowercase_definition

STATO:0000047

IAO:0000115

a count is a data item denoted by an integer and representing the number of instances or occurences of an entity@en

54

INFO

lowercase_definition

STATO:0000050

IAO:0000115

signal to noise ratio is a measurement datum comparing the amount of meaningful, useful or interesting data (the signal) to the amount of irrelevant or false data (the noise). Depending on the field and domain of application, different variables will be used to determinate a 'signal to noise ratio'. In statistics, the definition of signal to noise ratio is the ratio of the mean of a measurement to its standard deviation. It thus corresponds to the inverse of the coefficient of variation@en

55

INFO

lowercase_definition

STATO:0000053

IAO:0000115

a false positive rate is a data item which accounts for the proportion of incorrect rejection of a true null hypothesis.@en

56

INFO

lowercase_definition

STATO:0000054

IAO:0000115

homoskedasticity states that all variances under consideration are homogenous.@en

57

INFO

lowercase_definition

STATO:0000055

IAO:0000115

chromosome coordinate system is a genomic coordinate which uses chromosome of a particular assembly build process to define start and end positions. This coordinate system is unstable and will change with each new genome sequence assembly build.@en

58

INFO

lowercase_definition

STATO:0000056

IAO:0000115

a null hypothesis which states that no linkage exists between 2 categorical variables@en

59

INFO

lowercase_definition

STATO:0000058

IAO:0000115

goodness of fit hypothesis is a null hypothesis stating that the distribution computed from the sample population fits a theoretical distribution or that a dataset can be correctly explained by a model@en

60

INFO

lowercase_definition

STATO:0000059

IAO:0000115

the Student's t distribution is a continuous probability distribution which arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown.@en

61

INFO

lowercase_definition

STATO:0000060

IAO:0000115

hypergeometric distribution is a probability distribution that describes the probability of k successes in n draws from a finite population of size N containing K successes without replacement@en

62

INFO

lowercase_definition

STATO:0000062

IAO:0000115

is a null hypothesis stating that there are no difference observed across a series of measurements made one same subject.@en

63

INFO

lowercase_definition

STATO:0000063

IAO:0000115

genomic coordinate datum is a data item which denotes a genomic position expressed using a genomic coordinate system@en

64

INFO

lowercase_definition

STATO:0000064

IAO:0000115

sequence read count is a data item determining how many sequence reads have been generated by a DNA sequencing assay for a given stretch of DNA

65

INFO

lowercase_definition

STATO:0000067

IAO:0000115

a continuous probability distribution is a probability distribution which is defined by a probability density function@en

66

INFO

lowercase_definition

STATO:0000071

IAO:0000115

reaction rate is a measurement datum which represents the speed of a chemical reaction turning reactive species into product species of event (i.e the number of such conversions)s occuring over a time interval@en

67

INFO

lowercase_definition

STATO:0000072

IAO:0000115

substrate concentration is a scalar measurement datum which denotes the amount of molecular entity involved in an enzymatic reaction (or catalytic chemical reaction) and whose role in that reaction is as substrate.@en

68

INFO

lowercase_definition

STATO:0000075

IAO:0000115

a rarefaction curve is a graph used for estimating species richness in ecology studies@en

69

INFO

lowercase_definition

STATO:0000080

IAO:0000115

the Brown Forsythe test is a statistical test which evaluates if the variance of different groups are equal. It relies on computing the median rather than the mean, as used in the Levene's test for homoschedacity. This test maybe used to, for instance, ensure that the conditions of applications of ANOVA are met.@en

70

INFO

lowercase_definition

STATO:0000082

IAO:0000115

a fixed effect model is a statistical model which represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random.@en

71

INFO

lowercase_definition

STATO:0000084

IAO:0000115

multinomial logistic regression model is a model which attempts to explain data distribution associated with *polychotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is probit function.@en

72

INFO

lowercase_definition

STATO:0000085

IAO:0000115

effect size estimate is a data item about the direction and strength of the consequences of a causative agent as explored by statistical methods. Those methods produce estimates of the effect size, e.g. confidence interval@en

73

INFO

lowercase_definition

STATO:0000086

IAO:0000115

an F-test is a statistical test which evaluates that the computed test statistics follows an F-distribution under the null hypothesis. The F-test is sensitive to departure from normality. F-test arise when decomposing the variability in a data set in terms of sum of squares.@en

74

INFO

lowercase_definition

STATO:0000087

IAO:0000115

a polychotomous variable is a categorical variable which is defined to have minimally 2 categories or possible values@en

75

INFO

lowercase_definition

STATO:0000088

IAO:0000115

statistical sample size is a count evaluating the number of individual experimental units@en

76

INFO

lowercase_definition

STATO:0000089

IAO:0000115

a case-control study design is a observation study design which assess the risk of particular outcome (a trait or a disease) associated with an event (either an exposure or endogenous factor). A case-control study design therefore declares an exposure variable which is dichotomous in nature (exposed/non-exposed) and an outcome variable, which is also dichotomous (case or control), thus giving the name to the design. During the execution of the design, a case control study defines a population and counts the events to determine their frequency.@en

77

INFO

lowercase_definition

STATO:0000090

IAO:0000115

a dichotomous variable is a categorical variable which is defined to have only 2 categories or possible values@en

78

INFO

lowercase_definition

STATO:0000095

IAO:0000115

paired t-test is a statistical test which is specifically designed to analysis differences between paired observations in the case of studies realizing repeated measures design with only 2 repeated measurements per subject (before and after treatment for example)@en

79

INFO

lowercase_definition

STATO:0000096

IAO:0000115

stratification is a planned process which executes a stratification rule using as input a population and assign it member to mutually exclusive subpopulation based on the values defined by the stratification rule@en

80

INFO

lowercase_definition

STATO:0000099

IAO:0000115

a random effect(s) model, also called a variance components model, is a kind of hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of different populations whose differences relate to that hierarchy.@en

81

INFO

lowercase_definition

STATO:0000100

IAO:0000115

standardized mean difference is statistic computed by forming the difference between two means, divided by an estimate of the within-group standard deviation. It is used to provide an estimation of the effect size between two treatments when the predictor (independent variable) is categorical and the response(dependent) variable is continuous. A standardized mean difference is a statistic that is a difference between two means, divided by a statistical measure of dispersion. The term Standardized Mean Difference is a description of the concept without an explicit type of statistical measure of dispersion. If the statistical measure of dispersion is specified, then a type (child term) of Standardized Mean Difference is preferred.@en

82

INFO

lowercase_definition

STATO:0000101

IAO:0000115

the relationship between a fraction and the number above the line@en

83

INFO

lowercase_definition

STATO:0000102

IAO:0000115

relationship between a planned process and the plan specification that it carries out; it is defined as equivalent to the composed relationship (realizes o concretizes)@en

84

INFO

lowercase_definition

STATO:0000103

IAO:0000115

the multinomial distribution is a probability distribution which gives the probability of any particular combination of numbers of successes for various categories defined in the context of n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability.@en

85

INFO

lowercase_definition

STATO:0000105

IAO:0000115

log signal intensity ratio is a data item which corresponding the logarithmitic base 2 of the ratio between 2 signal intensity, each corresponding to a condition.@en

86

INFO

lowercase_definition

STATO:0000106

IAO:0000115

probit regression model is a model which attempts to explain data distribution associated with *dichotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is the probit function aka the quantile function, i.e., the inverse cumulative distribution function (CDF), associated with the standard normal distribution.@en

87

INFO

lowercase_definition

STATO:0000107

IAO:0000115

a statistical model is an information content entity which is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but stochastically related.@en

88

INFO

lowercase_definition

STATO:0000108

IAO:0000115

linear regression model is a model which attempts to explain data distribution associated with response/dependent variable in terms of values assumed by the independent variable uses a linear function or linear combination of the regression parameters and the predictor/independent variable(s). linear regression modeling makes a number of assumptions, which includes homoskedasticity (constance of variance)@en

89

INFO

lowercase_definition

STATO:0000109

IAO:0000115

multinomial logistic regression model is a model which attempts to explain data distribution associated with *polychotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is logistic function.@en

90

INFO

lowercase_definition

STATO:0000111

IAO:0000115

a sequence read is a DNA sequence data which is generated by a DNA sequencer@en

91

INFO

lowercase_definition

STATO:0000112

IAO:0000115

a Funnel plot is a scatter plot of treatment effect versus a measure of study size and aims to provide a visual aid to detecting bias or systematic heterogeneity. A symmetric inverted funnel shape arises from a ‘well-behaved’ data set, in which publication bias is unlikely. An asymmetric funnel indicates a relationship between treatment effect and study size. Known caveats: If high precision studies really are different from low precision studies with respect to effect size (e.g., due to different populations examined) a funnel plot may give a wrong impression of publication bias. The appearance of the funnel plot can change quite dramatically depending on the scale on the y-axis — whether it is the inverse square error or the trial size. Funnel plot was introduced by Light and Palmer in 1984.@en

92

INFO

lowercase_definition

STATO:0000113

IAO:0000115

variance is a data item about a random variable or probability distribution. it is equivalent to the square of the standard deviation. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value).The variance is the second moment of a distribution.@en

93

INFO

lowercase_definition

STATO:0000114

IAO:0000115

relationship between an element and a set it belongs to@en

94

INFO

lowercase_definition

STATO:0000115

IAO:0000115

relationship between a set and one of its elements@en

95

INFO

lowercase_definition

STATO:0000116

IAO:0000115

the process of using statistical analysis for interpreting and communicating \"what the data say\".@en

96

INFO

lowercase_definition

STATO:0000117

IAO:0000115

a discrete probability distribution is a probability distribution which is defined by a probability mass function where the random variable can only assume a finite number of values or infinitely countable values@en

97

INFO

lowercase_definition

STATO:0000118

IAO:0000115

ranking is a data transformation which turns a non-ordinal variable into a Ordinal variable by sorting the values of the input variable and replacing their value by their position in the sorting result@en

98

INFO

lowercase_definition

STATO:0000119

IAO:0000115

model parameter estimation is a data transformation that finds parameter values (the model parameter estimates) most compatible with the data as judged by the model.@en

99

INFO

lowercase_definition

STATO:0000120

IAO:0000115

beanplot is a plot in which (one or) multiple batches (\"beans\") are shown. Each bean consists of a density trace, which is mirrored to form a polygon shape. Next to that, a one-dimensional scatter plot shows all the individual measurements, like in a stripchart. The name beanplot stems from green beans. The density shape can be seen as the pod of a green bean, while the scatter plot shows the seeds inside the pod.@en

100

INFO

lowercase_definition

STATO:0000121

IAO:0000115

the objective of a data transformation to evaluate a null hypothesis of absence of linkage between variables.@en

101

INFO

lowercase_definition

STATO:0000122

IAO:0000115

a pedigree chart is a graph which plots parent child relations@en

102

INFO

lowercase_definition

STATO:0000123

IAO:0000115

r2 is a correlation coefficient which is computed over the frequency of 2 dichotomous variable and is used as a measure of Linkage Disequilibrium and as input data item to the creation of an LD plot@en

103

INFO

lowercase_definition

STATO:0000124

IAO:0000115

a stratification rule/criteria is a criteria used to determine population strata so that a stratification process implementing the rule can result in any member of the total population being assigned to one and only one stratum@en

104

INFO

lowercase_definition

STATO:0000126

IAO:0000115

volcano plot is a kind of scatter plot which graphs the negative log of the p-value (significance) on the y-axis versus log2 of fold-change between 2 conditions on the x-axis. It is a popular method for visualizing differential occurence of variables between 2 conditions.@en

105

INFO

lowercase_definition

STATO:0000127

IAO:0000115

a confidence interval which covers 99% of the sampling distribution, meaning that there is a 1% risk of false positive (type I error)@en

106

INFO

lowercase_definition

STATO:0000130

IAO:0000115

the Breslow-Day test is a statistical test which evaluates if the odds ratios are homogenous across N 2x2 contingency tables, for instance several 2x2 contingency tables associated with different strata of a stratified population when evaluating the relationship between exposure and outcome or associated with the different samples coming from several centres in a multicentric study in clinical trial context.@en

107

INFO

lowercase_definition

STATO:0000131

IAO:0000115

a sphericity test is a null hypothesis statistical testing procedure which posits a null hypothesis of equality of the variances of the differences between levels of the repeated measures factor@en

108

INFO

lowercase_definition

STATO:0000134

IAO:0000115

specificity is a measurement datum qualifying a binary classification test and is computed by substracting the false positive rate to the integral numeral 1@en

109

INFO

lowercase_definition

STATO:0000135

IAO:0000115

strictly standardized mean difference (SSMS) is a standardized mean difference which corresponds to the ratio of mean to the standard deviation of the difference between two groups. SSMD directly measures the magnitude of difference between two groups. SSMD is widely used in High Content Screen for hit selection and quality control. When the data is preprocessed using log-transformation as normally done in HTS experiments, SSMD is the mean of log fold change divided by the standard deviation of log fold change with respect to a negative reference. In other words, SSMD is the average fold change (on the log scale) penalized by the variability of fold change (on the log scale). For quality control, one index for the quality of an HTS assay is the magnitude of difference between a positive control and a negative reference in an assay plate. For hit selection, the size of effects of a compound (i.e., a small molecule or an siRNA) is represented by the magnitude of difference between the compound and a negative reference. SSMD directly measures the magnitude of difference between two groups. Therefore, SSMD can be used for both quality control and hit selection in HTS experiments.@en

110

INFO

lowercase_definition

STATO:0000137

IAO:0000115

an homoskedasticity test is a statistical test aiming at evaluate if the variances from several random samples are similar@en

111

INFO

lowercase_definition

STATO:0000138

IAO:0000115

a 2x2 contingency table is a contingency table build for 2 dichotomous variables (i.e. 2 categorical variables, each with only 2 possible outcomes). It is the simplest of contingency tables@en

112

INFO

lowercase_definition

STATO:0000139

IAO:0000115

a subject pairing is a planned process which executes a pairing rule and results in the creation of sets of 2 subjects meeting the pairing criteria@en

113

INFO

lowercase_definition

STATO:0000140

IAO:0000115

a contigency table is a data item which displays the (multivariate) frequency distribution of the possible values of categorical variables. The first row of the table corresponds to categories of one categorical variable, the first column of the table corresponds to categories of the other categorical variable, the cells corresponding to each combination of categories is filled with the observed occurences in the sample being considered. The table also contains marginal total (marginal sums) and grand total of the occurences The term contingency table was first used by Karl Pearson in \"On the Theory of Contingency and Its Relation to Association and Normal Correlation\", part of the Drapers' Company Research Memoirs Biometric Series I published in 1904.@en

114

INFO

lowercase_definition

STATO:0000141

IAO:0000115

acute toxicity study is an investigation which use interventions organized according to a factorial design and a parallel group design to observe the effect of use of high dose xenobiotics in animal models or cellular models@en

115

INFO

lowercase_definition

STATO:0000144

IAO:0000115

a model parameter estimate is a data item which results from a model parameter estimation process and which provides a numerical value about a model parameter.@en

116

INFO

lowercase_definition

STATO:0000145

IAO:0000115

the geometric distribution is a negative binomial distribution where r is 1. It is useful for modeling the runs of consecutive successes (or failures) in repeated independent trials of a system. The geometric distribution models the number of successes before one failure in an independent succession of tests where each test results in success or failure. The geometric distribution with prob = p has density p(x) = p (1-p)^x for x = 0, 1, 2, …, 0 < p ≤ 1. If an element of x is not integer, the result of dgeom is zero, with a warning. The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.@en

117

INFO

lowercase_definition

STATO:0000146

IAO:0000115

a null hypothesis stating that there are differences observed between group of subjects@en

118

INFO

lowercase_definition

STATO:0000149

IAO:0000115

binomial logistic regression model is a model which attempts to explain data distribution associated with *dichotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is logistic function.@en

119

INFO

lowercase_definition

STATO:0000150

IAO:0000115

a minimum value is a data item which denotes the smallest value found in a dataset or resulting from a calculation.@en

120

INFO

lowercase_definition

STATO:0000151

IAO:0000115

maximum value is a data item which denotes the largest value found in a dataset or resulting from a calculation.@en

121

INFO

lowercase_definition

STATO:0000152

IAO:0000115

a quartile is a quantile which splits data into sections accrued of 25% of data, so the first quartile delineates 25% of the data, the second quartile delineates 50% of the data and the third quartile, 75 % of the data@en

122

INFO

lowercase_definition

STATO:0000154

IAO:0000115

a violin plot is a plot combining the features of box plot and kernel density plot. The violin plot is therefore similar to box plot but it incorporated in the display the probability density of the data at different values. Typically violin plots will include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots.@en

123

INFO

lowercase_definition

STATO:0000155

IAO:0000115

meta-analysis is a data transformation which uses the effect size estimates from several independent quantitative scientific studies addressing the same question in order to assess finding consistency.@en

124

INFO

lowercase_definition

STATO:0000156

IAO:0000115

the Scheffe test is a data transformation which evaluates all possible contrasts and adjusting the levels significance by accounting for multiple comparison. The test is therefore conservative. Confidence intervals can be constructed for the corresponding linear regression. It was developped by American statistician Henry Scheffe in 1959.@en

125

INFO

lowercase_definition

STATO:0000157

IAO:0000115

the LSD test is a statistical test for multiple comparisons of treatments by means of least significant difference following an ANOVA analysis

126

INFO

lowercase_definition

STATO:0000158

IAO:0000115

a null hypothesis which states that a linkage exists between 2 categorical variables@en

127

INFO

lowercase_definition

STATO:0000161

IAO:0000115

variable distribution is data item which denotes the spatial resolution of data point making up a variable. variable distribution may be compared to a known probability distribution using goodness of fit test or plotting a quantile-quantile plot for visual assessment of the fit.@en

128

INFO

lowercase_definition

STATO:0000162

IAO:0000115

the role played by an entity part of study group as defined by an experimental design and realized in a data analysis and data interpretation@en

129

INFO

lowercase_definition

STATO:0000163

IAO:0000115

trimmed mean or truncated mean is a measure of central tendency which involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both@en

130

INFO

lowercase_definition

STATO:0000165

IAO:0000115

a pie chart is a graph in which a circular graph is divided into sector illustrating numerical proportion, meaning that the arc length of each sector (and consequently its central angle and area), is proportional to the quantity it represents.@en

131

INFO

lowercase_definition

STATO:0000166

IAO:0000115

the bart chart is a graph resulting from plotting rectangular bars with lengths proportional to the values that they represent.

132

INFO

lowercase_definition

STATO:0000167

IAO:0000115

the first quartile is a quartile which splits the lower 25 % of the data@en

133

INFO

lowercase_definition

STATO:0000168

IAO:0000115

a real time quantitative pcr plot is a line graph which plots the signal fluorescence intensity as a function of the number of PCR cycle@en

134

INFO

lowercase_definition

STATO:0000170

IAO:0000115

the first quartile is a quartile which splits the 75 % of the data@en

135

INFO

lowercase_definition

STATO:0000173

IAO:0000115

homogeneity testing objective is the objective of a data transformation to test a null hypothesis that two or more sub-groups of a population share the same distribution of a single categorical variable. For example, do people of different countries have the same proportion of smokers to non-smokers@en

136

INFO

lowercase_definition

STATO:0000175

IAO:0000115

confidence interval calculation is a data transformation which determines a confidence interval for a given statistical parameter@en

137

INFO

lowercase_definition

STATO:0000176

IAO:0000115

t-statistic is a statistic computed from observations and used to produce a p-value in statistical test when compared to a Student's t distribution.@en

138

INFO

lowercase_definition

STATO:0000177

IAO:0000115

the beta distribution is a continuous probability distributions defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution@en

139

INFO

lowercase_definition

STATO:0000180

IAO:0000115

standard normal distribution is a normal distribution with variance = 1 and mean=0@en

140

INFO

lowercase_definition

STATO:0000183

IAO:0000115

sphericity testing objective is a statistical objective of a data transformation which aims to test a null hypothesis of sphericity holds.@en

141

INFO

lowercase_definition

STATO:0000185

IAO:0000115

a 2 by n contingency table is a contingency table built for one dichotomous variable (a categorical variable with only 2 outcomes) and one polychotomous variable (a polychomotomous variable with at least 2 outcomes)@en

142

INFO

lowercase_definition

STATO:0000188

IAO:0000115

average log signal intensity is a data time which corresponds to the sum of 2 distinct logarithm base 2 transformed signal intensity, each corresponding to a distinct condition of signal acquisition, divided by 2.@en

143

INFO

lowercase_definition

STATO:0000191

IAO:0000115

a goodness of fit statistical test is a statistical test which aim to evaluate if a sample distribution can be considered equivalent to a theoretical distribution used as input@en

144

INFO

lowercase_definition

STATO:0000192

IAO:0000115

a cartesian product is a data transformation which operates on a n Sets to produce a set of all possible ordered n-tuples where each element of the tuple comes from a Set

145

INFO

lowercase_definition

STATO:0000193

IAO:0000115

is a population whose individual members realize (may be expressed as) a combination of inclusion rule values specifications or resulting from a sampling process (e.g. recruitment followed by randomization to group) on which a number of measurements will be carried out, which may be used as input to statistical tests and statistical inference.

146

INFO

lowercase_definition

STATO:0000194

IAO:0000115

self explanatory@en

147

INFO

lowercase_definition

STATO:0000197

IAO:0000115

a genomic coordinate system is a coordinate system to describe position of sequence on a genomic scaffold (assembly of chromosome, contig....)@en

148

INFO

lowercase_definition

STATO:0000198

IAO:0000115

a statistical test which makes no assumption about the underlying data distribution@en

149

INFO

lowercase_definition

STATO:0000199

IAO:0000115

the Mauchly's test for sphericity is a statistical test which evaluates if the variance of the differences between all combinations of the groups are equal, a property known as 'sphericity' in the context of repeated measures. It is used for instance prior to repeated measure ANOVA. The test works by assessing if a Wishart-distributed covariance matrix (or transformation thereof) is proportional to a given matrix.@en

150

INFO

lowercase_definition

STATO:0000200

IAO:0000115

the statistical test power is data item which is about a statistical test and is obtained by subtracting the false negative rate (type II error rate) to 1. The power of a statistical test is the probability that it will correctly lead to the rejection of a false null hypothesis (Greene 2000). The statistical power is the ability of a test to detect an effect, if the effect actually exists (High 2000).@en

151

INFO

lowercase_definition

STATO:0000202

IAO:0000115

within subject comparison statistical test is a kind of statistical test which evaluates if a change occurs within one experimental unit over time following a treatment or an event@en

152

INFO

lowercase_definition

STATO:0000203

IAO:0000115

a cohort is a study group population where the members are human beings which meet inclusion criteria and undergo a longitudinal design@en

153

INFO

lowercase_definition

STATO:0000204

IAO:0000115

the F-distribution is a continuous probability distribution which arises in the testing of whether two observed samples have the same variance.@en

154

INFO

lowercase_definition

STATO:0000207

IAO:0000115

a planned process which etablishes and states the different hypothesis to be evaluated during a null hypothesis statistical test@en

155

INFO

lowercase_definition

STATO:0000209

IAO:0000115

area under curve is a measurement datum which corresponds to the surface define by the x-axis and bound by the line graph represented in a 2 dimensional plot resulting from an integration or integrative calculus. The interpretation of this measurement datum depends on the variables plotted in the graph@en

156

INFO

lowercase_definition

STATO:0000210

IAO:0000115

is a data item formed by dividing the fluorescence intensity obtained in one channel to that obtained in the other channel, typically the case when considering 2-color microarray data when imaging is done for Cy3 and Cy5 dyes.@en

157

INFO

lowercase_definition

STATO:0000211

IAO:0000115

odds ratio homogeneity hypothesis is a null hypothesis stating that all odds ratio are homogenous, that is remain within the same range.@en

158

INFO

lowercase_definition

STATO:0000212

IAO:0000115

a tetrachoric correlation coefficient is a polychoric correlation coefficient for 2 dichotomous variables used as proxy for correlation between 2 continuous latent variables.@en

159

INFO

lowercase_definition

STATO:0000213

IAO:0000115

discretization as a processing converting a continuous variable into a polychotomous variable by concretizing a set of discretization rules@en

160

INFO

lowercase_definition

STATO:0000214

IAO:0000115

a confidence interval which covers 50% of the sampling distribution, meaning that there is a 50% risk of false positive (type I error)@en

161

INFO

lowercase_definition

STATO:0000215

IAO:0000115

probit regression model is a model which attempts to explain data distribution associated with *ordinal* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is the ordered probit function.@en

162

INFO

lowercase_definition

STATO:0000216

IAO:0000115

a stratum population is a population resulting from a population stratification prior to sampling process which aims to produce homogenous subpopulations from an heterogeneous population by applying one or more stratification criteria@en

163

INFO

lowercase_definition

STATO:0000217

IAO:0000115

a null hypothesis which states that a given matrix is proportional to a Wishart-distributed covariance matrix@en

164

INFO

lowercase_definition

STATO:0000219

IAO:0000115

a real time pcr standard curve is a line graph which plots the fluorescence intensity signal as a function of the concentration of a sample used as reference and used to determine relative abundance of test samples@en

165

INFO

lowercase_definition

STATO:0000220

IAO:0000115

the false negative rate is a data item which denotes the proportion of missed detection of elements known to be meeting the detection criteria@en

166

INFO

lowercase_definition

STATO:0000221

IAO:0000115

a random variable (or aleatory variable or stochastic variable) in probability and statistics, is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense)@en

167

INFO

lowercase_definition

STATO:0000222

IAO:0000115

graeco-latin square design is_a study design which allows in its simpler form controlling 3 levels of nuisance variables (also known as blocking variables). The 3 nuisance factors are divided into a tabular grid with the property that each row and each column receive each treatment exactly once.@en

168

INFO

lowercase_definition

STATO:0000223

IAO:0000115

group assignment based on blocking variable specification is a kind of group assignment process which takes into account the levels assumed by a blocking variable to allocate subjects or experimental units to a treatment group@en

169

INFO

lowercase_definition

STATO:0000227

IAO:0000115

a normal distribution is a continuous probability distribution described by a probability distribution function described here: http://mathworld.wolfram.com/NormalDistribution.html@en

170

INFO

lowercase_definition

STATO:0000228

IAO:0000115

ordinal variable is a categorical variable where the discrete possible values are ordered or correspond to an implicit ranking@en

171

INFO

lowercase_definition

STATO:0000230

IAO:0000115

the expected value (or expectation, mathematical expectation, EV, mean, or the first moment) of a random variable is a data item which corresponds to the weighted average of all possible values that this random variable can take on. The weights used in computing this average correspond to the probabilities in case of a discrete random variable, or densities in case of a continuous random variable. From a rigorous theoretical standpoint, the expected value is the integral of the random variable with respect to its probability measure.@en

172

INFO

lowercase_definition

STATO:0000231

IAO:0000115

a confidence interval which covers 95% of the sampling distribution, meaning that there is a 5% risk of false positive (type I error). If the number of observations made is large enough, the sampling distribution can be assumed to be normal, which entails that 95% of the sampling distributions falls within roughly2 (1.96) standard deviations from the mean.@en

173

INFO

lowercase_definition

STATO:0000232

IAO:0000115

number of PCR cycle is a count which enumerates how many iterations of 'annealing, renaturation, amplification,' rounds (or cycles) are performed during a polymerase chain reaction (PCR) or an assay relying on PCR.@en

174

INFO

lowercase_definition

STATO:0000233

IAO:0000115

sensitivity is a measurement datum qualifying a binary classification test and is computed by substracting the false negative rate to the integral numeral 1@en

175

INFO

lowercase_definition

STATO:0000234

IAO:0000115

a residual is a data item which is the output of an error estimate or model fitting process and which is an observable estimate of the unobservable error@en

176

INFO

lowercase_definition

STATO:0000236

IAO:0000115

the coefficient of variation is a normalized measure of dispersion of a probability distribution of frequency distribution.@en

177

INFO

lowercase_definition

STATO:0000238

IAO:0000115

high content screening is a kind of investigation which uses a standardized cellular assays to test the effect of substances (RNAi or small molecules) held in libraries on a cellular phenotype. it relies on microscopy imaging and or flow-cytometry, robotic handling to ensure fast and high-throughput.@en

178

INFO

lowercase_definition

STATO:0000239

IAO:0000115

high throughput screening is a kind of investigation which uses a standardized assays (cell based, enzymatic or chemometric) to test the effect of substances (RNAi or small molecules) held in libraries on a very specific and measureable outcome (e.g fluorence intensity). it relies on robotic handling to ensure fast and high-throughput in assay performance, data acquisition and hit selection.@en

179

INFO

lowercase_definition

STATO:0000242

IAO:0000115

statistical error is an data item denoting the amount by which an observation differs from the expected value, being based on the whole statistical population from which the statistical unit was chosen randomly@en

180

INFO

lowercase_definition

STATO:0000243

IAO:0000115

a box plot is a graph which plots datasets relying on their quartiles and the interquartile range to create the box and the whiskers.@en

181

INFO

lowercase_definition

STATO:0000244

IAO:0000115

(Rn +) − (Rn −), where Rn + = (emission intensity of reporter dye)/(emission intensity of passive reference dye) in PCR with template and Rn − = (emission intensity of reporter dye)/(emission intensity of passive reference dye) in PCR without template or early cycles of a real-time reaction. Ct = threshold cycle, i.e., cycle at which a statistically significant increase in ΔRn is first detected@en

182

INFO

lowercase_definition

STATO:0000247

IAO:0000115

odds ratio homogeneity test is a statistical test which aims to evaluate that null the hypothesis of consistency odds ratio accross different strata of population is true or not@en

183

INFO

lowercase_definition

STATO:0000248

IAO:0000115

a blocking variable is a independent variable which is used in a blocking process part of an experiment with the purpose of maximizing the signal coming from the main variable.

184

INFO

lowercase_definition

STATO:0000249

IAO:0000115

a DNA microarray hybridization is an assay relying on nucleic acid hybridization , which uses a DNA microarray device and a nucleic acid as input. It precedes a data acquisition process@en

185

INFO

lowercase_definition

STATO:0000250

IAO:0000115

group comparison objective is a data transformation objective which aims to determine if 2 or more study group differ with respect to the signal of a response variable@en

186

INFO

lowercase_definition

STATO:0000252

IAO:0000115

a categorical variable is a variable which that can only assume a finite number of value and cast observation in a small number of categories@en

187

INFO

lowercase_definition

STATO:0000253

IAO:0000115

the objective of a data transformation to test a null hypothesis of absence of difference within subject holds.@en

188

INFO

lowercase_definition

STATO:0000255

IAO:0000115

the objective of a data transformation to test a null hypothesis of absence of difference withing subject holds.@en

189

INFO

lowercase_definition

STATO:0000256

IAO:0000115

a manhattan plot for gwas is a kind of scatter plot used to facilitate presentation of genome-wide association study (GWAS) data. Genomic coordinates are displayed along the X-axis, with the negative logarithm of the association P-value for each single nucleotide polymorphism displayed on the Y-axis.@en

190

INFO

lowercase_definition

STATO:0000258

IAO:0000115

a variable is a data item which can assume any of a set of values, either as determined by an agent or as randomly occuring through observation.@en

191

INFO

lowercase_definition

STATO:0000259

IAO:0000115

the relationship between a fraction and the number below the line (or divisor)@en

192

INFO

lowercase_definition

STATO:0000260

IAO:0000115

repeated measure ANOVA is a kind of ANOVA specifically developed for non-independent observations as found when repeated measurements on the sample experimental unit. repeated measure ANOVA is sensitive to departure from normality (evaluation using Bartlett's test), more so in the case of unbalanced groups (i.e. different sizes of sample populations). Departure from sphericity (evaluation using Mauchly'test) used to be an issue which is now handled robustly by modern tools such as R's lme4 or nlme, which accommodate dependence assumptions other than sphericity.@en

193

INFO

lowercase_definition

STATO:0000264

IAO:0000115

a factor level combination is one a possible sets of factor levels resulting from the cartesian product of sets of factor and their levels as defined in a factorial design@en

194

INFO

lowercase_definition

STATO:0000267

IAO:0000115

grouped bar chart is a kind of bar chart which juxtaposes the discrete values for each of the possible value of a given categorical variable, thus providing within group comparison. Grouped bar charts are good for comparing between each element in the categories, and comparing elements across categories. However, the grouping can make it harder to tell the difference between the total of each group.@en

195

INFO

lowercase_definition

STATO:0000269

IAO:0000115

polychoric correlation coefficient is a correlation coefficient which is computed over 2 variables to characterise an association by proxy with 2 (latent) variables which are assumed to be continuous and normally distributed.@en

196

INFO

lowercase_definition

STATO:0000270

IAO:0000115

a full factorial design is a factorial design which ensures that all possible factor level combinations are defined and used so all between group differences can be explored@en

197

INFO

lowercase_definition

STATO:0000271

IAO:0000115

permutation numbering is a data tranformation allowing to count the number of possible permutations of elements in a set of size n, each element occurring exactly once. This number is factorial n.@en

198

INFO

lowercase_definition

STATO:0000274

IAO:0000115

receiver operational characteristics curve is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold (aka cut-off point) is varied by plotting sensitivity vs (1 − specificity)@en

199

INFO

lowercase_definition

STATO:0000277

IAO:0000115

hit selection is a planned process which in screening processes such as high-throughput screening, lead to the identification of perturbing agent which cause the typical signal generated by a standardized assay to significantly differ from the negative control. The selection hitself results from meeting or exceeding selection threshold (for instance 6 sigma from the mean or SSMD value beyond 5 when compared to positive controls or below -5 when compared to negative controls@en

ROBOT Report - stato

Types of errors

Error breakdown