ROBOT Report - stato

Types of errors

Level Number of errors
INFO 341
WARN 78

Error breakdown

Rule Number of errors
lowercase_definition 341
annotation_whitespace 53
equivalent_class_axiom_no_genus 16
missing_definition 6
multiple_equivalent_classes 2
duplicate_label_synonym 1
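
The two summary tables can be cross-checked against each other. The sketch below is not part of the ROBOT output. It copies the counts from the tables above, takes the level of each rule from the Level column of the detail rows in this report, and confirms that the per-rule counts add up to the per-level totals. The names `level_totals`, `rule_counts`, `rule_levels` and `totals_by_level` are illustrative only.

```python
# Counts copied by hand from the two summary tables above.
level_totals = {"INFO": 341, "WARN": 78}

rule_counts = {
    "lowercase_definition": 341,
    "annotation_whitespace": 53,
    "equivalent_class_axiom_no_genus": 16,
    "missing_definition": 6,
    "multiple_equivalent_classes": 2,
    "duplicate_label_synonym": 1,
}

# Level of each rule, as shown in the Level column of the detail rows.
rule_levels = {
    "lowercase_definition": "INFO",
    "annotation_whitespace": "WARN",
    "equivalent_class_axiom_no_genus": "WARN",
    "missing_definition": "WARN",
    "multiple_equivalent_classes": "WARN",
    "duplicate_label_synonym": "WARN",
}


def totals_by_level(counts, levels):
    """Sum the per-rule counts into one total per level."""
    totals = {}
    for rule, n in counts.items():
        level = levels[rule]
        totals[level] = totals.get(level, 0) + n
    return totals


computed = totals_by_level(rule_counts, rule_levels)
# True when the rule breakdown is consistent with the level totals.
print(computed == level_totals)
```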

Row Level Rule Name Subject Property Value
0 WARN annotation_whitespace STATO:0000011 IAO:0000119 adapted from Wolfram Alpha: https://www.wolframalpha.com/input/?i=cartesian+coordinates&lk=4&num=6&lk=4&num=6 @en
1 WARN annotation_whitespace STATO:0000023 IAO:0000119 A Dictionary of Statistics (2 rev ed.), OUP. ISBN-13: 9780199541454 http://www.oxfordreference.com/view/10.1093/acref/9780199541454.001.0001/acref-9780199541454-e-1588 @en
2 WARN annotation_whitespace STATO:0000049 IAO:0000119 STATO, adapted from wikipedia (http://en.wikipedia.org/wiki/Hardy–Weinberg_principle)
3 WARN annotation_whitespace STATO:0000051 dc11:source NIST: http://www.itl.nist.gov/div898/handbook/eda/section3/eda366j.htm @en
4 WARN annotation_whitespace STATO:0000069 IAO:0000119 http://www.optique-ingenieur.org/en/courses/OPI_ang_M07_C01/co/Contenu_07.html@en
5 WARN annotation_whitespace STATO:0000075 STATO:0000041 >library(vegan) >rarefaction(x, subsample=5, plot=TRUE, color=TRUE, error=FALSE, legend=TRUE, symbol) http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/vegan/html/vegan-package.html @en
6 WARN annotation_whitespace STATO:0000083 IAO:0000117 Philippe Rocca-Serra @en
7 WARN annotation_whitespace STATO:0000095 STATO:0000041 http://stat.ethz.ch/R-manual/R-patched/library/stats/html/t.test.html t.test(dependent variable ~ independant variable, data = dataset, var.equal = FALSE, paired= TRUE)@en
8 WARN annotation_whitespace STATO:0000100 IAO:0000119 adapted from http://htaglossary.net/standardised+mean+difference+(SMD) @en
9 WARN annotation_whitespace STATO:0000112 IAO:0000119 adapted from Wikipedia: http://en.wikipedia.org/wiki/Funnel_plot @en
10 WARN annotation_whitespace STATO:0000120 STATO:0000041 http://cran.r-project.org/web/packages/beanplot/index.html@en
11 WARN annotation_whitespace STATO:0000133 IAO:0000119 adapted from wikipedia: http://en.wikipedia.org/wiki/Post-hoc_analysis last accessed: 2013-11-15 @en
12 WARN annotation_whitespace STATO:0000143 IAO:0000119 adapted from wikipedia @en
13 WARN annotation_whitespace STATO:0000148 IAO:0000115 The Cochran-Armitage test is a statistical test used in categorical data analysis when the aim is to assess for the presence of an association between a dichotomous variable (variable with two categories) and a polychotomous variable (a variable with k categories). The two-level variable represents the response, and the other represents an explanatory variable with ordered levels. The null hypothesis is the hypothesis of no trend, which means that the binomial proportion is the same for all levels of the explanatory variable For example, doses of a treatment can be ordered as 'low', 'medium', and 'high', and we may suspect that the treatment benefit cannot become smaller as the dose increases. The trend test is often used as a genotype-based test for case-control genetic association studies. @en
14 WARN annotation_whitespace STATO:0000164 IAO:0000115 The interquartile range is a data item which corresponds to the difference between the upper quartile (3rd quartile) and lower quartile (1st quartile). The interquartile range contains the second quartile or median. The interquartile range is a data item providing a measure of data dispersion @en
15 WARN annotation_whitespace STATO:0000169 IAO:0000116 30/04/2014 - removed restriction: 'is about' exactly 2 'study group population' - need more discussion for the relationship of fold change to study group populations for particular examples.
16 WARN annotation_whitespace STATO:0000184 IAO:0000115 A ratio is a data item which is formed with two numbers r and s is written r/s, where r is the numerator and s is the denominator. The ratio of r to s is equivalent to the quotient r/s. @en
17 WARN annotation_whitespace STATO:0000186 IAO:0000116 TODO: create 'inverse function' and replace 'data transformation' in the assertions @en
18 WARN annotation_whitespace STATO:0000188 IAO:0000119 adapted from wikipedia: http://en.wikipedia.org/wiki/MA_plot last accessed: 2014-03-13 @en
19 WARN annotation_whitespace STATO:0000199 IAO:0000119 AGB-PRS, adapted from wikipedia (http://en.wikipedia.org/wiki/Mauchly's_sphericity_test) polled on june,10th, 2013 and from R manual: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/mauchly.test.html @en
20 WARN annotation_whitespace STATO:0000204 STATO:0000041 df(x, df1, df2, ncp, log = FALSE) http://stat.ethz.ch/R-manual/R-patched/library/stats/html/Fdist.html @en
21 WARN annotation_whitespace STATO:0000212 IAO:0000119 adapted from: http://www.rasch.org/rmt/rmt193c.htm and http://en.wikipedia.org/wiki/Polychoric_correlation @en
22 WARN annotation_whitespace STATO:0000219 IAO:0000119 adapted from: http://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/General_Information/qpcr_technical_guide.pdf and http://www.lifetechnologies.com/uk/en/home/life-science/pcr/real-time-pcr/qpcr-education/absolute-vs-relative-quantification-for-qpcr.html @en
23 WARN annotation_whitespace STATO:0000257 IAO:0000116 import from Population and Community Ontology: http://www.ontobee.org/browser/rdf.php?o=PCO&iri=http://purl.obolibrary.org/obo/PCO_0000020 @en
24 WARN annotation_whitespace STATO:0000263 IAO:0000115 Galbraith (Radial) plot is a scatter plot which can be used in the meta-analytic context to examine the data for heterogeneity. For a fixed-effects model, the plot shows the inverse of the standard errors on the horizontal axis against the individual observed effect sizes or outcomes standardized by their corresponding standard errors on the vertical axis. Radial plots were introduced by Rex Galbraith (1988a, 1988b, 1994). @en
25 WARN annotation_whitespace STATO:0000263 STATO:0000041 http://www.inside-r.org/packages/cran/Luminescence/docs/plot_RadialPlot plot_RadialPlot(data, na.exclude = TRUE, negatives = "remove", log.z = TRUE, central.value, centrality = "mean.weighted", plot.ratio, bar.col, grid.col, legend.text, summary = FALSE, stats, line, line.col, line.label, output = FALSE, ...)@en
26 WARN annotation_whitespace STATO:0000269 IAO:0000119 adapted from: http://www.rasch.org/rmt/rmt193c.htm and http://en.wikipedia.org/wiki/Polychoric_correlation @en
27 WARN annotation_whitespace STATO:0000284 IAO:0000115 Breusch-Pagan test is a statistical test which computes a score test of the hypothesis of constant error variance against the alternative that the error variance changes with the level of the response (fitted values), or with a linear combination of predictors. @en
28 WARN annotation_whitespace STATO:0000284 STATO:0000041 http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/lmtest/html/bptest.html bptest(formula, varformula = NULL, studentize = TRUE, data = list()) or http://www.inside-r.org/packages/cran/car/docs/ncvTest@en
29 WARN annotation_whitespace STATO:0000285 IAO:0000112 http://www.ncbi.nlm.nih.gov/pubmed/?term=17182697 Bioinformatics. 2007 Feb 15;23(4):401-7. Enrichment or depletion of a GO category within a class of genes: which test? Rivals I1, Personnaz L, Taing L, Potier MC.@en
30 WARN annotation_whitespace STATO:0000289 IAO:0000112 let's consider an experiment evaluating 2 compounds (aspirin & ibuprofen) at 3 distinct dose levels (low, medium, high) and 4 time points post exposure (0h, 6h, 12h, 24h). Assuming the treatments are applied only once (no replication), the number of observation in a full factorial design is 2 x 3 x 4 = 24 so the design matrix would have 24 rows and 3 columns (1 per factor (independent variable). @en
31 WARN annotation_whitespace STATO:0000289 STATO:0000041 model.matrix(object, data = environment(object), contrasts.arg = NULL, xlev = NULL, ...) http://stat.ethz.ch/R-manual/R-patched/library/stats/html/model.matrix.html @en
32 WARN annotation_whitespace STATO:0000298 IAO:0000119 adapted from: http://en.wikipedia.org/wiki/Binomial_test @en
33 WARN annotation_whitespace STATO:0000301 IAO:0000115 The covariance is a measurement data item about the strength of correlation between a set (2 or more) of random variables. The covariance is obtained by forming: cov(X,Y)=E([X-E(X)][Y-E(Y)] where E(X), E(Y) is the expected value (mean) of variable X and Y respectively. covariance is symmetric so cov(X,Y)=cov(Y,X). The covariance is usefull when looking at the variance of the sum of the 2 random variables since: var(X+Y) = var(X) +var(Y) +2cov(X,Y) The covariance cov(x,y) is used to obtain the coefficient of correlation cor(x,y) by normalizing (dividing) cov(x,y) but the product of the standard deviations of x and y. @en
34 WARN annotation_whitespace STATO:0000303 IAO:0000119 adapted from: http://en.wikipedia.org/wiki/Student's_t-test#Independent_.28unpaired.29_samples and from: http://www.psychology.emory.edu/clinical/bliwise/Tutorials/TOM/meanstests/tind.htm @en
35 WARN annotation_whitespace STATO:0000319 IAO:0000119 adapted from : http://en.wikipedia.org/wiki/Effect_size#Cohen.27s_d and http://blog.stata.com/tag/cohens-d/ @en
36 WARN annotation_whitespace STATO:0000320 IAO:0000119 adapted from : http://en.wikipedia.org/wiki/Effect_size#Cohen.27s_d and http://blog.stata.com/tag/cohens-d/ @en
37 WARN annotation_whitespace STATO:0000325 IAO:0000115 The Akaike information criterion (AIC) is a measure of the relative quality of a statistical model for a given set of data. As such, AIC provides a means for model selection. AIC is defined as: AIC = 2K - 2log(L) where K is the number of predictors and L is the maximized likelihood value. AIC deals with the trade-off between the goodness of fit of the model and the complexity of the model. It is founded on information theory: it offers a relative estimate of the information lost when a given model is used to represent the process that generates the data. AIC does not provide a test of a model in the sense of testing a null hypothesis; i.e. AIC can tell nothing about the quality of the model in an absolute sense. If all the candidate models fit poorly, AIC will not give any warning of that. @en
38 WARN annotation_whitespace STATO:0000377 IAO:0000119 http://en.wikipedia.org/wiki/Deviance_%28statistics%29
39 WARN annotation_whitespace STATO:0000389 IAO:0000115 a power-law probability distribution is a probability distribution whose density function (or mass function in the discrete case) has the form p(x) = L(x) . x^{-alpha} where alpha is a parameter >1 and L(x) is a slowly varying function. @en
40 WARN annotation_whitespace STATO:0000393 IAO:0000115 the Pareto type-II probability distribution is a continuous probability distribution which is defined by a probability density function characterized by 2 parameters, alpha and lambda, 2 real, strictly positive numbers. alpha is known as the shape parameter while lambda is known as the scale parameter. the function defines the probably of a continous random variable according to the following: p(x) = {\alpha \over \lambda} \left[{1+ {x \over \lambda}}\right]^{-(\alpha+1)}, \qquad x \geq 0, @en
41 WARN annotation_whitespace STATO:0000397 STATO:0000041 http://personality-project.org/r/html/harmonic.mean.html Usage: > harmonic.mean(x,na.rm=TRUE) Arguments: x, a vector, matrix, or data.frame na.rm, na.rm=TRUE remove NA values before processing@en
42 WARN annotation_whitespace STATO:0000422 IAO:0000115 The L’Abbé plot was introduced in 1987 in the context of meta-analyses of clinical trials with dichotomous (binary) outcomes, as a plot of observed risks in the treatment group against observed risks in the control group. Another formulation is that it plots the event rate in the experimental (intervention) group against the event rate in the control group, as an aid to exploring the heterogeneity of effect estimates within a meta-analysis. It is diagram used in meta-analysis that compares the risks observed in the experimental and control arms of clinical trials. Each trial is located in the space of a diagram where the sizes of the circles indicate the sizes of the trials. Trials in which the experimental treatment had a higher risk than the control will be in the upper left of the plot. If risk in the both groups is the same the circle will fall on the line of equality. If the control treatment has a higher risk than the experimental treatment then the point will be in the lower right of the plot. It is often used as an indicator of heterogeneity and hence as an indicator of the likelihood that results from different trials can be validly combined. Named after Kristin L'Abbé. @en
43 WARN annotation_whitespace STATO:0000423 IAO:0000119 adapted from: http://handbook.cochrane.org/chapter_9/9_2_2_4_measure_of_absolute_effect_the_risk_difference.htm @en
44 WARN annotation_whitespace STATO:0000434 IAO:0000115 Cochran's Q test is a statistical test used for unreplicated randomized block design experiments with a binary response variable and paired data. In the analysis of two-way randomized block designs where the response variable can take only two possible outcomes (coded as 0 and 1), Cochran's Q test is a non-parametric statistical test to verify whether k treatments have identical effects. @en
45 WARN annotation_whitespace STATO:0000440 STATO:0000041 dixon.outliers(data) from: http://finzi.psych.upenn.edu/library/referenceIntervals/html/dixon.outliers.html @en
46 WARN annotation_whitespace STATO:0000442 STATO:0000041 FindOutliersTietjenMooreTest(dataSeries,k,alpha=0.05) from: https://rdrr.io/rforge/climtrends/man/findOutliers.Tietjen.Moore.test.html @en
47 WARN annotation_whitespace STATO:0000443 STATO:0000041 rgrubbs.test(x, alpha = 0.05) from: http://finzi.psych.upenn.edu/library/OutlierDM/html/rgrubbs.test.html @en
48 WARN annotation_whitespace STATO:0000445 IAO:0000115 a split split plot design is a study design where restricted randomization affect 2 study factors (and not 1 as in split-plot design). Such design is only possible if at least 3 independent variables are present. @en
49 WARN annotation_whitespace STATO:0000445 IAO:0000119 adapted from https://onlinecourses.science.psu.edu/stat503/node/72 last accessed 2016/12/15 @en
50 WARN annotation_whitespace STATO:0000446 IAO:0000115 Restricted randomization is a kind of randomization which is used or occured when hard to change factors exist in a study design. In other words, when complete randomization is not possible, a case of restricted randomization exists, for instance in the case of split-plot design. Restricted randomization allows intuitively poor allocations of treatments to experimental units to be avoided, while retaining the theoretical benefits of randomization. Restricted randomization can also result from an unplanned event and is then something that should be avoided. RandomizeR R package can be used to detect such events and assess the quality of randomization process. @en
51 WARN annotation_whitespace STATO:0000446 IAO:0000119 Adapted from Wikipedia: https://en.wikipedia.org/wiki/Restricted_randomization last accessed: 2016/12/15 @en
52 WARN annotation_whitespace http://purl.obolibrary.org/obo/stato.owl dc11:rights This Ontology is distributed under a Creative Commons Attribution License ^^http://www.w3.org/2001/XMLSchema#anyURI
53 WARN duplicate_label_synonym STATO:0000239 IAO:0000118 high throughput screening@en
54 WARN equivalent_class_axiom_no_genus STATO:0000027 OBI:0000417 STATO:0000121
55 WARN equivalent_class_axiom_no_genus STATO:0000033 OBI:0000312 OBI:0200117
56 WARN equivalent_class_axiom_no_genus STATO:0000046 BFO:0000051 STATO:0000223
57 WARN equivalent_class_axiom_no_genus STATO:0000046 STATO:0000001 STATO:0000248
58 WARN equivalent_class_axiom_no_genus STATO:0000085 OBI:0000295 STATO:0000175
59 WARN equivalent_class_axiom_no_genus STATO:0000119 OBI:0000299 STATO:0000144
60 WARN equivalent_class_axiom_no_genus STATO:0000131 OBI:0000417 STATO:0000183
61 WARN equivalent_class_axiom_no_genus STATO:0000133 BFO:0000062 OBI:0200201
62 WARN equivalent_class_axiom_no_genus STATO:0000137 OBI:0000417 STATO:0000226
63 WARN equivalent_class_axiom_no_genus STATO:0000191 OBI:0000417 STATO:0000224
64 WARN equivalent_class_axiom_no_genus STATO:0000202 OBI:0000417 STATO:0000253
65 WARN equivalent_class_axiom_no_genus STATO:0000247 OBI:0000417 STATO:0000173
66 WARN equivalent_class_axiom_no_genus STATO:0000279 OBI:0000417 STATO:0000255
67 WARN equivalent_class_axiom_no_genus STATO:0000337 OBI:0000299 STATO:0000485
68 WARN equivalent_class_axiom_no_genus STATO:0000443 OBI:0000417 STATO:0000439
69 WARN equivalent_class_axiom_no_genus STATO:0000471 STATO:0000403 STATO:0000039
70 WARN missing_definition STATO:0000342 IAO:0000115
71 WARN missing_definition STATO:0000344 IAO:0000115
72 WARN missing_definition STATO:0000345 IAO:0000115
73 WARN missing_definition STATO:0000380 IAO:0000115
74 WARN missing_definition STATO:0000381 IAO:0000115
75 WARN missing_definition STATO:0000382 IAO:0000115
76 WARN multiple_equivalent_classes STATO:0000046 owl:equivalentClass blank node
77 WARN multiple_equivalent_classes STATO:0000137 owl:equivalentClass blank node
78 INFO lowercase_definition STATO:0000001 IAO:0000115 property to indicate that a design declares a variable; the inverse property is 'is declared by'@en
79 INFO lowercase_definition STATO:0000002 IAO:0000115 an electronic file is an information content entity which conforms to a specification or format and which is meant to hold data and information in digital form, accessible to software agents@en
80 INFO lowercase_definition STATO:0000003 IAO:0000115 a balanced design is a an experimental design where all experimental group have the an equal number of subject observations@en
81 INFO lowercase_definition STATO:0000004 IAO:0000115 property to indicate the variables declared by a design; the inverse property is 'declares'@en
82 INFO lowercase_definition STATO:0000005 IAO:0000115 a single factor design is a study design which declares exactly 1 independent variable@en
83 INFO lowercase_definition STATO:0000006 IAO:0000115 x-axis is a cartesian coordinate axis which is orthogonal to the y-axis and the z-axis@en
84 INFO lowercase_definition STATO:0000007 IAO:0000115 an axis is a line graph used as reference line for the measurement of coordinates.@en
85 INFO lowercase_definition STATO:0000008 IAO:0000115 y-axis is a cartesian coordinate axis which is orthogonal to the x-axis and the z-axis@en
86 INFO lowercase_definition STATO:0000011 IAO:0000115 a cartesian axis is one of 3 the axis in a cartesian coordinate system defining a referential in 3 dimensions. each of the axis is orthogonal to the other 2@en
87 INFO lowercase_definition STATO:0000012 IAO:0000115 z-axis is a cartesian coordinate axis which is orthogonal to the x-axis and the y-axis@en
88 INFO lowercase_definition STATO:0000013 IAO:0000115 a 2 dimensional cartesian coordinate system is a cartesian coordinate system which defines 2 orthogonal one dimensional axes and which may be used to describe a 2 dimensional spatial region.
89 INFO lowercase_definition STATO:0000019 IAO:0000115 normal distribution hypothesis is a goodness of fit hypothesis stating that the distribution computed from the sample population fits a normal distribution.@en
90 INFO lowercase_definition STATO:0000021 IAO:0000115 a confidence interval which covers 90% of the sampling distribution, meaning that there is a 90% risk of false positive (type I error)@en
91 INFO lowercase_definition STATO:0000024 IAO:0000115 a three dimensional cartesian coordinate system is a cartesian coordinate system which defines 3 orthogonal one dimensional axes and which may be used to describe a 3 dimensional spatial region.
92 INFO lowercase_definition STATO:0000027 IAO:0000115 linkage between 2 categorical variable test is a statistical test which evaluates if there is an association between a predictor variable assuming discrete values and a response variable also assuming discrete values@en
93 INFO lowercase_definition STATO:0000028 IAO:0000115 measure of variation or statistical dispersion is a data item which describes how much a theoritical distribution or dataset is spread.@en
94 INFO lowercase_definition STATO:0000029 IAO:0000115 a measure of central tendency is a data item which attempts to describe a set of data by identifying the value of its centre.@en
95 INFO lowercase_definition STATO:0000031 IAO:0000115 binary classification (or binomial classification) is a data transformation which aims to cast members of a set into 2 disjoint groups depending on whether the element have a given property/feature or not.@en
96 INFO lowercase_definition STATO:0000032 IAO:0000115 an alternative term used for STATO statistical ontology and ISA team@en
97 INFO lowercase_definition STATO:0000034 IAO:0000115 a model parameter is a data item which is part of a model and which is meant to characterize an theoritecal or unknown population. a model parameter may be estimated by considering the properties of samples presumably taken from the theoritecal population@en
98 INFO lowercase_definition STATO:0000035 IAO:0000115 the range is a measure of variation which describes the difference between the lowest score and the highest score in a set of numbers (a data set)
99 INFO lowercase_definition STATO:0000038 IAO:0000115 a set of 2 subjects which result from a pairing process which assigns subject to a set based on a pairing rule/criteria@en
100 INFO lowercase_definition STATO:0000039 IAO:0000115 a statistic is a measurement datum to describe a dataset or a variable. It is generated by a calculation on set of observed data.@en
101 INFO lowercase_definition STATO:0000040 IAO:0000115 an MA plot is a scatter plot of the log intensity ratios M = log_2(T/R) versus the average log intensities A = log_2(T*T)/2, where T and R represent the signal intensities in the test and reference channels respectively.@en
102 INFO lowercase_definition STATO:0000041 IAO:0000115 a R command syntax or link to a R documentation in support of Statistical Ontology Classes or Data Transformations@en
103 INFO lowercase_definition STATO:0000043 IAO:0000115 a false positive rate whose value is 5 per cent@en
104 INFO lowercase_definition STATO:0000044 IAO:0000115 one-way anova is an analysis of variance where the different groups being compared are associated with the factor levels of only one independent variable. The null hypothesis is an absence of difference between the means calculated for each of the groups. The test assumes normality and equivariance of the data.@en
105 INFO lowercase_definition STATO:0000045 IAO:0000115 two-way anova is an analysis of variance where the different groups being compared are associated the factor levels of exatly 2 independent variables. The null hypothesis is an absence of difference between the means calculated for each of the groups. The test assumes normality and equivariance of the data.@en
106 INFO lowercase_definition STATO:0000046 IAO:0000115 a block design is a kind of study design which declares a blocking variable (also known as nuisance variable) in order to account for a known source of variation and reduce its impact on the acquisition of the signal@en
107 INFO lowercase_definition STATO:0000047 IAO:0000115 a count is a data item denoted by an integer and represented the number of instances or occurences of an entity@en
108 INFO lowercase_definition STATO:0000050 IAO:0000115 signal to noise ratio is a measurement datum comparing the amount of meaningful, useful or interesting data (the signal) to the amount of irrelevant or false data (the noise). Depending on the field and domain of application, different variables will be used to determinate a 'signal to noise ratio'. In statistics, the definition of signal to noise ratio is the ratio of the mean of a measurement to its standard deviation. It thus corresponds to the inverse of the coefficient of variation@en
109 INFO lowercase_definition STATO:0000053 IAO:0000115 a false positive rate is a data item which accounts for the proportion of incorrect rejection of a true null hypothesis.@en
110 INFO lowercase_definition STATO:0000054 IAO:0000115 homoskedasticity states that all variances under consideration are homogenous.@en
111 INFO lowercase_definition STATO:0000055 IAO:0000115 chromosome coordinate system is a genomic coordinate which uses chromosome of a particular assembly build process to define start and end positions. This coordinate system is unstable and will change with each new genome sequence assembly build.@en
112 INFO lowercase_definition STATO:0000056 IAO:0000115 a null hypothesis which states that no linkage exists between 2 categorical variables@en
113 INFO lowercase_definition STATO:0000058 IAO:0000115 goodness of fit hypothesis is a null hypothesis stating that the distribution computed from the sample population fits a theoretical distribution or that a dataset can be correctly explained by a model@en
114 INFO lowercase_definition STATO:0000059 IAO:0000115 the Student's t distribution is a continuous probability distribution which arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown.@en
115 INFO lowercase_definition STATO:0000060 IAO:0000115 hypergeometric distribution is a probability distribution that describes the probability of k successes in n draws from a finite population of size N containing K successes without replacement@en
116 INFO lowercase_definition STATO:0000062 IAO:0000115 is a null hypothesis stating that there are no difference observed across a series of measurements made one same subject.@en
117 INFO lowercase_definition STATO:0000063 IAO:0000115 genomic coordinate datum is a data item which denotes a genomic position expressed using a genomic coordinate system@en
118 INFO lowercase_definition STATO:0000064 IAO:0000115 sequence read count is a data item determining how many sequence reads generated by a DNA sequencing assay for a given stretch of DNA can counted
119 INFO lowercase_definition STATO:0000067 IAO:0000115 a continuousprobability distribution is a probability distribution which is defined by a probability density function@en
120 INFO lowercase_definition STATO:0000071 IAO:0000115 reaction rate is a measurement datum which represents the speed of a chemical reaction turning reactive species into product species of event (i.e the number of such conversions)s occuring over a time interval@en
121 INFO lowercase_definition STATO:0000072 IAO:0000115 substrate concentration is a scalar measurement datum which denotes the amount of molecular entity involved in an enzymatic reaction (or catalytic chemical reaction) and whose role in that reaction is as substrate.@en
122 INFO lowercase_definition STATO:0000075 IAO:0000115 a rarefaction curve is a graph used for estimating species richness in ecology studies@en
123 INFO lowercase_definition STATO:0000080 IAO:0000115 the Brown Forsythe test is a statistical test which evaluates if the variance of different groups are equal. It relies on computing the median rather than the mean, as used in the Levene's test for homoschedacity. This test maybe used to, for instance, ensure that the conditions of applications of ANOVA are met.@en
124 INFO lowercase_definition STATO:0000082 IAO:0000115 a fixed effect model is a statistical model which represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random.@en
125 INFO lowercase_definition STATO:0000084 IAO:0000115 multinomial logistic regression model is a model which attempts to explain data distribution associated with *polychotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is probit function.@en
126 INFO lowercase_definition STATO:0000085 IAO:0000115 effect size estimate is a data item about the direction and strength of the consequences of a causative agent as explored by statistical methods. Those methods produce estimates of the effect size, e.g. confidence interval@en
127 INFO lowercase_definition STATO:0000086 IAO:0000115 an F-test is a statistical test which evaluates that the computed test statistics follows an F-distribution under the null hypothesis. The F-test is sensitive to departure from normality. F-test arise when decomposing the variability in a data set in terms of sum of squares.@en
128 INFO lowercase_definition STATO:0000087 IAO:0000115 a polychotomous variable is a categorical variable which is defined to have minimally 2 categories or possible values@en
129 INFO lowercase_definition STATO:0000088 IAO:0000115 statistical sample size is a count evaluating the number of individual experimental units@en
130 INFO lowercase_definition STATO:0000089 IAO:0000115 a case-control study design is a observation study design which assess the risk of particular outcome (a trait or a disease) associated with an event (either an exposure or endogenous factor). A case-control study design therefore declares an exposure variable which is dichotomous in nature (exposed/non-exposed) and an outcome variable, which is also dichotomous (case or control), thus giving the name to the design. During the execution of the design, a case control study defines a population and counts the events to determine their frequency.@en
131 INFO lowercase_definition STATO:0000090 IAO:0000115 a dichotomous variable is a categorical variable which is defined to have only 2 categories or possible values@en
132 INFO lowercase_definition STATO:0000095 IAO:0000115 paired t-test is a statistical test which is specifically designed to analysis differences between paired observations in the case of studies realizing repeated measures design with only 2 repeated measurements per subject (before and after treatment for example)@en
133 INFO lowercase_definition STATO:0000096 IAO:0000115 stratification is a planned process which executes a stratification rule using as input a population and assign it member to mutually exclusive subpopulation based on the values defined by the stratification rule@en
134 INFO lowercase_definition STATO:0000099 IAO:0000115 a random effect(s) model, also called a variance components model, is a kind of hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of different populations whose differences relate to that hierarchy.@en
135 INFO lowercase_definition STATO:0000100 IAO:0000115 standardized mean difference is data item computed by forming the difference between two means, divided by an estimate of the within-group standard deviation. It is used to provide an estimatation of the effect size between two treatments when the predictor (independent variable) is categorical and the response(dependent) variable is continuous@en
136 INFO lowercase_definition STATO:0000101 IAO:0000115 the relationship between a fraction and the number above the line@en
137 INFO lowercase_definition STATO:0000102 IAO:0000115 relationship between a planned process and the plan specification that it carries out; it is defined as equivalent to the composed relationship (realizes o concretizes)@en
138 INFO lowercase_definition STATO:0000103 IAO:0000115 the multinomial distribution is a probability distribution which gives the probability of any particular combination of numbers of successes for various categories defined in the context of n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability.@en
139 INFO lowercase_definition STATO:0000105 IAO:0000115 log signal intensity ratio is a data item which corresponds to the base-2 logarithm of the ratio between 2 signal intensities, each corresponding to a condition.@en
140 INFO lowercase_definition STATO:0000106 IAO:0000115 probit regression model is a model which attempts to explain the data distribution associated with a *dichotomous* response/dependent variable in terms of values assumed by the independent variable, using a function of predictor/independent variable(s): the function used in this instance of regression modeling is the probit function aka the quantile function, i.e., the inverse cumulative distribution function (CDF), associated with the standard normal distribution.@en
141 INFO lowercase_definition STATO:0000107 IAO:0000115 a statistical model is an information content entity which is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but stochastically related.@en
142 INFO lowercase_definition STATO:0000108 IAO:0000115 linear regression model is a model which attempts to explain the data distribution associated with a response/dependent variable in terms of values assumed by the independent variable, using a linear function or linear combination of the regression parameters and the predictor/independent variable(s). Linear regression modeling makes a number of assumptions, which include homoskedasticity (constancy of variance)@en
143 INFO lowercase_definition STATO:0000109 IAO:0000115 multinomial logistic regression model is a model which attempts to explain the data distribution associated with a *polychotomous* response/dependent variable in terms of values assumed by the independent variable, using a function of predictor/independent variable(s): the function used in this instance of regression modeling is the logistic function.@en
144 INFO lowercase_definition STATO:0000111 IAO:0000115 a sequence read is a DNA sequence data which is generated by a DNA sequencer@en
145 INFO lowercase_definition STATO:0000112 IAO:0000115 a Funnel plot is a scatter plot of treatment effect versus a measure of study size and aims to provide a visual aid to detecting bias or systematic heterogeneity. A symmetric inverted funnel shape arises from a ‘well-behaved’ data set, in which publication bias is unlikely. An asymmetric funnel indicates a relationship between treatment effect and study size. Known caveats: If high precision studies really are different from low precision studies with respect to effect size (e.g., due to different populations examined) a funnel plot may give a wrong impression of publication bias. The appearance of the funnel plot can change quite dramatically depending on the scale on the y-axis — whether it is the inverse square error or the trial size. Funnel plot was introduced by Light and Palmer in 1984.@en
146 INFO lowercase_definition STATO:0000113 IAO:0000115 variance is a data item about a random variable or probability distribution. It is equivalent to the square of the standard deviation. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). The variance is the second central moment of a distribution.@en
147 INFO lowercase_definition STATO:0000114 IAO:0000115 relationship between an element and a set it belongs to@en
148 INFO lowercase_definition STATO:0000115 IAO:0000115 relationship between a set and one of its elements@en
149 INFO lowercase_definition STATO:0000116 IAO:0000115 the process of using statistical analysis for interpreting and communicating "what the data say".@en
150 INFO lowercase_definition STATO:0000117 IAO:0000115 a discrete probability distribution is a probability distribution which is defined by a probability mass function where the random variable can only assume a finite or countably infinite number of values@en
151 INFO lowercase_definition STATO:0000118 IAO:0000115 ranking is a data transformation which turns a non-ordinal variable into an ordinal variable by sorting the values of the input variable and replacing each value by its position in the sorting result@en
152 INFO lowercase_definition STATO:0000119 IAO:0000115 model parameter estimation is a data transformation that finds parameter values (the model parameter estimates) most compatible with the data as judged by the model.@en
153 INFO lowercase_definition STATO:0000120 IAO:0000115 beanplot is a plot in which (one or) multiple batches ("beans") are shown. Each bean consists of a density trace, which is mirrored to form a polygon shape. Next to that, a one-dimensional scatter plot shows all the individual measurements, like in a stripchart. The name beanplot stems from green beans. The density shape can be seen as the pod of a green bean, while the scatter plot shows the seeds inside the pod.@en
154 INFO lowercase_definition STATO:0000121 IAO:0000115 the objective of a data transformation to evaluate a null hypothesis of absence of linkage between variables.@en
155 INFO lowercase_definition STATO:0000122 IAO:0000115 a pedigree chart is a graph which plots parent-child relations@en
156 INFO lowercase_definition STATO:0000123 IAO:0000115 r2 is a correlation coefficient which is computed over the frequency of 2 dichotomous variables and is used as a measure of Linkage Disequilibrium and as input data item to the creation of an LD plot@en
157 INFO lowercase_definition STATO:0000124 IAO:0000115 a stratification rule/criterion is a criterion used to determine population strata so that a stratification process implementing the rule can result in any member of the total population being assigned to one and only one stratum@en
158 INFO lowercase_definition STATO:0000126 IAO:0000115 volcano plot is a kind of scatter plot which graphs the negative log of the p-value (significance) on the y-axis versus log2 of fold-change between 2 conditions on the x-axis. It is a popular method for visualizing differential occurrence of variables between 2 conditions.@en
159 INFO lowercase_definition STATO:0000127 IAO:0000115 a confidence interval which covers 99% of the sampling distribution, meaning that there is a 1% risk of false positive (type I error)@en
160 INFO lowercase_definition STATO:0000130 IAO:0000115 the Breslow-Day test is a statistical test which evaluates if the odds ratios are homogeneous across N 2x2 contingency tables, for instance several 2x2 contingency tables associated with different strata of a stratified population when evaluating the relationship between exposure and outcome, or associated with the different samples coming from several centres in a multicentric study in a clinical trial context.@en
161 INFO lowercase_definition STATO:0000131 IAO:0000115 a sphericity test is a null hypothesis statistical testing procedure which posits a null hypothesis of equality of the variances of the differences between levels of the repeated measures factor@en
162 INFO lowercase_definition STATO:0000134 IAO:0000115 specificity is a measurement datum qualifying a binary classification test and is computed by subtracting the false positive rate from 1@en
163 INFO lowercase_definition STATO:0000135 IAO:0000115 strictly standardized mean difference (SSMD) is a standardized mean difference which corresponds to the ratio of the mean to the standard deviation of the difference between two groups. SSMD directly measures the magnitude of difference between two groups. SSMD is widely used in high-content screening for hit selection and quality control. When the data is preprocessed using log-transformation as normally done in HTS experiments, SSMD is the mean of log fold change divided by the standard deviation of log fold change with respect to a negative reference. In other words, SSMD is the average fold change (on the log scale) penalized by the variability of fold change (on the log scale). For quality control, one index for the quality of an HTS assay is the magnitude of difference between a positive control and a negative reference in an assay plate. For hit selection, the size of effects of a compound (i.e., a small molecule or an siRNA) is represented by the magnitude of difference between the compound and a negative reference. Therefore, SSMD can be used for both quality control and hit selection in HTS experiments.@en
164 INFO lowercase_definition STATO:0000137 IAO:0000115 a homoskedasticity test is a statistical test which aims to evaluate whether the variances from several random samples are similar@en
165 INFO lowercase_definition STATO:0000138 IAO:0000115 a 2x2 contingency table is a contingency table built for 2 dichotomous variables (i.e. 2 categorical variables, each with only 2 possible outcomes). It is the simplest of contingency tables@en
166 INFO lowercase_definition STATO:0000139 IAO:0000115 a subject pairing is a planned process which executes a pairing rule and results in the creation of sets of 2 subjects meeting the pairing criteria@en
167 INFO lowercase_definition STATO:0000140 IAO:0000115 a contingency table is a data item which displays the (multivariate) frequency distribution of the possible values of categorical variables. The first row of the table corresponds to categories of one categorical variable, the first column of the table corresponds to categories of the other categorical variable, and the cells corresponding to each combination of categories are filled with the observed occurrences in the sample being considered. The table also contains marginal totals (marginal sums) and the grand total of the occurrences. The term contingency table was first used by Karl Pearson in "On the Theory of Contingency and Its Relation to Association and Normal Correlation", part of the Drapers' Company Research Memoirs Biometric Series I published in 1904.@en
168 INFO lowercase_definition STATO:0000141 IAO:0000115 acute toxicity study is an investigation which uses interventions organized according to a factorial design and a parallel group design to observe the effect of high-dose xenobiotics in animal models or cellular models@en
169 INFO lowercase_definition STATO:0000144 IAO:0000115 a model parameter estimate is a data item which results from a model parameter estimation process and which provides a numerical value about a model parameter.@en
170 INFO lowercase_definition STATO:0000145 IAO:0000115 the geometric distribution is a negative binomial distribution where r is 1. It is useful for modeling the runs of consecutive successes (or failures) in repeated independent trials of a system. The geometric distribution models the number of successes before one failure in an independent succession of tests where each test results in success or failure. The geometric distribution with prob = p has density p(x) = p (1-p)^x for x = 0, 1, 2, …, 0 < p ≤ 1. If an element of x is not integer, the result of dgeom is zero, with a warning. The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.@en
171 INFO lowercase_definition STATO:0000146 IAO:0000115 a null hypothesis stating that there are differences observed between groups of subjects@en
172 INFO lowercase_definition STATO:0000149 IAO:0000115 binomial logistic regression model is a model which attempts to explain the data distribution associated with a *dichotomous* response/dependent variable in terms of values assumed by the independent variable, using a function of predictor/independent variable(s): the function used in this instance of regression modeling is the logistic function.@en
173 INFO lowercase_definition STATO:0000150 IAO:0000115 a minimum value is a data item which denotes the smallest value found in a dataset or resulting from a calculation.@en
174 INFO lowercase_definition STATO:0000151 IAO:0000115 maximum value is a data item which denotes the largest value found in a dataset or resulting from a calculation.@en
175 INFO lowercase_definition STATO:0000152 IAO:0000115 a quartile is a quantile which splits data into sections each containing 25% of the data, so the first quartile delineates 25% of the data, the second quartile delineates 50% of the data and the third quartile, 75% of the data@en
176 INFO lowercase_definition STATO:0000154 IAO:0000115 a violin plot is a plot combining the features of box plot and kernel density plot. The violin plot is therefore similar to a box plot but incorporates in the display the probability density of the data at different values. Typically violin plots will include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots.@en
177 INFO lowercase_definition STATO:0000155 IAO:0000115 meta-analysis is a data transformation which uses the effect size estimates from several independent quantitative scientific studies addressing the same question in order to assess finding consistency.@en
178 INFO lowercase_definition STATO:0000156 IAO:0000115 the Scheffe test is a data transformation which evaluates all possible contrasts and adjusts the significance levels to account for multiple comparisons. The test is therefore conservative. Confidence intervals can be constructed for the corresponding linear regression. It was developed by American statistician Henry Scheffe in 1959.@en
179 INFO lowercase_definition STATO:0000157 IAO:0000115 the LSD test is a statistical test for multiple comparisons of treatments by means of least significant difference following an ANOVA analysis
180 INFO lowercase_definition STATO:0000158 IAO:0000115 a null hypothesis which states that a linkage exists between 2 categorical variables@en
181 INFO lowercase_definition STATO:0000161 IAO:0000115 variable distribution is a data item which denotes the spatial resolution of the data points making up a variable. Variable distribution may be compared to a known probability distribution using a goodness of fit test or by plotting a quantile-quantile plot for visual assessment of the fit.@en
182 INFO lowercase_definition STATO:0000162 IAO:0000115 the role played by an entity that is part of a study group as defined by an experimental design and realized in a data analysis and data interpretation@en
183 INFO lowercase_definition STATO:0000163 IAO:0000115 trimmed mean or truncated mean is a measure of central tendency which involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both@en
184 INFO lowercase_definition STATO:0000165 IAO:0000115 a pie chart is a graph in which a circular graph is divided into sectors illustrating numerical proportion, meaning that the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents.@en
185 INFO lowercase_definition STATO:0000166 IAO:0000115 the bar chart is a graph resulting from plotting rectangular bars with lengths proportional to the values that they represent.
186 INFO lowercase_definition STATO:0000167 IAO:0000115 the first quartile is a quartile which splits the lower 25% of the data@en
187 INFO lowercase_definition STATO:0000168 IAO:0000115 a real time quantitative pcr plot is a line graph which plots the signal fluorescence intensity as a function of the number of PCR cycles@en
188 INFO lowercase_definition STATO:0000170 IAO:0000115 the third quartile is a quartile which splits the lower 75% of the data@en
189 INFO lowercase_definition STATO:0000172 IAO:0000115 expected fragments per kilobase of transcript per million fragments mapped is a metric used to report transcript expression events as generated by RNA-Seq using a paired-end library. The calculated value results from 2 types of normalization, one to take into account the difference in read counts associated with transcript length (at equal abundance, longer transcripts will have more reads than shorter transcripts; hence the 'per kilobase of transcript') and the other to take into account different sequencing depths during distinct sequencing runs (hence the 'per million fragments mapped'). The metric is specifically produced by the Cufflinks software.@en
190 INFO lowercase_definition STATO:0000173 IAO:0000115 homogeneity testing objective is the objective of a data transformation to test a null hypothesis that two or more sub-groups of a population share the same distribution of a single categorical variable. For example, do people of different countries have the same proportion of smokers to non-smokers?@en
191 INFO lowercase_definition STATO:0000175 IAO:0000115 confidence interval calculation is a data transformation which determines a confidence interval for a given statistical parameter@en
192 INFO lowercase_definition STATO:0000176 IAO:0000115 t-statistic is a statistic computed from observations and used to produce a p-value in a statistical test when compared to a Student's t distribution.@en
193 INFO lowercase_definition STATO:0000177 IAO:0000115 the beta distribution is a continuous probability distribution defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution@en
194 INFO lowercase_definition STATO:0000180 IAO:0000115 standard normal distribution is a normal distribution with variance = 1 and mean = 0@en
195 INFO lowercase_definition STATO:0000183 IAO:0000115 sphericity testing objective is a statistical objective of a data transformation which aims to test whether a null hypothesis of sphericity holds.@en
196 INFO lowercase_definition STATO:0000185 IAO:0000115 a 2 by n contingency table is a contingency table built for one dichotomous variable (a categorical variable with only 2 outcomes) and one polychotomous variable (a categorical variable with more than 2 outcomes)@en
197 INFO lowercase_definition STATO:0000188 IAO:0000115 average log signal intensity is a data item which corresponds to the sum of 2 distinct logarithm base 2 transformed signal intensities, each corresponding to a distinct condition of signal acquisition, divided by 2.@en
198 INFO lowercase_definition STATO:0000191 IAO:0000115 a goodness of fit statistical test is a statistical test which aims to evaluate if a sample distribution can be considered equivalent to a theoretical distribution used as input@en
199 INFO lowercase_definition STATO:0000192 IAO:0000115 a cartesian product is a data transformation which operates on n sets to produce a set of all possible ordered n-tuples where each element of the tuple comes from one of the sets