Level | Number of errors |
---|---|
INFO | 381 |
WARN | 16 |

Rule | Number of errors |
---|---|
lowercase_definition | 381 |
equivalent_class_axiom_no_genus | 14 |
duplicate_exact_synonym | 2 |

Row | Level | Rule Name | Subject | Property | Value |
---|---|---|---|---|---|
0 | WARN | duplicate_exact_synonym | STATO:0000636 | IAO:0000118 | NNS |

1 | WARN | duplicate_exact_synonym | STATO:0000637 | IAO:0000118 | NNS |

2 | WARN | equivalent_class_axiom_no_genus | STATO:0000027 | OBI:0000417 | STATO:0000121 |

3 | WARN | equivalent_class_axiom_no_genus | STATO:0000033 | OBI:0000312 | OBI:0200117 |

4 | WARN | equivalent_class_axiom_no_genus | STATO:0000085 | OBI:0000295 | STATO:0000175 |

5 | WARN | equivalent_class_axiom_no_genus | STATO:0000119 | OBI:0000299 | STATO:0000144 |

6 | WARN | equivalent_class_axiom_no_genus | STATO:0000131 | OBI:0000417 | STATO:0000183 |

7 | WARN | equivalent_class_axiom_no_genus | STATO:0000133 | BFO:0000062 | OBI:0200201 |

8 | WARN | equivalent_class_axiom_no_genus | STATO:0000137 | OBI:0000417 | STATO:0000226 |

9 | WARN | equivalent_class_axiom_no_genus | STATO:0000191 | OBI:0000417 | STATO:0000224 |

10 | WARN | equivalent_class_axiom_no_genus | STATO:0000202 | OBI:0000417 | STATO:0000253 |

11 | WARN | equivalent_class_axiom_no_genus | STATO:0000247 | OBI:0000417 | STATO:0000173 |

12 | WARN | equivalent_class_axiom_no_genus | STATO:0000279 | OBI:0000417 | STATO:0000255 |

13 | WARN | equivalent_class_axiom_no_genus | STATO:0000337 | OBI:0000299 | STATO:0000485 |

14 | WARN | equivalent_class_axiom_no_genus | STATO:0000443 | OBI:0000417 | STATO:0000439 |

15 | WARN | equivalent_class_axiom_no_genus | STATO:0000471 | STATO:0000403 | STATO:0000039 |

16 | INFO | lowercase_definition | STATO:0000001 | IAO:0000115 | property to indicate that a design declares a variable; the inverse property is 'is declared by'@en |

17 | INFO | lowercase_definition | STATO:0000002 | IAO:0000115 | an electronic file is an information content entity which conforms to a specification or format and which is meant to hold data and information in digital form, accessible to software agents@en |

18 | INFO | lowercase_definition | STATO:0000003 | IAO:0000115 | a balanced design is a an experimental design where all experimental group have the an equal number of subject observations@en |

19 | INFO | lowercase_definition | STATO:0000004 | IAO:0000115 | property to indicate the variables declared by a design; the inverse property is 'declares'@en |

20 | INFO | lowercase_definition | STATO:0000005 | IAO:0000115 | a single factor design is a study design which declares exactly 1 independent variable@en |

21 | INFO | lowercase_definition | STATO:0000006 | IAO:0000115 | x-axis is a cartesian coordinate axis which is orthogonal to the y-axis and the z-axis@en |

22 | INFO | lowercase_definition | STATO:0000007 | IAO:0000115 | an axis is a line graph used as reference line for the measurement of coordinates.@en |

23 | INFO | lowercase_definition | STATO:0000008 | IAO:0000115 | y-axis is a cartesian coordinate axis which is orthogonal to the x-axis and the z-axis@en |

24 | INFO | lowercase_definition | STATO:0000011 | IAO:0000115 | a cartesian axis is one of 3 the axis in a cartesian coordinate system defining a referential in 3 dimensions. each of the axis is orthogonal to the other 2@en |

25 | INFO | lowercase_definition | STATO:0000012 | IAO:0000115 | z-axis is a cartesian coordinate axis which is orthogonal to the x-axis and the y-axis@en |

26 | INFO | lowercase_definition | STATO:0000013 | IAO:0000115 | a 2 dimensional cartesian coordinate system is a cartesian coordinate system which defines 2 orthogonal one dimensional axes and which may be used to describe a 2 dimensional spatial region. |

27 | INFO | lowercase_definition | STATO:0000019 | IAO:0000115 | normal distribution hypothesis is a goodness of fit hypothesis stating that the distribution computed from the sample population fits a normal distribution.@en |

28 | INFO | lowercase_definition | STATO:0000021 | IAO:0000115 | a confidence interval which covers 90% of the sampling distribution, meaning that there is a 90% risk of false positive (type I error)@en |

29 | INFO | lowercase_definition | STATO:0000024 | IAO:0000115 | a three dimensional cartesian coordinate system is a cartesian coordinate system which defines 3 orthogonal one dimensional axes and which may be used to describe a 3 dimensional spatial region. |

30 | INFO | lowercase_definition | STATO:0000027 | IAO:0000115 | linkage between 2 categorical variable test is a statistical test which evaluates if there is an association between a predictor variable assuming discrete values and a response variable also assuming discrete values@en |

31 | INFO | lowercase_definition | STATO:0000028 | IAO:0000115 | measure of variation or statistical dispersion is a data item which describes how much a theoritical distribution or dataset is spread.@en |

32 | INFO | lowercase_definition | STATO:0000029 | IAO:0000115 | a measure of central tendency is a data item which attempts to describe a set of data by identifying the value of its centre.@en |

33 | INFO | lowercase_definition | STATO:0000031 | IAO:0000115 | binary classification (or binomial classification) is a data transformation which aims to cast members of a set into 2 disjoint groups depending on whether the element have a given property/feature or not.@en |

34 | INFO | lowercase_definition | STATO:0000032 | IAO:0000115 | an alternative term used for STATO statistical ontology and ISA team@en |

35 | INFO | lowercase_definition | STATO:0000034 | IAO:0000115 | a model parameter is a data item which is part of a model and which is meant to characterize an theoritecal or unknown population. a model parameter may be estimated by considering the properties of samples presumably taken from the theoritecal population@en |

36 | INFO | lowercase_definition | STATO:0000035 | IAO:0000115 | the range is a measure of variation which describes the difference between the lowest score and the highest score in a set of numbers (a data set) |

37 | INFO | lowercase_definition | STATO:0000038 | IAO:0000115 | a set of 2 subjects which result from a pairing process which assigns subject to a set based on a pairing rule/criteria@en |

38 | INFO | lowercase_definition | STATO:0000039 | IAO:0000115 | a statistic is a measurement datum to describe a dataset or a variable. It is generated by a calculation on set of observed data.@en |

39 | INFO | lowercase_definition | STATO:0000040 | IAO:0000115 | an MA plot is a scatter plot of the log intensity ratios M = log_2(T/R) versus the average log intensities A = log_2(T*T)/2, where T and R represent the signal intensities in the test and reference channels respectively.@en |

40 | INFO | lowercase_definition | STATO:0000041 | IAO:0000115 | a R command syntax or link to a R documentation in support of Statistical Ontology Classes or Data Transformations@en |

41 | INFO | lowercase_definition | STATO:0000043 | IAO:0000115 | a false positive rate whose value is 5 per cent@en |

42 | INFO | lowercase_definition | STATO:0000044 | IAO:0000115 | one-way anova is an analysis of variance where the different groups being compared are associated with the factor levels of only one independent variable. The null hypothesis is an absence of difference between the means calculated for each of the groups. The test assumes normality and equivariance of the data.@en |

43 | INFO | lowercase_definition | STATO:0000045 | IAO:0000115 | two-way anova is an analysis of variance where the different groups being compared are associated the factor levels of exatly 2 independent variables. The null hypothesis is an absence of difference between the means calculated for each of the groups. The test assumes normality and equivariance of the data.@en |

44 | INFO | lowercase_definition | STATO:0000046 | IAO:0000115 | a block design is a kind of study design which declares a blocking variable (also known as nuisance variable) in order to account for a known source of variation and reduce its impact on the acquisition of the signal@en |

45 | INFO | lowercase_definition | STATO:0000047 | IAO:0000115 | a count is a data item denoted by an integer and representing the number of instances or occurences of an entity@en |

46 | INFO | lowercase_definition | STATO:0000050 | IAO:0000115 | signal to noise ratio is a measurement datum comparing the amount of meaningful, useful or interesting data (the signal) to the amount of irrelevant or false data (the noise). Depending on the field and domain of application, different variables will be used to determinate a 'signal to noise ratio'. In statistics, the definition of signal to noise ratio is the ratio of the mean of a measurement to its standard deviation. It thus corresponds to the inverse of the coefficient of variation@en |

47 | INFO | lowercase_definition | STATO:0000053 | IAO:0000115 | a false positive rate is a data item which accounts for the proportion of incorrect rejection of a true null hypothesis.@en |

48 | INFO | lowercase_definition | STATO:0000054 | IAO:0000115 | homoskedasticity states that all variances under consideration are homogenous.@en |

49 | INFO | lowercase_definition | STATO:0000055 | IAO:0000115 | chromosome coordinate system is a genomic coordinate which uses chromosome of a particular assembly build process to define start and end positions. This coordinate system is unstable and will change with each new genome sequence assembly build.@en |

50 | INFO | lowercase_definition | STATO:0000056 | IAO:0000115 | a null hypothesis which states that no linkage exists between 2 categorical variables@en |

51 | INFO | lowercase_definition | STATO:0000058 | IAO:0000115 | goodness of fit hypothesis is a null hypothesis stating that the distribution computed from the sample population fits a theoretical distribution or that a dataset can be correctly explained by a model@en |

52 | INFO | lowercase_definition | STATO:0000059 | IAO:0000115 | the Student's t distribution is a continuous probability distribution which arises when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown.@en |

53 | INFO | lowercase_definition | STATO:0000060 | IAO:0000115 | hypergeometric distribution is a probability distribution that describes the probability of k successes in n draws from a finite population of size N containing K successes without replacement@en |

54 | INFO | lowercase_definition | STATO:0000062 | IAO:0000115 | is a null hypothesis stating that there are no difference observed across a series of measurements made one same subject.@en |

55 | INFO | lowercase_definition | STATO:0000063 | IAO:0000115 | genomic coordinate datum is a data item which denotes a genomic position expressed using a genomic coordinate system@en |

56 | INFO | lowercase_definition | STATO:0000064 | IAO:0000115 | sequence read count is a data item determining how many sequence reads have been generated by a DNA sequencing assay for a given stretch of DNA |

57 | INFO | lowercase_definition | STATO:0000067 | IAO:0000115 | a continuous probability distribution is a probability distribution which is defined by a probability density function@en |

58 | INFO | lowercase_definition | STATO:0000071 | IAO:0000115 | reaction rate is a measurement datum which represents the speed of a chemical reaction turning reactive species into product species of event (i.e the number of such conversions)s occuring over a time interval@en |

59 | INFO | lowercase_definition | STATO:0000072 | IAO:0000115 | substrate concentration is a scalar measurement datum which denotes the amount of molecular entity involved in an enzymatic reaction (or catalytic chemical reaction) and whose role in that reaction is as substrate.@en |

60 | INFO | lowercase_definition | STATO:0000075 | IAO:0000115 | a rarefaction curve is a graph used for estimating species richness in ecology studies@en |

61 | INFO | lowercase_definition | STATO:0000080 | IAO:0000115 | the Brown Forsythe test is a statistical test which evaluates if the variance of different groups are equal. It relies on computing the median rather than the mean, as used in the Levene's test for homoschedacity. This test maybe used to, for instance, ensure that the conditions of applications of ANOVA are met.@en |

62 | INFO | lowercase_definition | STATO:0000082 | IAO:0000115 | a fixed effect model is a statistical model which represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random.@en |

63 | INFO | lowercase_definition | STATO:0000084 | IAO:0000115 | multinomial logistic regression model is a model which attempts to explain data distribution associated with *polychotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is probit function.@en |

64 | INFO | lowercase_definition | STATO:0000085 | IAO:0000115 | effect size estimate is a data item about the direction and strength of the consequences of a causative agent as explored by statistical methods. Those methods produce estimates of the effect size, e.g. confidence interval@en |

65 | INFO | lowercase_definition | STATO:0000086 | IAO:0000115 | an F-test is a statistical test which evaluates that the computed test statistics follows an F-distribution under the null hypothesis. The F-test is sensitive to departure from normality. F-test arise when decomposing the variability in a data set in terms of sum of squares.@en |

66 | INFO | lowercase_definition | STATO:0000087 | IAO:0000115 | a polychotomous variable is a categorical variable which is defined to have minimally 2 categories or possible values@en |

67 | INFO | lowercase_definition | STATO:0000088 | IAO:0000115 | statistical sample size is a count evaluating the number of individual experimental units@en |

68 | INFO | lowercase_definition | STATO:0000089 | IAO:0000115 | a case-control study design is a observation study design which assess the risk of particular outcome (a trait or a disease) associated with an event (either an exposure or endogenous factor). A case-control study design therefore declares an exposure variable which is dichotomous in nature (exposed/non-exposed) and an outcome variable, which is also dichotomous (case or control), thus giving the name to the design. During the execution of the design, a case control study defines a population and counts the events to determine their frequency.@en |

69 | INFO | lowercase_definition | STATO:0000090 | IAO:0000115 | a dichotomous variable is a categorical variable which is defined to have only 2 categories or possible values@en |

70 | INFO | lowercase_definition | STATO:0000095 | IAO:0000115 | paired t-test is a statistical test which is specifically designed to analysis differences between paired observations in the case of studies realizing repeated measures design with only 2 repeated measurements per subject (before and after treatment for example)@en |

71 | INFO | lowercase_definition | STATO:0000096 | IAO:0000115 | stratification is a planned process which executes a stratification rule using as input a population and assign it member to mutually exclusive subpopulation based on the values defined by the stratification rule@en |

72 | INFO | lowercase_definition | STATO:0000099 | IAO:0000115 | a random effect(s) model, also called a variance components model, is a kind of hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of different populations whose differences relate to that hierarchy.@en |

73 | INFO | lowercase_definition | STATO:0000100 | IAO:0000115 | standardized mean difference is statistic computed by forming the difference between two means, divided by an estimate of the within-group standard deviation. It is used to provide an estimation of the effect size between two treatments when the predictor (independent variable) is categorical and the response(dependent) variable is continuous. A standardized mean difference is a statistic that is a difference between two means, divided by a statistical measure of dispersion. The term Standardized Mean Difference is a description of the concept without an explicit type of statistical measure of dispersion. If the statistical measure of dispersion is specified, then a type (child term) of Standardized Mean Difference is preferred.@en |

74 | INFO | lowercase_definition | STATO:0000101 | IAO:0000115 | the relationship between a fraction and the number above the line@en |

75 | INFO | lowercase_definition | STATO:0000102 | IAO:0000115 | relationship between a planned process and the plan specification that it carries out; it is defined as equivalent to the composed relationship (realizes o concretizes)@en |

76 | INFO | lowercase_definition | STATO:0000103 | IAO:0000115 | the multinomial distribution is a probability distribution which gives the probability of any particular combination of numbers of successes for various categories defined in the context of n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability.@en |

77 | INFO | lowercase_definition | STATO:0000105 | IAO:0000115 | log signal intensity ratio is a data item which corresponding the logarithmitic base 2 of the ratio between 2 signal intensity, each corresponding to a condition.@en |

78 | INFO | lowercase_definition | STATO:0000106 | IAO:0000115 | probit regression model is a model which attempts to explain data distribution associated with *dichotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is the probit function aka the quantile function, i.e., the inverse cumulative distribution function (CDF), associated with the standard normal distribution.@en |

79 | INFO | lowercase_definition | STATO:0000107 | IAO:0000115 | a statistical model is an information content entity which is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but stochastically related.@en |

80 | INFO | lowercase_definition | STATO:0000108 | IAO:0000115 | linear regression model is a model which attempts to explain data distribution associated with response/dependent variable in terms of values assumed by the independent variable uses a linear function or linear combination of the regression parameters and the predictor/independent variable(s). linear regression modeling makes a number of assumptions, which includes homoskedasticity (constance of variance)@en |

81 | INFO | lowercase_definition | STATO:0000109 | IAO:0000115 | multinomial logistic regression model is a model which attempts to explain data distribution associated with *polychotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is logistic function.@en |

82 | INFO | lowercase_definition | STATO:0000111 | IAO:0000115 | a sequence read is a DNA sequence data which is generated by a DNA sequencer@en |

83 | INFO | lowercase_definition | STATO:0000112 | IAO:0000115 | a Funnel plot is a scatter plot of treatment effect versus a measure of study size and aims to provide a visual aid to detecting bias or systematic heterogeneity. A symmetric inverted funnel shape arises from a ‘well-behaved’ data set, in which publication bias is unlikely. An asymmetric funnel indicates a relationship between treatment effect and study size. Known caveats: If high precision studies really are different from low precision studies with respect to effect size (e.g., due to different populations examined) a funnel plot may give a wrong impression of publication bias. The appearance of the funnel plot can change quite dramatically depending on the scale on the y-axis — whether it is the inverse square error or the trial size. Funnel plot was introduced by Light and Palmer in 1984.@en |

84 | INFO | lowercase_definition | STATO:0000113 | IAO:0000115 | variance is a data item about a random variable or probability distribution. it is equivalent to the square of the standard deviation. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value).The variance is the second moment of a distribution.@en |

85 | INFO | lowercase_definition | STATO:0000114 | IAO:0000115 | relationship between an element and a set it belongs to@en |

86 | INFO | lowercase_definition | STATO:0000115 | IAO:0000115 | relationship between a set and one of its elements@en |

87 | INFO | lowercase_definition | STATO:0000116 | IAO:0000115 | the process of using statistical analysis for interpreting and communicating "what the data say".@en |

88 | INFO | lowercase_definition | STATO:0000117 | IAO:0000115 | a discrete probability distribution is a probability distribution which is defined by a probability mass function where the random variable can only assume a finite number of values or infinitely countable values@en |

89 | INFO | lowercase_definition | STATO:0000118 | IAO:0000115 | ranking is a data transformation which turns a non-ordinal variable into a Ordinal variable by sorting the values of the input variable and replacing their value by their position in the sorting result@en |

90 | INFO | lowercase_definition | STATO:0000119 | IAO:0000115 | model parameter estimation is a data transformation that finds parameter values (the model parameter estimates) most compatible with the data as judged by the model.@en |

91 | INFO | lowercase_definition | STATO:0000120 | IAO:0000115 | beanplot is a plot in which (one or) multiple batches ("beans") are shown. Each bean consists of a density trace, which is mirrored to form a polygon shape. Next to that, a one-dimensional scatter plot shows all the individual measurements, like in a stripchart. The name beanplot stems from green beans. The density shape can be seen as the pod of a green bean, while the scatter plot shows the seeds inside the pod.@en |

92 | INFO | lowercase_definition | STATO:0000121 | IAO:0000115 | the objective of a data transformation to evaluate a null hypothesis of absence of linkage between variables.@en |

93 | INFO | lowercase_definition | STATO:0000122 | IAO:0000115 | a pedigree chart is a graph which plots parent child relations@en |

94 | INFO | lowercase_definition | STATO:0000123 | IAO:0000115 | r2 is a correlation coefficient which is computed over the frequency of 2 dichotomous variable and is used as a measure of Linkage Disequilibrium and as input data item to the creation of an LD plot@en |

95 | INFO | lowercase_definition | STATO:0000124 | IAO:0000115 | a stratification rule/criteria is a criteria used to determine population strata so that a stratification process implementing the rule can result in any member of the total population being assigned to one and only one stratum@en |

96 | INFO | lowercase_definition | STATO:0000126 | IAO:0000115 | volcano plot is a kind of scatter plot which graphs the negative log of the p-value (significance) on the y-axis versus log2 of fold-change between 2 conditions on the x-axis. It is a popular method for visualizing differential occurence of variables between 2 conditions.@en |

97 | INFO | lowercase_definition | STATO:0000127 | IAO:0000115 | a confidence interval which covers 99% of the sampling distribution, meaning that there is a 1% risk of false positive (type I error)@en |

98 | INFO | lowercase_definition | STATO:0000130 | IAO:0000115 | the Breslow-Day test is a statistical test which evaluates if the odds ratios are homogenous across N 2x2 contingency tables, for instance several 2x2 contingency tables associated with different strata of a stratified population when evaluating the relationship between exposure and outcome or associated with the different samples coming from several centres in a multicentric study in clinical trial context.@en |

99 | INFO | lowercase_definition | STATO:0000131 | IAO:0000115 | a sphericity test is a null hypothesis statistical testing procedure which posits a null hypothesis of equality of the variances of the differences between levels of the repeated measures factor@en |

100 | INFO | lowercase_definition | STATO:0000134 | IAO:0000115 | specificity is a measurement datum qualifying a binary classification test and is computed by substracting the false positive rate to the integral numeral 1@en |

101 | INFO | lowercase_definition | STATO:0000135 | IAO:0000115 | strictly standardized mean difference (SSMS) is a standardized mean difference which corresponds to the ratio of mean to the standard deviation of the difference between two groups. SSMD directly measures the magnitude of difference between two groups. SSMD is widely used in High Content Screen for hit selection and quality control. When the data is preprocessed using log-transformation as normally done in HTS experiments, SSMD is the mean of log fold change divided by the standard deviation of log fold change with respect to a negative reference. In other words, SSMD is the average fold change (on the log scale) penalized by the variability of fold change (on the log scale). For quality control, one index for the quality of an HTS assay is the magnitude of difference between a positive control and a negative reference in an assay plate. For hit selection, the size of effects of a compound (i.e., a small molecule or an siRNA) is represented by the magnitude of difference between the compound and a negative reference. SSMD directly measures the magnitude of difference between two groups. Therefore, SSMD can be used for both quality control and hit selection in HTS experiments.@en |

102 | INFO | lowercase_definition | STATO:0000137 | IAO:0000115 | an homoskedasticity test is a statistical test aiming at evaluate if the variances from several random samples are similar@en |

103 | INFO | lowercase_definition | STATO:0000138 | IAO:0000115 | a 2x2 contingency table is a contingency table build for 2 dichotomous variables (i.e. 2 categorical variables, each with only 2 possible outcomes). It is the simplest of contingency tables@en |

104 | INFO | lowercase_definition | STATO:0000139 | IAO:0000115 | a subject pairing is a planned process which executes a pairing rule and results in the creation of sets of 2 subjects meeting the pairing criteria@en |

105 | INFO | lowercase_definition | STATO:0000140 | IAO:0000115 | a contigency table is a data item which displays the (multivariate) frequency distribution of the possible values of categorical variables. The first row of the table corresponds to categories of one categorical variable, the first column of the table corresponds to categories of the other categorical variable, the cells corresponding to each combination of categories is filled with the observed occurences in the sample being considered. The table also contains marginal total (marginal sums) and grand total of the occurences The term contingency table was first used by Karl Pearson in "On the Theory of Contingency and Its Relation to Association and Normal Correlation", part of the Drapers' Company Research Memoirs Biometric Series I published in 1904.@en |

106 | INFO | lowercase_definition | STATO:0000141 | IAO:0000115 | acute toxicity study is an investigation which use interventions organized according to a factorial design and a parallel group design to observe the effect of use of high dose xenobiotics in animal models or cellular models@en |

107 | INFO | lowercase_definition | STATO:0000144 | IAO:0000115 | a model parameter estimate is a data item which results from a model parameter estimation process and which provides a numerical value about a model parameter.@en |

108 | INFO | lowercase_definition | STATO:0000145 | IAO:0000115 | the geometric distribution is a negative binomial distribution where r is 1. It is useful for modeling the runs of consecutive successes (or failures) in repeated independent trials of a system. The geometric distribution models the number of successes before one failure in an independent succession of tests where each test results in success or failure. The geometric distribution with prob = p has density p(x) = p (1-p)^x for x = 0, 1, 2, …, 0 < p ≤ 1. If an element of x is not integer, the result of dgeom is zero, with a warning. The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.@en |

109 | INFO | lowercase_definition | STATO:0000146 | IAO:0000115 | a null hypothesis stating that there are differences observed between group of subjects@en |

110 | INFO | lowercase_definition | STATO:0000149 | IAO:0000115 | binomial logistic regression model is a model which attempts to explain data distribution associated with *dichotomous* response/dependent variable in terms of values assumed by the independent variable uses a function of predictor/independent variable(s): the function used in this instance of regression modeling is logistic function.@en |

111 | INFO | lowercase_definition | STATO:0000150 | IAO:0000115 | a minimum value is a data item which denotes the smallest value found in a dataset or resulting from a calculation.@en |

112 | INFO | lowercase_definition | STATO:0000151 | IAO:0000115 | maximum value is a data item which denotes the largest value found in a dataset or resulting from a calculation.@en |

113 | INFO | lowercase_definition | STATO:0000152 | IAO:0000115 | a quartile is a quantile which splits data into sections accrued of 25% of data, so the first quartile delineates 25% of the data, the second quartile delineates 50% of the data and the third quartile, 75 % of the data@en |

114 | INFO | lowercase_definition | STATO:0000154 | IAO:0000115 | a violin plot is a plot combining the features of box plot and kernel density plot. The violin plot is therefore similar to box plot but it incorporated in the display the probability density of the data at different values. Typically violin plots will include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots.@en |

115 | INFO | lowercase_definition | STATO:0000155 | IAO:0000115 | meta-analysis is a data transformation which uses the effect size estimates from several independent quantitative scientific studies addressing the same question in order to assess finding consistency.@en |

116 | INFO | lowercase_definition | STATO:0000156 | IAO:0000115 | the Scheffe test is a data transformation which evaluates all possible contrasts and adjusting the levels significance by accounting for multiple comparison. The test is therefore conservative. Confidence intervals can be constructed for the corresponding linear regression. It was developped by American statistician Henry Scheffe in 1959.@en |

117 | INFO | lowercase_definition | STATO:0000157 | IAO:0000115 | the LSD test is a statistical test for multiple comparisons of treatments by means of least significant difference following an ANOVA analysis |

118 | INFO | lowercase_definition | STATO:0000158 | IAO:0000115 | a null hypothesis which states that a linkage exists between 2 categorical variables@en |

119 | INFO | lowercase_definition | STATO:0000161 | IAO:0000115 | variable distribution is data item which denotes the spatial resolution of data point making up a variable. variable distribution may be compared to a known probability distribution using goodness of fit test or plotting a quantile-quantile plot for visual assessment of the fit.@en |

120 | INFO | lowercase_definition | STATO:0000162 | IAO:0000115 | the role played by an entity part of study group as defined by an experimental design and realized in a data analysis and data interpretation@en |

121 | INFO | lowercase_definition | STATO:0000163 | IAO:0000115 | trimmed mean or truncated mean is a measure of central tendency which involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both@en |

122 | INFO | lowercase_definition | STATO:0000165 | IAO:0000115 | a pie chart is a graph in which a circular graph is divided into sectors illustrating numerical proportion, meaning that the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents.@en |

123 | INFO | lowercase_definition | STATO:0000166 | IAO:0000115 | the bar chart is a graph resulting from plotting rectangular bars with lengths proportional to the values that they represent. |

124 | INFO | lowercase_definition | STATO:0000167 | IAO:0000115 | the first quartile is a quartile which splits the lower 25 % of the data@en |

125 | INFO | lowercase_definition | STATO:0000168 | IAO:0000115 | a real time quantitative pcr plot is a line graph which plots the signal fluorescence intensity as a function of the number of PCR cycle@en |

126 | INFO | lowercase_definition | STATO:0000170 | IAO:0000115 | the third quartile is a quartile which splits the lower 75 % of the data@en |

127 | INFO | lowercase_definition | STATO:0000173 | IAO:0000115 | homogeneity testing objective is the objective of a data transformation to test a null hypothesis that two or more sub-groups of a population share the same distribution of a single categorical variable. For example, do people of different countries have the same proportion of smokers to non-smokers@en |

128 | INFO | lowercase_definition | STATO:0000175 | IAO:0000115 | confidence interval calculation is a data transformation which determines a confidence interval for a given statistical parameter@en |

129 | INFO | lowercase_definition | STATO:0000176 | IAO:0000115 | t-statistic is a statistic computed from observations and used to produce a p-value in a statistical test when compared to a Student's t distribution.@en |

130 | INFO | lowercase_definition | STATO:0000177 | IAO:0000115 | the beta distribution is a continuous probability distribution defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution@en |

131 | INFO | lowercase_definition | STATO:0000180 | IAO:0000115 | standard normal distribution is a normal distribution with variance = 1 and mean = 0@en |

132 | INFO | lowercase_definition | STATO:0000183 | IAO:0000115 | sphericity testing objective is a statistical objective of a data transformation which aims to test whether a null hypothesis of sphericity holds.@en |

133 | INFO | lowercase_definition | STATO:0000185 | IAO:0000115 | a 2 by n contingency table is a contingency table built for one dichotomous variable (a categorical variable with only 2 outcomes) and one polychotomous variable (a categorical variable with at least 2 outcomes)@en |

134 | INFO | lowercase_definition | STATO:0000188 | IAO:0000115 | average log signal intensity is a data item which corresponds to the sum of 2 distinct logarithm base 2 transformed signal intensities, each corresponding to a distinct condition of signal acquisition, divided by 2.@en |

135 | INFO | lowercase_definition | STATO:0000191 | IAO:0000115 | a goodness of fit statistical test is a statistical test which aims to evaluate if a sample distribution can be considered equivalent to a theoretical distribution used as input@en |

136 | INFO | lowercase_definition | STATO:0000192 | IAO:0000115 | a cartesian product is a data transformation which operates on n sets to produce a set of all possible ordered n-tuples where each element of the tuple comes from one of the sets |

137 | INFO | lowercase_definition | STATO:0000193 | IAO:0000115 | is a population whose individual members realize (may be expressed as) a combination of inclusion rule values specifications or resulting from a sampling process (e.g. recruitment followed by randomization to group) on which a number of measurements will be carried out, which may be used as input to statistical tests and statistical inference. |

138 | INFO | lowercase_definition | STATO:0000194 | IAO:0000115 | self explanatory@en |

139 | INFO | lowercase_definition | STATO:0000197 | IAO:0000115 | a genomic coordinate system is a coordinate system to describe position of sequence on a genomic scaffold (assembly of chromosome, contig....)@en |

140 | INFO | lowercase_definition | STATO:0000198 | IAO:0000115 | a statistical test which makes no assumption about the underlying data distribution@en |

141 | INFO | lowercase_definition | STATO:0000199 | IAO:0000115 | the Mauchly's test for sphericity is a statistical test which evaluates if the variances of the differences between all combinations of the groups are equal, a property known as 'sphericity' in the context of repeated measures. It is used for instance prior to repeated measure ANOVA. The test works by assessing if a Wishart-distributed covariance matrix (or transformation thereof) is proportional to a given matrix.@en |

142 | INFO | lowercase_definition | STATO:0000200 | IAO:0000115 | the statistical test power is a data item which is about a statistical test and is obtained by subtracting the false negative rate (type II error rate) from 1. The power of a statistical test is the probability that it will correctly lead to the rejection of a false null hypothesis (Greene 2000). The statistical power is the ability of a test to detect an effect, if the effect actually exists (High 2000).@en |

143 | INFO | lowercase_definition | STATO:0000202 | IAO:0000115 | within subject comparison statistical test is a kind of statistical test which evaluates if a change occurs within one experimental unit over time following a treatment or an event@en |

144 | INFO | lowercase_definition | STATO:0000203 | IAO:0000115 | a cohort is a study group population where the members are human beings which meet inclusion criteria and undergo a longitudinal design@en |

145 | INFO | lowercase_definition | STATO:0000204 | IAO:0000115 | the F-distribution is a continuous probability distribution which arises in the testing of whether two observed samples have the same variance.@en |

146 | INFO | lowercase_definition | STATO:0000207 | IAO:0000115 | a planned process which establishes and states the different hypotheses to be evaluated during a null hypothesis statistical test@en |

147 | INFO | lowercase_definition | STATO:0000209 | IAO:0000115 | area under curve is a measurement datum which corresponds to the surface defined by the x-axis and bounded by the line graph represented in a 2 dimensional plot resulting from an integration or integrative calculus. The interpretation of this measurement datum depends on the variables plotted in the graph@en |

148 | INFO | lowercase_definition | STATO:0000210 | IAO:0000115 | is a data item formed by dividing the fluorescence intensity obtained in one channel by that obtained in the other channel, typically the case when considering 2-color microarray data when imaging is done for Cy3 and Cy5 dyes.@en |

149 | INFO | lowercase_definition | STATO:0000211 | IAO:0000115 | odds ratio homogeneity hypothesis is a null hypothesis stating that all odds ratios are homogeneous, that is, remain within the same range.@en |

150 | INFO | lowercase_definition | STATO:0000212 | IAO:0000115 | a tetrachoric correlation coefficient is a polychoric correlation coefficient for 2 dichotomous variables used as proxy for correlation between 2 continuous latent variables.@en |

151 | INFO | lowercase_definition | STATO:0000213 | IAO:0000115 | discretization is a process converting a continuous variable into a polychotomous variable by concretizing a set of discretization rules@en |

152 | INFO | lowercase_definition | STATO:0000214 | IAO:0000115 | a confidence interval which covers 50% of the sampling distribution, meaning that there is a 50% risk of false positive (type I error)@en |

153 | INFO | lowercase_definition | STATO:0000215 | IAO:0000115 | probit regression model is a model which attempts to explain the data distribution associated with an *ordinal* response/dependent variable in terms of values assumed by the independent variable, using a function of the predictor/independent variable(s): the function used in this instance of regression modeling is the ordered probit function.@en |

154 | INFO | lowercase_definition | STATO:0000216 | IAO:0000115 | a stratum population is a population resulting from a population stratification prior to a sampling process which aims to produce homogeneous subpopulations from a heterogeneous population by applying one or more stratification criteria@en |

155 | INFO | lowercase_definition | STATO:0000217 | IAO:0000115 | a null hypothesis which states that a given matrix is proportional to a Wishart-distributed covariance matrix@en |

156 | INFO | lowercase_definition | STATO:0000219 | IAO:0000115 | a real time pcr standard curve is a line graph which plots the fluorescence intensity signal as a function of the concentration of a sample used as reference and used to determine relative abundance of test samples@en |

157 | INFO | lowercase_definition | STATO:0000220 | IAO:0000115 | the false negative rate is a data item which denotes the proportion of missed detection of elements known to be meeting the detection criteria@en |

158 | INFO | lowercase_definition | STATO:0000221 | IAO:0000115 | a random variable (or aleatory variable or stochastic variable) in probability and statistics, is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense)@en |

159 | INFO | lowercase_definition | STATO:0000222 | IAO:0000115 | graeco-latin square design is_a study design which allows in its simpler form controlling 3 levels of nuisance variables (also known as blocking variables). The 3 nuisance factors are divided into a tabular grid with the property that each row and each column receive each treatment exactly once.@en |

160 | INFO | lowercase_definition | STATO:0000223 | IAO:0000115 | group assignment based on blocking variable specification is a kind of group assignment process which takes into account the levels assumed by a blocking variable to allocate subjects or experimental units to a treatment group@en |

161 | INFO | lowercase_definition | STATO:0000227 | IAO:0000115 | a normal distribution is a continuous probability distribution described by a probability distribution function described here: http://mathworld.wolfram.com/NormalDistribution.html@en |

162 | INFO | lowercase_definition | STATO:0000228 | IAO:0000115 | ordinal variable is a categorical variable where the discrete possible values are ordered or correspond to an implicit ranking@en |

163 | INFO | lowercase_definition | STATO:0000230 | IAO:0000115 | the expected value (or expectation, mathematical expectation, EV, mean, or the first moment) of a random variable is a data item which corresponds to the weighted average of all possible values that this random variable can take on. The weights used in computing this average correspond to the probabilities in case of a discrete random variable, or densities in case of a continuous random variable. From a rigorous theoretical standpoint, the expected value is the integral of the random variable with respect to its probability measure.@en |

164 | INFO | lowercase_definition | STATO:0000231 | IAO:0000115 | a confidence interval which covers 95% of the sampling distribution, meaning that there is a 5% risk of false positive (type I error). If the number of observations made is large enough, the sampling distribution can be assumed to be normal, which entails that 95% of the sampling distribution falls within roughly 2 (1.96) standard deviations from the mean.@en |

165 | INFO | lowercase_definition | STATO:0000232 | IAO:0000115 | number of PCR cycle is a count which enumerates how many iterations of 'annealing, renaturation, amplification,' rounds (or cycles) are performed during a polymerase chain reaction (PCR) or an assay relying on PCR.@en |

166 | INFO | lowercase_definition | STATO:0000233 | IAO:0000115 | sensitivity is a measurement datum qualifying a binary classification test and is computed by subtracting the false negative rate from the integer 1@en |

167 | INFO | lowercase_definition | STATO:0000234 | IAO:0000115 | a residual is a data item which is the output of an error estimate or model fitting process and which is an observable estimate of the unobservable error@en |

168 | INFO | lowercase_definition | STATO:0000236 | IAO:0000115 | the coefficient of variation is a normalized measure of dispersion of a probability distribution or frequency distribution.@en |

169 | INFO | lowercase_definition | STATO:0000238 | IAO:0000115 | high content screening is a kind of investigation which uses standardized cellular assays to test the effect of substances (RNAi or small molecules) held in libraries on a cellular phenotype. It relies on microscopy imaging and/or flow-cytometry and robotic handling to ensure fast and high throughput.@en |

170 | INFO | lowercase_definition | STATO:0000239 | IAO:0000115 | high throughput screening is a kind of investigation which uses standardized assays (cell based, enzymatic or chemometric) to test the effect of substances (RNAi or small molecules) held in libraries on a very specific and measurable outcome (e.g. fluorescence intensity). It relies on robotic handling to ensure fast and high throughput in assay performance, data acquisition and hit selection.@en |

171 | INFO | lowercase_definition | STATO:0000242 | IAO:0000115 | statistical error is a data item denoting the amount by which an observation differs from the expected value, being based on the whole statistical population from which the statistical unit was chosen randomly@en |

172 | INFO | lowercase_definition | STATO:0000243 | IAO:0000115 | a box plot is a graph which plots datasets relying on their quartiles and the interquartile range to create the box and the whiskers.@en |

173 | INFO | lowercase_definition | STATO:0000244 | IAO:0000115 | (Rn +) − (Rn −), where Rn + = (emission intensity of reporter dye)/(emission intensity of passive reference dye) in PCR with template and Rn − = (emission intensity of reporter dye)/(emission intensity of passive reference dye) in PCR without template or early cycles of a real-time reaction. Ct = threshold cycle, i.e., cycle at which a statistically significant increase in ΔRn is first detected@en |

174 | INFO | lowercase_definition | STATO:0000247 | IAO:0000115 | odds ratio homogeneity test is a statistical test which aims to evaluate whether the null hypothesis of consistency of odds ratios across different strata of a population is true or not@en |

175 | INFO | lowercase_definition | STATO:0000248 | IAO:0000115 | a blocking variable is an independent variable which is used in a blocking process part of an experiment with the purpose of maximizing the signal coming from the main variable. |

176 | INFO | lowercase_definition | STATO:0000249 | IAO:0000115 | a DNA microarray hybridization is an assay relying on nucleic acid hybridization, which uses a DNA microarray device and a nucleic acid as input. It precedes a data acquisition process@en |

177 | INFO | lowercase_definition | STATO:0000250 | IAO:0000115 | group comparison objective is a data transformation objective which aims to determine if 2 or more study groups differ with respect to the signal of a response variable@en |

178 | INFO | lowercase_definition | STATO:0000252 | IAO:0000115 | a categorical variable is a variable which can only assume a finite number of values and casts observations into a small number of categories@en |

179 | INFO | lowercase_definition | STATO:0000253 | IAO:0000115 | the objective of a data transformation to test a null hypothesis of absence of difference within subject holds.@en |

180 | INFO | lowercase_definition | STATO:0000255 | IAO:0000115 | the objective of a data transformation to test a null hypothesis of absence of difference within subject holds.@en |

181 | INFO | lowercase_definition | STATO:0000256 | IAO:0000115 | a manhattan plot for gwas is a kind of scatter plot used to facilitate presentation of genome-wide association study (GWAS) data. Genomic coordinates are displayed along the X-axis, with the negative logarithm of the association P-value for each single nucleotide polymorphism displayed on the Y-axis.@en |

182 | INFO | lowercase_definition | STATO:0000258 | IAO:0000115 | a variable is a data item which can assume any of a set of values, either as determined by an agent or as randomly occurring through observation.@en |

183 | INFO | lowercase_definition | STATO:0000259 | IAO:0000115 | the relationship between a fraction and the number below the line (or divisor)@en |

184 | INFO | lowercase_definition | STATO:0000260 | IAO:0000115 | repeated measure ANOVA is a kind of ANOVA specifically developed for non-independent observations as found when repeated measurements are made on the same experimental unit. Repeated measure ANOVA is sensitive to departure from normality (evaluation using Bartlett's test), more so in the case of unbalanced groups (i.e. different sizes of sample populations). Departure from sphericity (evaluation using Mauchly's test) used to be an issue which is now handled robustly by modern tools such as R's lme4 or nlme, which accommodate dependence assumptions other than sphericity.@en |

185 | INFO | lowercase_definition | STATO:0000264 | IAO:0000115 | a factor level combination is one of the possible sets of factor levels resulting from the cartesian product of sets of factors and their levels as defined in a factorial design@en |

186 | INFO | lowercase_definition | STATO:0000267 | IAO:0000115 | grouped bar chart is a kind of bar chart which juxtaposes the discrete values for each of the possible values of a given categorical variable, thus providing within group comparison. Grouped bar charts are good for comparing between each element in the categories, and comparing elements across categories. However, the grouping can make it harder to tell the difference between the total of each group.@en |

187 | INFO | lowercase_definition | STATO:0000269 | IAO:0000115 | polychoric correlation coefficient is a correlation coefficient which is computed over 2 variables to characterise an association by proxy with 2 (latent) variables which are assumed to be continuous and normally distributed.@en |

188 | INFO | lowercase_definition | STATO:0000270 | IAO:0000115 | a full factorial design is a factorial design which ensures that all possible factor level combinations are defined and used so all between group differences can be explored@en |

189 | INFO | lowercase_definition | STATO:0000271 | IAO:0000115 | permutation numbering is a data transformation allowing to count the number of possible permutations of elements in a set of size n, each element occurring exactly once. This number is factorial n.@en |

190 | INFO | lowercase_definition | STATO:0000274 | IAO:0000115 | receiver operational characteristics curve is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold (aka cut-off point) is varied by plotting sensitivity vs (1 − specificity)@en |

191 | INFO | lowercase_definition | STATO:0000277 | IAO:0000115 | hit selection is a planned process which, in screening processes such as high-throughput screening, leads to the identification of perturbing agents which cause the typical signal generated by a standardized assay to significantly differ from the negative control. The selection itself results from meeting or exceeding a selection threshold (for instance 6 sigma from the mean or SSMD value beyond 5 when compared to positive controls or below -5 when compared to negative controls)@en |

192 | INFO | lowercase_definition | STATO:0000278 | IAO:0000115 | pairing rule is a rule which specifies the criteria for deciding on how to associate any 2 entities.@en |

193 | INFO | lowercase_definition | STATO:0000279 | IAO:0000115 | between group comparison statistical test is a statistical test which aims to detect difference between the means computed for each of the study group populations@en |

194 | INFO | lowercase_definition | STATO:0000281 | IAO:0000115 | a false positive rate whose value is 1 per cent@en |

195 | INFO | lowercase_definition | STATO:0000283 | IAO:0000115 | negative binomial probability distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number of failures (denoted r) occur. The negative binomial distribution, also known as the Pascal distribution or Pólya distribution, gives the probability of r-1 successes and x failures in x+r-1 trials, and success on the (x+r)th trial.@en |

196 | INFO | lowercase_definition | STATO:0000285 | IAO:0000115 | hypergeometric test is a null hypothesis test which evaluates if a random variable follows a hypergeometric distribution. It is a test of goodness of fit to that distribution. The test is suited for situations aimed at assessing cases of sampling from a finite set without replacement. For instance, testing for enrichment or depletion of elements (e.g. GO categories, genes)@en |

197 | INFO | lowercase_definition | STATO:0000286 | IAO:0000115 | a one-tailed test is a statistical test which, assuming an unskewed probability distribution, allocates all of the significance level to evaluate only one hypothesis to explain a difference. The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction. one-tailed test should be preceded by two-tailed test in order to avoid missing out on detecting alternate effect explaining an observed difference.@en |

198 | INFO | lowercase_definition | STATO:0000287 | IAO:0000115 | a two tailed test is a statistical test which assesses the null hypothesis of absence of difference assuming a symmetric (not skewed) underlying probability distribution by allocating half of the significance level selected to each of the directions of change which could explain a difference (for example, a difference can be an excess or a loss).@en |

199 | INFO | lowercase_definition | STATO:0000289 | IAO:0000115 | a design matrix is an information content entity which denotes a study design. The design matrix is an n by m matrix where n, the number of rows, corresponds to the number of observations (4 rows if quadruplicates) and where m, the number of columns, corresponds to the number of independent variables. Each element in the matrix corresponds to a discretized value representing one of the factor levels for a given factor. A design matrix can be used as input to statistical modeling or statistical analysis. The design matrix contains data on the independent variables (also called explanatory variables) in statistical models which attempt to explain observed data on a response variable (often called a dependent variable) in terms of the explanatory variables. The theory relating to such models makes substantial use of matrix manipulations involving the design matrix: see for example linear regression. A notable feature of the concept of a design matrix is that it is able to represent a number of different experimental designs and statistical models, e.g., ANOVA, ANCOVA, and linear regression@en |