Paraconsistent Many–Valued Logic in GUHA Framework

The primary aim of this paper is to establish a formal connection between a particular many–valued paraconsistent logic and the logic of a KDD method, namely the GUHA data mining method, by introducing a new quantifier called the Paraconsistent Separation quantifier. This quantifier is implemented in the LISp–Miner software. The secondary aim is to demonstrate the possible usefulness of this quantifier in social and other applied sciences by examples taken from a family planning context.


Introduction
Paraconsistent logics are logical systems that tolerate contradictions and defective information. We may have sentences α such that both α and its negation ¬α are accepted to be true and still, unlike in most logical systems, the consequence relation does not explode into triviality; not everything is true. The argument ex contradictione quodlibet is invalid in paraconsistent logics. This is due either to an alternative understanding of negation, or to the introduction of a new concept called evidence, or to both; if α is considered to be true, then there is of course evidence in favor of α, but the converse may not hold. The fact that there is evidence in favor of α does not automatically imply that α is true, as there might also be evidence against α, or evidence in favor of ¬α. In a general paraconsistent setting, evidence against α is not necessarily the same as evidence in favor of ¬α; thus, α and ¬α are not opposite or complementary to each other. Defective information, i.e. lack of any kind of evidence, can also be treated naturally in various paraconsistent logics. A comprehensive review of paraconsistent logics is presented e.g. in Carnielli & Coniglio (2016).
Fuzzy logics, in turn, are particular many-valued logics, see e.g. Cintula et al. (2011). In the literature, fuzzy logic and paraconsistent logic are combined in at least two different ways. The first and quite common way is to add a new unary operator '•' to the language of the logic; '•α' means 'α behaves classically'. For such an approach, see e.g. Rodolfo et al. (2015). An alternative way is introduced by the present author and presented in Turunen et al. (2010); indeed, ideas of Belnap's logic FOUR (Belnap, 1977) inspired us to invent a paraconsistent version of Pavelka's fuzzy logic (Pavelka, 1979a,b,c).
In this short note we demonstrate how the GUHA logic (Rauch, 2013), a well-known logical approach in the descriptive data mining framework, is naturally related to paraconsistent logic of the second type. These new findings are implemented in the LISp-Miner software (LISp-Miner, 2018), a KDD software based on the ideas of GUHA. We also demonstrate, using data from a family planning study, how our results can be utilized in analyzing data from social and other applied sciences.

Problem Setting
When analyzing large data masses with the LISp-Miner software, we are sometimes interested in (partly) contradictory information the data contains, or in data that is deficient. A typical example is to find in the data Boolean attributes ψ and ϕ that are distinct in the sense that they occur (at least in most cases) separately; the presence of ψ means the absence of ϕ, and vice versa. A lack of information is also possible. Due to the large amount of data being studied, it is natural to require that tools for this type of data mining be computationally simple.
The fundamental idea for dealing with paraconsistent and many-valued statements, presented in Turunen et al. (2010), is to associate with a logical formula α an evidence couple ⟨a, b⟩ ∈ [0, 1] × [0, 1], where a is the evidence for α and b is the evidence against α. The values a and b are mutually independent. The truth T(α), falsehood F(α), contradiction K(α) and unknown U(α) of α are calculated from this couple and are usually presented as a 2 × 2 evidence matrix. In this setting F(α) and T(α) are not each other's complements; however, it holds that T(α) + F(α) + K(α) + U(α) = 1.
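The four values attached to an evidence couple can be illustrated with a minimal Python sketch. The expressions for K and U follow the max-based formulas used later in this paper; the expressions for T and F are an assumption, chosen so that the four values sum to 1 and agree with the numerical examples reported below, and should be checked against Turunen et al. (2010). The function name `evidence_values` is hypothetical.

```python
def evidence_values(a, b):
    """Return T, F, K, U for an evidence couple <a, b> in [0, 1] x [0, 1]."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("evidence values must lie in [0, 1]")
    return {
        "T": min(a, 1.0 - b),        # assumed form of truth
        "F": min(1.0 - a, b),        # assumed form of falsehood
        "K": max(0.0, a + b - 1.0),  # contradiction: evidence sums to more than 1
        "U": max(0.0, 1.0 - a - b),  # unknown: evidence sums to less than 1
    }


# A partly contradictory couple: the evidence for and against sums to 1.3.
values = evidence_values(0.7, 0.6)
total = sum(values.values())  # equals 1 up to floating-point rounding
```

Under these assumptions, at most one of K and U is non-zero for any couple, which matches the reading of contradiction and lack of information as opposite situations.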

The Solution
The answer to both questions is affirmative. From a logical point of view, the fact that Boolean attributes ϕ and ψ appear (almost always) together, or alternatively are (almost) separate, can be expressed by logical equivalence and exclusive disjunction, respectively. On the other hand, it is well known that, given a 4ft-table in which
• a is the number of objects satisfying both ϕ and ψ,
• b is the number of objects satisfying ϕ but not ψ,
• c is the number of objects satisfying ψ but not ϕ,
• d is the number of objects satisfying neither ϕ nor ψ,
and n = a + b + c + d, the simultaneous occurrence of ϕ and ψ can be tested by a quantifier that compares these frequencies to a fixed parameter p ∈ (0, 1] (equation (1)). This quantifier, called founded equivalence, is implemented in LISp-Miner and its properties are well known. It can be presented in the paraconsistent logic framework discussed in Turunen et al. (2010); the evidence for ϕ and ψ appearing simultaneously is simply (a + d)/n, while the opposite evidence is (b + c)/n, yielding an evidence matrix for α, where α stands for 'ϕ and ψ appear simultaneously'. Notice that K(α) = U(α) = 0 and therefore F(α) is the complement of T(α).
Alternatively, to search for evidence that ϕ and ψ appear simultaneously in given data, we may use the founded double implication quantifier with its GUHA-truth definition (equation (2)). In the corresponding evidence matrix, F(α) is again the complement of T(α).
Now turn to the problem of searching for attributes ϕ and ψ that appear (almost) mutually exclusively in given data. Before our study there was no quantifier implemented in LISp-Miner to perform this task. A natural GUHA-truth definition v(α) = TRUE, where α stands for 'ϕ and ψ are mutually exclusive', is given by condition (3). The higher the value of the parameter p, the fewer common occurrences of ϕ and ψ there are in the data matrix. Since (3) is equivalent to condition (4), the corresponding evidence couple would be ⟨(b + c)/(a + b + c), a/(a + b + c)⟩, with the evidence matrix determined accordingly. As a result of this research, this quantifier is now implemented in the LISp-Miner software under the name Paraconsistent Separation quantifier.
An alternative way, which fits the ideas of paraconsistency even better, is to define the truth condition by (5). It is simple arithmetic to verify that condition (5) holds if, and only if, condition (4) holds.
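The evidence couples described above can be computed directly from two Boolean columns. The following Python sketch is only an illustration: the function names and the sample data are hypothetical, it does not reproduce LISp-Miner's implementation, and it covers only the evidence couples, not the threshold conditions on p.

```python
def fourfold_table(phi, psi):
    """Count the 4ft-table frequencies (a, b, c, d) of two Boolean columns."""
    if len(phi) != len(psi):
        raise ValueError("columns must have the same length")
    pairs = list(zip(phi, psi))
    a = sum(1 for x, y in pairs if x and y)          # both phi and psi
    b = sum(1 for x, y in pairs if x and not y)      # phi but not psi
    c = sum(1 for x, y in pairs if not x and y)      # psi but not phi
    d = sum(1 for x, y in pairs if not x and not y)  # neither
    return a, b, c, d


def together_couple(a, b, c, d):
    """Evidence couple for 'phi and psi appear simultaneously'."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty 4ft-table")
    return (a + d) / n, (b + c) / n


def separation_couple(a, b, c):
    """Evidence couple for 'phi and psi are mutually exclusive' (d is not used)."""
    m = a + b + c
    if m == 0:
        raise ValueError("no object satisfies phi or psi")
    return (b + c) / m, a / m


# Hypothetical data: six objects, two Boolean attributes.
phi = [True, True, False, False, False, True]
psi = [False, False, True, True, False, True]
table = fourfold_table(phi, psi)
together = together_couple(*table)
separate = separation_couple(*table[:3])
```

In both couples the two components sum to 1, so neither contradiction nor lack of information arises in these two settings.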
The corresponding evidence couple is now ⟨(b + c)/n, a/n⟩, and in the corresponding evidence matrix α again stands for 'ϕ and ψ are mutually exclusive'. Indeed, the value d/n indicates the unknown cases, in which neither ϕ nor ψ is present. Since T(α) + F(α) + U(α) = 1, F(α) is, in general, not the complement of T(α); this is in line with paraconsistent philosophy. However, K(α) = 0 in this setting; this follows from the simple arithmetical fact that K(α) = max{0, (b + c)/n + a/n − 1} = max{0, (b + c)/(a + b + c + d) + a/(a + b + c + d) − (a + b + c + d)/(a + b + c + d)} = 0, whenever d > 0. On the other hand, if d > 0 then U(α) = max{0, 1 − (b + c)/n − a/n} = max{0, (a + b + c + d)/(a + b + c + d) − (b + c)/(a + b + c + d) − a/(a + b + c + d)} > 0. This characteristic originates from Belnap's logic FOUR (Belnap, 1977) and is discussed in detail in Turunen et al. (2010); if the sum of all evidence is less than 1, then there is a sort of lack of information, thus U(α) > 0. On the other hand, if all evidence sums up to more than 1, then this is interpreted as a (partial) contradiction, hence K(α) > 0.
Notice that, since conditions (4) and (5) are equivalent, it makes no logical difference which one of them is used. However, with respect to computation time in the LISp-Miner software they are not equivalent; condition (4) has one parameter fewer than condition (5), namely d, and so requires less computing time, and therefore we use condition (4) in computations. The difference may seem negligible, but when several million conditions have to be verified, it begins to matter. Moreover, we notice that by allowing p = 0 in equation (4), or equivalently in equation (5), we obtain a kind of nonequivalence quantifier.
Finally, comparing (only) the evidence matrix M1 for 'ϕ and ψ appear simultaneously' to the evidence matrix M2 for 'ϕ and ψ appear mutually exclusively', one might jump to the conclusion that, due to the symmetry of these two matrices, the introduced Paraconsistent Separation quantifier is redundant and could be replaced, say, by negation and the founded implication/equivalence quantifier. However, this is not possible. Indeed, the essence of the truth definition of the nonequivalence quantifier (equations (4) or (5)) is to compare the value a to b + c, where the former should be much smaller than the latter. There is no other quantifier implemented in LISp-Miner that does this, as all the other quantifiers compare (some of) the values a, b, c, d or their fragments to some fixed parameter p ∈ (0, 1], e.g. equations (1) or (2).
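The arithmetic above can be written compactly. The following worked step uses only the formulas for K(α) and U(α) stated in the text, with n = a + b + c + d, and shows that the degree of unknown is exactly d/n:

```latex
\frac{b+c}{n} + \frac{a}{n} \;=\; \frac{n-d}{n} \;=\; 1 - \frac{d}{n} \;\le\; 1,
\qquad\text{hence}\qquad
K(\alpha) = \max\Bigl\{0,\; -\frac{d}{n}\Bigr\} = 0,
\qquad
U(\alpha) = \max\Bigl\{0,\; \frac{d}{n}\Bigr\} = \frac{d}{n}.
```

In particular, U(α) > 0 exactly when d > 0, that is, when at least one object satisfies neither ϕ nor ψ.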

Paraconsistent Separation Quantifier in Practice: Some LISp-Miner Experiments
Besides the main result of this study, i.e. the established theoretical connection between paraconsistent logic and the GUHA logic, the Paraconsistent Separation quantifier (now implemented in the LISp-Miner software) also has practical significance in descriptive data mining. Indeed, using the new quantifier we discover relationships in real-world data that could not be discovered by the other quantifiers implemented in LISp-Miner, or at least finding them would be much more laborious. In particular, a general analytical question such as 'Does the given data contain mutually exclusive factors?' can be answered effortlessly by using the Paraconsistent Separation quantifier.
To demonstrate the use of the Paraconsistent Separation quantifier, we take an example from the context of family planning and analyze a data set (Lim, 1997) that is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are 1473 married women who were either not pregnant or did not know if they were at the time of the interview. The data contains information about the current contraceptive method choice (no use, long-term methods, or short-term methods) of a woman together with her demographic and socio-economic characteristics. The data consists of the following ten variables:
...
4. Number of children ever born: numerical
...
9. Media exposure: Good, Not good
10. Contraceptive method used: No-use, Long-term, Short-term
As an example of a relevant analytic question we set 'Are there factors that (in almost all cases) exclude the use of contraception?'. One of the discovered factors is 'There are no children in the family'. From the corresponding 4ft-table, the evidence couple for the statement α ('Not using any contraception and not having any children are mutually exclusive') is ⟨0.636, 0.001⟩, from which the related evidence matrix is obtained. One way to interpret this result is the following. The truth T(α) = 0.636 is several hundred times larger than the falsehood F(α) = 0.001. The degree of unknown of the truth value is U(α) = 0.363. A natural latent explanation for this outcome is that these are mostly young married couples; they want children.
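The reported figures can be cross-checked from the evidence couple alone. The short Python sketch below uses only the couple ⟨0.636, 0.001⟩ quoted in the text and the max-based formulas for U and K given earlier; the variable names are hypothetical.

```python
# Evidence couple reported in the text for the statement alpha.
evidence_for, evidence_against = 0.636, 0.001

# Degree of unknown and of contradiction, using the formulas stated earlier.
unknown = max(0.0, 1.0 - evidence_for - evidence_against)
contradiction = max(0.0, evidence_for + evidence_against - 1.0)

# How many times the reported truth value exceeds the reported falsehood value.
ratio = evidence_for / evidence_against

print(round(unknown, 3), contradiction, round(ratio))
```

The rounded degree of unknown agrees with the value U(α) = 0.363 given above, and the ratio confirms that the truth value is several hundred times the falsehood value.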
Another relevant analytic question we set is 'Are there factors that (in almost all cases) exclude the use of some particular contraception method?'. There are several outcomes, one of them being wife's low education and use of long-term contraception; for the corresponding 4ft-table, the truth value T(α) = 0.317 is again considerably larger than the falsehood value F(α) = 0.006. Another answer to this analytic question is the GUHA hypothesis α ('Wife's age ≥ 47 years and not using short-term contraception are (almost) mutually exclusive'). For the corresponding 4ft-table, the truth value is T(α) = 0.401, while the falsehood value is F(α) = 0.005.
Findings of this kind (maybe not completely unexpected), if considered interesting, would direct social studies to explore the matter in more detail and in a larger sample. With these simple examples we emphasize the usefulness of the GUHA data mining method in general, and of our new results in particular, when considering applications and impact on society.

Conclusion and future work
The primary purpose and novelty of this paper is to demonstrate a theoretical connection between two non-classical logics: the GUHA logic for data mining purposes on the one hand, and paraconsistent logic for dealing with contradictory and/or defective information on the other hand. By combining two different theoretical approaches, we achieve new knowledge that enriches both theories. In the present case, the benefit is the introduction of a paraconsistent approach in data mining, and even a new quantifier implemented in the LISp-Miner software. Thus, we have introduced a new direction of research in applied logics. The secondary objective of this study is to show how the new approach can be utilized in concrete data mining tasks; we have used the well-known National Indonesia Contraceptive Prevalence Survey data. To our knowledge and experience, the dependencies found in the given example cannot easily be obtained by other quantifiers implemented in the LISp-Miner software. The question of whether they could be found by other data mining methods is outside the scope of this research.
There are several interesting topics for further research. One of them is negation in the GUHA context. At present GUHA operations are based on classical negation understood as set-theoretical complement; the negation of green, in the set of green, blue, yellow and red, is identified with blue, yellow and red. However, in some contexts the negation of green is red. Such a negation is closely related to paraconsistent negation. Another interesting topic to investigate is the relation between the GUHA logic and the many-valued and paraconsistent logic introduced in Rodriguez et al. (2014).