Mining of Classification Rules Based on the Decision Tree Method


Fifth author: Wang Shuyan

Abstract:

Data mining is motivated by the decision support problems faced by some large retail organizations. The large amounts of sales data collected through bar code technology are the basis of mining. From these data we can discover information useful for sales and production (information reflected by certain patterns), and thereby improve sales and production efficiency, reduce costs, and maximize business benefit; this is the significance of data mining. This paper introduces one model of data mining, classification rules based on the decision tree method, and discusses in detail the C4.5 algorithm for constructing decision tree classifiers.

1 Research background

1.1 Terms

Definition of KDD technology:

KDD is the high-level process of extracting credible, novel, effective, and understandable patterns from large amounts of data.

Typically, KDD consists of three phases: data preparation, data mining, and interpretation and evaluation of results. Data mining determines the task and purpose of mining according to decision-making needs and applies mining algorithms to extract useful knowledge from specific data; it is the core part of KDD and the main topic of KDD research. In general, we do not distinguish between KDD and data mining.

Data mining (Data Mining) is the extraction of implicit, previously unknown, and potentially useful information and knowledge from large, incomplete, noisy, fuzzy, and random data.

2 Classification rule analysis

2.1 Classifiers

The classification problem is to assign an event or object to a category. In use, such a model can be applied to analyze existing data or to predict future data. The idea of classification is to learn a classification function, or to construct a classification model (what we usually call a classifier, Classifier), on the basis of existing data. The function or model maps data records in a database to one of a set of given classes and can therefore be applied to data prediction. For example, classification can be used to predict which customers are most likely to respond to a direct-mail campaign, which customers are likely to switch mobile phone service providers, or, in the medical field, which drug is most suitable for a given case.

To construct a classifier, a training sample data set is needed as input. The training set (Training set) consists of a set of database records or tuples; each record is a feature vector made up of field values. These fields are called attributes (Attribute), and the attribute used for classification is called the label (Label) attribute. A specific sample can be represented in the form (v1, v2, ..., vn, c), where vi is a field value and c is the category. The training set is the basis for constructing the classifier. The type of the label attribute must be discrete, and the fewer the possible values of the label attribute, the better (preferably two or three values); the fewer the label values, the lower the error rate of the constructed classifier.
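As a concrete illustration (not from the paper) of the (v1, v2, ..., vn, c) form just described, a small training set for the insurance example used later in this paper could be written down as follows; the attribute names and values are invented for illustration only.

    # A hypothetical training set: each record is a feature vector plus a
    # discrete label attribute with only two values ("claim" / "no claim"),
    # following the paper's advice to keep the number of label values small.
    training_set = [
        # (age_group, vehicle_type, annual_mileage, label)
        ("young",  "sports", "high", "claim"),
        ("young",  "sedan",  "low",  "no claim"),
        ("middle", "sedan",  "low",  "no claim"),
        ("senior", "sports", "high", "claim"),
        ("senior", "sedan",  "low",  "no claim"),
    ]

    # Separate the descriptive attributes (v1, ..., vn) from the label attribute c.
    features = [record[:-1] for record in training_set]
    labels = [record[-1] for record in training_set]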

There are three common classifiers: the decision tree classifier, the selection tree classifier, and the evidence classifier. We mainly study decision tree classifiers. An algorithm that automatically constructs a classifier from a training set is called a generator.

2.2 Decision tree classifiers

The decision tree method originated from CLS (Concept Learning System), then developed into the ID3 method, which marked its peak, and finally evolved into C4.5, which can handle continuous attributes. Well-known decision tree methods also include CART and Assistant.

A decision tree provides a way of expressing rules of the form: under what conditions, what value is obtained. For example, in an insurance application we must judge how large a risk it is to insure the applicant; from a decision tree built to solve this problem we can see the basic parts of a decision tree: decision nodes, branches, and leaves.

The top node in a decision tree is called the root node, and it is the beginning of the whole decision tree. In this example, the root node corresponds to the first question asked about the application.

The number of branches at a node of a decision tree depends on the algorithm used to build the tree. For example, in a decision tree obtained with the CART algorithm, each node has exactly two branches; such a tree is called a binary tree. A tree that allows a node to have more than two branches is called a multiway tree.

Each branch leads either to a new decision node or to an end of the tree, called a leaf. When traversing the decision tree from top to bottom, a question is encountered at each node, and the different answers to that question lead down different branches until a leaf node is finally reached. This traversal is the process by which a decision tree classifies a record: several variables (each corresponding to a question) are used to determine its category, and each leaf corresponds to one category.
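The structure and top-down traversal just described can be sketched in code. The sketch below is my own illustration, not code from the paper: each decision node stores the question it asks (an attribute to test), each branch is keyed by an answer, and each leaf stores a category; the insurance attributes are hypothetical.

    from dataclasses import dataclass, field
    from typing import Dict, Union

    @dataclass
    class Leaf:
        category: str                 # the class assigned when this leaf is reached

    @dataclass
    class Node:
        attribute: str                # the "question" asked at this decision node
        branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)

    def classify(tree, record):
        """Traverse from the root downward, following the branch that matches
        the record's answer to each node's question, until a leaf is reached."""
        while isinstance(tree, Node):
            tree = tree.branches[record[tree.attribute]]
        return tree.category

    # A small hypothetical tree for the insurance-risk example.
    root = Node("vehicle_type", {
        "sports": Node("age_group", {
            "young": Leaf("claim"),
            "middle": Leaf("no claim"),
            "senior": Leaf("claim"),
        }),
        "sedan": Leaf("no claim"),
    })

    print(classify(root, {"vehicle_type": "sports", "age_group": "young"}))  # claim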

The process of building a decision tree, that is, the growth of the tree, is a process of repeatedly segmenting the data. Each segmentation corresponds to a question and also to a node, and each segmentation is required to make the resulting groups differ from one another as much as possible.

The main difference between the various decision tree algorithms lies in the measures they use to evaluate a segmentation. Here we simply regard a segmentation as dividing a group of data into several parts that are as different from one another as possible, while the data within each part are as similar as possible. In our example the data contain two categories: claims and no claims. If, after a segmentation, the data in each resulting group belong to a single category, then clearly the segmentation method that achieves this effect is the one we pursue.
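To make the repeated-segmentation idea concrete, here is a schematic sketch of my own (the paper prescribes no particular code). It grows a tree by always splitting on the attribute whose groups are, on average, closest to containing a single category, and turns a group into a leaf once it is pure; the purity function here is a simple majority-class fraction standing in for the Gain criterion discussed in Section 3.

    from collections import Counter

    def purity(labels):
        """Fraction of records in the majority class; 1.0 means the group is pure."""
        counts = Counter(labels)
        return max(counts.values()) / len(labels)

    def best_split(records, attributes):
        """Choose the attribute whose grouping gives the highest size-weighted purity."""
        def score(attr):
            groups = {}
            for rec in records:
                groups.setdefault(rec[attr], []).append(rec["label"])
            n = len(records)
            return sum(len(g) / n * purity(g) for g in groups.values())
        return max(attributes, key=score)

    def build(records, attributes):
        """Recursively segment the data; each segmentation corresponds to a node."""
        labels = [rec["label"] for rec in records]
        if len(set(labels)) == 1 or not attributes:       # pure group, or nothing left to split on
            return Counter(labels).most_common(1)[0][0]   # leaf: majority category
        attr = best_split(records, attributes)
        groups = {}
        for rec in records:
            groups.setdefault(rec[attr], []).append(rec)
        rest = [a for a in attributes if a != attr]
        return {attr: {value: build(group, rest) for value, group in groups.items()}}

Under these assumptions a record is simply a dict such as {"vehicle_type": "sports", "age_group": "young", "label": "claim"}, and build(records, ["vehicle_type", "age_group"]) returns a nested dict describing the tree.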

So far, the examples we have discussed are very simple and the trees are easy to understand. Of course, the decision trees used in practice can be very complex. Suppose that, from historical data, we have built a decision tree with more than ten output classes and containing hundreds of attributes; such a tree may be too complex to grasp as a whole, but the meaning described by each path from the root node to a leaf node is still understandable. This understandability of decision trees is a significant advantage for data mining users.

This clarity of decision trees, however, can be misleading. For example, the segmentation defined at each node of a decision tree is perfectly clear and unambiguous, but in real life this can cause trouble (it is hardly obvious that someone with an annual income of $40,001 carries a smaller credit risk while someone earning $40,000 does not).

A decision tree can be built after only a few scans of the database, which means that fewer computing resources are needed, and the tree can easily accommodate a large number of predictor variables. Decision tree models can therefore be built quickly and are suitable for application to large amounts of data.

Since the final decision tree is to be shown to people, it should not be allowed to grow too large; otherwise the tree's understandability and usability are reduced, and the decision tree also becomes increasingly dependent on the historical data. That is, the decision tree may be very accurate on this historical data, yet its accuracy drops sharply once it is applied to new data; this is called overtraining. In order to make the rules contained in the decision tree general, it is necessary to prevent overtraining, which also reduces training time. We therefore need a way to stop the tree from growing at the right time. A commonly used method is to set a maximum height (number of layers) for the decision tree, limiting its growth.

Another method is to set the minimum number of records each node must contain; when the number of records in a node falls below this value, segmentation is stopped. In contrast to setting such conditions that cut off growth, the tree can instead be pruned after it has been built: the tree is allowed to grow as large as possible and is then pruned back to a smaller size. Of course, pruning should try to preserve the accuracy of the decision tree so that it does not drop too much.
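As one way to experiment with these controls (the paper does not name a library, so this is only an assumed setup using scikit-learn, and the data are invented), both pre-growth stopping conditions and post-growth pruning are exposed as parameters of a standard decision tree implementation:

    # Minimal sketch assuming scikit-learn is installed; the dataset is hypothetical.
    from sklearn.tree import DecisionTreeClassifier

    X = [[25, 1], [40, 0], [33, 1], [58, 0], [22, 1], [47, 0]]  # numeric features
    y = [1, 0, 1, 0, 1, 0]                                      # 1 = claim, 0 = no claim

    clf = DecisionTreeClassifier(
        max_depth=3,          # limit the height (number of layers) of the tree
        min_samples_split=4,  # stop segmenting a node holding fewer records than this
        ccp_alpha=0.01,       # cost-complexity pruning of the fully grown tree
    )
    clf.fit(X, y)
    print(clf.predict([[30, 1]]))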

Decision trees are good at handling non-numeric data, which eliminates much of the data preprocessing required by neural networks, which can only handle numeric data.

3 Introduction to the C4.5 algorithm

The C4.5 algorithm is an algorithm for constructing decision tree classifiers. It is an extension of the ID3 algorithm: ID3 can only deal with discrete descriptive attributes, while C4.5 can also handle continuous descriptive attributes. The algorithm compares the Gain value of each descriptive attribute and selects the attribute with the largest Gain value for classification. If there are continuous descriptive attributes, the values of these continuous attributes must first be divided into different intervals, that is, discretized.

1. The specific method for discretizing continuous attribute values is:

1) Find the minimum value of the continuous attribute and assign it to MIN; find the maximum value of the continuous attribute and assign it to MAX;

2) Set N equally spaced breakpoints Ai in the interval [MIN, MAX]:

   Ai = MIN + i * (MAX - MIN) / N,   where i = 1, 2, ..., N;

3) For each i = 1, 2, ..., N, compute the Gain value obtained by using [MIN, Ai] and (Ai, MAX] as the interval values, and compare these Gain values;

4) Select the breakpoint Ak with the largest Gain value as the breakpoint of the continuous attribute, and set the attribute values to the two intervals [MIN, Ak] and (Ak, MAX].
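A short sketch of step 2) above (my own; the attribute values are invented), computing the N equally spaced candidate breakpoints Ai for a continuous attribute. Steps 3) and 4), computing and comparing the Gain value for each candidate, would reuse an information-gain function such as the one sketched at the end of Section 3.

    def candidate_breakpoints(values, n):
        """Equal-width breakpoints Ai = MIN + i * (MAX - MIN) / N, i = 1..N."""
        lo, hi = min(values), max(values)
        step = (hi - lo) / n
        return [lo + i * step for i in range(1, n + 1)]

    ages = [22, 25, 30, 33, 40, 47, 58]    # hypothetical continuous attribute
    print(candidate_breakpoints(ages, 4))  # [31.0, 40.0, 49.0, 58.0]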

2. Gain function

Decision trees are built on the basis of information theory (Information Theory): the decision tree method looks for a criterion that yields the maximum amount of information relevant to the classification. The key to a good decision tree is how to select good attributes. For the same set of records there are many decision trees that fit it, and studies have shown that, in general, the smaller the tree, the stronger its predictive ability. The key to constructing a decision tree that is as small as possible is to select appropriate attributes. Attribute selection relies on various methods for measuring the purity (impurity) of subsets of instances; impurity measurement methods include information gain.
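As a small illustration of the information-gain idea (my own sketch, not code from the paper): the gain of splitting on an attribute is the entropy of the class distribution before the split minus the size-weighted entropy of the groups after the split; the records below are hypothetical.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (in bits) of the class distribution."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(records, attribute):
        """Entropy before splitting minus the weighted entropy of each branch."""
        before = entropy([r["label"] for r in records])
        groups = {}
        for r in records:
            groups.setdefault(r[attribute], []).append(r["label"])
        after = sum(len(g) / len(records) * entropy(g) for g in groups.values())
        return before - after

    records = [
        {"vehicle_type": "sports", "label": "claim"},
        {"vehicle_type": "sports", "label": "claim"},
        {"vehicle_type": "sedan",  "label": "no claim"},
        {"vehicle_type": "sedan",  "label": "no claim"},
    ]
    print(information_gain(records, "vehicle_type"))  # 1.0: the split separates the classes completely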
