Errata on the 3rd printing of "Data Mining: Concepts and Techniques"

Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2000. ISBN 1-55860-489-8

Please e-mail us (1) any errors you find in the book that are not yet listed in this errata, and (2) your suggestions and comments on the revision of the book. Thanks!

Chapter 1. Introduction

Chapter 2. Data Warehouse and OLAP Technology for Data Mining

P. 67, second paragraph, "3. The top tier is a client, which contains ..." should be "3. The top tier is a front-end client layer, which contains ..." (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Aug. 28, 2001)
P. 72, paragraph 3, line 1, "sum" should not be in sans serif font but in DMQL keyword font. (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Aug. 30, 2001)
P. 73, line 6 of paragraph 4, "day < week < month < quarter < year" should be "day < month < quarter < year" (pointed out by Ming Fan, on Feb. 17, 2001)
P. 81, Example 2.14, line 2, "sum" should not be in sans serif font but in DMQL keyword font. (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Aug. 28, 2001)
P. 82, Example 2.15, line 1, "[time, item, location]" should be "[item, location, time]" (to be consistent with the concrete examples following it). (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Aug. 28, 2001)
P. 82, Example 2.15, line 2, "sum" should not be in sans serif font but in DMQL keyword font. (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Aug. 28, 2001)
P. 98, paragraph 2 from the bottom, line 3, "the top tier is a client" should be changed to "the top tier is a front-end client layer" (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Oct. 16, 2001)
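
A note on the P. 73 entry above: the erratum does not say why "week" is dropped from the hierarchy. One plausible reading is that weeks do not nest inside months, so "week" cannot sit between "day" and "month" in a total order. The short Python check below illustrates this with a calendar week chosen purely for illustration (the dates and the helper name months_covered are assumptions, not taken from the book).

```python
from datetime import date, timedelta

def months_covered(start: date, days: int = 7) -> set:
    """Return the (year, month) pairs touched by a run of consecutive days."""
    return {
        ((start + timedelta(days=d)).year, (start + timedelta(days=d)).month)
        for d in range(days)
    }

# Illustrative week (an assumption): Monday 29 Jan 2001 through Sunday 4 Feb 2001.
week_start = date(2001, 1, 29)

print(week_start.weekday())                 # 0 means Monday
print(sorted(months_covered(week_start)))   # the week touches two different months
```

Because a single week can belong to two months, rolling days up to weeks and then weeks up to months is not well defined, whereas every day belongs to exactly one month, quarter, and year.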

Chapter 3. Data Preprocessing

P. 146, in the footnote, "relevant dimensions" should be printed in bold, not just "dimensions". (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Oct. 16, 2001)
P. 158, Figure 4.4, the pie chart is out of proportion: the sum of class A's count and class B's count is 2440, which is larger than class C's count, 2160. Thus the slice for C should be smaller (less than half of the pie). (pointed out by Ming Fan on Feb. 16, 2001)
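
The arithmetic behind the Figure 4.4 entry can be checked directly. The sketch below uses only the two totals quoted in that entry (2440 for classes A and B combined, 2160 for class C). The individual counts for A and B are not given here, so they are left combined.

```python
# Totals quoted in the erratum; the A/B split is not stated, so it is kept combined.
combined_ab = 2440
count_c = 2160

total = combined_ab + count_c
share_c = count_c / total    # fraction of the pie that class C should occupy
angle_c = 360 * share_c      # corresponding slice angle in degrees

print(f"C share: {share_c:.1%}, slice angle: {angle_c:.1f} degrees")
```

The share for C comes out just under one half, which is consistent with the erratum's statement that the slice should be less than half of the pie.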

Chapter 4. Data Mining Primitives, Languages, and System Architectures

Chapter 5. Concept Description: Characterization and Comparison

P. 186, in Example 5.3, item 4, line 3, "city < province_or_state < country" should be changed to "birth_city < birth_province_or_state < birth_country". (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Oct. 16, 2001)
P. 187, in Table 5.2, the third column in the table should be called "birth_region" instead of "birth_country". (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Oct. 16, 2001)
P. 203, in Tables 5.7 and 5.8, "Que" and "Alt" should be changed to "QC" and "AB" respectively (for consistency of abbreviations). (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Oct. 16, 2001)

Chapter 6. Mining Association Rules in Large Databases

P. 231, 4th paragraph, use bold font for the words "Apriori property" and italicize the remaining sentence: "All nonempty subsets of a frequent itemset must also be frequent." This sentence should be a paragraph on its own. That is, make the following the beginning of a new paragraph (paragraph 5): "The Apriori property is based on the following observation. ..."
P. 241, paragraph 2, Line 3, "I2 I1: 2" should be "I2 I4: 2" (pointed out by Ming Fan on Feb. 16, 2001)
P. 241, paragraph 3, line 3, "I1 I3:2" should be "I1 I3:4". (pointed out by Ming Fan on Feb. 16, 2001)
P. 242 under Algorithm Part 1. (a), "frequent items F" should be "frequent items F (where an item is frequent if its support is no less than min_sup)" (pointed out by Ming Fan on Feb. 16, 2001)
P. 266, line 2, the line should be broken in front of "^ S". Similarly, in line 4, the line should be broken in front of "^ T". That is, it should look like:
lives(C, _, "Vancouver")
^ sales(C, ?I1, S1) ^ ... ^ sales(C, ?Ik, Sk) ^ I = {I1, ..., Ik}
^ S = {S1, ..., Sk}
=> sales(C, ?J1, T1) ^ ... ^ sales(C, ?Jm, Tm) ^ J = {J1, ..., Jm}
^ T={T1, ..., Tm}
(pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Oct. 16, 2001)
P. 268, parag. 2, from line 8 on, "Specifically, such a set must contain at least one item whose price is no less than $500. It is of the form S1 U S2, where S1 != 0 is a subset of the set of all those items with prices no less than $500, and S2 possibly empty, is a subset of the set of all those items with prices no greater than $500." should be changed to "Specifically, the price of every item in such a set must be no less than $500." (pointed out by Anthony K. H. Tung "atung@comp.nus.edu.sg", on Sept. 26, 2002)
P. 274, Exercise 6.7 (b) "multilevel association rules" should be changed to "multilevel (but not cross-level) association rules"
P. 274, Exercise 6.7 (c) "multilevel association rules" should be changed to "multilevel (but not cross-level) association rules"
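
The P. 231 and P. 242 entries above both turn on definitions that are easy to state in code: an item is frequent if its support is no less than min_sup, and by the Apriori property every nonempty subset of a frequent itemset must also be frequent. The following is a minimal Python sketch of those two statements, not the book's algorithm listing. The transactions, the min_sup value, and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_items(transactions, min_sup):
    """Items whose support count is no less than min_sup."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c >= min_sup}

def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def has_infrequent_subset(candidate, frequent_prev):
    """Apriori pruning test: a k-itemset cannot be frequent if any of its
    (k-1)-subsets is missing from the set of frequent (k-1)-itemsets."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

# Hypothetical toy data (not from the book).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
]
min_sup = 2

F = frequent_items(transactions, min_sup)
frequent_2 = {
    frozenset(p)
    for p in combinations(sorted(F), 2)
    if support(set(p), transactions) >= min_sup
}

print(sorted(F))
print(sorted(sorted(s) for s in frequent_2))
# {I1, I2, I4} can be pruned without counting: its subset {I1, I4} is not frequent.
print(has_infrequent_subset(("I1", "I2", "I4"), frequent_2))
```

In this toy data the candidate {I1, I2, I4} is discarded before any support counting, because one of its 2-item subsets is not frequent. That is the practical use of the Apriori property.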

Chapter 7. Classification and Prediction

P. 285, Figure 7.3, it is better to change all the occurrences of "-" (hyphen) to "_" (underscore) in the algorithm to make it more readable (pointed out by Ming Fan on Feb. 16, 2001)
P. 304, Fig 7.8. Change the weight w subscripted by kj (i.e., $w_{kj}$) to jk (i.e., $w_{jk}$). (suggested by Ming Fan on Feb. 16, 2001)
P. 304, Fig 7.8. Label Oj and Ok should be relocated to much closer to the last circle in the second and third columns of circles---now it looks like Oj and Ok point to the whole columns, not just the last circles. (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Nov. 20, 2001)
P. 331, line 3, change "Zytko" to "Zytkow" (by author on Aug. 29, 2001)

Chapter 8. Cluster Analysis

P. 352. Figure 8.3. To be consistent with the text, the labels used in the figure (O_i, O_j, O_random) should be o_i, o_j, and o_random (with lowercase "o"). And the "p" in the second box should be written in boldface like the other p's in the other boxes. (suggested by Jonghyun Lee "jlee17@cs.uiuc.edu", on Nov. 20, 2001)
P. 368. parag. 3, starting with "From the density function ...", and ending with "...for a 2-D data set." This whole paragraph should be removed (discovered by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Dec. 19, 2000)
P. 373, line 6, "around each data point The ..." should be "around each data point. The ..." (pointed out by Jian Pei "peijian@cs.sfu.ca", on July 22, 2002)
P. 374, the 2nd sentence in the 2nd paragraph, "it conforms ..." should be "It conforms ...". Also, in the same sentence, "... a good clustering algorithm: It handles ..." should be changed to "... a good clustering algorithm: it handles ...". (suggested by Jonghyun Lee "jlee17@cs.uiuc.edu", on Nov. 20, 2001)
P. 375, Figure 8.17. Dense areas shown in the third graph do not match well with the ones shown in the first two graphs. Also, it is better to show the third plane and project the 3-plane intersection in the 3-D graph. (suggested by Jonghyun Lee "jlee17@cs.uiuc.edu", on Nov. 20, 2001)

Chapter 9. Mining Complex Types of Data

P. 407, 2nd line of Example 9.6. "It consists of four dimensions: region temperature, ...", should be "It consists of four dimensions: region, temperature, ...". (suggested by Jonghyun Lee "jlee17@cs.uiuc.edu", on Nov. 20, 2001)
P. 407 Figure 9.2, in region dimension table, "region_name" should be changed to "region", in BC_weather fact table, "region_name" should be changed to "probe_location". (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Nov. 20, 2001)
P. 408 Figure 9.3, line 1, "region_name" should be changed to "region". (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Nov. 20, 2001)
P. 413, line 6, "image-" should be changed to "image". (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Dec. 11, 2001)
P. 425, the 3rd paragraph from the bottom, boldface for "time interval", not just for "interval". (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Dec. 11, 2001)

Chapter 10. Data Mining Applications and Trends in Data Mining

P. 466. This is too obvious, but text or another figure is needed for the blank space. (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Dec. 11, 2001)

Appendix A. An Introduction to Microsoft's OLE DB for Data Mining

P. 488, line 1, "(Minimum_size = 3)" should be written in normal font. (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Dec. 11, 2001)
P. 488, 6th line from the bottom, "INTO CLAUSE" is written entirely in the same bold font; "CLAUSE" should be written in lowercase, normal font. (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Dec. 11, 2001)
P. 490 line 9, "OLE DB for DM provides a number of functions that can used ..." should be "OLE DB for DM provides a number of functions that can be used ..." (pointed out by Kalman Balogh (KBalogh@matavnet.hu), on March 25, 2001).
PP. 490-491. For the various functions shown in the pages (i.e. Cluster(), ClusterProbability(), PredictHistogram(), etc), the main text uses bold font but the illustrative example uses a different font. The font used in the text should be changed to that used in the example to make them consistent. (pointed out by Jonghyun Lee "jlee17@cs.uiuc.edu", on Dec. 11, 2001)
P.520, the last line, "http://www.microsoft.com/data/oledb/dm.html" should be "http://www.microsoft.com/data/oledb/dm.htm". (pointed out by Illhoi Yoo "potence@drexel.edu", on Aug. 11, 2002)

Appendix B. An Introduction to DBMiner

Index

P. 534, column 1, line 25 (i.e., before "maxpattern"), add an index entry: "lift 261" (suggested by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Dec. 19, 2000)
P. 542, column 2, line 7 (before "Google"), add an index entry: "Gini index, 292" (suggested by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Dec. 19, 2000)
P. 535. Columns 1 and 3, C4.5 add: "C4.5 291", (suggested by Jian Pei (jianpei@cse.buffalo.edu) on Oct. 18, 2002)
