Errata on the first and second printings of the book: "Data Mining: Concepts and Techniques"

Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2000. ISBN 1-55860-489-8

Please e-mail us (1) any errors you have found in the book that are not yet listed in this errata, and (2) your suggestions and comments for the revision of the book. Thanks!

Chapter 1. Introduction

P. 10, Example 1.1, line 3, the sentence "For each relation, the attribute that represents the key or composite key component is underlined." should be removed. (pointed out by Nancy Tsao, CMPT459 student, on Oct. 2, 2000)
P. 19, under the subheader "Heterogeneous databases and legacy databases", "Objects in one component database may differ ..." should be "A {\bf heterogeneous database} consists of a set of interconnected, autonomous component databases. The components communicate in order to exchange information and answer queries. Objects in one component database may differ ...".
P. 28, 2nd paragraph, line 5 "This is often sufficient to ensure the completeness of the algorithm." should be "For some mining tasks, such as association, this is often sufficient to ensure the completeness of the algorithm."

Chapter 2. Data Warehouse and OLAP Technology for Data Mining

P. 41 2nd paragraph, Line 8. "souces" should be "sources" (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Sept. 20, 2000)
P. 59. Figure 2.10 In "Example of Typical OLAP Operations on Multidimensional Data", for slice operation, the [Vancouver, security] subcube should be 400 instead of 440. (pointed out by Ronald Chen (ychenb@sfu.ca), CMPT459 student, on Oct. 5, 2000)
P. 73, line 6 of paragraph 4, "day < week < month < quarter < year" should be "day < month < quarter < year" (pointed out by Ming Fan, on Feb. 17, 2001)
P. 75, line 4 of Example 2.12. "equisized partitions" should be "equal-sized partitions" (pointed out by Ronald Chen (ychenb@sfu.ca), CMPT459 student, on Oct. 5, 2000)
P. 77 2nd paragraph, Line 5. "By scanning chunks 1 to four of ABC," should be "By scanning chunks 1 to 4 of ABC," (pointed out by Nancy Tsao (ntsao@sfu.ca), CMPT459 student, on Oct. 4, 2000)
P. 78 1st line, instead of "10 x 4000 (for one row of the AC plane)...", it should be "40 x 1000 (for one row of the AC plane)...". The same change should be made on P. 78, paragraph 2, line 5. (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Oct. 9, 2000)
P. 78 Fig. 2.16 (Left chart), "all" should be calculated from A rather than B (i.e., the solid line to "all" should be from node A instead of node B). (pointed out by Jian Pei (peijian@sfu.ca), CMPT459 TA, on Oct. 6, 2000)
Page 81 Figure 2.18. The line linked to T459 should be linked to T238, to be consistent with the text on P. 81 and the index table in Figure 2.19. (pointed out by Nancy Tsao, CMPT459 student, on Oct. 6, 2000)
Page 96 4th line under subsection "Architecture ..." "an graphical ...", should be "a graphical ..." (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Oct. 18, 2000)
Page 102, question 2.13, Line 3, ", and whose shelf life is within 25\% of the minimum shelf life, and within 50\% of the minimum shelf life." should be ", and whose shelf life is between 1.25 and 1.5 of the minimum shelf life."

Chapter 3. Data Preprocessing

P. 137 Figure 3.16 Line 1. "Low' = $1,000,000" should be "Low' = - $1,000,000" (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Sept. 24, 2000)

Chapter 4. Data Mining Primitives, Languages, and System Architectures

P. 158, Figure 4.4, The pie chart is out of proportion: the sum of class A's count and class B's count is 2440, which is larger than class C's count, 2160. Thus the slice for class C should be smaller (less than half). (pointed out by Ming Fan on Feb. 16, 2001)
P. 162 line 1, "attribute_or_dimemsion_list", should be "attribute_or_dimension_list", (pointed out by David Halim (dhalim@sfu.ca), CMPT459 student, on Oct. 15, 2000)

Chapter 5. Concept Description: Characterization and Comparison

P. 188, Figure 5.1, one line below "3. P <- generalization(W)", "by replacing each value v in W" should be "by replacing each value v in W by its corresponding v' in the mapping" (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Oct. 17, 2000)

Chapter 6. Mining Association Rules in Large Databases

P. 231, 1st sentence of 3rd paragraph, CHANGE "In order to use the Apriori property, all nonempty subsets of a frequent itemset must also be frequent." TO "Apriori property. All nonempty subsets of a frequent itemset must also be frequent" (Authors' note: bold face the words "Apriori property". This sentence should be a paragraph on its own.)
P. 231, 2nd sentence of the 3rd paragraph, CHANGE "This property is based on the following observation." TO "The Apriori property is based on the following observation." (Authors' note: Make this the beginning of a new paragraph (paragraph 4).)
P. 240 line 4, "I5: 1)" should be "(I5: 1)" (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Oct. 7, 2000)
P. 240 line 5, "I5 is linked to I2" should be "I5 is linked to I1" (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Oct. 7, 2000)
P. 241 Table 6.1 2nd line for I4 under conditional pattern base "I2: 1)" should be "(I2: 1)" (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Oct. 7, 2000)
P. 241 Table 6.1 3rd line for item I3. under "frequent patterns generated", "I1, I3: 2" should be "I1 I3: 4" (pointed out by Baosheng Zhang from China (bszhang@sina.com) on Nov. 3, 2000)
P. 241, paragraph 2, Line 3, "I2 I4: 2" should be "I2 I1: 2" (pointed out by a CMPT459 student, on Nov. 21, 2000) Note: this erratum is incorrect! The text should remain as "I2 I4: 2" (pointed out by Ming Fan on Feb. 16, 2001)
P. 241, paragraph 3, line 3, "I1 I3:2" should be "I1 I3:4". (pointed out by Ming Fan on Feb. 16, 2001)
P. 242 under Algorithm Part 1. (a), "frequent items F" should be "frequent items F (where an item is frequent if its support is no less than min_sup)" (pointed out by Ming Fan on Feb. 16, 2001)
P. 242 under Algorithm Part 2. Line (6), "(beta),s" should be "(beta)'s", (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Oct. 13, 2000)
P. 242 under Algorithm Part 2. Line (8), "F_growth(...)", should be "FP_growth(...)", (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Oct. 13, 2000)
P. 261, Table 6.4, The overline above the text "game" covers only "g" but not "ame". (pointed out by Ronald Chen, CMPT459 student, on Oct. 23, 2000) The sigma "row" has "r" printed as a subscript, which it should not be. (pointed out by Ronald Chen, CMPT459 student, on Dec. 1, 2000) The sigma "col" has "col" printed as a subscript, which it should not be. (pointed out by M. Kamber, Feb. 21, 2001)
P. 274, Line 9. "nationality: {Asia, Europe, U.S.A., Latin_America} \in foreign. Let the minimum support threshold be 2% and ..." should be changed to
"nationality: {Asia, Europe, Latin_America} \in foreign
{Canada, U.S.A.} \in North_America
Let the minimum support threshold be 20% and ... "
(suggested by Dennis Chai, CMPT459 student, on Dec. 7, 2000).
P. 274, Line 14. "for all levels" should be changed to "for all levels, for the following rule template,
$\forall S \in R, P(S, x) \wedge Q(S, y) \Rightarrow gpa(S, z)$ [s, c] where P, Q \in {status, major, age, nationality}
Do not mine cross-level association rules."
(suggested by Dennis Chai, CMPT459 student, on Dec. 7, 2000).
P. 274, Line 16. "... where a reduced support of 1% is used for the lowest abstraction level" should be changed to "... where a reduced support of 10% is used for the lowest abstraction level for the following rule template,
$\forall S \in R, P(S, x) \wedge Q(S, y) \Rightarrow gpa(S, z)$ [s, c] where P, Q \in {status, major, age, nationality}
Do not mine cross-level association rules."
(suggested by Dennis Chai, CMPT459 student, on Dec. 7, 2000).
P. 275, 6.15 (e) "variant(S) \le v" should be changed to "avg(S) \ge v" (suggested by Jian Pei, Ph.D. student, on Nov. 7, 2000).
P. 276, line 2. "The Apriori algorithm discussed in Section 6.2.1 was published independently by Agrawal and Srikant [AS94], and Mannila, Toivonen, and Verkamo [MTV94]." should be changed to "The Apriori algorithm discussed in Section 6.2.1 was done by Agrawal and Srikant [AS94]. A variation of the algorithm using a similar pruning heuristic was developed independently by Mannila, Toivonen, and Verkamo [MTV94]."
P. 276, 1st paragraph, line 3 from the bottom. "Iceberg queries are described in Fang, Shivakumar, Garcia-Molina, et al. [FSGM+98], and Beyer and Ramakrishnan [BR99]." should be changed to "Iceberg query computation was studied in Fang, Shivakumar, Garcia-Molina, et al. [FSGM+98], and an efficient iceberg cube computation method was developed by Beyer and Ramakrishnan [BR99]."
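
The Apriori property restated in the P. 231 corrections above can be checked on a small example. The following is only a minimal sketch, not the book's algorithm: the transactions are hypothetical toy data, the helper names (support_count, frequent_itemsets, nonempty_proper_subsets, satisfies_apriori_property) are introduced here for illustration, and frequent itemsets are found by brute-force enumeration rather than by Apriori or FP-growth. An itemset is treated as frequent if its support count is no less than min_sup, as in the P. 242 correction.

```python
from itertools import combinations

# Hypothetical toy transactions (illustrative only, not taken from the book).
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"bread", "tea"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_sup):
    """Brute-force enumeration: map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support_count(candidate, transactions)
            if count >= min_sup:  # frequent: support is no less than min_sup
                result[candidate] = count
    return result


def nonempty_proper_subsets(itemset):
    """All nonempty subsets of `itemset` other than the itemset itself."""
    members = sorted(itemset)
    for k in range(1, len(members)):
        for combo in combinations(members, k):
            yield frozenset(combo)


def satisfies_apriori_property(frequent):
    """True if every nonempty subset of every frequent itemset is also frequent."""
    return all(
        subset in frequent
        for itemset in frequent
        for subset in nonempty_proper_subsets(itemset)
    )


frequent = frequent_itemsets(transactions, min_sup=2)
print(satisfies_apriori_property(frequent))
```

The check holds for any transaction set because a transaction that contains an itemset also contains each of its subsets, so a subset's support count can never be lower than that of the itemset.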

Chapter 7. Classification and Prediction

P. 285, Figure 7.3, it is better to change all the occurrences of "-" (hyphen) to "_" (underscore) in the algorithm to make it more readable (pointed out by Ming Fan on Feb. 16, 2001)
P. 304, Fig 7.8. Change the weight w subscripted by kj (i.e., $w_{kj}$) so that it is subscripted instead by jk (i.e., $w_{jk}$). (suggested by Ming Fan on Feb. 16, 2001)
P. 311. parag. 1, 3rd sentence "This consists of removing weighted links that do not result in a decrease in the classification accuracy of the given network." should be changed to "This consists of simplifying the network structure by removing weighted links that have the least effect on the trained network. For example, a weighted link may be deleted if such removal does not result in a decrease in the classification accuracy of the network." (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Nov. 1, 2000, and changed by M. Kamber)
P. 324 line 1, parag. 1 of Section 7.9.2, "boostrap" should be "bootstrap" (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Nov. 1, 2000)
P. 328 line 1 from the bottom, (Exercise 7.6) "Let salary be ..." should be "Let status be ...". Similarly, on p. 329 Exercise 7.6 (c) "junior" should be changed to "46..50K" and "salary" to "junior". (pointed out by a CMPT459 student on Nov. 27, 2000)

Chapter 8. Cluster Analysis

P. 367. parag. 2, line 6 "... function that can be determined the distance...", should be changed to "... function that can be determined by the distance...". (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Dec. 19, 2000)
P. 368. parag. 3, starting with "From the density function ..." The whole paragraph should be removed (discovered by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Dec. 19, 2000)

Chapter 9. Mining Complex Types of Data

Chapter 10. Data Mining Applications and Trends in Data Mining

P. 480 line 3 from bottom, "and {\it data mining process visualization}." should be "{\it data mining process visualization}, and {\it interactive visual data mining}."

Appendix A. An Introduction to Microsoft's OLE DB for Data Mining

Appendix B. An Introduction to DBMiner

Index

P. 535 2nd column, line 17. "characterization, 162-163" should be "characterization, 21, 162-163" (pointed out by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Sept. 25, 2000)
P. ??? add an index entry: "lift 261" (suggested by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Dec. 19, 2000)
P. ??? add details plus an index entry: "Gini index" (suggested by Steven Y. Lee (sleep@sfu.ca), CMPT459 student, on Dec. 19, 2000)