Week 12
Quiz
Homework
o Who is planning on using input data file and standard format?
o Agree no changes without everyone’s ok
Trees – cont.
o Algorithm for node deletion
o Show example program
Searching
o Very common operation
o Linear searches
o Go down list sequentially
o Best case time O(1) – don’t really factor these
o Worst case O(n)
o Average case
o On average n*(n+1)/2 divided by n = O(n)
o Binary searches
o Must have ordered list/array
o Best case O(1)
o Worst case O(log n) (really O(log n+1)
o Average case
o To motivate, show a binary tree and note that most of the nodes are in the bottom two depths, so average is O(log n)
o Hash tables
o Problem: have data to search for but not ‘contiguous’
o E.g. part numbers (book uses 000, 100, 200, 300, …, 4900)
o Want to store in an array
o Define a hash function that operates on a key
o Hash function returns an index into the hash array
§ Must return valid index
o Collisions occur when a two keys return the same hash value
§ In example above, 350, 325 may both return 3
o Resolve collision by storing object in the next open array index
§ open-address hashing or open addressing
§ when go off the end of the array, must go to 0
o Show a high level algorithm
o Non-integer keys
§ Have function that converts key to integer, then hash
§ Java has .hashCode() method for all objects
o Abstract data type hashtable
§ Methods
· Constructor with the size
· numfree() (book calls capacity)
· getsize()
· put(key, element)
o replaces objects with same key
· containskey(key)
· get(key)
· remove(key)
§ definition
public class hashtable
{
private Object[] data;
private Object[] keys;
private int manyitems;
private boolean[] hasbeenused;
}
§ private methods
· hash(key)
· nextIndex(i)
· findIndex(key)
o like get but returns an index
o returns -1 if can’t find
§ do hash method for put()
· first use findIndex() to see if already there
· use if (manyitems < data.length) to determine if there’s space
§ do remove, using findIndex()
· return the object and set the array values = null
§ do findIndex
· basically hash and then go through each item sequentially as long has hasbeenused is true
· ask why use hasbeenused rather than simply empty position? – (hint, some item may have been removed!)
· need to keep counter so that can test if < data.length for case when table is full (so don’t loop forever)
o Choosing a hash function
§ Want to avoid clustering (lots of collisions)
§ Division hash function (e.g. mod) want size = prime # = 4k +3
§ Mid-square hash (times itself, take middle digits
§ Multiplicative hash (time #<1, take first n digits)
§ Avoiding collisions:
· Double hashing
· Compute a hash2 to be the index increment to try next
· Must avoid coming back to same spot
· See text for using prime numbers
· Chained hashing
o Create a linked list at each hash index!
o Run-time analysis
§ See text p. 559
§ Run time is proportional to the load factor or percent full
In class (perhaps)