============================================ vis/cheongc.out 06:45:06_Monday_14_May_2001 ============================================ Found SpellChecker.hs file ---- Testing getWords for task 3: -> getWords mary1: ["Mary","had","a","little","lamb","little","lamb","little","lamb","Mary","had","a","little","lamb","its","fleece","was","white","as","snow"] -> getWords mary2: ["Mary","Mry","quite","contrary","how","does","your","garden","grow","With","silver","bells","and","cocle","shells","and","pretty","maids","alll","in","a","row"] -> getWords mary3: ["Mary","spies","some","little","flies","little","flies","little","flies","Mary","spies","sme","little","flys","upon","the","wine","glass","shelves"] ------------------------------------- ---- Testing makePlural for task 4: -> makePlural "maid": "maids" -> makePlural "shelf": "shelves" -> makePlural "fix": "fixes" ------------------------------------- ---- Testing getLongPrefixes for task 5: -> getLongPrefixes smalldict "mry": ["maid","marigold","mary","marry"] -> getLongPrefixes smalldict "cocle": ["cockle"] -> getLongPrefixes smalldict "dog": ["does"] ------------------------------------- ---- Testing minEditDist for task 6: -> minEditDist "amry" "mary": 2 -> minEditDist "dog" "army": 7 -> minEditDist "mary" "marry": 1 ------------------------------------- ---- Testing findGoodSuggestions for Task 6 -> findGoodSuggestions smalldict "amry": ["mary"] -> findGoodSuggestions smalldict "dog": ["does"] -> findGoodSuggestions smalldict "marry": ["marry"] ------------------------------------- ---- Testing doSpellCheck: -> doSpellCheck smalldict mary1: -> doSpellCheck smalldict mary2: on line 1 word "Mry" might in fact be one of: mary on line 4 word "cocle" might in fact be one of: cockle on line 5 word "alll" might in fact be one of: all -> doSpellCheck smalldict mary3: on line 2 word "sme" might in fact be one of: some on line 2 word "flys" might in fact be one of: fly on line 3 word "wine" might in fact be one of: in ------------------------------------- ============================================ src/SpellChecker.hs 06:44:59_Monday_14_May_2001 ============================================ ------------------------------------------------ -- Project: Spell Checker (Project A) -- -- -- -- Subject: Computer Fundamentals A -- -- 433-141 Semester 1, 2001 -- -- University of Melbourne -- -- -- -- Name : Colin Cheong (cheongc@ecr.mu.oz.au) -- -- Enrolment number: 134835 -- -- -- -- Purpose: A simple Haskell Hugs spellchecker-- -- -- -- Acknowledgments: SPCLib.hs and Version 0 -- -- SpellChecker.hs provided by -- -- Martin Sulzmann, sulzmann@cs.mu.oz.au -- -- Alistair Moffat, alistair@cs.mu.oz.au -- -- Department of Computer Science -- -- University of Melbourne -- ------------------------------------------------ ------------------------------------------------------------------------------- import SPCLib import List -- The above module is supplied on the 141 web page. -- Make a copy of it in your directory, but do NOT -- make any changes to it. All of your changes are -- to be made in your copy of this file, SpellChecker.hs -- Some simple testing routines... test1 = doSpellCheck smalldict mary1 test2 = doSpellCheck smalldict mary2 test3 = doSpellCheck smalldict mary3 -- Version 0: This first one is stupid! It just says that every word -- in the input string is a spelling mistake. Really, the only reason -- we have given it to you is as a basis for the versions that you -- have to write... --doSpellCheck :: SpellCheck --doSpellCheck dict text -- = putStr result -- where -- result = presentNicely errors -- errors = [(num, w, noAnswers) | (num, w) <- wordlist] -- wordlist = prepareWords words text ------------------------------------------------------------------------------- {- TASK ONE: reports words as errors if they are not in the dictionary, but still does not offer any alternatives. doSpellCheck :: SpellCheck doSpellCheck dictionary inputtext = putStr result where result = presentNicely errors errors = [(num, w, noAnswers) | (num, w) <- wordlist, elem w dictionary == False] wordlist = prepareWords words inputtext -} ------------------------------------------------------------------------------- {- TASK TWO: reports words as errors if they are not in the dictionary regardless of upper or lower case, but still does not offer any alternatives. doSpellCheck :: SpellCheck doSpellCheck dictionary inputtext = putStr result where result = presentNicely errors errors = [(num, w, noAnswers) | (num, w) <- wordlist, elem (convert2lower w) dictionary == False] wordlist = prepareWords words inputtext convert2lower is a function that takes a word input (variable name "inputword") as a string and converts it to all lower case, so that checks against the dictionary can be made regardless of capitations. -} convert2lower :: String -> String convert2lower inputword = map toLower inputword ------------------------------------------------------------------------------- {- TASK THREE: does not report words with puncuation as errors, in addition to the features from Task Two. doSpellCheck :: SpellCheck doSpellCheck dictionary inputtext = putStr result where result = presentNicely errors errors = [(num, w, noAnswers) | (num, w) <- wordlist, elem (convert2lower w) dictionary == False] wordlist = prepareWords getWords inputtext getWords is a function that takes a string input (variable name "sentence"), and using the prelude function "words", breaks up the string into single words to form "newword". The variable "newword" is then passed through the function "isAlphanumerical" to filter non-alphanumerical characters. -} getWords :: String -> [MyWord] getWords sentence = [filter isAlphanumerical newword | newword <- (words sentence)] -- isAlphanumerical is a function that tests whether a character input -- (variable name "inputchar") is an alphabetical or numerical character, -- and returns a boolean value. isAlphanumerical :: Char -> Bool isAlphanumerical inputchar = isAlpha inputchar || isDigit inputchar ------------------------------------------------------------------------------- {- TASK FOUR: does not report words which form plurals as errors, in addition to the features from Task Two and Three. doSpellCheck :: SpellCheck doSpellCheck dictionary inputtext = putStr result where result = presentNicely errors errors = [(num, w, noAnswers) | (num, w) <- wordlist, elem (convert2lower w) dictionary == False && elem (convert2lower w) (pluraldict dictionary) == False] wordlist = prepareWords getWords inputtext the function "pluraldict" returns the plural variation of dictionary "smalldict". -} pluraldict :: Dictionary -> Dictionary pluraldict dictionary = [makePlural dictionary | dictionary <- dictionary] -- the makePlural function is a messy but nonetheless effective funtion. -- the latter half of the function takes the "plurals" function a.k.a the -- list of singular word endings and their respective plural substitute -- and assigns them the variables "endletter, replaceplural" respectively. -- It then states that if the length of the inputword minus the length -- of a possible endletter is greater than zero and is the same as a -- prospective endletter when this same number is dropped -- to proceed, -- [(ie. "i" or "a" cannot have a plural)] -- [(ie. length "mary" = 4, (endletter, replaceplural) = (y, ies) -- length endletter = 1, therefore (4-1)=3, drop 3 "mary" = "y" == endletter] -- if the conditions are met for the latter half of the function, then the -- first half of the function can proceed. This bit takes the number of -- required characters from the inputcharacter (specified with length inputword -- - length endletter, and concatenates the respective replaceplural value that -- was paired with the endletter. -- Note: The head function at the beginning is required so that the output -- corresponds to the type definition MyWord. makePlural :: MyWord -> MyWord makePlural inputword = head [take ((length inputword) - (length endletter)) inputword ++ replaceplural | (endletter, replaceplural) <- plurals, ((length inputword) - (length endletter)) > 0 && drop ((length inputword) - (length endletter)) inputword == endletter ] -- plurals is a function that contains the list of singular word endings -- and respectively their possible plural substitute. plurals :: [(String, String)] plurals = [("lf","lves"),("y","ies"),("x","xes"),("s","ses"),("","s")] ------------------------------------------------------------------------------- {- TASK FIVE: "doSpellCheck" now offers suggestions for misspelt words via the "getLongPrefixes" function. doSpellCheck :: SpellCheck doSpellCheck dictionary inputtext = putStr result where result = presentNicely errors errors = [(num, wordcheck, (getLongPrefixes dictionary wordcheck)) | (num, wordcheck) <- wordlist, elem (convert2lower wordcheck) dictionary == False && elem (convert2lower wordcheck) (pluraldict dictionary) == False] wordlist = prepareWords getWords inputtext getLongPrefixes is a function that returns all elements of the dictionary that have the same degree of a common prefix to the inputword. It does so by utilising the function "comparison". -} getLongPrefixes :: Dictionary -> MyWord -> [MyWord] getLongPrefixes dictionary inputword = comparison inputword dictionary [] -- comparison is the function that forms the guts of Task 5. It includes -- recursion to loop the inputword minus the last letter (init inputword) -- or the dictionary minus the head word (head dictionary) so that a match -- (and an addition to the "answers" tuple will only be produced when the same -- degree of a common prefix exists between the modified inputword and -- a dictionary word comparison :: MyWord -> Dictionary -> [MyWord] -> [MyWord] comparison wordinput dictionary answers | wordinput == [] = noAnswers | dictionary == [] && answers == [] = comparison (init wordinput) smalldict [] | dictionary == [] && answers /= [] = answers | samebeginning wordinput (head dictionary) = comparison wordinput (drop 1 dictionary) (answers++[head dictionary]) | otherwise = comparison wordinput (drop 1 dictionary) (answers) -- the samebeginning function takes a part or full word, and checks it -- against a dictionary word, to check if they have the same set of initial -- characters samebeginning :: MyWord -> MyWord -> Bool samebeginning inputword dictionaryheadword | inputword == [] = True | dictionaryheadword == [] = False samebeginning (prefix1:restofword1)(prefix2:restofword2) = if prefix1 ==prefix2 then samebeginning restofword1 restofword2 else False ------------------------------------------------------------------------------- {- TASK SIX: "doSpellCheck" now offers suggestions with minimum edit distance via "findGoodSuggestions".-} doSpellCheck :: SpellCheck doSpellCheck dictionary inputtext = putStr result where result = presentNicely errors errors = [(num, wordcheck,(findGoodSuggestions dictionary wordcheck))| (num, wordcheck) <- wordlist, elem (convert2lower wordcheck) dictionary == False && elem (convert2lower wordcheck) (pluraldict dictionary) == False] wordlist = prepareWords getWords inputtext {- findGoodSuggestions relays the dictionary and inputword through to "allmin", along with an initial value of 99,999 as the minimum edit distance. Through the function "allmin" each word of the dictionary is checked against the inputword and if it is found to be lower, it resets the answers list and assigns a new value to the minedit. If it is equal it is simply concatenated to answers. If it is greater is simple does nothing and goes through to the next dictionary word, this process is recursive until there is nothing left in the dictionary. -} findGoodSuggestions :: Dictionary -> MyWord -> [MyWord] findGoodSuggestions dictionary inputword = allmin dictionary inputword [] 99999 allmin :: Dictionary -> MyWord -> [MyWord] -> Int -> [MyWord] allmin dictionary inputword answers minedit | dictionary == [] = answers | minEditDist inputword (head dictionary) < minedit = allmin (drop 1 dictionary) inputword [head dictionary] (minEditDist inputword (head dictionary)) | minEditDist inputword (head dictionary) == minedit = allmin (drop 1 dictionary) inputword (answers++[head dictionary]) minedit | minEditDist inputword (head dictionary) > minedit = allmin (drop 1 dictionary) inputword (answers) minedit {- the function minEditDist forms the first step of filters to attain the minimum edit distance of two words. If none of the initial requirements are met, then the words are passed through to the next function "transform2" along with count of zero so far.-} minEditDist :: MyWord -> MyWord -> Int minEditDist inputword dictword | inputword == dictword = 0 | inputword == [] = length dictword | dictword == [] = length inputword {- (This guard was added at the very end when I found that the maintaining the order of a word does not 100% ensure that its minimal edit distance is found. (eg. "army" + "mary" = 4 if order is maintained because the code does not pick up that the 'm' can be simply moved up the front instead of 'a' being moved first, and then 'm'. Therefore this extra guard eliminates errors by also checking the reverse transformation, and then selecting the minimal of the two.) This is also where the inputword is converted to all lowercase if any uppercase exists in the original inputword. -} |transform2 dictword (convert2lower inputword) 0 < transform2 (convert2lower inputword) dictword 0 = transform2 dictword (convert2lower inputword) 0 |transform2 (convert2lower inputword) dictword 0 < transform2 dictword (convert2lower inputword) 0 = transform2 (convert2lower inputword) dictword 0 -- the only scenario left is that they are equal, in which either one can -- be used. |otherwise = transform2 (convert2lower inputword) dictword 0 {- after the two words have managed to get through the initial filters from minEditDist, they are put through a recursive function of transform2. This function checks methodically whether each character (taken from the head inputword argument) is an element of the dictionary word - and alters the count tally as required. -} transform2 :: MyWord -> MyWord -> Int -> Int transform2 inputword dictword count | inputword == [] = count + length dictword | dictword == [] = count + length inputword -- (character is already in the right place, no editing needed, therefore no -- count needed) | (head inputword) == (head dictword) = transform2 (drop 1 inputword) (drop 1 dictword) count -- (character is in the wrong place, therefore needs to be deleted and -- reinserted so count = +2) | elem (head inputword) dictword = transform2 (drop 1 inputword) (dictdrop inputword (dictword++[' '])) (count +2) -- (character does not exist at all in dictionary word, needs to be deleted -- permanatly, so count = +1 ) | not(elem (head inputword) dictword) = transform2 (drop 1 inputword) dictword (count +1) {- If the character is found to be an element of the dictionary word but is in the wrong place, the character must be taken out of the dictionary word before it can be put into recursion again through "transform2". The critical factor here is that the order remains the same, so an accurate count can be attained. Otherwise the answers may be random depending on where the mistake is in the original wrongly spelt word. To ensure that the order is maintained, a Space character is 'flagged' onto the end of incoming dictionary word, so that once the required character has been dropped it is returned to the original order minus the dropped character from the inputword that was found to be an element of the dictionary word. This is done via the "orderdict" function that rotates the order of the dictword until it detects the Space character (thus signifing the correct original order) and filters is out to return the desired original ordered string. -} dictdrop :: MyWord -> MyWord -> MyWord dictdrop inputword dictword | (head inputword) == (head dictword) = orderdict (drop 1 dictword) | otherwise = dictdrop inputword (drop 1 dictword ++[head dictword]) orderdict :: MyWord -> MyWord orderdict dictword | (head dictword) == ' ' = filter isAlphanumerical dictword | otherwise = orderdict (drop 1 dictword ++[head dictword])