2020-04
18

死磕二分法搜索

By xrspook @ 15:00:07 归类于: 扮IT

我是看着题目的中文版做题的

练习10:要检查一个单词是不是在上面这个词汇列表里,你可以使用 in 运算符,但可能会很慢,因为这个 in 运算符要从头到尾来搜索整个词汇表。我们知道这些单词是按照字母表顺序组织的,所以我们可以加速一下,用一种对折搜索(也叫做二元搜索),这个过程就和你在现实中用字典来查单词差不多。你在中间部分开始,看看这个要搜索的词汇是不是在中间位置的前面。如果在前面,就又对前半部分取中间,继续这样来找。当然了,不在前半部分,就去后半部分找了,思路是这样的。不论怎样,每次都会把搜索范围缩减到一半。如果词表包含了113809个单词,最多就是17步就能找到单词,或者能确定单词不在词汇表中。那么问题来了,写一个函数,名为 in_bisect,接收一个整理过的按照字母顺序排列的列表,以及一个目标值,在列表中查找这个值,找到了就返回索引位置,找不到就返回空。

做到死去活来词语在词汇表里有索引正确,没有时却会疯掉的时候我不得不去看答案,看到答案后傻眼了,答案对单词的判断只有True和False,再去找原题,我那个去,题目改了!不要求索引了好吗!

Exercise 10: To check whether a word is in the word list, you could use the in operator, but it would be slow because it searches through the words in order. Because the words are in alphabetical order, we can speed things up with a bisection search (also known as binary search), which is similar to what you do when you look a word up in the dictionary (the book, not the data structure). You start in the middle and check to see whether the word you are looking for comes before the word in the middle of the list. If so, you search the first half of the list the same way. Otherwise you search the second half. Either way, you cut the remaining search space in half. If the word list has 113,809 words, it will take about 17 steps to find the word or conclude that it’s not there. Write a function called in_bisect that takes a sorted list and a target value and returns True if the word is in the list and False if it’s not. Or you could read the documentation of the bisect module and use that! Solution: http://thinkpython2.com/code/inlist.py.

又纠结一番后我终于写出了一句“first > last”返回例外情况,终于,世界被拯救了!记录索引和不记录索引很不一样啊,按照参考答案的解法,i即便返回也永远是1,索引无能。纠结是有好处的,让我明白到二分法搜索有多么的高效,简直甩while循环几十条街,但如果真索引的话,估计我会很懒地直接用list.index(),虽然用之前必须用in历遍列表,判断是否存在。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
import time
def in_bisect(library, first, last, myword): # 二分法搜索,10万数据查询最多只需不到20步
    if first > last: # 这是一句拯救了我的条件
        return None
    else:
        mid = (first + last)//2
        if myword == library[mid]:
            return mid
        elif library[mid] > myword:
            return in_bisect(library, first, mid-1, myword)
        else:
            return in_bisect(library, mid+1, last, myword)
myword = 'zoo' # input('myword is: ')
i = 0
library = []
fin = open('words.txt')
for line in fin:
    word = line.strip()
    library.append(word)
library.sort()
start = time.time()
 
# j = 0
# while i < len(library) - 1: # 我的脑洞第一反应用的循环
#     if myword == library[i]:
#         j = i
#         break
#     i += 1
# if j == 0:
#     print('myword is not in library')
# else:
#     print('index =', j)
 
# if myword in library: # 伟大列表自带的查询索引号,但先得确定单词在那里
#     i = library.index(myword, 0, len(library)-1)
# if i == 0:
#     print('myword is not in library')
# else:
#     print('index =', i)
 
if in_bisect(library, 0, len(library), myword) == None: 
    print('myword is not in library')
else:
    print('index =', in_bisect(library, 0, len(library), myword))
end = time.time()
print(end-start)
# myword is not in library while 0.07   index 0.003  bisect 0.001
# apple 4450               while 0.003  index 0.001  bisect 0.001
# zoo 113707               while 0.07   index 0.005  bisect 0.001
# while,index和bisect没有对比就没有伤害
© 2004 - 2024 我的天 | Theme by xrspook | Power by WordPress