2020-06
5

随机单词扎堆成文

By xrspook @ 14:47:54 归类于: 扮IT

从某本书里随机找单词拼出句子段落。重点是把握好前缀和后缀,前缀要捆绑查找,后缀要关联对应。

Exercise 8: Markov analysis: Write a program to read a text from a file and perform Markov analysis. The result should be a dictionary that maps from prefixes to a collection of possible suffixes. The collection might be a list, tuple, or dictionary; it is up to you to make an appropriate choice. You can test your program with prefix length two, but you should write the program in a way that makes it easy to try other lengths. Add a function to the previous program to generate random text based on the Markov analysis. Here is an example from Emma with prefix length 2: He was very clever, be it sweetness or be angry, ashamed or only amused, at such a stroke. She had never thought of Hannah till you were never meant for me?” “I cannot make speeches, Emma:” he soon cut it all himself. For this example, I left the punctuation attached to the words. The result is almost syntactically correct, but not quite. Semantically, it almost makes sense, but not quite. What happens if you increase the prefix length? Does the random text make more sense? Once your program is working, you might want to try a mash-up: if you combine text from two or more books, the random text you generate will blend the vocabulary and phrases from the sources in interesting ways. Credit: This case study is based on an example from Kernighan and Pike, The Practice of Programming, Addison-Wesley, 1999. You should attempt this exercise before you go on; then you can download my solution from http://thinkpython2.com/code/markov.py. You will also need http://thinkpython2.com/code/emma.txt.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import string
import random
from collections import defaultdict
def set_book(fin1,num):
    d = defaultdict(list) # 默认键值为列表
    l = []
    header = ()
    for line in fin1:
        line = line.replace('-', ' ')
        for word in line.rstrip().split(): # 空格换行为分割,单词存入列表
            l.append(word)
    for i in range(len(l)-num): # 以列表序号逐一推进方式建立字典
        header = (l[i-1],) # 元组header为前缀,做键
        for j in range(i,i+num-1):
            header += (l[j],)
            j += 1
        if l[i+num-1] not in d[header]:
            d[header].append(l[i+num-1]) # 列表后缀做键值
    return d
def next(start, book):
    return random.choice(book[start])
fin1 = open('emma.txt', encoding='utf-8')
prefix_num = 3 # 前缀个数
suffix_num = 100 # 后缀个数
book = set_book(fin1,prefix_num)
start = random.choice(list(book.keys())) # 随机前缀开头
final =  start
for i in range(suffix_num): # 截取最后几个单词为前缀找后缀
    final += (next(final[len(final)-prefix_num:], book),) 
for word in final:
    print(word, end=' ')
# reigns alone. A very proper compliment! and then follows the application, 
# which I think, my dear, you said you had a great deal happier if she had no 
# intellectual superiority to make atonement to herself, or frighten those 
# who might hate her into outward respect. She had never seen her look so well, 
# so lovely, so engaging. There was consciousness, animation, and warmth; 
# there was every appearance of its being all in proof of how much he was 
# in love with, how to be able to return! I shall try what I can do. 
# Harriet's features are very delicate, which makes a likeness
2020-06
5

上路

By xrspook @ 8:29:07 归类于: 烂日记

我已经不记得对上一次,写python是什么时候的事了,感觉好遥远,起码一个多月以前。具体时间,我实在记不清了,但是我依然记得,上一次我卡在了哪里,我应该在哪里重新开始。当时我看到的是第14章,但实际上第13章的内容我还没有全部消化掉,前面的那些我花的时间还多一点,后面的那些简直就是囫囵吞枣。第13章最后一道练习题,我觉得自己是无论如何不会去想的了,因为我根本不知道题目到底要我做些什么,之所以这样,大概是因为我的数学学得不好,所以我无法理解题目的意思。但是倒数第二道题目,我觉得自己还是可以做到的。

那是一道从一本书里随机的选择某些单词组成一些可能有意思的句子。随机拼凑句子语意当然乱来,但是如果能保证单词前面和后面相对稳定,那么起码单词组合起来会有某些意思,虽然可能句子的意思还是很无厘头。随着前面后面单词的整体性加强,整个句子的意思也会越发明了。这其实就是一个靠着前缀找后缀的运行模式。开始的时候默认的前缀是两个单词。由前面的两个单词找出后面一个单词,然后再利用后面的两个单词找下一个单词,如此类推。这种方法理论上可以扩展为结合前面N个单词找后面一个单词,然后再撇掉第1个单词,继续找下一个。思路不复杂,但是该用什么实现这个呢?的确是需要点心思的是Think Python那本书没有把所有方法都告诉你,在最终写出这道题目的解答之前,我看过他们的答案,但我觉得自己没看懂,因为里面加入了很多书里之前根本没说过的东西。里面默认带入了很多他们认为你必须知道,所以无需解释的东西。如果这是一本传统的教程,这简直让人日子没法过了!做这本书的习题的时候,我也吐槽过无数次,他们会无底线地超纲。但也正是因为这些说来就来的超纲,让你除了要看这本书以外,你还必须动脑筋,还必须自己手动去搜索解决方法,找那些他们觉得你一定得懂,但实际上他们又没说的东西。最终我写出了我想要的东西,至于结果跟他们的差多远,我没有比较。很多人说python是一种类似于乐高积木的编程,是一个模块叠加一个模块的。但是里面的递归却让我很头晕,所以当参考答案用上全局函数,用上递归的时候,我选择的依然是循环,依然是在主函数里输出那些东西,同时也在一句话里面嵌套了好几个我想做的事。我当然可以把我嵌套的东西单独出来定制一个函数,但是一句话能说清的事情我不想再写几行,虽然在用的时候,多写几行可能会调取得方便一些。现在我之所以不这么干,是因为我要实现的功能暂时来说还很简单。我用一句话就实现了,只不过嵌套了好几个参数而已,Excel的函数也是这么玩的。虽然有些时候,我也会狠狠地吐槽那些几万公里那么长的Excel函数公式。

我从来没想过,自己能在半天之内解决一个之前我曾经想过但是却没想出解决办法的问题。

© 2004 - 2020 我的天 | Theme by xrspook | Power by WordPress