2025-11
12

分析数据本地引用

By xrspook @ 8:25:41 归类于: 烂日记

用了一个晚上的时间,把之前直接用Deepseek或者手动转化出来的Excel数据改为引用Excel文件。这个操作并不难,但因为东西比较多,所以比较烦。首先我需要在一个Excel文件里面建好几个工作表,把每一组数据贴在不同的工作表,这样的好处是读取的时候就不怕会搞混了。在做那些工作表的同时我搞了个目录,把工作表到底是什么内容都安顿好。因为没有最终版本,所以我还没有设置超连接。做目录和超连接都挺简单。接下来的事情就是在每一个直接引用数据的py文件里修改引用方式。

因为DS的神经,所以那些格式化过的数据有些是列表,有些是元组。之所以要用元组,因为这样就一定不会出现字段长度不一致。可以这么说,被DS处理过的数据如果用列表去表达,几乎每一次可视化分析的时候都会被告知字段长度不一致。因为这个,所以后来我在提要求告诉它要作什么图的时候,我直接它先把数据以元组方式格式化,用元组的方式格式化了以后保证字段长度一定是一样的,但关键是会不会少了几个不知道。幸亏几年前我是认真学过Python的,基础的字符串列表元组字典我还是有点懂。我感觉哪怕我不是自己手动把Excel里面的某列数据转化为列表,而是把它贴掉某个地方在线转化,也不至于在转化之后东西丢失。DS是怎么做到把我的数据弄丢了呢?

之前用Python操作Excel的时候,打开Excel文件用的是xlwings,之所以用那个是因为那本叫《超简单 用Python让Excel飞起来》的书主要用的就是那个。那个跟其它库相比我觉得差异主要在于支持打开的Excel文件的后缀比较多。这样我就不需要针对这个文件用这种打开方式那个文件用另外一款,但如果我用了xlwings,但是我的程序不通过,做到一半就卡住了,那么Excel文件就会处于一个打开的状态。我只能去任务管理器那里手动把已经打开的文件关闭,否则我没办法继续下去。之前我好像没有试过光是读取Excel文件里面的数据,不把加工后的东西写回到Excel里。如果要进行Excel文件的写入,我感觉有必要把Excel文件打开,但打开了以后程序卡住无法进行下去,难以避免得有一个关闭的过程,估计可以写一段代码,把Excel通通关闭掉。只是我当初没有干这个,只是很老实地手动操作。

如果我只是读取Excel文件的某些内容,不往那个文件里面写入数据,是不是意味着或许我可以用一种类似ADO的方式读取文件数据,不需要进行实际的打开和关闭呢?

这一次我需要用pandas读取Excel里面的数据,但我不需要把加工后的数据写入到Excel,因为只有两个结果,一个是生成png图,另外一个是一些相关分析的结论。相关分析的结论我可以直接在终端里拷贝,又或者我可以直接生成txt文件。这次我用pandas直接读取,发现的确可以,而且貌似也没有那种程序虽然被卡住,但是Excel没有被正常关闭的问题。据说用panda读取Excel数据,实际上panda是引用了其它库,所以如果要在pandas里实现这个功能,要安装其它库才行。我在一开始的时候只引用了panda,就可以做到读取Excel文件,不需要把pandas引用的那些库也都引用一遍,但可以肯定的是其它库估计我都已经安装过了。

一开始尝试阶段会比较慢,后来熟了之后速度加快了,但因为我是一个完美主义者,所以对一些细节的把控还是翻来覆去,纠结了好些时间,比如网格线到底要用半透明的还是直接不透明。

但总算用一个晚上的时间,我就实现了我想做的全部。

2020-06
5

随机单词扎堆成文

By xrspook @ 14:47:54 归类于: 扮IT

从某本书里随机找单词拼出句子段落。重点是把握好前缀和后缀,前缀要捆绑查找,后缀要关联对应。

Exercise 8: Markov analysis: Write a program to read a text from a file and perform Markov analysis. The result should be a dictionary that maps from prefixes to a collection of possible suffixes. The collection might be a list, tuple, or dictionary; it is up to you to make an appropriate choice. You can test your program with prefix length two, but you should write the program in a way that makes it easy to try other lengths. Add a function to the previous program to generate random text based on the Markov analysis. Here is an example from Emma with prefix length 2: He was very clever, be it sweetness or be angry, ashamed or only amused, at such a stroke. She had never thought of Hannah till you were never meant for me?” “I cannot make speeches, Emma:” he soon cut it all himself. For this example, I left the punctuation attached to the words. The result is almost syntactically correct, but not quite. Semantically, it almost makes sense, but not quite. What happens if you increase the prefix length? Does the random text make more sense? Once your program is working, you might want to try a mash-up: if you combine text from two or more books, the random text you generate will blend the vocabulary and phrases from the sources in interesting ways. Credit: This case study is based on an example from Kernighan and Pike, The Practice of Programming, Addison-Wesley, 1999. You should attempt this exercise before you go on; then you can download my solution from http://thinkpython2.com/code/markov.py. You will also need http://thinkpython2.com/code/emma.txt.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import string
import random
from collections import defaultdict
def set_book(fin1,num):
    d = defaultdict(list) # 默认键值为列表
    l = []
    header = ()
    for line in fin1:
        line = line.replace('-', ' ')
        for word in line.rstrip().split(): # 空格换行为分割,单词存入列表
            l.append(word)
    for i in range(len(l)-num): # 以列表序号逐一推进方式建立字典
        header = (l[i-1],) # 元组header为前缀,做键
        for j in range(i,i+num-1):
            header += (l[j],)
            j += 1
        if l[i+num-1] not in d[header]:
            d[header].append(l[i+num-1]) # 列表后缀做键值
    return d
def next(start, book):
    return random.choice(book[start])
fin1 = open('emma.txt', encoding='utf-8')
prefix_num = 3 # 前缀个数
suffix_num = 100 # 后缀个数
book = set_book(fin1,prefix_num)
start = random.choice(list(book.keys())) # 随机前缀开头
final =  start
for i in range(suffix_num): # 截取最后几个单词为前缀找后缀
    final += (next(final[len(final)-prefix_num:], book),) 
for word in final:
    print(word, end=' ')
# reigns alone. A very proper compliment! and then follows the application, 
# which I think, my dear, you said you had a great deal happier if she had no 
# intellectual superiority to make atonement to herself, or frighten those 
# who might hate her into outward respect. She had never seen her look so well, 
# so lovely, so engaging. There was consciousness, animation, and warmth; 
# there was every appearance of its being all in proof of how much he was 
# in love with, how to be able to return! I shall try what I can do. 
# Harriet's features are very delicate, which makes a likeness
2020-06
5

上路

By xrspook @ 8:29:07 归类于: 烂日记

我已经不记得对上一次,写python是什么时候的事了,感觉好遥远,起码一个多月以前。具体时间,我实在记不清了,但是我依然记得,上一次我卡在了哪里,我应该在哪里重新开始。当时我看到的是第14章,但实际上第13章的内容我还没有全部消化掉,前面的那些我花的时间还多一点,后面的那些简直就是囫囵吞枣。第13章最后一道练习题,我觉得自己是无论如何不会去想的了,因为我根本不知道题目到底要我做些什么,之所以这样,大概是因为我的数学学得不好,所以我无法理解题目的意思。但是倒数第二道题目,我觉得自己还是可以做到的。

那是一道从一本书里随机的选择某些单词组成一些可能有意思的句子。随机拼凑句子语意当然乱来,但是如果能保证单词前面和后面相对稳定,那么起码单词组合起来会有某些意思,虽然可能句子的意思还是很无厘头。随着前面后面单词的整体性加强,整个句子的意思也会越发明了。这其实就是一个靠着前缀找后缀的运行模式。开始的时候默认的前缀是两个单词。由前面的两个单词找出后面一个单词,然后再利用后面的两个单词找下一个单词,如此类推。这种方法理论上可以扩展为结合前面N个单词找后面一个单词,然后再撇掉第1个单词,继续找下一个。思路不复杂,但是该用什么实现这个呢?的确是需要点心思的是Think Python那本书没有把所有方法都告诉你,在最终写出这道题目的解答之前,我看过他们的答案,但我觉得自己没看懂,因为里面加入了很多书里之前根本没说过的东西。里面默认带入了很多他们认为你必须知道,所以无需解释的东西。如果这是一本传统的教程,这简直让人日子没法过了!做这本书的习题的时候,我也吐槽过无数次,他们会无底线地超纲。但也正是因为这些说来就来的超纲,让你除了要看这本书以外,你还必须动脑筋,还必须自己手动去搜索解决方法,找那些他们觉得你一定得懂,但实际上他们又没说的东西。最终我写出了我想要的东西,至于结果跟他们的差多远,我没有比较。很多人说python是一种类似于乐高积木的编程,是一个模块叠加一个模块的。但是里面的递归却让我很头晕,所以当参考答案用上全局函数,用上递归的时候,我选择的依然是循环,依然是在主函数里输出那些东西,同时也在一句话里面嵌套了好几个我想做的事。我当然可以把我嵌套的东西单独出来定制一个函数,但是一句话能说清的事情我不想再写几行,虽然在用的时候,多写几行可能会调取得方便一些。现在我之所以不这么干,是因为我要实现的功能暂时来说还很简单。我用一句话就实现了,只不过嵌套了好几个参数而已,Excel的函数也是这么玩的。虽然有些时候,我也会狠狠地吐槽那些几万公里那么长的Excel函数公式。

我从来没想过,自己能在半天之内解决一个之前我曾经想过但是却没想出解决办法的问题。

2020-04
24

用两天琢磨一道题

By xrspook @ 20:32:30 归类于: 扮IT

前面还在沾沾自喜我写出来的脚本运行效率战胜了参考答案,但这道题目我是看着参考答案都不知道他们在说什么。如果只是一个词,我的确可以列举出它一次减少一个字母可以出现的所有可能,但怎么知道上一层可能和这一层的哪个配套???我花了2天时间去研究、消化答案。一边搞清楚答案为什么这样,另一边考虑有没有其它容易吃透的表达方式。这道题之所以让我非常纠结,根本的原因是我想不透到底我可以用什么手段实现。没有可以实现的逻辑,就不会有可行的编程。

Exercise 4: Here’s another Car Talk Puzzler (http://www.cartalk.com/content/puzzlers): What is the longest English word, that remains a valid English word, as you remove its letters one at a time? Now, letters can be removed from either end, or the middle, but you can’t rearrange any of the letters. Every time you drop a letter, you wind up with another English word. If you do that, you’re eventually going to wind up with one letter and that too is going to be an English word—one that’s found in the dictionary. I want to know what’s the longest word and how many letters does it have? I’m going to give you a little modest example: Sprite. Ok? You start off with sprite, you take a letter off, one from the interior of the word, take the r away, and we’re left with the word spite, then we take the e off the end, we’re left with spit, we take the s off, we’re left with pit, it, and I. Write a program to find all words that can be reduced in this way, and then find the longest one. This exercise is a little more challenging than most, so here are some suggestions: You might want to write a function that takes a word and computes a list of all the words that can be formed by removing one letter. These are the “children” of the word. Recursively, a word is reducible if any of its children are reducible. As a base case, you can consider the empty string reducible. The wordlist I provided, words.txt, doesn’t contain single letter words. So you might want to add “I”, “a”, and the empty string. To improve the performance of your program, you might want to memoize the words that are known to be reducible. Solution: http://thinkpython2.com/code/reducible.py.

最终,我觉得自己总算消化了,顺便画了个思维导图帮助大家理解到底分解到什么程度叫做完成,什么状态叫做分解失败。[”]和[]是两种不同的东西!!!!!!

is_reducible()是最关键的函数,memos用在这里,memos初始设置了known[”] = [”]也很关键,这是个守卫模式,没有守卫is_reducible()根本没法玩。这个脚本里的5个函数,除了一开始的创建字典函数,其余函数都可以单独测试,把一个固定单词放进去脚手架测试,可以帮助理解。cut_letter(),is_reducible()和all_reducible()这三个函数最终返回的都是列表,它们的样式都是类似的。希望我理解过程中的注释能帮助到有需要的人。PS一句:参考答案的打印效果让人很晕,我修改版的打印效果很美丽:)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
from time import time
def set_dict(fin):
    d = {}
    for line in fin:
        word = line.strip()
        d[word] = 0
    for word in ['a', 'i', '']:
        d[word] = 0
    return d
def cut_letter(word, d): # 生成子单词,返回列表
    l = []
    for i in range(len(word)):
        new_word = word[:i] + word[i+1:]
        if new_word in d:
            l.append(new_word)
    return l # ['']长度为1,[]长度为0,无子词不能分解时返回[],'a'返回['']
def is_reducible(word, d): # 判断能否生成无限子单词,返回列表
    if word in known: # 守卫模式下,''空字符串被列入初始字典,不列入永远会被递归到[],无果
        return known[word]
    # if word == '': # 不用memos的时候,需要加入这句守卫
    #     return ['']
    l = []
    for new_word in cut_letter(word, d):
        if len(is_reducible(new_word, d)) > 0:
            l.append(new_word)
    known[word] = l
    return l
def all_reducible(d): # 收集所有无限子单词的单词,返回列表
    l = []
    for word in d:
        if len(is_reducible(word, d)) > 0: # 有列表,即有无限子单词
            l.append((len(word), word)) # 列表含有N个元组,元组里有2个元素,1为单词的字母数量,2为单词本尊
    new_l = sorted(l, reverse = True) # 每次减少一个字母,单词的字母越多当然就能降解出越多层了
    return new_l
def word_list(word): # 打印单词及子单词
    if len(word) == 0: # 最后一个进入is_reducible()的是[''],对应l[0]为无,打印结束
        return
    print(word)
    l = is_reducible(word, d) # 因为是被鉴定过词汇表里的词,所以必定有无限子单词
    word_list(l[0]) # 子单词有多个时只选第1个
known = {} # memos实际上只在is_reducible()起作用,除了提高效率,还能用作守卫
known[''] = [''] # 因为is_reducible()返回的是列表,所以即便是空字符串,键值也必须是列表!
fin = open('words.txt')
start = time()
d = set_dict(fin) # 普通的字典,键为单词,键值为0
words = all_reducible(d) # 列表,元组,2元素
for i in range(5):
    word_list(words[i][1]) # 列表里第某个元组的第2个元素
end = time()
print(end - start)
# complecting
# completing
# competing
# compting
# comping
# coping
# oping
# ping
# pig
# pi
# i
# twitchiest
# witchiest
# withiest
# withies
# withes
# wites
# wits
# its
# is
# i
# stranglers
# strangers
# stranger
# strange
# strang
# stang
# tang
# tag
# ta
# a
# staunchest
# stanchest
# stanches
# stances
# stanes
# sanes
# anes
# ane
# ae
# a
# restarting
# restating
# estating
# stating
# sating
# sting
# ting
# tin
# in
# i
# 0.6459996700286865
# 无memos 1.5830001831054688, 有memos 0.6459996700286865
2020-04
22

字典转元组

By xrspook @ 13:15:37 归类于: 扮IT

不搞复杂的,不用超纲的方法做感觉上很简单的事其实不简单。搞明白这个练习后,列表、字典、元组的相爱相杀我算是有点明白了。感谢那个我觉得过于复杂的参考答案,逼我折腾出了我自己的版本。

开心!居然习题1就用上了zip这个这章书最后才提到的大招。字典的键值对互换变得如此简单,我的脑洞又开大了。

Exercise 1: Write a function called most_frequent that takes a string and prints the letters in decreasing order of frequency. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at http://en.wikipedia.org/wiki/Letter_frequencies. Solution: http://thinkpython2.com/code/most_frequent.py.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def most_frequent(sth):
    d = {}
    for letter in sth: # 字符串转为字符映射到频率的字典
        d[letter.lower()] = d.get(letter.lower(), 0) + 1 # 大写的你给我降为小写
    t = tuple(zip(d.values(), d.keys())) # 用zip互换字典的键和键值,生成元组(也可以生成列表,但生成新字典你会哭死)
    return sorted(t, reverse = True) # 以第一元素降序输出
sth = 'This chapter presents one more built-in type, the tuple, and then shows how lists, dictionaries, and tuples work together. I also present a useful feature for variable-length argument lists, the gather and scatter operators.'
t = most_frequent(sth)
for item in t:
    print(item) # 原汁原味输出元组好,因为空格不用''圈着都不知道那里有东西
# (33, ' ')
# (25, 'e')
# (23, 't')
# (16, 's')
# (15, 'r')
# (14, 'a')
# (11, 'o')
# (11, 'n')
# (10, 'i')
# (10, 'h')
# (9, 'l')
# (7, 'u')
# (7, 'p')
# (5, ',')
# (4, 'g')
# (4, 'd')
# (3, 'w')
# (3, 'f')
# (3, 'c')
# (2, 'm')
# (2, 'b')
# (2, '.')
# (2, '-')
# (1, 'y')
# (1, 'v')
# (1, 'k')
© 2004 - 2026 我的天 | Theme by xrspook | Power by WordPress