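How can I filter words with this Python code? It works, but it only filters out about half of the words and leaves the rest, and I don't know why.
First, I create an empty dictionary to count the words in the paragraph, and a variable that holds the paragraph: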
D={}
x="Wow, first off, a huge congrats. You're about to start the final project of the course. Amazing job making it all the way here. I hope you've been having as much fun as I have on this journey. We're going to be working on our final project using Jupyter notebooks. You might be starting to feel pretty confident using them. But remember if you have any issues you can always ask for help in the discussion forums. Okay, before we dive in, we're going to chat a little bit about what you'll be doing for the project, it's going to be really fun. The goal of the project is to create a word cloud. A word cloud is an image that's made up of different sized words. Usually the sizes of the words are determined by how many times each word appears in a specific text. To create the image itself, we're going to use an external Python module called creatively Word cloud. [LAUGH] Your job is to create a script that would go through the text and count how many times each word appears. We've done this a few times already, any ideas how we should tackle this one? If you're thinking of using a dictionary to count how many times each word appears, then. Ding ding ding, good answer! You're going to prepare a dictionary and use that as a parameter for the word cloud module, not too tricky, right? I think you can handle a little more, so two things you have to watch out for. One, punctuation marks, before counting the frequency of the words, you need to make sure that there are no punctuation marks in the text. If you don't, a string example with a comma at the end would be different from a string example with a dot at the end. So before you put words into the dictionary, you have to clean up the text to remove any punctuation marks. And the second thing we want to keep our word cloud interesting. Certain words in our language crop up a lot and if we include all of these we're going to get a pretty dull word cloud. Think about words like, the, two or if. 
They usually appear a whole lot in text but aren't too relevant to the text's overall message. We want our Cloud to show words that are relevant to the text we're using for the input. So you need to find a way to exclude irrelevant or uninteresting words when processing the text. For the input, you're going to upload a text file. You can choose any text file you like for your input. It could be the contents of a website, a full novel or even everything that one author has ever written. You just need to make sure that it's one text file, so that it can be processed by the code. Okay, before jumping into the project, remember you can re-watch this video If something isn't clear. Yep, I'm starting to sound like a broken record, but this time it's extra extra important. This final project is the real test of how much you've gotten your head around and can highlight areas you need to brush up on. So we want you to be super clear on what you need to do on that point. You'll find an overview of what you have to do in the next reading. Can you guess the best way of tackling this problem? Yep, you got it, our step-by-step approach that we outlined earlier. Understand the problem statement, research available options, plan your approach, write your code and finally execute. Okay, feeling good? Ready to go? Remember, you know this stuff and you've totally got it."
I want to filter out the words that contain any of these symbols:
t = "!()-[]{};:\\,<>./?@#$%^&*_~'"
This is the first piece of code I wrote. It converts the paragraph to lowercase and splits it into a list:
b=x.lower()
y=b.split()
print(y)
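As a side note, the quoted course text asks for punctuation to be removed from the text before counting. If that is the intent (rather than dropping every word that contains a symbol), a minimal sketch could strip the symbols before splitting. It assumes the same symbol set as `t`; the names `symbols`, `text`, and `table` are illustrative, and a short sample sentence stands in for `x`.

```python
# Illustrative names: `symbols` mirrors t, `text` is a short stand-in for x.
symbols = "!()-[]{};:\\,<>./?@#$%^&*_~'"
text = "Wow, first off, a huge congrats. You're about to start."

# str.maketrans with three arguments builds a table that deletes
# every character listed in the third argument.
table = str.maketrans("", "", symbols)

# Lowercase, delete the symbols, then split on whitespace.
words = text.lower().translate(table).split()
print(words)
```

With this approach "congrats." is kept as "congrats" instead of being discarded, so no loop over the list is needed for punctuation at all.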
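This is the second piece of code. It filters out the words that contain symbols, but it only removes some of them unless I run it several more times: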
for i in y:
    for o in range(len(t)):
        if t[o] in i:
            if i not in y:
                continue
            W=y.index(i)
            y.pop(W)
#print(y)
print(len(y))
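A likely cause of the partial filtering is that the loop removes items from `y` while iterating over `y`. Each removal shifts the remaining items one position to the left, so the item that slides into the freed slot is never visited. The sketch below first reproduces the skipping on a tiny list, then builds a new list instead of mutating the one being iterated. The names `symbols`, `words`, `demo`, and `has_symbol` are illustrative and not from the original code; short sample lists stand in for `y`.

```python
# Illustrative names: `symbols` mirrors t, `words` is a short stand-in for y.
symbols = "!()-[]{};:\\,<>./?@#$%^&*_~'"
words = ["wow,", "first", "off,", "a", "huge", "congrats.", "you're", "about"]

def has_symbol(word, symbols):
    """Return True if the word contains at least one of the symbols."""
    return any(ch in symbols for ch in word)

# Reproducing the problem: removing while iterating skips the next item.
demo = ["one,", "two,", "three"]
for w in demo:
    if has_symbol(w, symbols):
        demo.remove(w)
print(demo)  # ['two,', 'three'] : "two," was skipped

# One possible fix: build a new list, leaving the iterated list untouched.
clean_words = [w for w in words if not has_symbol(w, symbols)]
print(clean_words)  # ['first', 'a', 'huge', 'about']
```

Running the original loop several times removes a few more words on each pass, which matches the behavior described above. Iterating over a copy (`for i in y[:]`) would also avoid the skipping.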
These are the words I filter out after dealing with the symbols:
uninteresting_words = ['a', "the", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
"we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
"their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
"have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
"all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
This is the code I wrote for that, but it also fails to filter out all of the words unless I run it several times:
count=0
for n in y:
    for n in uninteresting_words:
        h=y.index(n)
        count+=1
        y.pop(h)
print(count)
#print(y)
print(len(y))
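The same remove-while-iterating problem probably applies here. In addition, as the loop is shown, the inner `for` rebinds `n`, and `y.index(n)` raises `ValueError` for any uninteresting word that does not occur in `y`. A minimal sketch that avoids both issues, assuming the goal is to drop every occurrence of an uninteresting word and count how many were dropped; `words`, `kept`, and `removed_count` are illustrative names, and the word list is shortened for the example.

```python
# Shortened sample of the uninteresting words; a set gives fast lookups.
uninteresting_words = {"a", "the", "to", "if", "is", "it", "of", "and", "or"}

# Illustrative stand-in for y after the symbol filter.
words = ["the", "goal", "of", "the", "project", "is", "to",
         "create", "a", "word", "cloud"]

# Keep only the words that are not in the uninteresting set.
kept = [w for w in words if w not in uninteresting_words]

# The number of removed words is the difference in length.
removed_count = len(words) - len(kept)

print(removed_count)  # 6
print(kept)           # ['goal', 'project', 'create', 'word', 'cloud']
```

Because the original list is never modified during the loop, a single pass is enough, and repeated words such as "the" are all removed.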
This is the code that adds the words to the dictionary:
for words in y:
    D[words]=D.get(words,0)+1
#print(D)
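The counting loop itself looks fine. For comparison, the standard library's `collections.Counter` does the same tally in one call; this is only an alternative sketch, with a short illustrative `words` list standing in for `y`.

```python
from collections import Counter

# Illustrative stand-in for the filtered word list y.
words = ["word", "cloud", "word", "project", "cloud", "word"]

# Manual tally, same pattern as the D.get(...) loop above.
D = {}
for w in words:
    D[w] = D.get(w, 0) + 1

# Counter produces the same mapping of word -> frequency.
counts = Counter(words)

print(D)                      # {'word': 3, 'cloud': 2, 'project': 1}
print(counts.most_common(2))  # [('word', 3), ('cloud', 2)]
```

A plain `dict` such as `D` is what the course text describes passing to the word cloud module, so `dict(counts)` can be used if a `Counter` is not accepted there.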