Project 1: Crawling Student Information

1.2 In-Class Quiz: Flask Web Sites

1、Given the server program:

    import flask
    app=flask.Flask(__name__)
    @app.route("/")
    def index():
        try:
            fobj=open("index.htm","rb")
            data=fobj.read()
            fobj.close()
            return data
        except Exception as err:
            return str(err)
    if __name__=="__main__":
        app.run()

and the file index.htm:

    <h1>Welcome Python Flask Web</h1>
    It is very easy to make a website by Python Flask.

Will visiting http://127.0.0.1:5000 show the contents of index.htm?

1.3 In-Class Quiz: Accessing a Website with GET

1、The server program accepts both GET and POST submissions:

    import flask
    app=flask.Flask(__name__)
    @app.route("/",____________________)
    def index():
        try:
            province=flask.request.values.get("province") if "province" in flask.request.values else ""
            city=flask.request.values.get("city") if "city" in flask.request.values else ""
            note=flask.request.values.get("note") if "note" in flask.request.values else ""
            return province+","+city+"\n"+note
        except Exception as err:
            return str(err)
    if __name__=="__main__":
        app.run()

The missing statement is:
    A、methods=["GET","POST"]
    B、method=["GET","POST"]
    C、methods=["POST"]
    D、method=["POST"]

1.4 In-Class Quiz: Sending Data to a Website with POST

1、The client program client.py is as follows:

    import urllib.parse
    import urllib.request
    url="http://127.0.0.1:5000"
    try:
        province=urllib.parse.quote("广东")
        city=urllib.parse.quote("深圳")
        data="province="+province+"&city="+city
        ___________________________
        ____________________________
        html=html.read()
        html=html.decode()
        print(html)
    except Exception as err:
        print(err)

The server program server.py:

    import flask
    app=flask.Flask(__name__)
    @app.route("/",methods=["POST"])
    def index():
        try:
            province=flask.request.form.get("province") if "province" in flask.request.form else ""
            city=flask.request.form.get("city") if "city" in flask.request.form else ""
            return province+","+city
        except Exception as err:
            return str(err)
    if __name__=="__main__":
        app.run()

The missing statements are:
    A、data=data.decode();html=urllib.request.urlopen("http://127.0.0.1:5000",data=data)
    B、data=data.encode();html=urllib.request.urlopen("http://127.0.0.1:5000",data=data)
    C、data=data.encode();html=urllib.request.urlopen("http://127.0.0.1:5000?data="+data)
    D、data=data.decode();html=urllib.request.urlopen("http://127.0.0.1:5000?data="+data)
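The step the blanks complete, percent-encoding the form values, joining them with "&", and converting the body to bytes before handing it to urlopen, can be checked offline. A minimal sketch (no request is actually sent; only the body construction is shown):

```python
import urllib.parse

# Build an application/x-www-form-urlencoded body the way client.py does:
# percent-encode each value, join with "&", then encode the string to bytes,
# because urllib.request.urlopen(url, data=...) requires a bytes body for POST.
province = urllib.parse.quote("广东")
city = urllib.parse.quote("深圳")
data = "province=" + province + "&city=" + city
body = data.encode()   # bytes body suitable for urlopen(url, data=body)

print(data)
```

Passing a str instead of bytes raises a TypeError, which is why the encode() call must come before urlopen.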

1.5 In-Class Quiz: Downloading Files over the Web

1、The server program lets clients download the file "图像.jpg":

    import flask
    import os
    app=flask.Flask(__name__)
    @app.route("/")
    def index():
        if "fileName" not in flask.request.values:
            return "图像.jpg"
        else:
            data=b""
            try:
                _____________________________________________
                if fileName!="" and os.path.exists(fileName):
                    fobj=open(fileName,"rb")
                    _________________________
                    fobj.close()
            except Exception as err:
                data=str(err).encode()
            return data
    if __name__=="__main__":
        app.run()

The missing statements are:
    A、fileName = flask.request.values.get("fileName"); data = fobj.read()
    B、fileName = flask.request.args.get("fileName"); data = fobj.read()
    C、fileName = flask.request.form.get("fileName"); data = fobj.read()
    D、None of the above

1.6 In-Class Quiz: Uploading Files over the Web

1、The server program receives the uploaded file name fileName from the client, then obtains the file data and saves it:

    import flask
    app=flask.Flask(__name__)
    @app.route("/upload",methods=["POST"])
    def uploadFile():
        msg=""
        try:
            if "fileName" in flask.request.values:
                fileName=flask.request.values.get("fileName")
                __________________________________
                fobj=open("upload "+fileName,"wb")
                fobj.write(data)
                fobj.close()
                msg="OK"
            else:
                msg="没有按要求上传文件"
        except Exception as err:
            print(err)
            msg=str(err)
        return msg
    if __name__=="__main__":
        app.run()

The missing statement is:
    A、data=flask.request.read()
    B、data=flask.request.get_data()
    C、data=flask.request.values.read()
    D、data=flask.request.values.get_data()

1.7 In-Class Quiz: Web Student Management Program

1、Given:

    class StudentDB:
        def openDB(self):
            self.con=sqlite3.connect("students.db")
            self.cursor=self.con.cursor()
        def closeDB(self):
            self.con.commit()
            self.con.close()
        def initTable(self):
            res={}
            try:
                self.cursor.execute("create table students (No varchar(16) primary key,Name varchar(16), Sex varchar(8), Age int)")
                res["msg"]="OK"
            except Exception as err:
                res["msg"]=str(err)
            return res
        def insertRow(self,No,Name,Sex,Age):
            res={}
            try:
                ___________________________________________
                res["msg"]="OK"
            except Exception as err:
                res["msg"]=str(err)
            return res

The program inserts one student record; the missing statement is:
    A、self.cursor.execute("insert into students (No,Name,Sex,Age) values (%s,%s,%s,%s)",(No,Name,Sex,Age))
    B、self.cursor.execute("insert into students (No,Name,Sex,Age) values (%s,%s,%s,%d)",(No,Name,Sex,Age))
    C、self.cursor.execute("insert into students (No,Name,Sex,Age) values (@No,@Name,@Sex,@Age)",(No,Name,Sex,Age))
    D、self.cursor.execute("insert into students (No,Name,Sex,Age) values (?,?,?,?)",(No,Name,Sex,Age))
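The sqlite3 module uses "?" as its parameter placeholder, with the values supplied as a tuple. A minimal offline sketch against an in-memory database (the sample row values here are illustrative, not from the quiz):

```python
import sqlite3

con = sqlite3.connect(":memory:")   # in-memory DB so the sketch leaves no file behind
cursor = con.cursor()
cursor.execute("create table students (No varchar(16) primary key,"
               "Name varchar(16), Sex varchar(8), Age int)")
# sqlite3's placeholder style is "?", and the values travel separately as a
# tuple, which also avoids SQL injection compared with string concatenation.
cursor.execute("insert into students (No,Name,Sex,Age) values (?,?,?,?)",
               ("001", "Alice", "F", 20))
con.commit()
cursor.execute("select No,Name,Sex,Age from students")
row = cursor.fetchone()
print(row)
con.close()
```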

1.8 In-Class Quiz: Regular Expressions

1、Given:

    import re
    s="testing search"
    reg=r"[A-Za-z]+\b"
    m=re.search(reg,s)
    while m!=None:
        start=m.start()
        end=m.end()
        print(s[start:end],end=" ")
        s=s[end:]
        m=re.search(reg,s)

The result is:
    A、testing
    B、testing search
    C、search
    D、search testing
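The loop above repeatedly searches, consumes the matched prefix, and searches again, so it collects the words left to right. This sketch runs the same loop and also shows the one-pass re.finditer equivalent:

```python
import re

s = "testing search"
reg = r"[A-Za-z]+\b"

# The quiz loop: search, record the match, drop everything up to the match's
# end, then search the remainder -- words come out left to right.
words = []
t = s
m = re.search(reg, t)
while m is not None:
    words.append(t[m.start():m.end()])
    t = t[m.end():]
    m = re.search(reg, t)
print(words)   # ['testing', 'search']

# Idiomatic equivalent: a single pass with finditer.
words2 = [m.group() for m in re.finditer(reg, s)]
```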

Quiz 1

1、Given:

    import flask
    ____________
    @app.route("/")
    def index():
        return "hello"
    app.run()

The missing statement is:
    A、app=flask("web")
    B、app=Flask("web")
    C、app=flask.Flask("web")
    D、app=flask.Flask()

2、Given:

    import flask
    app=flask.Flask(__name__)
    @app.route("/",methods=["POST"])
    def index():
        try:
            ________________________________________________
            return province
        except Exception as err:
            return str(err)
    if __name__=="__main__":
        app.run()

The program retrieves the province parameter; the missing statement is:
    A、province=flask.request.args.get("province")
    B、province=flask.request.values.get("province")
    C、province=flask.response.args.get("province")
    D、province=flask.response.values.get("province")

3、Given:

    import flask
    app=flask.Flask("web")
    @app.route("/",___________)
    def index():
        #......
        return "hello"
    app.run()

The program must be able to receive POST data; the missing statement is:
    A、methods=["GET"]
    B、methods=["POST"]
    C、method=["GET"]
    D、method=["POST"]

4、Given:

    import re
    s="abbcabab"
    ______________
    print(re.search(reg,s))

To find "abab", the missing statement is:
    A、reg=r"(ab)+"
    B、reg=r"(ab)+$"
    C、reg=r"ab+$"
    D、reg=r"ab+"

5、The server program accepts both GET and POST submissions:

    import flask
    app=flask.Flask(__name__)
    @app.route("/",____________________)
    def index():
        try:
            province=flask.request.values.get("province")
            return province
        except Exception as err:
            return str(err)
    if __name__=="__main__":
        app.run()

The missing statement is:
    A、methods=["GET","POST"]
    B、method=["GET","POST"]
    C、methods=["POST"]
    D、method=["POST"]

6、Given:

    import urllib.request
    resp=urllib.request.urlopen("http://127.0.0.1:5000")
    ______________
    print(data)

To get the website's binary data, the missing statement is:
    A、data=resp.read()
    B、data=resp.get()
    C、data=resp.readBinary()
    D、data=resp.getBinary()

7、Given:

    import urllib.request
    resp=urllib.request.urlopen("http://127.0.0.1:5000")
    ______________
    print(html)

To get the website's HTML text, the missing statement is:
    A、html=resp.read.decode()
    B、html=resp.read().decode()
    C、html=resp.read.encode()
    D、html=resp.read().encode()

8、Given:

    import re
    s="searching search"
    _______________
    print(re.search(reg,s))

To find the first "search" substring in s, the missing statement is:
    A、reg=r"[a-zA-Z]+"
    B、reg=r"[a-zA-Z]+$"
    C、reg=r"[a-zA-Z]+$"
    D、reg=r"$[a-zA-Z]+"

9、Given:

    import re
    s="testing search"
    reg=r"[A-Za-z]+\b"
    m=re.search(reg,s)
    while m!=None:
        start=m.start()
        end=m.end()
        print(s[start:end],end=" ")
        s=s[end:]
        m=re.search(reg,s)

The result is:
    A、testing
    B、testing search
    C、search
    D、search testing

10、Given:

    import re
    s="searching search"
    _______________
    print(re.search(reg,s))

To find the last "search" word in s, the missing statement is:
    A、reg=r"[A-Za-z]+$"
    B、reg=r"[A-Za-z]+$"
    C、reg=r"[A-Za-z]+"
    D、reg=r"[A-Za-z]+"
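Questions 8 and 10 turn on the "$" anchor: without it, re.search returns the leftmost match; with it, the match must end at the end of the string, so only the final word can qualify. A sketch that checks both cases:

```python
import re

s = "searching search"

# Without "$": the leftmost match wins, so the letters of "searching" match first.
first = re.search(r"[A-Za-z]+", s)
# With "$": the match must extend to the end of the string, so only the
# trailing word "search" satisfies the pattern.
last = re.search(r"[A-Za-z]+$", s)

print(first.group())   # searching
print(last.group())    # search
```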

11、Given:

    import re
    reg=r"x[ab0-9]y"
    m=re.search(reg,"xayx2yxcy")
    print(m)

The result matches "xcy": <_sre.SRE_Match object; span=(6, 9), match='xcy'>

12、Given:

    import re
    reg=r"x[0-9]y"
    m=re.search(reg,"xyx2y")
    print(m)

The result matches "x2y": <_sre.SRE_Match object; span=(0, 2), match='xy'>

13、Given:

    import re
    reg=r"car\b"
    m=re.search(reg,"The car is black")
    print(m)

The result matches "car", because "car" is followed by a space: <_sre.SRE_Match object; span=(4, 7), match='car'>

14、Given:

    import re
    reg=r"a\nb?"
    m=re.search(reg,"ca\nbcabc")
    print(m)

The result matches "a\nb": <_sre.SRE_Match object; span=(1, 4), match='ab'>

15、Given:

    import re
    s="xaabababy"
    m=re.search(r"ab|ba",s)
    print(m)

Either "ab" or "ba" can match: <_sre.SRE_Match object; span=(2, 4), match='ba'>

16、Given:

    import re
    s="xaxby"
    m=re.search(r"a.b",s)
    print(m)

Here "." stands for the character "x": <_sre.SRE_Match object; span=(1, 4), match='axb'>

17、Given:

    import re
    reg=r"ab?"
    m=re.search(reg,"abbcabc")
    print(m)

The result is: <_sre.SRE_Match object; span=(0, 3), match='abb'>

18、Given:

    import re
    reg=r"ab+"
    m=re.search(reg,"acabc")
    print(m)
    reg=r"ab*"
    m=re.search(reg,"acabc")
    print(m)

The result is:

    <_sre.SRE_Match object; span=(2, 4), match='ab'>
    <_sre.SRE_Match object; span=(2, 4), match='ab'>

19、Given:

    import re
    reg=r"b\d+"
    m=re.search(reg,"a12b123c")
    print(m)

The result finds "b123": <_sre.SRE_Match object; span=(3, 7), match='b123'>

20、Given:

    import re
    reg=r"\d"
    m=re.search(reg,"abc123cd")
    print(m)

The result finds the first digit "1": <_sre.SRE_Match object; span=(3, 4), match='1'>
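Several of statements 11 through 20 can be verified directly; this sketch re-runs a few of the patterns (spans follow Python's half-open [start, end) convention):

```python
import re

# Statement 20: \d finds the first digit of "abc123cd".
m = re.search(r"\d", "abc123cd")
print(m.span(), m.group())   # (3, 4) 1

# Statement 19: b\d+ finds "b123" in "a12b123c".
m = re.search(r"b\d+", "a12b123c")
print(m.span(), m.group())   # (3, 7) b123

# Statement 12: x[0-9]y requires a digit between x and y, so in "xyx2y" the
# match is "x2y" -- the span claimed in the statement can be checked the same way.
m = re.search(r"x[0-9]y", "xyx2y")
print(m.span(), m.group())   # (2, 5) x2y
```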

Project 2: Crawling Weather Forecast Data

2.3 In-Class Quiz: Finding Document Elements with BeautifulSoup

1、Given:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    _______________________________
    print(tag)

The program finds the <p> element with class="story"; the missing statement is:
    A、tag=soup.find("p",attrs={"class":"story"})
    B、tag=soup.find("p",attr={"class":"story"})
    C、tag=soup.find("p")
    D、tag=soup.find("p",class="story")

2.4 In-Class Quiz: Traversing Document Elements with BeautifulSoup

1、Given:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    ___________________________________
    for tag in tags:
        print(tag)

Find the elements with class="sister" in the document; the missing statement is:
    A、tags=soup.find(name=None,attrs={"class":"sister"})
    B、tags=soup.find(attrs={"class":"sister"})
    C、tags=soup.find_all(attrs={"class":"sister"})
    D、tags=soup.find_all(name=None,attrs={"class":"sister"})

2.5 In-Class Quiz: Finding Elements with CSS Syntax in BeautifulSoup

1、Given:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    ______________________________________
    for tag in tags:
        print(tag["href"])

The result is:

    http://example.com/elsie
    http://example.com/lacie
    http://example.com/tillie

The missing statement is:
    A、tags=soup.select("pa")
    B、tags=soup.select("p[] a")
    C、tags=soup.select("p[class]>a")
    D、tags=soup.select("p[class='story'] a")

Quiz 2

1、Given:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    _______________________________
    print(tag)

The program finds the <p> element with class="story"; the missing statement is:
    A、tag=soup.find("p",attrs={"class":"story"})
    B、tag=soup.find("p",attr={"class":"story"})
    C、tag=soup.find("p")
    D、tag=soup.find("p",class="story")

2、Given:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    ___________________________________
    for tag in tags:
        print(tag)

Find all elements with class="sister" in the document; the missing statement is:
    A、tags=soup.find(name=None,attrs={"class":"sister"})
    B、tags=soup.find(attrs={"class":"sister"})
    C、tags=soup.find_all(attrs={"class":"sister"})
    D、tags=soup.find_all(name=None,attrs={"class":"sister"})

3、Find all hyperlink addresses in the document:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    ______________________
    for tag in tags:
        _________________

The missing statements are:
    A、tags=soup.find("a");print(tag.href)
    B、tags=soup.find_all("a");print(tag("href"))
    C、tags=soup.find_all("a");print(tag["href"]);
    D、tags=soup.find("a");print(tag.("href"))

4、Find the text values contained in all <a> hyperlinks in the document:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    _____________________________
    for tag in tags:
        ________________________

The missing statements are:
    A、tags=soup.find_all("a"); print(tag["text"])
    B、tags=soup.find("a"); print(tag.text)
    C、tags=soup.find("a"); print(tag["text"])
    D、tags=soup.find_all("a"); print(tag.text)

5、Find the text values contained in all <p> elements in the document:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    __________________________________
    for tag in tags:
        ________________________

The missing statements are:
    A、tags=soup.find("p"); print(tag.text)
    B、tags=soup.find("p"); print(tag["text"])
    C、tags=soup.find_all("p"); print(tag.text)
    D、tags=soup.find_all("p"); print(tag["text"])

6、We want to find the <a> node element with href="http://example.com/lacie":

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <a href="http://example.com/elsie" >Elsie</a>
    <a href="http://example.com/lacie" >Lacie</a>
    <a href="http://example.com/tillie" >Tillie</a>
    </body>
    </html>
    '''
    def myFilter(tag):
        print(tag.name)
        __________________________________________
    soup=BeautifulSoup(doc,"lxml")
    tag=soup.find_all(myFilter)
    print(tag)

The missing statement is:
    A、return (tag.name=="a" and tag["href"]=="http://example.com/lacie")
    B、return (tag.name=="a" and tag.has_attr("href") and tag["href"]=="http://example.com/lacie")
    C、return (tag.name=="a" and tag.has_attr("href") and tag.href=="http://example.com/lacie")
    D、return (tag.name=="a" and tag.href=="http://example.com/lacie")

7、Function-based search can locate more complex node elements. Find all <a> nodes whose text value ends with "cie":

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <a href="http://example.com/elsie" >Elsie</a>
    <a href="http://example.com/lacie" >Lacie</a>
    <a href="http://example.com/tillie" >Tillie</a>
    <a href="http://example.com/tilcie" >Tilcie</a>
    </body>
    </html>
    '''
    def endsWith(s,t):
        if len(s)>=len(t):
            ___________________________
        return False
    def myFilter(tag):
        return (tag.name=="a" and _____________________________)
    soup=BeautifulSoup(doc,"lxml")
    tags=soup.find_all(myFilter)
    for tag in tags:
        print(tag)

The missing statements are:
    A、return s[len(s)-len(t)-1:]==t; endsWith(tag.text,"cie")
    B、return s[len(s)-len(t):]==t; endsWith(tag.text,"cie")
    C、return s[len(s)-len(t)-1:]==t; endsWith(tag["text"],"cie")
    D、return s[len(s)-len(t):]==t; endsWith(tag["text"],"cie")
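Question 7's endsWith helper compares a suffix slice of s against t; the options differ exactly in where that slice starts. A sketch comparing the correct slice with the off-by-one variant (and with the built-in str.endswith):

```python
def ends_with(s, t):
    # Correct suffix slice: the last len(t) characters of s.
    if len(s) >= len(t):
        return s[len(s) - len(t):] == t
    return False

print(ends_with("Lacie", "cie"))    # True: "Lacie"[2:] == "cie"
print(ends_with("Tillie", "cie"))   # False: "Tillie"[3:] == "lie"

# The off-by-one variant s[len(s)-len(t)-1:] keeps one extra character
# ("acie" here), so it can never equal a non-empty t.
print("Lacie"[len("Lacie") - len("cie") - 1:])   # acie
```

In practice Python's built-in s.endswith(t) does the same job in one call.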

8、Print the names of all parent nodes of the <b> element in <p class="title"><b>The Dormouse's story</b></p>:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    print(soup.name)
    ________________________________
    while tag:
        print(tag.name)
        ____________________________

The missing statements are:
    A、tag=soup.find("b"); tag=tag.parent
    B、tag=soup.find("b"); tag=tag["parent"]
    C、tag=soup.find_all("b"); tag=tag.parent
    D、tag=soup.find_all("b"); tag=tag["parent"]

9、Get all direct child element nodes of the <p> element:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The <i>Dormouse's</i> story</b> Once upon a time ...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    ________________
    for x in __________________:
        print(x)

The missing statements are:
    A、tag=soup.find("p"); tag.children
    B、tag=soup.find("p"); tag.child
    C、tag=soup.find_all("p"); tag.children
    D、tag=soup.find_all("p"); tag.child

10、Get all descendant element nodes of the <p> element:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The <i>Dormouse's</i> story</b> Once upon a time ...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    ______________________________
    for x in ________________________:
        print(x)

The missing statements are:
    A、tag=soup.find("p"); tag.children
    B、tag=soup.find("p"); tag.descendants
    C、tag=soup.find_all("p"); tag.children
    D、tag=soup.find_all("p"); tag.descendants

11、Given:

    from bs4 import BeautifulSoup
    doc='''
    <title>有缺失元素的HTML文档</title>
    <div>
    <A href='one.html'>one</a>
    <p>
    <a href='two.html'>two</a>
    </DIV>
    '''
    soup=BeautifulSoup(doc,"lxml")
    s=soup.prettify()
    print(s)

The program output is as follows:

    <html>
     <head>
      <title>
       有缺失元素的HTML文档
      </title>
     </head>
     <body>
      <div>
       <a href="one.html">
        one
       </a>
       <p>
        <a href="two.html">
         two
        </a>
       </p>
      </div>
     </body>
    </html>

Is this correct?

12、Given:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    tag=soup.find("title")
    print(type(tag),tag)

The program output:

    <class 'bs4.element.Tag'> <title>The Dormouse's story</title>

Is this correct?

13、Given:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    tags=soup.find_all("a")
    for tag in tags:
        print(tag)

The program finds one <a> element:

    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

Is this correct?

14、Given:

    from bs4 import BeautifulSoup
    doc='''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    </body>
    </html>
    '''
    soup=BeautifulSoup(doc,"lxml")
    tag=soup.find("a")
    print(tag)

The program finds the first <a> element:

    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

Is this correct?

15、soup.select("a") finds all <a> element nodes in the document;

16、soup.select("p a") finds all <a> element nodes under all <p> nodes in the document;

17、soup.select("p[class='story'] a") finds all <a> element nodes under the <p> nodes with class="story";

18、soup.select("p[class] a") finds all <a> element nodes under the <p> nodes that have a class attribute;

19、soup.select("a[id='link1']") finds the <a> node with id="link1";

20、soup.select("body head title") finds the <title> nodes under <head> under <body>;
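The quizzes above all revolve around collecting elements and their attributes. As an offline-checkable stand-in for BeautifulSoup's soup.select("a") followed by tag["href"], the same "collect every <a> href" idea can be sketched with the standard library's html.parser (this is a substitute technique, not the bs4 API):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)

doc = ('<p class="story"><a href="http://example.com/elsie">Elsie</a>'
       '<a href="http://example.com/lacie">Lacie</a></p>')
parser = LinkCollector()
parser.feed(doc)
print(parser.hrefs)
```

BeautifulSoup remains the more convenient tool here; the sketch only shows what the selectors are doing underneath.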

Project 3: Crawling Website Image Files

3.1 In-Class Quiz: Crawl Paths in a Website Tree

1、Given the pages:

    (1) books.htm
    <h3>计算机</h3>
    <ul>
    <li><a href="database.htm">数据库</a></li>
    <li><a href="program.htm">程序设计</a></li>
    <li><a href="network.htm">计算机网络</a></li>
    </ul>
    (2) database.htm
    <h3>数据库</h3>
    <ul>
    <li><a href="mysql.htm">MySQL数据库</a></li>
    </ul>
    (3) program.htm
    <h3>程序设计</h3>
    <ul>
    <li><a href="python.htm">Python程序设计</a></li>
    <li><a href="java.htm">Java程序设计</a></li>
    </ul>
    (4) network.htm
    <h3>计算机网络</h3>
    (5) mysql.htm
    <h3>MySQL数据库</h3>
    (6) python.htm
    <h3>Python程序设计</h3>
    (7) java.htm
    <h3>Java程序设计</h3>

and the crawler:

    from bs4 import BeautifulSoup
    import urllib.request
    def spider(url):
        try:
            data=urllib.request.urlopen(url)
            data=data.read()
            data=data.decode()
            soup=BeautifulSoup(data,"lxml")
            print(soup.find("h3").text)
            ____________________________________
            for link in links:
                href=link["href"]
                ___________________________________
                spider(url)
        except Exception as err:
            print(err)
    start_url="http://127.0.0.1:5000"
    spider(start_url)
    print("The End")

The program crawls recursively; the missing statements are:
    A、links=soup.select("a");url=start_url+href
    B、links=soup.select("li");url=start_url+"/"+href
    C、links=soup.select("a");url=start_url+"/"+href
    D、links=soup.select("li");url=start_url+href

3.2 In-Class Quiz: Crawl Paths in a Website Graph

1、Given:

    from bs4 import BeautifulSoup
    import urllib.request
    class Stack:
        def __init__(self):
            self.st=[]
        def pop(self):
            return self.st.pop()
        def push(self,obj):
            self.st.append(obj)
        def empty(self):
            return len(self.st)==0
    def spider(url):
        stack=Stack()
        stack.push(url)
        while not stack.empty():
            url=stack.pop()
            try:
                data=urllib.request.urlopen(url)
                data=data.read()
                data=data.decode()
                soup=BeautifulSoup(data,"lxml")
                print(soup.find("h3").text)
                links=soup.select("a")
                for i in _______________________________:
                    href=links[i]["href"]
                    url=start_url+"/"+href
                    stack.push(url)
            except Exception as err:
                print(err)
    start_url="http://127.0.0.1:5000"
    spider(start_url)
    print("The End")

The missing statement is:
    A、range(len(links)-1,-1,-1)
    B、range(len(links),-1,-1)
    C、range(len(links)-1,0,-1)
    D、range(len(links),0,-1)
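Why the loop pushes links in reverse order: a stack pops the most recently pushed item first, so pushing the links right to left makes the crawler visit them left to right, matching the recursive version's order. An offline sketch with a toy link table standing in for the live site (the page names are hypothetical):

```python
# Toy site: each page maps to its outgoing links.
site = {
    "books": ["database", "program", "network"],
    "database": ["mysql"],
    "program": ["python", "java"],
    "network": [], "mysql": [], "python": [], "java": [],
}

def dfs(start):
    """Depth-first traversal using an explicit stack instead of recursion."""
    stack, order = [start], []
    while stack:
        page = stack.pop()          # LIFO: last pushed, first visited
        order.append(page)
        links = site[page]
        # Push children right-to-left so the leftmost link is popped first.
        for i in range(len(links) - 1, -1, -1):
            stack.append(links[i])
    return order

print(dfs("books"))
```

Pushing in forward order would still visit every page, but the children would be visited right to left.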

3.3 In-Class Quiz: Implementing Multithreading in Python

1、Start a child thread from the main thread to run the reading function:

    import threading
    import time
    import random
    def reading():
        for i in range(10):
            print("reading",i)
            time.sleep(random.randint(1,2))
    _______________________________
    r.setDaemon(False)
    r.start()
    print("The End")

The missing statement is:
    A、r=threading.Thread(reading)
    B、r=threading.Thread(target=reading())
    C、r=threading.Thread(target=reading)
    D、r=Thread(target=reading)

3.4 In-Class Quiz: Crawling Complex Website Data

1、A program that uses a queue and does not visit the same page twice:

    from bs4 import BeautifulSoup
    import urllib.request
    class Queue:
        def __init__(self):
            self.st=[]
        def fetch(self):
            return self.st.pop(0)
        def enter(self,obj):
            self.st.append(obj)
        def empty(self):
            return len(self.st)==0
    def spider(url):
        global urls
        queue=Queue()
        queue.enter(url)
        while ________________________:
            url=queue.fetch()
            if url not in urls:
                try:
                    urls.append(url)
                    data=urllib.request.urlopen(url)
                    data=data.read()
                    data=data.decode()
                    soup=BeautifulSoup(data,"lxml")
                    print(soup.find("h3").text)
                    links=soup.select("a")
                    for link in links:
                        ________________
                        url=start_url+"/"+href
                        queue.enter(url)
                except Exception as err:
                    print(err)
    start_url="http://127.0.0.1:5000"
    urls=[]
    spider(start_url)
    print("The End")

The missing statements are:
    A、queue.empty(); href=link["href"]
    B、not queue.empty(); href=link["href"]
    C、queue.empty(); href=link.href
    D、not queue.empty(); href=link.href
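The queue version visits pages breadth-first, and the urls list keeps pages that link back to each other from being fetched twice. An offline sketch with a toy link table (hypothetical page names), including a cycle to show the revisit guard at work:

```python
# Toy site with a cycle: "database" links back to "books".
site = {
    "books": ["database", "program"],
    "database": ["mysql", "books"],
    "program": ["python"],
    "mysql": [], "python": [],
}

def bfs(start):
    """Breadth-first traversal with a visited list: no page fetched twice."""
    queue, visited, order = [start], [], []
    while queue:
        page = queue.pop(0)        # fetch from the front: FIFO
        if page not in visited:    # skip pages already crawled
            visited.append(page)
            order.append(page)
            for link in site[page]:
                queue.append(link)  # new links enter at the back
    return order

print(bfs("books"))
```

Without the visited check, the books/database cycle would make the loop run forever.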

Quiz 3

1、Given:

    def spider(url):
        # obtain a new address newUrl
        if newUrl:
            spider(newUrl)

Which of the following is correct?
    A、It is not a recursive call
    B、It will definitely fall into an infinite loop
    C、The recursion ends when no newUrl can be found
    D、The recursion does not end even when no newUrl can be found

2、The stack is designed as follows:

    class Stack:
        def __init__(self):
            self.st=[]
        def pop(self):
            _____________________
        def push(self,obj):
            self.st.append(obj)
        def empty(self):
            return len(self.st)==0

The missing statement is:
    A、return self.st.pop(0)
    B、return self.st.pop()
    C、return st.pop()
    D、return st.pop(0)

3、Which statement about depth-first crawling is correct?
    A、The result is the same as crawling by recursive calls
    B、The result differs from crawling by recursive calls
    C、It is less efficient than crawling by recursive function calls
    D、It is more efficient than crawling by recursive function calls

4、The queue is designed as follows:

    class Queue:
        def __init__(self):
            self.st=[]
        def fetch(self):
            return self.st.pop(0)
        def enter(self,obj):
            _________________________________
        def empty(self):
            return len(self.st)==0

The missing statement is:
    A、self.st.append(obj)
    B、self.st.insert(0,obj)
    C、st.append(obj)
    D、st.insert(0,obj)

5、Which statement about breadth-first crawling is correct?
    A、The order of crawled data differs from depth-first crawling
    B、The order of crawled data is the same as depth-first crawling
    C、The order of crawled data is the same as the recursive-function method
    D、None of the above

6、There is a download(url) function that downloads the image at url:

    import threading
    def download(url):
        pass

To call it with multithreading, the method is:
    A、T=threading.Thread(target=download,args=[url]) T.start()
    B、T=threading.Thread(target=download,args=url) T.start()
    C、T=threading.Thread(target=download,args=(url)) T.start()
    D、None of the above

7、When crawling many images from a website, which statement is correct?
    A、A single thread is more efficient, and the program is simpler
    B、A single thread is more efficient, but the program is more complex
    C、Multiple threads are more efficient, and the program is simpler
    D、Multiple threads are more efficient, but the program is more complex

8、Given:

    import threading
    def download(url):
        print(url)
    threads=[]
    urls=["http://A","http://B"]
    for url in urls:
        T=threading.Thread(target=download,args=[url])
        T.start()
        threads.append(T)
    ________

To wait for all threads to finish, the missing statement is:
    A、for T in threads: T.join()
    B、for T in threads: T.wait()
    C、threads.waitAll()
    D、threads.joinAll()
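The pattern in question 8, Thread(target=..., args=[url]) for each URL followed by join() on every thread, can be run as-is; after the joins, every worker has finished and the shared results list is complete. One detail worth noting: args must be a sequence, so args=[url] and args=(url,) work, while args=(url) is just a parenthesized string. A sketch with a simulated download:

```python
import threading

results = []
lock = threading.Lock()

def download(url):
    # Simulated download: record the url thread-safely.
    with lock:
        results.append(url)

threads = []
for url in ["http://A", "http://B"]:
    t = threading.Thread(target=download, args=[url])  # args must be a sequence
    t.start()
    threads.append(t)

for t in threads:
    t.join()   # block until this worker thread has finished

print(sorted(results))   # ['http://A', 'http://B']
```

The threads may append in either order, which is why the check sorts the results.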

9、Start a child thread from the main thread to run the reading function:

    import threading
    import time
    import random
    def reading():
        for i in range(10):
            print("reading",i)
            time.sleep(random.randint(1,2))
    _______________________________
    r.setDaemon(False)
    r.start()
    print("The End")

The missing statement is:
    A、r=threading.Thread(reading)
    B、r=threading.Thread(target=reading())
    C、r=threading.Thread(target=reading)
    D、r=Thread(target=reading)

10、Start a foreground thread:

    import threading
    import time
    import random
    def reading():
        for i in range(5):
            print("reading",i)
            time.sleep(random.randint(1,2))
    r=threading.Thread(target=reading)
    __________________
    r.start()
    print("The End")

The missing statement is:
    A、r.setDaemon(True)
    B、r.setDaemon(true)
    C、r.setDaemon(False)
    D、r.setDaemon(false)

11、Given:

    import threading
    def download(url):
        print(url)
    threads=[]
    urls=["http://A","http://B"]
    for url in urls:
        T=threading.Thread(target=download,args=(url))
        T.start()

12、Given:

    import threading
    import time
    import random
    def reading():
        for i in range(5):
            print("reading",i)
            time.sleep(random.randint(1,2))
    t=threading.Thread(target=reading)
    t.setDaemon(False)
    t.start()
    t.join()
    print("The End")

The program output:

    reading 0
    reading 1
    reading 2
    reading 3
    reading 4
    The End

Do you think this output is possible?

13、Given:

    import threading
    def download(url):
        print(url)
    threads=[]
    urls=["http://A","http://B"]
    for url in urls:
        T=threading.Thread(target=download,args=url)
        T.start()

14、Given:

    import threading
    def download(url):
        print(url)
    threads=[]
    urls=["http://A","http://B"]
    for url in urls:
        T=threading.Thread(target=download,args=[url])
        T.start()

15、Given:

    from bs4 import UnicodeDammit
    dammit=UnicodeDammit(data,["utf-8","gbk"])
    data=dammit.unicode_markup

This can automatically detect the encoding of data 100% of the time.

16、Given:

    url="http://www.weather.com.cn/weather/101280601.shtml"
    headers={"User-Agent":"Mozilla/5.0 (Windows; U; Windows NT 6.0 x64; en-US; rv:1.9pre) Gecko/2008072421 Minefield/3.0.2pre"}
    req=urllib.request.Request(url,headers=headers)
    data=urllib.request.urlopen(req)
    data=data.read()

The purpose of headers here is to imitate a browser.

17、Given:

    from bs4 import BeautifulSoup
    doc="<div><p>A</p><span><p>B</p></span></div><div><p>C</p></div>"
    soup=BeautifulSoup(doc,"lxml")
    tags=soup.select("div > p")
    for tag in tags:
        print(tag)

The program output:

    <p>A</p>

18、Given:

    from bs4 import BeautifulSoup
    doc="<div><p>A</p><span><p>B</p></span></div><div><p>C</p></div>"
    soup=BeautifulSoup(doc,"lxml")
    tags=soup.select("div p")
    for tag in tags:
        print(tag)

The program output:

    <p>A</p>

19、soup.select("body [class] a") finds all <a> nodes under all nodes with a class attribute under <body>;

20、soup.select("body [class] ") finds all nodes with a class attribute under <body>;

21、soup.select("body head title") finds the <title> nodes under <head> under <body>;

22、soup.select("a[id='link1']") finds the <a> node with id="link1";

Project 4: Crawling Website Book Data

4.1 In-Class Quiz: Introduction to the scrapy Crawler Framework

1、Given:

    import scrapy
    class MySpider(scrapy.Spider):
        name = "mySpider"
        def start_requests(self):
            url ='http://127.0.0.1:5000'
            _________________________________________
        def parse(self, response):
            print(response.url)
            data=response.body.decode()
            print(data)

The missing statement is:
    A、yield scrapy.Request(url=url,callback=self.parse)
    B、yield scrapy.Request(url=url,callback=parse)
    C、return scrapy.Request(url=url,callback=self.parse)
    D、return scrapy.Request(url=url,callback=parse)
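The correct blank uses yield, which turns start_requests into a generator: scrapy iterates over it and schedules each Request with its callback. The generator mechanics can be checked without scrapy; DummyRequest and the MySpider below are stand-ins for this sketch, not scrapy classes:

```python
class DummyRequest:
    """Stand-in for scrapy.Request: just records a url and a callback."""
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback

class MySpider:
    def start_requests(self):
        # "yield" makes this method a generator: the framework can pull any
        # number of requests from it, one at a time, without a return list.
        for url in ["http://127.0.0.1:5000/a", "http://127.0.0.1:5000/b"]:
            yield DummyRequest(url=url, callback=self.parse)

    def parse(self, response):
        return "parsed " + response

spider = MySpider()
requests = list(spider.start_requests())
print([r.url for r in requests])
print(requests[0].callback("x"))   # parsed x
```

Note the callback is self.parse (the bound method itself, no parentheses); writing self.parse() would call it immediately instead of handing it to the framework.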

4.2 In-Class Quiz: Finding HTML Elements in scrapy

1、Given:

    from scrapy.selector import Selector
    htmlText='''
    <html><body>
    <bookstore>
    <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
    </book>
    <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
    </book>
    </bookstore>
    </body></html>
    '''
    selector=Selector(text=htmlText)
    print(type(selector)); print(selector)
    _______________
    print(type(s))
    print(s)

To find all <title> elements, the missing statement is:
    A、s=selector.xpath("title")
    B、s=selector.xpath("//title")
    C、s=selector.xpath("/title")
    D、s=selector.xpath("///title")

4.2 In-Class Quiz: Finding HTML Elements in scrapy

1、Given:

    from scrapy.selector import Selector
    htmlText='''
    <html>
    <body>
    <bookstore>
    <title>books</title>
    <book>
    <title>Novel</title>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
    </book>
    <book>
    <title>TextBook</title>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
    </book>
    </bookstore>
    </body></html>
    '''
    selector=Selector(text=htmlText)
    _____________________________________
    for e in s:
        print(e)

The program output:

    <Selector xpath='//book/title' data='<title>Novel</title>'>
    <Selector xpath='//book/title' data='<title lang="eng">Harry Potter</title>'>
    <Selector xpath='//book/title' data='<title>TextBook</title>'>
    <Selector xpath='//book/title' data='<title lang="eng">Learning XML</title>'>

The missing statement is:
    A、s=selector.xpath("/book").xpath("./title")
    B、s=selector.xpath("//book").xpath("./title")
    C、s=selector.xpath("//book").xpath("/title")
    D、s=selector.xpath("/book").xpath("/title")

4.3 Scraping and Storing Data with scrapy (In-Class Quiz)

1. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book id="b1">
<title lang="english">Harry Potter</title>
<price>29.99</price>
</book>
<book id="b2">
<title lang="chinese">学习XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
____________________________________________________
print(s.extract_first())
s=selector.xpath("//book[@id='b1']/title")
print(s.extract_first())
Program output:
学习XML
<title lang="english">Harry Potter</title>
    A、s=selector.xpath("//book/title[@lang='chinese']/text")
    B、s=selector.xpath("//book/title[lang='chinese']/text")
    C、s=selector.xpath("//book/title[@lang='chinese']/text()")
    D、s=selector.xpath("//book/title[@lang='chinese']/text")

4.4 Scraping Website Data with scrapy (In-Class Quiz)

1. Fill in the two blanks so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book id="b1">
<title lang="english">Harry Potter</title>
<price>29.99</price>
</book>
<book id="b2">
<title lang="chinese">学习XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
____________________________________
print(s.extract_first())
____________________________________
print(s.extract_first())
Program output:
<title lang="english">Harry Potter</title>
<title lang="chinese">学习XML</title>
    A、s=selector.xpath("//book[position=1]/title"); s=selector.xpath("//book[position=2]/title")
    B、s=selector.xpath("//book[position()=2]/title"); s=selector.xpath("//book[position()=1]/title")
    C、s=selector.xpath("//book[position=2]/title"); s=selector.xpath("//book[position=1]/title")
    D、s=selector.xpath("//book[position()=1]/title"); s=selector.xpath("//book[position()=2]/title")

Quiz 4

1. The missing statement is:
import scrapy
class MySpider(scrapy.Spider):
    name = "mySpider"
    def start_requests(self):
        url ='http://127.0.0.1:5000'
        _________________________________________
    def parse(self, response):
        print(response.url)
        data=response.body.decode()
        print(data)
    A、yield scrapy.Request(url=url,callback=self.parse)
    B、yield scrapy.Request(url=url,callback=parse)
    C、return scrapy.Request(url=url,callback=self.parse)
    D、return scrapy.Request(url=url,callback=parse)

2. Fill in the blank so the program produces the output shown:
def fun():
    s=['a','b','c']
    for x in s:
        __________________
    print("fun End")
f=fun()
print(f)
for e in f:
    print(e)
Program output:
<generator object fun at 0x0000003EA99BD728>
a
b
c
fun End
    A、yield s
    B、yield x
    C、return x
    D、return s
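The generator behaviour this question relies on can be checked directly. This is a minimal sketch of the completed program, assuming the blank is filled with yield x: the function body does not run when fun() is called, only when the generator is iterated.

```python
def fun():
    s = ['a', 'b', 'c']
    for x in s:
        yield x          # produce one value per loop pass
    print("fun End")     # runs only after the last value is consumed

f = fun()                # builds a generator object; nothing is printed yet
values = list(f)         # iterating drives the body to completion
```

print(f) in the question shows a <generator object fun at ...> repr because fun contains a yield statement.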

3. Find all the <title> elements; the missing statement is:
from scrapy.selector import Selector
htmlText='''
<html><body>
<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
print(type(selector)); print(selector)
_______________
print(type(s))
print(s)
    A、s=selector.xpath("title")
    B、s=selector.xpath("//title")
    C、s=selector.xpath("/title")
    D、s=selector.xpath("///title")

4. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<title>books</title>
<book>
<title>Novel</title>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title>TextBook</title>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
_____________________________________
for e in s:
    print(e)
Program output:
<Selector xpath='//book/title' data='<title>Novel</title>'>
<Selector xpath='//book/title' data='<title lang="eng">Harry Potter</title>'>
<Selector xpath='//book/title' data='<title>TextBook</title>'>
<Selector xpath='//book/title' data='<title lang="eng">Learning XML</title>'>
    A、s=selector.xpath("/book").xpath("./title")
    B、s=selector.xpath("//book").xpath("./title")
    C、s=selector.xpath("//book").xpath("/title")
    D、s=selector.xpath("/book").xpath("/title")

5. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book id="b1">
<title lang="english">Harry Potter</title>
<price>29.99</price>
</book>
<book id="b2">
<title lang="chinese">学习XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
s=selector.xpath("//book/price")
print(type(s),s)
____________________________________
print(type(s),s)
s=selector.xpath("//book/price").extract_first()
print(type(s),s)
Program output:
<class 'scrapy.selector.unified.SelectorList'> [<Selector xpath='//book/price' data='<price>29.99</price>'>, <Selector xpath='//book/price' data='<price>39.95</price>'>]
<class 'list'> ['<price>29.99</price>', '<price>39.95</price>']
<class 'str'> <price>29.99</price>
    A、s=selector.xpath("/book/price").extract()
    B、s=selector.xpath("/book//price").extract()
    C、s=selector.xpath("//book//price").extract()
    D、s=selector.xpath("//book/price").extract()

6. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book id="b1">
<title lang="english">Harry Potter</title>
<price>29.99</price>
</book>
<book id="b2">
<title lang="chinese">学习XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
_________________________________________
print(s)
print(s.extract())
for e in s:
    print(e.extract())
Program output:
[<Selector xpath='//book/@id' data='b1'>, <Selector xpath='//book/@id' data='b2'>]
['b1', 'b2']
b1
b2
    A、s=selector.xpath("/book/@id")
    B、s=selector.xpath("//book/@id")
    C、s=selector.xpath("//book/id")
    D、s=selector.xpath("/book/id")

7. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book id="b1">
<title lang="english">Harry Potter</title>
<price>29.99</price>
</book>
<book id="b2">
<title lang="chinese">学习XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
_____________________________________
print(s)
print(s.extract())
for e in s:
    print(e.extract())
Program output:
[<Selector xpath='//book/title/text()' data='Harry Potter'>, <Selector xpath='//book/title/text()' data='学习XML'>]
['Harry Potter', '学习XML']
Harry Potter
学习XML
    A、s=selector.xpath("/book/title/text()")
    B、s=selector.xpath("/book/title/text")
    C、s=selector.xpath("//book/title/text()")
    D、s=selector.xpath("book/title/text()")

8. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book id="b1">
<title lang="english"><b>H</b>arry <b>P</b>otter</title>
<price>29.99</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
_________________________
print(s)
print(s.extract())
for e in s:
    print(e.extract())
Program output:
[<Selector xpath='//book/title/text()' data='arry '>, <Selector xpath='//book/title/text()' data='otter'>]
['arry ', 'otter']
arry
otter
    A、s=selector.xpath("//book/title/text()")
    B、s=selector.xpath("/book/title/text()")
    C、s=selector.xpath("//book/title/text")
    D、s=selector.xpath("/book/title/text")

9. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book id="b1">
<title lang="english">Harry Potter</title>
<price>29.99</price>
</book>
<book id="b2">
<title lang="chinese">学习XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
____________________________________________________
print(s.extract_first())
s=selector.xpath("//book[@id='b1']/title")
print(s.extract_first())
Program output:
学习XML
<title lang="english">Harry Potter</title>
    A、s=selector.xpath("//book/title[@lang='chinese']/text")
    B、s=selector.xpath("//book/title[lang='chinese']/text")
    C、s=selector.xpath("//book/title[@lang='chinese']/text()")
    D、s=selector.xpath("//book/title[@lang='chinese']/text")

10. Fill in the two blanks so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book id="b1">
<title lang="english">Harry Potter</title>
<price>29.99</price>
</book>
<book id="b2">
<title lang="chinese">学习XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
____________________________________
print(s.extract_first())
____________________________________
print(s.extract_first())
Program output:
<title lang="english">Harry Potter</title>
<title lang="chinese">学习XML</title>
    A、s=selector.xpath("//book[position=1]/title"); s=selector.xpath("//book[position=2]/title")
    B、s=selector.xpath("//book[position()=2]/title"); s=selector.xpath("//book[position()=1]/title")
    C、s=selector.xpath("//book[position=2]/title"); s=selector.xpath("//book[position=1]/title")
    D、s=selector.xpath("//book[position()=1]/title"); s=selector.xpath("//book[position()=2]/title")

11. selector.xpath("//bookstore/book") searches for <book> elements one level below <bookstore>, finding 2;

12. selector.xpath("//body/book") searches for <book> elements one level below <body>; the result is empty;

13. selector.xpath("//body//book") searches for <book> elements anywhere under <body>, finding 2;

14. selector.xpath("/body//book") searches for <book> elements under a top-level <body>; the result is empty, because the top level of the document is the <html> element, not <body>;

15. selector.xpath("/html/body//book") or selector.xpath("/html//book") searches for <book> elements, finding 2;

16. selector.xpath("//book/title") searches for all <title> elements one level below a <book>, finding 2; the result is the same as selector.xpath("//title") and selector.xpath("//bookstore//title");

17. selector.xpath("//book//price") and selector.xpath("//price") give the same result: both find 2 <price> elements;

18. selector.xpath("/book//price") and selector.xpath("//price") give the same result;

19. selector.xpath("//book[id='book1']//price") and selector.xpath("//price") give the same result;

20. selector.xpath("//price/text()").extract() returns the text strings inside <price>
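Several of the path rules above can be spot-checked without scrapy: the standard library's xml.etree.ElementTree supports a limited XPath subset. This is an illustrative sketch only, not scrapy's XPath engine (in ElementTree, "anywhere in the tree" is written with a leading .// rather than //):

```python
import xml.etree.ElementTree as ET

doc = """<html><body><bookstore>
<book id="b1"><title>Harry Potter</title><price>29.99</price></book>
<book id="b2"><title>Learning XML</title><price>39.95</price></book>
</bookstore></body></html>"""

root = ET.fromstring(doc)                  # root is the <html> element
books = root.findall(".//bookstore/book")  # <book> one level below <bookstore>: 2 hits
direct = root.findall(".//body/book")      # <book> one level below <body>: empty
anywhere = root.findall(".//book")         # <book> anywhere in the document: 2 hits
prices = [p.text for p in root.findall(".//price")]
```
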

Midterm Exam

Course Exam

1. The missing statement is:
import flask
____________
@app.route("/")
def index():
    return "hello"
app.run()
    A、app=flask.Flask("web")
    B、app=flask("web")
    C、app=Flask("web")
    D、app=flask.Flask()

2. The program gets the province parameter; the missing statement is:
import flask
app=flask.Flask(__name__)
@app.route("/",methods=["POST"])
def index():
    try:
        ________________________________________________
        return province
    except Exception as err:
        return str(err)
if __name__=="__main__":
    app.run()
    A、province=flask.request.args.get("province")
    B、province=request.values.get("province")
    C、province=flask.request.values.get("province")
    D、province=flask.respone.values.get("province")

3. The program must be able to receive POST data; the missing statement is:
import flask
app=flask.Flask("web")
@app.route("/",___________)
def index():
    #......
    return "hello"
app.run()
    A、method=["GET"]
    B、method=["POST"]
    C、methods=["GET"]
    D、methods=["POST"]

4. Find "abab"; the missing statement is:
import re
s="abbcabab"
______________
print(re.search(reg,s))
    A、reg=r"ab+$"
    B、reg=r"(ab)+$"
    C、reg=r"ab+$"
    D、reg=r"ab+"
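The distinction this question tests is that in ab+ the "+" repeats only "b", while in (ab)+ it repeats the whole group. A quick check with the stdlib re module, using option B's pattern alongside the ungrouped one:

```python
import re

s = "abbcabab"
m_group = re.search(r"(ab)+$", s)  # the group "ab" repeats, anchored at the end
m_plain = re.search(r"ab+$", s)    # only "b" repeats, so at most one leading "a"
```

m_group matches the trailing "abab"; m_plain can only match the final "ab".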

5. The server program accepts both GET and POST submissions; the missing statement is:
import flask
app=flask.Flask(__name__)
@app.route("/",____________________)
def index():
    try:
        province=flask.request.values.get("province") if "province" in flask.request.values else ""
        city = flask.request.values.get("city") if "city" in flask.request.values else ""
        note = flask.request.values.get("note") if "note" in flask.request.values else ""
        return province+","+city+"\n"+note
    except Exception as err:
        return str(err)
if __name__=="__main__":
    app.run()
    A、methods=["GET","POST"]
    B、method=["GET","POST"]
    C、methods=["POST"]
    D、method=["POST"]

6. The program finds the <p> element with class="story"; the missing statement is:
from bs4 import BeautifulSoup
doc='''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''
soup=BeautifulSoup(doc,"lxml")
_______________________________
print(tag)
    A、tag=soup.find("p",attrs={"class":"story"})
    B、tag=soup.find("p",attr={"class":"story"})
    C、tag=soup.find("p")
    D、tag=soup.find("p",class="story")

7. Find the elements with class="sister" in the document; the missing statement is:
from bs4 import BeautifulSoup
doc='''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''
soup=BeautifulSoup(doc,"lxml")
___________________________________
for tag in tags:
    print(tag)
    A、tags=soup.find(name=None,attrs={"class":"sister"})
    B、tags=soup.find(attrs={"class":"sister"})
    C、tags=soup.find_all(attrs={"class":"sister"})
    D、tags=soup.find_all(name=None,attrs={"class":"sister"})

8. Find the text contained in every <a> hyperlink in the document; the missing statements are:
from bs4 import BeautifulSoup
doc='''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''
soup=BeautifulSoup(doc,"lxml")
_____________________________
for tag in tags:
    ________________________
    A、tags=soup.find("a"); print(tag.text)
    B、tags=soup.find("a"); print(tag["text"])
    C、tags=soup.find_all("a"); print(tag["text"])
    D、tags=soup.find_all("a"); print(tag.text)

9. What does the program print?
from bs4 import BeautifulSoup
doc="<body>demo<div>A</div><b>X</b><p>B</p><span><p>C</p></span><p>D</p></div></body>"
soup=BeautifulSoup(doc,"lxml")
print(soup.prettify())
tags=soup.select("div ~ p")
for tag in tags:
    print(tag)
print()
tags=soup.select("div + p")
for tag in tags:
    print(tag)
    A、<p>B</p>; <p>D</p>
    B、<p>B</p>; <p>C</p>
    C、<p>C</p>; <p>D</p>
    D、<p>D</p>; <p>C</p>

10. The missing statement is:
from bs4 import BeautifulSoup
doc='''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''
soup=BeautifulSoup(doc,"lxml")
______________________________________
for tag in tags:
    print(tag["href"])
The output is:
http://example.com/elsie
http://example.com/lacie
http://example.com/tillie
    A、tags=soup.select("pa")
    B、tags=soup.select("p[] a")
    C、tags=soup.select("p[class]>a")
    D、tags=soup.select("p[class='story'] a")

11.
import flask
app=flask.Flask(__name__)
@app.route("/")
def index():
    try:
        fobj=open("index.htm","rb")
        data=fobj.read()
        fobj.close()
        return data
    except Exception as err:
        return str(err)
if __name__=="__main__":
    app.run()
The index.htm file:
<h1>Welcome Python Flask Web</h1>
It is very easy to make a website by Python Flask.
Then visiting http://127.0.0.1:5000 shows the contents of index.htm

12.
import re
reg=r"ab$"
m=re.search(reg,"abcab")
print(m)
<_sre.SRE_Match object; span=(0,2), match='ab'>
* N ***
import re
reg=r"(ab)+"
m=re.search(reg,"ababcab")
print(m)
The result matches "abab"; the "+" repeats the group "ab":
<_sre.SRE_Match object; span=(0, 4), match='abab'>

13.
import re
reg=r"ab"
m=re.search(reg,"cabcab")
print(m)
The result is: None

14.
import re
reg=r"\w+"
m=re.search(reg,"Python is easy")
print(m)
The result matches "Python":
<_sre.SRE_Match object; span=(0, 6), match='Python'>

15.
import re
s="1a ba\tbxy"
m=re.search(r"a\sb",s)
print(m)
The result matches "a b":
<_sre.SRE_Match object; span=(1, 4), match='a b'>

16.
import re
reg=r"x[ab0-9]y"
m=re.search(reg,"xayx2yxcy")
print(m)
The result matches "xcy":
<_sre.SRE_Match object; span=(6, 9), match='xcy'>

17.
import re
reg=r"x[0-9]y"
m=re.search(reg,"xyx2y")
print(m)
The result matches "x2y":
<_sre.SRE_Match object; span=(0, 2), match='xy'>

18.
import re
reg=r"car\b"
m=re.search(reg,"The car is black")
print(m)
The result matches "car", because "car" is followed by a space:
<_sre.SRE_Match object; span=(4, 7), match='car'>

19.
import re
reg=r"a\nb?"
m=re.search(reg,"ca\nbcabc")
print(m)
The result matches "a\nb":
<_sre.SRE_Match object; span=(1, 4), match='ab'>

20.
import re
s="xaabababy"
m=re.search(r"ab|ba",s)
print(m)
The result matches either "ab" or "ba":
<_sre.SRE_Match object; span=(2, 4), match='ba'>
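The trickier claims among items 12 to 20 can be re-run mechanically. This sketch checks a few of them with the stdlib re module (the variable names are mine):

```python
import re

# \w+ matches the first run of word characters
m_word = re.search(r"\w+", "Python is easy")

# \s matches whitespace (space or tab), so a\sb finds "a b"
m_space = re.search(r"a\sb", "1a ba\tbxy")

# search returns the FIRST occurrence in the string: "xay", not "xcy"
m_class = re.search(r"x[ab0-9]y", "xayx2yxcy")

# alternation also takes the earliest match in the string
m_alt = re.search(r"ab|ba", "xaabababy")
```
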

Final Exam

Final Exam

1. Get the website's binary data; the missing statement is:
import urllib.request
resp=urllib.request.urlopen("http://127.0.0.1:5000")
______________
print(data)
    A、data=resp.get()
    B、data=resp.getBinary()
    C、data=resp.read()
    D、data=resp.readBinary()
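The read()/decode() distinction can be exercised end to end with only the standard library. This sketch stands up a throwaway HTTP server in place of the Flask site (the page body and the port-0 trick are made up for the demo):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"<h1>hello</h1>")
    def log_message(self, fmt, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

resp = urllib.request.urlopen("http://127.0.0.1:%d" % port)
data = resp.read()    # bytes: the binary data the question asks for
html = data.decode()  # str: the decoded HTML text
server.shutdown()
```
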

2. A stack is designed as follows; the missing statement is:
class Stack:
    def __init__(self):
        self.st=[]
    def pop(self):
        _____________________
    def push(self,obj):
        self.st.append(obj)
    def empty(self):
        return len(self.st)==0
    A、return self.st.pop(0)
    B、return self.st.pop()
    C、return st.pop()
    D、return st.pop(0)
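A completed version of the stack above, assuming the blank is filled with return self.st.pop(): pop() with no index removes from the tail of the list, so push/pop behave LIFO.

```python
class Stack:
    def __init__(self):
        self.st = []
    def pop(self):
        return self.st.pop()      # remove and return the LAST item pushed
    def push(self, obj):
        self.st.append(obj)
    def empty(self):
        return len(self.st) == 0

s = Stack()
s.push(1); s.push(2); s.push(3)
order = [s.pop(), s.pop(), s.pop()]  # reverse of insertion order
```
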

3. Get the website's HTML text; the missing statement is:
import urllib.request
resp=urllib.request.urlopen("http://127.0.0.1:5000")
______________
print(html)
    A、html=resp.read().decode()
    B、html=resp.read().encode()
    C、html=resp.read()
    D、html=resp.get()

4. A queue is designed as follows; the missing statement is:
class Queue:
    def __init__(self):
        self.st=[]
    def fetch(self):
        return self.st.pop(0)
    def enter(self,obj):
        _________________________________
    def empty(self):
        return len(self.st)==0
    A、self.st.append(obj)
    B、self.st.insert(0,obj)
    C、st.append(obj)
    D、st.insert(0,obj)
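The matching queue, completed on the assumption that enter appends at the tail with self.st.append(obj); with fetch popping from the head, the structure is FIFO.

```python
class Queue:
    def __init__(self):
        self.st = []
    def fetch(self):
        return self.st.pop(0)     # remove and return the FIRST item entered
    def enter(self, obj):
        self.st.append(obj)       # new items join at the tail
    def empty(self):
        return len(self.st) == 0

q = Queue()
q.enter(1); q.enter(2); q.enter(3)
order = [q.fetch(), q.fetch(), q.fetch()]  # same as insertion order
```
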

5. Which of the following statements is correct?
def spider(url):
    # ... obtain a new address newUrl
    if newUrl:
        spider(newUrl)
    A. The program loops forever
    B. This is not a recursive call
    C. The recursion ends when no newUrl can be found
    D. The recursion does not end even when no newUrl can be found
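A self-contained sketch of the recursive pattern, with a hypothetical in-memory link table standing in for real page fetching: the recursion bottoms out exactly when no new address is found.

```python
# Hypothetical link graph: each page yields at most one new address.
links = {"http://A": "http://B", "http://B": "http://C", "http://C": None}
visited = []

def spider(url):
    visited.append(url)
    newUrl = links.get(url)  # "obtain a new address"; None ends the chain
    if newUrl:
        spider(newUrl)       # recurse on the new address

spider("http://A")
```
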

6. Which statement about depth-first scraping is correct?
    A. The result is the same as scraping by recursive calls
    B. The result is different from scraping by recursive calls
    C. It is less efficient than scraping by recursive function calls
    D. It is more efficient than scraping by recursive function calls

7. Which statement about breadth-first scraping is correct?
    A. The order in which the data is scraped differs from depth-first scraping
    B. The order in which the data is scraped is the same as depth-first scraping
    C. The order in which the data is scraped is the same as the recursive-function method
    D. None of the above
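The order difference behind questions 6 and 7 can be shown on a toy site map (made up for the demo): the same frontier loop visits the same pages, but a stack (LIFO) gives depth-first order while a queue (FIFO) gives breadth-first order.

```python
# Toy site map: page -> list of linked pages.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}

def crawl(start, lifo):
    # lifo=True pops the newest page (depth-first); False pops the oldest (breadth-first)
    frontier, seen, order = [start], {start}, []
    while frontier:
        url = frontier.pop() if lifo else frontier.pop(0)
        order.append(url)
        for nxt in graph[url]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return order

dfs_order = crawl("A", lifo=True)
bfs_order = crawl("A", lifo=False)
```

Both visit the same set of pages; only the order differs.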

8. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText="<a>A1</a><b>B1</b><c>C1</c><d>D<e>E</e></d><b>B2</b><c>C2</c>"
selector=Selector(text=htmlText)
s=selector.xpath("//a/preceding-sibling::*")
print(s.extract())
s=selector.xpath("//b/preceding-sibling::*[position()=1]")
print(s.extract())
s=selector.xpath("//b[position()=2]/preceding-sibling::*")
print(s.extract())
___________________________________________________________________
print(s.extract())
Program output:
[]
['<a>A1</a>', '<d>D<e>E</e></d>']
['<a>A1</a>', '<b>B1</b>', '<c>C1</c>', '<d>D<e>E</e></d>']
['<d>D<e>E</e></d>']
    A、s=selector.xpath("//b[position()=1]/preceding-sibling::*[position()=2]")
    B、s=selector.xpath("//b[position=1]/preceding-sibling::*[position=2]")
    C、s=selector.xpath("//b[position=2]/preceding-sibling::*[position=1]")
    D、s=selector.xpath("//b[position()=2]/preceding-sibling::*[position()=1]")

9. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText="<a>A1</a><b>B1</b><c>C1</c><d>D<e>E</e></d><b>B2</b><c>C2</c>"
selector=Selector(text=htmlText)
s=selector.xpath("//a/following-sibling::*")
print(s.extract())
s=selector.xpath("//a/following-sibling::*[position()=1]")
print(s.extract())
s=selector.xpath("//b[position()=1]/following-sibling::*")
print(s.extract())
_____________________________________________
print(s.extract())
Program output:
['<b>B1</b>', '<c>C1</c>', '<d>D<e>E</e></d>', '<b>B2</b>', '<c>C2</c>']
['<b>B1</b>']
['<c>C1</c>', '<d>D<e>E</e></d>', '<b>B2</b>', '<c>C2</c>']
['<c>C1</c>']
    A、s=selector.xpath("//b[position()=1]/following-sibling::*[position()=1]")
    B、s=selector.xpath("//b[position()=1]/following-sibling::*")
    C、s=selector.xpath("//b/following-sibling::*[position()=1]")
    D、s=selector.xpath("//b/following-sibling::*")

10. Fill in the blank so the program produces the output shown:
from scrapy.selector import Selector
htmlText='''
<html>
<body>
<bookstore>
<book>
<title lang="english">Harry Potter</title>
<price>29.99</price>
</book>
<book id="b2">
<title lang="chinese">学习XML</title>
<price>39.95</price>
</book>
</bookstore>
</body></html>
'''
selector=Selector(text=htmlText)
______________________________________________
print(s.extract())
Program output:
['<book id="b2">\n <title lang="chinese">学习XML</title>\n <price>39.95</price>\n</book>']
    A、s=selector.xpath("//title[@lang='chinese']/parent:*")
    B、s=selector.xpath("//title[lang='chinese']/parent::*")
    C、s=selector.xpath("//title[@lang='chinese']/parent::*")
    D、s=selector.xpath("//title[lang='chinese']/parent:*")

11.
import threading
def download(url):
    print(url)
threads=[]
urls=["http://A","http://B"]
for url in urls:
    T=threading.Thread(target=download,args=[url])
    T.start()

12. soup.select("body head title") finds the <title> node under <head> under <body>;

13.
import threading
def download(url):
    print(url)
threads=[]
urls=["http://A","http://B"]
for url in urls:
    T=threading.Thread(target=download,args=(url))
    T.start()

14.
import threading
def download(url):
    print(url)
threads=[]
urls=["http://A","http://B"]
for url in urls:
    T=threading.Thread(target=download,args=url)
    T.start()
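Items 11, 13 and 14 differ only in how args is passed. args must be a sequence: a one-element list [url] or tuple (url,) works, while (url) is just a parenthesised string, so args=(url) and args=url both make Thread unpack the string into one argument per character. A sketch of a correct form, with join() added so the results are complete before they are read:

```python
import threading

results = []

def download(url):
    results.append(url)  # stand-in for a real download

threads = []
for url in ["http://A", "http://B"]:
    t = threading.Thread(target=download, args=(url,))  # note the trailing comma
    threads.append(t)
    t.start()
for t in threads:
    t.join()             # wait for both workers to finish
```
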

15. selector.xpath("//bookstore/book") searches for <book> elements one level below <bookstore>, finding 2;

16. selector.xpath("//body/book") searches for <book> elements one level below <body>; the result is empty;

17. selector.xpath("//body//book") searches for <book> elements anywhere under <body>, finding 2;

18. selector.xpath("/body//book") searches for <book> elements under a top-level <body>; the result is empty, because the top level of the document is the <html> element, not <body>;

19. selector.xpath("//book/title") searches for all <title> elements one level below a <book>, finding 2; the result is the same as selector.xpath("//title") and selector.xpath("//bookstore//title");

20. selector.xpath("//book//price") and selector.xpath("//price") give the same result: both find 2 <price> elements;