
How to use Python to write a web crawler?

When browsing the web, we often come across beautiful pictures that we would like to save and download, for example to use as desktop wallpaper or as design material.

Our most common approach is to right-click the picture and choose Save As. However, for some pictures there is no Save As option when you right-click. Another way is to capture them with a screenshot tool, but that reduces the clarity of the pictures. In fact, there is a better option: right-click and view the page source code to find the image addresses.

We can use Python to implement such a simple crawler and fetch the content we want to our local machine. Let's take a look at how to achieve this with Python.

Specific steps

1. Get the whole page data. First, we get the entire HTML of the page that contains the pictures we want to download.

getjpg.py

#coding=utf-8
import urllib

def getHtml(url):
    page = urllib.urlopen(url)   # open the URL
    html = page.read()           # read the page content as a string
    return html

html = getHtml("blogs.com/fnng/archive/213/5/2/389816.html")
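The code above targets Python 2, where urlopen() lives directly in the urllib module. As a side note, on Python 3 the same call has moved to urllib.request; below is a minimal sketch under that assumption (the function name get_html and the UTF-8 decoding are illustrative choices, not part of the original script):

# Python 3 sketch of the same step (assumption: the page is UTF-8 encoded)
import urllib.request

def get_html(url):
    with urllib.request.urlopen(url) as page:   # open the URL and read the raw bytes
        return page.read().decode("utf-8", errors="ignore")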

2. Filter the wanted image data from the page. If we find some nice wallpapers on Baidu Tieba, we can inspect the page with the browser's front-end developer tools and find the address of each picture, for example: src="/forum...jpg" pic_ext="jpeg"

Modify the code as follows:

import re
import urllib

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    reg = r'src="(.+?\.jpg)" pic_ext'     # regular expression that captures the .jpg address
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist

html = getHtml("/p/24615866")
print getImg(html)

We also created the getImg() function, which filters out the image links we need from the returned page. The re module provides regular expression support:

re.compile() compiles a regular expression string into a regular expression object.

re.findall() reads the html string and returns all the data that matches imgre (the compiled regular expression).
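To make these two calls concrete, here is a small self-contained example; the HTML fragment and URLs in it are made up purely for illustration:

import re

# a made-up fragment, similar in shape to the img tags on a Tieba page
sample_html = ('src="http://example.com/forum/pic1.jpg" pic_ext="jpeg" '
               'src="http://example.com/forum/pic2.jpg" pic_ext="jpeg"')

imgre = re.compile(r'src="(.+?\.jpg)" pic_ext')   # compile the pattern once
print(re.findall(imgre, sample_html))
# ['http://example.com/forum/pic1.jpg', 'http://example.com/forum/pic2.jpg']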

Running the script will print the URL addresses of all the pictures on the page.

3. Save the filtered data to the local machine.

Traverse the filtered image addresses with a for loop and save each image locally. The code is as follows:

#coding=utf-8
import urllib
import re

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        urllib.urlretrieve(imgurl, '%s.jpg' % x)   # download and save as 0.jpg, 1.jpg, ...
        x += 1

html = getHtml("/p/24615866")
print getImg(html)

The core here is the urllib.urlretrieve() method, which downloads remote data directly to a local file.

The obtained image links are traversed with a for loop. To make the image file names more uniform, each image is renamed, and the naming rule increments the x variable by 1 for each image. By default, the images are saved to the directory the program runs in.
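For reference, on Python 3 urlretrieve has moved to urllib.request; the loop below is a sketch under that assumption (the save_images name and the use of enumerate in place of the manual x counter are illustrative, not from the original script):

# Python 3 sketch of the download loop
import urllib.request

def save_images(imglist):
    for x, imgurl in enumerate(imglist):                   # x counts 0, 1, 2, ...
        urllib.request.urlretrieve(imgurl, '%s.jpg' % x)   # save as 0.jpg, 1.jpg, ...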

After the program runs, you will see the downloaded image files in that directory.