python - redswallow's Blog

Two ways to replace text in a Python string

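String replacement comes up constantly when working with strings in Python; this post briefly covers how to do it.
There are two ways to replace text in a Python string:
1. Use the string's own methods.
2. Use a regular expression.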

Here is an example to try them on:
a = 'hello word'
The goal is to replace word in the string a with python.

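1. Using the string's own replace method

a.replace('word', 'python')

This returns hello python. Because strings are immutable, replace returns a new string and leaves a unchanged.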

2. Using a regular expression:

import re
strinfo = re.compile('word')
b = strinfo.sub('python', a)
print(b)

The output is also hello python.

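Which one to use is up to you: replace is the simpler choice for fixed text, while a regular expression is worth it when the text to match follows a pattern.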
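
As a rough guide to that choice, the sketch below puts the two approaches side by side. The extra sample strings and the \b pattern are assumptions added for illustration; they are not part of the original example.

```python
import re

a = 'hello word'

# str.replace returns a new string; the original is left unchanged.
b = a.replace('word', 'python')

# An optional third argument limits the number of replacements.
c = 'word word word'.replace('word', 'python', 1)

# re.sub accepts a pattern: \b matches a word boundary,
# so the "word" inside "sword" is not touched.
d = re.sub(r'\bword\b', 'python', 'sword and word')

print(a)  # hello word
print(b)  # hello python
print(c)  # python word word
print(d)  # sword and python
```

For a fixed substring, replace is usually easier to read; the regular expression earns its keep only when boundaries, alternatives, or wildcards matter.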

Posted by redswallow Mon, 02 Aug 2010 22:15:29 +0800

Category: python Tag: Comment: (1570)

Removing spaces, carriage returns, and newlines in Python

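To stop print from emitting a newline after each item, Python 2 lets you end the print statement with a comma; in Python 3, pass end=" " to print() instead:

# lines is a list of strings, for example read from a file
for item in lines:
    print(item, end=" ")

Alternatively, remove the whitespace from each line first. Note that "".join(s.split()) removes all whitespace (spaces, tabs, \r and \n), including whitespace inside the string, whereas s.strip() only trims the two ends:

s = " some \t string\r\n"
print("".join(s.split()))

for item in lines:
    print("".join(item.split()))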
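
The difference between these whitespace helpers is easy to mix up, so here is a small comparison. The sample string is made up for illustration.

```python
s = "  hello \t world\r\n"

stripped = s.strip()                # trims whitespace at both ends only
right_stripped = s.rstrip("\r\n")   # removes only trailing CR/LF characters
collapsed = " ".join(s.split())     # collapses each whitespace run to one space
squeezed = "".join(s.split())       # removes every whitespace character

print(repr(stripped))        # 'hello \t world'
print(repr(right_stripped))  # '  hello \t world'
print(repr(collapsed))       # 'hello world'
print(repr(squeezed))        # 'helloworld'
```

If only the line ending should go, rstrip("\r\n") is the least destructive choice; the join/split form is appropriate only when inner spaces are also unwanted.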

Posted by redswallow Mon, 02 Aug 2010 21:20:48 +0800

Category: python Tag: Comment: (2713)

Measuring a program's running time with Python's time module

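The built-in time module contains many time-related functions. We can use it to get the current time and to format times for output.

time() returns, as a floating-point number, the number of seconds elapsed since the epoch. On Linux, the epoch begins at 00:00:00 UTC, January 1, 1970.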

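import time

# time.clock() was removed in Python 3.8; time.perf_counter() replaces it
start = time.perf_counter()

# your program goes here

elapsed = time.perf_counter() - start
print("Time used:", elapsed)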
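
A single measurement can be noisy, so for short snippets it may help to repeat the code with the standard timeit module. The sketch below assumes Python 3; the work() function is a hypothetical stand-in for "your program".

```python
import time
import timeit

def work():
    # Hypothetical workload, used only for illustration.
    return sum(i * i for i in range(10000))

# One-off measurement with a monotonic, high-resolution clock.
start = time.perf_counter()
result = work()
elapsed = time.perf_counter() - start
print("Time used: %.6f s" % elapsed)

# Repeated measurement: call work() 100 times and report the total time.
total = timeit.timeit(work, number=100)
print("100 runs: %.6f s" % total)
```

Dividing total by the number of runs gives a per-call average that is less sensitive to a single slow run.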

Posted by redswallow Sun, 18 Jul 2010 00:56:21 +0800

Category: python Tag: Comment: (8164)

The HTMLParser module: parsing HTML and extracting URLs

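HTMLParser is Python's module for parsing HTML. It picks out the tags, data, and other parts of an HTML document, which makes it a convenient way to process HTML. HTMLParser is event-driven: when it encounters a particular piece of markup, it calls a user-defined method to notify the program. The main callbacks all have names beginning with handle_ and are methods of the HTMLParser class. To use the module, derive a new class from HTMLParser and override whichever handle_ methods you need. They include: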

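handle_startendtag  handles empty (self-closing) tags, such as <xx/>
handle_starttag     handles start tags, such as <xx>
handle_endtag       handles end tags, such as </xx>
handle_charref      handles numeric character references, which start with &# and give a character by its code
handle_entityref    handles named entity references, which start with &, such as &nbsp;
handle_data         handles data, i.e. the text between <xx>data</xx>
handle_comment      handles comments
handle_decl         handles declarations starting with <!, such as <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
handle_pi           handles processing instructions of the form <?instruction>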
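
To see when each of these callbacks fires, one option is a parser that simply records every event. The sketch below assumes Python 3, where the class lives in html.parser; the class name EventLogger and the sample HTML are made up for illustration. It passes convert_charrefs=False because, with the Python 3 default, references in text are decoded and delivered through handle_data instead.

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Records each callback as an (event, payload) tuple."""

    def __init__(self):
        # convert_charrefs=False lets character and entity references
        # reach handle_charref / handle_entityref.
        HTMLParser.__init__(self, convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        self.events.append(("startend", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))

parser = EventLogger()
parser.feed('<p>a&amp;b&#65;<br/><!--note--></p>')
parser.close()
for event in parser.events:
    print(event)
```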

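This post gives a brief look at how to use HTMLParser.

To use it, define a class that inherits from HTMLParser and override the methods

handle_starttag(tag, attrs)
handle_startendtag(tag, attrs)
handle_endtag(tag)

to implement whatever you need.

tag is the name of the HTML tag, and attrs is a list of (name, value) tuples.
HTMLParser automatically converts the tag name and the attribute names to lowercase; attribute values are left as written.

The following example extracts all the links in a piece of HTML: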

from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser
 
class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []
 
    def handle_starttag(self, tag, attrs):
        # print("Encountered the beginning of a %s tag" % tag)
        if tag == "a":
            # attrs is a list of (name, value) tuples; names are lowercased
            for (variable, value) in attrs:
                if variable == "href":
                    self.links.append(value)
 
if __name__ == "__main__":
    html_code = """
    <a href="www.google.com"> google.com</a>
    <A Href="www.pythonclub.org"> PythonClub </a>
    <A HREF = "www.sina.com.cn"> Sina </a>
    """
    hp = MyHTMLParser()
    hp.feed(html_code)
    hp.close()
    print(hp.links)

The output is:

['www.google.com', 'www.pythonclub.org', 'www.sina.com.cn']

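If you want to extract image links such as

<img src='http://www.google.com/intl/zh-CN_ALL/images/logo.gif' />

you need to override handle_startendtag(tag, attrs), which is the method called for self-closing tags like this one.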
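
A minimal sketch of such an image extractor follows, assuming Python 3. The class name ImageParser and the second test tag (logo2.gif) are hypothetical. Because an img tag written without the trailing slash is reported through handle_starttag rather than handle_startendtag, the sketch routes both callbacks to the same code.

```python
from html.parser import HTMLParser

class ImageParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.images = []

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <img ... />.
        if tag == "img":
            for name, value in attrs:
                if name == "src":
                    self.images.append(value)

    def handle_starttag(self, tag, attrs):
        # <img ...> without the trailing slash arrives here instead.
        self.handle_startendtag(tag, attrs)

html_code = """
<img src='http://www.google.com/intl/zh-CN_ALL/images/logo.gif' />
<IMG SRC="logo2.gif">
"""

ip = ImageParser()
ip.feed(html_code)
ip.close()
print(ip.images)
```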

Posted by redswallow Sun, 18 Jul 2010 00:29:51 +0800

Category: python Tag: Comment: (1477)

Initializing multidimensional arrays in Python

In Python, the best way to initialize a 5 x 3 array (5 columns by 3 rows) with every item set to 0 is:

multilist = [[0 for col in range(5)] for row in range(3)]

As we know, a one-dimensional array can be initialized like this:

alist = [0] * 5

Right. So can a two-dimensional array be initialized the same way?

multi = [[0] * 5] * 3

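Actually, this is wrong. [0] * 5 is a single one-dimensional list object, and * 3 merely copies the reference to that object three times. For example, suppose we modify multi[0][0]:

multi = [[0] * 5] * 3
multi[0][0] = 'Love China'
print(multi)

The output will be: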

[['Love China', 0, 0, 0, 0], ['Love China', 0, 0, 0, 0], ['Love China', 0, 0, 0, 0]]

We changed multi[0][0], but multi[1][0] and multi[2][0] changed as well. That is not what we wanted.

What if we write it like this instead:

multilist = [[0] * 5 for row in range(3)]
multilist[0][0] = 'Love China'
print(multilist)

Let's look at the output:

[['Love China', 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

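Good, no problem. However, because the * approach is easy to get confused about and can lead to bugs, the first method above is still the recommended one: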

multilist = [[0 for col in range(5)] for row in range(3)]
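
The aliasing described above can be checked directly with the is operator, which tests whether two names refer to the same object. The variable names shared and independent are made up for this sketch.

```python
shared = [[0] * 5] * 3                       # three references to one row
independent = [[0] * 5 for row in range(3)]  # three separate rows

print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False

shared[0][0] = 1
independent[0][0] = 1
print(shared[1][0])       # 1, because every row is the same object
print(independent[1][0])  # 0, because the rows are distinct
```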

Posted by redswallow Sat, 10 Jul 2010 04:07:52 +0800

Category: python Tag: Comment: (2050)

Adding tab completion and history to Python IDLE

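Often we use Python without writing a whole program; for simple tasks I prefer to type a few lines of code in IDLE (that is, the interactive prompt). Editing code in this mode has its inconveniences, though. The main ones are that Tab does not autocomplete and that commands from the previous session are not remembered (and we have grown used to both in the shell).
A Python startup script solves this problem.

The startup script is very simple, so I will just give the code without further explanation: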

import readline
import rlcompleter
import atexit
import os
# tab autocomplete
readline.parse_and_bind('tab: complete')
# history file
histfile = os.path.join(os.environ['HOME'], '.pythonhistory')
try:
    readline.read_history_file(histfile)
except IOError:
    # no history file yet (first run)
    pass
atexit.register(readline.write_history_file, histfile)
del os, histfile, readline, rlcompleter

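When that is done, save it as .pythonstartup in your home directory (for example /home/yurii), then point the PYTHONSTARTUP environment variable at that file. The easiest way is to add this line to your .bashrc: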

export PYTHONSTARTUP=/home/yurii/.pythonstartup

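This not only adds tab completion; after restarting IDLE you can also use the up and down arrow keys to recall the commands you typed last time, which is very convenient.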

Posted by redswallow Sun, 23 May 2010 22:03:41 +0800

Category: python Tag: Comment: (3200)
