diff --git a/cookbook/c05/__init__.py b/cookbook/c05/__init__.py
new file mode 100644
index 0000000..0baca96
--- /dev/null
+++ b/cookbook/c05/__init__.py
@@ -0,0 +1,7 @@
+#!/usr/bin/env python
+# -*- encoding: utf-8 -*-
+"""
+Topic: sample
+Desc :
+"""
+
diff --git a/cookbook/c05/p01_rw_text.py b/cookbook/c05/p01_rw_text.py
new file mode 100644
index 0000000..4f4a1eb
--- /dev/null
+++ b/cookbook/c05/p01_rw_text.py
@@ -0,0 +1,22 @@
+#!/usr/bin/env python
+# -*- encoding: utf-8 -*-
+"""
+Topic: Reading and writing text files
+Desc :
+"""
+
+def rw_text():
+    # Iterate over the lines of the file
+    with open('somefile.txt', 'rt') as f:
+        for line in f:
+            # process line (each line already ends with its own newline)
+            print(line, end='')
+
+    # Write chunks of text data
+    with open('somefile.txt', 'wt') as f:
+        f.write('text1')
+        f.write('text2')
+
+if __name__ == '__main__':
+    rw_text()
+
diff --git a/source/c05/p01_read_write_text_data.rst b/source/c05/p01_read_write_text_data.rst
index 0ff63b0..7cbc878 100644
--- a/source/c05/p01_read_write_text_data.rst
+++ b/source/c05/p01_read_write_text_data.rst
@@ -5,14 +5,69 @@
 ----------
 Problem
 ----------
-todo...
+You need to read or write text data in various encodings, such as ASCII, UTF-8, or UTF-16.
+
+|
 
 ----------
 Solution
 ----------
-todo...
+Use the open() function with mode rt to read a text file. For example:
+
+.. code-block:: python
+
+    # Read the entire file as a single string
+    with open('somefile.txt', 'rt') as f:
+        data = f.read()
+
+    # Iterate over the lines of the file
+    with open('somefile.txt', 'rt') as f:
+        for line in f:
+            # process line
+            ...
+
+Similarly, to write a text file, use open() with mode wt, which clears and overwrites any existing content. For example:
+
+.. code-block:: python
+
+    # Write chunks of text data
+    with open('somefile.txt', 'wt') as f:
+        f.write(text1)
+        f.write(text2)
+        ...
+
+    # Redirected print statement
+    with open('somefile.txt', 'wt') as f:
+        print(line1, file=f)
+        print(line2, file=f)
+        ...
+
+To append to the end of an existing file, use open() with mode at.
+
+By default, files are read and written using the locale's preferred encoding, which
+locale.getpreferredencoding() reports. On many machines this is utf-8. If you know that the text
+uses a different encoding, pass the optional encoding parameter to open(). For example:
+
+.. code-block:: python
+
+    with open('somefile.txt', 'rt', encoding='latin-1') as f:
+        ...
+
+Python understands a large number of text encodings. A few common ones are ascii, latin-1, utf-8, and utf-16.
+UTF-8 is what web applications typically use.
+ascii corresponds to the 7-bit characters in the range U+0000 to U+007F.
+latin-1 is a direct mapping of bytes 0-255 to the Unicode characters U+0000 to U+00FF.
+Reading text of unknown encoding as latin-1 never produces a decoding error.
+The decoded text may not be entirely correct,
+but it is often enough to extract useful data. And if you later write the data back out as latin-1, the original data is preserved.
+
+|
 
 ----------
 Discussion
 ----------
-todo...
\ No newline at end of file
+Reading and writing text files is typically very straightforward. However, there are a
+number of subtle aspects to keep in mind. First, the use of the with statement in the
+examples establishes a context in which the file will be used. When control leaves the
+with block, the file will be closed automatically. You don’t need to use the with statement,
+but if you don’t use it, make sure you remember to close the file:
diff --git a/source/chapters/p05_files_and_io.rst b/source/chapters/p05_files_and_io.rst
index fdd5e2c..a9504cd 100644
--- a/source/chapters/p05_files_and_io.rst
+++ b/source/chapters/p05_files_and_io.rst
@@ -2,12 +2,9 @@
 Chapter 5: Files and I/O
 =============================
 
-Python provides a variety of useful built-in data structures, such as lists, sets, and dictionaries.
-For the most part, the use of these structures is straightforward. However,
-common questions concerning searching, sorting, ordering, and filtering often arise.
-Thus, the goal of this chapter is to discuss common data structures and algorithms
-involving data. In addition, treatment is given to the various data structures contained
-in the collections module.
+Every program has to handle input and output.
+This chapter covers working with different kinds of files, including text and binary files, file encodings, and related matters.
+Manipulating filenames and directories is covered as well.
 
 Contents:
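
The latin-1 claims in p01_read_write_text_data.rst (decoding never fails, and data written back is preserved) can be checked with a small sketch like the one below. It is not part of the commit: the helper name `roundtrip_latin1` and the temporary file are assumptions made for illustration, and `newline=''` is passed so that newline translation does not alter the bytes.

```python
import locale
import os
import tempfile


def roundtrip_latin1(raw: bytes) -> bytes:
    """Hypothetical helper: write raw bytes, read them back as latin-1 text,
    write that text out again as latin-1, and return the resulting bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'somefile.txt')
        with open(path, 'wb') as f:
            f.write(raw)
        # newline='' turns off newline translation in both directions
        with open(path, 'rt', encoding='latin-1', newline='') as f:
            text = f.read()  # every byte 0-255 maps to U+0000..U+00FF
        with open(path, 'wt', encoding='latin-1', newline='') as f:
            f.write(text)
        with open(path, 'rb') as f:
            return f.read()


# UTF-8 bytes read as latin-1 give mangled text, but nothing is lost
raw = 'héllo wörld\r\n'.encode('utf-8')
assert roundtrip_latin1(raw) == raw

# The encoding that open() falls back to when none is given (outside UTF-8 mode)
print(locale.getpreferredencoding(False))
```

The round trip only holds when the same encoding is used for reading and writing; decoding as latin-1 and re-encoding as utf-8 would change the bytes.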