Turning data into knowledge.

Pylab

Visualization of data. Pylab, an easy way to analyze data and present the results as plots.

本周从Pylab开始讲数据的可视化。

import pylab

pylab.figure(1) # make figure 1 the current figure
pylab.plot([1,2,3,4], [1,7,3,5]) # draw on figure 1
pylab.show() # show figure on screen

pylab.plot()中两个list的长度要相同,因为这里分别给出坐标系中每个点的x坐标和y坐标。

import pylab

pylab.figure(1) #create figure 1
pylab.plot([1,2,3,4], [1,2,3,4]) #draw on figure 1
pylab.figure(2) #create figure 2
pylab.plot([1,4,2,3], [5,6,7,8]) #draw on figure 2
pylab.savefig('Figure-Eric') #save figure 2
pylab.figure(1) #go back to working on figure 1
pylab.plot([5,6,10,3]) #draw again on figure 1
pylab.savefig('Figure-Grimson') #save figure 1

pylab不仅可以使用pylab.show()把绘制的chart显示出来,还可以保存为png文件。

如果pylab.plot()中只给出一个list,这个list会被当作y坐标。而x坐标会生成range(len(y)), 在这里也就是range(4)既[0,1,2,3]。

第二个例子,是利率的例子,这也是编程学习中常用的例子。

import pylab

principal = 10000 #initial investment
interestRate = 0.05
years = 20
values = []
for i in range(years + 1):
   values.append(principal)
   principal += principal*interestRate
pylab.plot(range(years+1), values)
pylab.title('5% Growth, Compounded Annually')
pylab.xlabel('Years of Compounding')
pylab.ylabel('Value of Principal ($)')
pylab.show()

这个图体现了一笔1万美金的投资,在复合年利率5%情况下,20年的增长曲线。也就是说在第20年,这笔钱会变成26000多。同时,通过pylab.title()pylab.xlable()pylab.ylable()。我们还对图表给出了进一步的定义。

对于绘制的曲线,我们可以定义它的color和style,例如在pylab.plotb-表示蓝色连续的线,ro,表示红色的点。还可以定义线宽,linewidth=30表示线宽是30points。在定义string的时候还可以用fontsize=12来定义字的大小。

Mortgage

接下来介绍了一个稍微复杂一点的案例

  • 从银行借款$200,000。
  • 期限:30年
  • 银行给出3种还款方式:
    • 30年固定7%年利率。
    • 先还款3.5%,可以享受30年5%的固定年利率。
    • 前48个月利率5%,之后利率升为9.5%。

问哪种最划算(总还款额最低)?。

这里主要讲如何利用class来实现代码复用。

首先我们写一个function,来确定在等额还款的情况下,每月需要还多少钱。

Month Interests Balance
1 loan*R loan*(1+R)-X
2 (loan*(1+R)-X)*R (loan*(1+R)-X)*(1+R)-X
3 ((loan*(1+R)-X)*(1+R)-X)*R ((loan*(1+R)-X)*(1+R)-X)*(1+R)-X
4    
5    
   
m    

因为还款额不变,所以每个月要计算利息的金额一直在变。这里求X的值有两种计算方法。一种是:

mx = loan + totalInterests

例如如果只借一个月,那么

mx = loan + loan*R, m = 1

也就是:x = loan*(1+R)

如果借两个月

mx = loan + 一月利息 + 二月利息

mx = loan + loan*R + (loan*(1+R)-x)*R

合并一下

mx = loan*(1+R) + loan*R*(1+R)-R*x = loan*(1+R)2 -R*x, 当 m = 2 时:

2x = loan*(1+R)2 - x*R

x = loan*(1+R)2/(2+R)

这时,如果我等号右边,上下都乘以R

x = loan*(1+R)2/(2+R)=loan*R*(1+R)2/2R+R2=loan*R*(1+R)2/(R+1)2-1

如果你继续算下去,以此类推:

如果分三个月还款:

x = loan*R*(1+R)3/(R+1)3-1

如果分m个月还款

x = loan*R*(1+R)m/(R+1)m-1

这也就是我们代码的第一个function的来源。

def findPayment(loan, r, m):
    """Assume: loan and r are floats, m an int
    Returns the monthly payment for a mortgage of size
    loan at a monthly rate of r for m months"""
    
    return loan*((r*(1+r)**m)/((1+r)**m-1))
class Mortgage(object):
    """Abstract class for building different kinds of mortgages"""
    def __init__(self, loan, annRate, months):
        """Create a new mortgage"""
        self.loan = loan
        self.rate = annRate/12.0
        self.months = months
        self.paid = [0.0] #还款金额用list来记录
        self.owed = [loan] #还欠款金额也用list,也就是每月的balance
        self.payment = findPayment(loan, self.rate, months)
        self.legend = None #description of mortgage

    def makePayment(self):
        """Make a payment"""
        self.paid.append(self.payment)
        reduction = self.payment - self.owed[-1]*self.rate
        self.owed.append(self.owed[-1] - reduction)
		# newBalance = 上一个Balance - 已经还的钱 + 上期Balance的利息
    def getTotalPaid(self):
        """Return the total amount paid so far"""
        return sum(self.paid)
        
    def __str__(self):
        return self.legend

之后,定义了第一个class。

# fixed-rate mortgage
class Fixed(Mortgage):
    def __init__(self, loan, r, months):
        Mortgage.__init__(self, loan, r, months)
        self.legend = 'Fixed, ' + str(r*100) + '%'

固定利率class继承自Mortgage,没什么变化

# fixed-rate mortgage with up-front points
class FixedWithPts(Fixed):
    def __init__(self, loan, r, months, pts):
        Fixed.__init__(self, loan, r, months)
        self.pts = pts
        self.paid = [loan*(pts/100.0)]
        self.legend += ', ' + str(pts) + ' points'

FixedWithPts继承自fixed,添加了pts这个参数,同时self.paid做了更改,也就是借款之后立刻归还3.5%。

# mortgage that changes interest rate after 48 months
class TwoRate(Mortgage):
    def __init__(self, loan, r, months, teaserRate, teaserMonths):
        Mortgage.__init__(self, loan, teaserRate, months)
        self.teaserMonths = teaserMonths
        self.teaserRate = teaserRate
        self.nextRate = r/12.0
        self.legend = str(teaserRate*100)\
            + '% for ' + str(self.teaserMonths)\
            + ' months, then ' + str(r*100) + '%'

    def makePayment(self):
        # 在第49个月利率和还款金额都有变。
        if len(self.paid) == self.teaserMonths + 1:
            self.rate = self.nextRate
            self.payment = findPayment(self.owed[-1], self.rate,
                                        self.months - \
                                        self.teaserMonths)
        Mortgage.makePayment(self) #1-48个月不变

第三种情况比较复杂,利率在第49个月的时候有了变化。因而makePayment要重写。

# helper function to try out alternatives
def compareMortgages(amt, years, fixedRate, pts, ptsRate,
                    varRate1, varRate2, varMonths):
    # create alternative mortgages
    totMonths = years*12
    fixed1 = Fixed(amt, fixedRate, totMonths)
    fixed2 = FixedWithPts(amt, ptsRate, totMonths, pts)
    twoRate = TwoRate(amt, varRate2, totMonths, varRate1,
                        varMonths)
    morts = [fixed1, fixed2, twoRate]

    # run experiment!
    for m in range(totMonths):
        for mort in morts:
            mort.makePayment()

    # report results
    for m in morts:
        print m
        print ' Total payments = $' + str(int(m.getTotalPaid()))

之后写了一个helper function来跑这段代码。这样可以一次性的把数据带入并得出结果。

# amount borrowed: $200,000
# term: 30 years
#   option 1: rate = 7%
#   option 2: 3.25 points upfront, rate = 5%
#   option 3: 48 months of rate = 5%, then rate = 9.5%

compareMortgages(amt=200000, years=30,
                fixedRate = 0.07,
                pts = 3.25, ptsRate=0.05,
                varRate1=0.045, varRate2=0.095, varMonths=48)

最后,把数据带入。

经过计算我们可以直接得出结果。但本节课我们学习的重点是数据可视化,也就是结合pylab,把利率这个问题以graph的形式展现出来。

import pylab

#set line width
pylab.rcParams['lines.linewidth'] = 6
#set font size for titles 
pylab.rcParams['axes.titlesize'] = 20
#set font size for labels on axes
pylab.rcParams['axes.labelsize'] = 20
#set size of numbers on x-axis
pylab.rcParams['xtick.major.size'] = 5
#set size of numbers on y-axis
pylab.rcParams['ytick.major.size'] = 5

首先作者调整了pylab的默认值。

class MortgagePlots(object):
    
    def plotPayments(self, style):
        pylab.plot(self.paid[1:], style, label = self.legend)
        
    def plotTotPd(self, style):
        totPd = [self.paid[0]]
        for i in range(1, len(self.paid)):
            totPd.append(totPd[-1] + self.paid[i])
        pylab.plot(totPd, style, label = self.legend)

然后还写了一个MorgatePlots的class,用里面的method来绘制两幅图表,一个是每月还多少钱,另一个是还钱的累加。

class Mortgage(MortgagePlots, object):
    """Abstract class for building different kinds of mortgages"""
    def __init__(self, loan, annRate, months):
        """Create a new mortgage"""
        self.loan = loan
        self.rate = annRate/12.0
        self.months = months
        self.paid = [0.0]
        self.owed = [loan]
        self.payment = findPayment(loan, self.rate, months)
        self.legend = None #description of mortgage

    def makePayment(self):
        """Make a payment"""
        self.paid.append(self.payment)
        reduction = self.payment - self.owed[-1]*self.rate
        self.owed.append(self.owed[-1] - reduction)

    def getTotalPaid(self):
        """Return the total amount paid so far"""
        return sum(self.paid)

    def __str__(self):
        return self.legend

他还把这个class写入了Mortgage的父类里,以便Mortgage的instance可以使用这些method。python允许多重继承,但寻名规则比较复杂。如果不了解规则, 建议不要在代码中轻易使用多重继承。因为复杂的多重继承不容易debug。

def plotMortgages(morts, amt):
    styles = ['b-', 'r-.', 'g:']
    payments = 0 #number to identify a figure
    cost = 1 #number to identify a figure
    pylab.figure(payments)
    pylab.title('Monthly Payments of Different $' + str(amt)
                + ' Mortgages')
    pylab.xlabel('Months')
    pylab.ylabel('Monthly Payments')
    pylab.figure(cost)
    pylab.title('Cost of Different $' + str(amt) + ' Mortgages')
    pylab.xlabel('Months')
    pylab.ylabel('Total Payments')
    for i in range(len(morts)):
        pylab.figure(payments)
        morts[i].plotPayments(styles[i])
        pylab.figure(cost)
        morts[i].plotTotPd(styles[i])
    pylab.figure(payments)
    pylab.legend(loc = 'upper center')
    pylab.figure(cost)
    pylab.legend(loc = 'best')

在其余代码不变的情况下,写了一个新的function plotMortgages来让不同的instance把图表绘制出来。

def compareMortgages(amt, years, fixedRate, pts, ptsRate,
                    varRate1, varRate2, varMonths):
    totMonths = years*12
    fixed1 = Fixed(amt, fixedRate, totMonths)
    fixed2 = FixedWithPts(amt, ptsRate, totMonths, pts)
    twoRate = TwoRate(amt, varRate2, totMonths,
                      varRate1, varMonths)
    morts = [fixed1, fixed2, twoRate]
    for m in range(totMonths):
        for mort in morts:
            mort.makePayment()
    plotMortgages(morts, amt)

compareMortgages(amt=200000, years=30, fixedRate=0.07,
                 pts = 3.25, ptsRate=0.05, varRate1=0.045,
                 varRate2=0.095, varMonths=48)
pylab.show()

最后在compareMortgages中call plotMortgagespylab.show()。这样就完成一个pylab在现实中的应用演示。

Source Code


在介绍了pylab之后,TA做了一个视频,介绍了python function中的default keywords arguments. 总结一下:

  1. python function中的parameter,如果没有默认值,在call这个function的时候要全部给出。但又时候很多参数不是每次call的时候都需要改变,这时就可以用赋值的方式把它的默认值写上。

  2. 有了默认值的function在使用的时候有一定规则。non-keyword arguments要先给出,之后再给需要修改的默认值。默认值可以按顺序赋值,也可以用=来直接赋值。

  3. 当遇到参数比较多的function时,有时在call这个function时可以把non-keyword参数也用=来赋值,这样比较容易读懂给定参数会被用在哪里。

这里顺便提一下在函数定义中经常看到的*args**kwargs*表示多。前者表示参数的长度不确定,后者表示有多个keyword arguments会给入。

拿Python官方文档做个例子

def cheeseshop(kind, *arguments, **keywords):
    print "-- Do you have any", kind, "?"
    print "-- I'm sorry, we're all out of", kind
    for arg in arguments:
        print arg
    print "-" * 40
    keys = sorted(keywords.keys())
    for kw in keys:
        print kw, ":", keywords[kw]

当我们这样call这个函数时:

cheeseshop("Limburger", "It's very runny, sir.",
           "It's really very, VERY runny, sir.",
           shopkeeper='Michael Palin',
           client="John Cleese",
           sketch="Cheese Shop Sketch")

的到的结果是:

-- Do you have any Limburger ?
-- I'm sorry, we're all out of Limburger
It's very runny, sir.
It's really very, VERY runny, sir.
----------------------------------------
client : John Cleese
shopkeeper : Michael Palin
sketch : Cheese Shop Sketch