# Getting Started with Learning to Rank (LTR): Linear Models

Author: Doug Turnbull. Translator: 杨振涛. Published October 23, 2017.

## LTR starts as a regression problem

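Learning to rank can be treated as a regression problem. You start with a set of judgments, each of which grades how relevant a document is to a query. Grades could run from A to F, but more commonly they run from 0 (not relevant at all) to 4 (highly relevant). Consider a single keyword query first. In the example below, each row holds a grade, a movie title, and the search keyword: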

4,Rocky,rocky
0,Turner and Hootch,rocky
3,Rocky II,rocky
1,Rambo,rocky
...

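To run a regression, each judgment has to be described by numeric features of the query-document pair. Two simple examples:

1. How many times a search keyword occurs in the title field
2. How many times a search keyword occurs in the overview field

With the title and keyword replaced by these two counts, an illustrative training set has rows of grade, title count, and overview count: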

4,1,1
0,0,0
3,0,3
1,0,1

grade,titleScore,overviewScore,ratingScore,comment:#  keywords@movietitle
4,10.65,8.41,7.40,# 1366   rocky@Rocky
3,0.00,6.75,7.00,# 12412  rocky@Creed
3,8.22,9.72,6.60,# 1246   rocky@Rocky Balboa
3,8.22,8.41,0.00,# 1374   rocky@Rocky IV
3,8.22,7.68,6.90,# 1367   rocky@Rocky II
3,8.22,7.15,0.00,# 1375   rocky@Rocky V
3,8.22,5.28,0.00,# 1371   rocky@Rocky III
2,0.00,0.00,7.60,# 154019 rocky@Belarmino
2,0.00,0.00,7.10,# 1368   rocky@First Blood
2,0.00,0.00,6.70,# 13258  rocky@Son of Rambow
2,0.00,0.00,0.00,# 70808  rocky@Klitschko
2,0.00,0.00,0.00,# 64807  rocky@Grudge Match
2,0.00,0.00,0.00,# 47059  rocky@Boxing Gym
...

from sklearn.linear_model import LinearRegression
from math import sin
import numpy as np
import csv

rockyData = np.genfromtxt('rocky.csv', delimiter=',')[1:] # Remove the CSV header

rockyGrades = rockyData[:,0]   # Slice out column 0, where the grades are
rockySignals = rockyData[:,1:-1]  # Features in columns 1...all but last column (the comment)
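Why the features are sliced as `1:-1` may not be obvious. By default `np.genfromtxt` treats `#` as a comment marker and turns fields it cannot parse into `nan`, so both the header row and the trailing comment column come back as `nan`. A small check, using an in-memory copy of the header and two rows from the table above in place of `rocky.csv` (the `sample` and `raw` names are assumptions for illustration):

```python
import io
import numpy as np

# Header plus two data rows copied from the table above.
sample = io.StringIO(
    "grade,titleScore,overviewScore,ratingScore,comment:#  keywords@movietitle\n"
    "4,10.65,8.41,7.40,# 1366   rocky@Rocky\n"
    "3,0.00,6.75,7.00,# 12412  rocky@Creed\n"
)

raw = np.genfromtxt(sample, delimiter=',')
data = raw[1:]            # drop the header row, which parsed as all nan

grades = data[:, 0]       # column 0: relevance grades
signals = data[:, 1:-1]   # columns 1..3: title, overview, and rating scores

print(raw.shape)                     # 5 columns: the text after '#' leaves an empty last field
print(np.isnan(data[:, -1]).all())   # the comment column holds no numbers
print(signals)
```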


butIRegress = LinearRegression()
butIRegress.fit(rockySignals, rockyGrades)  # Learn one weight per signal plus an intercept


butIRegress.coef_  #boost for title, boost for overview, boost for rating

array([ 0.04999419,  0.22958357,  0.00573909])

butIRegress.intercept_

0.97040804634516986
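For readers without `rocky.csv`, the same kind of fit can be sketched on just the 13 rows listed earlier. This swaps in `np.linalg.lstsq`, which solves the same ordinary least squares problem that `LinearRegression` does, so only NumPy is needed. Because the full file is elided in the listing, the coefficients from this subset should not be expected to match the values printed above. The variable names are assumptions for illustration.

```python
import numpy as np

# grade, titleScore, overviewScore, ratingScore for the rows shown earlier
rows = np.array([
    [4, 10.65, 8.41, 7.40],
    [3, 0.00, 6.75, 7.00],
    [3, 8.22, 9.72, 6.60],
    [3, 8.22, 8.41, 0.00],
    [3, 8.22, 7.68, 6.90],
    [3, 8.22, 7.15, 0.00],
    [3, 8.22, 5.28, 0.00],
    [2, 0.00, 0.00, 7.60],
    [2, 0.00, 0.00, 7.10],
    [2, 0.00, 0.00, 6.70],
    [2, 0.00, 0.00, 0.00],
    [2, 0.00, 0.00, 0.00],
    [2, 0.00, 0.00, 0.00],
])
grades, signals = rows[:, 0], rows[:, 1:]

# Append a column of ones so the last fitted weight acts as the intercept.
design = np.column_stack([signals, np.ones(len(signals))])
weights, *_ = np.linalg.lstsq(design, grades, rcond=None)
coef, intercept = weights[:3], weights[3]

predicted = design @ weights
print("coef:", coef)
print("intercept:", intercept)
print("mean absolute error:", np.abs(predicted - grades).mean())
```

Fitting the intercept through a column of ones keeps the example to a single `lstsq` call; `LinearRegression` handles the intercept internally instead.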


def relevanceScore(intercept, titleCoef, overviewCoef, ratingCoef, titleScore, overviewScore, movieRating):
    return intercept + (titleCoef * titleScore) + (overviewCoef * overviewScore) + (ratingCoef * movieRating)


titleScore,overviewScore,movieRating,comment
12.28,9.82,6.40,# 7555  rambo@Rambo
0.00,10.76,7.10,# 1368  rambo@First Blood


# Score Rambo
relevanceScore(butIRegress.intercept_, butIRegress.coef_[0], butIRegress.coef_[1], butIRegress.coef_[2], titleScore=12.28, overviewScore=9.82, movieRating=6.40)
# Score First Blood
relevanceScore(butIRegress.intercept_, butIRegress.coef_[0], butIRegress.coef_[1], butIRegress.coef_[2], titleScore=0.00, overviewScore=10.76, movieRating=7.10)
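The two calls above depend on a fitted `butIRegress`. A self-contained sketch with the intercept and coefficients hard-coded from the values printed earlier makes the arithmetic reproducible without the training file. The `INTERCEPT`, `WEIGHTS`, `score`, and `candidates` names are assumptions for illustration, and the coefficients are rounded as printed.

```python
# Values copied from butIRegress.intercept_ and butIRegress.coef_ as printed earlier.
INTERCEPT = 0.97040804634516986
WEIGHTS = (0.04999419, 0.22958357, 0.00573909)  # title, overview, rating

def score(title_score, overview_score, movie_rating):
    """Linear relevance score: intercept plus the weighted sum of the signals."""
    signals = (title_score, overview_score, movie_rating)
    return INTERCEPT + sum(w * s for w, s in zip(WEIGHTS, signals))

# Signals for the query "rambo", taken from the two rows above.
candidates = {
    "Rambo": (12.28, 9.82, 6.40),
    "First Blood": (0.00, 10.76, 7.10),
}

# Order the candidates by descending predicted relevance.
ranked = sorted(candidates, key=lambda name: score(*candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {score(*candidates[name]):.3f}")
```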