Finding the Way in Recommendation: The Road of Douban's Recommender System in Practice

wlpwlfs

Contributed on 2015-10-20


Finding the Way in Recommendation: Douban's Recommender System in Practice
阿稳, Douban Algorithm Team

What today's talk covers: three case studies
How it is told: a three-step flow
• CF and content-based
• Expert recommendation
• From algorithm to system
Where is the key point?

Dilemma 1
The head effect
The awkwardness of UGC
CF correctly reflects the data space, yet produces meaningless results
Why do I simply know that they don't go together?
[Diagram labels: Genre_1, Genre_2, Genre_i | CF, CF, CF | Weighted, Weighted, Weighted]
Content-based grouping + Weighted-CF
Algorithms change the world

Dilemma 2
We need a new 豆瓣猜 (Douban Guess) for books
It should change often, so that I come back to look from time to time
It should recommend new books, so that there are surprises; please stop pushing 红楼梦 and 天龙八部
My reading interests have many sides; can it stop recommending only literature?
I don't trust machines; you have to at least tell me why this is being recommended
Cold start? Novelty? Diversity? Timeliness? Recommendation explanations?
A problem with CF?

The Wisdom of the Few
A Collaborative Filtering Approach Based on Expert Opinions from the Web

Xavier Amatriain, Telefonica Research, Via Augusta, 177, Barcelona 08021, Spain, xar@tid.es
Neal Lathia, Dept. of Computer Science, University College of London, Gower Street, London WC1E 6BT, UK, n.lathia@cs.ucl.ac.uk
Josep M. Pujol, Telefonica Research, Via Augusta, 177, Barcelona 08021, Spain, jmps@tid.es
Haewoon Kwak, KAIST Computer Science Dept., Kuseong-dong, Yuseong-gu, Daejeon 305-701, Korea, haewoon@an.kaist.ac.kr
Nuria Oliver, Telefonica Research, Via Augusta, 177, Barcelona 08021, Spain, nuriao@tid.es

ABSTRACT
Nearest-neighbor collaborative filtering provides a successful means of generating recommendations for web users. However, this approach suffers from several shortcomings, including data sparsity and noise, the cold-start problem, and scalability. In this work, we present a novel method for recommending items to users based on expert opinions. Our method is a variation of traditional collaborative filtering: rather than applying a nearest neighbor algorithm to the user-rating data, predictions are computed using a set of expert neighbors from an independent dataset, whose opinions are weighted according to their similarity to the user. This method promises to address some of the weaknesses in traditional collaborative filtering, while maintaining comparable accuracy. We validate our approach by predicting a subset of the Netflix data set. We use ratings crawled from a web portal of expert reviews, measuring results both in terms of prediction accuracy and recommendation list precision.
Finally, we explore the ability of our method to generate useful recommendations, by reporting the results of a user-study where users prefer the recommendations generated by our approach.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering

General Terms
Algorithms, Performance, Theory

Keywords
Recommender Systems, Collaborative Filtering, Experts, Cosine Similarity, Nearest Neighbors, Top-N Recommendations

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

1. INTRODUCTION
Collaborative filtering (CF) is the current mainstream approach used to build web-based recommender systems [1]. CF algorithms assume that in order to recommend items to users, information can be drawn from what other similar users liked in the past. The Nearest Neighbor algorithm, for instance, does so by finding, for each user, a number of similar users whose profiles can then be used to predict recommendations. However, defining similarity between users is not an easy task: it is limited by the sparsity and noise in the data and is computationally expensive.

In this work, we explore how professional raters in a given domain (i.e. experts) can predict the behavior of the general population. In recent work [2], we have found that a significant part of the error in explicit feedback-based CF algorithms is due to the noise in the users' explicit feedback. Therefore, we aim at using feedback from less noisy sources (i.e. experts in the context of this work) to build recommendations.
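We define an expert as an individual that we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain. Our goal is not to increase CF accuracy, but rather to: (a) study how preferences of a large population can be predicted by using a very small set of users; (b) understand the potential of an independent and uncorrelated data set to generate recommendations; (c) analyze whether professional raters are good predictors for general users; and (d) discuss how this approach addresses some of the traditional pitfalls in CF.

The contributions of this paper include:
1. Collecting and comparing, in Section 2, the characteristics of two datasets: the Netflix dataset (footnote 1) of user-movie ratings, and the opinions collected from the web from over 150 movie critics (experts).
2. Designing an approach to predict personalized user ratings from the opinions of the experts. Section 3 outlines traditional CF and describes the proposed algorithm.

Footnote 1: http://www.netflixprize.com

http://www.wentrue.net/blog/?p=1034
In most situations, what we need is not recommendations from users similar to ourselves, but recommendations from experts similar to ourselves

[Diagram labels: Detective, Time travel | choose | split, split, split | filter, filter, filter]
Groups without a clear face: statistical features
Individuals with a clear face: feature portraits
Addresses novelty, diversity, and audience segmentation
Addresses personalized matching
Algorithms change the world

Dilemma 3
How to solve the new-user problem
How to schedule among multiple algorithms
How to evaluate the quality of different algorithms
How to adjust algorithm parameters flexibly
How to balance the needs of different user groups in different states
How to respond to users' real-time feedback

http://www.wentrue.net/blog/?p=1034
One algorithm cannot solve every problem; with multiple algorithms, there are many problems to solve
The Duine approach
Dispatcher
We need a system like this
[Diagram labels: DJ, Parameters, DJ, DJ, DJ, DJ, Profile, Evaluator, Feedback]
Algorithms change the world

Site-wide user satisfaction: 0.75
New-user satisfaction: 0.6
Red-heart (like) rate of the best-performing DJ algorithm: 18%
No data, no truth; how the system performed at the time:
Compared with the popular-songs recommendation DJ (baseline): satisfaction 0.4, red-heart rate = trash-bin rate = 5%

More challenges, more opportunities for algorithms
• Get users through the cold-start phase as quickly as possible
• Keep a user's listening session consistent
• A closed loop for admitting new songs and retiring low-quality songs
• Personalization of public channels
• Recommendation algorithms for mobile
• New product forms
A recommender system that grows with its users

Q & A
This deck thanks Douban and http://images.google.com/; the images are not the author's copyright and are borrowed only.
http://www.douban.com/people/wentrue/
http://www.wentrue.net/blog/
http://www.weibo.com/wentrue
http://site.douban.com/Jobs/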
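
A minimal sketch of the expert-recommendation idea from Dilemma 2, assuming Python and plain dictionaries of ratings. It follows the quoted abstract's description (predictions computed from expert neighbors whose opinions are weighted by their similarity to the user), but it is not the paper's exact algorithm and not Douban's implementation. The function names, the `min_sim` and `min_experts` thresholds, the mean-centering step, and the toy ratings are all assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predict_from_experts(user_ratings, experts, item,
                         min_sim=0.1, min_experts=1):
    """Predict the user's rating of `item` from expert ratings only.

    Assumption: each expert's deviation from their own mean rating is
    weighted by the expert's similarity to the user, then added to the
    user's mean. Returns None when too few similar experts rated the item.
    """
    if not user_ratings:
        return None  # nothing to compare experts against
    user_mean = mean(user_ratings.values())
    weighted_sum = 0.0
    sim_total = 0.0
    used = 0
    for expert_ratings in experts.values():
        if item not in expert_ratings:
            continue
        sim = cosine_similarity(user_ratings, expert_ratings)
        if sim < min_sim:
            continue  # expert is not similar enough to this user
        deviation = expert_ratings[item] - mean(expert_ratings.values())
        weighted_sum += sim * deviation
        sim_total += sim
        used += 1
    if used < min_experts or sim_total == 0:
        return None
    return user_mean + weighted_sum / sim_total


if __name__ == "__main__":
    # Hypothetical toy data: one user and two critics.
    user = {"A": 5, "B": 1}
    experts = {
        "critic_1": {"A": 5, "B": 1, "C": 5},
        "critic_2": {"A": 1, "B": 5, "C": 1},
    }
    print(predict_from_experts(user, experts, "C"))
```

Because the user data is only used to measure similarity, the expert set can stay small and come from an independent source, which is the property the slides point to for novelty, diversity, and cold start.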
