Coen Stevens
Lead Recommendation Engineer
How to build a recommender system?
           Wakoopa use case
Mission:
Discover software & games
Software tracker
Windows         Mac          Linux
Your profile
Updates
Software pages
Recommendations
Building a recommender system
       Approach and challenges
Data
                      what do we have?

Usage (implicit)            vs.    Ratings (explicit)

•   Noisy                          •   Accurate
•   Only positive feedback         •   Positive and negative feedback
•   Easy to collect                •   Hard to collect
Data
                       what do we use?


•   Active users (Tracker activity in the past month): ~9,000

•   Actively used software items (in the past month): ~10,000

•   We calculate recommendations separately per OS, each combined
    with Web applications
Recommender system methods
Collaborative recommendations: The user will be
recommended items that people with similar tastes and
preferences liked (used) in the past

•   Item-based collaborative filtering

•   User-based collaborative filtering (we use this only to
    calculate user similarities, to find people like you)

•   Combining both methods
Item-Based Collaborative Filtering
           User software usage matrix
                     Software items

             220    90     -   180     -    22     -
             280    12    42     -    80     -     -
   Users     175   210     -   210     -    45     -
             165     -    35   195    13    25     -
               -   100    50   185     -    35   190
               -    60     -    65     -     -   185

             (- = no usage recorded)
User software usage matrix [0, 1]
                      Software items




              1   1      0     1       0   1   0

              1   1      1     0       1   0   0

Users         1   1      0     1       0   1   0

              1   0      1     1       1   1   0

              0   1      1     1       0   1   1

              0   1      0     1       0   0   1
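The step from raw usage to the [0, 1] matrix can be sketched as follows. This is a minimal illustration, not Wakoopa's code: the position of each empty cell in `raw_usage` is an assumption inferred from the 0/1 matrix, and `binarize` is a hypothetical helper name.

```python
# Raw usage values from the usage-matrix slide.
# None marks "no usage recorded"; the positions of the None cells are an
# assumption, inferred from the 0/1 matrix on the following slide.
raw_usage = [
    [220, 90, None, 180, None, 22, None],
    [280, 12, 42, None, 80, None, None],
    [175, 210, None, 210, None, 45, None],
    [165, None, 35, 195, 13, 25, None],
    [None, 100, 50, 185, None, 35, 190],
    [None, 60, None, 65, None, None, 185],
]


def binarize(matrix):
    """Map any recorded usage to 1 and missing usage to 0 (hypothetical helper)."""
    return [[0 if value is None else 1 for value in row] for row in matrix]


binary_usage = binarize(raw_usage)
for row in binary_usage:
    print(row)
```

Binarizing throws away how much each item was used; the deck brings that information back later through the usage correction.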
How do we predict the probability that I would like to use Gmail?
                              Software items

                      1   1      0     1       0   1   0

                      1   1      1     0       1   0   0

         Users        1   1      ?     1       0   1   0

                      1   0      1     1       1   1   0

                      0   1      1     1       0   1   1

                      0   1      0     1       0   0   1
Calculate the similarities between Gmail and the other software items.
                                            Software items




                                1       1       0       1    0   1   0

                                1       1       1       0    1   0   0

            Users               1       1       0       1    0   1   0

                                1       0       1       1    1   1   0

                                0       1       1       1    0   1   1

                                0       1       0       1    0   0   1


                    Cosine Similarity(Firefox, Gmail)
Popularity correction: we put less trust in popular software
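A rough sketch of the item-to-item cosine similarity on the 0/1 matrix, in plain Python. The column positions of Firefox and Gmail are assumptions (the slides show icons, not names), and the popularity correction is left out because the deck does not specify its form.

```python
import math

# The 0/1 usage matrix from the slides: rows are users, columns are items.
usage = [
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1],
]


def column(matrix, j):
    """Return column j as a list: one entry per user for a single item."""
    return [row[j] for row in matrix]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumed column positions; the deck does not label the columns.
FIREFOX, GMAIL = 0, 2

sim = cosine_similarity(column(usage, FIREFOX), column(usage, GMAIL))
print(round(sim, 3))
```

For binary vectors, the dot product is simply the number of users who use both items, so the similarity grows with co-usage and shrinks as either item gains users who do not use the other.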
Item-item correlation matrix



    1    0.1   0.6   0.1   0.1   0.1   0.7

   0.2   1     0.8   0.5   0.8   0.1   0.9

   0.1   0.6   1     0.5   0.7   0.2   0.3

   0.2   0.6   0.4   1     0.8   0.2   0.3

   0.5   0.4   0.4   0.4   1     0.1   0.2

   0.5   0.5   0.3   0.5   0.3   1     0.3

   0.2   0.6   0.3   0.8   0.7   0.7   1
Item-item correlation matrix: Gmail similarities

Gmail's similarities to the other six items (the third column of the
matrix, leaving out Gmail's similarity to itself):

          0.6   0.8   0.4   0.4   0.3   0.3
K-nearest neighbor approach

•   Performance vs quality

•   We take only the ‘K’ most similar items (say 4)

•   Space complexity: O(m + Kn)

•   Computational complexity: O(m + n²)

Gmail similarities: 0.6   0.8   0.4   0.4   0.3   0.3
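Selecting the K most similar items could look like the sketch below. The item labels are placeholders (the deck shows icons instead of names), and `k_nearest` is a hypothetical helper, not part of Wakoopa's implementation.

```python
import heapq

# Gmail's similarity to the six other items, as listed on the slide.
# The item labels are placeholders, not names from the deck.
gmail_similarities = {
    "item_1": 0.6,
    "item_2": 0.8,
    "item_4": 0.4,
    "item_5": 0.4,
    "item_6": 0.3,
    "item_7": 0.3,
}


def k_nearest(similarities, k):
    """Return the k (item, similarity) pairs with the highest similarity."""
    return heapq.nlargest(k, similarities.items(), key=lambda pair: pair[1])


neighbors = k_nearest(gmail_similarities, 4)
print(neighbors)
```

Storing only K neighbors per item is what keeps the similarity table small: it trades a little prediction quality for much less memory and work per prediction.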
Calculate the predicted value for Gmail

Gmail similarities    User usage (0/1)    User usage (corrected)
       0.6                   1                    0.9
       0.8                   1                    0.8
       0.4                   1                    0.6
       0.4                   -                     -
  (not in top K)             1                    0.2

Usage correction: more usage results in a higher score [0,1]

   (0.6 * 0.9) + (0.8 * 0.8) + (0.4 * 0.6)
   ---------------------------------------  ≈ 0.65
           0.6 + 0.8 + 0.4 + 0.4

•   User feedback

•   Contacts usage

•   Commercial vs Free
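The prediction is a similarity-weighted average over the K neighbors, which can be sketched as below. The function name `predict` is hypothetical, and the fourth neighbor is given a usage of 0.0 on the assumption that the user does not use it, which is why it appears in the denominator but not in the numerator.

```python
def predict(similarities, usage):
    """Similarity-weighted average of the user's usage over the K neighbors."""
    numerator = sum(s * u for s, u in zip(similarities, usage))
    denominator = sum(similarities)
    return numerator / denominator if denominator else 0.0


# Values from the slide: similarities of the four nearest neighbors to Gmail,
# and the user's corrected usage of each. The last usage value is an
# assumption: 0.0 because the user does not use that neighbor.
neighbor_similarities = [0.6, 0.8, 0.4, 0.4]
neighbor_usage = [0.9, 0.8, 0.6, 0.0]

score = predict(neighbor_similarities, neighbor_usage)
print(round(score, 2))
```

Evaluated with these numbers, the expression gives 1.42 / 2.2, or about 0.65. Dividing by the sum of all K similarities keeps the score in [0, 1] and lowers it for every close neighbor the user does not use.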
Calculate all unknown values and
show the Top-N recommendations to each user

(Figure: the users × software items matrix, with each entry the user does not yet use marked "?")
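Putting the steps together, a compact end-to-end sketch might look like this. It works on the 0/1 matrix only; the popularity and usage corrections are omitted, and all function names are hypothetical rather than taken from Wakoopa's code.

```python
import math

# The 0/1 usage matrix from the slides: rows are users, columns are items.
usage = [
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1],
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def item_similarities(matrix):
    """Item-item similarity matrix computed from the columns of the usage matrix."""
    columns = list(zip(*matrix))
    return [[cosine(a, b) for b in columns] for a in columns]


def predict(user_row, item, sims, k):
    """Weighted average of the user's usage over the k items most similar to `item`."""
    others = [j for j in range(len(user_row)) if j != item]
    neighbors = sorted(others, key=lambda j: sims[item][j], reverse=True)[:k]
    total = sum(sims[item][j] for j in neighbors)
    if total == 0:
        return 0.0
    return sum(sims[item][j] * user_row[j] for j in neighbors) / total


def top_n(matrix, user, n, k=4):
    """Score every item the user does not use yet and return the n best item indices."""
    sims = item_similarities(matrix)
    row = matrix[user]
    scores = {i: predict(row, i, sims, k) for i, used in enumerate(row) if not used}
    return sorted(scores, key=scores.get, reverse=True)[:n]


recommendations = top_n(usage, user=2, n=2)
print(recommendations)
```

In practice the similarity matrix would be computed once and reused for all users instead of being rebuilt inside `top_n`; it is recomputed here only to keep the sketch short.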
Explainability
             Why did I get this recommendation?


•   Overlap between the item’s (K) neighbors and your usage
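One way to turn that overlap into an explanation is sketched below. The item labels and the `explain` helper are hypothetical; the deck does not describe how the explanation is presented to the user.

```python
def explain(neighbors, used_items):
    """Return the recommended item's neighbors that the user already uses."""
    return [item for item in neighbors if item in used_items]


# Placeholder data: the K = 4 neighbors of the recommended item,
# and the set of items this user already uses.
gmail_neighbors = ["item_2", "item_1", "item_4", "item_5"]
my_items = {"item_1", "item_2", "item_4", "item_6"}

because_you_use = explain(gmail_neighbors, my_items)
print("Recommended because you use:", ", ".join(because_you_use))
```

Because the neighbors list is ordered by similarity, the strongest reasons come first.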
User-Based Collaborative Filtering
                                 Finding people like you

                                  1   1   0   1   0   1    0

                                  1   1   1   0   1   0    0

                                  1   1   0   1   0   1    0

                                  1   1   1   1   1   1    0

                                  0   1   1   1   0   1    1

                                  0   1   0   1   0   0    1

Cosine Similarity(Coen, Menno)
Applying inverse user frequency

        log(n/ni): ni is the number of users who use item i and n is
                  the total number of users in the database

                                    0.1   0.2   0     0.4   0     0.4   0

                                    0.1   0.2   0.6   0     0.8   0     0

                                    0.1   0.2   0     0.4   0     0.4   0

                                    0.1   0.2   0.6   0.4   0.8   0.4   0

                                    0     0.2   0.6   0.4   0     0.4   0.2

                                    0     0.2   0     0.4   0     0     0.2

Cosine Similarity(Coen, Menno)

        The fact that you both use TextMate tells you more than
                 the fact that you both use Firefox
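A minimal sketch of inverse user frequency weighting followed by user-to-user cosine similarity. The row indices chosen for Coen and Menno are assumptions, and the weights come from log(n/ni) on the 0/1 matrix, so they differ from the illustrative values on the slide.

```python
import math

# The 0/1 usage matrix from the user-based slide: rows are users, columns are items.
usage = [
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1],
]


def inverse_user_frequency(matrix):
    """Weight per item: log(n / n_i), where n_i is the number of users of item i."""
    n_users = len(matrix)
    weights = []
    for j in range(len(matrix[0])):
        n_i = sum(row[j] for row in matrix)
        weights.append(math.log(n_users / n_i) if n_i else 0.0)
    return weights


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


weights = inverse_user_frequency(usage)
weighted = [[w * v for w, v in zip(weights, row)] for row in usage]

# Assumed row positions; the deck does not say which rows are Coen and Menno.
COEN, MENNO = 3, 4
similarity = cosine(weighted[COEN], weighted[MENNO])
print(round(similarity, 3))
```

An item that every user has gets weight log(1) = 0 and drops out of the comparison entirely, while a rarely used item dominates it. That is the TextMate versus Firefox effect in its extreme form.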
User-user correlation matrix



     1     0.8   0.6   0.5   0.7   0.2

     0.8   1     0.4   0.7   0.5   0.5

     0.6   0.4   1     0.4   0.9   0.1

     0.5   0.8   0.4   1     0.6   0.4

     0.8   0.5   0.9   0.6   1     0.2

     0.2   0.5   0.1   0.4   0.2   1
Performance
                 measure for success

•   Cross-validation: Train-Test split (80-20)

•   Precision and Recall:
    - precision = size(hit set) / size(total given recs)
    - recall = size(hit set) / size(test set)

•   Root mean squared error (RMSE)
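These measures can be sketched in a few lines. The sample recommendations, test set and scores below are made up purely for illustration, and the function names are hypothetical.

```python
import math


def precision_recall(recommended, test_items):
    """Precision and recall of a list of unique recommendations against a held-out test set."""
    hits = set(recommended) & set(test_items)
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(test_items) if test_items else 0.0
    return precision, recall


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual values."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up example: four recommendations, three held-out items, two hits.
precision, recall = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
error = rmse([0.8, 0.4], [1.0, 0.0])
print(precision, round(recall, 3), round(error, 3))
```

Precision and recall judge the ranked Top-N list, while RMSE judges the predicted scores themselves, so the two can disagree about which algorithm is better.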
Implementation

•   Ruby Enterprise Edition (garbage collection)

•   MySQL database

•   Built our own C libraries

•   Amazon EC2:
    - Low cost
    - Flexibility
    - Ease of use

•   Open source
Future challenges


•   What is the best algorithm for Wakoopa? (or for you)

•   Reducing space-time complexity (scalability):
    - Parallelization (Clojure)
    - Distributed computing (Hadoop)
1 evening, 3 speakers, 100 developers
           www.recked.org
