1. Overview
- Importance of Recommender Systems: Recommender systems are vital in user-centric online services like e-commerce, media-sharing, and social networking sites. They provide personalized content suggestions, which help alleviate information overload and improve user experience, leading to increased traffic and profit for content providers.
- Role of Multimedia Data: Multimedia data such as images and videos are prevalent on the web. Recommender systems typically rely on collaborative filtering of user behavior data, and leveraging the rich signals in multimedia content can improve them.
- Challenge of Robustness: Despite advances in deep-learning-based multimedia recommendation, the robustness of these systems remains underexplored. Small perturbations to input images can significantly reduce recommendation accuracy, exposing a potential weakness.
- Proposed Solution: The paper introduces Adversarial Multimedia Recommendation (AMR), which uses adversarial learning to make multimedia recommender systems more robust. Training the model to withstand adversarial perturbations can improve both its robustness and its accuracy.
2. Preliminaries
- Latent Factor Model (LFM): A common recommendation approach that represents users and items as vectors in a shared latent space. A user's preference score for an item is estimated as the inner product of their latent vectors, $\hat{y}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$, where $\mathbf{p}_u$ and $\mathbf{q}_i$ are the latent vectors of user $u$ and item $i$.

- Visual Bayesian Personalized Ranking (VBPR): An extension of LFM for multimedia recommendation that incorporates visual features of items into the model. VBPR combines collaborative filtering and visual features to predict user preferences more accurately.

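- VBPR Predictive Model: $\hat{y}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i + \mathbf{h}_u^\top (\mathbf{E}\,\mathbf{c}_i)$, where $\mathbf{p}_u$ and $\mathbf{q}_i$ are the latent vectors for user $u$ and item $i$, respectively, $\mathbf{h}_u$ is the user's embedding in the image latent space, $\mathbf{c}_i$ is the visual feature vector of item $i$, and $\mathbf{E}$ is the transformation matrix that projects visual features into that space.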
- Vulnerability of VBPR: Because VBPR relies on image features extracted by a deep neural network, it inherits the DNN's susceptibility to adversarial attacks: small perturbations to images can drastically alter recommendation results.
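The two scoring rules can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the paper's code: vectors are plain lists, the helper names (`dot`, `matvec`, `lfm_score`, `vbpr_score`) and all numbers are hypothetical, and the bias terms of the original VBPR formulation are omitted. The last lines nudge one image feature to show how a perturbation propagates into the score.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: E[row][col]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))


def matvec(E: Matrix, c: Vector) -> Vector:
    """Project a visual feature vector c into the image latent space: E c."""
    return [dot(row, c) for row in E]


def lfm_score(p_u: Vector, q_i: Vector) -> float:
    """LFM preference score: inner product of user and item latent vectors."""
    return dot(p_u, q_i)


def vbpr_score(p_u: Vector, q_i: Vector, h_u: Vector, E: Matrix, c_i: Vector) -> float:
    """VBPR score without bias terms: p_u^T q_i + h_u^T (E c_i)."""
    return lfm_score(p_u, q_i) + dot(h_u, matvec(E, c_i))


# Hypothetical toy values: 2-d latent spaces, 3-d image features.
p_u = [0.5, 1.0]
q_i = [2.0, 1.0]
h_u = [1.0, 0.5]
E = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
c_i = [0.2, 0.4, 0.1]

print("LFM :", lfm_score(p_u, q_i))
print("VBPR:", vbpr_score(p_u, q_i, h_u, E, c_i))

# Nudge one image feature by 0.05; the score moves by h_u^T (E delta),
# which here is 1.0 * 2.0 * 0.05 = 0.1 (up to floating-point rounding).
delta = [0.0, 0.0, 0.05]
c_shifted = [c + d for c, d in zip(c_i, delta)]
print("score shift:", vbpr_score(p_u, q_i, h_u, E, c_shifted) - vbpr_score(p_u, q_i, h_u, E, c_i))
```

Because the visual term is linear in the feature vector, the direction in feature space that changes user u's score fastest is proportional to E^T h_u, which is exactly the kind of direction an adversarial perturbation can exploit.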
3. Adversarial Multimedia Recommendation (AMR)
- Predictive Model: AMR modifies VBPR by adding adversarial perturbations to the image features during training to enhance robustness. The model learns to minimize the impact of these perturbations on recommendation accuracy.

- Adversary Construction: Perturbations are applied to the deep image feature vector rather than the raw image pixels, making the learning process more efficient and less prone to overfitting.

- Objective Function: The training objective combines the original BPR loss with the BPR loss computed under adversarial perturbations. The adversary chooses perturbations that maximize this loss, while the model parameters are learned to minimize the combined objective, yielding a minimax game between the model and the adversary.
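The adversarial objective can be illustrated for a single training triple (u, i, j), where user u interacted with item i but not item j. This is a sketch under several assumptions not stated in this summary: scores take the bias-free VBPR form p_u^T q_i + h_u^T E(c_i + Δ_i), with the perturbation Δ_i added to the image feature vector; the per-triple BPR loss is -ln σ(ŷ_ui - ŷ_uj); each item's perturbation is bounded in L2 norm by a hypothetical budget `eps` and approximated by one normalized gradient step (a fast-gradient approximation; whether AMR normalizes exactly this way is an assumption); and `lam` is a hypothetical weight on the adversarial term. Only the adversary's step and the combined loss are shown; the outer minimization over model parameters is omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: E[row][col]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(E: Matrix, c: Vector) -> Vector:
    """E c: project image features into the image latent space."""
    return [dot(row, c) for row in E]


def mat_t_vec(E: Matrix, h: Vector) -> Vector:
    """E^T h: the gradient of h^T (E c) with respect to c."""
    return [sum(E[r][k] * h[r] for r in range(len(E))) for k in range(len(E[0]))]


def score(p_u: Vector, q_i: Vector, h_u: Vector, E: Matrix, c_i: Vector, delta_i: Vector) -> float:
    """Bias-free VBPR score with a perturbation added to the image feature."""
    c = [x + d for x, d in zip(c_i, delta_i)]
    return dot(p_u, q_i) + dot(h_u, matvec(E, c))


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def bpr_loss(x: float) -> float:
    """-ln sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def scale_to_norm(v: Vector, eps: float) -> Vector:
    """Rescale v to L2 norm eps; a zero vector stays zero."""
    n = math.sqrt(dot(v, v))
    return [0.0] * len(v) if n == 0.0 else [eps * x / n for x in v]


def adversarial_perturbation(p_u, q_i, q_j, h_u, E, c_i, c_j, eps) -> Tuple[Vector, Vector]:
    """One normalized gradient-ascent step on the BPR loss w.r.t. the feature perturbations.

    With x = y_ui - y_uj: d(-ln sigmoid(x))/dx = -sigmoid(-x) and dy_ui/d(delta_i) = E^T h_u,
    so the gradient is -sigmoid(-x) E^T h_u for delta_i and +sigmoid(-x) E^T h_u for delta_j.
    """
    zero = [0.0] * len(c_i)
    x = score(p_u, q_i, h_u, E, c_i, zero) - score(p_u, q_j, h_u, E, c_j, zero)
    g = sigmoid(-x)
    direction = mat_t_vec(E, h_u)
    return (scale_to_norm([-g * d for d in direction], eps),
            scale_to_norm([g * d for d in direction], eps))


def amr_loss(p_u, q_i, q_j, h_u, E, c_i, c_j, eps, lam) -> Tuple[float, float, float]:
    """Return (total, clean, adversarial), where total = clean + lam * adversarial."""
    zero = [0.0] * len(c_i)
    clean = bpr_loss(score(p_u, q_i, h_u, E, c_i, zero) - score(p_u, q_j, h_u, E, c_j, zero))
    d_i, d_j = adversarial_perturbation(p_u, q_i, q_j, h_u, E, c_i, c_j, eps)
    adv = bpr_loss(score(p_u, q_i, h_u, E, c_i, d_i) - score(p_u, q_j, h_u, E, c_j, d_j))
    return clean + lam * adv, clean, adv


# Hypothetical toy triple: user u interacted with item i but not with item j.
p_u, h_u = [0.5, 1.0], [1.0, 0.5]
q_i, q_j = [2.0, 1.0], [1.0, 0.5]
c_i, c_j = [0.2, 0.4, 0.1], [0.3, 0.1, 0.0]
E = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]

# eps (perturbation budget) and lam (adversarial weight) are hypothetical values.
total, clean, adv = amr_loss(p_u, q_i, q_j, h_u, E, c_i, c_j, eps=0.1, lam=1.0)
print(f"clean BPR loss: {clean:.4f}  adversarial: {adv:.4f}  total: {total:.4f}")
```

Because the score is linear in the image features, the gradient is derived by hand here; a full training loop would more likely use an automatic-differentiation framework and then update the model parameters to minimize `total`.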
