240


ID: 240
Original Title: Can Who-Edits-What Predict Edit Survival?
Sanitized Title: canwhoeditswhatpredicteditsurvival
Clean Title: Can Who-Edits-What Predict Edit Survival?
Source ID: 2
Article Id01: 211984064
Article Id02: oai:infoscience.epfl.ch:255885
Corpus ID: (not set)
Dup: (not set)
Dup ID: (not set)
Url: https://core.ac.uk/outputs/211984064
Publication Url: (not set)
Download Url: https://core.ac.uk/download/211984064.pdf
Original Abstract: As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor that is tailored to a specific peer-production system. In this work, we explore a different point in the solution space that goes beyond user reputation but does not involve any content-based feature of the edits. We view each edit as a game between the editor and the component of the project. We posit that the probability that an edit is accepted is a function of the editor's skill, of the difficulty of editing the component and of a user-component interaction term. Our model is broadly applicable, as it only requires observing data about who makes an edit, what the edit affects and whether the edit survives or not. We apply our model on Wikipedia and the Linux kernel, two examples of large-scale peer-production systems, and we seek to understand whether it can effectively predict edit survival: in both cases, we provide a positive answer. Our approach significantly outperforms those based solely on user reputation and bridges the gap with specialized predictors that use content-based features. It is simple to implement, computationally inexpensive, and in addition it enables us to discover interesting structure in the data.
Clean Abstract: (not set)
Tags: (not set)
Original Full Text:

Can Who-Edits-What Predict Edit Survival?

Ali Batuhan Yardım* (Bilkent University, batuhan.yardim@ug.bilkent.edu.tr), Victor Kristof (Ecole Polytechnique Fédérale de Lausanne, victor.kristof@epfl.ch), Lucas Maystre (Ecole Polytechnique Fédérale de Lausanne, lucas.maystre@epfl.ch), Matthias Grossglauser (Ecole Polytechnique Fédérale de Lausanne, matthias.grossglauser@epfl.ch)

* This work was done while the author was at EPFL.

ABSTRACT
As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor that is tailored to a specific peer-production system. In this work, we explore a different point in the solution space that goes beyond user reputation but does not involve any content-based feature of the edits. We view each edit as a game between the editor and the component of the project. We posit that the probability that an edit is accepted is a function of the editor's skill, of the difficulty of editing the component and of a user-component interaction term. Our model is broadly applicable, as it only requires observing data about who makes an edit, what the edit affects and whether the edit survives or not. We apply our model on Wikipedia and the Linux kernel, two examples of large-scale peer-production systems, and we seek to understand whether it can effectively predict edit survival: in both cases, we provide a positive answer. Our approach significantly outperforms those based solely on user reputation and bridges the gap with specialized predictors that use content-based features. It is simple to implement, computationally inexpensive, and in addition it enables us to discover interesting structure in the data.

CCS CONCEPTS
• Human-centered computing → Collaborative and social computing; Reputation systems; • Information systems → Web mining; • Computing methodologies → Machine learning.

KEYWORDS
peer-production systems; user-generated content; collaborative filtering; ranking

KDD 2018, August 19–23, 2018, London, United Kingdom. https://doi.org/10.1145/3219819.3219979

1 INTRODUCTION
Over the last two decades, the number and scale of online peer-production systems has become truly massive, driven by better information networks and advances in collaborative software. At the time of writing, 128 643 editors contribute regularly to 5+ million articles of the English Wikipedia [34] and over 15 600 developers have authored code for the Linux kernel [7]. On GitHub, 24 million users collaborate on 25.3 million active software repositories [14]. In order to ensure that such projects advance towards their goals, it is necessary to identify whether edits made by users are beneficial. As the number of users and components of the project grows, this task becomes increasingly challenging. In response, two types of solutions are proposed. On the one hand, some advocate the use of user reputation systems [2, 25]. These systems are general, their predictions are easy to interpret and can be made resistant to manipulations [10]. On the other hand, a number of highly specialized methods are proposed to automatically predict the quality of edits in particular peer-production systems [12, 15]. These methods can attain excellent predictive performance [16] and usually significantly outperform predictors that are based on user reputation alone [12], but they are tailored to a particular peer-production system, use domain-specific features and rely on models that are difficult to interpret.

In this work, we set out to explore another point in the solution space. We aim to keep the generality and simplicity of user reputation systems, while reaching the predictive accuracy of highly specialized methods. We ask the question: Can one predict the outcome of contributions simply by observing who edits what and whether the edits eventually survive? We address this question by proposing a novel statistical model of edit outcomes. We formalize the notion of collaborative project as follows. N users can propose edits on M distinct items (components of the project, such as articles on Wikipedia or a software's modules), and we assume that there is a process for validating edits (either immediately or over time). We observe triplets (u, i, q) that describe a user u ∈ {1, ..., N} editing an item i ∈ {1, ..., M} and leading to outcome q ∈ {0, 1}; the outcome q = 0 represents a rejected edit, whereas q = 1 represents an accepted, beneficial edit. Given a dataset of such observations, we seek to learn a model of the probability p_ui that an edit made by user u on item i is accepted. This model can then be used to help moderators and project maintainers prioritize their efforts once new edits appear: For example, edits that are unlikely to survive could be sent out for review immediately.

Our approach borrows from probabilistic models of pairwise comparisons [24, 36]. These models learn a real-valued score for each object (user or item) such that the difference between two objects' scores is predictive of comparison outcomes. We take a similar perspective and view each edit in a collaborative project as a game between the user who tries to effect change and the item that resists change¹. Similarly to pairwise-comparison models, our approach learns a real-valued score for each user and each item. In addition, it also learns latent features of users and items that capture interaction effects.

¹ Obviously, items do not really "resist" by themselves. Instead, this notion should be taken as a proxy for the combined action of other users (e.g., project maintainers) who can accept or reject an edit depending, among others, on standards of quality.

In contrast to quality-prediction methods specialized on a particular peer-production system, our approach is general and can be applied to any system in which users contribute by editing discrete items. It does not use any explicit content-based features: instead, it simply learns by observing triplets {(u, i, q)}. Furthermore, the resulting model parameters can be interpreted easily. They enable a principled way of a) ranking users by the quality of their contributions, b) ranking items by the difficulty of editing them and c) understanding the main dimensions of the interaction between users and items.

We apply our approach on two different peer-production systems. We start with Wikipedia and consider its Turkish and French editions. Evaluating the accuracy of predictions on an independent set of edits, we find that our model approaches the performance of the state of the art. More interestingly, the model parameters reveal important facets of the system. For example, we characterize articles that are easy or difficult to edit, respectively, and we identify clusters of articles that share common editing patterns. Next, we turn our attention to the Linux kernel. In this project, contributors are typically highly skilled professionals, and the edits that they make affect 394 different subsystems (kernel components). In this instance, our model's predictions are more accurate than a random forest classifier trained on domain-specific features. In addition, we give an interesting qualitative description of subsystems based on their difficulty score.

In short, our paper a) gives evidence that observing who edits what can yield valuable insights into peer-production systems and b) proposes a statistically grounded and computationally inexpensive method to do so. The analysis of two peer-production systems with very distinct characteristics demonstrates the generality of the approach.

Organization of the Paper. We start by reviewing related literature in Section 2. In Section 3, we describe our statistical model of edit outcomes and briefly discuss how to efficiently learn a model from data. In Sections 4 and 5, we investigate our approach in the context of Wikipedia and of the Linux kernel, respectively. Finally, we conclude in Section 6.

2 RELATED WORK
With the growing size and impact of online peer-production systems, the task of assessing contribution quality has been extensively studied. We review various approaches to the problem of quantifying and predicting the quality of user contributions and contrast them to our approach.

User Reputation Systems. Reputation systems have been a long-standing topic of interest in relation to peer-production systems and, more generally, in relation to online services [25]. Adler and de Alfaro [2] propose a point-based reputation system for Wikipedia and show that reputation scores are predictive of the future quality of editing. As almost all edits to Wikipedia are immediately accepted, the authors define an implicit notion of edit quality by measuring how much of the introduced changes is retained in future edits. The ideas underpinning the computation of implicit edit quality are extended and refined in subsequent papers [3, 10]. This line of work leads to the development of WikiTrust [11], a browser add-on that highlights low-reputation texts in Wikipedia articles. When applying our methods to Wikipedia, we follow the same idea of measuring quality implicitly through the state of the article at subsequent revisions. We also demonstrate that by automatically learning properties of the item that a user edits (in addition to learning properties of the user, such as a reputation score) we can substantially improve predictions of edit quality. This was also noted recently by Tabibian et al. [28] in a setting similar to ours, but using a temporal point process framework.

Specialized Classifiers. Several authors propose quality-prediction methods tailored to a specific peer-production system. Typically, these methods consist of a machine-learned classifier trained on a large number of content-based and system-based features of the users, the items and the edits themselves. Druck et al. [12] fit a maximum entropy classifier for estimating the lifespan of a given Wikipedia edit, using a definition of edit longevity similar to that of Adler and de Alfaro [2]. They consider features based on the edit's content (such as: number of words added / deleted, type of change, capitalization and punctuation, etc.) as well as features based on the user, the time of the edit and the article. Their model significantly outperforms a baseline that only uses features of the user. Other methods use support vector machines [6], random forests [6, 17] or binary logistic regression [23], with varying levels of success. In some cases, content-based features are refined using natural-language processing, leading to substantial performance improvements. However, these improvements are made to the detriment of general applicability. For example, competitive natural language processing tools have yet to be developed for the Turkish language (we investigate the Turkish Wikipedia in Section 4). In contrast to these methods, our approach is general and broadly applicable. Furthermore, the use of black-box classifiers can hinder the interpretability of predictions, whereas we propose a statistical model whose parameters are straightforward to interpret.

Truth Inference. In crowdsourcing, a problem related to ours consists of jointly estimating a) model parameters (such as user skills or item difficulties) that are predictive of contribution quality, and b) the quality of each contribution, without ground truth [9]. Our problem is therefore easier, as we assume access to ground-truth information about the outcome (quality) of past edits. Nevertheless, some methods developed in the crowdsourcing context [31, 32, 37] provide models that can be applied to our setting as well. In Sections 4 and 5, we compare our models to GLAD [32].

Pairwise Comparison Models. Our approach draws inspiration from probabilistic models of pairwise comparisons. These have been studied extensively over the last century in the context of psychometrics [5, 29], item response theory [24], chess rankings [13, 36], and more. The main paradigm posits that every object i has a latent strength (skill or difficulty) parameter θ_i, and that the probability p_ij of observing object i "winning" over object j increases with the distance θ_i − θ_j. Conceptually, our model is closest to that of Rasch [24].

Collaborative Filtering. Our method also borrows from collaborative filtering techniques popular in the recommender systems community. In particular, some parts of our model are remindful of matrix-factorization techniques [19]. These techniques automatically learn low-dimensional embeddings of users and items based on ratings, with the purpose of producing better recommendations. Our work shows that these ideas can also be helpful in addressing the problem of predicting outcomes of edits in peer-production systems. Like collaborative-filtering methods, our approach is exposed to the cold-start problem: with no (or few) observations about a given user or item, the predictions are notably less accurate. In practice, this problem can be addressed, e.g., by using additional features of users and / or items [21, 27] or by clustering users [22].
3 STATISTICAL MODELS
In this section, we describe and explain two variants of a statistical model of edit outcomes based on who edits what. In other words, we develop models that are predictive of the outcome q ∈ {0, 1} of a contribution of user u on item i. To this end, we represent the probability p_ui that an edit made by user u on item i is successful. In collaborative projects of interest, most users typically interact with only a small number of items. In order to deal with the sparsity of interactions, we postulate that the probabilities {p_ui} lie on a low-dimensional manifold and propose two model variants of increasing complexity. In both cases, the parameters of the model have intuitive effects and can be interpreted easily.

Basic Variant. The first variant of our model is directly inspired by the Rasch model [24]. The probability that an edit is accepted is defined as

    p_{ui} = \frac{1}{1 + \exp[-(s_u - d_i + b)]},    (1)

where s_u ∈ R is the skill of user u, d_i ∈ R is the difficulty of item i, and b ∈ R is a global parameter that encodes the overall skew of the distribution of outcomes. We call this model variant interank basic. Intuitively, the model predicts the outcome of a "game" between an item with inertia and a user who would like to effect change. The skill quantifies the ability of the user to enforce a contribution, whereas the difficulty quantifies how "resistant" to contributions the particular item is.

Similarly to reputation systems [2], interank basic learns a score for each user; this score is predictive of edit quality. However, unlike these systems, our model also takes into account that some items might be more challenging to edit than others. For example, on Wikipedia, we can expect high-traffic, controversial articles to be more difficult to edit than less popular articles. As with user skills, the article difficulty can be inferred automatically from observed outcomes.

Full Variant. Although the basic variant is conceptually attractive, it might prove to be too simplistic in some instances. In particular, the basic variant implies that if user u is more skilled than user v, then p_ui > p_vi for all items i. In many peer-production systems, users tend to have their own specializations and interests, and each item in the project might require a particular mix of skills. For example, with the Linux kernel, an engineer specialized in file systems might be successful in editing a certain subset of software components, but might be less proficient in contributing to, say, network drivers, whereas the situation might be exactly the opposite for another engineer. In order to capture the multidimensional interaction between users and items, we add a bilinear term to the probability model (1). Letting x_u, y_i ∈ R^D for some dimensionality D ∈ N_{>0}, we define

    p_{ui} = \frac{1}{1 + \exp[-(s_u - d_i + x_u^\top y_i + b)]}.    (2)

We call the corresponding model variant interank full. The vectors x_u and y_i can be thought of as embedding users and items as points in a latent D-dimensional space. Informally, p_ui increases if the two points representing a user and an item are close to each other, and it decreases if they are far from each other (e.g., if the vectors have opposite signs). If we slightly oversimplify, the parameter y_i can be interpreted as describing the set of skills needed to successfully edit item i, whereas x_u describes the set of skills displayed by user u. The bilinear term is reminiscent of matrix-factorization approaches in recommender systems [19]; indeed, this variant can be seen as a collaborative-filtering method. In true collaborative-filtering fashion, our model is able to learn the latent feature vectors {x_u} and {y_i} jointly, by taking into consideration all edits and without any additional content-based features.

Finally, note that the skill and difficulty parameters are retained in this variant and can still be used to explain first-order effects. The bilinear term explains only the additional effect due to the user-item interaction.

3.1 Learning the Model
From (1) and (2), it should be clear that our probabilistic model assumes no data other than the identity of the user and that of the item. This makes it generally applicable to any peer-production system in which users contribute to discrete items.

Given a dataset of K independent observations D = {(u_k, i_k, q_k) | k = 1, ..., K}, we infer the parameters of the model by maximizing their likelihood under D. That is, collecting all model parameters into a single vector θ, we seek to minimize the negative log-likelihood

    -\ell(\theta; D) = \sum_{(u,i,q) \in D} [-q \log p_{ui} - (1 - q) \log(1 - p_{ui})],    (3)

where p_ui depends on θ. In the basic variant, the negative log-likelihood is convex, and we can easily find a global minimizer by using standard methods from convex optimization. In the full variant, the bilinear term breaks the convexity of the objective function, and we can no longer guarantee that we will find parameters that are global minimizers. In practice, we do not observe any convergence issues but reliably find good model parameters on all datasets.

Note that (3) easily generalizes from binary outcomes (q ∈ {0, 1}) to continuous-valued outcomes (q ∈ [0, 1]). Continuous values can be used to represent the fraction of the edit that is successful.

Implementation. We implement the models in Python by using the TensorFlow library [1]. Our code is publicly available online at https://github.com/lca4/interank. In order to avoid overfitting the model to the training data, we add a small amount of ℓ2 regularization to the negative log-likelihood. We minimize the negative log-likelihood by using stochastic gradient descent [4] with small batches of data. For interank full, we set the number of latent dimensions to D = 20 by cross-validation.

Running Time. Our largest experiment consists of learning the parameters of interank full on the entire history of the French Wikipedia (c.f. Section 4), consisting of over 65 million edits by 5 million users on 2 million items. In this case, our TensorFlow implementation takes approximately 2 hours to converge on a single machine. In most other experiments, our implementation takes only a few minutes to converge. This demonstrates that our model effortlessly scales, even to the largest peer-production systems.

3.2 Applicability
Our approach models the difficulty of effecting change through the affected item's identity. As such, it applies particularly well to peer-production systems where users cooperate to improve the project, i.e., where each edit is judged independently against an item's (latent) quality standards. This model is appropriate for a wide variety of projects, ranging from online knowledge bases (such as Wikipedia, c.f. Section 4) to open source software (such as the Linux kernel project, c.f. Section 5). In some peer-production systems, however, the contributions of different users compete against each other, such as multiple answers to a single question on a Q&A platform. In these cases, our model can still be applied, but fails to capture the fact that edit outcomes are interdependent.
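To make the model concrete, here is a minimal NumPy sketch of equations (1) to (3): a logistic model with a user skill, an item difficulty, a global offset and an optional bilinear term, trained by stochastic gradient descent on the negative log-likelihood. It is only an illustration under the definitions above, not the authors' TensorFlow implementation (available at the repository linked in Section 3.1); the class name, learning rate and regularization strength are arbitrary assumptions.

```python
import numpy as np

class Interank:
    """Sketch of interank full (eqs. 1-3); set n_dim=0 to recover interank basic."""

    def __init__(self, n_users, n_items, n_dim=20, l2=1e-4, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.s = np.zeros(n_users)                              # user skills s_u
        self.d = np.zeros(n_items)                              # item difficulties d_i
        self.b = 0.0                                            # global offset b
        self.x = 0.01 * rng.standard_normal((n_users, n_dim))   # user embeddings x_u
        self.y = 0.01 * rng.standard_normal((n_items, n_dim))   # item embeddings y_i
        self.l2, self.lr = l2, lr

    def predict(self, u, i):
        """Probability that an edit by user u on item i is accepted (eq. 2)."""
        z = self.s[u] - self.d[i] + np.sum(self.x[u] * self.y[i]) + self.b
        return 1.0 / (1.0 + np.exp(-z))

    def sgd_step(self, u, i, q):
        """One stochastic gradient step on the negative log-likelihood (eq. 3)."""
        err = self.predict(u, i) - q      # derivative of the per-edit loss w.r.t. z
        xu, yi = self.x[u].copy(), self.y[i].copy()
        self.s[u] -= self.lr * (err + self.l2 * self.s[u])
        self.d[i] -= self.lr * (-err + self.l2 * self.d[i])
        self.b    -= self.lr * err
        self.x[u] -= self.lr * (err * yi + self.l2 * xu)
        self.y[i] -= self.lr * (err * xu + self.l2 * yi)

    def fit(self, triplets, n_epochs=5):
        """triplets: iterable of (u, i, q) with q in [0, 1]."""
        for _ in range(n_epochs):
            for u, i, q in triplets:
                self.sgd_step(u, i, q)
        return self
```

With n_dim=0 the bilinear term vanishes and only the skill, difficulty and offset parameters remain, which corresponds to the basic variant; continuous-valued outcomes q ∈ [0, 1] are handled by the same update, as noted above for equation (3).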
4 WIKIPEDIA
Wikipedia is a popular free online encyclopedia and arguably one of the most successful peer-production systems. In this section, we apply our models to the French and Turkish editions of Wikipedia.

4.1 Background & Datasets
The French Wikipedia is one of the largest Wikipedia editions. At the time of writing, it ranks in third position both in terms of number of edits and number of users². In order to obtain a complementary perspective, we also study the Turkish Wikipedia, which is roughly an order of magnitude smaller. Interestingly, both the French and the Turkish editions score very highly on Wikipedia's depth scale, a measure of collaborative quality [33]. The Wikimedia Foundation releases periodically and publicly a database dump containing the successive revisions to all articles³. In this paper, we use a dump that contains data starting from the beginning of the edition up to the fall of 2017.

² We chose the French edition over the English one because our computing infrastructure could not support the ≈ 15 TB needed to store the entire history of the English Wikipedia. The French edition contains roughly 5× fewer edits.
³ See: https://dumps.wikimedia.org/.

4.1.1 Computation of Edit Quality. On Wikipedia, any user's edit is immediately incorporated into the encyclopedia⁴. Therefore, in order to obtain information about the quality of an edit, we have to consider the implicit signal given by subsequent edits to the same article. If the changes introduced by the edit are preserved, it signals that the edit was beneficial, whereas if the changes are reverted, the edit likely had a negative effect. A formalization of this idea is given by Adler and de Alfaro [2] and Druck et al. [12]; see also de Alfaro and Adler [10] for a concise explanation. In this paper, we essentially follow their approach.

⁴ Except for a small minority of protected articles.

Consider a particular article and denote by v_k its k-th revision (i.e., the state of the article after the k-th edit). Let d(u, v) be the Levenshtein distance between two revisions [20]. We define the quality of edit k from the perspective of the article's state after ℓ ≥ 1 subsequent edits as

    q_{k|\ell} = \frac{1}{2} + \frac{d(v_{k-1}, v_{k+\ell}) - d(v_k, v_{k+\ell})}{2\, d(v_{k-1}, v_k)}.

By properties of distances, q_{k|ℓ} ∈ [0, 1]. Intuitively, the quantity q_{k|ℓ} captures the proportion of work done in edit k that remains in revision k + ℓ. It can be understood as a soft measure of whether edit k has been reverted or not. We compute the unconditional quality of the edit by averaging over multiple future revisions:

    q_k = \frac{1}{L} \sum_{\ell=1}^{L} q_{k|\ell},    (4)

where L is the minimum between the number of subsequent revisions of the article and 10 (we empirically found that 10 revisions is enough to accurately assess the quality of an edit). Note that even though q_k is no longer binary, our models naturally extend to continuous-valued q_k ∈ [0, 1] (c.f. Section 3.1).

In practice, we observe that edit quality is bimodal and asymmetric. Most edits have a quality close to either 0 or 1 and a majority of edits are of high quality. The two rightmost columns of Table 1 quantify this for the French and Turkish editions.
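As an illustration of the quality measure just defined, the following Python sketch computes q_{k|ℓ} and q_k for a list of revision texts, using a plain dynamic-programming Levenshtein distance. It is a sketch only: it assumes a character-level distance over full revision texts (the paper does not commit to a granularity beyond citing [20]), and a real pipeline would need a much more efficient distance computation for long articles.

```python
def levenshtein(a, b):
    """Plain dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def edit_quality(revisions, k, max_lookahead=10):
    """Quality q_k of edit k (eq. 4), with revisions[j] = article text after edit j.
    Returns None if the edit has no later revision to judge it against (assumed convention)."""
    v_prev, v_k = revisions[k - 1], revisions[k]
    denom = 2 * levenshtein(v_prev, v_k)
    if denom == 0:                        # the edit changed nothing; treat as neutral
        return 0.5
    L = min(len(revisions) - 1 - k, max_lookahead)
    if L == 0:
        return None
    vals = []
    for ell in range(1, L + 1):
        v_future = revisions[k + ell]
        q = 0.5 + (levenshtein(v_prev, v_future) - levenshtein(v_k, v_future)) / denom
        vals.append(min(max(q, 0.0), 1.0))    # clamp for numerical safety
    return sum(vals) / len(vals)
```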
4.1.2 Dataset Preprocessing. We consider all edits to the pages in the main namespace (i.e., articles), including those from anonymous contributors identified by their IP address⁵. Sequences of consecutive edits to an article by the same user are collapsed into a single edit in order to remove bias in the computation of edit quality [2]. To evaluate methods in a realistic setting, we split the data into a training set containing the first 90 % of edits, and we report results on an independent validation set containing the remaining 10 %. Note that the quality is computed based on subsequent revisions of an article: In order to guarantee that the two sets are truly independent, we make sure that we never use any revisions from the validation set to compute the quality of edits in the training set. A short summary of the data statistics after preprocessing is provided in Table 1.

⁵ Note, however, that a large majority of edits are made by registered users (82.7 % and 76.6 % for the French and Turkish editions, respectively).

Table 1: Summary statistics of Wikipedia datasets after preprocessing.

Edition | # users N | # articles M | # edits | First edit | Last edit | % edits with q < 0.2 | % edits with q > 0.8
French  | 5 460 745 | 1 932 810 | 65 430 838 | 2001-08-04 | 2017-09-02 | 6.4 % | 72.2 %
Turkish | 1 360 076 | 310 991 | 8 768 258 | 2002-12-05 | 2017-10-01 | 11.6 % | 60.5 %

4.2 Evaluation
In order to facilitate the comparison of our method with competing approaches, we evaluate the performance on a binary classification task consisting of predicting whether an edit is of poor quality. To this end, we assign binary labels to all edits in the validation set: the label bad is assigned to every edit with q < 0.5, and the label good is assigned to all edits with q ≥ 0.5. The predictions of the classifier might help Wikipedia administrators to identify edits of low quality; these edits might then be sent to domain experts for review.

As discussed in Section 3, we consider two versions of our model. The first one, interank basic, simply learns scalar user skills and article difficulties. The second one, interank full, additionally includes a latent embedding of dimension D = 20 for each user and article.

4.2.1 Competing Approaches. To set our results in context, we compare them to those obtained with four different baselines.

Average. The first approach always outputs the marginal probability of a bad edit in the training set, i.e.,

    p = (# bad edits in training set) / (# edits in training set).

This is a trivial baseline, and it gives an idea of what results we should expect to achieve without any additional information on the user, article or edit.

User-Only. The second approach models the outcome of an edit using only the user's identity. In short, the predictor learns skills {s_u | u = 1, ..., N} and a global offset b such that, for each user u, the probability

    p_u = \frac{1}{1 + \exp[-(s_u + b)]}

maximizes the likelihood of that user's edits in the training set. This baseline predictor is representative of user reputation systems such as that of Adler and de Alfaro [2].

GLAD. In the context of crowdsourcing, Whitehill et al. [32] propose the GLAD model that postulates that

    p_{ui} = \frac{1}{1 + \exp(-s_u / d_i)},

where s_u ∈ R and d_i ∈ R_{>0}. This reflects a different assumption on the interplay between user skill and item difficulty: under their model, an item with a large difficulty value makes every user's skill more "diffuse". In order to make the comparison fair, we add a global offset parameter b to the model (similarly to interank and the user-only baseline).

ORES reverted. The fourth approach is a state-of-the-art classifier developed by researchers at the Wikimedia Foundation as part of Wikipedia's Objective Revision Evaluation Service [15]. We use the two classification models specifically developed for the French and Turkish editions. Both models use over 80 content-based and system-based features extracted from the user, the article and the edit to predict whether the edit will be reverted, a target which essentially matches our operational definition of bad edit. Features include the number of vulgar words introduced by the edit, the length of the article and of the edit, etc. This predictor is representative of specialized, domain-specific approaches to modeling edit quality.

Table 2: Predictive performance on the bad edit classification task for the French and Turkish editions of Wikipedia. The best performance is highlighted in bold.

Edition | Model | Avg. log-likelihood | AUPRC
French | interank basic | −0.339 | 0.399
French | interank full | −0.336 | 0.413
French | Average | −0.389 | 0.131
French | User-only | −0.346 | 0.313
French | GLAD | −0.344 | 0.369
French | ORES reverted | −0.469 | 0.453
Turkish | interank basic | −0.380 | 0.494
Turkish | interank full | −0.379 | 0.503
Turkish | Average | −0.461 | 0.168
Turkish | User-only | −0.390 | 0.410
Turkish | GLAD | −0.387 | 0.471
Turkish | ORES reverted | −0.392 | 0.552

4.2.2 Results. Table 2 presents the average log-likelihood and the area under the precision-recall curve (AUPRC) for each method. interank full has the highest average log-likelihood of all models, meaning that its predictive probabilities are well calibrated with respect to the validation data.

Figure 1: Precision-recall curves on the bad edit classification task for the Turkish and French editions of Wikipedia (left and center). Average log-likelihood as a function of the number of observations of the user and item in the training set (right).

Figure 1 (left and center) presents the precision-recall curves for all methods. The analysis is qualitatively similar for both Wikipedia editions. All non-trivial predictors perform similarly in the high-recall regime, but present significant differences in the high-precision regime, on which we will focus. The ORES predictor performs the best. interank comes second, reasonably close behind ORES, and the full variant has a small edge over the basic variant. GLAD is next, and the user-only baseline is far behind. This shows that a) incorporating information about the article being edited is crucial for achieving a good performance on a large portion of the precision-recall trade-off, and b) modeling the outcome probability by using the difference between skill and difficulty (interank) is better than by using the ratio (GLAD).
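For reference, the two reported metrics can be computed from validation predictions roughly as in the sketch below. It assumes that a predicted acceptance probability p_ui is available for every validation edit, that bad edits (q < 0.5) are the positive class scored by 1 − p_ui, and it uses scikit-learn for the precision-recall curve; this is an illustrative reconstruction, not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def evaluate(q_true, p_accept):
    """q_true: observed edit qualities in [0, 1]; p_accept: predicted p_ui for the same edits."""
    q_true = np.asarray(q_true, dtype=float)
    p_accept = np.clip(np.asarray(p_accept, dtype=float), 1e-12, 1 - 1e-12)
    # Average log-likelihood of the (possibly continuous) outcomes, as in eq. (3).
    avg_ll = np.mean(q_true * np.log(p_accept) + (1 - q_true) * np.log(1 - p_accept))
    # AUPRC on the binary task: "bad" edits (q < 0.5) are the positive class,
    # scored by the predicted probability of rejection 1 - p_ui.
    y_bad = (q_true < 0.5).astype(int)
    precision, recall, _ = precision_recall_curve(y_bad, 1 - p_accept)
    auprc = auc(recall, precision)
    return avg_ll, auprc
```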
We also note that in the validation set, approximately 20 % (15 %) of edits are made by users (respectively, on articles) that are never encountered in the training set (the numbers are similar in both editions). In these cases, interank reverts to average predictions, whereas content-based methods can take advantage of other features of the edit to make an informed prediction. In order to explore this cold-start effect in more detail, we group users and articles into bins based on the number of times they appear in the training set, and we compute the average log-likelihood of validation examples separately for each bin. Figure 1 (right) presents the results for the French edition; the results for the Turkish edition are similar. Clearly, predictions for users and articles present in the training set are significantly better. In a practical deployment, several methods can help to address this issue [21, 22, 27]. A thorough investigation of ways to mitigate the cold-start problem is beyond the scope of this paper.

In summary, we observe that our model, which incorporates the articles' identity, is able to bridge the gap between a user-only prediction approach and a specialized predictor (ORES reverted). Furthermore, modeling the interaction between user and article (interank full) is beneficial and helps further improve predictions, particularly in the high-precision regime.

4.3 Interpretation of Model Parameters
The parameters of interank models, in addition to being predictive of edit outcomes, are also very interpretable. In the following, we demonstrate how they can surface interesting characteristics of the peer-production system.

4.3.1 Controversial Articles. Intuitively, we expect an article i whose difficulty parameter d_i is large to deal with topics that are potentially controversial. We focus on the French Wikipedia and explore a list of the ten most controversial articles given by Yasseri et al. [35]. In this 2014 study, the authors identify controversial articles by using an ad-hoc methodology. Table 3 presents, for each article identified by Yasseri et al., the percentile of the corresponding difficulty parameter d_i learned by interank full. We analyze these articles approximately four years later, but the model still identifies them as some of the most difficult ones. Interestingly, the article on Sigmund Freud, which has the lowest difficulty parameter of the list, has become a featured article since Yasseri et al.'s analysis, a distinction awarded only to the most well-written and neutral articles.

Table 3: The ten most controversial articles on the French Wikipedia according to Yasseri et al. [35]. For each article i, we indicate the percentile of its corresponding parameter d_i.

Rank | Title | Percentile of d_i
1 | Ségolène Royal | 99.840 %
2 | Unidentified flying object | 99.229 %
3 | Jehovah's Witnesses | 99.709 %
4 | Jesus | 99.953 %
5 | Sigmund Freud | 97.841 %
6 | September 11 attacks | 99.681 %
7 | Muhammad al-Durrah incident | 99.806 %
8 | Islamophobia | 99.787 %
9 | God in Christianity | 99.712 %
10 | Nuclear power debate | 99.304 %
median | | 99.710 %

4.3.2 Latent Factors. Next, we turn our attention to the parameters {y_i}. These parameters can be thought of as an embedding of the articles in a latent space of dimension D = 20. As we learn a model that maximizes the likelihood of edit outcomes, we expect these embeddings to capture latent article features that explain edit outcomes. In order to extract the one or two directions that explain most of the variability in this latent space, we apply principal component analysis [4] to the matrix Y = [y_i].

In Table 4, we consider the Turkish Wikipedia and list a subset of the 20 articles with the highest and lowest coordinates along the first principal axis of Y. We observe that this axis seems to distinguish articles about popular culture from those about "high culture" or timeless topics. This discovery supports the hypothesis that users have a propensity to successfully edit either popular culture or high-culture articles on Wikipedia, but not both.

Table 4: A selection of articles of the Turkish Wikipedia among the top-20 highest and lowest coordinates along the first principal axis of the matrix Y.

Direction | Titles
Lowest | Harry Potter's magic list, List of programs broadcasted by Star TV, Bursaspor 2011-12 season, Kral Pop TV Top 20, Death Eater, Heroes (TV series), List of programs broadcasted by TV8, Karadayı, Show TV, List of episodes of Kurtlar Vadisi Pusu.
Highest | Seven Wonders of the World, Thomas Edison, Cell, Mustafa Kemal Atatürk, Albert Einstein, Democracy, Isaac Newton, Mehmed the Conqueror, Leonardo da Vinci, Louis Pasteur.
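The principal-component analysis described above could be reproduced along the following lines, assuming the learned item embeddings are stacked into a matrix Y (M rows, D columns) and that a parallel list of article titles is available; the function and variable names are illustrative, and scikit-learn's PCA stands in for whatever implementation the authors used.

```python
import numpy as np
from sklearn.decomposition import PCA

def extreme_articles(Y, titles, n_components=2, top_k=20):
    """Project item embeddings onto principal axes and list the extreme articles per axis."""
    coords = PCA(n_components=n_components).fit_transform(Y)
    out = {}
    for axis in range(n_components):
        order = np.argsort(coords[:, axis])
        out[axis] = {
            "lowest": [titles[i] for i in order[:top_k]],
            "highest": [titles[i] for i in order[-top_k:][::-1]],
        }
    return out

# e.g., axes = extreme_articles(model.y, article_titles)   # hypothetical variable names
```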
This discovery supports the hypothesisthat users have a propensity to successfully edit either popularculture or high-culture articles on Wikipedia, but not both.Can Who-Edits-What Predict Edit Survival? KDD 2018, August 19–23, 2018, London, United KingdomTable 4: A selection of articles of the Turkish Wikipedia among the top-20 highest and lowest coordinates along the firstprincipal axis of the matrix Y .Direction TitlesLowest Harry Potter’s magic list, List of programs broadcasted by Star TV, Bursaspor 2011-12 season, Kral Pop TV Top 20, DeathEater, Heroes (TV series), List of programs broadcasted by TV8, Karadayı, Show TV, List of episodes of Kurtlar Vadisi Pusu.Highest Seven Wonders of the World, Thomas Edison, Cell, Mustafa Kemal Atatürk, Albert Einstein, Democracy, Isaac Newton,Mehmed the Conqueror, Leonardo da Vinci, Louis Pasteur.TV & teen cultureFrench municipalityTennis-relatedOtherJustine HeninJulie HalardVirginia WadeMarcelo Melo…William ShakespeareM. de RobespierreNelson MandelaCharlemagne…Figure 2: t-SNE visualization of 80 articles of the FrenchWikipedia with highest and lowest coordinates along thefirst and second principal axes of the matrix Y .Finally, we consider the French Wikipedia. Once again, we applyprincipal component analysis to the matrixY and keep the first twodimensions. We select the 20 articles with the highest and lowestcoordinates along the first two principal axes6. A two-dimensionalt-SNE plot [30] of the 80 articles selected using PCA is displayed inFigure 2. The plot enables identifying meaningful clusters of relatedarticles, such as articles about tennis players, French municipalities,historical figures, and TV or teen culture. These articles are repre-sentative of the latent dimensions that separate editors the most: auser skilled in editing pages about ancient Greek mathematiciansmight be less skilled in editing pages about anime, and vice versa.5 LINUX KERNELIn this section, we apply the interank model to the Linux kernelproject, a well-known open-source software project. In contrast toWikipedia, most contributors to the Linux kernel are highly skilledprofessionals who dedicate a significant portion of their time andefforts to the project.5.1 Background & DatasetThe Linux kernel has fundamental impact on technology as a whole.In fact, the Linux operating system runs 90 % of the cloud workload6Interestingly, the first dimension has a very similar interpretation to that obtained onthe Turkish edition: it can also be understood as separating popular culture from highculture.and 82 % of the smartphones [7]. To collectively improve the sourcecode, developers submit bug fixes or new features in the form ofa patch to collaborative repositories. Review and integration timedepend on the project’s structure, ranging from a few hours or daysfor Apache Server [26] to a couple of months for the Linux kernel[18]. In particular for the Linux kernel, developers submit patchesto subsystem mailing lists, where they undergo several rounds ofreviews. After suggestions are implemented and if the code is ap-proved, the patch can be committed to the subsystem maintainer’ssoftware repository. Integration conflicts are spotted at this stage byother developers monitoring the maintainer’s repository and anyissues must be fixed by the submitter. If the maintainer is satisfiedwith the patch, she commits it to Linus Torvalds’ repository, whodecides to include it or not with the next Linux release.5.1.1 Dataset Preprocessing. We use a dataset collected by Jianget al. 
[18] which spans Linux development activity between 2005and 2012. It consists of 670 533 patches described using 62 featuresderived from e-mails, commits to software repositories, the devel-opers’ activity and the content of the patches themselves. Jianget al. scraped patches from the various mailing lists and matchedthem with commits in the main repository. In total, they managedto trace back 75 % of the commits that appear in Linus Torvalds’repository to a patch submitted to a mailing list. A patch is labeledas accepted (q = 1) if it eventually appears in a release of the Linuxkernel, and rejected (q = 0) otherwise. We remove data points withempty subsystem and developer names, as well as all subsystemswith no accepted patches. Finally, we chronologically order thepatches according to their mailing list submission time.After preprocessing, the dataset contains K = 619 419 patchesproposed by N = 9672 developers onM = 394 subsystems. 34.12 %of these patches are accepted.We then split the data into training setcontaining the first 80 % of patches and a validation set containingthe remaining 20 %.5.1.2 Subsystem-Developer Correlation. Given the highly com-plex nature of the project, one could believe that developers tendto specialize in few, independent subsystems. Let Xu = {Xui }Mi=1be the collection of binary variables Xui indicating whether de-veloper u has an accepted patch in subsystem i . We compute thesample Pearson correlation coefficient ruv = ρ (Xu ,Xv ) betweenXu and Xv . We show in Figure 3 the correlation matrix R = [ruv ]between developers patching subsystems. Row ru corresponds todeveloper u, and we order all rows according to the subsystem eachdeveloper u contribute to the most. We order the subsystems indecreasing order by the number of submitted patches, such thatlarger subsystems appear at the top of the matrix R. Hence, theKDD 2018, August 19–23, 2018, London, United Kingdom B. Yardım et al.02000400060008000DeveloperIDsFigure 3: Correlation matrix R between developers orderedaccording to the subsystem they contribute to the most. Theblocks on the diagonal correspond to subsystems. Core sub-systems form a strong cluster (blue square).blocks on the diagonal roughly correspond to subsystems and theirsize represents the number of developers involved with the sub-system. As shown by the blocks, developers tend to specialize intoone subsystem. However, as the numerous non-zero off-diagonalentries reveal, they still tend to contribute substantially to othersubsystems. Finally, as highlighted by the dotted, blue square, sub-systems number three to six on the diagonal form a cluster. In fact,these four subsystems (include/linux, arch/x86, kernel and mm)are core subsystems of the Linux kernel.5.2 EvaluationWe consider the task of predicting whether a patch will be inte-grated into a release of the kernel. Similarly to Section 4, we useinterank basic and interank full with D = 20 latent dimensionsto learn the developers’ skills, the subsystems’ difficulty, and theinteraction between them.5.2.1 Competing Approaches. Three baselines that we consider—average, user-only and GLAD—are identical to those described inSection 4.2.1. In addition, we also compare our model to a randomforest classifier trained on domain-specific features similar to theone used by Jiang et al. [18]. In total, this classifier has access to21 features for each patch. 
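A minimal sketch of the correlation analysis just described, assuming the patch records are available as (developer, subsystem, accepted) integer-id triples; the helper name is hypothetical, and ordering rows by each developer's most-patched subsystem (as in Figure 3) is left out for brevity.

```python
import numpy as np

def developer_correlation(records, n_devs, n_subsystems):
    """records: iterable of (dev, subsystem, accepted) triples with integer ids.
    Returns the Pearson correlation matrix R between developers' acceptance profiles."""
    X = np.zeros((n_devs, n_subsystems))
    for dev, sub, accepted in records:
        if accepted:
            X[dev, sub] = 1.0   # X_ui = 1 iff developer u has an accepted patch in subsystem i
    # np.corrcoef treats each row as one variable, so entry (u, v) is r_uv.
    # Developers with no accepted patches have zero variance and yield NaN rows.
    return np.corrcoef(X)
```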
5.2 Evaluation
We consider the task of predicting whether a patch will be integrated into a release of the kernel. Similarly to Section 4, we use interank basic and interank full with D = 20 latent dimensions to learn the developers' skills, the subsystems' difficulty, and the interaction between them.

5.2.1 Competing Approaches. Three baselines that we consider (average, user-only and GLAD) are identical to those described in Section 4.2.1. In addition, we also compare our model to a random forest classifier trained on domain-specific features similar to the one used by Jiang et al. [18]. In total, this classifier has access to 21 features for each patch. Features include information about the developer's experience up to the time of submission (e.g., number of accepted commits, number of patches sent), the e-mail thread (e.g., number of developers in copy of the e-mail, size of e-mail, number of e-mails in thread until the patch) and the patch itself (e.g., number of lines changed, number of files changed). We optimize the hyperparameters of the random forest using a grid-search. As the model has access to domain-specific features about each edit, it is representative of the class of specialized methods tailored to the Linux kernel peer-production system.

5.2.2 Results. Table 5 displays the average log-likelihood and area under the precision-recall curve (AUPRC). interank full performs best in terms of both metrics. In terms of AUPRC, it outperforms the random forest classifier by 4.4 %, GLAD by 5 %, and the user-only baseline by 7.3 %.

Table 5: Predictive performance on the accepted patch classification task for the Linux kernel. The best performance is highlighted in bold.

Model | Avg. log-likelihood | AUPRC
interank basic | −0.589 | 0.525
interank full | −0.588 | 0.527
Average | −0.640 | 0.338
User-only | −0.601 | 0.491
GLAD | −0.598 | 0.502
Random forest | −0.599 | 0.505

Figure 4: Precision-recall curves on the bad edit classification task for the Linux kernel. interank (solid orange and red) outperforms the user-only baseline (dotted green), the random forest classifier (dashed blue), and GLAD (dash-dotted purple).

We show the precision-recall curves in Figure 4. Both interank full and interank basic perform better than the four baselines. Notably, they outperform the random forest in the high-precision regime, even though the random forest uses content-based features about developers, subsystems and patches. In the high-recall regime, the random forest attains a marginally better precision. The user-only and GLAD baselines perform worse than all non-trivial models.

5.3 Interpretation of Model Parameters
We show in Table 6 the top-five and bottom-five subsystems according to difficulties {d_i} learned by interank full. We note that even though patches submitted to difficult subsystems have in general a low acceptance rate, interank enables a finer ranking by taking into account who is contributing to the subsystems. This effect is even more noticeable with the five subsystems with smallest difficulty value.

Table 6: Top-five and bottom-five subsystems according to their difficulty d_i.

Difficulty | Subsystem | % Acc. | # Patch | # Dev.
+2.664 | usr | 1.88 % | 796 | 70
+1.327 | include | 7.79 % | 398 | 101
+1.038 | lib | 15.99 % | 5642 | 707
+1.013 | drivers/clk | 34.34 % | 495 | 81
+0.865 | include/trace | 17.73 % | 547 | 81
−1.194 | drivers/addi-data | 78.31 % | 272 | 8
−1.080 | net/tipc | 43.11 % | 573 | 44
−0.993 | drivers/ps3 | 44.26 % | 61 | 9
−0.936 | net/nfc | 73.04 % | 204 | 26
−0.796 | arch/mn10300 | 45.40 % | 359 | 63

The subsystems i with largest d_i are core components, whose integrity is crucial to the system. For instance, the usr subsystem, providing code for RAM-related instructions at booting time, has barely changed in the last seven years. On the other hand, the subsystems i with smallest d_i are peripheral components serving specific devices, such as digital signal processors or gaming consoles. These components can arguably tolerate a higher rate of bugs, and hence they evolve more frequently.

Jiang et al. [18] establish that a high prior subsystem churn (i.e., a high number of previous commits to a subsystem) leads to a lower acceptance rate. We approximate the number of commits to a subsystem as the number of patches submitted multiplied by the subsystem's acceptance rate. The first quartile of subsystems according to their increasing difficulty, i.e., the least difficult subsystems, has an average churn of 687. The third quartile, i.e., the most difficult subsystems, has an average churn of 833. We hence verify that higher churn correlates with more difficult subsystems. This corroborates the results obtained by Jiang et al.
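The churn comparison above amounts to a few lines of arithmetic. Below is a sketch under the stated approximation (churn ≈ submitted patches × acceptance rate), with per-subsystem arrays as assumed inputs; it returns the average churn per difficulty quartile rather than committing to the paper's exact quartile convention.

```python
import numpy as np

def churn_by_difficulty_quartile(difficulty, n_patches, acceptance_rate):
    """Approximate churn = submitted patches * acceptance rate, averaged within
    quartiles of subsystems ordered by increasing difficulty."""
    churn = np.asarray(n_patches, dtype=float) * np.asarray(acceptance_rate, dtype=float)
    order = np.argsort(difficulty)              # least to most difficult subsystems
    quartiles = np.array_split(order, 4)
    return [churn[idx].mean() for idx in quartiles]
```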
As shown in Figure 4, if false negatives are not a priority, interank will yield a substantially higher precision. In other words, if the task at hand requires that the patches classified as accepted are actually the ones integrated in a future release, then interank will yield more accurate results. For instance, it would be efficient in supporting Linus Torvalds in the development of the Linux kernel by providing him with a restricted list of patches that are likely to be integrated in the next release of the Linux kernel.

6 CONCLUSION
In this paper, we have introduced interank, a model of edit outcomes in peer-production systems. Predictions generated by our model can be used to prioritize the work of project maintainers by identifying contributions that are of high or low quality. Similarly to user reputation systems, interank is simple, easy to interpret and applicable to a wide range of domains. Whereas user reputation systems are usually not competitive with specialized edit quality predictors tailored to a particular peer-production system, interank is able to bridge the gap between the two types of approaches, and it attains a predictive performance that is competitive with the state of the art, without access to content-based features.

We have demonstrated the performance of the model on two peer-production systems exhibiting different characteristics. Beyond predictive performance, we can also use model parameters to gain insight into the system. On Wikipedia, we have shown that the model identifies controversial articles, and that latent dimensions learned by our model display interesting patterns related to cultural distinctions between articles. On the Linux kernel, we have shown that inspecting model parameters makes it possible to distinguish core subsystems (large difficulty parameters) from peripheral components (small difficulty parameters).

Future Work. In the future, we would like to investigate the idea of using the latent embeddings learned by our model in order to recommend items to edit. Ideally, we could match items that need to be edited with users that are most suitable for the task. For Wikipedia, an ad-hoc method called "SuggestBot" was proposed by Cosley et al. [8]. We believe it would be valuable to propose a method that is applicable to peer-production systems in general.

ACKNOWLEDGMENTS
We are grateful to Yujuan Jiang for providing the Linux data and to Aaron Halfaker for helping us understand ORES. We thank Patrick Thiran, Brunella Spinelli, Vincent Etter, Ksenia Konyushkova, Holly Cogliati-Bauereis and the anonymous reviewers for careful proofreading and constructive feedback.

REFERENCES
[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of OSDI'16. Savannah, GA, USA.
[2] B. Thomas Adler and Luca de Alfaro. 2007. A Content-Driven Reputation System for the Wikipedia. In Proceedings of WWW'07. Banff, AB, Canada.
[3] B. Thomas Adler, Luca de Alfaro, Ian Pye, and Vishwanath Raman. 2008. Measuring Author Contributions to the Wikipedia. In Proceedings of WikiSym'08. Porto, Portugal.
[4] Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.
[5] Ralph Allan Bradley and Milton E. Terry. 1952. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika 39, 3/4 (1952), 324–345.
[6] Amit Bronner and Christof Monz. 2012. User Edits Classification Using Document Revision Histories. In Proceedings of EACL 2012. Avignon, France.
[7] Jonathan Corbet and Greg Kroah-Hartman. 2017. 2017 Linux Kernel Development Report. Technical Report. The Linux Foundation.
[8] Dan Cosley, Dan Frankowski, Loren Terveen, and John Riedl. 2007. SuggestBot: Using Intelligent Task Routing to Help People Find Work in Wikipedia. In Proceedings of IUI'07. Honolulu, HI, USA.
[9] Alexander Philip Dawid and Allan M. Skene. 1979. Maximum Likelihood Estimation of Observer Error-rates using the EM Algorithm. Applied Statistics 28, 1 (1979), 20–28.
[10] Luca de Alfaro and B. Thomas Adler. 2013. Content-Driven Reputation for Collaborative Systems. In Proceedings of TGC 2013. Buenos Aires, Argentina.
[11] Luca de Alfaro, Ashutosh Kulshreshtha, Ian Pye, and B. Thomas Adler. 2011. Reputation Systems for Open Collaboration. Commun. ACM 54, 8 (2011), 81–87.
[12] Gregory Druck, Gerome Miklau, and Andrew McCallum. 2008. Learning to Predict the Quality of Contributions to Wikipedia. In Proceedings of WikiAI 2008. Chicago, IL, USA.
[13] Arpad Elo. 1978. The Rating Of Chess Players, Past & Present. Arco Publishing.
[14] GitHub. 2017. The State of the Octoverse 2017. https://octoverse.github.com/ Accessed: 2017-10-27.
[15] Aaron Halfaker and Dario Taraborelli. 2015. Artificial intelligence service "ORES" gives Wikipedians X-ray specs to see through bad edits. https://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/ Accessed: 2017-10-27.
[16] Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. 2016. Vandalism Detection in Wikidata. In Proceedings of CIKM'16. Indianapolis, IN, USA.
[17] Sara Javanmardi, David W. McDonald, and Cristina V. Lopes. 2011. Vandalism Detection in Wikipedia: A High-Performing, Feature-Rich Model and its Reduction Through Lasso. In Proceedings of WikiSym'11. Mountain View, CA, USA.
[18] Yujuan Jiang, Bram Adams, and Daniel M. German. 2013. Will My Patch Make It? And How Fast? Case Study on the Linux Kernel. In Proceedings of MSR 2013. San Francisco, CA, USA.
[19] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37.
[20] Joseph B. Kruskal. 1983. An Overview of Sequence Comparison: Time Warps, String Edits, and Macromolecules. SIAM Rev. 25, 2 (1983), 201–237.
[21] Xuan Nhat Lam, Thuc Vu, Trong Duc Le, and Anh Duc Duong. 2008. Addressing Cold-Start Problem in Recommendation Systems. In Proceedings of ICUIMC'08. Suwon, Korea.
[22] Asher Levi, Osnat Mokryn, Christophe Diot, and Nina Taft. 2012. Finding a Needle in a Haystack of Reviews: Cold Start Context-Based Hotel Recommender System. In Proceedings of RecSys'12. Dublin, Ireland.
[23] Martin Potthast, Benno Stein, and Robert Gerling. 2008. Automatic Vandalism Detection in Wikipedia. In Proceedings of ECIR 2008. Glasgow, Scotland.
[24] Georg Rasch. 1960. Probabilistic Models for Some Intelligence and Attainment Tests. Danmarks Pædagogiske Institut.
[25] Paul Resnick, Ko Kuwabara, Richard Zeckhauser, and Eric Friedman. 2000. Reputation systems. Commun. ACM 43, 12 (2000), 45–48.
[26] Peter C. Rigby, Daniel M. German, and Margaret-Anne Storey. 2008. Open Source Software Peer Review Practices: A Case Study of the Apache Server. In Proceedings of ICSE'08. Leipzig, Germany.
[27] Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock. 2002. Methods and Metrics for Cold-Start Recommendations. In Proceedings of SIGIR'02. Tampere, Finland.
[28] Behzad Tabibian, Isabel Valera, Mehrdad Farajtabar, Le Song, Bernhard Schölkopf, and Manuel Gomez-Rodriguez. 2017. Distilling Information Reliability and Source Trustworthiness from Digital Traces. In Proceedings of WWW'17. Perth, WA, Australia.
[29] Louis L. Thurstone. 1927. A Law of Comparative Judgment. Psychological Review 34, 4 (1927), 273–286.
[30] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, Nov (2008), 2579–2605.
[31] Peter Welinder, Steve Branson, Pietro Perona, and Serge J. Belongie. 2010. The Multidimensional Wisdom of Crowds. In Advances in Neural Information Processing Systems 23. Vancouver, BC, Canada.
[32] Jacob Whitehill, Ting-fan Wu, Jacob Bergsma, Javier R. Movellan, and Paul L. Ruvolo. 2009. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. In Advances in Neural Information Processing Systems 22. Vancouver, BC, Canada.
[33] Wikipedia. 2017. Wikipedia article depth. https://meta.wikimedia.org/wiki/Wikipedia_article_depth Accessed: 2017-10-30.
[34] Wikipedia. 2017. Wikipedia:Wikipedians. https://en.wikipedia.org/wiki/Wikipedia:Wikipedians Accessed: 2017-10-27.
[35] Taha Yasseri, Anselm Spoerri, Mark Graham, and János Kertész. 2014. The most controversial topics in Wikipedia: A multilingual and geographical analysis. In Global Wikipedia: International and Cross-Cultural Issues in Online Collaboration, Pnina Fichman and Noriko Hara (Eds.). Scarecrow Press.
[36] Ernst Zermelo. 1928. Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift 29, 1 (1928), 436–460.
[37] Denny Zhou, Sumit Basu, Yi Mao, and John C. Platt. 2012. Learning from the Wisdom of Crowds by Minimax Entropy. In Advances in Neural Information Processing Systems 25. Lake Tahoe, CA, USA.
Clean Full Text: (not set)
Language: (not set)
Doi: 10.1145/3219819.3219979
Arxiv: (not set)
Mag: (not set)
Acl: (not set)
Pmid: (not set)
Pmcid: (not set)
Pub Date: 2024-10-18 14:33:36
Pub Year: 2024
Journal Name: (not set)
Journal Volume: (not set)
Journal Page: (not set)
Publication Types: (not set)
Tldr: (not set)
Tldr Version: (not set)
Generated Tldr: (not set)
Search Term Used: Jehovah's AND yearPublished>=2024
Reference Count: (not set)
Citation Count: (not set)
Influential Citation Count: (not set)
Last Update: 2024-11-01 00:00:00
Status: 0
Aws Job: (not set)
Last Checked: (not set)
Modified: 2025-01-13 22:06:18
Created: 2025-01-13 22:06:18