About: Multi-armed bandit

An Entity of Type: yago:ScientificResearch100641820, within Data Space: dbpedia.demo.openlinksw.com, associated with source document(s)

http://dbpedia.demo.openlinksw.com/c/51iaA8L9d2

In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines (sometimes known as "one-armed bandits"), who has to decide which machines to play, how many times to play each machine and in which order to play them, and whether
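
The exploration–exploitation tradeoff described above can be illustrated with a small simulation. The sketch below uses epsilon-greedy, one common heuristic that the abstract itself does not describe. It assumes Bernoulli slot machines with fixed payout probabilities. The function name `run_epsilon_greedy` and its parameters (`true_probs`, `steps`, `epsilon`, `seed`) are hypothetical names chosen for this example only.

```python
import random


def run_epsilon_greedy(true_probs, steps=10_000, epsilon=0.1, seed=0):
    """Play Bernoulli arms with an epsilon-greedy policy (illustrative sketch).

    true_probs: assumed payout probability of each arm, hidden from the policy.
    Returns (pulls per arm, estimated payout per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms        # how often each arm has been played
    estimates = [0.0] * n_arms  # running mean reward observed per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward


if __name__ == "__main__":
    pulls, estimates, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
    print("estimated payouts:", [round(e, 2) for e in estimates])
    print("total reward:", total)
```

Setting `epsilon=0` gives a purely exploiting gambler, which can lock onto the first arm it tries and never discover a better one. A larger `epsilon` spends more plays on gathering information at the cost of immediate reward.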

Attributes	Values
rdf:type	yago:WikicatSequentialExperiments yago:WikicatSequentialMethods yago:Ability105616246 yago:Abstraction100002137 yago:Act100030358 yago:Activity100407535 yago:Cognition100023271 yago:Event100029378 yago:Experiment100639556 yago:Investigation100633864 yago:Know-how105616786 yago:Method105660268 yago:PsychologicalFeature100023100 yago:Research100636921 yago:Work100575741 yago:YagoPermanentlyLocatedEntity disease yago:ScientificResearch100641820
rdfs:label	El problema de la màquina escurabutxaques (ca) Bandido multibrazo (es) Bandit manchot (mathématiques) (fr) Multi-armed bandit (en) 多腕バンディット問題 (ja) Багаторукий бандит (uk)
rdfs:comment	The slot machine problem can be outlined as follows: * One stands before two slot machines. * One is in working order, and therefore returns 1 euro per token with a known probability. * The other is broken, and therefore returns 1 euro per token with an unknown probability. * A number of tokens is available. What should one do to reasonably maximize the gain? (ca)
	In probability theory and machine learning, the multi-armed bandit problem is a problem in which a fixed, limited set of resources must be allocated among competing choices so as to maximize expected gain. Each choice's properties are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to it. It is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines (also called "one-armed bandits") who must decide which machines to play, how many times to play each machine, in which order to play them, and whether to continue with the current machine or try a different one. The multi-armed bandit problem also falls into the broad category of stochastic scheduling. (ja)
	In probability theory, the multi-armed bandit problem (also called the N- or K-armed bandit problem) is a problem in which a player facing a row of slot machines (also called "one-armed bandits") must decide which machines to play and in what order. When played, each slot machine returns a random reward drawn from that machine's own probability distribution. The player's goal is to maximize the sum of rewards obtained over a sequence of machine plays. (es)
	In mathematics, more precisely in probability theory, the one-armed bandit problem (which generalizes to the K-armed or N-armed bandit problem) is pictured as follows: a user (an agent) facing slot machines must decide which machines to play. Each machine yields an average reward that the user does not know a priori. The goal is to maximize the user's cumulative gain. (fr)
	In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines (sometimes known as "one-armed bandits"), who has to decide which machines to play, how many times to play each machine and in which order to play them, and whether (en)
	In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is the problem of allocating a limited set of resources among competing alternatives so as to maximize the expected gain, when each option's properties are only partially known at the time of the decision and may become better understood as time passes or by allocating resources to the option. It is a classic reinforcement learning problem that exemplifies the dilemma of balancing exploration and exploitation. The name comes from an imagined gambler at a row of slot machines (often called "one-armed bandits") who has to decide which machines to play, how many times to play each machine, in which order to play them, and whether to continue (uk)
dct:subject	Sequential experiments; Sequential methods; Machine learning; Stochastic optimization
Wikipage page ID	2854828 (xsd:integer)
Wikipage revision ID	1124037510 (xsd:integer)
Link from a Wikipage to another Wikipage	Sequential experiments; Sequential methods; Probability distribution; Bayes' theorem; Annals of Applied Probability; Peter Whittle (mathematician); Ridge regression; Thompson sampling; Open source; Search theory; Gambler; Germany; Concept drift; Optimal stopping; Machine learning; Slot machines; Clinical trial; Pharmaceutical industry; Portfolio (finance); Gittins index; Adaptive routing; Medical ethics; Nonparametric regression; Probability theory; Random forest; Regret (decision theory); Reinforcement learning; Herbert Robbins; Stochastic optimization; Asymptotic; John C. Gittins; Collaborative filtering; Bulletin of the AMS; Softmax function; Greedy algorithm; Michael Katehakis; R (programming language); World War II; Markov decision process; Stochastic scheduling; Iterated prisoner's dilemma; Singular-value decomposition; Open-Source; Condorcet winner; Voting paradoxes; dbr:Wikt:one-armed_bandit
Link from a Wikipage to an external page	https://mpatacchiola.github.io/blog/2017/08/14/dissecting-reinforcement-learning-6.html http://homes.di.unimi.it/~cesabian/Pubblicazioni/banditSurvey.pdf http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html https://pavlov.tech/2019/03/02/animated-multi-armed-bandit-policies/ https://web.archive.org/web/20131211192714/http:/webdocs.cs.ualberta.ca/~sutton/book/the-book.html https://feynmanlectures.caltech.edu/info/exercises/Feynmans_restaurant_problem.html http://techtalks.tv/talks/54451/ http://techtalks.tv/talks/54455/ https://mloss.org/software/view/415/ http://bandit.sourceforge.net https://archive.today/20121212095047/http:/www.cs.washington.edu/research/jair/volume4/kaelbling96a-html/node6.html
