About: Byte pair encoding

An Entity of Type: yago:Rule105846932, within Data Space: dbpedia.demo.openlinksw.com, associated with source document(s)
http://dbpedia.demo.openlinksw.com/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FByte_pair_encoding&invfp=IFP_OFF&sas=SAME_AS_OFF

Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. The algorithm was first described publicly by Philip Gage in a February 1994 article "A New Algorithm for Data Compression" in the C Users Journal.
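The compression scheme described above can be sketched roughly as follows. This is a minimal illustration, not Gage's original implementation: the function names `bpe_compress` and `bpe_decompress` are hypothetical, the replacement table is kept as a plain Python list instead of being serialized with the output, and the loop stops when no pair occurs at least twice or no unused byte value remains.

```python
from collections import Counter

def bpe_compress(data: bytes):
    """Repeatedly replace the most common adjacent byte pair with an unused byte."""
    seq = list(data)
    table = []  # (replacement_byte, (first, second)), in the order applied
    while len(seq) >= 2:
        unused = set(range(256)) - set(seq)
        if not unused:
            break  # every byte value already occurs; nothing left to substitute
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < 2:
            break  # replacing a pair that occurs once saves nothing
        new_byte = min(unused)
        out, i = [], 0
        while i < len(seq):
            # Scan left to right, replacing non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_byte)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        table.append((new_byte, pair))
    return bytes(seq), table

def bpe_decompress(data: bytes, table):
    """Undo the replacements in reverse order using the table."""
    seq = list(data)
    for new_byte, (first, second) in reversed(table):
        out = []
        for value in seq:
            if value == new_byte:
                out.extend((first, second))
            else:
                out.append(value)
        seq = out
    return bytes(seq)

data = b"aaabdaaabac"
packed, table = bpe_compress(data)
restored = bpe_decompress(packed, table)
```

Here `restored` equals `data`, and `packed` is shorter than the input. In a real file format the table would have to be stored alongside the packed bytes, so the net saving depends on the input.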

Attributes	Values
rdf:type	yago:WikicatLosslessCompressionAlgorithms yago:Abstraction100002137 yago:Act100030358 yago:Activity100407535 yago:Algorithm105847438 yago:Event100029378 yago:Procedure101023820 yago:PsychologicalFeature100023100 yago:YagoPermanentlyLocatedEntity yago:Rule105846932
rdfs:label	ترميز زوج البايتات (ar) Byte pair encoding (en) Codificación de pares de bytes (es) バイト対符号化 (ja) 字节对编码 (zh)
rdfs:comment	Byte pair encoding is a simple form of data compression that replaces the most common pair of consecutive bytes with a single byte that does not occur in the data set. The replaced pair is stored in a table so that the original data can be rebuilt. (ar, translated) Byte pair encoding (BPE) is a data compression method classified as lossless compression. Compared with common compression methods it has the drawback of extremely slow compression, but decompression is extremely fast. The decompression routine can also be made very small. Because of these properties, it is often used to compress data in game software for low-performance computers. (ja, translated) Byte pair encoding is a simple form of data compression in which the most frequently occurring pair of consecutive bytes is represented by a byte that does not occur in the data. A table of the replacements is required to rebuild the original data. (zh, translated) Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. The algorithm was first described publicly by Philip Gage in a February 1994 article "A New Algorithm for Data Compression" in the C Users Journal. (en) Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. The algorithm was first described publicly by Philip Gage in a February 1994 article "A New Algorithm for Data Compression" in the C Users Journal. (es, translated)
dcterms:subject	Lossless compression algorithms
Wikipage page ID	5825526 (xsd:integer)
Wikipage revision ID	1103156211 (xsd:integer)
Link from a Wikipage to another Wikipage	Natural language processing Google Lossless compression algorithms Byte Data compression GPT-3 Recursion OpenAI Re-Pair Sequitur algorithm
sameAs	Byte pair encoding
dbp:wikiPageUsesTemplate	dbt:Compression_Methods dbt:Reflist dbt:Short_description
has abstract	Byte pair encoding is a simple form of data compression that replaces the most common pair of consecutive bytes with a single byte that does not occur in the data set. The replaced pair is stored in a table so that the original data can be rebuilt. (ar, translated) Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. The algorithm was first described publicly by Philip Gage in a February 1994 article "A New Algorithm for Data Compression" in the C Users Journal. A variant of the technique has been shown to be useful in several natural language processing (NLP) applications, such as Google's SentencePiece and OpenAI's GPT-3. Here, the goal is not data compression but encoding text in a given language as a sequence of 'tokens', using a fixed vocabulary of different tokens. Typically, most words will be encoded as a single token, while rare words will be encoded as a sequence of a few tokens, where these tokens represent meaningful word parts. This translation of text into tokens can be found by a variant of byte pair encoding. (en) Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. The algorithm was first described publicly by Philip Gage in a February 1994 article "A New Algorithm for Data Compression" in the C Users Journal. A variant of the technique has been shown to be useful in several natural language processing applications, such as OpenAI's GPT, GPT-2 and GPT-3. (es, translated) Byte pair encoding (BPE) is a data compression method classified as lossless compression. Compared with common compression methods it has the drawback of extremely slow compression, but decompression is extremely fast. The decompression routine can also be made very small. Because of these properties, it is often used to compress data in game software for low-performance computers. (ja, translated) Byte pair encoding is a simple form of data compression in which the most frequently occurring pair of consecutive bytes is represented by a byte that does not occur in the data. A table of the replacements is required to rebuild the original data. (zh, translated)
gold:hypernym	Form
prov:wasDerivedFrom	wikipedia-en:Byte_pair_encoding?oldid=1103156211&ns=0
page length (characters) of wiki page	3989 (xsd:nonNegativeInteger)
foaf:isPrimaryTopicOf	wikipedia-en:Byte_pair_encoding
is Link from a Wikipage to another Wikipage of	BPE List of algorithms Byte pair compression Grammar induction GPT-3 DTE ROM hacking Straight-line grammar Transformer (machine learning model) OpenAI Re-Pair Sequitur algorithm Dual tile encoding Digram coding
is Wikipage redirect of	Byte pair compression Dual tile encoding Digram coding
is Wikipage disambiguates of	BPE
is foaf:primaryTopic of	wikipedia-en:Byte_pair_encoding
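
The English abstract in the table above mentions a variant of byte pair encoding that turns text into tokens drawn from a fixed vocabulary. A rough sketch of that idea is given below, under several assumptions: it works on characters of whitespace-free words instead of raw bytes, the names `learn_merges`, `apply_merge` and `tokenize` are hypothetical, and it is not the implementation used by SentencePiece or by OpenAI's models.

```python
from collections import Counter

def apply_merge(symbols, pair):
    """Join every non-overlapping occurrence of `pair` into one symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Each distinct word is a tuple of symbols, weighted by its frequency.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[tuple(apply_merge(symbols, best))] += freq
        corpus = merged
    return merges

def tokenize(word, merges):
    """Split a word into characters, then replay the merges in learned order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols

merges = learn_merges(["low", "low", "lower", "lowest"], num_merges=2)
common = tokenize("low", merges)   # frequent word: a single token
rare = tokenize("slow", merges)    # unseen word: a few sub-word tokens
```

With this toy training list, `common` is `['low']` and `rare` is `['s', 'low']`, which mirrors the behaviour the abstract describes: frequent words become one token, and rarer words are split into a few word parts. The vocabulary size is controlled here by `num_merges`, an assumption of this sketch.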

Data on this page belongs to its respective rights holders.