About: Statistically improbable phrase

An Entity of Type: owl:Thing, within Data Space: dbpedia.demo.openlinksw.com, associated with source document(s)
http://dbpedia.demo.openlinksw.com/c/ADuaNEP9RZ

A statistically improbable phrase (SIP) is a phrase or set of words that occurs more frequently in a document (or collection of documents) than in some larger corpus. Amazon.com uses this concept to determine keywords for a given book or chapter, because the keywords of a book or chapter are likely to appear disproportionately within that section. Christian Rudder has also used this concept in his book Dataclysm, applying it to data from online dating profiles and Twitter posts to determine the phrases most characteristic of a given race or gender. SIPs with a linguistic density of two or three words, following patterns such as adjective-adjective-noun or adverb-adverb-verb, can signal the author's attitude, premise, or conclusions to the reader, or express an important idea.
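The definition above can be illustrated with a short sketch. It assumes one particular scoring rule: the relative frequency of each word n-gram in the document is divided by its add-one-smoothed relative frequency in a reference corpus. Amazon's actual scoring is not described here. The function names `ngrams` and `improbable_phrases` and the `min_count` threshold are hypothetical.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Lower-cased word n-grams of `text`, joined with single spaces."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def improbable_phrases(document, corpus, n=2, min_count=2, top=5):
    """Rank n-grams by how much more often they occur in `document` than in
    the reference `corpus`. Assumed score: document relative frequency divided
    by add-one-smoothed corpus relative frequency (a hypothetical choice)."""
    doc_counts = Counter(ngrams(document, n))
    ref_counts = Counter(ngrams(corpus, n))
    doc_total = sum(doc_counts.values()) or 1
    ref_total = sum(ref_counts.values())
    vocab = len(set(doc_counts) | set(ref_counts))
    scores = {}
    for phrase, count in doc_counts.items():
        if count < min_count:  # ignore phrases seen only once in the document
            continue
        doc_rate = count / doc_total
        # Smoothing keeps phrases that are absent from the corpus finite.
        ref_rate = (ref_counts[phrase] + 1) / (ref_total + vocab)
        scores[phrase] = doc_rate / ref_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

doc = "the garden style was new. the garden style praised irregularity. the garden style spread."
ref = "the house was new. the house was old. the style of the day spread. the garden was green."
print(improbable_phrases(doc, ref))
```

In this toy example, "garden style" ranks above "the garden". Both bigrams occur three times in the document, but only "the garden" also appears in the reference text. Real systems would use far larger corpora and might prefer other statistics, such as log-likelihood or TF-IDF.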

Attributes | Values
rdfs:label
  • Statistically Improbable Phrases (fr)
  • Statistically Improbable Phrases (ja)
  • Statistically improbable phrase (pt)
  • Statistically improbable phrase (en)
rdfs:comment
  • Statistically Improbable Phrases (SIPs) are a statistical tool launched in 2005 by the e-commerce website Amazon.com for its Search Inside! book-content indexing program. The tool compares the text of all indexed books in order to find, for each one, a set of phrases or expressions that appear more often in it than in the other books. (fr)
  • A statistically improbable phrase is a phrase or set of words that occurs more frequently in a document than in some larger corpus. Because the keywords of a book or chapter tend to appear disproportionately within that section, Amazon.com used this concept to determine the keywords for a given book or chapter. In his book Dataclysm, Christian Rudder used this concept with data from dating-site and Twitter posts to determine the phrases most characteristic of a given race or gender. (ja)
  • Statistically improbable phrases (SIPs) are a statistical tool launched in 2005 by the e-commerce site Amazon.com for its "search inside the books" content-indexing program. The tool compares the text of all indexed books in order to find, for each one, a set of phrases that appear more frequently in it than in other books. (pt)
  • A statistically improbable phrase (SIP) is a phrase or set of words that occurs more frequently in a document (or collection of documents) than in some larger corpus. Amazon.com uses this concept to determine keywords for a given book or chapter, because the keywords of a book or chapter are likely to appear disproportionately within that section. Christian Rudder has also used this concept in his book Dataclysm, applying it to data from online dating profiles and Twitter posts to determine the phrases most characteristic of a given race or gender. SIPs with a linguistic density of two or three words, following patterns such as adjective-adjective-noun or adverb-adverb-verb, can signal the author's attitude, premise, or conclusions to the reader, or express an important idea. (en)
dct:subject
Wikipage page ID
Wikipage revision ID
Link from a Wikipage to another Wikipage
sameAs
dbp:wikiPageUsesTemplate
has abstract
  • Statistically Improbable Phrases (SIPs) are a statistical tool launched in 2005 by the e-commerce website Amazon.com for its Search Inside! book-content indexing program. The tool compares the text of all indexed books in order to find, for each one, a set of phrases or expressions that appear more often in it than in the other books. (fr)
  • A statistically improbable phrase (SIP) is a phrase or set of words that occurs more frequently in a document (or collection of documents) than in some larger corpus. Amazon.com uses this concept to determine keywords for a given book or chapter, because the keywords of a book or chapter are likely to appear disproportionately within that section. Christian Rudder has also used this concept in his book Dataclysm, applying it to data from online dating profiles and Twitter posts to determine the phrases most characteristic of a given race or gender. SIPs with a linguistic density of two or three words, following patterns such as adjective-adjective-noun or adverb-adverb-verb, can signal the author's attitude, premise, or conclusions to the reader, or express an important idea. Another use of SIPs is as a tool for detecting plagiarism. Almost-unique combinations of words can be searched for online; if they have appeared in a published text, the search will identify where. This method only covers texts that have been published and digitized online. For example, if a student's submission contained the phrase "garden style, praising irregularity in design", a search on Google.com would return the original Wikipedia article about Sir William Temple, the English political figure and essayist. (en)
  • A statistically improbable phrase is a phrase or set of words that occurs more frequently in a document than in some larger corpus. Because the keywords of a book or chapter tend to appear disproportionately within that section, Amazon.com used this concept to determine the keywords for a given book or chapter. In his book Dataclysm, Christian Rudder used this concept with data from dating-site and Twitter posts to determine the phrases most characteristic of a given race or gender. (ja)
  • Statistically improbable phrases (SIPs) are a statistical tool launched in 2005 by the e-commerce site Amazon.com for its "search inside the books" content-indexing program. The tool compares the text of all indexed books in order to find, for each one, a set of phrases that appear more frequently in it than in other books. (pt)
gold:hypernym
prov:wasDerivedFrom
page length (characters) of wiki page
foaf:isPrimaryTopicOf
is Link from a Wikipage to another Wikipage of
is Wikipage redirect of
is foaf:primaryTopic of
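The plagiarism check described in the English abstract can be sketched as follows. The sketch selects word n-grams from a submission that never occur in a local reference text and turns them into quoted exact-phrase queries. Several parts of this are assumptions. "Almost unique" is approximated as "absent from the reference text". The actual web search is left out. The function names `word_ngrams` and `candidate_queries` are hypothetical.

```python
import re

def word_ngrams(text, n):
    """Lower-cased word n-grams of `text`, joined with single spaces."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def candidate_queries(submission, reference, n=5, limit=3):
    """Return up to `limit` quoted exact-phrase queries built from n-grams of
    `submission` that do not occur in `reference` (an assumed rarity test)."""
    seen = set(word_ngrams(reference, n))
    queries = []
    for gram in word_ngrams(submission, n):
        query = f'"{gram}"'
        if gram not in seen and query not in queries:
            queries.append(query)
        if len(queries) >= limit:
            break
    return queries

submission = "He favoured the garden style, praising irregularity in design."
reference = "In this essay he favoured the garden style of his day."
for q in candidate_queries(submission, reference):
    print(q)  # each line is a query that could be pasted into a search engine
```

Each query could then be submitted to a web search engine. As noted above, this approach can only find sources that have been published and digitized online.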
Faceted Search & Find service v1.17_git147 as of Sep 06 2024


This material is Open Knowledge.
OpenLink Virtuoso version 08.03.3331 as of Sep 2 2024, on Linux (x86_64-generic-linux-glibc212), Single-Server Edition (378 GB total memory, 54 GB memory in use)
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2024 OpenLink Software