Implement minimal bag-of-words functionality for machine training. Should be moved to Polyglot once that repository becomes active again.
parent fbe8f8d1c3
commit dff10a6705
src/MiniDocs/Array.extension.st (normal file, 28 lines added)
@@ -0,0 +1,28 @@
Extension { #name : #Array }

{ #category : #'*MiniDocs' }
Array >> bagOfWordsFor: sentenceArray [
    "A small utility algorithm for machine training.
    Inspired by https://youtu.be/8qwowmiXANQ?t=1144.
    This should probably be moved to [Polyglot](https://github.com/pharo-ai/Polyglot),
    but the repository is pretty inactive (with commits 2 or more years old and no response to issues).
    Meanwhile, it will be in MiniDocs.

    Given the sentence := #('hello' 'how' 'are' 'you')
    and the testVocabulary := #('hi' 'hello' 'I' 'you' 'bye' 'thank' 'you')
    then

    testVocabulary bagOfWordsFor: sentence.

    Should give: #(0 1 0 1 0 0 0)
    "
    | bagOfWords |
    bagOfWords := Array new: self size.
    bagOfWords doWithIndex: [:each :i | bagOfWords at: i put: 0 ].
    sentenceArray do: [:token | |index|
        index := self indexOf: token.
        index > 0
            ifTrue: [bagOfWords at: index put: 1]
    ].
    ^ bagOfWords
]
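
The behaviour described in the method comment can be tried out with a short Playground sketch. It assumes the MiniDocs extension above is already loaded in the image. The variable names and the `collect:` variant are illustrative only and are not part of this commit.

```smalltalk
"Playground sketch; assumes Array >> bagOfWordsFor: from this commit is loaded."
| vocabulary sentence viaExtension viaCollect |
vocabulary := #('hi' 'hello' 'I' 'you' 'bye' 'thank' 'you').
sentence := #('hello' 'how' 'are' 'you').

"Using the extension method: marks the first vocabulary slot of each sentence token."
viaExtension := vocabulary bagOfWordsFor: sentence.

"Hypothetical alternative (not in the commit): mark every vocabulary slot
 whose word occurs anywhere in the sentence."
viaCollect := vocabulary collect: [ :word |
    (sentence includes: word) ifTrue: [ 1 ] ifFalse: [ 0 ] ].
```

The two results would differ only where the vocabulary contains duplicates. Because `indexOf:` answers the first matching index, `viaExtension` leaves the second 'you' slot at 0, whereas the `collect:` variant would set it to 1.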