Implementing minimal functionality for machine training. Should be moved to Polyglot once the repository reactivates.
parent fbe8f8d1c3
commit dff10a6705
28
src/MiniDocs/Array.extension.st
Normal file
@@ -0,0 +1,28 @@
Extension { #name : #Array }

{ #category : #'*MiniDocs' }
Array >> bagOfWordsFor: sentenceArray [
	"A small utility algorithm for machine training.
	Inspired by https://youtu.be/8qwowmiXANQ?t=1144.
	This should probably be moved to [Polyglot](https://github.com/pharo-ai/Polyglot),
	but the repository is pretty inactive (with commits 2 or more years old and no response to issues).
	Meanwhile, it will be in MiniDocs.

	Given the sentence := #('hello' 'how' 'are' 'you')
	and the testVocabulary := #('hi' 'hello' 'I' 'you' 'bye' 'thank' 'you')
	then

	testVocabulary bagOfWordsFor: sentence.

	Should give: #(0 1 0 1 0 0 0)
	"
	| bagOfWords |
	bagOfWords := Array new: self size.
	bagOfWords doWithIndex: [:each :i | bagOfWords at: i put: 0 ].
	sentenceArray do: [:token | |index|
		index := self indexOf: token.
		index > 0
			ifTrue: [bagOfWords at: index put: 1]
	].
	^ bagOfWords
]
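A minimal Playground sketch of how the method behaves, assuming the `Array >> bagOfWordsFor:` extension from this commit is loaded. The variable names and the `'cool'` vocabulary entry are illustrative assumptions, not part of the commit. It shows two properties that follow from the use of `indexOf:`: tokens missing from the vocabulary are ignored, and a word repeated in the vocabulary is marked only at its first position.

```smalltalk
"Playground sketch; assumes the Array >> bagOfWordsFor: extension is loaded.
 Variable names and the 'cool' entry are illustrative only."
| vocabulary sentence bag duplicated |
vocabulary := #('hi' 'hello' 'I' 'you' 'bye' 'thank' 'cool').
sentence := #('hello' 'how' 'are' 'you').

"Tokens missing from the vocabulary ('how', 'are') are ignored."
bag := vocabulary bagOfWordsFor: sentence.
bag. "#(0 1 0 1 0 0 0)"

"With a repeated vocabulary entry, only the first occurrence is marked,
 because indexOf: answers the index of the first match."
duplicated := #('hi' 'hello' 'I' 'you' 'bye' 'thank' 'you').
duplicated bagOfWordsFor: sentence. "#(0 1 0 1 0 0 0)"
```

The receiver is the vocabulary and the argument is the tokenized sentence, so the result always has the same size as the vocabulary. If every copy of a repeated vocabulary word should be marked, one possible alternative (not what this commit does) is to iterate over the vocabulary and test each word with `includes:` against the sentence.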