2022-04-28 04:51:53 +00:00
accessing
2022-05-15 13:23:21 +00:00
getPagesContentsFrom: anURL upTo: anInteger
2022-04-28 04:51:53 +00:00
"I retroactively get the contents of all pages up to a specified page number.

TO DO: should this be split back into two methods, one getting the page URLs and the other their contents?
Or will we always be getting the cursor URLs and their contents at the same time?
[ ] Benchmark alternative approaches."
| response nextPageLink previousPageLink |

response := OrderedDictionary new.
response at: anURL put: (self documentTreeFor: anURL).
previousPageLink := anURL.
anInteger - 1 timesRepeat: [ | pageCursor |
	pageCursor := self pageCursorFor: previousPageLink.
	nextPageLink := self userNameLink, '/with_replies', pageCursor keys first.
	response at: nextPageLink put: (XMLHTMLParser parse: nextPageLink asUrl retrieveContents).
	previousPageLink := nextPageLink ].
^ response