Extends the RowContainer functionality to provide random access through getAt(i).
It extends RowContainer behavior in the following ways:
- You must continue to call first to signal the transition from writing to the
Container to reading from it.
- As rows are added, each position at which a spill occurs is captured as a
BlockInfo object, which records the offset in the File at which the current
Block will be written.
- When first is called, we associate with each BlockInfo the File Split that it
occurs in.
- So in order to read a random row from the Container we do the following:
- Convert the row index into a block number. This is easy because all blocks are
the same size, given by blockSize.
- The corresponding BlockInfo tells us the Split that this block starts in. Also
by looking at the next Block in the BlockInfos list, we know which Split this block ends in.
- So we arrange to read all the Splits that contain rows for this block. For the first
Split we seek to the startOffset that we captured in BlockInfo.
- So after reading the Splits, all rows in this block are in the currentReadBlock.
- We track the span of the currentReadBlock using currentReadBlockStartRow and
blockSize. If a row is requested in this span, we don't need to read rows from disk.
- If the requested row is in the 'last' block, we point the currentReadBlock to
the currentWriteBlock, the same as what RowContainer does.
- getAt leaves the Container in the same state as a next call, so getAt and
next calls can be interspersed.
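
The lookup steps above can be sketched roughly as follows. This is an illustrative sketch only, not the actual implementation: the class BlockIndexSketch and the methods blockOf, splitRange, isCached and markLoaded are hypothetical names, and BlockInfo is reduced to the two facts described above (start offset and starting Split). The sketch also assumes that the final spilled block runs to the last Split, which the description above does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the block bookkeeping; no file I/O is performed.
final class BlockIndexSketch {

    /** Where a spilled block starts: the file offset and the Split holding that offset. */
    static final class BlockInfo {
        final long startOffset;
        final int startingSplit;

        BlockInfo(long startOffset, int startingSplit) {
            this.startOffset = startOffset;
            this.startingSplit = startingSplit;
        }
    }

    private final int blockSize;   // rows per block, identical for all blocks
    private final int numSplits;   // total number of Splits (assumed known after first())
    private final List<BlockInfo> blockInfos = new ArrayList<>();
    private int currentReadBlockStartRow = -1; // -1 means no block is loaded yet

    BlockIndexSketch(int blockSize, int numSplits) {
        this.blockSize = blockSize;
        this.numSplits = numSplits;
    }

    /** Records one spilled block, in spill order. */
    void addBlock(long startOffset, int startingSplit) {
        blockInfos.add(new BlockInfo(startOffset, startingSplit));
    }

    /** All blocks have the same size, so the block number is a simple division. */
    int blockOf(int rowIdx) {
        return rowIdx / blockSize;
    }

    /** True if the row lies inside the span of the currently loaded read block. */
    boolean isCached(int rowIdx) {
        return currentReadBlockStartRow >= 0
            && rowIdx >= currentReadBlockStartRow
            && rowIdx < currentReadBlockStartRow + blockSize;
    }

    /**
     * Returns {firstSplit, lastSplit}, inclusive, for a spilled block.
     * The end Split is taken from the next BlockInfo; for the final spilled
     * block this sketch assumes the block runs to the last Split.
     */
    int[] splitRange(int blockNum) {
        int first = blockInfos.get(blockNum).startingSplit;
        int last = (blockNum + 1 < blockInfos.size())
            ? blockInfos.get(blockNum + 1).startingSplit
            : numSplits - 1;
        return new int[] {first, last};
    }

    /** Marks the given block as the one held in the read block. */
    void markLoaded(int blockNum) {
        currentReadBlockStartRow = blockNum * blockSize;
    }
}
```

In this sketch a caller would check isCached(rowIdx) first; on a miss it would compute blockOf(rowIdx), read the Splits in splitRange(blockNum) starting at the block's startOffset, and then call markLoaded(blockNum).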