Move Dataset API from telemetry-batch-view to its own package on maven #1
Does the Heka-reading code work with the gzipped format, à la this PR and bug 1302264?
}

it can "read gzipped files" in {
  /* Not supported yet https://github.com/jubos/fake-s3/pull/52
The referenced PR was closed recently, so perhaps gzip is supported? I am guessing fake-s3 is your answer to the more general testing issues discussed in mozilla/telemetry-batch-view#126.
I rewrote the testing infrastructure to address the more general testing issues with telemetry-batch-view.
The Dataset API supports gzipped files (it's the same code we are using in telemetry-batch-view), but fake-s3 doesn't yet. In other words, we can't write the test for it now, but we will be able to do so very soon.
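For readers unfamiliar with how gzipped objects can be handled transparently, here is a minimal sketch using only the JDK. It is not the Dataset API's actual code, which is not shown in this thread. The names `GzipSketch`, `maybeGunzip`, `gzip`, and `readAll` are hypothetical, and detecting gzip by its magic bytes is an assumption made for illustration.

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipSketch {
  // Hypothetical helper: if the stream starts with the gzip magic bytes
  // (0x1f 0x8b), wrap it in a GZIPInputStream; otherwise return it unchanged.
  def maybeGunzip(in: InputStream): InputStream = {
    val buffered = new BufferedInputStream(in)
    buffered.mark(2)            // remember the start so the peek can be undone
    val b0 = buffered.read()
    val b1 = buffered.read()
    buffered.reset()            // rewind to the marked position
    if (b0 == 0x1f && b1 == 0x8b) new GZIPInputStream(buffered) else buffered
  }

  // Compresses a byte array in memory; used here only to build test input.
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(out)
    gz.write(bytes)
    gz.close()
    out.toByteArray
  }

  // Drains a stream into a byte array (fine for small test payloads).
  def readAll(in: InputStream): Array[Byte] =
    Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
}
```

A test against fake-s3 would upload the output of `gzip(...)` and assert that the records read back match the uncompressed ones; until fake-s3 handles gzipped uploads, the round trip can only be checked in memory like this.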
9932be4
to
e7baf9f
Compare
Codecov Report
@@            Coverage Diff            @@
##             master       #1   +/-   ##
=========================================
  Coverage          ?   99.15%
=========================================
  Files             ?        5
  Lines             ?      119
  Branches          ?       21
=========================================
  Hits              ?      118
  Misses            ?        1
  Partials          ?        0
The code looks good, but there are a couple of things that worry me:
1- From a licence standpoint, it may be easier to use the moto standalone server rather than fake-s3.
2- IIUC, users of the library need to run the fake-s3 server by hand before they can run the tests. This should at least be documented in a README file and eventually automated as part of the test suite setup. The latter can probably wait, though.
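To illustrate the second point, the sketch below shows one way the server start-up could eventually be automated from Scala. It is only a sketch under stated assumptions: it assumes fake-s3 is installed and provides a `fakes3` executable that accepts `-r <root dir>` and `-p <port>`. `FakeS3Sketch`, `command`, and `withProcess` are hypothetical names, not part of this PR.

```scala
import scala.sys.process.{Process, ProcessLogger}

object FakeS3Sketch {
  // Assumption: the `fakes3` executable is on the PATH and takes
  // `-r <root dir>` and `-p <port>`.
  def command(root: String, port: Int): Seq[String] =
    Seq("fakes3", "-r", root, "-p", port.toString)

  // Loan pattern: start an external process, run the body, and always
  // destroy the process afterwards, even if the body throws.
  def withProcess[A](cmd: Seq[String])(body: => A): A = {
    val proc = Process(cmd).run(ProcessLogger(_ => ())) // discard server output
    try body finally proc.destroy()
  }
}
```

In a test suite the same start and destroy calls would more naturally live in the `beforeAll` and `afterAll` hooks. A real setup would also need to wait until the port accepts connections before running any test.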
Can we add a link somewhere - either in the Heka-related code or in the README - to where the |
Looks good, added a few nits, +1 on Mauro's comment about documenting the fake S3 server for tests.
}

if (!schema.dimensions.contains(Dimension(dimension))) {
  throw new Exception(s"The dimension $dimension doesn't exists")
s/exists/exist/
import java.io.InputStream
import org.xerial.snappy.Snappy

object File{
nit: Add a space before the {
(and can we add a style check for that?)
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FlatSpec, Matchers}

class DatasetTest extends FlatSpec with Matchers with BeforeAndAfterAll{
nit: please add a space before {
scalastyle-config.xml
Outdated
<check level="warning" class="org.scalastyle.file.FileLineLengthChecker" enabled="true">
  <parameters>
    <parameter name="maxLineLength"><![CDATA[160]]></parameter>
    <parameter name="tabSize"><![CDATA[4]]></parameter>
Indentation in the scala code is all 2 spaces - should we set the tab size to 2 as well?
373987b
to
3679330
Compare
All done.
See Bug 1283446. While I was at it, I completely rewrote the test suite using fake-s3. I am planning to add CI integration before this gets merged.