Skip to content
This repository was archived by the owner on Feb 4, 2021. It is now read-only.

Move Dataset API from telemetry-batch-view to its own package on maven #1

Merged
merged 1 commit into from
Feb 7, 2017

Conversation

vitillo
Copy link
Contributor

@vitillo vitillo commented Jan 27, 2017

See Bug 1283446. Since I was at it I completely rewrote the test suite using fakes3. I am planning to add CI integration before this gets merged.

@mreid-moz
Copy link
Collaborator

Does the Heka-reading code work with the gzipped format a-la this pr and bug 1302264?

}

it can "read gzipped files" in {
/* Not supported yet https://github.com/jubos/fake-s3/pull/52
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The referenced PR was closed recently, so perhaps gzip is supported? I am guessing fake-s3 is your answer to the more general testing issues discussed in mozilla/telemetry-batch-view#126.

Copy link
Contributor Author

@vitillo vitillo Jan 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the testing infrastructure to address the more general testing issues with telemetry-batch-view.

While the Dataset API supports gzipped files (it's the same code we are using in telemetry-batch-view) fake-s3 doesn't just yet. In other words we can't write the test for it but we will be able to do so very soon.

@vitillo vitillo force-pushed the second branch 8 times, most recently from 9932be4 to e7baf9f Compare February 2, 2017 14:22
@codecov-io
Copy link

codecov-io commented Feb 2, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@22473f2). Click here to learn what that means.

@@            Coverage Diff            @@
##             master       #1   +/-   ##
=========================================
  Coverage          ?   99.15%           
=========================================
  Files             ?        5           
  Lines             ?      119           
  Branches          ?       21           
=========================================
  Hits              ?      118           
  Misses            ?        1           
  Partials          ?        0
Impacted Files Coverage Δ
...ain/scala/com/mozilla/telemetry/heka/Dataset.scala 100% <100%> (ø)
...c/main/scala/com/mozilla/telemetry/heka/File.scala 100% <100%> (ø)
...rc/main/scala/com/mozilla/telemetry/utils/S3.scala 100% <100%> (ø)
...la/com/mozilla/telemetry/utils/ObjectSummary.scala 100% <100%> (ø)
...ain/scala/com/mozilla/telemetry/heka/package.scala 90.9% <90.9%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22473f2...e5ad631. Read the comment docs.

Copy link

@maurodoglio maurodoglio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, but there are a couple of things that worry me:
1- from a licence standpoint it may be easier to use the moto standalone server rather than fakeS3
2- iiuc users of the library need to run the fakeS3 server by hand before they can run the tests. This should be at least put in a README file and eventually automated as part of the test suite setup. The latter can probably wait though

@mreid-moz
Copy link
Collaborator

Can we add a link somewhere - either in the Heka-related code or in the README - to where the src/main/protobuf/heka.proto comes from?

Copy link
Collaborator

@mreid-moz mreid-moz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added a few nits, +1 on Mauro's comment about documenting the fake S3 server for tests.

}

if (!schema.dimensions.contains(Dimension(dimension))) {
throw new Exception(s"The dimension $dimension doesn't exists")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/exists/exist/

import java.io.InputStream
import org.xerial.snappy.Snappy

object File{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add a space before the { (and can we add a style check for that?)

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FlatSpec, Matchers}

class DatasetTest extends FlatSpec with Matchers with BeforeAndAfterAll{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please add a space before {

<check level="warning" class="org.scalastyle.file.FileLineLengthChecker" enabled="true">
<parameters>
<parameter name="maxLineLength"><![CDATA[160]]></parameter>
<parameter name="tabSize"><![CDATA[4]]></parameter>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation in the scala code is all 2 spaces - should we set the tab size to 2 as well?

@vitillo vitillo force-pushed the second branch 6 times, most recently from 373987b to 3679330 Compare February 7, 2017 14:18
@vitillo
Copy link
Contributor Author

vitillo commented Feb 7, 2017

All done.

@vitillo vitillo merged commit 1f0e1d1 into master Feb 7, 2017
@vitillo vitillo self-assigned this Feb 10, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants