ffprobe.wasm #121

Open
loretoparisi opened this issue Nov 24, 2020 · 18 comments
Labels
enhancement New feature or request

Comments

@loretoparisi

Is your feature request related to a problem? Please describe.
Implement ffprobe wasm version.

Describe the solution you'd like
ffprobe is the necessary companion to ffmpeg, needed to analyze media files before processing.

Describe alternatives you've considered
In this simple case I'm using the command line ffprobe via execFile to probe the file

const cp = require('child_process');

probe = function (fpath) {
      var self = this;
      return new Promise((resolve, reject) => {
        var loglevel = self.logger.isDebug() ? 'debug' : 'warning';
        const args = [
          '-loglevel', loglevel, // -v is an alias of -loglevel, so it is set only once
          '-print_format', 'json',
          '-show_format',
          '-show_streams',
          '-i', fpath
        ];
        const opts = {
          cwd: self._options.tempDir
        };
        const cb = (error, stdout) => {
          if (error)
            return reject(error);
          try {
            const outputObj = JSON.parse(stdout);
            return resolve(outputObj);
          } catch (ex) {
            self.logger.error("MediaHelper.probe failed %s", ex);
            return reject(ex);
          }
        };
        cp.execFile('ffprobe', args, opts, cb)
          .on('error', reject);
      });
    }//probe

or in this case to seek to position in the media file:

seek = function (fpath, seconds) {
      var self = this;
      return new Promise((resolve, reject) => {
        var loglevel = self.logger.isDebug() ? 'debug' : 'panic';
        const args = [
          '-hide_banner',
          '-loglevel', loglevel, // -v is an alias of -loglevel, so it is set only once
          '-show_frames', // Display information about each frame
          '-show_entries', 'frame=pkt_pos', // Display only the byte position of each frame
          '-read_intervals', seconds + '%+#1', // Seek to `seconds`, then read only 1 packet
          '-print_format', 'json', // -of is an alias of -print_format; JSON is what gets parsed below
          '-i', fpath
        ];
        const opts = {
          cwd: self._options.tempDir
        };
        const cb = (error, stdout) => {
          if (error)
            return reject(error);
          try {
            const outputObj = JSON.parse(stdout);
            return resolve(outputObj);
          } catch (ex) {
            self.logger.error("MediaHelper.seek failed %s", ex);
            return reject(ex);
          }
        };
        cp.execFile('ffprobe', args, opts, cb)
          .on('error', reject);
      });
    }//seek
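
A minimal sketch of what could be done with the object that seek resolves to. It assumes the parsed JSON has the shape { frames: [ { pkt_pos: ... } ] }, which is what -show_frames with -show_entries frame=pkt_pos is expected to produce; framePositions is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: collect byte offsets from ffprobe's JSON frame output.
// Assumed input shape: { frames: [ { pkt_pos: "48" }, ... ] }
function framePositions(probeOutput) {
  const frames =
    probeOutput && Array.isArray(probeOutput.frames) ? probeOutput.frames : [];
  return frames
    .filter((frame) => frame && frame.pkt_pos != null) // skip frames without a position
    .map((frame) => Number(frame.pkt_pos)) // accept both "48" and 48
    .filter((pos) => Number.isFinite(pos));
}
```

Usage would look something like seek(fpath, 83).then(framePositions), which yields an array of byte positions (usually with one entry for the read interval above).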

Additional context
Probe media files before processing; seek to a position in the media.

@jeromewu added the enhancement (New feature or request) label Nov 26, 2020
@alexcarol

@loretoparisi have you tried using https://github.com/alfg/ffprobe-wasm ?
It probably needs a little polishing, but it should get the job done.

@jaefloo

jaefloo commented Feb 7, 2021

I'm currently working on a NodeJS project that needs ffprobe.
Thank you @loretoparisi for the workaround.

@goatandsheep

goatandsheep commented May 25, 2021

You don't really need ffprobe to get the info. If we could run ffmpeg -i video.mp4, write the output log to a text file, and then parse that into JSON, that would be enough. Unfortunately I haven't even been able to do that.
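
For the parsing half of that idea, here is a rough sketch. It assumes the log text has already been captured somehow and that it contains the usual "Duration: ..." and "Stream #0:0: ..." lines that ffmpeg -i prints. parseFfmpegLog and the field names it returns are made up for illustration; the log format is not a stable interface, so treat this as best-effort only.

```javascript
// Hypothetical helper, not part of ffmpeg or ffmpeg.wasm:
// turn the text printed by `ffmpeg -i <file>` into a small object.
function parseFfmpegLog(logText) {
  const info = { durationSeconds: null, bitrateKbps: null, streams: [] };

  // e.g. "  Duration: 00:01:23.50, start: 0.000000, bitrate: 1205 kb/s"
  const duration = /Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)/.exec(logText);
  if (duration) {
    info.durationSeconds =
      Number(duration[1]) * 3600 + Number(duration[2]) * 60 + Number(duration[3]);
  }

  const bitrate = /bitrate:\s*(\d+)\s*kb\/s/.exec(logText);
  if (bitrate) info.bitrateKbps = Number(bitrate[1]);

  // e.g. "    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz"
  const streamLine = /Stream #(\d+):(\d+)[^:]*: (\w+): (\w+)/g;
  let match;
  while ((match = streamLine.exec(logText)) !== null) {
    info.streams.push({
      index: Number(match[2]),
      type: match[3].toLowerCase(), // "video", "audio", ...
      codec: match[4],
    });
  }
  return info;
}
```

Fields that cannot be found (for example a duration reported as N/A) are left as null rather than guessed.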

@loretoparisi
Author

@goatandsheep I'm not sure that ffmpeg -i can output all the info that ffprobe does; maybe it can.

@captn3m0

I was looking to get the "file checksum" for Audible AAX files and while it doesn't show up with ffmpeg -i, it does work with ffmpeg -v info -i *.aax (or other verbosity values - quiet,panic,fatal,error,warning,info,verbose,debug,trace). So if what you're looking for isn't in the default output, I'd suggest dialing it up.

Is there a way to get ffmpeg -i output as a nice JSON? That would be nice.

@loretoparisi
Author

loretoparisi commented Aug 24, 2021

@captn3m0 thanks, I will have a look at -v info. For JSON, in ffprobe it's '-print_format', 'json'; in ffmpeg I never tried it.

@captn3m0

-print_format is an ffprobe-only option :(

@brunomsantiago

Any plans to support that?
It would be awesome to get format and stream metadata in the browser. Something like this:
ffprobe -hide_banner -loglevel fatal -show_error -show_format -show_streams -print_format json video.mp4
ffprobe is so much faster than ffmpeg because it doesn't try to read the entire file.

@crazoter

crazoter commented Apr 28, 2022

Recently I had a use case where I needed to perform a duration check on media files in the browser before uploading them to the server. I'll document the approach I took while building my POC here, as it is somewhat related.

Use case
For context, my use case is as follows:

  • User uploads a video/audio file into a file input.
  • Browser takes file and somehow derives its duration.

ffprobe-wasm is not the full ffprobe program

  • My first idea was to use ffprobe-wasm, but I quickly discovered that the program being executed is not actually ffprobe but ffprobe-wasm-wrapper.cpp, which is essentially an attempt to rewrite ffprobe to be more emscripten-friendly and contains only a fraction of the functionality that ffprobe offers. The application as-is was insufficient for my use case, as I needed to verify audio files as well.
  • I decided against enhancing ffprobe-wasm-wrapper.cpp at the time because it would essentially mean manually porting ffprobe to wasm, something I lacked both the time and expertise to do. What I explored instead was compiling the entire ffprobe program to wasm, which I managed to accomplish. My fork of the ffprobe-wasm repo can be found here: https://github.com/crazoter/ffprobe-wasm. The messy, uncleaned steps I took are as follows:
  1. First, I cloned https://github.com/alfg/ffprobe-wasm. If you're trying to replicate the steps, you should refer to my fork, which includes the updated Dockerfile.
  2. I noticed that the ffmpeg version was very old. I updated the Dockerfile to resolve that and use the latest ffmpeg (now using git instead of the snapshotted tarball). I then built their wasm module via docker-compose run ffprobe-wasm make.
  3. I then jumped into the running docker container with an interactive bash. I navigated to the ffmpeg directory that was downloaded to the tmp directory, and manually compiled ffprobe.o and cmdutils.o. If I remember correctly, I executed:
emmake make fftools/ffprobe.o fftools/cmdutils.o
emcc --bind \
	-O3 \
	-L/opt/ffmpeg/lib \
	-I/opt/ffmpeg/include/ \
	-s EXTRA_EXPORTED_RUNTIME_METHODS="[FS, cwrap, ccall, getValue, setValue, writeAsciiToMemory]" \
	-s INITIAL_MEMORY=268435456 \
	-lavcodec -lavformat -lavfilter -lavdevice -lswresample -lswscale -lavutil -lm -lx264 \
	-pthread \
	-lworkerfs.js \
	-o ffprobe.js \
        ffprobe.o cmdutils.o
  • You can also perform emmake make build to build everything.
  • I am not too familiar with the flags to be honest, so this may be sub-optimal.
  • The instructions above may not be completely correct as I did not refine the process and rerun it to verify it. As a precaution I added the resulting files into the repo in my-dist.
  4. To test the files, I adapted emscripten's https://github.com/emscripten-core/emscripten/blob/main/src/shell_minimal.html to use the compiled ffprobe_g and made some modifications to the generated JS code to call the main function directly (ffprobe_g.max.js is a beautified version of ffprobe_g; I did not investigate why there are both ffprobe_g and ffprobe). To run the files locally, I used Servez.
  • However, as I was not proficient with wasm, there were a few problems with my prototype. I imagine someone with more experience will be able to resolve these issues:
    1. There is no way to "reset" the args passed into ffprobe. Once the args are passed into the application, passing a non-empty args array into subsequent calls to main (without refreshing the page) will "stack" the new args on top of the old args, causing issues. My workaround was to use the same file name & flags for all main calls, and not pass args in subsequent main calls, which worked for our use case. YMMV.
    2. The interface was not as clean as ffmpeg-wasm, as the logs are async and there is no indicator of when the application has finished running.
    3. The generated code assumed the existence of SharedArrayBuffer even though it was not necessary for processing most files. It is thus necessary to guard parts of the code using typeof SharedArrayBuffer !== "undefined" to prevent the code from failing if you intend to use ffprobe without having to change your HTTP headers.
  • Still, for anyone interested in porting ffprobe to wasm, I think this is a step in the right direction and can be worth exploring. I am actually quite curious why the original authors of ffprobe-wasm didn't just compile the whole file.
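
To illustrate the SharedArrayBuffer guard mentioned in the list above, a small sketch follows. The exact lines that need guarding in the generated JS are not shown here, so the function names and the "threaded" / "single-thread" labels are purely hypothetical; only the typeof check itself comes from the write-up.

```javascript
// Hypothetical guard. `scope` defaults to the global object
// (window or self in a browser, the global object in Node).
function canUseSharedMemory(scope = globalThis) {
  // typeof never throws, even when the property does not exist
  return typeof scope.SharedArrayBuffer !== "undefined";
}

// Pick a code path up front instead of failing later on a missing global.
function chooseMode(scope = globalThis) {
  return canUseSharedMemory(scope) ? "threaded" : "single-thread";
}
```

The point of the design is simply to branch before touching SharedArrayBuffer, so a page served without the extra headers still loads.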

What I ended up using

  • Due to the uncertain reliability of my (somewhat successful) ffprobe prototype, I decided to go with using ffmpeg-wasm instead.
  • Handling async issues with ffmpeg-wasm was easier, even though the data comes in separately from the logger, because you can await the completion of the execution.
  • The concern however is that I'd have to read the entire file into memory. Using 1GB of memory to read a 1GB audio file on the browser is unacceptable for my use case, even if the memory is released immediately afterward. This is a problem independent of ffmpeg-wasm, but instead caused by how the emscripten file system is used. After all, we'd have to somehow bring the file into MEMFS before ffmpeg can even start processing it, and normally we just bring the whole file into MEMFS. What if we just bring in a slice of that?
  • So I decided to instead use the Blob.slice API to obtain the first 5MB (arbitrary number) of data from the file, and then pass that into the emscripten file system using the fetchFile API provided by ffmpeg. The idea is that the metadata would be at the start of the file, and then we'll have some excess data for ffmpeg to guess the format of the file if necessary.
// Toy example
const maxSliceLength = Math.min(1024*1024*5, oFiles[nFileId].size);
const slicedData = oFiles[nFileId].slice(0, maxSliceLength);
(async () => {
  ffmpeg.FS('writeFile', 'testfile', await fetchFile(slicedData));
  await ffmpeg.run('-i', 'testfile', '-hide_banner');
  ffmpeg.FS('unlink', 'testfile');
})();
  • This resolved the memory issue, but introduced a new problem: since ffmpeg only sees the first 5MB of the file, it has to guess the duration of some files using the bitrate. This involved a bit more engineering to identify whether the estimation was performed and, if so, to perform the estimation ourselves using the actual file size:
    • One way is to estimate by bitrate. Personally, this is a last resort because the difference between the estimated & actual file size can be ridiculous.
    • The second (more reliable) way is to take the estimated duration from ffmpeg, which only covers the slice, and scale it up by multiplying it by file.size / maxSliceLength.
  • edit: ffmpeg & ffprobe will throw an error for some files if they can't read the whole file (e.g. mp4, 3gp). More specifically, the dreaded "Invalid data found when processing input". For these types of files, there are 2 options currently available:
  • I settled for this solution as I didn't need an exact value for the duration (a malicious actor would be able to bypass a browser-based check anyway).
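
A sketch of the scaling step, under the assumption that when ffmpeg falls back to a bitrate-based estimate, the duration it reports covers only the slice it was given, so the full duration is roughly that value times (full size / slice size). The function names are hypothetical. The warning text matched below is the message ffmpeg logs when it estimates duration from bitrate, but matching on log text is fragile and should be treated as best-effort.

```javascript
// Hypothetical helper: did ffmpeg guess the duration instead of reading it?
function wasDurationEstimated(logText) {
  return /Estimating duration from bitrate/.test(logText);
}

// Hypothetical helper: scale a slice-based duration estimate to the whole file.
function estimateFullDuration(sliceDurationSeconds, sliceBytes, fileBytes) {
  if (!(sliceBytes > 0) || !(fileBytes > 0)) {
    throw new RangeError("sliceBytes and fileBytes must be positive");
  }
  // The whole file fit in the slice, so nothing needs scaling.
  if (sliceBytes >= fileBytes) return sliceDurationSeconds;
  return sliceDurationSeconds * (fileBytes / sliceBytes);
}
```

For example, a 10 second estimate from a 5MB slice of a 50MB file scales to about 100 seconds. This only holds for roughly constant bitrates, which is good enough when an exact value is not required.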

Hopefully this write-up will benefit someone looking for a similar solution, or someone hoping to port ffprobe to wasm.

@brunomsantiago

@crazoter What an amazing post! Thank you so much. I was thrilled by each paragraph, only to discover at the end this amazing mediainfo.js, which apparently suits my needs perfectly. I am very happy now!

@alfg

alfg commented Apr 28, 2022

@crazoter Nice writeup! Thanks for checking out ffprobe-wasm:

Still, for anyone interested in porting ffprobe to wasm, I think this is a step in the right direction and can be worth exploring. I am actually quite curious why the original authors of ffprobe-wasm didn't just compile the whole file.

I chose not to compile the FFprobe program, but instead to use libav directly to re-implement the functionality of FFprobe as an API via Wasm as an experiment, rather than running the CLI in the browser. It's a different approach, since you can interface with libavcodec and libavformat directly and provide minimal results. Though it's a bit more work to re-implement the full functionality of FFprobe, of course.

@tfoxy

tfoxy commented May 26, 2022

Hi everyone! I created an npm package a few months back. Repo is here: https://github.com/tfoxy/ffprobe-wasm . It comes with TS definitions.

I needed to use ffprobe in browser and Node.js so that I could read metadata without being affected by file size, so I tried to package the code at https://github.com/alfg/ffprobe-wasm so that it could be used as a library. The output tries to mimic the command

ffprobe -hide_banner -loglevel fatal -show_format -show_streams -show_chapters -show_private_data -print_format json

I don't know much about Emscripten or libavcodec/libavformat, so there are some properties that are missing. But hopefully this can be enough for some people.

EDIT: @crazoter thanks for providing those alternatives. In one project I only need the duration, so that solution of using HTMLMediaElement.duration is great! Also didn't know about mediainfo.js. Only thing that is not clear to me is if it needs to read the whole file to extract some of the metadata.

@jaruba

jaruba commented Dec 10, 2022

@tfoxy mediainfo.js does not need to read the entire file, but imo acts strangely in regards to how much it needs to read, see: buzz/mediainfo.js#108
I see that your ffprobe-wasm project only supports FS, but not HTTP(s)? Are there any plans to support HTTP(s) too? (my interest is in retrieving chapter data of videos in nodejs, not the browser)

@piesuke

piesuke commented Feb 22, 2024

How is the progress of this issue?

@loretoparisi
Author

loretoparisi commented Mar 8, 2024

Hi everyone! I created an npm package a few months back. Repo is here: https://github.com/tfoxy/ffprobe-wasm . It comes with TS definitions.

I needed to use ffprobe in browser and Node.js so that I could read metadata without being affected by file size, so I tried to package the code at https://github.com/alfg/ffprobe-wasm so that it could be used as a library. The output tries to mimic the command

ffprobe -hide_banner -loglevel fatal -show_format -show_streams -show_chapters -show_private_data -print_format json

I don't know much about Emscripten or libavcodec/libavformat, so there are some properties that are missing. But hopefully this can be enough for some people.

EDIT: @crazoter thanks for providing those alternatives. In one project I only need the duration, so that solution of using HTMLMediaElement.duration is great! Also didn't know about mediainfo.js. Only thing that is not clear to me is if it needs to read the whole file to extract some of the metadata.

That's amazing! I was able to build and run the docker container and the application. But I'm struggling to import the generated module ffprobe-wasm.js:

.
├── ffprobe-wasm.js
├── ffprobe-wasm.wasm
├── ffprobe-wasm.worker.js

within Node.js. In fact, if I try to load the module as usual

const Module = require('./ffprobe-wasm.js');
    const versions = {
        libavutil:  Module.avutil_version(),
        libavcodec:  Module.avcodec_version(),
        libavformat:  Module.avformat_version(),
    };

I get a "TypeError: Module.avutil_version is not a function" error.

@izogfif

izogfif commented Aug 29, 2024

So... any news? In my case, I need to get the exact timestamps of key frames (I-frames or whatever they're called). I was only able to find a solution that requires ffprobe:

ffprobe -select_streams v -show_packets -show_entries packet=pts_time,flags -of compact=p=0 -v quiet tua.mkv | grep flags=K > tua.frames.txt

There is a mention that ffmpeg can do something with key frames via command

ffmpeg -skip_frame nokey -i test.mp4 -vsync vfr -frame_pts true out-%02d.jpeg

but extracting, converting to JPEG, and then deleting thousands of files only because you need their file names seems silly to me.
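
If the compact ffprobe output from the first command were available as a string (for example in the browser, where there is no grep), the filtering step could be sketched as below. This assumes each line looks like pts_time=0.000000|flags=K_, which is what -of compact=p=0 with packet=pts_time,flags is expected to print; keyframeTimes is a hypothetical helper name.

```javascript
// Hypothetical helper: keep the pts_time of packets whose flags start with "K"
// (key packets), i.e. the equivalent of `grep flags=K` on the compact output.
function keyframeTimes(compactOutput) {
  const times = [];
  for (const line of compactOutput.split(/\r?\n/)) {
    if (!line.trim()) continue;
    // "pts_time=0.000000|flags=K_" -> { pts_time: "0.000000", flags: "K_" }
    const fields = Object.fromEntries(
      line
        .split("|")
        .filter((pair) => pair.includes("="))
        .map((pair) => {
          const eq = pair.indexOf("=");
          return [pair.slice(0, eq).trim(), pair.slice(eq + 1).trim()];
        })
    );
    if ((fields.flags || "").startsWith("K")) {
      const seconds = Number(fields.pts_time);
      if (Number.isFinite(seconds)) times.push(seconds);
    }
  }
  return times;
}
```

Packets without a usable pts_time (for example N/A) are skipped rather than reported as zero.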

@izogfif

izogfif commented Aug 29, 2024

*Insert "Fine, I'll do it myself" picture*
@loretoparisi Here, take a look: you can now call ffmpeg.ffprobe command (just the way you would call ffmpeg.exec one).

@EliotSlevin

I got suuuuuper stuck on how to actually get data out of .ffprobe, and figured others also stuck might end up here as well. Basically, you can't just call .ffprobe like a log and expect a string back, you need to go through the motions of reading the output from a file.

To actually get the JSON out you need to do something like this:

    await ffmpeg.ffprobe([
      "-print_format",
      "json",
      "-show_format",
      "input_song_or_whatever.mp3",
      "-o",
      "info.json",
    ]);
    const infoData = (await ffmpeg.readFile("info.json"));
    const convertedInfoBlob = new Blob([infoData.buffer]);
    const info = JSON.parse(await convertedInfoBlob.text());
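
The Blob round-trip in the last three lines can probably be shortened. A minimal sketch, assuming readFile resolves to a Uint8Array as the snippet above implies; decodeJsonBytes is a hypothetical helper name, while TextDecoder is a standard global in browsers and Node.

```javascript
// Hypothetical helper: decode UTF-8 bytes and parse them as JSON.
// Decoding the typed array itself (rather than its .buffer) avoids picking up
// extra bytes when the array is only a window into a larger buffer.
function decodeJsonBytes(bytes) {
  const text = new TextDecoder("utf-8").decode(bytes);
  return JSON.parse(text);
}
```

With that in place, the read step would become something like const info = decodeJsonBytes(await ffmpeg.readFile("info.json"));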
