Parse an SRT file

olihya

Hello Maxers!!!
Does anyone have an idea how to parse a subtitle file in SRT format like this:
1
00:00:20,430 --> 00:00:24,850
Un lapin à diner

2
00:00:54,870 --> 00:00:56,420
Comment faire !!!

Best regards

Ben Bracken

Check out jit.textfile and jit.str.regexp. The jit.str.regexp help patch includes an example that parses HTML.

You could probably also do all of this in JavaScript.
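As an illustration of the regular-expression route, here is a minimal sketch in plain JavaScript (no Max objects) that pulls every cue out of an SRT string with one global pattern and a `RegExp.prototype.exec` loop. It assumes well-formed cues: an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line with single spaces around the arrow, then one or more text lines, with blank lines between cues. `parseSrtCues` and `toMs` are hypothetical names, not part of jit.str.regexp or any Max API.

```javascript
// Sketch: extract every SRT cue with one global regular expression.
// Assumes well-formed cues: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm" line,
// then one or more text lines, with blank lines between cues.

// Convert hours, minutes, seconds and milliseconds (strings) to milliseconds.
function toMs(h, m, s, ms) {
  return Number(h) * 3600000 + Number(m) * 60000 + Number(s) * 1000 + Number(ms);
}

function parseSrtCues(data) {
  // Normalise Windows (CRLF) and old Mac (CR) line endings to "\n".
  var text = data.replace(/\r\n?/g, "\n");
  // Groups: 1 index, 2-5 start time, 6-9 end time, 10 text (lazy, up to the
  // next blank line or the end of the string).
  var cueRe = /(\d+)\n(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)\n([\s\S]*?)(?=\n[ \t]*\n|\s*$)/g;
  var cues = [];
  var m;
  while ((m = cueRe.exec(text)) !== null) {
    cues.push({
      index: Number(m[1]),
      startMs: toMs(m[2], m[3], m[4], m[5]),
      endMs: toMs(m[6], m[7], m[8], m[9]),
      text: m[10]
    });
  }
  return cues;
}

var cues = parseSrtCues(
  "1\n00:00:20,430 --> 00:00:24,850\nUn lapin à diner\n\n" +
  "2\n00:00:54,870 --> 00:00:56,420\nComment faire !!!\n"
);
console.log(cues.length, cues[0].startMs, cues[1].text);
```

Each cue keeps its times in milliseconds; dividing by the frame duration (1000 / framerate) turns them into frame numbers.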

-Ben

roccapl

You're in luck.
I'm currently working on a patch that parses an SRT file using JavaScript.


Here it is:

And here's the JavaScript file; save it as "srt.js":

autowatch = 1;
outlets = 3;
var framerate = 25;
var subfile = "";

// set the frame rate used to convert timecodes to frame numbers
function setFrameRate(fr) {
    framerate = fr;
    outlet(2, "framerate: " + framerate);
}

// set the path of the .srt file to read
function setSubFile(sf) {
    subfile = sf;
    outlet(2, "subtitle file: " + subfile);
}

function readSrt() {
    var reading = "";
    var f = new File(subfile);
    f.open();
    if (!f.isopen) {
        outlet(2, "could not open subtitle file: " + subfile);
        return;
    }
    // read the file line by line, adding back the line break after each line
    while (f.position < f.eof) {
        reading += f.readline(800) + "\n";
    }
    f.close();
    parseSrt(reading);
}

function parseSrt(data) {
    // drop carriage returns (Windows line endings), then trim white space at start and end
    var srt = data.replace(/\r/g, "").replace(/^\s+|\s+$/g, "");
    // captions are separated by one or more blank lines
    var caplist = srt.split(/\n\s*\n/);
    for (var i = 0; i < caplist.length; i++) {
        //get timecode, i love regular expressions
        //to learn regexp -> http://www.regular-expressions.info/
        var timecode = caplist[i].match(/(\d+):(\d+):(\d+),(\d+)\s-->\s(\d+):(\d+):(\d+),(\d+)/);
        if (timecode === null) {
            continue; // skip blocks without a valid timecode line
        }
        //get subtitle: remove the index and the timecode, then the line break after them
        var subtitle = caplist[i].replace(/(\d+)\s(\d+):(\d+):(\d+),(\d+)\s-->\s(\d+):(\d+):(\d+),(\d+)/, "").replace(/\n/, "");
        //turn "00:00:20,430 --> 00:00:24,850" into ["00","00","20","430","00","00","24","850"]
        var burp = timecode[0].replace(/[:]|(\s-->\s)|,/g, " ").split(/\s/);
        /*convert h+m+s+ms to framecount
        burp[0]/burp[4] --> hour
        burp[1]/burp[5] --> minute
        burp[2]/burp[6] --> second
        burp[3]/burp[7] --> millisecond
        Number() stops "+" from concatenating the millisecond string
        */
        var startTime = Math.round(((burp[0] * 3600000) + (burp[1] * 60000) + (burp[2] * 1000) + Number(burp[3])) / (1000 / framerate));
        var endTime = Math.round(((burp[4] * 3600000) + (burp[5] * 60000) + (burp[6] * 1000) + Number(burp[7])) / (1000 / framerate));
        outlet(0, startTime);
        outlet(1, subtitle);
        outlet(0, endTime);
        outlet(1, "");
    }
}
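The startTime/endTime lines turn a timecode into a frame number by dividing the total milliseconds by the length of one frame (1000 / framerate) and rounding. Here is a stand-alone sketch of the same arithmetic that runs outside Max, handy for checking the numbers; `timecodeToFrame` is a hypothetical helper, not part of srt.js.

```javascript
// Hypothetical helper mirroring the startTime/endTime arithmetic in srt.js:
// total milliseconds divided by the duration of one frame, then rounded.
function timecodeToFrame(tc, framerate) {
  var m = /^(\d+):(\d+):(\d+),(\d+)$/.exec(tc);
  if (m === null) {
    throw new Error("not an SRT timecode: " + tc);
  }
  var ms = Number(m[1]) * 3600000 + Number(m[2]) * 60000 +
           Number(m[3]) * 1000 + Number(m[4]);
  return Math.round(ms / (1000 / framerate)); // at 25 fps one frame lasts 40 ms
}

console.log(timecodeToFrame("00:00:20,430", 25)); // 20430 / 40 = 510.75, prints 511
console.log(timecodeToFrame("00:00:24,850", 25)); // 24850 / 40 = 621.25, prints 621
```

Math.round picks the nearest frame; Math.floor would instead give the frame already on screen at that instant (510 for the first cue), which may matter if subtitles need to appear no earlier than their timecode.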

I've tested it with the open movie Elephants Dream:
http://orange.blender.org/download

At present, the js file does not parse tags inside the subtitles (italic, bold, etc.).
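Until tags are handled, one option is simply to strip them so they don't show up as literal text. A minimal sketch, assuming the usual HTML-like SRT tags (`<i>`, `<b>`, `<u>`, `<font ...>`) and SSA-style overrides such as `{\an8}`; `stripSrtTags` is a hypothetical helper, and the italic/bold formatting is discarded rather than rendered.

```javascript
// Hypothetical helper: remove common SRT formatting tags from a subtitle.
// Handles HTML-like <i>, <b>, <u>, <font ...> (opening and closing) and
// {\...} override blocks; the formatting itself is thrown away.
function stripSrtTags(subtitle) {
  return subtitle
    .replace(/<\/?(?:i|b|u|font)\b[^>]*>/gi, "") // HTML-like tags
    .replace(/\{\\[^}]*\}/g, "");                 // {\an8}-style overrides
}

console.log(stripSrtTags("<i>Un lapin</i> à <b>diner</b>"));
```

In srt.js it could be applied just before `outlet(1, subtitle)`, e.g. `subtitle = stripSrtTags(subtitle);`.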

roccapl

Note:

First save the JavaScript file somewhere on your hard drive, then create a new, empty patcher file in the same location and save it. Only then paste the patcher code; otherwise you will lose some patch cords between the js object and the other objects.

roccapl

Another note: you can send the message "leadscale 2.5" to the jit.gl.text3d object to adjust the line height.

olihya

Wahoo!!! Bravo, roccapl!
Yes, I'm lucky.
Thank you very much.
I've been looking for a way to parse SRT files for two months.
I'm trying to make a dubbing application.
I will post it on the forum when it's finished.
Have a nice day.

olihya

Hello dear Maxers,
I have tried to make a dubbing application (with your help!!!),
and now I'm posting my work.
Here is the link to download the film: http://video.blendertestbuilds.de/download.blender.org/ED/elephantsdream-480-h264-st-aac.mov
I hope it helps.
Best regards

1295.DubbingProto.zip
Julien Bayle

@roccapl: can I borrow part of your patch for a project of mine?

roccapl's icon

@julien bayle: of course, it would be a great pleasure.

Julien Bayle

:)
I'm trying to get it working.
It doesn't work yet; I probably forgot something.

Julien Bayle

A little problem opening my SRT file :-(
grrr
Dealing with paths, etc.

Julien Bayle

Hello, it worked, and it works very well.

I just have a little problem.
The character encoding of my .srt files matters a lot: different encodings render differently.

I used Notepad++ to run some tests.
* When the character encoding is ANSI, none of my accented characters (éàÔ etc.) render correctly! The strangest thing: in the coll, when I edit the text, the accents are displayed correctly.

* When the character encoding is UTF-8, all my accented characters (éàÔ etc.) render correctly!
In that case too, the accents are displayed correctly when I edit the text in the coll, and the coll contents are the same as before.

Any ideas?
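A likely explanation: Notepad++'s "ANSI" means the Windows code page (typically Windows-1252 on Western European systems), where "é" is the single byte 0xE9, whereas UTF-8 stores it as the two bytes 0xC3 0xA9. If the text is decoded as UTF-8 somewhere in the chain (an assumption about the Max side), the ANSI bytes are invalid and the accents break. The sketch below checks whether a byte array is well-formed UTF-8, one way to flag files that need converting before they are loaded; `isValidUtf8` is a hypothetical helper and a simplified check.

```javascript
// Simplified UTF-8 well-formedness check on an array of byte values (0-255).
// It validates lead bytes and continuation bytes only; it does not reject
// every overlong 3/4-byte form or encoded surrogate.
function isValidUtf8(bytes) {
  var i = 0;
  while (i < bytes.length) {
    var b = bytes[i];
    var extra; // number of continuation bytes expected after the lead byte
    if (b < 0x80) {
      extra = 0;          // ASCII
    } else if (b >= 0xC2 && b <= 0xDF) {
      extra = 1;          // 2-byte sequence (e.g. "é" = C3 A9)
    } else if (b >= 0xE0 && b <= 0xEF) {
      extra = 2;          // 3-byte sequence
    } else if (b >= 0xF0 && b <= 0xF4) {
      extra = 3;          // 4-byte sequence
    } else {
      return false;       // 0x80-0xC1 or 0xF5-0xFF cannot start a character
    }
    if (i + extra >= bytes.length) {
      return false;       // sequence truncated at end of data
    }
    for (var k = 1; k <= extra; k++) {
      if ((bytes[i + k] & 0xC0) !== 0x80) {
        return false;     // continuation bytes must look like 10xxxxxx
      }
    }
    i += extra + 1;
  }
  return true;
}

var utf8Bytes = [0x63, 0x61, 0x66, 0xC3, 0xA9]; // "café" saved as UTF-8
var ansiBytes = [0x63, 0x61, 0x66, 0xE9];       // "café" saved as Windows-1252 ("ANSI")
console.log(isValidUtf8(utf8Bytes), isValidUtf8(ansiBytes)); // true false
```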

Matt Raftman

Hi @roccapl, thank you for your script.

It works great on PC, but now I'm trying to run it on a Mac, and every now and then it crashes Max.

I found out that the crash occurs only when the SRT file contains a degree symbol, which shows up as an infinity symbol on the Mac.

Replacing them by hand is not an option, since I'm batch-parsing the SRT files.

So I'm looking for some code in the JavaScript that replaces or removes the infinity symbol.

Any thoughts?
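One way to read this symptom: the degree sign "°" is byte 0xB0 in ISO-8859-1, and the same byte is "∞" in the Mac OS Roman encoding, so a Latin-1 file decoded as Mac OS Roman shows degrees as infinity signs. As a stopgap inside the script, a minimal sketch (hypothetical helper `cleanSubtitle`) replaces U+221E with a degree sign, or removes it, before the text is sent out. It only treats this one symptom; whether it also avoids the crash is untested.

```javascript
// Hypothetical helper: replace the infinity sign (U+221E) that shows up in
// place of a degree sign, or remove it when removeInstead is true.
function cleanSubtitle(subtitle, removeInstead) {
  var replacement = removeInstead ? "" : "\u00B0"; // U+00B0 is the degree sign
  return subtitle.replace(/\u221E/g, replacement);
}

console.log(cleanSubtitle("It is 20\u221E outside", false));
```

In parseSrt it would go right before `outlet(1, subtitle)`.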

roccapl

Hi Matt,
The problem seems to be related to the character encoding of your SRT file. Most SRT files are encoded in ISO-8859-1; try UTF-8 for maximum compatibility.
Here are some links:
http://superuser.com/questions/151972/determining-the-encoding-of-a-file-on-mac-os-x
http://superuser.com/questions/151981/converting-the-encoding-of-a-text-file-mac-os-x

Matt Raftman

Thanks @roccapl,

this seems to work:

iconv -f iso-8859-1 -t utf-8 < inputfilename.srt > outputfilename.srt
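For reference, the mapping that `iconv -f iso-8859-1 -t utf-8` performs is simple, because every ISO-8859-1 byte equals its Unicode code point (U+0000 to U+00FF). Here is a sketch of the same conversion in JavaScript, e.g. for a batch script that converts files before Max reads them; `latin1ToUtf8` is a hypothetical helper operating on an array of byte values.

```javascript
// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8 bytes.
// Each Latin-1 byte is its own code point, so bytes below 0x80 are copied
// and the rest become a two-byte UTF-8 sequence.
function latin1ToUtf8(bytes) {
  var out = [];
  for (var i = 0; i < bytes.length; i++) {
    var cp = bytes[i];
    if (cp < 0x80) {
      out.push(cp);                 // ASCII is unchanged
    } else {
      out.push(0xC0 | (cp >> 6));   // 110xxxxx: top two bits
      out.push(0x80 | (cp & 0x3F)); // 10xxxxxx: low six bits
    }
  }
  return out;
}

// "°" (0xB0 in ISO-8859-1) becomes C2 B0 in UTF-8
console.log(latin1ToUtf8([0xB0]).map(function (b) { return b.toString(16); }));
```

Files saved as Windows-1252 (Notepad++ "ANSI") differ from ISO-8859-1 in the 0x80 to 0x9F range (for example € and curly quotes), so this mapping, like the iconv command above, would turn those bytes into control characters.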