EZRSS Alternative

By Wiethoofd on Tuesday 16 August 2011 22:00 - Comments (6)
Categories: English, Handy, Views: 7,762

A few of you might know me from the large number of TV shows I follow. Most users configure Sick Beard to download and archive all their episodes; others use EZRSS, the RSS feed provided by EZTV, in their favorite torrent client. But when EZRSS is down, nothing gets downloaded automatically anymore. That's why I wrote my own EZTV scraper.

How does it work?
Like most scrapers, it fetches the HTML from the specified URL, parses it, and returns an RSS feed containing the torrent download links listed on EZTV. I'm sharing the PHP code so you can host it on your own server and add caching or whatever else you think is necessary. The code can scrape the index page as well as individual show pages (/shows/###/showname on EZTV) via the ?show=### query parameter, where ### is the show's numeric ID from its EZTV URL.
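As an example of the caching mentioned above, here is a minimal file-cache sketch. It is not part of the script: cached_fetch() and its parameters are hypothetical, and it assumes PHP 7.1 or later and a writable temp directory. It takes any fetch function (such as the script's curl_get_contents), reuses the stored HTML for $ttl seconds, and falls back to the last stored copy when a fresh fetch fails.

```php
<?php
// Hypothetical file cache around any fetch function (e.g. curl_get_contents).
// Serves the stored copy while it is younger than $ttl seconds, otherwise refetches.
function cached_fetch(string $url, callable $fetch, int $ttl = 900, ?string $dir = null) {
    $dir  = $dir ?? sys_get_temp_dir();
    $file = $dir . '/ezrss_' . md5($url) . '.html';   // one cache file per URL
    clearstatcache(true, $file);                        // don't trust PHP's stat cache

    // Fresh cache hit: skip the network entirely
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file);
    }

    $data = $fetch($url);
    if ($data !== false) {
        file_put_contents($file, $data, LOCK_EX);       // store for the next request
    } elseif (is_file($file)) {
        $data = file_get_contents($file);               // site down: serve the stale copy
    }
    return $data;
}
```

In the script, the fetch would then become something like if($html = cached_fetch($url, 'curl_get_contents')), so EZTV is hit at most once per 15 minutes per page with the default $ttl.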

Your server needs PHP's cURL extension. Without it, change the line if($html = curl_get_contents($url)) to call file_get_contents instead, which can only open URLs when allow_url_fopen is enabled.
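If you'd rather not edit the code by hand, a function can pick whichever method is available. This is a sketch assuming PHP 7 or later; fetch_url() is a hypothetical name, and the file_get_contents branch still needs allow_url_fopen.

```php
<?php
// Hypothetical drop-in for curl_get_contents(): use cURL when the extension is
// loaded, otherwise fall back to file_get_contents() (needs allow_url_fopen=On).
function fetch_url(string $url, int $timeout = 15) {
    if ($url === '') {
        return false;
    }
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // max. seconds to connect
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);     // max. seconds for the whole request
        $data = curl_exec($ch);
        curl_close($ch);
    } else {
        // 'timeout' in the http stream context is the read timeout in seconds
        $ctx  = stream_context_create(['http' => ['timeout' => $timeout]]);
        $data = @file_get_contents($url, false, $ctx);
    }
    // Treat an empty body like a failure, as curl_get_contents does
    return ($data === false || $data === '') ? false : $data;
}
```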

Show me the magic
The code is 'as is', no guarantees:

PHP: EZRSS Alternative
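<?php
// Return capture group $i of the first match of $regex in $str, or false.
// (The original name, match(), is a reserved keyword since PHP 8.0.)
function regex_match($regex, $str, $i=0) {
    return (preg_match($regex, $str, $m) === 1 && isset($m[$i])) ? $m[$i] : false;
}

// Fetch $url with cURL and return the body, or false on failure.
function curl_get_contents($url) {
    if(empty($url))
        return false;

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // max. seconds for the whole request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // max. seconds to connect
    $data = curl_exec($ch);
    curl_close($ch);
    return ($data) ? $data : false;
}

// Escape a value for XML text/attributes; existing entities such as &amp; are kept as-is.
function xml_escape($s) {
    return htmlspecialchars((string)$s, ENT_QUOTES, 'UTF-8', false);
}

$url = 'http://eztv.it';

// ?show=### scrapes a single show page, otherwise the index page is used.
// The D modifier keeps $ from also accepting a trailing newline.
if(isset($_GET['show']) && is_string($_GET['show']) && preg_match('_^[0-9]+$_D', ($show_id = $_GET['show']))) {
    $url        .= '/shows/'.$show_id.'/';
    $body_split  = '#<!-- cache -->#';
    $table       = 4;   // which </table>-separated chunk holds the episode list
} else {
    $body_split  = "#<br />\n<!-- cache -->#";
    $table       = 1;
}

$tr  = array();   // cell contents per table row
$tor = array();   // torrents that end up in the feed

if($html = curl_get_contents($url)) {
    // Keep only the part of the page between the cache marker and <div id="gap">
    $body_end  = preg_split($body_split, $html, 2);
    $body_only = preg_split('#<div id="gap"></div>#', isset($body_end[1]) ? $body_end[1] : '', 2);
    $tables    = explode('</table>', $body_only[0]);
    $showtable = isset($tables[$table]) ? $tables[$table] : '';

    // Split the episode table into rows and cells, stripping the opening <tr>/<td> tags
    $rows = explode('</tr>', $showtable);
    foreach($rows as $i => $row) {
        $tds = explode('</td>', $row);
        foreach($tds as $td) {
            $td = trim(str_replace('<tr name="hover" class="forum_header_border">', '', $td));
            if(!empty($td))
                $tr[$i][] = trim(preg_replace('#<td(.*)class="forum_thread_post(_end)?">#msU', '', $td));
        }
    }
    unset($tr[0], $tr[1]);   // skip the two header rows

    foreach($tr as $i => $cols) {
        if(count($cols) == 5) {
            // $cols[1]: episode link, whose title attribute ends in the size, e.g. "(350.5 MB)"
            $title_match      = '#<a href="(.*)" title="(.*) \(([0-9.]+ (G|M)B)\)"#msU';
            $tor[$i]['title'] = regex_match($title_match, $cols[1], 2);
            $size             = (string)regex_match($title_match, $cols[1], 3);
            $tor[$i]['size']  = 0;   // RSS enclosure length in bytes; 0 if unknown
            if(preg_match('_GB$_i', $size))
                $tor[$i]['size'] = (int)floor(1024*1024*1024*(float)trim(str_replace('GB', '', $size)));
            elseif(preg_match('_MB$_i', $size))
                $tor[$i]['size'] = (int)floor(1024*1024*(float)trim(str_replace('MB', '', $size)));
            // $cols[3]: age such as "2d 5h", rewritten to "2 day 5 hour ago" for strtotime()
            $tor[$i]['time']  = date('r', strtotime(str_replace(array('d', 'h', 'm', 's', '&gt;'), array(' day', ' hour', ' minute', ' seconds', ''), $cols[3]).' ago'));
            // $cols[2]: the first download link
            $tor[$i]['downl'] = regex_match('#<a href="(.*)"#msU', $cols[2], 1);
            $tor[$i]['link']  = 'http://eztv.it'.regex_match($title_match, $cols[1], 1);
        }
    }

    // Assemble the RSS 2.0 feed
    $rss   = array();
    $rss[] = '<?xml version="1.0" encoding="UTF-8"?>';
    $rss[] = '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">';
    $rss[] = '<channel>';
    $rss[] = '<title>EZRSS Alternative</title>';
    $rss[] = '<link>http://eztv.it</link>';
    $rss[] = '<description>Alternative EZRSS feed which scrapes the EZTV website.</description>';
    $rss[] = '<language>en-us</language>';
    foreach($tor as $arr) {
        $r = array();
        $r[] = "\t<item>";
        $r[] = "\t<title><![CDATA[".$arr['title'].']]></title>';
        $r[] = "\t<pubDate>".$arr['time'].'</pubDate>';
        $r[] = "\t<link>".xml_escape($arr['link']).'</link>';
        $r[] = "\t<enclosure url=\"".xml_escape($arr['downl']).'" length="'.$arr['size'].'" type="application/x-bittorrent" />';
        $r[] = "\t<guid isPermaLink=\"true\">".xml_escape($arr['link']).'</guid>';
        $r[] = "</item>";
        $rss[] = implode(PHP_EOL."\t", $r);
    }
    $rss[] = '</channel>'.PHP_EOL.'</rss>';

    header('Content-Type: text/xml; charset=UTF-8');
    echo implode(PHP_EOL, $rss);
} else {
    echo 'No website fetched';
}
?>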
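The two least obvious steps in the script are the size and age conversions. The sketch below pulls them into helper functions so they can be tested on their own; size_to_bytes() and age_to_pubdate() are hypothetical names, not part of the script, and the code assumes EZTV ages look like "2d 5h", which is the format the script's str_replace() expects.

```php
<?php
// "350.5 MB" / "1.2 GB" -> bytes, using binary units like the script; 0 if unparseable.
function size_to_bytes(string $size): int {
    if (!preg_match('/^\s*([0-9.]+)\s*(G|M)B\s*$/i', $size, $m)) {
        return 0;
    }
    $factor = strtoupper($m[2]) === 'G' ? 1024 ** 3 : 1024 ** 2;
    return (int) floor((float) $m[1] * $factor);
}

// "2d 5h" (EZTV-style age) -> RFC 2822 pubDate relative to $now; false if unparseable.
function age_to_pubdate(string $age, int $now) {
    // Same replacements as the script: "2d 5h" becomes "2 day 5 hour"
    $rel = str_replace(['d', 'h', 'm', 's', '&gt;'], [' day', ' hour', ' minute', ' seconds', ''], $age);
    // "ago" negates every relative unit before it
    $ts = strtotime($rel . ' ago', $now);
    return $ts === false ? false : date('r', $ts);
}

date_default_timezone_set('UTC');
echo size_to_bytes('350.5 MB'), PHP_EOL;           // 367525888
echo age_to_pubdate('2d 5h', 1000000), PHP_EOL;    // Sat, 10 Jan 1970 08:46:40 +0000
```

Passing $now explicitly (instead of relying on the current time) is what makes the age conversion testable.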

Alternatively, here's a download link if you're too lazy to copy and paste the code: Download

Last words
I know matching HTML with regular expressions is far from future-proof, but as long as EZRSS is down, this is the best alternative to downloading torrents from EZTV manually! There are other feeds, but they don't always contain every torrent/episode listed on EZTV.
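One way to make the scraper less brittle is to swap most of the regular expressions for an HTML parser. Below is a sketch using PHP's DOMDocument and DOMXPath (the dom extension, enabled in most builds). parse_rows() is hypothetical, and the markup it expects (rows with class "forum_header_border", five cells, the title link in the second cell, the download link in the third, the age in the fourth) is inferred from the script's regexes, not checked against EZTV's current HTML.

```php
<?php
// Sketch: extract episode rows with DOMDocument/DOMXPath instead of regexes.
function parse_rows(string $html): array {
    if ($html === '') {
        return [];                                // loadHTML() rejects empty input
    }
    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true);    // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    $xp = new DOMXPath($doc);

    $items = [];
    foreach ($xp->query('//tr[@class="forum_header_border"]') as $tr) {
        $tds = $xp->query('./td', $tr);
        if ($tds->length !== 5) {
            continue;                             // same 5-column check as the script
        }
        $a  = $xp->query('.//a[@title]', $tds->item(1))->item(0);  // episode link
        $dl = $xp->query('.//a[@href]', $tds->item(2))->item(0);   // first download link
        // The title attribute ends in the size, e.g. "Show S01E01 (350.5 MB)"
        if ($a === null || $dl === null
            || !preg_match('/^(.*) \(([0-9.]+ [GM]B)\)$/', $a->getAttribute('title'), $m)) {
            continue;
        }
        $items[] = [
            'title' => $m[1],
            'size'  => $m[2],
            'link'  => $a->getAttribute('href'),
            'downl' => $dl->getAttribute('href'),
            'age'   => trim($tds->item(3)->textContent),
        ];
    }
    return $items;
}
```

This still breaks if EZTV renames the class or reorders the columns, but it no longer depends on attribute order, exact whitespace, or the cache comments used to cut up the page.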


Comments


By Tweakers user KaptKoek, Tuesday 16 August 2011 20:20

If you cannot guarantee that code work correctly, I not use. For all I know you put a bunch of virus in there and I will never know and when I confront you about it you say it was 'as is'.

By Tweakers user sebastius, Tuesday 16 August 2011 21:01

KaptKoek wrote on Tuesday 16 August 2011 @ 20:20:
If you cannot guarantee that code work correctly, I not use. For all I know you put a bunch of virus in there and I will never know and when I confront you about it you say it was 'as is'.
It's not even 80 lines of code, in easily readable PHP. You should be able to screen it for any nasty bits. But seeing that the poster is Wiethoofd, a quite well-known Tweaker with a fine reputation, it shouldn't be a problem.

By Tweakers user Precision, Tuesday 16 August 2011 21:48

KaptKoek wrote on Tuesday 16 August 2011 @ 20:20:
If you cannot guarantee that code work correctly, I not use. For all I know you put a bunch of virus in there and I will never know and when I confront you about it you say it was 'as is'.
What Wiethoofd does is basically scraping the content of a website. If they decide to update the layout of their website tomorrow, Wiethoofd can't guarantee it will still work. It is just simple PHP code; if you didn't know that, you weren't going to install it anyway. I use GrabIt, a program with its own search feature that you have to pay for: http://www.shemes.com/index.php?p=usenetservice (not an affiliate link, and I'm not connected with them).
@KaptKoek, the code is clean ;)

[Comment edited on Tuesday 16 August 2011 21:49]


By Richard, Tuesday 16 August 2011 23:26

Hey, you can check out http://www.dailytvtorrents.org/ as an alternative (my site). It has some advanced, configurable RSS feeds; check out the blog...

By Jorgen, Thursday 18 August 2011 21:11

@Richard: great site!!!!
Bye bye EzTvRss. Hope you can handle the traffic.

By Tweakers user bartmans99, Monday 23 July 2012 21:10

This topic is already a year old. But since EZRSS has been down for days again, I was wondering whether a clever programmer could port this PHP code to Python, so we can use this scraper from within Sick Beard? Or are there objections to doing that?
