First batch job on Podcastpedia.org with Easy Batch

Remember the first batch job for Podcastpedia.org, presented in Spring Batch Tutorial with Spring Boot and Java Configuration? There, I read submitted podcasts from a .csv file and added them to the Podcastpedia.org directory (database). Today I will show how I automated the creation of that input file with the help of Easy Batch. Why Easy Batch? Because, after seeing my initial post, its founder, Mahmoud Ben Hassine, contacted me and asked me to have a look at Easy Batch and give it a try. I did, and I am happy about that. Read on to find out why…

1. Job description

The batch job is fairly simple: it reads the database entries containing the submitted podcasts from one table and generates a properly formatted .csv file.

2. Project setup

Reading from a database and writing to a flat file requires the following libraries on the classpath; they also bring in easybatch-core as a transitive dependency:

<dependency>
	<groupId>org.easybatch</groupId>
	<artifactId>easybatch-flatfile</artifactId>
	<version>${easybatch.version}</version>
</dependency>
<dependency>
	<groupId>org.easybatch</groupId>
	<artifactId>easybatch-jdbc</artifactId>
	<version>${easybatch.version}</version>
</dependency>

At the time of writing, the current version is 2.2.0.

3. Implementation

3.1. Job launcher

For simplicity I chose to launch the job from a main method:

package org.podcastpedia.batch.jobs.generatefilefromsuggestions;

import java.io.File;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

import org.easybatch.core.api.EasyBatchReport;
import org.easybatch.core.impl.EasyBatchEngine;
import org.easybatch.core.impl.EasyBatchEngineBuilder;
import org.easybatch.jdbc.JdbcRecordReader;

public class JobLauncher {

	private static final String OUTPUT_FILE_HEADER = "FEED_URL; IDENTIFIER_ON_PODCASTPEDIA; CATEGORIES; LANGUAGE; MEDIA_TYPE; UPDATE_FREQUENCY; KEYWORDS; FB_PAGE; TWITTER_PAGE; GPLUS_PAGE; NAME_SUBMITTER; EMAIL_SUBMITTER";

	public static void main(String[] args) throws Exception {

		// connect to the MySQL database
		Class.forName("com.mysql.jdbc.Driver").newInstance();
		Connection connection = DriverManager.getConnection(System.getProperty("db.url"), System.getProperty("db.user"), System.getProperty("db.pwd"));

		FileWriter fileWriter = new FileWriter(getOutputFilePath());
		fileWriter.write(OUTPUT_FILE_HEADER + "\n");

		// Build an easy batch engine
		EasyBatchEngine easyBatchEngine = new EasyBatchEngineBuilder()
		.registerRecordReader(new JdbcRecordReader(connection, "SELECT * FROM ui_suggested_podcasts WHERE insertion_date >= STR_TO_DATE(\'" + args[0] + "\', \'%Y-%m-%d %H:%i\')" ))
		.registerRecordMapper(new CustomMapper())
		.registerRecordProcessor(new Processor(fileWriter))
		.build();

		// Run easy batch engine
		EasyBatchReport easyBatchReport = easyBatchEngine.call();

		//close file writer
		fileWriter.close();
		System.out.println(easyBatchReport);
	}

	private static String getOutputFilePath() throws Exception {

		//create if not existent a "weeknum" directory in the given "output.directory.base" directory
		Date now = new Date();
		Calendar calendar = Calendar.getInstance();
		calendar.setTime(now);
		int weeknum = calendar.get(Calendar.WEEK_OF_YEAR);
		String targetDirPath = System.getProperty("output.directory.base") + String.valueOf(weeknum);		
		File targetDirectory = new File(targetDirPath);
		if(!targetDirectory.exists()){
			boolean created = targetDirectory.mkdir();
			if(!created){
				throw new Exception("Target directory could not be created");
			}
		}

		//build the file name based on current time to be placed in the "weeknum" directory  
		DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH.mm");
		String outputFileName = "suggestedPodcasts " + dateFormat.format(now) + ".csv";

		String filePath = targetDirPath + "/" + outputFileName;		
		return filePath;
	}

}

Let’s have a look at the different components of the JobLauncher class:

3.2. Connect to MySQL

Class.forName("com.mysql.jdbc.Driver").newInstance();
Connection connection = DriverManager.getConnection(System.getProperty("db.url"), System.getProperty("db.user"), System.getProperty("db.pwd"));

When using JDBC outside of an application server, the DriverManager class manages the establishment of connections. You have to:

“Specify to the DriverManager which JDBC drivers to try to make Connections with. The easiest way to do this is to use Class.forName() on the class that implements the java.sql.Driver interface. With MySQL Connector/J, the name of this class is com.mysql.jdbc.Driver. With this method, you could use an external configuration file to supply the driver class name and driver parameters to use when connecting to a database.” [2]

Once the MySQL driver has been registered, you can obtain a Connection to the database by calling the DriverManager.getConnection() method with the MySQL database URL, user name and password.

Note: make sure you also have the MySQL JDBC connector on your classpath:

<!-- MySQL JDBC connector -->
<dependency>
	<groupId>mysql</groupId>
	<artifactId>mysql-connector-java</artifactId>
	<version>5.1.31</version>
</dependency>
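
The quoted advice about using an external configuration file can be sketched roughly as follows. This is only an illustration, not code from the project: DbSettings and the db.driver key are hypothetical names introduced here, while db.url, db.user and db.pwd are the property names already used in the launcher.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the original project: holds the values
// that JobLauncher currently reads from system properties.
final class DbSettings {

	final String driver;
	final String url;
	final String user;
	final String password;

	private DbSettings(Properties properties) {
		this.driver = required(properties, "db.driver"); // assumed key name
		this.url = required(properties, "db.url");
		this.user = required(properties, "db.user");
		this.password = required(properties, "db.pwd");
	}

	// Reads key=value pairs in the standard java.util.Properties format.
	static DbSettings load(Reader reader) {
		Properties properties = new Properties();
		try {
			properties.load(reader);
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
		return new DbSettings(properties);
	}

	// Fails fast with a readable message when a setting is missing or blank.
	private static String required(Properties properties, String key) {
		String value = properties.getProperty(key);
		if (value == null || value.trim().isEmpty()) {
			throw new IllegalStateException("Missing required setting: " + key);
		}
		return value.trim();
	}
}
```

With something like this in place, the launcher could call Class.forName(settings.driver) and DriverManager.getConnection(settings.url, settings.user, settings.password) instead of reading each system property inline.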

3.3. Create an Easy Batch engine

Creating an Easy Batch engine is straightforward and can be done through the EasyBatchEngineBuilder API as follows:

// Build an easy batch engine
EasyBatchEngine easyBatchEngine = new EasyBatchEngineBuilder()
.registerRecordReader(new JdbcRecordReader(connection, "SELECT * FROM ui_suggested_podcasts WHERE insertion_date >= STR_TO_DATE(\'" + args[0] + "\', \'%Y-%m-%d %H:%i\')" ))
.registerRecordMapper(new CustomMapper())
.registerRecordProcessor(new Processor(fileWriter))
.build();

This is actually the whole batch configuration for the job. Short and clear:

  • register a record reader, in our case a JdbcRecordReader, for which you need to specify the connection created earlier and the SQL query to execute against it
  • register a custom record mapper
  • register a processor

Note: You don’t need to iterate over the JDBC ResultSet yourself; Easy Batch does it for you.
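
One thing to keep in mind: the launcher concatenates args[0] straight into the SQL string, so a malformed or malicious argument ends up in the query. A possible guard, sketched here under the assumption that the argument always has the yyyy-MM-dd HH:mm form expected by the STR_TO_DATE pattern, is to parse it strictly before building the query. SuggestionsQuery and since() are hypothetical names, not part of the project or of Easy Batch.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: builds the reader query only from a validated date.
final class SuggestionsQuery {

	// Java counterpart of the MySQL pattern '%Y-%m-%d %H:%i'
	private static final String DATE_PATTERN = "yyyy-MM-dd HH:mm";

	static String since(String insertionDate) {
		if (insertionDate == null) {
			throw new IllegalArgumentException("The insertion date is required");
		}
		SimpleDateFormat format = new SimpleDateFormat(DATE_PATTERN, Locale.ROOT);
		format.setLenient(false); // reject values such as month 13
		format.setTimeZone(TimeZone.getTimeZone("UTC")); // avoid daylight saving gaps

		// The whole argument must be consumed, so trailing text is rejected too.
		ParsePosition position = new ParsePosition(0);
		Date parsed = format.parse(insertionDate, position);
		if (parsed == null || position.getIndex() != insertionDate.length()) {
			throw new IllegalArgumentException(
					"Expected a date like 2014-07-01 10:30 but got: " + insertionDate);
		}

		// Only the re-formatted (digits and separators) value reaches the SQL string.
		return "SELECT * FROM ui_suggested_podcasts WHERE insertion_date >= STR_TO_DATE('"
				+ format.format(parsed) + "', '%Y-%m-%d %H:%i')";
	}
}
```

In the launcher, the reader would then be created with new JdbcRecordReader(connection, SuggestionsQuery.since(args[0])).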

3.3.1. Mapping the database records

To map each database record to a domain object, I defined a CustomMapper:

public class CustomMapper implements RecordMapper<SuggestedPodcast>{

	@SuppressWarnings("rawtypes")
	@Override
	public SuggestedPodcast mapRecord(Record record) throws Exception {
		JdbcRecord jdbcRecord = (JdbcRecord) record;
		ResultSet resultSet = jdbcRecord.getRawContent();

		SuggestedPodcast response = new SuggestedPodcast();
		response.setMetadataLine(resultSet.getString("metadata_line"));

		return response;
	}

}

For that I had to implement the RecordMapper interface with its single mapRecord() method.
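
The SuggestedPodcast class itself is not listed in this post. A minimal version that is consistent with the calls made by the mapper and the processor might look like the sketch below; only the metadataLine property is implied by the code shown here, and anything beyond that would be a guess.

```java
// Assumed shape of the domain object: a plain bean with the single
// metadataLine property used by CustomMapper and Processor.
final class SuggestedPodcast {

	// One pre-formatted .csv line, as stored in the metadata_line column
	private String metadataLine;

	public String getMetadataLine() {
		return metadataLine;
	}

	public void setMetadataLine(String metadataLine) {
		this.metadataLine = metadataLine;
	}
}
```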

3.3.2. Processing records

Easy Batch lets you define your batch processing business logic through the RecordProcessor interface. This is where you define what to do for each record. The processor is registered in the line:

.registerRecordProcessor(new Processor(fileWriter))

My custom processor extends AbstractRecordProcessor, an abstract implementation intended for clients that do not need to implement RecordProcessor.getEasyBatchResult():

public class Processor extends AbstractRecordProcessor<SuggestedPodcast>{

	private FileWriter fileWriter;

	public Processor(FileWriter fileWriter) {
		this.fileWriter = fileWriter;
	}

	@Override
	public void processRecord(SuggestedPodcast record) throws Exception {
		fileWriter.write(record.getMetadataLine() + "\n");
		fileWriter.flush();
	}

}

The “logic” is very simple: it just appends one line per record to the end of the file.
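
To try the write-and-flush step without a database or a file, the same logic can be expressed against java.io.Writer and exercised with a StringWriter. The class below is a hypothetical stand-alone variant (MetadataLineAppender is not part of the project); it deliberately does not extend AbstractRecordProcessor so that it compiles without Easy Batch on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-alone variant of the processor logic.
final class MetadataLineAppender {

	private final Writer writer;

	MetadataLineAppender(Writer writer) {
		this.writer = writer;
	}

	// Same steps as processRecord(): write one line, then flush.
	void append(String metadataLine) {
		try {
			writer.write(metadataLine + "\n");
			writer.flush();
		} catch (IOException e) {
			throw new UncheckedIOException(e);
		}
	}
}
```

Depending on Writer rather than FileWriter is a small design choice: the real job can still pass in a FileWriter, while a test can pass in a StringWriter and inspect the result in memory.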

3.4. Execution and reporting

The Easy Batch engine records several metrics during record processing and provides a complete report at the end of execution. This report is an instance of the EasyBatchReport class and contains the following information:

  • The batch start and end times
  • The batch duration
  • The data source name
  • The total number of records
  • The number of filtered, ignored and rejected records
  • The number of records processed with errors
  • The number of records successfully processed
  • The average record processing time
  • The computation result, if any

You obtain an Easy Batch report when running the Easy Batch engine:

// Run easy batch engine
EasyBatchReport easyBatchReport = easyBatchEngine.call();

Check out the Easy Batch user guide for other report formatting options.

Conclusion

For this easy job, Easy Batch proved to be a simple yet powerful batch framework, with good samples and documentation. Before starting my next batch job I will definitely have a look at Easy Batch first, before considering the “mightier” Spring Batch framework. But, to quote the author of Easy Batch, from Spring Batch vs Easy Batch: a Hello World comparison:

“Choose the right tool for the right job! If your application requires advanced features like retry on failure, remoting or flows, then go for Spring Batch (or an implementation of JSR352). If you don’t need all this advanced stuff; then Easy Batch can be very handy to simplify your batch application development.”

Resources

Source Code – GitHub

  • podcastpedia-easybatch – project presented in this tutorial. Please make a pull request for any improvement proposals

Web

  1. EasyBatch.org
    1. Tutorials
    2. User guide
    3. Customers ETL tutorial
  2. Connecting to MySQL Using the JDBC DriverManager Interface

Adrian Matei

Creator of Podcastpedia.org and Codingpedia.org, computer science engineer, husband, father, curious and passionate about science, computers, software, education, economics, social equity, philosophy - but these are just outside labels and not that important, deep inside we are all just consciousness, right?
