How to Read UTF8 Strings with RandomAccessFile


Reading a UTF-8 string using RandomAccessFile (Java Doc) is easier than one might think, as shown in the following code fragment.

  public static String lockAndReadFile(final File file, final String encoding, final int bufferSize) throws IOException {
    // The approximate number of bytes required
    final int approxBufferSize = (int) Math.min(Integer.MAX_VALUE, file.length());

    // We need to open this file in read/write mode to be able to lock it
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
        FileLock lock = raf.getChannel().lock();
        final ByteArrayOutputStream out = new ByteArrayOutputStream(approxBufferSize)) {

      final byte[] buffer = new byte[bufferSize];
      for (int length; (length = raf.read(buffer)) != -1;) {
        out.write(buffer, 0, length);
      }

      return new String(out.toByteArray(), encoding);
    }
  }

The above example does the following:

  1. Calculates the file length in order to create a byte buffer of the right size and minimise the number of times the buffer has to resize. The length is capped at Integer.MAX_VALUE because the buffer size is an int.
        final int approxBufferSize = (int) Math.min(Integer.MAX_VALUE, file.length());
    
  2. Creates an instance of the RandomAccessFile and acquires a FileLock (Java Doc).

        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
            FileLock lock = raf.getChannel().lock();
    

    This is done within the try-with-resources statement (Tutorial) so that we do not have to worry about releasing the FileLock and closing the RandomAccessFile.

  3. Creates an instance of ByteArrayOutputStream (Java Doc), which will be used to save the file contents.

            final ByteArrayOutputStream out = new ByteArrayOutputStream(approxBufferSize)) {
    
  4. Reads the whole file into the byte array buffer (the instance of ByteArrayOutputStream).

          final byte[] buffer = new byte[bufferSize];
          for (int length; (length = raf.read(buffer)) != -1;) {
            out.write(buffer, 0, length);
          }
    

    We cannot decode parts of the file independently, as some characters, such as ö, are represented by more than one byte in UTF-8. If a chunk happens to end halfway through such a character, decoding that chunk on its own will corrupt the output. Note that the word Köln takes 5 bytes in UTF-8, and not 4 as many think.

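    This approach limits the size of the file that can be read. The whole file is held in memory, and a Java byte array cannot hold more than Integer.MAX_VALUE bytes (about 2 GB), so very large files cannot be read using this method.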

  5. Finally, creates a new string from the bytes read, using the given encoding.

          return new String(out.toByteArray(), encoding);
    

    The try-with-resources statement will close all three resources before exiting.
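
The point made in step 4 about multi-byte characters can be checked with a small experiment. The following is a minimal sketch, not part of the original example: the class name MultiByteDemo and the helper decodeInChunks are made up for illustration. It encodes Köln as UTF-8, counts the bytes, and then compares decoding all bytes at once against decoding them in separate two-byte chunks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, used for illustration only
final class MultiByteDemo {

  /** Decodes each chunk of bytes on its own and joins the results. */
  static String decodeInChunks(final byte[] bytes, final int chunkSize) {
    final StringBuilder text = new StringBuilder();
    for (int offset = 0; offset < bytes.length; offset += chunkSize) {
      final int length = Math.min(chunkSize, bytes.length - offset);
      // Each chunk is decoded independently, so a character split
      // between two chunks cannot be reassembled
      text.append(new String(bytes, offset, length, StandardCharsets.UTF_8));
    }
    return text.toString();
  }

  public static void main(final String[] args) {
    final String city = "K\u00f6ln"; // Köln, with the ö written as a Unicode escape
    final byte[] bytes = city.getBytes(StandardCharsets.UTF_8);

    System.out.println(bytes.length); // prints 5

    // The ö occupies bytes 1 and 2, so a chunk size of 2 splits it in half
    System.out.println(city.equals(decodeInChunks(bytes, 2))); // prints false

    // Decoding all five bytes in one go preserves the text
    System.out.println(city.equals(decodeInChunks(bytes, bytes.length))); // prints true
  }
}
```

This is why the method above collects all the bytes first and only converts them to a String once the whole file has been read.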

The code listed above is available at: https://java-creed-examples.googlecode.com/svn/io/How to Read UTF8 Strings with RandomAccessFile/. The example shown here does not contain the whole code; readers can download or view all of it from the above link.
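
For files that are too large to hold in memory (the limitation mentioned in step 4), one possible alternative is to decode the file as a stream instead of collecting all bytes first. The sketch below is an assumption on our part and not part of the original example: the class StreamingReadSketch, the method lockAndStreamFile and the ChunkHandler interface are hypothetical names. It wraps the file's channel in an InputStreamReader, which carries incomplete multi-byte sequences over from one read to the next, so characters are not corrupted at buffer boundaries.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.channels.FileLock;
import java.nio.charset.Charset;

// Hypothetical sketch, used for illustration only
final class StreamingReadSketch {

  /** Hypothetical callback that receives each decoded chunk of characters. */
  interface ChunkHandler {
    void handle(char[] chars, int length) throws IOException;
  }

  static void lockAndStreamFile(final File file, final Charset charset, final int bufferSize,
      final ChunkHandler handler) throws IOException {
    // As before, the file is opened in read/write mode to be able to lock it
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
        FileLock lock = raf.getChannel().lock()) {

      // The reader is deliberately not closed here: closing it would close the
      // channel before the lock is released. Closing raf releases the file.
      final Reader reader = new InputStreamReader(Channels.newInputStream(raf.getChannel()), charset);

      final char[] buffer = new char[bufferSize];
      for (int length; (length = reader.read(buffer)) != -1;) {
        // Only the first 'length' characters of the buffer are valid
        handler.handle(buffer, length);
      }
    }
  }
}
```

A caller could, for example, pass `(chars, length) -> System.out.print(new String(chars, 0, length))` as the handler to print the file while it is being read. The trade-off is that the contents are no longer returned as a single String, so this only helps when the text can be processed piece by piece.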

Albert Attard

Albert Attard is a Java enthusiast and technical lead at a research group. Over the past years Albert has worked on various Java projects including traditional server/client applications, modular applications, large data handling applications and concurrent data manipulation applications, to name a few. He has a BSc degree from the University of London (Homepage) and an MSc in Information Security from the same university. His MSc thesis (Book) received the 2012 SearchSecurity.co.UK award (Website).
