Archive

Archive for the ‘programming’ Category

Simple HashMap

July 9th, 2009 Daniello No comments

I’ve implemented HashMapJS, a simple hash map like implementation.
The project is located at github.

Here is the code with the usage examples:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
/*
=====================================================================
   @license MIT
   @author Daniel Kwiecinski <daniel.kwiecinski@lambder.com>
   @copyright 2009 Daniel Kwiecinski.
   @end
=====================================================================
*/
var HashMap = function() {
  this.initialize();
}
 
HashMap.prototype = {
  hashkey_prefix: "<#HashMapHashkeyPerfix>",
  hashcode_field: "<#HashMapHashcodeField>",
  hashmap_instance_id: 0,
 
  initialize: function() {
    this.backing_hash = {};
    this.code = 0;
    this.hashmap_instance_id += 1;
    this.instance_id = this.hashmap_instance_id;
  },
 
  hashcodeField: function() {
    return this.hashcode_field + this.instance_id;
  },
  /*
   maps value to key returning previous assocciation
   */
  put: function(key, value) {
    var prev;
 
    if (key && value) {
      var hashCode;
      if (typeof(key) === "number" || typeof(key) === "string") {
        hashCode = key;
      } else {
        hashCode = key[this.hashcodeField()];
      }
      if (hashCode) {
        prev = this.backing_hash[hashCode];
      } else {
        this.code += 1;
        hashCode = this.hashkey_prefix + this.code;
        key[this.hashcodeField()] = hashCode;
      }
      this.backing_hash[hashCode] = value;
    }
    return prev;
  },
  /*
   returns value associated with given key
   */
  get: function(key) {
    var value;
    if (key) {
      var hashCode;
      if (typeof(key) === "number" || typeof(key) === "string") {
        hashCode = key;
      } else {
        hashCode = key[this.hashcodeField()];
      }
      if (hashCode) {
        value = this.backing_hash[hashCode];
      }
    }
    return value;
  },
  /*
   deletes association by given key.
   Returns true if the assocciation existed, false otherwise
   */
  del: function(key) {
    var success = false;
    if (key) {
      var hashCode;
      if (typeof(key) === "number" || typeof(key) === "string") {
        hashCode = key;
      } else {
        hashCode = key[this.hashcodeField()];
      }
      if (hashCode) {
        var prev = this.backing_hash[hashCode];
        this.backing_hash[hashCode] = undefined;
        if (prev !== undefined)
          success = true;
      }
    }
    return success;
  }
}
 
/*
 =====================================================================
       usage
 =====================================================================
*/
 
 
// creation
 
var my_map = new HashMap();
 
// insertion
 
var a_key = {};
var a_value = {struct: "structA"};
var b_key = {};
var b_value = {struct: "structB"};
var c_key = {};
var c_value = {struct: "structC"};
 
my_map.put(a_key, a_value);
my_map.put(b_key, b_value);
var prev_b = my_map.put(b_key, c_value);
 
// retrieval
 
if(my_map.get(a_key) !== a_value){
  throw("fail1")
}
if(my_map.get(b_key) !== c_value){
  throw("fail2")
}
if(prev_b !== b_value){
  throw("fail3")
}
 
// deletion
 
var a_existed = my_map.del(a_key);
var c_existed = my_map.del(c_key);
var a2_existed = my_map.del(a_key);
 
if(a_existed !== true){
  throw("fail4")
}
if(c_existed !== false){
  throw("fail5")
}
if(a2_existed !== false){
  throw("fail6")
}
 
 
// primitive types keys 
var d_value = {struct: "structD"};
my_map.put(1, d_value);
 
if (my_map.get(1) !== d_value) {
  throw("fail7")
}

The keys and values can be arbitrary javascript objects.
There is no requirements on objects used as keys or values.

The mechanism is trivial. For every key there is generated unique id (per HashMap instance).
That id is injected to the key object under high unlikely to collide field name ;)

That id is then used to keying in the underlying baking standard javascript association object.

Notes:
1. This HashMap implementation is not thread safe.
OK I know javascript is not multithreaded but it can have concurrent activities (e.g. setTimeout)
This is enough to have race conditions.

2. The mentioned high unlikely to collide name is not a problem as log as there is no
massively concurrent activities.

Roadmap:
1. Make the HashMap implementation thread safe by applying The Wallace Variation of Lamport’s bakery algorithm.
http://www.polyglotinc.com/AJAXscratch/Mutex/mutualExclusion.html

Bon Appétit,
Daniel Kwiecinski

Categories: programming Tags:

Erlang’s Dynamism

June 30th, 2009 Daniello 3 comments

This is more verbose answer to @bubbafat twitt-question. It started lkie this:

  • @bubbafat: Functions used by spawn to start a process must be exported. Why doesn’t erl compiler error when this is missed?  #erlang
  • @danielllo: @bubbafat ‘cos it is possible to define and load  new modules later on during runtime in #erlang
  • @bubbafat: @danielllo Thx. Do you mean redefining the module at runtime or that the func might resolve in a different module loaded later? #erlang
  • @danielllo: You have many ways of referencing #erlang module:function not known at compile time.
    1. you can load the precompiled module at later time from a path not being provided to compiler.
    2. you can use M:F(Args) function invocation in #erlang, any of M, F, Args being variables dynamically referencing module, function and argument list.
    3. you can construct erlang AST tree programmaticaly, compile it at runtime and load the resulting beam.
    4. you can acheive the p.3 result using a helper tools such as LFE, Smerl or  “Dynamic” module generation with compile-time macros
    5. … and I’m sure there is more ;)
Categories: erlang, programming Tags:

How to redirect in webmachine

June 17th, 2009 Daniello 2 comments

Recently I was looking for an example on how to do redirect in webmachine. Unfortunately I haven’t found one. So I started figuring it out by myself. After try and error using brilliant wmtrace_resource It turned out to be trivial ;). Here is the example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
%% @doc Example of redirect webmachine_resource.
 
-module(redirect_resource).
-export([init/1, resource_exists/2, moved_temporarily/2, previously_existed/2]).
 
-include_lib("webmachine/include/webmachine.hrl").
 
init([]) -> {ok, undefined}.
 
moved_temporarily(ReqData, Context) ->
  Site = wrq:path_info(site, ReqData),
  Location = base64:decode(Site),
  {{true, Location}, ReqData, Context}.
 
previously_existed(ReqData, Context) -> {true, ReqData, Context}.
 
resource_exists(ReqData, Context) -> {false, ReqData, Context}.

The resource is mapped in my dispatch.conf as
{["redirect", site], redirect_resource, []}.
You can do a request like http://host.com/redirect/aHR0cDovL2xhbWJkZXIuY29t
The redirect_resource will interpret the last token in the path as base64 encoded location to redirect to.

Scraping google results in Erlang - the sequel (how to do it securely)

June 13th, 2009 Daniello No comments

In the last post I presented how to use google search service from erlang. But what if we want to do it securely and anonymously? We can use TOR. We can use TOR for that purpose even without installing TOR on our machine by using scroogle.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-module(scroogle_scrapper).
 
-compile(export_all).
 
-define(SCROOGLE_URL, "https://ssl.scroogle.org/cgi-bin/nbbw.cgi").
-define(SCROOGLE_PEM, "[PATH_TO_SSL_SCROOGLE_ORG_PEM_CERTIFICATE]").
 
start() ->
  inets:start(),
  ssl:start().
 
fetch_scroogle_results(Q) ->
  % We want binary as a result
  Options = [{body_format, binary}],
  HTTPOptions = [{ssl, [{cacertfile, ?SCROOGLE_PEM},{verify, 2}]}],
  ReqBody = "Gw="++url_encoder:encode(Q)++"&n=1",
  Request = {?SCROOGLE_URL, [], "application/x-www-form-urlencoded", ReqBody},
  case http:request(post, Request, HTTPOptions, Options) of
    {ok, {{"HTTP/1.1",200,"OK"}, _, Body}} -> Body;
    {error,Error} -> {error,Error}
  end.
 
 
parse(B, RE, Fun) ->
  case re:run(B, RE, [global, caseless, unicode, dotall, multiline, {capture, all, binary}]) of
    {match, Matches} ->
      lists:map(
        fun(Match) -> Fun(Match) end,
      Matches);
    nomatch -> []
  end.
 
parse_results(B) ->
  RE = "[0-9]+?\.[[:space:]]+(<a href=.+?</a>)",
  parse(B, RE, fun parse_result/1).
 
 
parse_result(GResult) ->
  RE = "<a href="(.*?)".*?>(.*?)</a>",
  Fun = fun([_,Href,Name]) ->
    {Href,Name}
  end,
  parse(GResult, RE, Fun).

You can use it as described in previous post.

Scraping google results in Erlang

June 11th, 2009 Daniello 1 comment

Currently from both legal and technical reasons my full music albums search app (to be published soon) is using external search indices rather building its own.
Among those I plan to use is google search engine. The app needs it to get links to pages containing links to mp3 streams my app is passing to the user.
So effectively I’m going to build google search result scrapper. Here is how it could look like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
-module(google_scrapper).
 
-compile(export_all).
 
-define(GOOGLE_URL, "http://www.google.co.uk/search?hl=en&btnG=Search&meta=&q=").
 
start() -> inets:start().
 
fetch_google_results(Q) ->
  % In case of redirect lets erlang take care of this for us
  HTTPOptions = [{autoredirect, true}],
  % We want binary as a result
  Options = [{body_format, binary}],
  Headers = [
    % Let's be Firefox ;)
    {"User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.10) Gecko/2009042315 Firefox/3.0.10"},
    {"Accept" , "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
    % I want the result be UTF-8 encoded
    {"Accept-Charset", "utf-8;q=0.7,*;q=0.7"}
  ],
  Request = {?GOOGLE_URL++url_encoder:encode(Q), Headers},
  case http:request(get, Request, HTTPOptions, Options) of
    {ok, {{"HTTP/1.1",200,"OK"}, _, Body}} -> Body;
    {error,Error} -> {error,Error}
  end.
 
 
parse(B, RE, Fun) ->
  case re:run(B, RE, [global, caseless, unicode, dotall, multiline, {capture, all, binary}]) of
    {match, Matches} ->
      lists:map(
        fun(Match) -> Fun(Match) end,
      Matches);
    nomatch -> []
  end.
 
parse_google_results(B) ->
  RE = "<\!--m-->(.*?)<\!--n-->",
  parse(B, RE, fun parse_google_result/1).
 
 
parse_google_result(GResult) ->
  RE = "<li class=g.*?<h3.*?<a href="(.*?)".*?>(.*?)</a>",
  Fun = fun([_,Href,Name]) ->
    {Href,Name}
  end,
  parse(GResult, RE, Fun).

As you could notice I’ve used url_encoder:encode/1 function. The standard OTP doesn’t contain one but you can get it either from gist.github.com/127917 ,
ibrowse or
yaws

You can use it by typing:

google_scrapper:start().
B = google_scrapper:fetch_google_results("google images copyright").
R = google_scrapper:parse_google_results(B).

The result should be similar to:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
[
...
[{<<"http://www.lawdit.co.uk/reading_room/room/view_article.asp?name=../articles/Google%20Sued%20"...>>,
<<"<em>Google</em> Sued for <em>Copyright</em> Infringement through Use of &#39;<em>Google<"...>>},
{<<"http://www.lawdit.co.uk/reading_room/room/view_article.asp?name=../articles/Google%20Sue"...>>,
<<"<em>Google</em> Sued for <em>Copyright</em> Infringement through Use of &#39;<em>Goo"...>>}],
[{<<"http://www.mahalo.com/Google_Images_Copyright_Infringement">>,
<<"<em>Google Images Copyright</em> Infringement - Mahalo">>},
{<<"http://www.mahalo.com/Google_Images_Copyright_Infringement">>,
<<"<em>Google Images Copyright</em> Infringement - Mahalo">>}],
[{<<"http://www.goossip.com/2008/10/google-images-loses-two-copyright-cases.html">>,
<<"<em>Google Images</em> loses two <em>copyright</em> cases in Germany - Goossip <"...>>},
{<<"http://www.goossip.com/2008/10/google-images-loses-two-copyright-cases.html">>,
<<"<em>Google Images</em> loses two <em>copyright</em> cases in Germany - Gooss"...>>}]
…]
Categories: erlang, programming Tags:

providing static content in webmachine

June 2nd, 2009 Daniello 3 comments

Recently I was migrating my web app from plain vanilla mochiweb into webmachine. One caveat I met was how to serve static content. Mochiweb as a default serves static content from its ./priv/www directory but I haven’t found similar convention in webmachine. Indeed, there is no one. Instead one have to create static content resource. I created the webmachine static resource just for that purpose feel free to use it in your projects.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
%% @author author <daniel.kwiecinski@lambder.com>
%% @copyright Daniel Kwiecinski.
%% @doc Static webmachine resource.
 
-module(static_resource).
-export([init/1, allowed_methods/2,
         content_types_provided/2, resource_exists/2, last_modified/2, provide_content/2]).
 
-include_lib("webmachine/include/webmachine.hrl").
-include_lib("kernel/include/file.hrl").
-record(context, {docroot,fullpath,fileinfo}).
 
init(DocRoot) -> {ok, #context{docroot=DocRoot}}.
 
resource_exists(ReqData, Context) ->
  case get_full_path(Context#context.docroot, wrq:disp_path(ReqData)) of
    undefined -> {false, ReqData, Context};
    Path ->
      case filelib:is_regular(Path) of
        true ->
          case file:read_file_info(Path) of
            {ok, FileInfo} ->
              {true, ReqData, Context#context{fileinfo=FileInfo}};
            {error, _} ->
              {false, ReqData, Context}
          end;
        _ -> {false, ReqData, Context}
      end
  end.
 
content_types_provided(ReqData, Context) ->
    Path = get_full_path(Context#context.docroot, wrq:disp_path(ReqData)),
    {[{webmachine_util:guess_mime(Path), provide_content}], ReqData, Context#context{fullpath=Path}}.
 
allowed_methods(ReqData, Context) -> {['HEAD', 'GET'], ReqData, Context}.
 
last_modified(ReqData, Context) ->
  {(Context#context.fileinfo)#file_info.mtime, ReqData, Context}.
 
provide_content(ReqData, Context) ->
  util:puts(Context),
  {ok, Value} = file:read_file(Context#context.fullpath),
  {Value, ReqData, Context}.
% ------------------ PRIVATE ------------------------
 
 
get_full_path(DocRoot, Path) ->
   case mochiweb_util:safe_relative_path(Path) of
     undefined -> undefined;
     RelPath ->
      FullPath = filename:join([DocRoot, RelPath]),
      case filelib:is_dir(FullPath) of
        true ->
          filename:join([FullPath, "index.html"]);
        false ->
          FullPath
      end
    end.

Bon Appétit

JRuby + Clojure’s Immutable Data Structures = Easy to maintain, application data-model.

January 22nd, 2009 Daniello 2 comments

Implementing an application with rich data-model which can be updated by multiple UI controls, many concurrent threads with undo/redo functionality may be somewhat cumbersome. In order to ease this task, the functional programming paradigm with the immutable data structures turned out to be useful.

Because all good developers are lazy, one should seek for reuse rather than reinventing required tools, especially when there is good existing one. I tried to follow that path. Since we are using JRuby as our language of choice here at Trampoline, I decided to look more closely at clojure’s immutable data structures. It is straightforward to use Java classes from JRuby which is described in many places on the web already (here, here & here). The unknown to me was how can I use clojure’s objects from Jruby. Apparently clojure data structures are delivered as pre-compiled java classes and no runtime interpretation/compilation of clojure scripts is needed. The task turned out to be very easy.

The simple implementation of graph data structure with no deletion functionality looks as simple as:

The simple implementation of graph data structure with no deletion functionality looks as simple as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
module Java::ClojureLang::IPersistentMap
  def keys
    keySet()
  end
  def has_key?(key)
    containsKey(key)
  end
  def include?(key)
    containsKey(key)
  end
  def key?(key)
    containsKey(key)
  end
  def member?(key)
    containsKey(key)
  end
end

In order to have Clojure collections look more like Ruby ones one can define aliases for their methods:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
class BasicGraph
 
  include_package 'clojure.lang'
 
  attr_reader :nodes, :edges
 
  def self.empty
    BasicGraph.new(PersistentHashMap::EMPTY, PersistentHashMap::EMPTY)
  end
 
  def add_node(node, id)
    BasicGraph.new(@nodes.assoc(id, node), @edges)
  end
 
  def add_edge(source_id, target_id, edge)
    if (@nodes.containsKey(source_id) && @nodes.containsKey(target_id))
      BasicGraph.new(@nodes, @edges.assoc([source_id, target_id], edge))
    else
      self
    end
  end
 
 
  private
 
  attr_accessor :nodes, :edges
 
  def initialize(nodes, edges)
    @nodes = nodes
    @edges = edges
  end
end

Unfortunately (or fortunately due to different contract) we can not do it with all the methods. Particularly with mutating ones. That’s because Ruby’s = (assign operator) semantics is to return the value being assign. It is analogous to []= method as well. So even if we redefine the []=(key, val) method so that the method returns the updated version of the collection, the Ruby interpreter will step into the scene and wrap the whole method, so that it eventually returns val. Anyway, whether this is good or bad is the topic for a whole other post.