Friday, October 16, 2009

Easy MD5 password hashing in C#

There are a ton of members in System.Security.Cryptography and some of them will compute a hash for you based on whatever algorithm you wish. A common one to use for hashing passwords is MD5.


You'll want to store your passwords in your database as hashed values - don't store the passwords in plain text.
An easy way to hash any string in .NET is not in System.Security.Cryptography, though, but rather on the FormsAuthentication class in the System.Web.Security namespace.
Use it like so:

// Requires a reference to System.Web and "using System.Web.Security;"
string hashedPassword = FormsAuthentication.HashPasswordForStoringInConfigFile(myPassword, "MD5");

Then, just compare this hashed value with the one you've stored in your database. If they match, the user is authenticated.

Now, before you go asking me how to decrypt the hash and get the original password, you can't. It's a one-way operation. You only hash the password for the purposes of comparing it to an already hashed value.

Hashed passwords are still vulnerable to so-called 'dictionary' attacks, whereby the attacker simply computes the hash for every word in the dictionary and compares each one to the stored hash. If your password is a simple English word, this sort of attack will work. To thwart this, add a 'salt' value to your password before you hash it and store it in your user database. In other words, instead of storing the hash of 'password' you would store the hash of 'blahpassword' where 'blah' is the salt. This works against dictionary attacks because 'blahpassword' is not a real word, so its hash won't appear in a list of precomputed dictionary hashes. This is even more effective if your salt is a garble, like '4^e#t'. Add the same salt to the user's inputted password before you hash it and compare it to your stored hash.

You can use the same HashPasswordForStoringInConfigFile call to hash passwords as you save them in the database. If you manually create the passwords for the users, you could make a simple application that takes the password and returns the hashed value. If you automatically create the password, create a function that accepts the password and returns the hash string before you save it to the database.
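The salting idea can be sketched roughly as follows. This is only an illustration, written in VB.NET like the other samples on this blog, and it calls MD5 from System.Security.Cryptography directly so that it runs in a console project without a reference to System.Web. The names NewSalt, HashWithSalt and Verify, the 16-byte salt size, and the choice to generate a random salt per user and store it next to the hash are all assumptions of the sketch, not part of the FormsAuthentication API.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module SaltedHashDemo
    ' Returns a random salt as a Base64 string.
    ' The 16-byte size is an assumption made for this sketch.
    Function NewSalt() As String
        Dim bytes(15) As Byte
        Dim rng As RandomNumberGenerator = RandomNumberGenerator.Create()
        rng.GetBytes(bytes)
        Return Convert.ToBase64String(bytes)
    End Function

    ' Prepends the salt to the password, hashes the result with MD5,
    ' and returns the hash as an uppercase hex string.
    Function HashWithSalt(ByVal password As String, ByVal salt As String) As String
        Dim data As Byte() = Encoding.UTF8.GetBytes(salt & password)
        Using hasher As MD5 = MD5.Create()
            Dim digest As Byte() = hasher.ComputeHash(data)
            Return BitConverter.ToString(digest).Replace("-", "")
        End Using
    End Function

    ' Re-hashes the attempted password with the stored salt and
    ' compares the result to the stored hash.
    Function Verify(ByVal attempt As String, ByVal salt As String, _
                    ByVal storedHash As String) As Boolean
        Return String.Equals(HashWithSalt(attempt, salt), storedHash, _
                             StringComparison.Ordinal)
    End Function

    Sub Main()
        ' When the user is created: store both values in the user table.
        Dim salt As String = NewSalt()
        Dim stored As String = HashWithSalt("password", salt)

        ' When the user logs in: hash the input with the same salt.
        Console.WriteLine(Verify("password", salt, stored))   ' True
        Console.WriteLine(Verify("passw0rd", salt, stored))   ' False
    End Sub
End Module
```

With a different salt for every user, two users who pick the same password end up with different stored hashes. MD5 is very fast to compute, so for new systems a deliberately slow algorithm such as PBKDF2 (Rfc2898DeriveBytes in System.Security.Cryptography) is generally the safer choice.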

Tuesday, January 27, 2009

Make my structure sortable

In a previous entry I showed how to create a structure. The object I described with my structure is an area of an image on the screen that I'm calling a "Zone". Today I've discovered the need to sort my Zones by the "Ordinal" property, but the Zone objects don't inherently know how to be sorted. The solution is to implement an interface, namely IComparable. Any object that implements IComparable can be sorted on anything you want. It's quite easy. First, inside the structure, right after the structure declaration and before any variable declaration, you add Implements IComparable. As soon as you add this line, Visual Studio will tell you that you need to include a CompareTo method, which goes something like this:


Public Function CompareTo(ByVal obj As Object) As Integer _
    Implements System.IComparable.CompareTo
    ' Only another Zone can be compared to this one.
    If Not TypeOf obj Is Zone Then
        Throw New ArgumentException("Object is not a Zone")
    End If
    Dim other As Zone = CType(obj, Zone)
    Dim result As Integer = Me.Ordinal.CompareTo(other.Ordinal)

    ' If result = 0 the ordinals are equal; compare a second property here
    ' if ties need a defined order.
    Return result
End Function


Now I can call the Sort method on an array or list of Zone objects (Array.Sort, ArrayList.Sort or List(Of Zone).Sort, for example).
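To see the whole thing working end to end, here is a small console sketch. The Zone in it is a hypothetical, trimmed-down stand-in with only an ordinal and a figure text (the real structure has more members), and the sample values are made up.

```vbnet
Imports System
Imports System.Collections.Generic

Module ZoneSortDemo
    ' Hypothetical, trimmed-down Zone: only what is needed to show sorting.
    Structure Zone
        Implements IComparable

        Private _ordinal As Integer
        Private _figureText As String

        Public Sub New(ByVal ordinal As Integer, ByVal figureText As String)
            _ordinal = ordinal
            _figureText = figureText
        End Sub

        Public ReadOnly Property Ordinal() As Integer
            Get
                Return _ordinal
            End Get
        End Property

        Public ReadOnly Property FigureText() As String
            Get
                Return _figureText
            End Get
        End Property

        ' Orders zones by their ordinal value.
        Public Function CompareTo(ByVal obj As Object) As Integer _
            Implements IComparable.CompareTo
            If Not TypeOf obj Is Zone Then
                Throw New ArgumentException("Object is not a Zone")
            End If
            Return _ordinal.CompareTo(CType(obj, Zone).Ordinal)
        End Function
    End Structure

    Sub Main()
        Dim zones As New List(Of Zone)
        zones.Add(New Zone(3, "Figure C"))
        zones.Add(New Zone(1, "Figure A"))
        zones.Add(New Zone(2, "Figure B"))

        ' Sort() finds the IComparable implementation on Zone by itself.
        zones.Sort()

        For Each z As Zone In zones
            Console.WriteLine("{0}: {1}", z.Ordinal, z.FigureText)
        Next
        ' Prints:
        ' 1: Figure A
        ' 2: Figure B
        ' 3: Figure C
    End Sub
End Module
```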

The code for my CompareTo method came from http://www.knowdotnet.com/articles/sortarraylistofobjects.html

Tuesday, January 20, 2009

Convert a Word 2007 docx to PDF with VB.NET

I recently had to write a program to convert about 500,000 text files into PDFs. My initial thought was to buy a third-party .NET component to produce the PDFs, but I found that Word 2007 has a free plug-in to enable output to PDF or XPS format, and that those functions can be automated with VB.NET. My strategy is to convert each text file into a .docx file so I can apply some style and format and then do the PDF conversion. Here's the basic method:

Your machine must have Word 2007 and the free PDF/XPS converter plugin installed. Download the plug-in HERE. You need not have Adobe Acrobat installed on the machine.


In your VB project, set a reference to the .NET assembly Microsoft.Office.Interop.Word. Put Imports Microsoft.Office.Interop.Word at the top of your code file to save you some typing.


Assuming you already have a Word 2007 doc on disk, all you have to do is invoke ExportAsFixedFormat on the Word Document object that you create. Here's a bit of sample code:

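Dim wordApp As ApplicationClass = New ApplicationClass()
Dim wordDoc As Document = Nothing
Dim strSource As String = "C:\Temp\Test.docx"

Try
    wordDoc = wordApp.Documents.Open(strSource)
    ' Export the whole document as a PDF without opening it afterwards.
    wordDoc.ExportAsFixedFormat("C:\Temp\Test.pdf", _
        WdExportFormat.wdExportFormatPDF, False, _
        WdExportOptimizeFor.wdExportOptimizeForOnScreen, _
        WdExportRange.wdExportAllDocument)

Catch ex As Exception
    MessageBox.Show(ex.Message)
Finally
    ' Close the document and quit Word so no WINWORD.EXE process is left behind.
    ' The casts avoid the name clash between the Close/Quit methods and the
    ' events of the same name.
    If wordDoc IsNot Nothing Then
        CType(wordDoc, _Document).Close(WdSaveOptions.wdDoNotSaveChanges)
    End If
    CType(wordApp, _Application).Quit()
End Try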


It works like a charm! It takes a few seconds for each file so for my 500,000 files I am going to have to crank up several machines to get it all done in a reasonable amount of time.
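To get a feel for "a reasonable amount of time", here is a back-of-the-envelope calculation. The 3 seconds per file and the machine counts are assumptions picked for illustration (I only know it's "a few seconds"), EstimateHours is a made-up helper, and it assumes the files are split evenly across the machines.

```vbnet
Imports System
Imports System.Globalization

Module ConversionEstimate
    ' Hours needed when fileCount files are split evenly across the machines.
    Function EstimateHours(ByVal fileCount As Integer, _
                           ByVal secondsPerFile As Double, _
                           ByVal machines As Integer) As Double
        Return fileCount * secondsPerFile / machines / 3600.0
    End Function

    Sub Main()
        ' Assumed: 3 seconds per file.
        Dim oneMachine As Double = EstimateHours(500000, 3.0, 1)
        Dim fourMachines As Double = EstimateHours(500000, 3.0, 4)

        Console.WriteLine(oneMachine.ToString("F1", CultureInfo.InvariantCulture))   ' 416.7
        Console.WriteLine(fourMachines.ToString("F1", CultureInfo.InvariantCulture)) ' 104.2
    End Sub
End Module
```

Under those assumptions a single machine would need more than 17 days running around the clock, while four machines would finish in a little over four days.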

Thursday, January 15, 2009

Structures 101

Also referred to as user-defined types, or structs, structures are similar to classes. Structures contain composites of other types and are a useful way to represent more complex objects, but simpler than a full-blown class. Sometimes structures are referred to as lightweight classes. The primary difference, though, is that a class defines a reference type and a structure defines a value type. Being value types, structures are stored inline wherever they are declared: on the stack for a local variable, or inside the containing object or array on the heap. Assigning one structure variable to another copies the whole value rather than a reference. Structures cannot be inherited from, and they themselves inherit only from System.ValueType, although they can implement interfaces. Classes can inherit from any other class that isn't marked NotInheritable. A much better discussion of the differences can be found HERE so I won't go any further on that lest I get in over my head.

The classic example from the .NET Foundations book is the Point structure. It represents a point on the screen and is composed of just two integers, X and Y.
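As a rough sketch of what the value type versus reference type difference means in practice, here is a two-integer point written once as a structure and once as a class. PointStruct and PointClass are made-up names for this illustration (they are not the Point type from the book), and the fields are left public to keep it short.

```vbnet
Imports System

Module ValueVsReferenceDemo
    ' Value type: assigning it copies both integers.
    Structure PointStruct
        Public X As Integer
        Public Y As Integer
    End Structure

    ' Reference type: assigning it copies only the reference.
    Class PointClass
        Public X As Integer
        Public Y As Integer
    End Class

    ' Changes a copy of a structure and returns X of the original.
    Function OriginalXAfterChangingStructCopy() As Integer
        Dim original As New PointStruct()
        original.X = 1
        Dim copy As PointStruct = original   ' a second, independent value
        copy.X = 99
        Return original.X
    End Function

    ' Changes a class instance through a second variable and returns X
    ' as seen through the first variable.
    Function OriginalXAfterChangingClassCopy() As Integer
        Dim original As New PointClass()
        original.X = 1
        Dim copy As PointClass = original    ' same object, second reference
        copy.X = 99
        Return original.X
    End Function

    Sub Main()
        Console.WriteLine(OriginalXAfterChangingStructCopy())   ' 1
        Console.WriteLine(OriginalXAfterChangingClassCopy())    ' 99
    End Sub
End Module
```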

Ok, so here's a little more elaborate example that I used recently. I needed to define a Zone, or an area of an image. Just a rectangle won't cut it because I need my zone to have a couple other attributes. First I need it to have an ordinal value, so that I can process them in a given order and be able to change that order. Second, I need the zone to have a string value I'm calling "type" (not to be confused with a .NET Type), and it may or may not also have a value in a "FigureText" property. Finally, I want to define the default "ToString" method to give me the FigureText, and just for the heck of it I'll define a "ToRectangle" method as well that will give me a rectangle object. Whatever, the details aren't important, but you'll see how to declare one of these beasties.




Structure Zone
    ' Backing fields. Declared Private because Dim inside a Structure
    ' defaults to Public.
    Private _rect As Rectangle
    Private _ordinal As Integer
    Private _type As String
    Private _figureText As String

    Public Sub New(ByVal rect As Rectangle, ByVal ordinal As Integer, ByVal type As String, Optional ByVal figureText As String = Nothing)
        _rect = rect : _ordinal = ordinal : _type = type : _figureText = figureText
    End Sub

    Public Property Ordinal() As Integer
        Get
            Return _ordinal
        End Get
        Set(ByVal value As Integer)
            _ordinal = value
        End Set
    End Property

    Public Property Rect() As Rectangle
        Get
            Return _rect
        End Get
        Set(ByVal value As Rectangle)
            _rect = value
        End Set
    End Property

    Public Property Type() As String
        Get
            Return _type
        End Get
        Set(ByVal value As String)
            _type = value
        End Set
    End Property

    Public Property FigureText() As String
        Get
            Return _figureText
        End Get
        Set(ByVal value As String)
            _figureText = value
        End Set
    End Property

    ' FigureText is optional, so return an empty string rather than
    ' failing when it is Nothing.
    Public Overrides Function ToString() As String
        Return If(_figureText, String.Empty)
    End Function

    Public Function ToInteger() As Integer
        Return Ordinal
    End Function

    Public Function ToRectangle() As Rectangle
        Return Rect
    End Function
End Structure



Note that you have to override the ToString method, because every type already inherits one from Object, but you don't have to override ToInteger or ToRectangle because those methods are not defined by default.

Later in my code when I need to make a Zone, I simply declare it like an object and pass in some values for the constructor, which is defined in the struct in Public Sub New:

Dim myZone As New Zone(myRectangle, myOrdinal, myType, myFigureText)

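I could have used a class and it would work much the same here, but I'm going to be defining hundreds of these beasties and I think the structure will be more efficient. Just keep in mind that a structure is copied when you assign it or pass it ByVal, so changing the copy doesn't change the original.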

As always, be careful with the line breaks in my sample code as the blog doesn't format things just right.

Thursday, November 13, 2008

Fix for annoying VS2008 hanging problem

I've been experiencing this problem for a few months now and finally went searching for a solution. Here's the issue - when debugging an application in VS2008 on a machine that is NOT connected to the internet, after closing your application VS hangs for 5-10 seconds and randomly minimizes and maximizes itself, and then often gets stuck as "always on top". It's maddening I tell you. Anyway, I found this thread on the MSDN forums.

To save you some reading, here's the solution.
Open up your hosts file in windows\system32\drivers\etc\ and add an entry like this:

127.0.0.1      crl.microsoft.com

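This points crl.microsoft.com (Microsoft's certificate revocation list server) at the local machine, so the lookup presumably fails right away instead of waiting for a timeout.

Hope it works for you. It worked like a charm for me.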


 

Sunday, October 12, 2008

My top 11 SQL Server tips

Ok, this is not really .NET but should be useful if you have any involvement with developing database apps. I can't take credit for all of these. I picked most of them up at the Tampa Code Camp event last year. Here goes:

  • 1. Avoid nested views - a view that references another view in its definition. This can have big implications for performance because the SQL under the covers gets very ugly.
  • 2. Avoid "select *" - aside from most likely returning more data than you really need, it's also slower as it makes the optimizer do more work.
  • 3. Use output parameters - rather than return a dataset with one column, or just a few columns, and one row, use output parameters. Less overhead. For one single value, you can use a scalar function as well, but output params allow you to return more than one value.
  • 4. Pick the right data type. Mainly to avoid storing and transmitting more data than you need. Avoid Unicode datatypes (nvarchar, nchar, etc.) unless you need them because they take up twice as much space. Also use a smalldatetime unless you need the additional precision of the datetime. Smalldatetime is 4 bytes vs 8 bytes for datetime. Every byte counts, especially when you start talking about very large datasets with billions or trillions of rows.
  • 5. Avoid loops. If you HAVE to use a loop, a while loop is better than a cursor, but if you can find a way to do the job without any loop at all, that is the best option.
  • 6. Avoid dynamic SQL. This is a security thing, too, but from a performance standpoint parameterized SQL is optimized better by the engine. Take two examples - "select * from person where first_name = 'Lucy'" and "select * from person where first_name = @first_name". The first one is less optimizable by the SQL engine. It will get cached but is not reusable, so when the next query comes through as "select * from person where first_name = 'Bill'" it is seen as a different query and the optimizer has to make a new execution plan and put another query in the cache. The parameterized version will be reused for @first_name = 'Bill', and the engine does not put another copy in the cache or construct a new execution plan. In the worst case, you can end up with many thousands of similar yet slightly different queries in the cache and it can really eat up system resources.
  • 7. Avoid "NOT IN". This is a really useful construct, but you're far better off using an outer join. With NOT IN, the engine has to do a compare on every record in the second table, and if the subquery returns even one NULL you get no rows back at all. NOT EXISTS is also a lot better.
  • 8. In SQL Server 2005, if you use GUIDs as identities, use NEWSEQUENTIALID(). This generates the GUIDs in sequential order and thus makes a suitable clustered index. Personally, I don't recommend you use GUIDs for identities unless you have a specific business reason for doing so. I like the good old int, although it does have its drawbacks (for instance, someone can easily guess other identities in your table).
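  • 9. Use table variables rather than temp tables for small, short-lived sets of rows. Table variables carry no statistics, so for large row counts a temp table can still be the better choice.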
  • 10. Learn to use the profiler, and learn to evaluate execution plans in SQL Server Management Studio.
  • 11. Use indexes wisely.
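
To make tip 6 a little more concrete, here is roughly what the parameterized version looks like from VB.NET using System.Data.SqlClient from the .NET Framework. The person table and first_name column come from the example in the tip. BuildPersonQuery, the varchar(50) parameter type, and selecting only first_name (to stay in line with tip 2) are assumptions for this sketch. No connection is opened, so it only shows how the command is put together.

```vbnet
Imports System
Imports System.Data
Imports System.Data.SqlClient

Module ParameterizedQueryDemo
    ' Builds the same SQL text for every name; only the parameter value changes.
    ' Assumption: first_name is a varchar(50) column.
    Function BuildPersonQuery(ByVal firstName As String) As SqlCommand
        Dim cmd As New SqlCommand( _
            "select first_name from person where first_name = @first_name")
        cmd.Parameters.Add("@first_name", SqlDbType.VarChar, 50).Value = firstName
        Return cmd
    End Function

    Sub Main()
        Dim lucy As SqlCommand = BuildPersonQuery("Lucy")
        Dim bill As SqlCommand = BuildPersonQuery("Bill")

        ' The text sent to the server is identical, so one cached plan can serve both.
        Console.WriteLine(lucy.CommandText = bill.CommandText)    ' True
        Console.WriteLine(lucy.Parameters("@first_name").Value)   ' Lucy
        Console.WriteLine(bill.Parameters("@first_name").Value)   ' Bill
    End Sub
End Module
```

To actually run the query you would assign an open SqlConnection to the command's Connection property and call ExecuteReader.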


Hope these help!

Wednesday, September 17, 2008

Forms Recognition and Zonal OCR

So I've been working hard on a project here at OTHS, to create an application to do automated forms recognition and zonal OCR. I wish I were smart enough to do all this on my own, but I'm using the SDK from Pegasus Imaging. It's a very powerful suite of components and very easy to work with thanks to all the sample code they provide. I plan to post more about it in this space in the future, but for right now I just wanted to post about what I'm working on. It's pretty cool stuff.