Friday, July 26, 2013

Puppet ~ a beginners concept guide (Part 4) ~ Where is my Data

parts of the "Puppet ~ Beginner's Concept Guide" to read before this one ~
Part#1: intro to puppet, Part#2: intro to modules, and Part#3: modules much more.

Puppet
beginners concept guide (Part 4)


Where is my Data?

When I started my Puppet-ry, the examples I saw had all configuration data buried inside the DSL code of manifests, and people were trying to use inheritance to push data down. Then I got to see a design pattern in Puppet manifests keeping a separate params manifest for configuration variables. Then came along external data lookup via CSV files as a Puppet function. Then, with enhancements in Puppet and other modules, more options came along.

Below are a few usable-to-fine ways of utilizing separate data sources within your manifests.


Here we will see the usage styles of data for Puppet manifests: params-manifest, extlookup CSV, Hiera, plug-in facts and PuppetDB.

params-manifest:


It is the most basic way of separating out data from your functionality code, and the preferred way for data whose value-set will grow in future, since it keeps data separate from the code from the start. Once the requirement reaches a level where values vary based on environment/domain/fqdn/operatingsystem/[any-facter], the data can be extracted to any of the preferred ways given below and just looked up here. That avoids changing the main (sub)module code.
[ Gist-Set Externalize into Params Manifest: https://gist.github.com/3683955 ]
Say you are providing an httpd::git sub-module for the httpd module, placing a template-generated config file using data placed in params...
```

File: httpd/manifests/git.pp
it includes the params submodule to access the data

File: httpd/templates/mynode.conf.erb

File: httpd/manifests/params.pp
it actually is just another submodule to only handle data

Use it: run_puppet.sh

```
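To get a feel of the shape of this pattern without opening the Gist, here is a minimal sketch; the class names follow the example above, but the file paths and values are illustrative assumptions, not copied from the Gist:

```puppet
# File: httpd/manifests/params.pp ~ a submodule that only holds data
class httpd::params {
  $httpd_git_url  = 'git://github.com/example/myrepo.git'
  $httpd_git_path = '/var/www/git'
}

# File: httpd/manifests/git.pp ~ includes params to access that data
class httpd::git {
  include httpd::params

  file { '/etc/httpd/conf.d/mynode.conf':
    content => template('httpd/mynode.conf.erb'),
  }
}
```

If the values ever need to vary per environment or per node, only params.pp (or its replacement: extlookup/Hiera/facts below) changes; httpd::git stays untouched.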
_

extlookup-csv:


If you think your data would suit a (key,value) CSV format, it can be extracted into data files. Puppet needs to be told the location of the CSV files in which to look up a key and fetch the value assigned to it.
The names given to these CSV files matter to Puppet while looking up values across all present CSV files. Puppet needs to be given a hierarchy order for these file names to look for the key in, and the order can involve variable names.

For example, say you have CSVs named after HOSTNAME and ENVIRONMENT plus a common file, with the hierarchy specified in that respective order too. Then Puppet will first look for the queried key in the HOSTNAME CSV; if not found, it looks it up in the ENVIRONMENT-named file, and after not finding it there goes looking into the common file. If it doesn't find the key in any of those files, it returns the default value, if one was specified via the 'extlookup(key, default_value)' method. If there is no default value either, Puppet will raise an exception for having no value to return.

[ Gist-Set Externalize into Hierarchical CSV Files: https://gist.github.com/3684010 ]

It's the same example as for params, with a flavor of ExtData. Here you'll notice a 'common.csv' external data file providing a default set of values. Then there is also an 'env_testnode.csv' file overriding the only value that needs changing. Now since, as per the 'site.pp' file, the precedence of the 'env_%{environment}' file is higher than 'common', 'httpd::git' would look up all values first from 'env_testnode.csv' and, if not found there, would go to 'common.csv'. Hence it ends up overriding the 'httpd_git_url' value from 'env_testnode.csv'.
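A minimal sketch of wiring this up; the data directory, file names and values here are illustrative assumptions in line with the example above:

```puppet
# site.pp ~ point Puppet at the CSV directory and set look-up precedence
$extlookup_datadir    = '/etc/puppet/extdata'
$extlookup_precedence = ['env_%{environment}', 'common']

# /etc/puppet/extdata/common.csv would carry lines like:
#   httpd_git_url,git://github.com/example/default.git
# /etc/puppet/extdata/env_testnode.csv overrides just:
#   httpd_git_url,git://github.com/example/testnode.git

# anywhere in a manifest, fetch the value (with an optional default)
$httpd_git_url = extlookup('httpd_git_url', 'git://github.com/example/fallback.git')
```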

The extlookup() method used here is available as a Puppet parser function; you can read more in Part#5 (Custom Puppet Plug-Ins) on how to create your own functions.
_


hiera:


Hiera is a pluggable hierarchical data store for Puppet. It was started to provide better external data storage support than the extlookup feature, with data formats other than CSV too.

This brings in the essence of an ENC for data retrieval, without having to write one.

Data look-up happens in a hierarchy provided by the configuration, with variables in the hierarchy resolved from the node's own scope.

It enables Puppet to fetch data from varied external data sources using its different backends (like local files, Redis, the HTTP protocol), which can be added to if needed.
The 'http' backend in turn enables support for any data-store service (CouchDB, Riak, a web-app or so) to provide data.

File "hiera.yaml" from Gist below is an example of hiera configuration to be placed in puppet's configuration directory. The highlights of this configuration are ":backends:", backend source and ":hierarchy:". Multiple backend can be used at same time, their order of listing mark their order of look-up. Hierarchy configures the order for data look-up by scope.

Then, depending on what backends you have added, you need to add their source/config to look up data at.
Here we can see configuration for using local "yaml" and "json" files; for looking up data from a Redis server (the Gist sets up datasets for Redis usage for the current example) with authentication in place; and for looking up data from any "http" service with the hierarchy as the ":paths:" value.
You can even use GPG-protected data as a backend, but that is a bit messy to use.

Place the ".yaml" and ".json" files from the Gist at the intended provider locations.
Running "use_hiera.sh" will show you the magic of this example on Hiera.

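To illustrate the shape of such a configuration, here is a minimal sketch of a hiera.yaml with just the file-based backends; the paths and hierarchy names are assumptions matching the style above, not the full Gist:

```yaml
# hiera.yaml ~ placed in Puppet's configuration directory
:backends:
  - yaml
  - json
:yaml:
  :datadir: /etc/puppet/hieradata
:json:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "env_%{environment}"
  - common
```

A manifest would then fetch values with hiera('httpd_git_url', 'some-default'), walking the env_<environment> level first and common last, across the YAML then JSON backends.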
[Gist-Set Using Hiera with Masterless Puppet set-up: https://gist.github.com/abhishekkr/6133012 ]
_

plugin-facts:


Every system has its own set of information, facts, made available to Puppet by default via Facter (
http://projects.puppetlabs.com/projects/facter). Puppet also enables DevOps people to set custom facts to be used in modules.
The power of these computed facts is that they can use full Ruby power to fetch local/remote, plain/encrypted data over REST/database/API/any available channel.
These require the power of Puppet custom plug-ins (http://docs.puppetlabs.com/guides/custom_facts.html). The Ruby file doing this goes at 'MODULE/lib/facter' and gets loaded when 'pluginsync=true' is in action.
The way to set a fact in such Ruby code is just...
# e.g. in MODULE/lib/facter/mykey.rb
Facter.add(:mykey) do
  setcode do
    my_value = 'all ruby code to compute it'
    my_value
  end
end
All the rest of the code there computes the value to be set, or even the key-set.

[Gist-Set Externalize Data receiving as Facter: https://gist.github.com/3684968 ]

The same 'httpd::git' example revamped to use a custom fact is in the Gist above.
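On the manifest side, a custom fact simply surfaces as a top-scope variable; a minimal sketch, with the fact name being an illustrative assumption:

```puppet
# the custom fact from MODULE/lib/facter/ shows up as a top-scope variable
$git_url = $::httpd_git_url
notify { "cloning httpd git from ${git_url}": }
```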
There is also another way to provide a fact to the Puppet catalog: set an environment variable named as the capitalized fact name prefixed with 'FACTER_', holding the value it's supposed to have.
For e.g. # FACTER_MYKEY=$my_value puppet apply --modulepath=$MODULEPATH -e "include httpd::git"
_

puppetdb:


It's a beautiful addition to the Puppet component set. Something that had been missing for long, and possibly the thing because of which I delayed this post by half a year.
It enables the 'storeconfig' power without the Master, provides the support of a trusted DB for infrastructure-related data needs, and is thus the best suited of all.

To set up 'puppetdb' on a node, follow PuppetLabs' documentation; it is quite nice.
To set up a decent example for masterless Puppet mode, follow the given steps:

Place the two '.conf' files and one '.yaml' file in Puppet's configuration directory.
The shell script will prepare the node with the PuppetDB service for the masterless Puppet usage scenario.

Setting storeconfig to 'puppetdb' in the Puppet config enables saving of exported resources to it. The 'reports' config there will push the puppet apply reports to the database.
The PuppetDB config makes Puppet aware of the host and port to connect to the database at.
The facts setting in routes.yaml enables PuppetDB to be used in a masterless mode.

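For a sense of what those files carry, here is a minimal sketch along the lines of the PuppetDB docs; the hostname is an illustrative assumption:

```ini
# puppet.conf ~ store exported resources and reports in PuppetDB
[main]
storeconfigs = true
storeconfigs_backend = puppetdb
reports = store,puppetdb

# puppetdb.conf ~ where the PuppetDB service listens
[main]
server = puppetdb.example.local
port = 8081
```

And the facts terminus for 'puppet apply', in routes.yaml:

```yaml
apply:
  facts:
    terminus: facter
    cache: puppetdb_apply
```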

[Gist-Set Using PuppetDB with Masterless Puppet set-up: https://gist.github.com/abhishekkr/6114760 ]

Now running anything, say like...
puppet apply -e 'package{"vim": }'
...and, beautifully, 'exported resources' work like a charm using PuppetDB.
The accompanying puppet.conf will make reports get dumped to PuppetDB as well.
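For instance, exporting on one node and collecting on others follows the usual syntax; the exported resource here is an illustrative assumption:

```puppet
# on one node: export a host entry describing itself
@@host { $::hostname:
  ip => $::ipaddress,
}

# on every node: collect all exported host entries from PuppetDB
Host <<| |>>
```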

_


There's a fine article on the same subject by PuppetLabs...

Friday, May 31, 2013

Testing Chaos with Automated Configuration Management solutions


No noise making.

But let's be real: think of the count of community-contributed (or mysterious closed-and-sold 3rd-party) services, frameworks, libraries and modules put to use for managing your ultra-cool self-healing self-reliant scalable infrastructure requirements. Now with so many cogs collaborating in the infra-machine, a check on their collaboration seems rather mandatory, like any other integration test for your in-house managed service.
After all, that was the key idea behind having automated configuration management itself.

Now utilities like Puppet/Chef have been out there, accepted and used by dev & ops folks, for quite some time.
But the issue with the initially seen amateur testing styles is that they evolved from the non-matching frame of 'product'-oriented unit/integration/performance testing. 'Product'-oriented testing focuses more on what happens inside the coded logic and less on how the user gets affected by the product.
Most of the initial tools released for testing logic developed in Chef/Puppet were RSpec/Cucumber-inspired product-testing pieces. Now for the major part of installing a package, restarting a service or pushing artifacts, these tests are almost not required, as the main functionality of, say, installing package_abc is already tested inside the framework being used.
So coding to "ask" to install package_abc and then testing whether it has been asked seems futile.

That's the shift. The logic developed for infrastructure acts as a glue for all the other applications, created in-house and 3rd-party. Here in infrastructure feature development there is more to test in the effect it has on its users (software/hardware) and less in the internal changes (dependencies and dynamic content). Now the stuff in parentheses here means a lot more than it seems... let's get into the detail of it.

Real usability of Testing is based on keeping sanctity of WHAT needs to be tested WHERE.


Software/hardware services that collaborate with the help of automated infrastructure logic need the major focus of testing. These services can vary from the
  • in-house 'Product', that is the central component you are developing
  • 3rd Party services it collaborates with,
  • external services it utilizes for what it doesn't host,
  • operating system that it supports and Ops-knows what not.

Internal changes mainly revolve around
  • Resources/Dependencies getting called in right order and grouped for specific state.
  • It also relates to correct generation/purging of dynamic content, that content can itself range as
    • non-corrupt configuration files generated of a template
    • format of sent configuration data from one Infra-component to another for reflected changes
    • dynamically creating/destroying service instances in case of auto-scalable infrastructure


One can decide HOW, on ease and efficiency basis.


Unit tests work for the major portion of the 'internal changes' mentioned before; libraries like chefspec, rspec-chef and rspec-puppet are good enough. They can very well test dependency order and grouping management, as well as the effect of different data on non-corrupt configuration generation from templates, as in the sketch below.
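For flavor, a minimal rspec-puppet sketch against the httpd::git class used earlier in this blog; the resource names are illustrative assumptions:

```ruby
# spec/classes/git_spec.rb
require 'spec_helper'

describe 'httpd::git' do
  # dependency/containment check
  it { should contain_class('httpd::params') }
  # template-backed file generation check
  it { should contain_file('/etc/httpd/conf.d/mynode.conf') }
end
```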


Integration tests from this perspective are of a bit more interesting and evolutionary nature. Here we have to ensure the "glue" functionality we talked about for software/hardware services is working properly. These will confirm that every type of required machine role/state can be achieved flawlessly; call them 'state generation tests'. They also need to confirm the 'reflected changes tests' across infra-components, as mentioned in internal changes.
Now utilities like test-kitchen, in collaboration with vagrant, docker, etc., help place them in your continuous integration pipeline. This would even help in testing the same service across multiple Linux distros, if that's the plan to support.
The library 'ServerSpec' is also a nifty little piece for writing quick final-state check scripts, like the sketch below.
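A minimal serverspec-style sketch of final-state checks; the service under test is an illustrative assumption (serverspec 2.x API assumed):

```ruby
# spec/localhost/httpd_spec.rb
require 'serverspec'
set :backend, :exec  # run the checks against the local machine

describe package('httpd') do
  it { should be_installed }
end

describe service('httpd') do
  it { should be_running }
end

describe port(80) do
  it { should be_listening }
end
```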
Then the final set of integration testing is implemented in the form of monitoring across all your managed/affecting infrastructure components. This is the final and ever-running integration test.


Performance tests: yes, even these are required here. Tools like ChaosMonkey enable your infra to be self-healing and auto-scalable. If auto-scalability is a desired functionality, you should also load-test while watching dynamic container counts and behavior.

Wednesday, April 24, 2013

Beginner's Guide to OpenStack : Basics of Nova [Part 2]

parts of the Beginner's Guide to OpenStack to read before this ~ Part#1: Basics

[Part.2 Basics of Nova] Beginner's Guide to OpenStack

# Nova?
It's the main fabric controller for the IaaS cloud computing service provided by OpenStack. It took its first baby steps at NASA, got contributed to open source, and became the most important component of OpenStack.
It's built of multiple components performing different tasks, turning an end user's API request into a virtual machine service. All these components run in a non-blocking, message-based architecture and can be run from the same or different locations, with just access to the same message queue service.

---

# Components?

Nova stores the states of virtual machines in a central database. That's optimal for small deployments. For high-scale requirements, Nova is moving towards multiple data stores with aggregation.








  • Nova API : supports the OpenStack Compute API, Amazon's EC2 API and a powerful Admin API (for privileged users). It's used to initiate most orchestration activities and policies (like Quota). It is communicated with over HTTP; it converts the requests to commands, further contacting other components via the Message Broker (and HTTP for the ObjectStore). It's a WSGI application which routes and authenticates requests.
  • Nova Compute : a worker daemon taking orders from its Message Broker and performing virtual machine create/delete tasks using the hypervisor's API. It also updates the status of its tasks in the Database.
  • Nova Scheduler : decides which Nova Compute host to allot to a virtual machine request.
  • Network Manager : a worker daemon picking network-related tasks from its Message Broker and performing them. OpenStack's Quantum, as of the Grizzly release, can be opted for instead of nova-network. Tasks like maintaining IP forwarding, network bridges and VLANs get covered.
  • Volume Manager : handles attach/detach of persistent block storage volumes to virtual machines (similar to Amazon's EBS). This functionality has been extracted to OpenStack's Cinder. It's an iSCSI solution utilizing the Logical Volume Manager. The Network Manager doesn't interfere in Cinder's tasks but needs to be set up for Cinder to be used.
  • Authorization Manager : interfaces authorized API usage for Users, Projects and Roles. It communicates with OpenStack's Keystone for details.
  • WebUI : OpenStack's Horizon communicates with the Nova API for Dashboard interfacing.
  • Message Broker : all components of Nova communicate with each other in a non-blocking, callback-oriented manner using the AMQP protocol, well supported by RabbitMQ and Apache Qpid; there is also emerging support for ZeroMQ integration as the message queue. It's like a central task list shared and updated by all Nova components.
  • ObjectStore : a simple file-based storage (like Amazon's S3) for images. This can be replaced with OpenStack's Glance.
  • Database : used to gather build times and run states of virtual machines. It holds details around available instance types, available networks (if nova-network) and projects. Any database supported by SQLAlchemy can be used. It's the central information hub for all Nova components.


---

# API Style
The interface is mostly RESTful. The Routes package (a Python re-implementation of the Rails routes system) maps URIs to action methods on controller classes.
Each HTTP request to Compute requires specific authentication credentials. Multiple authentication schemes can be allowed for a Compute node; the provider determines the one to be used.

---

# Threading Model
Nova uses a green-thread implementation by design, via the eventlet and greenlet libraries. This results in a single process thread for the O.S., with its blocking I/O issues. Though a single thread reduces race conditions to a great extent, to eliminate them further in suspicious scenarios use the decorator @lockutils.synchronized('lock_name') over the methods to be protected.
If any action is long-running, its methods should trigger an eventlet context switch at the desired points in the process state. Placing something like the following code piece will switch context to waiting threads, if any, and will continue on the current thread without any delay if there is no other thread in wait.
from eventlet import greenthread
greenthread.sleep(0)  # yield to any waiting green threads, then resume
MySQL queries use drivers that block the main process thread. In the Diablo release a thread pool was implemented, but it was removed because its bugs outweighed its advantages.

---

# Filtering Scheduler
In short, it's the mechanism used by 'nova-scheduler' to choose the worthy nova-compute host for a new required virtual machine to be spawned upon. It prepares a dictionary of unfiltered hosts, filters it, and weighs the hosts' costs for creating the requested virtual machine(s). Then it chooses the least costly host.
Hosts are weighted based on the configuration options of the requested virtual machines.
It's better practice for a customer to ask for the whole count of required instances together, as weights are computed per request.

---

# Message Queue Usage
Nova components use RPC to communicate with each other via the Message Broker using PubSub. Nova implements rpc.call (request/response; the API acts as consumer) and rpc.cast (one-way; the API acts as publisher).
The Nova API and Scheduler use the message queue as invokers, whereas Network and Compute act as workers. The invoker pattern sends messages via rpc.call or rpc.cast; the worker pattern receives messages from the queue and responds to rpc.call with the appropriate response.
Nova uses the Kombu library when interfacing with RabbitMQ.

---

# Hooks
These enable developers to extend Nova's capabilities by adding named hooks to Nova code as decorators that lazily load plug-in code matching the hook name (using setuptools entry points, its extension mechanism). The hook's class definition should have pre and post methods.
Don't use hooks when stability is a factor; internal APIs may change.

---

# Dev Bootstrap
To get started with contributing... read this (OpenStack Wiki on HowToContribute) in detail.

To get rolling with Nova wheels, the system will need to have libvirt and one of the hypervisors (xen/kvm preferred for Linux hosts) present.
$ git clone git://github.com/openstack/nova.git
$ cd nova
$ python ./tools/install_venv.py
this will prepare your copy of the nova codebase with the required virtualenv; now any command you want to run in the context of this codebase goes via
$ ./tools/with_venv.sh

---

# Run My Tests
to run the nose tests and the pep8 checker, once you are done with the virtualenv setup (or it will be initiated first here)... inside the 'nova' codebase
$ ./run_tests.sh

---

# Terminology

  • Server: a virtual machine created inside the Compute system; requires Flavor & Image details.
  • Flavor: represents a unique hardware configuration with disk space, memory and CPU-time priority.
  • Image: a system image file used to create/rebuild a Server.
  • Reboot: a soft server reboot sends a graceful shutdown signal; a hard reboot does a power reset.
  • Rebuild: removes all data on the Server and replaces it with the specified image. The Server's IP address and ID remain the same.
  • Resize: converts an existing server to a different flavor. All resizes need to be explicitly confirmed; only then is the original server removed. After a 24-hour delay, there is an automated confirmation.


Wednesday, April 17, 2013

Beginner's Guide to OpenStack : Basics [Part 1]

# OpenStack?

OpenStack (http://www.openstack.org/) is an open-source cloud computing platform that can be used to build up public and private clouds. It is a weaving of various technological components into the capability to build a cloud service supporting any use-case and scale.

Once upon a time, RackSpace came into cloud services. In some parallel beautiful world, a few Pythonistas at NASA started building their own Nova cloud compute to handle their own instances. RackSpace bought SliceHost, which worked 'somewhat' fine. RackSpace came along with their Swift object storage service and weaved in Nova with a few more components around it. More companies like HP, RedHat, Canonical etc. came along to contribute to and benefit from the open-source cloud.

It's all the Open it can be. Open Source. Open Design. Open Development. Open Community.

---

# Quick Hands-On

DevStack (http://devstack.org/) gives you the easiest, fastest way to get all OpenStack components installed, configured and started on any supported O.S. platform.
You can trial-run your app code in an OpenStack environment at TryStack (http://trystack.org/).
RedHat's RDO (http://openstack.redhat.com/Main_Page) is also coming in soon, making it super easy to get OpenStack running on RHEL-based distros.
---

# Components?


OpenStack Cloud Platform constitutes of mainly following components:
  • Compute: Nova
    Brings up and maintains operations related to virtual server as per requirement.
    ~like aws ec2
  • Storage: Swift
    Allows you to store, retrieve & remove objects (files).
    ~like aws s3
  • Image Registry/Delivery: Glance
    Processes metadata for disk images; manages read/write/delete for actual image files using 'Swift' or a similar scalable file storage service.
    ~like aws ami
  • Network Management: Quantum/Melange
    Provides all the networking mechanisms required in any instance or environment, as a service. Handles network interface card plug/un-plug actions and IP allocation procedures, along with possible capability enhancements to virtual switches.
  • Block Storage: Cinder
    Enables to attach volumes for persistent usage. Detach them, snapshot them.
    ~like aws ebs
  • WebUI: Horizon
    Provides usability improvements for users and projects for managing compute nodes, object storage resources, quota usages and more, in a detailed web-app way.
    ~like aws web dashboard
  • Authentication: Keystone
    An identity management system providing APIs for all other OpenStack components to query for authorization.
  • Billing Service: Ceilometer (preview)
    Analyzes quantity, cost-priority and hence billing of all the tasks performed in the cloud.
  • Cloud Template: Heat (under construction)
    Build your entire desired cloud setup by providing OpenStack a template for it.
    ~like aws cloudformation
  • OpenStack Common: Oslo (tenured code)
    Supposed to contain all the common libraries of shared infrastructure code in OpenStack.

Hypervisors are software/firmware/hardware that enable creating, running and monitoring virtual machines. OpenStack Compute supports multiple hypervisors like KVM, LXC, QEMU, Xen, VMware & more.

Message Queue Service is used by most of the OpenStack Compute services to communicate with each other using AMQP (Advanced Message Queue Protocol) supporting async calls and callbacks.

---

# Weaving of Components


asciigram: openstack ~ evolution mode, how varied components are connected

---


Sunday, February 3, 2013

MessQ : message queue for quickly trying any idea

For some time now, while trying out message-queue-based set-ups for infrastructure... I needed a quick-to-set-up, localhost-friendly, network-available message queue service to try out ideas.
So here is Mess(age)Q(ueue). Something quickly thrown together. I'll work later to make it more performance-oriented; it's good to go for smaller projects.

@GitHub:       https://github.com/abhishekkr/messQ
@RubyGems: https://rubygems.org/gems/messQ
_________________________

A Quick Tryout

[+] Install
$ gem install messQ --no-ri --no-rdoc
[+] Start Server (starts at 0.0.0.0 on port#5566)
$ messQ --start
[+] Enqueue user-id & home value to the Queue
$ messQ -enq $USER
$ messQ --enqueue $HOME
[+] Dequeue 2 values from Queue
$ messQ -deq
$ messQ --dequeue
[+] Stop Server
$ messQ --stop
_________________________

Via Code

[+] Install
$ gem install messQ --no-ri --no-rdoc
or add the following to your Gemfile
gem 'messQ'
and require it in your code
require 'messQ'

[+] Start Server
MessQ.host = '127.0.0.1' # default is 0.0.0.0
MessQ.port = 8888 # default is 5566
MessQ.messQ_server

[+] Enqueue user-id & home value to the Queue
MessQ.host = '127.0.0.1' # default is 0.0.0.0
MessQ.port = 8888 # default is 5566
MessQ::Agent.enqueue(ENV['USER'])
MessQ::Agent.enqueue(ENV['HOME'])

[+] Dequeue 2 values from Queue
MessQ.host = '127.0.0.1' # default is 0.0.0.0
MessQ.port = 8888 # default is 5566
puts MessQ::Agent.dequeue
puts MessQ::Agent.dequeue

[+] Stop Server
MessQ::Server.stop

Wednesday, September 19, 2012

ci-go-nfo v0.0.1 : console util for ThoughtWorks' Go CI Server



ci-go-nfo v0.0.1



Just a rubygem console utility to get focused INFO about your Go continuous integration pipeline easily; no more switching back to the browser.

@RubyGems: https://rubygems.org/gems/ci-go-nfo

@GitHub:       https://github.com/abhishekkr/ci-go-nfo


Installation 

$ gem install ci-go-nfo



Usage Ci-Go-Nfo ver.0.0.1 

to set-up credential config for your go-ci
$ ci-go-nfo setup
it asks for
(a.) the location where you want to store your configuration file
(b.) the URL for your Go Server like http://my.go.server:8153
(c.) then username and password (create a read-only a/c for it)



to show go-ci info of all runs
$ ci-go-nfo

to show go-ci info of failed runs
$ ci-go-nfo fail

to show go-ci info of passed runs
$ ci-go-nfo pass

_____

.....more to come


output example:

 $ ci-go-nfo setup
 Store sensitive Go Configs in file {current file: /home/myuser/.go.abril}:

 Enter Base URL of Go Server {like http://:8153}:
                                                           http://my.go.server:8153


 This is better to be ReadOnly account details...

 Enter Log-in UserName: go_user

 Password: restrictedpassword


 $ ci-go-nfo pass
  my_pipeline -> specs -> specs
  Success  for run#2 at 2012-09-19T04:24:38
  details at http://my.go.server:8153/go/tab/build/detail/my_pipeline/10/specs/2/specs

  my_pipeline -> package ->gemify
  Success  for run#1 at 2012-09-19T07:04:39
  details at http://my.go.server:8153/go/tab/build/detail/my_pipeline/10/package/1/gemify

 $ ci-go-nfo fail
  your_pipeline -> smoke -> cukes
  Failure  for run#5 at 2012-09-19T04:24:38
  details at http://my.go.server:8153/go/tab/build/detail/your_pipeline/7/smoke/5/cukes

 $ ci-go-nfo
  my_pipeline -> specs -> specs
  Success  for run#2 at 2012-09-19T04:24:38
  details at http://my.go.server:8153/go/tab/build/detail/my_pipeline/10/specs/2/specs

  my_pipeline -> package ->gemify
  Success  for run#1 at 2012-09-19T07:04:39
  details at http://my.go.server:8153/go/tab/build/detail/my_pipeline/10/package/1/gemify

  your_pipeline -> smoke -> cukes
  Failure  for run#5 at 2012-09-19T04:24:38
  details at http://my.go.server:8153/go/tab/build/detail/your_pipeline/7/smoke/5/cukes

Sunday, August 5, 2012

Puppet ~ a beginners concept guide (Part 3) ~ Modules much more


you might prefer first reading guide Part#1 (intro to puppet) & Part#2 (intro to modules);
the section after this, Part#4 (Where is my Data?), discusses how to handle configuration data

Puppet
beginners concept guide (Part 3)

Modules with More

here we spend some time on the practices to prefer while writing most of your modules

[] HowTo Write Good Puppet Modules  
(so everyone could use it and you could use it everywhere)

  • platform-agnostic
    With a change in Operating System distro, a module might also require different package names, configuration file locations, device port names, system commands and more.
    Obviously, it's not expected that you test each and every module against each and every distro and get it foolproof for community usage. But what is expected is to use case $operatingsystem {...} statements for whatever distros you can, and let users get notified by fail("") in case they have to add something for their distro; they might also contribute it back..... like the following

    case $operatingsystem {
      centos, redhat: {
        $libxml2_development = 'libxml2-devel'
      }
      ubuntu, debian: {
        $libxml2_development = 'libxml2-dev'
      }
      default: {
        fail("Unrecognized libxml2 development header package name for your O.S. $operatingsystem")
      }
    }

    ~
  • untangled puppet strings
    You are writing Puppet modules. Good chance is you have a client or personal environment to manage, for which you had a go at it.
    That means there is going to be environment-specific client or personal code &/or configuration that is 'for your eyes only'. This will prohibit you from placing any of your modules in the community.
    That's wrong on two main fronts. First, you'll end up using so much from open source while giving back nothing. Second, your modules will miss out on the community update/comment action.
    So, untangle all your modules into atomic service-level modules. Further modularize those modules along each service's puppet-ization requirements: that gives sub-modules for install, configure, service and whatever more you can extract out. Now these sub-modules can be clubbed together, and we can move bottom-up gradually.
    Now you can keep just your private service modules to yourself; go ahead and use the community-trusted and available modules for whatever you can..... try making minor updates to those and contribute the updates back. Write the ones you don't find out in the wild, and contribute those too for the community to use, update and improve.
    ~
  • no data in c~o~d~e
    Now when you are delivering 'configuration as code', adopt the good coding practices applicable in this domain. One of those is keeping data separate from the code: no db-name, db-user-name, db-password etc. stored directly in the module's manifest intending to create the db-config file.
    There is a detailed section (Part#4) on the different external data usages: a separate parameters manifest setting up values when included, extlookup loading values from CSVs, PuppetDB, the Hiera data-store, and custom facts files to load up key-values.
    ~
  • puppet-lint
    To keep the modules adhering to DSL-syntactically correct and beautiful code-writing practices, so that the DSL and the community contributors both find it easy to understand your manifests. It's suggested to add it to the default rake task of your project, checking all the manifests, run before every repo check-in (see the Rakefile sketch after this list).
    ~
  • do-undo-redo
    It's suggested to have an undo-manifest ready for all the changes made by a module. It mainly comes in handy for infrastructures/situations where creating and destroying a node is not among your administrative tasks, or consumes a hell of a lot of time.
    Obviously, in case getting a new node is easier..... that's the way to go instead of wasting time undo-ing all the changes (and also relying on that).
    Undo-manifests are just there for the dry days when there is no 'cloud'.
    ~
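As referenced in the puppet-lint point above, wiring it into rake can be as small as this sketch; puppet-lint ships the rake task, and making it the default task is the assumption here:

```ruby
# Rakefile ~ running plain 'rake' now lints every manifest in the module
require 'puppet-lint/tasks/puppet-lint'

task :default => :lint
```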



[] More about Modules  (moreover.....)
Where to get new: http://forge.puppetlabs.com/ is the community-popular home for most of the Puppet modules.
Where to contribute:
You can manage your public module at GitHub or a similar online free repository, like
puppetlabs-kvm.
Then you can push your module to forge.puppetlabs.com.